-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bitwise operations #4733
Bitwise operations #4733
Conversation
Let us know when you want reviews, or comment on things that you're unsure about! |
I'm getting line-length related blowups here. I used @michaelpj: Am I missing something here? I've literally has |
I'm also getting utterly bizarre linker errors on MinGW:
This also never used to be a problem. @michaelpj - did something change? |
The line length check is new. We had it in editorconfig before, but now it's enforced. Indeed, We're going to try and ratchet towards general conformance, but a lot of files are ignored. I would be fine with you including your files in that, just add
That looks like you're building GHC. Which looks like it's because you've updated both |
@michaelpj - the problem with the line length check is that import Data.Bits (FiniteBits, bit, complement, popCount, rotate, shift, shiftL, xor, zeroBits, (.&.), (.|.)) This is over 100 characters wide. If I reformat it to this (which is within an 80-character width limit): import Data.Bits (
FiniteBits, bit, complement, popCount, rotate, shift, shiftL, xor, zeroBits,
(.&.), (.|.)
) When I run import Data.Bits (FiniteBits, bit, complement, popCount, rotate, shift, shiftL, xor, zeroBits, (.&.), (.|.)) This is fundamentally a problem with I can certainly ignore the checker as you suggested, but this is a lurking problem that won't go away. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at the modules that were reasonably reviewable; the implementation module and the test module need some comments before they're reviewable.
import System.IO.Unsafe (unsafeDupablePerformIO) | ||
|
||
{-# NOINLINE rotateByteString #-} | ||
rotateByteString :: ByteString -> Integer -> ByteString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haddock, and throughout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably include a significant fraction of the explanation about behaviour and semantics from the CIP in comments in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you've put it in PlutusTx.Builtins
. This is a bit awkward, we'd really like the haddock in multiple places. I'm not sure where the best place to put it is. But I do think that we need enough explanation in this file to make it independently comprehensible and maintainable, which it currently isn't. We can probably be shorter, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We can do this in two ways:
- Duplicate (with some simplification) the Haddocks in
plutus-tx
; or - Link to the CIP.
Happy to do either.
{-# LANGUAGE TypeApplications #-} | ||
{-# OPTIONS_GHC -Werror #-} | ||
|
||
module Bitwise ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file has one comment. It's full of fiddly manipulation of bits and unsafe operations. We regard commenting the code properly as important, and reviewing this will take me several times as long if there are no explanations.
Please write some explanations, and then I'll take a look at this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These have now been added. I also refactored some of the code to hopefully make it easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just started looking at this and it's enormously better, thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another high level wonder: perhaps this is something that should live in its own package. It's quite nice and independently useful to have bitwise operations on bytestrings in this way.
(Also linking to haskell/bytestring#383 for reference)
plutus-tx/src/PlutusTx/Builtins.hs
Outdated
-- Furthermore, the length of any result of 'integerToByteString' must be | ||
-- strictly positive: | ||
-- | ||
-- @'greaterThanInteger' ('lengthByteString' '.' 'integerToByteString' i) 0@ @=@ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I think this section would be clearer if you just used the mathematical notation. I appreciate that you're trying to use the actual functions, which is nice in a way but I think overall less clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is probably right. I wasn't sure which way to go: I erred on the side of using the functions because I figured someone reading this documentation will have those definitions 'close by'.
plutus-tx/src/PlutusTx/Builtins.hs
Outdated
-- * /Decomposition/: Let @i, j :: 'Integer'@ such that either at least one of | ||
-- @i@, @j@ is zero or @i@ and @j@ have the same sign. Then @'shiftByteString' | ||
-- bs ('addInteger' i j)@ @=@ @'shiftByteString' ('shiftByteString' bs i) j@ | ||
-- * /Erasure/: If @greaterThanEqualsInteger ('absInteger' i) '.' 'lengthByteString' '$' bs@, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- * /Erasure/: If @greaterThanEqualsInteger ('absInteger' i) '.' 'lengthByteString' '$' bs@, | |
-- * /Erasure/: If @'greaterThanEqualsInteger' ('absInteger' i) '.' 'lengthByteString' '$' bs@, |
plutus-tx/src/PlutusTx/Builtins.hs
Outdated
-- * /Decomposition/: @'rotateByteString' bs ('addInteger' i j)@ @=@ | ||
-- @'rotateByteString' ('rotateByteString' bs i) j@ | ||
-- * /Wraparound/: Let @i :: Integer@ be nonzero. Then @'rotateByteString' bs i@ | ||
-- @=@ @'rotateByteString' bs ('remainderInteger' i ('timesInteger' 8 '.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- @=@ @'rotateByteString' bs ('remainderInteger' i ('timesInteger' 8 '.' | |
-- @=@ @'rotateByteString' bs ('remainderInteger' i ('multiplyInteger' 8 '.' |
plutus-tx/src/PlutusTx/Builtins.hs
Outdated
-- * /Identity/: @'rotateByteString' bs 0@ @=@ @bs@ | ||
-- * /Decomposition/: @'rotateByteString' bs ('addInteger' i j)@ @=@ | ||
-- @'rotateByteString' ('rotateByteString' bs i) j@ | ||
-- * /Wraparound/: Let @i :: Integer@ be nonzero. Then @'rotateByteString' bs i@ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- * /Wraparound/: Let @i :: Integer@ be nonzero. Then @'rotateByteString' bs i@ | |
-- * /Wraparound/: Let @i :: 'Integer'@ be nonzero. Then @'rotateByteString' bs i@ |
plutus-tx/src/PlutusTx/Builtins.hs
Outdated
@@ -202,6 +215,276 @@ verifySchnorrSecp256k1Signature | |||
verifySchnorrSecp256k1Signature vk msg sig = | |||
fromBuiltin (BI.verifySchnorrSecp256k1Signature vk msg sig) | |||
|
|||
-- | Converts an 'Integer' into its 'BuiltinByteString' representation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good haddock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked at this quite carefully now and it generally looks pretty good: I think it needs a bit of finishing off, but nothing too major. I'll repeat a couple of earlier comments.
-
I think we really do need both big and little-endian bytestring/integer conversions. My initial thought was to add a
reverseByteString
operation, but having looked at the implementationd (which are simpler than I was expecting because they're able to delegate a lot of work to existing library functions) I now wonder if adding a boolean flag to select endianness would be better. I did some crude benchmarking experiments that suggested that reversal imposes quite a lot of overhead. -
I think we also need some means of padding bytestrings to a given length. We should do some benchmarking to see how adding a function for doing that compares with doing it manually using the existing bytestring operations.
In addition, we should have some conformance tests for these operations in plutus-conformance
, like these ones here. Currently we only have fairly simple unit tests: adding new ones is straightforward but tedious.
It's a bit late in the day, but we're wondering if we should have some realistic test case that makes heavy use of the bitwise operations to see how feasible it would be to use them for genuine on-chain operations, maybe some cryptographic or ZK-related application. To get real costs for such a program and see if fits into the on-chain limits we'd have to generate complete cost models for all of the functions first (which we have to do anyway), so it's not something we could do right away. I wonder what the impact of doing lots of bit-writing instructions would be, for example: immutability means that you have to construct a whole new bytestring for every bit that you change, so if you had a big bytestring that you needed to do hundreds of bit operations on the CPU and memory costs would soon mount up. It's possible that the costs might be prohibitive for large applications.
toBuiltinMeaning _ver IntegerToByteString = | ||
makeBuiltinMeaning integerToByteStringPlc mempty | ||
where | ||
integerToByteStringPlc :: SomeConstant uni Integer -> EvaluationResult BS.ByteString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be replaced by
integerToByteStringPlc n =
case integerToByteString n of
Just bs -> EvaluationSuccess bs
Nothing -> EvaluationFailure
{-# INLINE integerToByteStringPlc #-}
However, if you made integerToByteString
return an Emitter
(like many of the other bytestring functions) instead of Maybe
then that wouldn't be necessary; I think you could just have integerToByteString
here.
complementByteString :: BuiltinByteString -> BuiltinByteString | ||
complementByteString = BI.complementByteString | ||
|
||
-- | Shifts the input bytestring left by the specified (possibly negative) amount. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A brief example might be helpful here to make the behaviour clear ("shifting left by a possibly negative amount" is maybe a bit confusing).
complementByteString = BI.complementByteString | ||
|
||
-- | Shifts the input bytestring left by the specified (possibly negative) amount. | ||
-- If positive, shifts left/to higher significance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit dubious about the use of the word "significance" here. I think that's only really meaningful when you're interpreting a string of bits as a number. If you're going to use a little-endian encoding for integers then more signifcant bytes are actually at the right of a bytetstring. Even worse, the bits inside a byte become more significant towards the left: in 0x01C0 considered as a little-endian encoding of an integer, bit 7 (from the right, counting from 0) is more significant than both bit 6 and bit 8!
shiftByteString :: BuiltinByteString -> Integer -> BuiltinByteString | ||
shiftByteString bs i = BI.shiftByteString bs (toBuiltin i) | ||
|
||
-- | Rotates the input bytestring left by the specified (possibly negative) amount. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again an illustrative example might be helpful.
@@ -85,7 +85,12 @@ builtinsIntroducedIn = Map.fromList [ | |||
]), | |||
-- Chang is protocolversion=8.0 | |||
((PlutusV2, changPV), Set.fromList [ | |||
VerifyEcdsaSecp256k1Signature, VerifySchnorrSecp256k1Signature | |||
VerifyEcdsaSecp256k1Signature, VerifySchnorrSecp256k1Signature, | |||
IntegerToByteString, ByteStringToInteger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bytestring operations definitely shouldn't be in PlutusV2
since that's already been deployed (although maybe there was no other option when this was originally written). I think the best thing is to put them in PlutusV3
for the time being until we work out exactly what's getting released when.
rotateByteString :: ByteString -> Integer -> ByteString | ||
rotateByteString bs i | ||
-- If a ByteString is completely homogenous, rotating won't change it. This | ||
-- also covers emptiness, since empty ByteStrings are homogenous vacuously. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also covers emptiness, since empty ByteStrings are homogenous vacuously
Thanks for pointing that out: I was wondering how length 0 was handled.
-- We compute the read similarly to how we determine the change when we write. | ||
-- The only difference is that the mask is used on the input to read it, rather | ||
-- than to modify anything. | ||
dangerousRead :: ByteString -> Integer -> Bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume that what's dangerous about this is that you get an error if the index is out of bounds. Interestingly, you get "arithmetic overflow" if the index is negative, I think this is because then quotRem
gives you a negative smallOffset
and bit
doesn't like that. If you replace quotRem
by divMod
then you get an out of bounds error in all cases and the tests still pass. I'm not confident that that's the right thing to do though.
Anyway, we really don't want errors on the chain, so it would be worth saying what the preconditions for calling this safely are.
bitLen = fromIntegral $ BS.length bs * 8 | ||
addBit :: Int -> Integer -> Word8 -> Integer -> Word8 | ||
addBit start displacement acc offset = | ||
let oldIx = (offset + fromIntegral start + bitLen - displacement) `rem` bitLen in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm kind of thinking aloud here to convince myself that this is OK...)
The use of rem
rather than mod
had me worried here. oldIx
will always be less than bitLen
, so there's no danger of going past the end of the input. However, if the stuff inside the brackets happened to be negative then rem
would give us a negative index and dangerousRead
would crash. Looking carefully, I see that displacement
has already been reduced modulo bitLen
, so we're good (I hope).
|
||
-- | See 'PlutusTx.Builtins.integerToByteString. | ||
{-# INLINE integerToByteString #-} | ||
integerToByteString :: Integer -> Maybe ByteString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I 'm not very familiar with GHC.Num.Integer
, but I was hoping that we'd be able to use integerToAddr
here since (a) that would be more symmetric with byteStringToInteger
and (b) it would open up the possibility of adding an extra boolean parameter to both conversion functions to specify the endianness. The problem is finding an address to write the output bytestring to. I experimented by constructing a bytestring of zeros of length 1000 and then writing the result to that and trimming the extra zeros off the end. This appeared to work (the roundtrip tests passed) and increased the time by about 3-7% (inputs up to about 2000 decimal digits, outputs up to about 800 bytes) compared to the existing implementation. Telling it to use the big-endian encoding instead seemed to add up to another 10% on top (whereas sticking BS.reverse
in front of the implementation here increased the time by up to 50%, which is more than I was expecting). This was a pretty crude experiment and a proper implementation would require a lot more care, but it might be worth considering. There seem to be quite a lot of functions for doing this kind of thing and it's difficult to work out which would be best: I see that we also have exportIntegerToAddr
in GHC.Integer.GMP.Internals, for example.
-- memcpy, which is much faster, especially on larger ByteStrings. | ||
_ -> case displacement `quotRem` 8 of | ||
(bigMove, 0) -> do | ||
let mainLen :: CSize = fromIntegral $ bigMove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mainLen :: CSize = fromIntegral $ bigMove | |
let mainLen :: CSize = fromIntegral bigMove |
Note that Sundae would benefit dramatically from these bitwise operations; In particular, because transaction inputs get re-ordered, we need to provide an "indexing set" in the redeemer; but because of that, we need to check that each index is distinct, which involves an O(n^2) algorithm, and eats up a huge portion of our execution budget. With cheap bitwise operators, we could use a bit mask to track indexes we've seen before. |
@Quantumplation do you have a sketch of the code you would use? It's very important for us to actually try this out and make sure that the budget improvements do materialize! |
@michaelpj I can provide some psuedocode for what i'm doing now, how many execution units that costs, and some psuedocode for how I was thinking of using bitwise operators, but anything higher fidelity will have to wait until I have some spare time; Here's what I'm currently doing, more or less: pub fn has_in_first_n(items: List<Int>, elem: Int, idx: Int) -> Bool {
if idx == 0 {
False
} else {
when items is {
[] -> False,
[head, ..rest] -> {
if head == elem {
True
} else if idx == 1 {
False
} else {
has_in_first_n(items, elem, idx - 1)
}
}
}
}
}
pub fn get_sorted_orders(
indexing_set: List<Int>,
inputs: List<Input>,
) -> List<Output> {
list.indexed_map(
indexing_set,
fn(idx, elem) {
// Make sure the elements of the indexing set aren't repeated
// by checking for each element in all the elements leading up to it
expect !has_in_first_n(indexing_set, elem, idx)
// Index into this list; in the real implementation I use an
// unsafe_fast_index_skip that indescriminately skips in large chunks
// before actually bounds-checking the last few steps
let input = list.at(inputs, elem)
input.output
},
)
} In a particular benchmark, the above takes a total of If I comment out the If I then change it to just loop over the list and return each input (to loosely measure the unsafe_fast_index_skip), it drops to It's worth noting, I am always bottlenecked on memory, and it's not even close. Here's what I was thinking I could try with bitwise operators: pub fn do_get_sorted_orders(
indexing_set: List<Int>,
inputs: List<Input>,
seen_indexes: ByteArray,
) -> List<Output> {
let [head, ..tail] = indexing_set
let idx_bit = (1 << head)
expect seen_indexes && idx_bit == 0
[list.at(inputs, head), ..do_get_sorted_orders(tail, inputs, seen_indexes || idx_bit)]
}
pub fn get_sorted_orders(
indexing_set: List<Int>,
inputs: List<Input>,
) -> List<Output> {
do_get_sorted_orders(indexing_set, inputs, 0)
} Or something similar, depending on how the bitwise operators work. |
Hopefully the above is at least somewhat helpful to you 😅 On an unrelated note, it's really the O(n) list indexing that is absolutely wrecking performance everywhere. If we had O(1) list indexing, or dictionaries, it feels like it would open up entirely new possibilities for us. |
@kozross I've been doing some preliminary costing experiments with this branch by benchmarking the Plutus Core versions of the bitwise functions with inputs of varying lengths. Most of them seem to behave as you'd expect (execution time is approximately linear (or constant) in the length of the bytestring), but the results for The time increases linearly up to length 3247 (bytes) and then drops to a constant. I could believe that it might be constant because the layout of the bytes doesn't change when you convert to a bytestring (I tried looking at the For
I've produced similar results on three different computers and with different inputs, and I'm pretty sure the benchmarks aren't doing anything stupid (although I'll check again). Do you have any insight into what might be going on? |
@kwxm - I have a few suspicions. 8KiB is the page size: this means you're paying an extra penalty on paging, which may explain that discontinuity. The conversion difference is much harder to explain: I'd have to check what I wrote originally. |
Ok, thanks. I'll check my benchmarks again to make sure that I've not made some silly mistake. We haven't seen anything like this for any of the existing builtins, but I'll re-benchmark some of them too in case some change (in the GHC runtime for example) has introduced some new behaviour. |
@kwxm - do the benches I wrote for those operations reproduce this behaviour? I haven't run them in a while, but I don't recall seeing such jumps. Maybe I just didn't test with large enough values? |
It looks like they do, for |
Definitely something I'd use - I use bitwise operators all the time in javascript and other languages. One particular use case that seems relevant - for my recent Smart Life NFT project I am generating Wolfram's rule numbers for 1-d cellular automata by directly converting between 8 individual boolean fields into an 8 bit int which represents the rule number. |
I now think the weirdness in the execution time for |
I've added costing code (and a few other things) in this branch and updated the cost model so that it assigns a non-zero cost to the bitwise operations. We're in the process of transitioning to a new benchmarking machine so the bitwise costs won't be totally consistent with the costs for existing operations; however, it'll at least give us ballpark estimates of costs for programs involving the bitwise operations. |
Good to know that this CIP is making progress. I have implemented a pseudo-random number generator that could be a good use case for testing. It's a linear-shift feedback register, and currently, it's prohibitive to run it on-chain, having bit-wise operations would benefit this a lot. |
@michaelpj , @kozross , @perturbing. @kwxm I have doubts if bitwise operations on byte strings is an optimal choice for cryptographic applications. Perhaps it's worth revising the design choice by allowing bitwise operations on Plutus integers directly. libgmp which is used as runtime for haskell and Plutus integers has direct support for bitwise operations on integers. |
I brought up these concerns long ago somewhere and made an alternate design. |
The ticket was created: cardano-foundation/CIPs#794. Let's continue the discussion there. |
This is a work-in-progress implementation of this CIP, which ultimately aims to solve #4252. I have left the pre-submit checklist in place (below) to be filled in as this gets finished.
Edit: This is now complete. @michaelpj - requesting a review.
Below is the checklist of tasks that have been done, and those that still remain outstanding:
integerToByteString
PlutusTx
primitivebyteStringToInteger
PlutusTx
primitiveandByteString
PlutusTx
primitiveiorByteString
PlutusTx
primitivexorByteString
PlutusTx
primitivecomplementByteString
PlutusTx
primitiveshiftByteString
PlutusTx
primitiverotateByteString
PlutusTx
primitivepopCountByteString
PlutusTx
primitivetestBitByteString
PlutusTx
primitivewriteBitByteString
PlutusTx
primitivefindFirstSetByteString
PlutusTx
primitivePre-submit checklist: