Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Floats are encodedwith sign extension while doubles are encoded without #300

Open
sfackler opened this issue Oct 9, 2021 · 4 comments
Open
Labels

Comments

@sfackler
Copy link

sfackler commented Oct 9, 2021

I'm working on a Smile implementation in Rust, and am trying to exactly match Jackson's behavior to be able to test against it. While testing floats and doubles, I've noticed an inconsistency in how Jackson's serializer behaves. Specifically, the float and double serialization logic differ in what ends up in the unused high bits of the MSB of the encoded representation.

The float implementation uses arithmetic shifts, so a negative floating point value will end up with the unused high bits set to 1 and a positive floating point value will end up with the unused high bits set to 0. In contrast, the double implementation uses a logical shift in one place, so the unused high bits are always 0.

As an example, here's an encoding of (float) -0 into Smile using 2.13.0:

00000000: 3a29 0a00 2878 0000 0000                 :)..(x....

The first byte of the encoded float is 0x78, aka 0b01111000.

And here's the encoding of (double) -0:

00000000: 3a29 0a00 2901 0000 0000 0000 0000 00    :)..)..........

The first byte of the encoded double is 0x01, aka 0b00000001.

The contents of the unused bits shouldn't matter in practice, but it'd probably be good to unify and explicitly specify the desired behavior here.

@cowtowncoder
Copy link
Member

@sfackler I agree that it would make sense to both define what is expected (required), and then, if necessary, what remains implementation dependent. It sounds handling by Java implementation is inconsistent across float and double; this is unfortunate.

In this particular case it would seem to me that the unused bits must be ignored by decoder (given that there is discrepancy), but that encoders should probably be recommended to use one approach for both cases; probably that of leaving them as 0s.
Java codec would then be deviating from this as of Jackson 2.13.0.

Does above make sense?

I'll file an issue at:

https://github.com/FasterXML/smile-format-specification/issues

linking to this one, so that ideally specification would clarify this behavior.

@sfackler
Copy link
Author

Yep, I think that makes sense.

@cowtowncoder cowtowncoder changed the title (Smile) Floats are encoded with sign extension while doubles are encoded without Floats are encodedwith sign extension while doubles are encoded without Feb 20, 2022
@cowtowncoder
Copy link
Member

I finally went ahead and updated Smile spec, as per:

FasterXML/smile-format-specification#17

Please let me know if this helps. I hope to tackle the encoding itself in (near-ish?) future, probably for 2.14.0 since change may be slight compatibility concern: it is possible some decoders could rely on sign extension.
Not sure how to alleviate that concern: maybe add a SmileParser.Feature to allow old handling.

@sfackler
Copy link
Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants