Floats are encoded with sign extension while doubles are encoded without #300

sfackler · 2021-10-09T18:42:05Z

I'm working on a Smile implementation in Rust, and am trying to exactly match Jackson's behavior to be able to test against it. While testing floats and doubles, I've noticed an inconsistency in how Jackson's serializer behaves. Specifically, the float and double serialization logic differ in what ends up in the unused high bits of the most significant byte of the encoded representation.

The float implementation uses arithmetic shifts, so a negative floating point value will end up with the unused high bits set to 1 and a positive floating point value will end up with the unused high bits set to 0. In contrast, the double implementation uses a logical shift in one place, so the unused high bits are always 0.

As an example, here's an encoding of (float) -0 into Smile using 2.13.0:

00000000: 3a29 0a00 2878 0000 0000                 :)..(x....

The first byte of the encoded float is 0x78, aka 0b01111000.

And here's the encoding of (double) -0:

00000000: 3a29 0a00 2901 0000 0000 0000 0000 00    :)..)..........

The first byte of the encoded double is 0x01, aka 0b00000001.
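
To make the difference concrete, here is a minimal Rust sketch of the 7-bits-per-byte split with both shift styles. It is not Jackson's actual code: the function names `encode_f32` and `encode_f64` and the `sign_extend` flag are hypothetical, chosen only for illustration. It assumes the layout described above, with the most significant group first (5 bytes for a float, 10 for a double).

```rust
/// Split the 32 raw bits of an f32 into five 7-bit groups, most significant first.
/// With `sign_extend` the shift is arithmetic (like Java's `>>`), otherwise
/// logical (like Java's `>>>`). Hypothetical helper, not Jackson's API.
fn encode_f32(value: f32, sign_extend: bool) -> [u8; 5] {
    let bits = value.to_bits();
    let mut out = [0u8; 5];
    for (i, byte) in out.iter_mut().enumerate() {
        // Shift amounts are 28, 21, 14, 7, 0.
        let shift = 28 - 7 * i as u32;
        let shifted = if sign_extend {
            // Shifting a signed value copies the sign bit into the vacated high bits.
            ((bits as i32) >> shift) as u32
        } else {
            // Shifting an unsigned value fills the vacated high bits with zeros.
            bits >> shift
        };
        *byte = (shifted & 0x7F) as u8;
    }
    out
}

/// Same idea for the 64 raw bits of an f64: ten 7-bit groups, most significant first.
fn encode_f64(value: f64, sign_extend: bool) -> [u8; 10] {
    let bits = value.to_bits();
    let mut out = [0u8; 10];
    for (i, byte) in out.iter_mut().enumerate() {
        // Shift amounts are 63, 56, ..., 7, 0.
        let shift = 63 - 7 * i as u32;
        let shifted = if sign_extend {
            ((bits as i64) >> shift) as u64
        } else {
            bits >> shift
        };
        *byte = (shifted & 0x7F) as u8;
    }
    out
}
```

Under these assumptions, `encode_f32(-0.0, true)` starts with 0x78 and `encode_f64(-0.0, false)` starts with 0x01, matching the two dumps above. Only the first byte differs between the two shift styles, because it is the only byte with unused bits (3 for a float, 6 for a double).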

The contents of the unused bits shouldn't matter in practice, but it'd probably be good to unify and explicitly specify the desired behavior here.


cowtowncoder · 2021-10-18T00:30:44Z

@sfackler I agree that it would make sense to define both what is expected (required) and then, if necessary, what remains implementation dependent. It sounds like handling by the Java implementation is inconsistent across float and double; this is unfortunate.

In this particular case it would seem to me that the unused bits must be ignored by decoders (given that there is a discrepancy), but that encoders should probably be recommended to use one approach for both cases; probably that of leaving them as 0s.
The Java codec would then be deviating from this as of Jackson 2.13.0.
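
As a rough illustration of the decoder side, here is a minimal Rust sketch that ignores the unused high bits of the first byte, so both encodings decode to the same value. The names `decode_f32` and `decode_f64` are hypothetical, and the sketch assumes the input has already been sliced to exactly 5 or 10 payload bytes.

```rust
/// Reassemble an f32 from five 7-bit groups, most significant first.
/// Hypothetical helper for illustration only.
fn decode_f32(bytes: &[u8; 5]) -> f32 {
    let mut bits: u32 = 0;
    for &b in bytes.iter() {
        // Keep only the low 7 bits of each byte. Over the five steps the first
        // byte is shifted left by 28 in total, so its 3 unused bits fall off the
        // top of the u32 and never affect the result.
        bits = (bits << 7) | u32::from(b & 0x7F);
    }
    f32::from_bits(bits)
}

/// Reassemble an f64 from ten 7-bit groups, most significant first.
fn decode_f64(bytes: &[u8; 10]) -> f64 {
    let mut bits: u64 = 0;
    for &b in bytes.iter() {
        // The first byte ends up shifted left by 63, so only its lowest bit survives.
        bits = (bits << 7) | u64::from(b & 0x7F);
    }
    f64::from_bits(bits)
}
```

With this approach, a first byte of 0x78 and a first byte of 0x08 both decode to (float) -0, so a decoder written this way would not be affected if the encoder later switched to zeroed unused bits.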

Does the above make sense?

I'll file an issue at:

https://github.com/FasterXML/smile-format-specification/issues

linking to this one, so that ideally the specification would clarify this behavior.

sfackler · 2021-10-18T00:43:53Z

Yep, I think that makes sense.

cowtowncoder · 2022-02-20T17:54:17Z

I finally went ahead and updated the Smile spec, as per:

FasterXML/smile-format-specification#17

Please let me know if this helps. I hope to tackle the encoding itself in the (near-ish?) future, probably for 2.14.0, since the change may be a slight compatibility concern: it is possible some decoders could rely on sign extension.
Not sure how to alleviate that concern: maybe add a SmileParser.Feature to allow the old handling.

sfackler · 2022-02-20T18:12:20Z

Thanks!

cowtowncoder added the smile label Feb 20, 2022

cowtowncoder changed the title ~~(Smile) Floats are encoded with sign extension while doubles are encoded without~~ Floats are encoded with sign extension while doubles are encoded without Feb 20, 2022

cowtowncoder mentioned this issue Feb 20, 2022

Should clarify expected handling of unused bits for float and double value encoding FasterXML/smile-format-specification#17

Closed
