Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serde deserializer fails deserialize to untagged enums which has integer(i128) values #19

Open
sugyan opened this issue Aug 4, 2024 · 9 comments · May be fixed by #21 or #22
Open

serde deserializer fails deserialize to untagged enums which has integer(i128) values #19

sugyan opened this issue Aug 4, 2024 · 9 comments · May be fixed by #21 or #22

Comments

@sugyan
Copy link

sugyan commented Aug 4, 2024

Similar to the issue previously reported on ipld/serde_ipld_dagcbor#21, when trying to deserialize to an Internally tagged or Untagged enum, it seems that the serde internally calls deserialize_any, in which case i128 is not supported.

The following test fails.

#[test]
fn ipld_deserializer_integer_untagged() {
    #[derive(Clone, Debug, Deserialize, PartialEq)]
    #[serde(untagged)]
    enum MyEnum {
        Foo { value: bool },
        Bar { value: i32 },
    }

    // foo: OK
    {
        let ipld = Ipld::Map(BTreeMap::from([("value".into(), Ipld::Bool(true))]));
        match MyEnum::deserialize(ipld) {
            Ok(deserialized) => {
                assert_eq!(deserialized, MyEnum::Foo { value: true });
            }
            Err(e) => {
                panic!("{e:?}");
            }
        }
    }
    // bar: FAIL
    {
        let ipld = Ipld::Map(BTreeMap::from([("value".into(), Ipld::Integer(42))]));
        match MyEnum::deserialize(ipld) {
            Ok(deserialized) => {
                assert_eq!(deserialized, MyEnum::Bar { value: 42 });
            }
            Err(e) => {
                panic!("{e:?}");
            }
        }
    }
}
SerdeError("invalid type: integer `42` as i128, expected any value")

I think this is a serde's unresolved issue, so it may not be the responsibility of this library.

However, we can work around this problem by changing Ipld::Integer to have i64 instead of i128.
my experimental branch:
sugyan@539530d

Since the IPLD specification requires minimum support for values up to 2^53, the change to i64 is destructive but still meets the specification.
https://ipld.io/design/tricky-choices/numeric-domain/#integers

Is it possible to change it so that, for example, "if a certain feature flag is specified, the Ipld::Integer of i64 is used"?

My background is that I am developing a library atrium, and I would like to provide a function to hold unknown values received from the server as an Ipld and deserialize them to an arbitrary type on the user side. However, I am having trouble converting from Ipld to other types due to this problem...

@vmx
Copy link
Member

vmx commented Aug 5, 2024

For me using i128 for an IPLD Integer was because we can do that easily. But it sucks that it's breaking with Serde features people use. I also expect IPLD being used mostly through Serde.

Adding a feature is certainly possible, though I generally prefer taking the time to think about it a bit, whether there's a way so that we don't end up with an infinite list of knobs one could tweak (and end up in maintenance/support hell).

Hence I wonder if it makes sense to change the Integer type by default to i64. This is a big breaking change in theory, but I don't think the impact is that big practically. Many programming language have as a maximum built-in integer type of 64-bit, so IPLD libraries in those languages likely use that type.

Another way to look at this is: this is another case where Serde untagged enums are a problem. So should we just add a new feature as proposed here and document it properly that if you used untagged enums, that you should enable that features?

@Stebalien Do you have any thoughts on this?

@sugyan
Copy link
Author

sugyan commented Aug 5, 2024

Thanks for your thoughts on this!

As a library user (if we don't consider compatibility, etc.), would the ideal form be to use i64 by default, and then be able to change to use num-bigint by the feature flag if we want to handle larger values?
(Even when dealing with bigint, the full serde functionality will probably not be available, so in this case serde support should either be disabled or limited, and it should be clearly stated in the documentation...)

@Stebalien
Copy link
Contributor

So, the issue with just i64 is that u64::MAX is a nice and useful sentinel value (and isn't representable in i64). On the other hand, the Ipld thing is mostly for convenience when dealing with data-model types...

I wonder if we should have an IpldGeneric<P: IpldPrimitives> type aliased to Ipld<SaneDefaults>. Something like:

/// Doesn't have to be a trait, could just be type parameters but... that's a lot of type parameters.
pub trait Primitives {
    type Integer;
    type String;
    type Map;
    // ...
}
pub enum IpldGeneric<P: Primitives> {
    Integer(P::Integer),
    String(P::String),
    // ...
}

type Ipld = IpldGeneric<DefaultPrimitives>;

That should give users the tools to do whatever they need. But it's also adding complexity.

(anyways, please ignore the 2^53 thing)

@vmx
Copy link
Member

vmx commented Aug 6, 2024

@Stebalien That's an interesting idea. Having used Rust for a while, I wonder if that adds complexity at unintended places.

I wonder if that makes interoperability between systems/libraries a problem. But maybe it's also a win as this way it breaks in kind of visible ways and not just subtle due to random feature flags.

@sugyan would you be open to give that idea a chance and try it out. I'd imagine that atrium is a large enough system to get an idea whether this ends up in unusable complexities or is quite easy to use.

@sugyan
Copy link
Author

sugyan commented Aug 6, 2024

@vmx I personally don't care what implementation changes are made, as long as they solve the enum deserialize failures that are currently troubling atrium anyway.

I think it would be a great implementation to allow users to choose the type they want to use more flexibly. However, I can't imagine that there would be any case where the user would want to change the primitive type other than to make the integer i64 or larger.

vmx added a commit that referenced this issue Aug 12, 2024
As issue #19 outlines, it might make sense to use a different internal
type for integer values. This commit makes all primitive types generic,
so that you can choose your own.

Though you're discouraged from doing so. Usually using the IPLD type
should be enough and leads to the best interoperability.

Thanks @Stebalien for the idea of this implementation.

Fixes #19.
vmx added a commit that referenced this issue Aug 12, 2024
As issue #19 outlines, it might make sense to use a different internal
type for integer values. This commit makes all primitive types generic,
so that you can choose your own.

Though you're discouraged from doing so. Usually using the IPLD type
should be enough and leads to the best interoperability.

Thanks @Stebalien for the idea of this implementation.

Fixes #19.
@vmx vmx linked a pull request Aug 12, 2024 that will close this issue
@vmx
Copy link
Member

vmx commented Aug 12, 2024

@sugyan I've created a PR with @Stebalien's suggestion at #21. It would be cool if you could try it out and also let me know if you ran into any weird issues. My hope would be that things would be smooth.

You would add something like this to your code:

#[derive(Clone)]
pub struct PrimitivesI64;

impl Primitives for PrimitivesI64 {
    type Bool = bool;
    type Integer = i64;
    type Float = f64;
    type String = String;
    type Bytes = Vec<u8>;
    type Link = Cid;
}

type Ipld = IpldGeneric<PrimitivesI64>;

@sugyan
Copy link
Author

sugyan commented Aug 13, 2024

Unfortunately, in my case this does not seem to work as expected.

https://github.com/sugyan/atrium/actions/runs/10372924349/job/28716796955?pr=212#step:3:187

Because the only implementation of Serializer or Deserializer in crate::serde::ser, crate::serde::de is for Ipld (it's just IpldGeneric<DefaultPrimitives>) and if we define new type with type Ipld = IpldGeneric<PrimitivesI64>, it cannot be serialized/deserialized them because there is no impl of Serializer or Deserializer for them.

vmx added a commit that referenced this issue Sep 2, 2024
By default the IPLD Integer kind is represented internally as an `i128`. Serde
has problems with untagged enums that contain i128 types. Therefore a feature
flag called `integer-max-i64` is introduced, which reduces the internal integer
representation to `i64`.

This flag should be used with caution as e.g. not all valid DAG-CBOR data can
now be represented.

Closes #19.
vmx added a commit that referenced this issue Sep 2, 2024
By default the IPLD Integer kind is represented internally as an `i128`. Serde
has problems with untagged enums that contain i128 types. Therefore a feature
flag called `integer-max-i64` is introduced, which reduces the internal integer
representation to `i64`.

This flag should be used with caution as e.g. not all valid DAG-CBOR data can
now be represented.

Closes #19.
vmx added a commit that referenced this issue Sep 2, 2024
By default the IPLD Integer kind is represented internally as an `i128`. Serde
has problems with untagged enums that contain i128 types. Therefore a feature
flag called `integer-max-i64` is introduced, which reduces the internal integer
representation to `i64`.

This flag should be used with caution as e.g. not all valid DAG-CBOR data can
now be represented.

Closes #19.
@vmx vmx linked a pull request Sep 2, 2024 that will close this issue
@vmx
Copy link
Member

vmx commented Sep 2, 2024

@sugyan sorry for taking so long and thanks for giving #21 a try. Although it's kind of a nice idea to solve it with traits, the added complexity gets too much out of control for my taste. Thinking about it again, I think a feature flag might be the simplest and easiest to follow solution.

@sugyan if you could try out #22 it would be great. It should be as simple as enabling the integer-max-i64 feature.

@sugyan
Copy link
Author

sugyan commented Sep 3, 2024

@vmx Thank you for continuing to work on solving the problem! It seems to be very difficult to solve the problem with traits...
The solution in #22 is relatively simple, I've confirmed that it works fine in my library!
sugyan/atrium#212

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants