Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unknown Excluded in sublists #100

Open
remidebette opened this issue May 1, 2020 · 11 comments
Open

Unknown Excluded in sublists #100

remidebette opened this issue May 1, 2020 · 11 comments
Labels

Comments

@remidebette
Copy link

Hi,

I am quite new to the marshmallow ecosystem.
I am considering the use of desert as a way to write python API clients by explicitly describing their data structure to ease the deserialisation.

In the process of writing an API Client for a specific purpose one could be confronted to a lot of unnecessary data in API responses that he does not intend to use.
Exploring this data and writing up the schemas while continuously testing, a way to build such code could be to default to marshmallow EXCLUDE for unknown fields.
That way, continuous TDD with "deserialisation of real life JSONS" tests would be possible without ValidationError while not having to describe unused chunks of the data.

Furthermore, continuing to use EXCLUDE after this initial stage of development could be acceptable, if the policy of the API itself is to authorize adding of values for minor changes.
One could want to make its API client future-proof by not breaking when such a change happens on the other side that he does not necessarily controls.

But this behavior is broken when deserialising unknown fields in a List:

from dataclasses import dataclass
from typing import List

import desert
import marshmallow


@dataclass
class Ambiguous:
    firstname: str
    lastname: str


@dataclass
class AmbiguousList:
    id: str
    not_yet_known: List[Ambiguous]


AmbiguousListSchema = desert.schema_class(AmbiguousList, meta={"unknown": marshmallow.EXCLUDE})()

AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith",
        }
    ]
}
""")
# Ok

AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith"
        }
    ]
}
""")
# Still Ok

AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith",
            "shiny_new_arg": "uninteresting info"
        }
    ]
}
""")
# marshmallow.exceptions.ValidationError: {'not_yet_known': {1: {'shiny_new_arg': ['Unknown field.']}}}

I am not sure what is the library that is responsible. It seems that marshmallow_dataclass shares the same error.

@altendky
Copy link
Member

altendky commented May 1, 2020

Seems like the marshmallow.EXCLUDE is being applied only to the top level schema and isn't cascaded down. I can imagine sometimes wanting it to cascade (such as your case probably) but I can also imagine wanting to specify that per class. I'm not the only one here so hopefully some discussion can ensue on this.

@altendky
Copy link
Member

altendky commented May 1, 2020

Hmm, I guess in the granular case it could be per attribute's usage of a class/schema? A pure marshmallow line for not_yet_known: List[Ambiguous] would look something like not_yet_known = marshmallow.fields.List(AmbiguousSchema(unknown=marshmallow.EXCLUDE)). So it seems that the closer we can get it to the specification of the class the better.

@remidebette
Copy link
Author

remidebette commented May 1, 2020

Ok so either I need to cascade or to specify EXCLUDE for specific subclasses.
I could live with either.

Is there already a syntax for transposing the above marshmallow line to a desert compatible dataclass without having to create an intermediary AmbiguousSchema?

@altendky
Copy link
Member

altendky commented May 1, 2020

I guess it might be:

not_yet_known: List[Ambiguous] = desert.ib(
    marshmallow_field=marshmallow.fields.List(
        desert.schema(cls=Ambiguous, meta={'exclude': marshmallow.EXCLUDE})
    )
)

@altendky
Copy link
Member

altendky commented May 1, 2020

If you need that a lot you can for now compensate with a helper function.

def my_excluding_ib(cls):
    return desert.ib(
        marshmallow_field=marshmallow.fields.List(
            desert.schema(cls=cls, meta={'exclude': marshmallow.EXCLUDE})
        )
    )

not_yet_known: List[Ambiguous] = my_excluding_ib(cls=Ambiguous)

Obviously still not as dry as we'd like desert to be what with the repetition of Ambiguous, but perhaps good enough to be a useful stop gap.

@remidebette
Copy link
Author

remidebette commented May 1, 2020

So I tried your first example and got a Type error since fields.List accepts only FieldABC.
I reworked it to use fields.Nested, but even so I still have the initial error:

The full code:

from dataclasses import dataclass
from typing import List

import desert
import marshmallow


@dataclass
class Ambiguous:
    firstname: str
    lastname: str

# Apparently Nested prefers to get the Schema Class
AmbiguousSchemaClass = desert.schema_class(Ambiguous, meta={'unknown': marshmallow.EXCLUDE})


@dataclass
class AmbiguousList:
    id: str
    not_yet_known: List[Ambiguous] = desert.ib(marshmallow_field=marshmallow.fields.List(marshmallow.fields.Nested(AmbiguousSchemaClass)))


AmbiguousListSchema = desert.schema_class(AmbiguousList, meta={"unknown": marshmallow.EXCLUDE})()

AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith"
        }
    ]
}
""")
# Ok

dat = AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith"
        }
    ]
}
""")
# Still Ok

AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith",
            "shiny_new_arg": "uninteresting info"
        }
    ]
}
""")
# marshmallow.exceptions.ValidationError: {'not_yet_known': {1: {'shiny_new_arg': ['Unknown field.']}}}

@altendky
Copy link
Member

altendky commented May 1, 2020

Yeah, I usually forget Nested, sorry. I also had my head all turned around on everything else, so sorry for that. desert.ib() is for attrs while desert.field() is for dataclasses. Also, the meta= for both are _not_ going down to marshmallow.

from dataclasses import dataclass
from typing import List

import desert
import marshmallow


@dataclass
class Ambiguous:
    firstname: str
    lastname: str


AmbiguousSchemaClass = desert.schema_class(Ambiguous)


@dataclass
class AmbiguousList:
    id: str
    not_yet_known: List[Ambiguous] = desert.field(
        marshmallow_field=marshmallow.fields.List(
            marshmallow.fields.Nested(
                AmbiguousSchemaClass(unknown=marshmallow.EXCLUDE),
            ),
        ),
    )


AmbiguousListSchema = desert.schema_class(AmbiguousList)

AmbiguousListSchema(unknown=marshmallow.EXCLUDE).loads("""
{
    "id":"789456132",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith"
        }
    ]
}
""")

dat = AmbiguousListSchema(unknown=marshmallow.EXCLUDE).loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith"
        }
    ]
}
""")

AmbiguousListSchema(unknown=marshmallow.EXCLUDE).loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith",
            "shiny_new_arg": "uninteresting info"
        }
    ]
}
""")

@remidebette
Copy link
Author

remidebette commented May 2, 2020

Thanks @altendky ,

I reworked the example for the consistency of the codestyle but now it works.

So my request about DRY coding in desert would be to have ways to pass marshmallow parameters more simply without having to redefine the intermediary schema and the marshmallow type in the desert.field.
Or to have a way to cascade.

But at least I have a workaround to continue using desert for now.

The final code:

from dataclasses import dataclass
from typing import List

import desert
import marshmallow


@dataclass
class Ambiguous:
    firstname: str
    lastname: str

# This is what currently has to be introduced to do the unknown sub declaration
AmbiguousSchema = desert.schema_class(Ambiguous, meta={'unknown': marshmallow.EXCLUDE})()


@dataclass
class AmbiguousList:
    id: str
    
    # Here we have to code 2 times the type definition, in both dataclass style and marshmallow style
    not_yet_known: List[Ambiguous] = desert.field(marshmallow_field=marshmallow.fields.List(
        marshmallow.fields.Nested(AmbiguousSchema)))


AmbiguousListSchema = desert.schema_class(AmbiguousList, meta={"unknown": marshmallow.EXCLUDE})()

Maybe the desert.field could have a shortcut way to inject the marshmallow meta parameters while respecting the above logic behind the scenes?

Or maybe my issue is that I expected to be able to write only one final desert schema definition and the rest of the code would be purely dataclasses (without having to explicity write a schema for each sub dataclass that has to be customized)
But today there no way to attach the meta parameter directly to the dataclass, unlike the @dataclass decorator of marshmallow_dataclass for instance

@wanderrful
Copy link

wanderrful commented Sep 4, 2020

There's a similar discussion to this in the marshmallow-code repo: marshmallow-code/marshmallow#1490

@altendky
Copy link
Member

From my quick read of marshmallow-code/marshmallow#1490 it seems that marshmallow intentionally doesn't want to propagate and that the OP concluded that was reasonable? Which would mean that we would be going counter to that I guess? Doesn't necessarily mean we shouldn't, just trying to get myself straight here. My gut says that people would often want to be able to specify this in one place rather than spreading it throughout their nested 'schemas'. I guess I need to read deeper into the secondary link layer explaining why they don't want to propagate. Perhaps we have an explicitly separate option (maybe like mentioned in 1490) that to enable propagation. Though making it an all or nothing has it's own limitations. Hmm...

@Ouradze
Copy link

Ouradze commented Mar 18, 2021

Hi, if it can help someone, I used this to make it work with desert and still keeping it not too convoluted.

@dataclass
class Child:
    name: str

@dataclass
class Parent:
    name: str
    dummy_list: List[str]
    children: List[Child] = field(
        default_factory=list,
        metadata=desert.metadata(
            m_fields.Nested(
                desert.schema(Child, meta={"unknown": EXCLUDE}, many=True), required=True
            )
        ),
    )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants