Division between array serialization and specification #6

sneakers-the-rat · 2024-02-03T03:44:45Z

I've said this a few times when we've talked on Zoom during the hackathons, so I don't mean to be a broken record, but one place where a lot of prior schema languages have messed up array specification is in taking on too much of the weight of specifying the actual encoding of the arrays, rather than being a schematic description that is generic across serializations.

The generality of the current form is pretty good! One way that I see us buying more complexity than we need to, though, is in this GroupingByArrayOrder idea:
https://github.com/linkml/linkml-model/blob/aab9842be0e230c0040688dfc6ffa26696c97827/linkml_model/model/schema/array.yaml#L67-L94

That's an implementation detail of how arrays are stored and indexed - I don't think we should touch the storage part in the schema, and the indexing part is handled by the rest of the array specification, right? I could be missing something that requires that to be specified in the schema, but I think in general it would be good to make a clear separation of concerns here. A decent test is "can this array specification be satisfied in such a way that the schema knows absolutely nothing about the way that the array is serialized?" Under that test, the responsibility for getting the array ordering correct belongs to the dumper/loader, similarly to how we would expect the dumper/loader to correctly handle chunking and other serialization details.
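To make that test concrete, here is a minimal sketch in plain Python. Everything in it (`ArraySpec`, `dump`, `load`, the `"C"`/`"F"` order labels) is a hypothetical name made up for illustration, not part of LinkML. The spec only knows the shape; the dumper/loader pair is the only place that knows whether the flat storage is row-major or column-major.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical schema-level description: shape only, no storage details."""

    shape: Tuple[int, int]  # (rows, cols)

    def validate(self, array: List[List[int]]) -> bool:
        rows, cols = self.shape
        return len(array) == rows and all(len(row) == cols for row in array)


def dump(array: List[List[int]], order: str) -> List[int]:
    """Flatten a 2D array in row-major ("C") or column-major ("F") order."""
    rows, cols = len(array), len(array[0])
    if order == "C":
        return [array[i][j] for i in range(rows) for j in range(cols)]
    if order == "F":
        return [array[i][j] for j in range(cols) for i in range(rows)]
    raise ValueError(f"unknown order: {order!r}")


def load(flat: List[int], shape: Tuple[int, int], order: str) -> List[List[int]]:
    """Rebuild the logical 2D array from flat storage written by dump()."""
    rows, cols = shape
    if order == "C":
        return [[flat[i * cols + j] for j in range(cols)] for i in range(rows)]
    if order == "F":
        return [[flat[j * rows + i] for j in range(cols)] for i in range(rows)]
    raise ValueError(f"unknown order: {order!r}")


spec = ArraySpec(shape=(2, 3))
logical = [[1, 2, 3], [4, 5, 6]]

# Two different storage layouts for the same logical array.
stored_c = dump(logical, "C")
stored_f = dump(logical, "F")

# The spec is satisfied either way, and never sees the storage order.
from_c = load(stored_c, spec.shape, "C")
from_f = load(stored_f, spec.shape, "F")
```

Both `from_c` and `from_f` come back as the same logical array and both pass `spec.validate`, even though the two flat layouts differ. The ordering lives entirely in the dumper/loader.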

This is actually what I want to work on at the hackashop - a second set of specifications for declaring serializations, so in a linked data context one would be able to say "this particular array has n linked serializations - this numpy format, that zarr format, etc." without having that be specified in the array's schema. So, a way of saying "this particular hash of a binary stream is annotated as being a numpy ndarray with shape (x,y)", plus all the other details needed to handle the serialization/deserialization, in a form that could be consumed by generalized dumpers/loaders. So we may want to just talk about this next week :)
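A rough sketch of what such an annotation could look like, using only the Python standard library. All of the names here (`SerializationAnnotation`, `annotate`, `matches`, the `"raw-int32-le"`/`"raw-int32-be"` format labels, the `"example-array"` identifier) are placeholders assumed for illustration; a real version would point at actual numpy or zarr serializations and would need more fields.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SerializationAnnotation:
    """Hypothetical record describing one serialized form of an array."""

    sha256: str             # hash of the binary stream being annotated
    format: str             # placeholder format label
    shape: Tuple[int, ...]  # logical shape of the array
    dtype: str              # element type of the array


def annotate(payload: bytes, format: str, shape: Tuple[int, ...], dtype: str) -> SerializationAnnotation:
    """Attach serialization metadata to a byte stream, keyed by its hash."""
    digest = hashlib.sha256(payload).hexdigest()
    return SerializationAnnotation(digest, format, shape, dtype)


def matches(annotation: SerializationAnnotation, payload: bytes) -> bool:
    """Check that a byte stream is the one the annotation refers to."""
    return hashlib.sha256(payload).hexdigest() == annotation.sha256


# Two byte streams holding the same logical array [1, 2, 3, 4].
little = struct.pack("<4i", 1, 2, 3, 4)
big = struct.pack(">4i", 1, 2, 3, 4)

ann_le = annotate(little, "raw-int32-le", (4,), "int32")
ann_be = annotate(big, "raw-int32-be", (4,), "int32")

# One logical array, n linked serializations, none of them in the schema.
registry: Dict[str, List[SerializationAnnotation]] = {
    "example-array": [ann_le, ann_be],
}
```

A generalized loader could then pick whichever linked serialization it knows how to read, confirm the bytes with `matches`, and hand back the logical array without the array's schema ever mentioning a storage format.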


sneakers-the-rat mentioned this issue Feb 3, 2024

Use lists of lists for elements #7

Open
