First pass at native NDArray support. #181
@@ -26,7 +26,7 @@ classes:
implements:
  - linkml:NDArray
annotations:
  dimensions: 1
  dimensions_info: "1"
Should I reformat these to use:
array_info:
  exact_dimensions: 1
For this TimestampSeries class, I'm actually not sure it would be allowed to put array_info at the class definition level, but maybe for the other ones?
Unfortunately that doesn't quite work because the range of dimensions_info is a collection of dimension_expressions, so these have to be at a different level.
e.g. this is invalid yaml:
dimensions_info: 3
  x:
  y:
  z:
x:
y:
rgb:
  exact_cardinality: 3
  description: r, g, b values
  annotations:
    names: "[red, green, blue]"
Suggested change:
x:
y:
rgb:
  exact_cardinality: 3
  description: r, g, b values
  annotations:
    names: "[red, green, blue]"

x:
  dimension_index: 0
y:
  dimension_index: 1
rgb:
  dimension_index: 2
  exact_cardinality: 3
  description: r, g, b values
  annotations:
    names: "[red, green, blue]"
It would be good to be explicit here. This also allows some dimensions to be unnamed.
Notes:
Some useful validation checks:
OK I am mostly settled on this, but I have a pitch that I'll make in person in a minute about extending the notion of implements and how it could allow for plugins (and then update this so that it's public later)
linkml_model/model/schema/meta.yaml
Outdated
array_info:
  domain: slot_definition
  range: array_info_expression
  inherited: true
  description: coerces the value of the slot into an array and defines the dimensions of that array
  status: testing

dimensions_info:
hopefully not annoying naming question, why not just array and dimensions here?
(we did change it to that)
linkml_model/model/schema/meta.yaml
Outdated
  inlined: true
  status: testing

minimum_dimensions:
how do we model mutual exclusivity and constraints between properties here? e.g. one shouldn't use exact_dimensions alongside maximum_dimensions, maximum_dimensions shouldn't be less than minimum_dimensions, etc. Part of why I like expressing ranges/values in a single object like dimensions: 3 or dimensions: {min: 2, max: 3} is that you can model within the scope of that object, but it seems like that can also be done here, I'm just not sure how
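To make the constraints in this comment concrete, here is a minimal standalone sketch. `check_dimension_bounds` is a hypothetical helper written for illustration; it is not part of LinkML, and its parameter names only mirror the slot names discussed in this thread.

```python
from typing import List, Optional


def check_dimension_bounds(
    exact: Optional[int] = None,
    minimum: Optional[int] = None,
    maximum: Optional[int] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the bounds are consistent.

    Hypothetical helper: the arguments stand in for exact_dimensions,
    minimum_dimensions and maximum_dimensions.
    """
    problems = []
    # An exact count and a min/max bound are mutually exclusive.
    if exact is not None and (minimum is not None or maximum is not None):
        problems.append("exact_dimensions cannot be combined with minimum/maximum_dimensions")
    # The upper bound must not fall below the lower bound.
    if minimum is not None and maximum is not None and maximum < minimum:
        problems.append("maximum_dimensions is less than minimum_dimensions")
    return problems


print(check_dimension_bounds(exact=3, maximum=4))
print(check_dimension_bounds(minimum=2, maximum=3))
```

Returning a list of messages rather than raising on the first failure is one possible design choice; it lets a validator report every conflict in a single pass.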
linkml_model/model/schema/meta.yaml
Outdated
  range: integer
  status: testing

maximum_dimensions:
separate but related question to above - it seems like ranges and exact values will be (or already are) a common pattern, thoughts on having a range syntax so we could just have a single property that can take an integer or a 1..2 specification?
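As a rough illustration of what such a single-property syntax could look like to a parser, the sketch below accepts either an integer or a `min..max` string. `parse_dimension_range` is a hypothetical name, and treating an open-ended form such as `2..` as "no upper bound" is an assumption made here, not something proposed in this thread.

```python
from typing import Optional, Tuple, Union


def parse_dimension_range(spec: Union[int, str]) -> Tuple[Optional[int], Optional[int]]:
    """Parse 3 -> (3, 3) and "1..2" -> (1, 2).

    Assumption for illustration: a missing side, as in "2..", means unbounded (None).
    """
    if isinstance(spec, int):
        return spec, spec
    low, sep, high = spec.partition("..")
    if not sep:
        # A plain number written as a string, e.g. "3".
        value = int(spec)
        return value, value
    lo = int(low) if low.strip() else None
    hi = int(high) if high.strip() else None
    if lo is not None and hi is not None and hi < lo:
        raise ValueError(f"upper bound {hi} is below lower bound {lo}")
    return lo, hi


print(parse_dimension_range(3))
print(parse_dimension_range("1..2"))
```

Keeping both bounds in one value also gives a natural place to reject an inverted range, which relates to the mutual-exclusivity question raised earlier in the review.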
linkml_model/model/schema/meta.yaml
Outdated
@@ -1420,6 +1420,39 @@ slots:
- BasicSubset
- ObjectOrientedProfile

array_info:
are there no minimum properties that need to be specified on an array? e.g. having this property without any values is an Any shaped array?
Modeling session with Ryan, Ben, Jonny
for posterity, archived version of matrix of tradeoffs to be added to consolidated docs later, along with @rly's notes and examples :) live: https://wiki.jon-e.net/LinkML_Arrays
has_extra_dimensions:
  description: If this is set to true
  domain: array_expression
  range: boolean
  status: testing
has_extra_dimensions:
  description: If this is set to true
  domain: array_expression
  range: boolean
  status: testing
- exact_number_dimensions
- minimum_number_dimensions
- maximum_number_dimensions
- has_extra_dimensions
- has_extra_dimensions
See max_cardinality and min_cardinality as axis size constraints? linkml-arrays#5

This introduces first-class array support into LinkML.
A minimal example would be:
The native serialization of this in JSON/YAML will be a LoLoL (list of lists of lists). Using linkml-xarrays it will be possible to serialize using hdf5/zarr/etc.
The corresponding nptyping type would be NDArray[Shape["*, *, *"], Float].
(note: modelers will want the ability to use ctypes but this is orthogonal)
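To illustrate what the nested-list serialization means for a consumer, here is a minimal pure-Python sketch that recovers the shape of such a structure. `nested_list_shape` is a hypothetical helper, not a LinkML or linkml-xarrays function, and the sample values are made up.

```python
from typing import Any, Tuple


def nested_list_shape(data: Any) -> Tuple[int, ...]:
    """Return the shape of a regular nested list; raise ValueError if it is ragged."""
    if not isinstance(data, list):
        return ()  # a scalar contributes no dimensions
    if not data:
        return (0,)
    # Every element at this level must have the same shape.
    shapes = {nested_list_shape(item) for item in data}
    if len(shapes) != 1:
        raise ValueError("ragged nested list: sub-lists differ in shape")
    return (len(data),) + shapes.pop()


# Made-up three-dimensional example value.
lolol = [[[1.0, 2.0], [3.0, 4.0]]]
print(nested_list_shape(lolol))
```

A validator could compare the length of the returned tuple against the declared number of dimensions, which is all the unnamed three-axis type above requires.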
Note that this does not force any metadata on the array; we are deferring on the datamodel for what is equivalent to xarray DataArrays. These will be supported via implements for now, with first-class incorporation in a future version. This will allow binding between axes and other LinkML arrays.

Minimal metadata can be introduced via naming the axes
The corresponding nptyping type would be NDArray[Shape["* x, * y, * z"], Float].

The shape can be further constrained; imagine an RGB matrix with coords x, y, and a length 3 r/g/b:
corresponds to NDArray[Shape["* x, * y, 3 rgb"]
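The per-axis constraint can be sketched as a simple shape check, with `None` standing in for the `*` wildcard. `shape_matches` and `rgb_axes` are hypothetical names used only for this illustration; they are not part of LinkML or nptyping.

```python
from typing import Optional, Sequence


def shape_matches(shape: Sequence[int], axis_sizes: Sequence[Optional[int]]) -> bool:
    """True if `shape` has one entry per axis and every fixed size agrees.

    Assumption for illustration: None means the axis size is unconstrained ("*").
    """
    if len(shape) != len(axis_sizes):
        return False
    return all(want is None or got == want for got, want in zip(shape, axis_sizes))


# x and y are unconstrained; rgb must have exactly 3 entries.
rgb_axes = [None, None, 3]

print(shape_matches((640, 480, 3), rgb_axes))
print(shape_matches((640, 480, 4), rgb_axes))
```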
For now if you do want to bind dimensions to additional metadata this can be done via annotations: