Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tensorstore #1174

Open
wants to merge 39 commits into
base: master
Choose a base branch
from
Open

tensorstore #1174

wants to merge 39 commits into from

Conversation

brokkoli71
Copy link

@brokkoli71 brokkoli71 commented Aug 25, 2024

Description:

  • Use tensorstore for zarr arrays
  • Reduces the number of IO ops when opening a dataset
  • based on changes from Tensorstore #1026

Todos:

  • Updated Changelog
  • Added / Updated Tests

@normanrz normanrz added this to the 2.x milestone Oct 9, 2024
@normanrz normanrz mentioned this pull request Dec 5, 2024
4 tasks
@normanrz normanrz self-assigned this Dec 10, 2024
@normanrz normanrz requested a review from markbader December 10, 2024 09:37
@normanrz normanrz marked this pull request as ready for review December 10, 2024 09:37
Copy link
Contributor

@markbader markbader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good already. I just have a few questions and suggested some small changes.

webknossos/webknossos/geometry/vec_int.py Outdated Show resolved Hide resolved
webknossos/test.sh Outdated Show resolved Hide resolved
Comment on lines 551 to 552
if all(a == "" for a in axes):
dimension_names = ("c", "x", "y", "z")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the default axis order and axis naming for Zarr arrays? Maybe we should take a look on the length of the shape metadata and set the default dimension names for each length accordingly. E.g.:
len(array.shape) == 2: ("x", "y")
len(array.shape) == 3: ("x", "y", "z")
...

Otherwise, we might get an IndexError when using the x_index, y_index and z_index for accessing chunk_shape and shard_shape.



def _value_error(args: Any) -> str:
return f"VecInt can be instantiated with int values `VecInt(1,2,3,4) or with `VecIntLike` object `VecInt([1,2,3,4]). Got {args}."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the ValueError suggestions for instance creation are wrong, the axes are required.

Suggested change
return f"VecInt can be instantiated with int values `VecInt(1,2,3,4) or with `VecIntLike` object `VecInt([1,2,3,4]). Got {args}."
return f"VecInt can be instantiated with int values `VecInt(1,2,3,4, axes=("x","y","z","t"))`, with axes as argument names `VecInt(x=10, y=42, z=3, t=100)` or with two Iterables one with the axes values and the other with axes names `VecInt((1, 2, 3, 4), ("x", "y", "z", "t"))`. Got {args}."

array = self._array
axes = array.domain.labels
if all(a == "" for a in axes):
dimension_names = ("c", "x", "y", "z")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in the comment above, maybe we should consider the array shape when using these dimension names as default.

)
return kvstore_spec
else:
return str(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a string enough for a KVStoreSpec? I would have assumed that something like {"driver": "file", "path": str(path)} is required here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using a string is a shorthand, but we can also make it explicit.

Comment on lines +93 to +97
# assert remote_ds.display_name is None
# remote_ds.display_name = "Test Remote Dataset"
# assert remote_ds.display_name == "Test Remote Dataset"
# del remote_ds.display_name
# assert remote_ds.display_name is None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# assert remote_ds.display_name is None
# remote_ds.display_name = "Test Remote Dataset"
# assert remote_ds.display_name == "Test Remote Dataset"
# del remote_ds.display_name
# assert remote_ds.display_name is None
assert remote_ds.display_name is None
remote_ds.display_name = "Test Remote Dataset"
assert remote_ds.display_name == "Test Remote Dataset"
del remote_ds.display_name
assert remote_ds.display_name is None

@@ -222,6 +222,7 @@ def check_properties(annotation: wk.Annotation) -> None:
check_properties(annotation_deserialized)


@pytest.mark.skip("issue with current wk version")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.mark.skip("issue with current wk version")

@@ -189,7 +189,7 @@ jobs:
timeout-minutes: 30
env:
WK_TOKEN: ${{ secrets.WK_TOKEN }}
run: ./test.sh -vv -p no:faulthandler --splits 3 --group ${{ matrix.group }} --splitting-algorithm least_duration
run: ./test.sh -vv -s -p no:faulthandler --splits 3 --group ${{ matrix.group }} --splitting-algorithm least_duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
run: ./test.sh -vv -s -p no:faulthandler --splits 3 --group ${{ matrix.group }} --splitting-algorithm least_duration
run: ./test.sh -vv -p no:faulthandler --splits 3 --group ${{ matrix.group }} --splitting-algorithm least_duration

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants