From fd8802cd1627881433139c296625b6f2f11f8f9f Mon Sep 17 00:00:00 2001
From: Owen Littlejohns
Date: Wed, 8 May 2024 17:58:35 -0400
Subject: [PATCH 01/57] DAS-2155 - Merge datatree documentation into main docs.

---
 ci/requirements/doc.yml | 3 +-
 doc/api.rst | 336 ++++++++++++++
 doc/conf.py | 2 +-
 doc/internals/extending-xarray.rst | 5 +-
 doc/internals/internal-design.rst | 8 +-
 doc/internals/interoperability.rst | 2 +-
 doc/roadmap.rst | 9 +-
 doc/user-guide/data-structures.rst | 197 ++++++++
 .../user-guide/datatree.rst | 13 +-
 .../user-guide}/hierarchical-data.rst | 63 +--
 doc/user-guide/index.rst | 2 +
 doc/user-guide/io.rst | 117 ++---
 doc/user-guide/terminology.rst | 26 ++
 doc/whats-new.rst | 9 +
 pyproject.toml | 1 -
 xarray/core/datatree_ops.py | 2 +-
 xarray/datatree_/docs/Makefile | 183 --------
 xarray/datatree_/docs/README.md | 14 -
 xarray/datatree_/docs/make.bat | 242 ----------
 xarray/datatree_/docs/source/api.rst | 362 ---------------
 xarray/datatree_/docs/source/conf.py | 412 -----------------
 xarray/datatree_/docs/source/contributing.rst | 136 ------
 .../datatree_/docs/source/data-structures.rst | 197 --------
 xarray/datatree_/docs/source/index.rst | 61 ---
 xarray/datatree_/docs/source/installation.rst | 38 --
 xarray/datatree_/docs/source/io.rst | 54 --
 xarray/datatree_/docs/source/terminology.rst | 34 --
 xarray/datatree_/docs/source/tutorial.rst | 7 -
 xarray/datatree_/docs/source/whats-new.rst | 426 ------------------
 29 files changed, 658 insertions(+), 2303 deletions(-)
 rename xarray/datatree_/docs/source/quick-overview.rst => doc/user-guide/datatree.rst (88%)
 rename {xarray/datatree_/docs/source => doc/user-guide}/hierarchical-data.rst (85%)
 delete mode 100644 xarray/datatree_/docs/Makefile
 delete mode 100644 xarray/datatree_/docs/README.md
 delete mode 100644 xarray/datatree_/docs/make.bat
 delete mode 100644 xarray/datatree_/docs/source/api.rst
 delete mode 100644 xarray/datatree_/docs/source/conf.py
 delete mode 100644 xarray/datatree_/docs/source/contributing.rst
 delete mode 100644 xarray/datatree_/docs/source/data-structures.rst
 delete mode 100644 xarray/datatree_/docs/source/index.rst
 delete mode 100644 xarray/datatree_/docs/source/installation.rst
 delete mode 100644 xarray/datatree_/docs/source/io.rst
 delete mode 100644 xarray/datatree_/docs/source/terminology.rst
 delete mode 100644 xarray/datatree_/docs/source/tutorial.rst
 delete mode 100644 xarray/datatree_/docs/source/whats-new.rst

diff --git a/ci/requirements/doc.yml b/ci/requirements/doc.yml
index 066d085ec53..8609e9636a7 100644
--- a/ci/requirements/doc.yml
+++ b/ci/requirements/doc.yml
@@ -37,7 +37,8 @@ dependencies:
   - sphinx-copybutton
   - sphinx-design
   - sphinx-inline-tabs
-  - sphinx>=5.0
+  - sphinx=6.2.1 # sphinx-book-theme issue: 749
+  - sphinxcontrib-srclinks
   - sphinxext-opengraph
   - sphinxext-rediraffe
   - zarr>=2.10
diff --git a/doc/api.rst b/doc/api.rst
index a8f8ea7dd1c..773ad6b5664 100644
--- a/doc/api.rst
+++ b/doc/api.rst
@@ -588,6 +588,298 @@ Reshaping and reorganizing
    DataArray.sortby
    DataArray.broadcast_like
 
+DataTree
+========
+
+Creating a DataTree
+-------------------
+
+Methods of creating a ``DataTree``.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree
+   xarray.core.datatree.DataTree.from_dict
+
+Tree Attributes
+---------------
+
+Attributes relating to the recursive tree-like structure of a ``DataTree``.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.parent
+   xarray.core.datatree.DataTree.children
+   xarray.core.datatree.DataTree.name
+   xarray.core.datatree.DataTree.path
+   xarray.core.datatree.DataTree.root
+   xarray.core.datatree.DataTree.is_root
+   xarray.core.datatree.DataTree.is_leaf
+   xarray.core.datatree.DataTree.leaves
+   xarray.core.datatree.DataTree.level
+   xarray.core.datatree.DataTree.depth
+   xarray.core.datatree.DataTree.width
+   xarray.core.datatree.DataTree.subtree
+   xarray.core.datatree.DataTree.descendants
+   xarray.core.datatree.DataTree.siblings
+   xarray.core.datatree.DataTree.lineage
+   xarray.core.datatree.DataTree.parents
+   xarray.core.datatree.DataTree.ancestors
+   xarray.core.datatree.DataTree.groups
+
+Data Contents
+-------------
+
+Interface to the data objects (optionally) stored inside a single ``DataTree`` node.
+This interface echoes that of ``xarray.Dataset``.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.dims
+   xarray.core.datatree.DataTree.sizes
+   xarray.core.datatree.DataTree.data_vars
+   xarray.core.datatree.DataTree.coords
+   xarray.core.datatree.DataTree.attrs
+   xarray.core.datatree.DataTree.encoding
+   xarray.core.datatree.DataTree.indexes
+   xarray.core.datatree.DataTree.nbytes
+   xarray.core.datatree.DataTree.ds
+   xarray.core.datatree.DataTree.to_dataset
+   xarray.core.datatree.DataTree.has_data
+   xarray.core.datatree.DataTree.has_attrs
+   xarray.core.datatree.DataTree.is_empty
+   xarray.core.datatree.DataTree.is_hollow
+
+Dictionary Interface
+--------------------
+
+``DataTree`` objects also have a dict-like interface mapping keys to either ``xarray.DataArray`` objects or to child ``DataTree`` nodes.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.__getitem__
+   xarray.core.datatree.DataTree.__setitem__
+   xarray.core.datatree.DataTree.__delitem__
+   xarray.core.datatree.DataTree.update
+   xarray.core.datatree.DataTree.get
+   xarray.core.datatree.DataTree.items
+   xarray.core.datatree.DataTree.keys
+   xarray.core.datatree.DataTree.values
+
+Tree Manipulation
+-----------------
+
+For manipulating, traversing, navigating, or mapping over the tree structure.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.orphan
+   xarray.core.datatree.DataTree.same_tree
+   xarray.core.datatree.DataTree.relative_to
+   xarray.core.datatree.DataTree.iter_lineage
+   xarray.core.datatree.DataTree.find_common_ancestor
+   xarray.core.datatree.DataTree.map_over_subtree
+   xarray.core.datatree.DataTree.pipe
+   xarray.core.datatree.DataTree.match
+   xarray.core.datatree.DataTree.filter
+
+Pathlib-like Interface
+----------------------
+
+``DataTree`` objects deliberately echo some of the API of :py:class:`pathlib.PurePath`.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.name
+   xarray.core.datatree.DataTree.parent
+   xarray.core.datatree.DataTree.parents
+   xarray.core.datatree.DataTree.relative_to
+
+Missing:
+
+..
+
+   ``DataTree.glob``
+   ``DataTree.joinpath``
+   ``DataTree.with_name``
+   ``DataTree.walk``
+   ``DataTree.rename``
+   ``DataTree.replace``
+
+DataTree Contents
+-----------------
+
+Manipulate the contents of all nodes in a ``DataTree`` simultaneously.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.copy
+   xarray.core.datatree.DataTree.assign_coords
+   xarray.core.datatree.DataTree.merge
+   xarray.core.datatree.DataTree.rename
+   xarray.core.datatree.DataTree.rename_vars
+   xarray.core.datatree.DataTree.rename_dims
+   xarray.core.datatree.DataTree.swap_dims
+   xarray.core.datatree.DataTree.expand_dims
+   xarray.core.datatree.DataTree.drop_vars
+   xarray.core.datatree.DataTree.drop_dims
+   xarray.core.datatree.DataTree.set_coords
+   xarray.core.datatree.DataTree.reset_coords
+
+DataTree Node Contents
+----------------------
+
+Manipulate the contents of a single ``DataTree`` node.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.assign
+   xarray.core.datatree.DataTree.drop_nodes
+
+Comparisons
+-----------
+
+Compare one ``DataTree`` object to another.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.isomorphic
+   xarray.core.datatree.DataTree.equals
+   xarray.core.datatree.DataTree.identical
+
+Indexing
+--------
+
+Index into all nodes in the subtree simultaneously.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.isel
+   xarray.core.datatree.DataTree.sel
+   xarray.core.datatree.DataTree.drop_sel
+   xarray.core.datatree.DataTree.drop_isel
+   xarray.core.datatree.DataTree.head
+   xarray.core.datatree.DataTree.tail
+   xarray.core.datatree.DataTree.thin
+   xarray.core.datatree.DataTree.squeeze
+   xarray.core.datatree.DataTree.interp
+   xarray.core.datatree.DataTree.interp_like
+   xarray.core.datatree.DataTree.reindex
+   xarray.core.datatree.DataTree.reindex_like
+   xarray.core.datatree.DataTree.set_index
+   xarray.core.datatree.DataTree.reset_index
+   xarray.core.datatree.DataTree.reorder_levels
+   xarray.core.datatree.DataTree.query
+
+..
+
+   Missing:
+   ``DataTree.loc``
+
+
+Missing Value Handling
+----------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.isnull
+   xarray.core.datatree.DataTree.notnull
+   xarray.core.datatree.DataTree.combine_first
+   xarray.core.datatree.DataTree.dropna
+   xarray.core.datatree.DataTree.fillna
+   xarray.core.datatree.DataTree.ffill
+   xarray.core.datatree.DataTree.bfill
+   xarray.core.datatree.DataTree.interpolate_na
+   xarray.core.datatree.DataTree.where
+   xarray.core.datatree.DataTree.isin
+
+Computation
+-----------
+
+Apply a computation to the data in all nodes in the subtree simultaneously.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.map
+   xarray.core.datatree.DataTree.reduce
+   xarray.core.datatree.DataTree.diff
+   xarray.core.datatree.DataTree.quantile
+   xarray.core.datatree.DataTree.differentiate
+   xarray.core.datatree.DataTree.integrate
+   xarray.core.datatree.DataTree.map_blocks
+   xarray.core.datatree.DataTree.polyfit
+   xarray.core.datatree.DataTree.curvefit
+
+Aggregation
+-----------
+
+Aggregate data in all nodes in the subtree simultaneously.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.all
+   xarray.core.datatree.DataTree.any
+   xarray.core.datatree.DataTree.argmax
+   xarray.core.datatree.DataTree.argmin
+   xarray.core.datatree.DataTree.idxmax
+   xarray.core.datatree.DataTree.idxmin
+   xarray.core.datatree.DataTree.max
+   xarray.core.datatree.DataTree.min
+   xarray.core.datatree.DataTree.mean
+   xarray.core.datatree.DataTree.median
+   xarray.core.datatree.DataTree.prod
+   xarray.core.datatree.DataTree.sum
+   xarray.core.datatree.DataTree.std
+   xarray.core.datatree.DataTree.var
+   xarray.core.datatree.DataTree.cumsum
+   xarray.core.datatree.DataTree.cumprod
+
+ndarray methods
+---------------
+
+Methods copied from :py:class:`numpy.ndarray` objects, here applying to the data in all nodes in the subtree.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.argsort
+   xarray.core.datatree.DataTree.astype
+   xarray.core.datatree.DataTree.clip
+   xarray.core.datatree.DataTree.conj
+   xarray.core.datatree.DataTree.conjugate
+   xarray.core.datatree.DataTree.round
+   xarray.core.datatree.DataTree.rank
+
+Reshaping and reorganising
+--------------------------
+
+Reshape or reorganise the data in all nodes in the subtree.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree.DataTree.transpose
+   xarray.core.datatree.DataTree.stack
+   xarray.core.datatree.DataTree.unstack
+   xarray.core.datatree.DataTree.shift
+   xarray.core.datatree.DataTree.roll
+   xarray.core.datatree.DataTree.pad
+   xarray.core.datatree.DataTree.sortby
+   xarray.core.datatree.DataTree.broadcast_like
+
 IO / Conversion
 ===============
 
@@ -652,6 +944,22 @@ DataArray methods
    DataArray.load
    DataArray.unify_chunks
 
+DataTree methods
+----------------
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.backends.api.open_datatree
+   xarray.core.datatree.DataTree.to_dict
+   xarray.core.datatree.DataTree.to_netcdf
+   xarray.core.datatree.DataTree.to_zarr
+
+..
+
+   Missing:
+   ``open_mfdatatree``
+
 Coordinates objects
 ===================
 
@@ -1071,6 +1379,15 @@ Testing
    testing.assert_allclose
    testing.assert_chunks_equal
 
+Test that two ``DataTree`` objects are similar.
+
+.. autosummary::
+   :toctree: generated/
+
+   testing.assertions.assert_isomorphic
+   testing.assert_equal
+   testing.assert_identical
+
 Hypothesis Testing Strategies
 =============================
 
@@ -1101,6 +1418,18 @@ Exceptions
    MergeError
    SerializationWarning
 
+DataTree
+--------
+
+Exceptions raised when manipulating trees.
+
+.. autosummary::
+   :toctree: generated/
+
+   xarray.core.datatree_mapping.TreeIsomorphismError
+   xarray.core.treenode.InvalidTreeError
+   xarray.core.treenode.NotFoundInTreeError
+
 Advanced API
 ============
 
@@ -1110,6 +1439,7 @@ Advanced API
    Coordinates
    Dataset.variables
    DataArray.variable
+   xarray.core.datatree.DataTree.variables
    Variable
    IndexVariable
    as_variable
@@ -1118,12 +1448,18 @@ Advanced API
    Context
    register_dataset_accessor
    register_dataarray_accessor
+   xarray.core.extensions.register_datatree_accessor
    Dataset.set_close
    backends.BackendArray
    backends.BackendEntrypoint
    backends.list_engines
    backends.refresh_engines
 
+..
+
+   Missing:
+   ``DataTree.set_close``
+
 Default, pandas-backed indexes built-in Xarray:
 
    indexes.PandasIndex
diff --git a/doc/conf.py b/doc/conf.py
index 152eb6794b4..641624a165c 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -153,6 +153,7 @@
     "DataArray": "~xarray.DataArray",
     "Dataset": "~xarray.Dataset",
     "Variable": "~xarray.Variable",
+    "DataTree": "~xarray.core.datatree.DataTree",
     "DatasetGroupBy": "~xarray.core.groupby.DatasetGroupBy",
     "DataArrayGroupBy": "~xarray.core.groupby.DataArrayGroupBy",
     # objects without namespace: numpy
@@ -319,7 +320,6 @@
     "cftime": ("https://unidata.github.io/cftime", None),
     "cubed": ("https://cubed-dev.github.io/cubed/", None),
     "dask": ("https://docs.dask.org/en/latest", None),
-    "datatree": ("https://xarray-datatree.readthedocs.io/en/latest/", None),
     "flox": ("https://flox.readthedocs.io/en/latest/", None),
     "hypothesis": ("https://hypothesis.readthedocs.io/en/latest/", None),
     "iris": ("https://scitools-iris.readthedocs.io/en/latest", None),
diff --git a/doc/internals/extending-xarray.rst b/doc/internals/extending-xarray.rst
index 0537ae85389..1f4ad0cd924 100644
--- a/doc/internals/extending-xarray.rst
+++ b/doc/internals/extending-xarray.rst
@@ -40,8 +40,9 @@ Writing Custom Accessors
 ------------------------
 
 To resolve this issue for more complex cases, xarray has the
-:py:func:`~xarray.register_dataset_accessor` and
-:py:func:`~xarray.register_dataarray_accessor` decorators for adding custom
+:py:func:`~xarray.register_dataset_accessor`,
+:py:func:`~xarray.register_dataarray_accessor` and
+:py:func:`~xarray.core.extensions.register_datatree_accessor` decorators for adding custom
 "accessors" on xarray objects, thereby "extending" the functionality of your xarray object.
 
 Here's how you might use these decorators to
diff --git a/doc/internals/internal-design.rst b/doc/internals/internal-design.rst
index 55ab2d79dbe..19cb3c6da70 100644
--- a/doc/internals/internal-design.rst
+++ b/doc/internals/internal-design.rst
@@ -4,6 +4,7 @@
     import numpy as np
     import pandas as pd
     import xarray as xr
+    from xarray.core.datatree import DataTree
 
     np.random.seed(123456)
     np.set_printoptions(threshold=20)
@@ -21,16 +22,15 @@ In order of increasing complexity, they are:
 - :py:class:`xarray.Variable`,
 - :py:class:`xarray.DataArray`,
 - :py:class:`xarray.Dataset`,
-- :py:class:`datatree.DataTree`.
+- :py:class:`xarray.core.datatree.DataTree`.
 
 The user guide lists only :py:class:`xarray.DataArray` and :py:class:`xarray.Dataset`,
 but :py:class:`~xarray.Variable` is the fundamental object internally,
-and :py:class:`~datatree.DataTree` is a natural generalisation of :py:class:`xarray.Dataset`.
+and :py:class:`~xarray.core.datatree.DataTree` is a natural generalisation of :py:class:`xarray.Dataset`.
 
 .. note::
 
-    Our :ref:`roadmap` includes plans both to document :py:class:`~xarray.Variable` as fully public API,
-    and to merge the `xarray-datatree `_ package into xarray's main repository.
+    Our :ref:`roadmap` includes plans to document :py:class:`~xarray.Variable` as fully public API.
 
 Internally private :ref:`lazy indexing classes ` are used to avoid loading more data than necessary,
 and flexible indexes classes (derived from :py:class:`~xarray.indexes.Index`) provide performant label-based lookups.
diff --git a/doc/internals/interoperability.rst b/doc/internals/interoperability.rst
index a45363bcab7..66149104f2a 100644
--- a/doc/internals/interoperability.rst
+++ b/doc/internals/interoperability.rst
@@ -36,7 +36,7 @@ it is entirely possible today to:
 - track the physical units of the data through computations (e.g via `pint-xarray `_),
 - query the data via custom index logic optimized for specific applications (e.g. an :py:class:`~xarray.Index` object backed by a KDTree structure),
 - attach domain-specific logic via accessor methods (e.g. to understand geographic Coordinate Reference System metadata),
-- organize hierarchical groups of xarray data in a :py:class:`~datatree.DataTree` (e.g. to treat heterogeneous simulation and observational data together during analysis).
+- organize hierarchical groups of xarray data in a :py:class:`xarray.core.datatree.DataTree` (e.g. to treat heterogeneous simulation and observational data together during analysis).
 
 All of these features can be provided simultaneously, using libraries compatible with the rest of the scientific python ecosystem.
 In this situation xarray would be essentially a thin wrapper acting as pure-python framework, providing a common interface and
diff --git a/doc/roadmap.rst b/doc/roadmap.rst
index 820ff82151c..a0d4ffb685c 100644
--- a/doc/roadmap.rst
+++ b/doc/roadmap.rst
@@ -202,9 +202,8 @@ Tree-like data structure
 ++++++++++++++++++++++++
 
 .. note::
-   Work on developing a hierarchical data structure in xarray is just
-   beginning. See `Datatree `__
-   for an early prototype.
+   Work on merging `DataTree `__ into
+   xarray is currently underway.
 
 Xarray’s highest-level object is currently an ``xarray.Dataset``, whose data
 model echoes that of a single netCDF group. However real-world datasets are
@@ -226,8 +225,8 @@ multiple netCDF groups (see :issue:`4118`).
 
 Currently there are several libraries which have wrapped xarray in order to build
 domain-specific data structures (e.g. `xarray-multiscale `__.),
-but a general ``xarray.DataTree`` object would obviate the need for these and]
-consolidate effort in a single domain-agnostic tool, much as xarray has already achieved.
+but a general ``xarray.core.datatree.DataTree`` object obviates the need for these and
+consolidates effort in a single domain-agnostic tool, much as xarray has already achieved.
 
 Labeled array without coordinates
 +++++++++++++++++++++++++++++++++
diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst
index a1794f4123d..f2c61bfc649 100644
--- a/doc/user-guide/data-structures.rst
+++ b/doc/user-guide/data-structures.rst
@@ -495,6 +495,203 @@ dimension and non-dimension variables:
     ds.coords["day"] = ("time", [6, 7, 8, 9])
     ds.swap_dims({"time": "day"})
 
+DataTree
+--------
+
+:py:class:`DataTree` is ``xarray``'s highest-level data structure, able to
+organise heterogeneous data which could not be stored inside a single
+:py:class:`Dataset` object. This includes representing the recursive structure
+of multiple `groups`_ within a netCDF file or `Zarr Store`_.
+
+.. _groups: https://www.unidata.ucar.edu/software/netcdf/workshops/2011/groups-types/GroupsIntro.html
+.. _Zarr Store: https://zarr.readthedocs.io/en/stable/tutorial.html#groups
+
+Each ``DataTree`` object (or "node") contains the same data that a single
+``xarray.Dataset`` would (i.e. ``DataArray`` objects stored under hashable
+keys), and so has the same key properties:
+
+- ``dims``: a dictionary mapping of dimension names to lengths, for the
+  variables in this node,
+- ``data_vars``: a dict-like container of DataArrays corresponding to variables
+  in this node,
+- ``coords``: another dict-like container of DataArrays, corresponding to
+  coordinate variables in this node,
+- ``attrs``: dict to hold arbitrary metadata relevant to data in this node.
+
+A single ``DataTree`` object acts much like a single ``Dataset`` object, and
+has a similar set of dict-like methods defined upon it. However, ``DataTree``
+objects can also contain other ``DataTree`` objects, so they can be thought of as
+nested dict-like containers of both ``xarray.DataArray`` and ``DataTree`` objects.
+
+A single datatree object is known as a "node", and its position relative to
+other nodes is defined by two more key properties:
+
+- ``children``: An ordered dictionary mapping from names to other ``DataTree``
+  objects, known as its "child nodes".
+- ``parent``: The single ``DataTree`` object whose children this datatree is a
+  member of, known as its "parent node".
+
+Each child automatically knows about its parent node, and a node without a
+parent is known as a "root" node (represented by the ``parent`` attribute
+pointing to ``None``). Nodes can have multiple children, but as each child node
+has at most one parent, there can only ever be one root node in a given tree.
+
+The overall structure is technically a `connected acyclic undirected rooted graph`,
+otherwise known as a `"Tree" `_.
+
+.. note::
+
+    Technically a ``DataTree`` with more than one child node forms an
+    `"Ordered Tree" `_,
+    because the children are stored in an Ordered Dictionary. However, this
+    distinction only really matters for a few edge cases involving operations
+    on multiple trees simultaneously, and can safely be ignored by most users.
+
+
+``DataTree`` objects can also optionally have a ``name`` as well as ``attrs``,
+just like a ``DataArray``. Again these are not normally used unless explicitly
+accessed by the user.
+
+
+.. _creating a datatree:
+
+Creating a DataTree
+~~~~~~~~~~~~~~~~~~~
+
+One way to create a ``DataTree`` from scratch is to create each node individually,
+specifying the nodes' relationship to one another as you create each one.
+
+The ``DataTree`` constructor takes:
+
+- ``data``: The data that will be stored in this node, represented by a single
+  ``xarray.Dataset``, or a named ``xarray.DataArray``.
+- ``parent``: The parent node (if there is one), given as a ``DataTree`` object.
+- ``children``: The various child nodes (if there are any), given as a mapping
+  from string keys to ``DataTree`` objects.
+- ``name``: A string to use as the name of this node.
+
+Let's make a single datatree node with some example data in it:
+
+.. ipython:: python
+
+    from xarray.core.datatree import DataTree
+
+    ds1 = xr.Dataset({"foo": "orange"})
+    dt = DataTree(name="root", data=ds1)  # create root node
+
+    dt
+
+At this point our node is also the root node, as every tree has a root node.
+
+We can add a second node to this tree either by referring to the first node in
+the constructor of the second:
+
+.. ipython:: python
+
+    ds2 = xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})
+    # add a child by referring to the parent node
+    node2 = DataTree(name="a", parent=dt, data=ds2)
+
+or by dynamically updating the attributes of one node to refer to another:
+
+.. ipython:: python
+
+    # add a second child by first creating a new node ...
+    ds3 = xr.Dataset({"zed": np.nan})
+    node3 = DataTree(name="b", data=ds3)
+    # ... then updating its .parent property
+    node3.parent = dt
+
+Our tree now has three nodes within it:
+
+.. ipython:: python
+
+    dt
+
+Consistency checks are enforced whenever nodes are connected to one another. For
+instance, if we try to create a `cycle`, an error is raised:
+
+.. ipython:: python
+    :okexcept:
+
+    dt.parent = node3
+
+Alternatively, you can create a ``DataTree`` object from:
+
+- An ``xarray.Dataset`` using ``Dataset.to_node()`` (not yet implemented),
+- A dictionary mapping directory-like paths to either ``DataTree`` nodes or
+  data, using :py:meth:`DataTree.from_dict()`,
+- A netCDF or Zarr file on disk with :py:func:`open_datatree()`. See
+  :ref:`reading and writing files `.
+
+
+DataTree Contents
+~~~~~~~~~~~~~~~~~
+
+Like ``xarray.Dataset``, ``DataTree`` implements the python mapping interface,
+but with values given by either ``xarray.DataArray`` objects or other
+``DataTree`` objects.
+
+.. ipython:: python
+
+    dt["a"]
+    dt["foo"]
+
+Iterating over keys will iterate over both the names of variables and child nodes.
+
+We can also access all the data in a single node through a dataset-like view:
+
+.. ipython:: python
+
+    dt["a"].ds
+
+This demonstrates the fact that the data in any one node is equivalent to the
+contents of a single ``xarray.Dataset`` object. The ``DataTree.ds`` property
+returns an immutable view, but we can instead extract the node's data contents
+as a new (and mutable) ``xarray.Dataset`` object via
+:py:meth:`xarray.core.datatree.DataTree.to_dataset()`:
+
+.. ipython:: python
+
+    dt["a"].to_dataset()
+
+Like with ``Dataset``, you can access the data and coordinate variables of a
+node separately via the ``data_vars`` and ``coords`` attributes:
+
+.. ipython:: python
+
+    dt["a"].data_vars
+    dt["a"].coords
+
+
+Dictionary-like methods
+~~~~~~~~~~~~~~~~~~~~~~~
+
+We can update a datatree in-place using Python's standard dictionary syntax,
+similar to how we can for Dataset objects. For example, to create this example
+datatree from scratch, we could have written:
+
+.. ipython:: python
+
+    dt = DataTree(name="root")
+    dt["foo"] = "orange"
+    dt["a"] = DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}))
+    dt["b/zed"] = np.nan
+    dt
+
+To change the variables in a node of a ``DataTree``, you can use all the
+standard dictionary methods, including ``values``, ``items``, ``__delitem__``,
+``get`` and :py:meth:`xarray.core.datatree.DataTree.update`.
+Note that assigning a ``DataArray`` object to a ``DataTree`` variable using
+``__setitem__`` or ``update`` will :ref:`automatically align ` the
+array(s) to the original node's indexes.
+
+If you copy a ``DataTree`` using the :py:func:`copy` function or the
+:py:meth:`xarray.core.datatree.DataTree.copy` method it will copy the subtree,
+meaning that node and the children below it, but no parents above it.
+Like for ``Dataset``, this copy is shallow by default, but you can copy all the
+underlying data arrays by calling ``dt.copy(deep=True)``.
+
 .. _coordinates:
 
 Coordinates
diff --git a/xarray/datatree_/docs/source/quick-overview.rst b/doc/user-guide/datatree.rst
similarity index 88%
rename from xarray/datatree_/docs/source/quick-overview.rst
rename to doc/user-guide/datatree.rst
index 4743b0899fa..e336f9e29f7 100644
--- a/xarray/datatree_/docs/source/quick-overview.rst
+++ b/doc/user-guide/datatree.rst
@@ -1,13 +1,9 @@
-.. currentmodule:: datatree
-
-##############
-Quick overview
-##############
+.. _datatree:
 
 DataTrees
 ---------
 
-:py:class:`DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually alignable groups.
+:py:class:`xarray.core.datatree.DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually alignable groups.
 You can think of it like a (recursive) ``dict`` of :py:class:`xarray.Dataset` objects.
 
 Let's first make some example xarray datasets (following on from xarray's
@@ -17,6 +13,7 @@ Let's first make some example xarray datasets (following on from xarray's
 
     import numpy as np
     import xarray as xr
+    from xarray.core.datatree import DataTree
 
     data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
     ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi))
@@ -35,8 +32,6 @@ Now we'll put this data into a multi-group tree:
 
 .. ipython:: python
 
-    from datatree import DataTree
-
     dt = DataTree.from_dict({"simulation/coarse": ds, "simulation/fine": ds2, "/": ds3})
     dt
 
@@ -81,4 +76,4 @@ This allows you to work with multiple groups of non-alignable variables at once.
 
 
     If all of your variables are mutually alignable (i.e. they live on the same grid, such that every common dimension name maps to the same length),
-    then you probably don't need :py:class:`DataTree`, and should consider just sticking with ``xarray.Dataset``.
+    then you probably don't need :py:class:`xarray.core.datatree.DataTree`, and should consider just sticking with ``xarray.Dataset``.
diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst
similarity index 85%
rename from xarray/datatree_/docs/source/hierarchical-data.rst
rename to doc/user-guide/hierarchical-data.rst
index d4f58847718..94b5a7481f2 100644
--- a/xarray/datatree_/docs/source/hierarchical-data.rst
+++ b/doc/user-guide/hierarchical-data.rst
@@ -1,5 +1,3 @@
-.. currentmodule:: datatree
-
 .. _hierarchical-data:
 
 Working With Hierarchical Data
@@ -11,7 +9,7 @@ Working With Hierarchical Data
     import numpy as np
     import pandas as pd
     import xarray as xr
-    from datatree import DataTree
+    from xarray.core.datatree import DataTree
 
     np.random.seed(123456)
     np.set_printoptions(threshold=10)
@@ -35,9 +33,9 @@ or even any combination of the above.
 
 Often datasets like this cannot easily fit into a single :py:class:`xarray.Dataset` object,
 or are more usefully thought of as groups of related ``xarray.Dataset`` objects.
-For this purpose we provide the :py:class:`DataTree` class.
+For this purpose we provide the :py:class:`xarray.core.datatree.DataTree` class.
 
-This page explains in detail how to understand and use the different features of the :py:class:`DataTree` class for your own hierarchical data needs.
+This page explains in detail how to understand and use the different features of the :py:class:`xarray.core.datatree.DataTree` class for your own hierarchical data needs.
 
 .. _node relationships:
 
@@ -59,7 +57,7 @@ Let's start by defining nodes representing the two siblings, Bart and Lisa Simps
     bart = DataTree(name="Bart")
     lisa = DataTree(name="Lisa")
 
-Each of these node objects knows their own :py:class:`~DataTree.name`, but they currently have no relationship to one another.
+Each of these node objects knows its own :py:class:`~xarray.core.datatree.DataTree.name`, but they currently have no relationship to one another.
 We can connect them by creating another node representing a common parent, Homer Simpson:
 
 .. ipython:: python
@@ -74,13 +72,13 @@ We now have a small family tree
     homer
 
 where we can see how these individual Simpson family members are related to one another.
-The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the :py:class:`~DataTree.siblings` property:
+The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the :py:class:`~xarray.core.datatree.DataTree.siblings` property:
 
 .. ipython:: python
 
     list(bart.siblings)
 
-But oops, we forgot Homer's third daughter, Maggie! Let's add her by updating Homer's :py:class:`~DataTree.children` property to include her:
+But oops, we forgot Homer's third daughter, Maggie! Let's add her by updating Homer's :py:class:`~xarray.core.datatree.DataTree.children` property to include her:
 
 .. ipython:: python
 
@@ -101,14 +99,14 @@ That's good - updating the properties of our nodes does not break the internal c
     the fact that distant relatives can mate makes it a directed acyclic graph.
     Trees of ``DataTree`` objects cannot represent this.
 
-Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his :py:class:`~DataTree.parent` property:
+Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his :py:class:`~xarray.core.datatree.DataTree.parent` property:
 
 .. ipython:: python
 
     abe = DataTree(name="Abe")
     homer.parent = abe
 
-Abe is now the "root" of this tree, which we can see by examining the :py:class:`~DataTree.root` property of any node in the tree
+Abe is now the "root" of this tree, which we can see by examining the :py:class:`~xarray.core.datatree.DataTree.root` property of any node in the tree
 
 .. ipython:: python
 
@@ -124,7 +122,7 @@ We can see the whole tree by printing Abe's node or just part of the tree by pri
 We can see that Homer is aware of his parentage, and we say that Homer and his children form a "subtree" of the larger Simpson family tree.
 
 In episode 28, Abe Simpson reveals that he had another son, Herbert "Herb" Simpson.
-We can add Herbert to the family tree without displacing Homer by :py:meth:`~DataTree.assign`-ing another child to Abe:
+We can add Herbert to the family tree without displacing Homer by :py:meth:`~xarray.core.datatree.DataTree.assign`-ing another child to Abe:
 
 .. ipython:: python
 
@@ -174,7 +172,7 @@ Let's use a different example of a tree to discuss more complex relationships be
         "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs"
     ]
 
-We have used the :py:meth:`~DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree,
+We have used the :py:meth:`~xarray.core.datatree.DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree,
 and :ref:`filesystem paths` (to be explained shortly) to select two nodes of interest.
 
 .. ipython:: python
@@ -186,7 +184,7 @@ rather than an evolutionary tree).
 
 Here both the species and the features used to group them are represented by ``DataTree`` node objects - there is no distinction in types of node.
 We can however get a list of only the nodes we used to represent species by using the fact that all those nodes have no children - they are "leaf nodes".
-We can check if a node is a leaf with :py:meth:`~DataTree.is_leaf`, and get a list of all leaves with the :py:class:`~DataTree.leaves` property:
+We can check if a node is a leaf with :py:meth:`~xarray.core.datatree.DataTree.is_leaf`, and get a list of all leaves with the :py:class:`~xarray.core.datatree.DataTree.leaves` property:
 
 .. ipython:: python
 
@@ -224,7 +222,7 @@ There are various ways to access the different nodes in a tree.
Properties ~~~~~~~~~~ -We can navigate trees using the :py:class:`~DataTree.parent` and :py:class:`~DataTree.children` properties of each node, for example: +We can navigate trees using the :py:class:`~xarray.core.datatree.DataTree.parent` and :py:class:`~xarray.core.datatree.DataTree.children` properties of each node, for example: .. ipython:: python @@ -236,17 +234,17 @@ Dictionary-like interface ~~~~~~~~~~~~~~~~~~~~~~~~~ Children are stored on each node as a key-value mapping from name to child node. -They can be accessed and altered via the :py:class:`~DataTree.__getitem__` and :py:class:`~DataTree.__setitem__` syntax. -In general :py:class:`~DataTree.DataTree` objects support almost the entire set of dict-like methods, -including :py:meth:`~DataTree.keys`, :py:class:`~DataTree.values`, :py:class:`~DataTree.items`, -:py:meth:`~DataTree.__delitem__` and :py:meth:`~DataTree.update`. +They can be accessed and altered via the :py:meth:`~xarray.core.datatree.DataTree.__getitem__` and :py:meth:`~xarray.core.datatree.DataTree.__setitem__` syntax. +In general :py:class:`~xarray.core.datatree.DataTree` objects support almost the entire set of dict-like methods, +including :py:meth:`~xarray.core.datatree.DataTree.keys`, :py:meth:`~xarray.core.datatree.DataTree.values`, :py:meth:`~xarray.core.datatree.DataTree.items`, +:py:meth:`~xarray.core.datatree.DataTree.__delitem__` and :py:meth:`~xarray.core.datatree.DataTree.update`. .. ipython:: python vertebrates["Bony Skeleton"]["Ray-finned Fish"] Note that the dict-like interface combines access to child ``DataTree`` nodes and stored ``DataArrays``, -so if we have a node that contains both children and data, calling :py:meth:`~DataTree.keys` will list both names of child nodes and +so if we have a node that contains both children and data, calling :py:meth:`~xarray.core.datatree.DataTree.keys` will list both names of child nodes and names of data variables: .. 
ipython:: python @@ -280,7 +278,10 @@ Each node is like a directory, and each directory can contain both more sub-dire .. note:: - You can even make the filesystem analogy concrete by using :py:func:`~DataTree.open_mfdatatree` or :py:func:`~DataTree.save_mfdatatree` # TODO not yet implemented - see GH issue 51 + Future development will allow you to make the filesystem analogy concrete by + using :py:func:`~xarray.core.datatree.DataTree.open_mfdatatree` or + :py:func:`~xarray.core.datatree.DataTree.save_mfdatatree`. + (`See related issue in GitHub `_) Datatree objects support a syntax inspired by unix-like filesystems, where the "path" to a node is specified by the keys of each intermediate node in sequence, @@ -338,14 +339,14 @@ we can construct a complex tree quickly using the alternative constructor :py:me Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path (i.e. the node labelled `"c"` in this case.) - This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`DataTree.from_dict`. + This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`xarray.core.datatree.DataTree.from_dict`. .. _iterating over trees: Iterating over trees ~~~~~~~~~~~~~~~~~~~~ -You can iterate over every node in a tree using the subtree :py:class:`~DataTree.subtree` property. +You can iterate over every node in a tree using the :py:class:`~xarray.core.datatree.DataTree.subtree` property. This returns an iterable of nodes, which yields them in depth-first order. .. ipython:: python @@ -353,11 +354,11 @@ This returns an iterable of nodes, which yields them in depth-first order. 
for node in vertebrates.subtree: print(node.path) -A very useful pattern is to use :py:class:`~DataTree.subtree` conjunction with the :py:class:`~DataTree.path` property to manipulate the nodes however you wish, -then rebuild a new tree using :py:meth:`DataTree.from_dict()`. +A very useful pattern is to use :py:class:`~xarray.core.datatree.DataTree.subtree` in conjunction with the :py:class:`~xarray.core.datatree.DataTree.path` property to manipulate the nodes however you wish, +then rebuild a new tree using :py:meth:`xarray.core.datatree.DataTree.from_dict()`. For example, we could keep only the nodes containing data by looping over all nodes, -checking if they contain any data using :py:class:`~DataTree.has_data`, +checking if they contain any data using :py:class:`~xarray.core.datatree.DataTree.has_data`, then rebuilding a new tree using only the paths of those nodes: .. ipython:: python @@ -380,7 +381,7 @@ Subsetting Tree Nodes We can subset our tree to select only nodes of interest in various ways. Similarly to on a real filesystem, matching nodes by common patterns in their paths is often useful. -We can use :py:meth:`DataTree.match` for this: +We can use :py:meth:`xarray.core.datatree.DataTree.match` for this: .. ipython:: python @@ -396,7 +397,7 @@ We can use :py:meth:`DataTree.match` for this: result We can also subset trees by the contents of the nodes. -:py:meth:`DataTree.filter` retains only the nodes of a tree that meet a certain condition. +:py:meth:`xarray.core.datatree.DataTree.filter` retains only the nodes of a tree that meet a certain condition. For example, we could recreate the Simpson's family tree with the ages of each individual, then filter for only the adults: First lets recreate the tree but with an `age` data variable in every node: @@ -423,7 +424,7 @@ Now let's filter out the minors: The result is a new tree, containing only the nodes matching the condition. 
-(Yes, under the hood :py:meth:`~DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees` !) +(Yes, under the hood :py:meth:`~xarray.core.datatree.DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees` !) .. _Tree Contents: @@ -436,7 +437,7 @@ Hollow Trees A concept that can sometimes be useful is that of a "Hollow Tree", which means a tree with data stored only at the leaf nodes. This is useful because certain useful tree manipulation operations only make sense for hollow trees. -You can check if a tree is a hollow tree by using the :py:class:`~DataTree.is_hollow` property. +You can check if a tree is a hollow tree by using the :py:class:`~xarray.core.datatree.DataTree.is_hollow` property. We can see that the Simpson's family is not hollow because the data variable ``"age"`` is present at some nodes which have children (i.e. Abe and Homer). @@ -449,7 +450,7 @@ have children (i.e. Abe and Homer). Computation ----------- -`DataTree` objects are also useful for performing computations, not just for organizing data. +``DataTree`` objects are also useful for performing computations, not just for organizing data. Operations and Methods on Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -540,7 +541,7 @@ See that the same change (fast-forwarding by adding 10 years to the age of each Mapping Custom Functions Over Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -You can map custom computation over each node in a tree using :py:meth:`DataTree.map_over_subtree`. +You can map custom computation over each node in a tree using :py:meth:`xarray.core.datatree.DataTree.map_over_subtree`. You can map any function, so long as it takes `xarray.Dataset` objects as one (or more) of the input arguments, and returns one (or more) xarray datasets. 
diff --git a/doc/user-guide/index.rst b/doc/user-guide/index.rst index 45f0ce352de..57e3ded3c9a 100644 --- a/doc/user-guide/index.rst +++ b/doc/user-guide/index.rst @@ -27,3 +27,5 @@ examples that describe many common tasks that you can accomplish with xarray. options testing duckarrays + datatree + hierarchical-data diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst index b73d0fdcb51..b9133097e32 100644 --- a/doc/user-guide/io.rst +++ b/doc/user-guide/io.rst @@ -153,83 +153,31 @@ to the original netCDF file, regardless if they exist in the original dataset. Groups ~~~~~~ -NetCDF groups are not supported as part of the :py:class:`Dataset` data model. -Instead, groups can be loaded individually as Dataset objects. -To do so, pass a ``group`` keyword argument to the -:py:func:`open_dataset` function. The group can be specified as a path-like -string, e.g., to access subgroup ``'bar'`` within group ``'foo'`` pass -``'/foo/bar'`` as the ``group`` argument. - -In a similar way, the ``group`` keyword argument can be given to the -:py:meth:`Dataset.to_netcdf` method to write to a group -in a netCDF file. -When writing multiple groups in one file, pass ``mode='a'`` to -:py:meth:`Dataset.to_netcdf` to ensure that each call does not delete the file. -For example: - -.. ipython:: - :verbatim: - - In [1]: ds1 = xr.Dataset({"a": 0}) - - In [2]: ds2 = xr.Dataset({"b": 1}) - - In [3]: ds1.to_netcdf("file.nc", group="A") - - In [4]: ds2.to_netcdf("file.nc", group="B", mode="a") - -We can verify that two groups have been saved using the ncdump command-line utility. - -.. code:: bash - - $ ncdump file.nc - netcdf file { - - group: A { - variables: - int64 a ; - data: - - a = 0 ; - } // group A - - group: B { - variables: - int64 b ; - data: - - b = 1 ; - } // group B - } - -Either of these groups can be loaded from the file as an independent :py:class:`Dataset` object: - -.. 
ipython:: - :verbatim: - - In [1]: group1 = xr.open_dataset("file.nc", group="A") +Whilst netCDF groups can only be loaded individually as ``Dataset`` objects, a +whole file of many nested groups can be loaded as a single +:py:class:`xarray.core.datatree.DataTree` object. To open a whole netCDF file as a tree of groups, +use the :py:func:`open_datatree` function. To save a DataTree object as a +netCDF file containing many groups, use the :py:meth:`xarray.core.datatree.DataTree.to_netcdf` method. - In [2]: group1 - Out[2]: - - Dimensions: () - Data variables: - a int64 ... - - In [3]: group2 = xr.open_dataset("file.nc", group="B") - - In [4]: group2 - Out[4]: - - Dimensions: () - Data variables: - b int64 ... -.. note:: - - For native handling of multiple groups with xarray, including I/O, you might be interested in the experimental - `xarray-datatree `_ package. +.. _netcdf.group.warning: +.. warning:: + ``DataTree`` objects do not follow the exact same data model as netCDF + files, which means that perfect round-tripping is not always possible. + + In particular, in the netCDF data model, dimensions are entities that can + exist regardless of whether any variable possesses them. This is in contrast + to `xarray's data model `_ + (and hence :ref:`datatree's data model `), in which the + dimensions of a (Dataset/Tree) object are simply the set of dimensions + present across all variables in that dataset. + + This means that if a netCDF file contains dimensions but no variables which + possess those dimensions, these dimensions will not be present when that + file is opened as a DataTree object. + Saving this DataTree object to file will therefore not preserve these + "unused" dimensions. .. _io.encoding: @@ -606,13 +554,6 @@ Natively the xarray data structures can only handle one level of nesting, organi DataArrays inside of Datasets. If your HDF5 file has additional levels of hierarchy you can only access one group and a time and will need to specify group names. -.. 
note:: - - For native handling of multiple HDF5 groups with xarray, including I/O, you might be - interested in the experimental - `xarray-datatree `_ package. - - .. _HDF5: https://hdfgroup.github.io/hdf5/index.html .. _h5py: https://www.h5py.org/ @@ -974,6 +915,20 @@ length of each dimension by using the shorthand chunk size ``-1``: The number of chunks on Tair matches our dask chunks, while there is now only a single chunk in the directory stores of each coordinate. +Groups +~~~~~~ + +Nested groups in zarr stores can be represented by loading the store as a +:py:class:`xarray.core.datatree.DataTree` object, similarly to netCDF. To open a whole zarr store as +a tree of groups, use the :py:func:`open_datatree` function. To save a +``DataTree`` object as a zarr store containing many groups, use the +:py:meth:`xarray.core.datatree.DataTree.to_zarr()` method. + +.. note:: + Perfect round-tripping should always be possible with a zarr + store (:ref:`unlike for netCDF files `), as zarr does + not support "unused" dimensions. + .. _io.iris: Iris diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 55937310827..7adfa16dedd 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -255,3 +255,29 @@ complete examples, please consult the relevant documentation.* - Slicing: You can take a "slice" of your data, like you might want all temperatures from July 1st to July 10th. xarray supports slicing for both positional and label-based indexing. + + DataTree + A tree-like collection of ``Dataset`` objects. A *tree* is made up of one or more *nodes*, + each of which can store the same information as a single ``Dataset`` (accessed via ``.ds``). + This data is stored in the same way as in a ``Dataset``, i.e. in the form of data variables + (see **Variable** in the `corresponding xarray terminology page `_), + dimensions, coordinates, and attributes.
+ + The nodes in a tree are linked to one another, and each node is its own + ``DataTree`` object. Each node can have zero or more *children* (stored in a dictionary-like + manner under their corresponding *names*), and those child nodes can themselves have + children. If a node is a child of another node, that other node is said to be its *parent*. + Nodes can have a maximum of one parent, and if a node has no parent it is said to be the + *root* node of that *tree*. + + Subtree + A section of a *tree*, consisting of a *node* along with all the child nodes below it + (and the child nodes below them, i.e. all so-called *descendant* nodes). + Excludes the parent node and all nodes above. + + Group + Another word for a subtree, reflecting how the hierarchical structure of a ``DataTree`` + allows for grouping related data together. + Analogous to a single + `netCDF group `_ + or `Zarr group `_. diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 49d39927f2b..6c2658669db 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -39,6 +39,11 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Migrate documentation for ``datatree`` into main ``xarray`` documentation. + For information on previous ``datatree`` releases, please see: + `datatree's historical release notes `_. + By `Owen Littlejohns `_ and + `Tom Nicholas `_. Internal Changes ~~~~~~~~~~~~~~~~ @@ -128,6 +133,10 @@ Internal Changes xarray functions to use ``dim``. Using the existing kwarg will raise a warning. By `Maximilian Roos `_ + rather than ``dims`` or ``dimensions``. This is the final change to make xarray methods + consistent with their use of ``dim``. Using the existing kwarg will raise a + warning. By `Maximilian Roos `_ + .. 
_whats-new.2024.03.0: diff --git a/pyproject.toml b/pyproject.toml index db64d7a18c5..741d1990f54 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -88,7 +88,6 @@ exclude_lines = ["pragma: no cover", "if TYPE_CHECKING"] enable_error_code = "redundant-self" exclude = [ 'xarray/util/generate_.*\.py', - 'xarray/datatree_/doc/.*\.py', ] files = "xarray" show_error_codes = true diff --git a/xarray/core/datatree_ops.py b/xarray/core/datatree_ops.py index bc64b44ae1e..9a0e0c6da5a 100644 --- a/xarray/core/datatree_ops.py +++ b/xarray/core/datatree_ops.py @@ -15,7 +15,7 @@ _MAPPED_DOCSTRING_ADDENDUM = ( - "This method was copied from xarray.Dataset, but has been altered to " + "This method was copied from :py:class:`xarray.Dataset`, but has been altered to " "call the method on the Datasets stored in every node of the subtree. " "See the `map_over_subtree` function for more details." ) diff --git a/xarray/datatree_/docs/Makefile b/xarray/datatree_/docs/Makefile deleted file mode 100644 index 6e9b4058414..00000000000 --- a/xarray/datatree_/docs/Makefile +++ /dev/null @@ -1,183 +0,0 @@ -# Makefile for Sphinx documentation -# - -# You can set these variables from the command line. -SPHINXOPTS = -SPHINXBUILD = sphinx-build -PAPER = -BUILDDIR = _build - -# User-friendly check for sphinx-build -ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) -$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) -endif - -# Internal variables. 
-PAPEROPT_a4 = -D latex_paper_size=a4 -PAPEROPT_letter = -D latex_paper_size=letter -ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source -# the i18n builder cannot share the environment and doctrees with the others -I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source - -.PHONY: help clean html rtdhtml dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext - -help: - @echo "Please use \`make ' where is one of" - @echo " html to make standalone HTML files" - @echo " rtdhtml Build html using same settings used on ReadtheDocs" - @echo " dirhtml to make HTML files named index.html in directories" - @echo " singlehtml to make a single large HTML file" - @echo " pickle to make pickle files" - @echo " json to make JSON files" - @echo " htmlhelp to make HTML files and a HTML help project" - @echo " qthelp to make HTML files and a qthelp project" - @echo " devhelp to make HTML files and a Devhelp project" - @echo " epub to make an epub" - @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" - @echo " latexpdf to make LaTeX files and run them through pdflatex" - @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" - @echo " text to make text files" - @echo " man to make manual pages" - @echo " texinfo to make Texinfo files" - @echo " info to make Texinfo files and run them through makeinfo" - @echo " gettext to make PO message catalogs" - @echo " changes to make an overview of all changed/added/deprecated items" - @echo " xml to make Docutils-native XML files" - @echo " pseudoxml to make pseudoxml-XML files for display purposes" - @echo " linkcheck to check all external links for integrity" - @echo " doctest to run all doctests embedded in the documentation (if enabled)" - -clean: - rm -rf $(BUILDDIR)/* - -html: - $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html - @echo - @echo "Build finished. 
The HTML pages are in $(BUILDDIR)/html." - -rtdhtml: - $(SPHINXBUILD) -T -j auto -E -W --keep-going -b html -d $(BUILDDIR)/doctrees -D language=en . $(BUILDDIR)/html - @echo - @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." - -dirhtml: - $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml - @echo - @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." - -singlehtml: - $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml - @echo - @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." - -pickle: - $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle - @echo - @echo "Build finished; now you can process the pickle files." - -json: - $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json - @echo - @echo "Build finished; now you can process the JSON files." - -htmlhelp: - $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp - @echo - @echo "Build finished; now you can run HTML Help Workshop with the" \ - ".hhp project file in $(BUILDDIR)/htmlhelp." - -qthelp: - $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp - @echo - @echo "Build finished; now you can run "qcollectiongenerator" with the" \ - ".qhcp project file in $(BUILDDIR)/qthelp, like this:" - @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/complexity.qhcp" - @echo "To view the help file:" - @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/complexity.qhc" - -devhelp: - $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp - @echo - @echo "Build finished." - @echo "To view the help file:" - @echo "# mkdir -p $$HOME/.local/share/devhelp/complexity" - @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/complexity" - @echo "# devhelp" - -epub: - $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub - @echo - @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 
- -latex: - $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex - @echo - @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." - @echo "Run \`make' in that directory to run these through (pdf)latex" \ - "(use \`make latexpdf' here to do that automatically)." - -latexpdf: - $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex - @echo "Running LaTeX files through pdflatex..." - $(MAKE) -C $(BUILDDIR)/latex all-pdf - @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." - -latexpdfja: - $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex - @echo "Running LaTeX files through platex and dvipdfmx..." - $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja - @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." - -text: - $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text - @echo - @echo "Build finished. The text files are in $(BUILDDIR)/text." - -man: - $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man - @echo - @echo "Build finished. The manual pages are in $(BUILDDIR)/man." - -texinfo: - $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo - @echo - @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." - @echo "Run \`make' in that directory to run these through makeinfo" \ - "(use \`make info' here to do that automatically)." - -info: - $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo - @echo "Running Texinfo files through makeinfo..." - make -C $(BUILDDIR)/texinfo info - @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." - -gettext: - $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale - @echo - @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." - -changes: - $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes - @echo - @echo "The overview file is in $(BUILDDIR)/changes." 
- -linkcheck: - $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck - @echo - @echo "Link check complete; look for any errors in the above output " \ - "or in $(BUILDDIR)/linkcheck/output.txt." - -doctest: - $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest - @echo "Testing of doctests in the sources finished, look at the " \ - "results in $(BUILDDIR)/doctest/output.txt." - -xml: - $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml - @echo - @echo "Build finished. The XML files are in $(BUILDDIR)/xml." - -pseudoxml: - $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml - @echo - @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." diff --git a/xarray/datatree_/docs/README.md b/xarray/datatree_/docs/README.md deleted file mode 100644 index ca2bf72952e..00000000000 --- a/xarray/datatree_/docs/README.md +++ /dev/null @@ -1,14 +0,0 @@ -# README - docs - -## Build the documentation locally - -```bash -cd docs # From project's root -make clean -rm -rf source/generated # remove autodoc artefacts, that are not removed by `make clean` -make html -``` - -## Access the documentation locally - -Open `docs/_build/html/index.html` in a web browser diff --git a/xarray/datatree_/docs/make.bat b/xarray/datatree_/docs/make.bat deleted file mode 100644 index 2df9a8cbbb6..00000000000 --- a/xarray/datatree_/docs/make.bat +++ /dev/null @@ -1,242 +0,0 @@ -@ECHO OFF - -REM Command file for Sphinx documentation - -if "%SPHINXBUILD%" == "" ( - set SPHINXBUILD=sphinx-build -) -set BUILDDIR=_build -set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . -set I18NSPHINXOPTS=%SPHINXOPTS% . -if NOT "%PAPER%" == "" ( - set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% - set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% -) - -if "%1" == "" goto help - -if "%1" == "help" ( - :help - echo.Please use `make ^` where ^ is one of - echo. html to make standalone HTML files - echo. 
dirhtml to make HTML files named index.html in directories - echo. singlehtml to make a single large HTML file - echo. pickle to make pickle files - echo. json to make JSON files - echo. htmlhelp to make HTML files and a HTML help project - echo. qthelp to make HTML files and a qthelp project - echo. devhelp to make HTML files and a Devhelp project - echo. epub to make an epub - echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter - echo. text to make text files - echo. man to make manual pages - echo. texinfo to make Texinfo files - echo. gettext to make PO message catalogs - echo. changes to make an overview over all changed/added/deprecated items - echo. xml to make Docutils-native XML files - echo. pseudoxml to make pseudoxml-XML files for display purposes - echo. linkcheck to check all external links for integrity - echo. doctest to run all doctests embedded in the documentation if enabled - goto end -) - -if "%1" == "clean" ( - for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i - del /q /s %BUILDDIR%\* - goto end -) - - -%SPHINXBUILD% 2> nul -if errorlevel 9009 ( - echo. - echo.The 'sphinx-build' command was not found. Make sure you have Sphinx - echo.installed, then set the SPHINXBUILD environment variable to point - echo.to the full path of the 'sphinx-build' executable. Alternatively you - echo.may add the Sphinx directory to PATH. - echo. - echo.If you don't have Sphinx installed, grab it from - echo.http://sphinx-doc.org/ - exit /b 1 -) - -if "%1" == "html" ( - %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The HTML pages are in %BUILDDIR%/html. - goto end -) - -if "%1" == "dirhtml" ( - %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. 
- goto end -) - -if "%1" == "singlehtml" ( - %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. - goto end -) - -if "%1" == "pickle" ( - %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can process the pickle files. - goto end -) - -if "%1" == "json" ( - %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can process the JSON files. - goto end -) - -if "%1" == "htmlhelp" ( - %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can run HTML Help Workshop with the ^ -.hhp project file in %BUILDDIR%/htmlhelp. - goto end -) - -if "%1" == "qthelp" ( - %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; now you can run "qcollectiongenerator" with the ^ -.qhcp project file in %BUILDDIR%/qthelp, like this: - echo.^> qcollectiongenerator %BUILDDIR%\qthelp\complexity.qhcp - echo.To view the help file: - echo.^> assistant -collectionFile %BUILDDIR%\qthelp\complexity.ghc - goto end -) - -if "%1" == "devhelp" ( - %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. - goto end -) - -if "%1" == "epub" ( - %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The epub file is in %BUILDDIR%/epub. - goto end -) - -if "%1" == "latex" ( - %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex - if errorlevel 1 exit /b 1 - echo. - echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. - goto end -) - -if "%1" == "latexpdf" ( - %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex - cd %BUILDDIR%/latex - make all-pdf - cd %BUILDDIR%/.. - echo. 
- echo.Build finished; the PDF files are in %BUILDDIR%/latex. - goto end -) - -if "%1" == "latexpdfja" ( - %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex - cd %BUILDDIR%/latex - make all-pdf-ja - cd %BUILDDIR%/.. - echo. - echo.Build finished; the PDF files are in %BUILDDIR%/latex. - goto end -) - -if "%1" == "text" ( - %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The text files are in %BUILDDIR%/text. - goto end -) - -if "%1" == "man" ( - %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The manual pages are in %BUILDDIR%/man. - goto end -) - -if "%1" == "texinfo" ( - %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. - goto end -) - -if "%1" == "gettext" ( - %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The message catalogs are in %BUILDDIR%/locale. - goto end -) - -if "%1" == "changes" ( - %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes - if errorlevel 1 exit /b 1 - echo. - echo.The overview file is in %BUILDDIR%/changes. - goto end -) - -if "%1" == "linkcheck" ( - %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck - if errorlevel 1 exit /b 1 - echo. - echo.Link check complete; look for any errors in the above output ^ -or in %BUILDDIR%/linkcheck/output.txt. - goto end -) - -if "%1" == "doctest" ( - %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest - if errorlevel 1 exit /b 1 - echo. - echo.Testing of doctests in the sources finished, look at the ^ -results in %BUILDDIR%/doctest/output.txt. - goto end -) - -if "%1" == "xml" ( - %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The XML files are in %BUILDDIR%/xml. 
- goto end -) - -if "%1" == "pseudoxml" ( - %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml - if errorlevel 1 exit /b 1 - echo. - echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. - goto end -) - -:end diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst deleted file mode 100644 index d325d24f4a4..00000000000 --- a/xarray/datatree_/docs/source/api.rst +++ /dev/null @@ -1,362 +0,0 @@ -.. currentmodule:: datatree - -############# -API reference -############# - -DataTree -======== - -Creating a DataTree -------------------- - -Methods of creating a datatree. - -.. autosummary:: - :toctree: generated/ - - DataTree - DataTree.from_dict - -Tree Attributes ---------------- - -Attributes relating to the recursive tree-like structure of a ``DataTree``. - -.. autosummary:: - :toctree: generated/ - - DataTree.parent - DataTree.children - DataTree.name - DataTree.path - DataTree.root - DataTree.is_root - DataTree.is_leaf - DataTree.leaves - DataTree.level - DataTree.depth - DataTree.width - DataTree.subtree - DataTree.descendants - DataTree.siblings - DataTree.lineage - DataTree.parents - DataTree.ancestors - DataTree.groups - -Data Contents -------------- - -Interface to the data objects (optionally) stored inside a single ``DataTree`` node. -This interface echoes that of ``xarray.Dataset``. - -.. autosummary:: - :toctree: generated/ - - DataTree.dims - DataTree.sizes - DataTree.data_vars - DataTree.coords - DataTree.attrs - DataTree.encoding - DataTree.indexes - DataTree.nbytes - DataTree.ds - DataTree.to_dataset - DataTree.has_data - DataTree.has_attrs - DataTree.is_empty - DataTree.is_hollow - -Dictionary Interface --------------------- - -``DataTree`` objects also have a dict-like interface mapping keys to either ``xarray.DataArray``s or to child ``DataTree`` nodes. - -.. 
autosummary:: - :toctree: generated/ - - DataTree.__getitem__ - DataTree.__setitem__ - DataTree.__delitem__ - DataTree.update - DataTree.get - DataTree.items - DataTree.keys - DataTree.values - -Tree Manipulation ------------------ - -For manipulating, traversing, navigating, or mapping over the tree structure. - -.. autosummary:: - :toctree: generated/ - - DataTree.orphan - DataTree.same_tree - DataTree.relative_to - DataTree.iter_lineage - DataTree.find_common_ancestor - DataTree.map_over_subtree - map_over_subtree - DataTree.pipe - DataTree.match - DataTree.filter - -Pathlib-like Interface ----------------------- - -``DataTree`` objects deliberately echo some of the API of `pathlib.PurePath`. - -.. autosummary:: - :toctree: generated/ - - DataTree.name - DataTree.parent - DataTree.parents - DataTree.relative_to - -Missing: - -.. - - ``DataTree.glob`` - ``DataTree.joinpath`` - ``DataTree.with_name`` - ``DataTree.walk`` - ``DataTree.rename`` - ``DataTree.replace`` - -DataTree Contents ------------------ - -Manipulate the contents of all nodes in a tree simultaneously. - -.. autosummary:: - :toctree: generated/ - - DataTree.copy - DataTree.assign_coords - DataTree.merge - DataTree.rename - DataTree.rename_vars - DataTree.rename_dims - DataTree.swap_dims - DataTree.expand_dims - DataTree.drop_vars - DataTree.drop_dims - DataTree.set_coords - DataTree.reset_coords - -DataTree Node Contents ----------------------- - -Manipulate the contents of a single DataTree node. - -.. autosummary:: - :toctree: generated/ - - DataTree.assign - DataTree.drop_nodes - -Comparisons -=========== - -Compare one ``DataTree`` object to another. - -.. autosummary:: - :toctree: generated/ - - DataTree.isomorphic - DataTree.equals - DataTree.identical - -Indexing -======== - -Index into all nodes in the subtree simultaneously. - -.. 
autosummary:: - :toctree: generated/ - - DataTree.isel - DataTree.sel - DataTree.drop_sel - DataTree.drop_isel - DataTree.head - DataTree.tail - DataTree.thin - DataTree.squeeze - DataTree.interp - DataTree.interp_like - DataTree.reindex - DataTree.reindex_like - DataTree.set_index - DataTree.reset_index - DataTree.reorder_levels - DataTree.query - -.. - - Missing: - ``DataTree.loc`` - - -Missing Value Handling -====================== - -.. autosummary:: - :toctree: generated/ - - DataTree.isnull - DataTree.notnull - DataTree.combine_first - DataTree.dropna - DataTree.fillna - DataTree.ffill - DataTree.bfill - DataTree.interpolate_na - DataTree.where - DataTree.isin - -Computation -=========== - -Apply a computation to the data in all nodes in the subtree simultaneously. - -.. autosummary:: - :toctree: generated/ - - DataTree.map - DataTree.reduce - DataTree.diff - DataTree.quantile - DataTree.differentiate - DataTree.integrate - DataTree.map_blocks - DataTree.polyfit - DataTree.curvefit - -Aggregation -=========== - -Aggregate data in all nodes in the subtree simultaneously. - -.. autosummary:: - :toctree: generated/ - - DataTree.all - DataTree.any - DataTree.argmax - DataTree.argmin - DataTree.idxmax - DataTree.idxmin - DataTree.max - DataTree.min - DataTree.mean - DataTree.median - DataTree.prod - DataTree.sum - DataTree.std - DataTree.var - DataTree.cumsum - DataTree.cumprod - -ndarray methods -=============== - -Methods copied from :py:class:`numpy.ndarray` objects, here applying to the data in all nodes in the subtree. - -.. autosummary:: - :toctree: generated/ - - DataTree.argsort - DataTree.astype - DataTree.clip - DataTree.conj - DataTree.conjugate - DataTree.round - DataTree.rank - -Reshaping and reorganising -========================== - -Reshape or reorganise the data in all nodes in the subtree. - -.. 
autosummary:: - :toctree: generated/ - - DataTree.transpose - DataTree.stack - DataTree.unstack - DataTree.shift - DataTree.roll - DataTree.pad - DataTree.sortby - DataTree.broadcast_like - -Plotting -======== - -I/O -=== - -Open a datatree from an on-disk store or serialize the tree. - -.. autosummary:: - :toctree: generated/ - - open_datatree - DataTree.to_dict - DataTree.to_netcdf - DataTree.to_zarr - -.. - - Missing: - ``open_mfdatatree`` - -Tutorial -======== - -Testing -======= - -Test that two DataTree objects are similar. - -.. autosummary:: - :toctree: generated/ - - testing.assert_isomorphic - testing.assert_equal - testing.assert_identical - -Exceptions -========== - -Exceptions raised when manipulating trees. - -.. autosummary:: - :toctree: generated/ - - TreeIsomorphismError - InvalidTreeError - NotFoundInTreeError - -Advanced API -============ - -Relatively advanced API for users or developers looking to understand the internals, or extend functionality. - -.. autosummary:: - :toctree: generated/ - - DataTree.variables - register_datatree_accessor - -.. - - Missing: - ``DataTree.set_close`` diff --git a/xarray/datatree_/docs/source/conf.py b/xarray/datatree_/docs/source/conf.py deleted file mode 100644 index 430dbb5bf6d..00000000000 --- a/xarray/datatree_/docs/source/conf.py +++ /dev/null @@ -1,412 +0,0 @@ -# -*- coding: utf-8 -*- -# flake8: noqa -# Ignoring F401: imported but unused - -# complexity documentation build configuration file, created by -# sphinx-quickstart on Tue Jul 9 22:26:36 2013. -# -# This file is execfile()d with the current directory set to its containing dir. -# -# Note that not all possible configuration values are present in this -# autogenerated file. -# -# All configuration values have a default; values that are commented out -# serve to show the default. 
- -import inspect -import os -import sys - -import sphinx_autosummary_accessors # type: ignore - -import datatree # type: ignore - -# If extensions (or modules to document with autodoc) are in another directory, -# add these directories to sys.path here. If the directory is relative to the -# documentation root, use os.path.abspath to make it absolute, like shown here. -# sys.path.insert(0, os.path.abspath('.')) - -cwd = os.getcwd() -parent = os.path.dirname(cwd) -sys.path.insert(0, parent) - - -# -- General configuration ----------------------------------------------------- - -# If your documentation needs a minimal Sphinx version, state it here. -# needs_sphinx = '1.0' - -# Add any Sphinx extension module names here, as strings. They can be extensions -# coming with Sphinx (named 'sphinx.ext.*') or your custom ones. -extensions = [ - "sphinx.ext.autodoc", - "sphinx.ext.viewcode", - "sphinx.ext.linkcode", - "sphinx.ext.autosummary", - "sphinx.ext.intersphinx", - "sphinx.ext.extlinks", - "sphinx.ext.napoleon", - "sphinx_copybutton", - "sphinxext.opengraph", - "sphinx_autosummary_accessors", - "IPython.sphinxext.ipython_console_highlighting", - "IPython.sphinxext.ipython_directive", - "nbsphinx", - "sphinxcontrib.srclinks", -] - -extlinks = { - "issue": ("https://github.com/xarray-contrib/datatree/issues/%s", "GH#%s"), - "pull": ("https://github.com/xarray-contrib/datatree/pull/%s", "GH#%s"), -} -# Add any paths that contain templates here, relative to this directory. 
-templates_path = ["_templates", sphinx_autosummary_accessors.templates_path] - -# Generate the API documentation when building -autosummary_generate = True - - -# Napoleon configurations - -napoleon_google_docstring = False -napoleon_numpy_docstring = True -napoleon_use_param = False -napoleon_use_rtype = False -napoleon_preprocess_types = True -napoleon_type_aliases = { - # general terms - "sequence": ":term:`sequence`", - "iterable": ":term:`iterable`", - "callable": ":py:func:`callable`", - "dict_like": ":term:`dict-like `", - "dict-like": ":term:`dict-like `", - "path-like": ":term:`path-like `", - "mapping": ":term:`mapping`", - "file-like": ":term:`file-like `", - # special terms - # "same type as caller": "*same type as caller*", # does not work, yet - # "same type as values": "*same type as values*", # does not work, yet - # stdlib type aliases - "MutableMapping": "~collections.abc.MutableMapping", - "sys.stdout": ":obj:`sys.stdout`", - "timedelta": "~datetime.timedelta", - "string": ":class:`string `", - # numpy terms - "array_like": ":term:`array_like`", - "array-like": ":term:`array-like `", - "scalar": ":term:`scalar`", - "array": ":term:`array`", - "hashable": ":term:`hashable `", - # matplotlib terms - "color-like": ":py:func:`color-like `", - "matplotlib colormap name": ":doc:`matplotlib colormap name `", - "matplotlib axes object": ":py:class:`matplotlib axes object `", - "colormap": ":py:class:`colormap `", - # objects without namespace: xarray - "DataArray": "~xarray.DataArray", - "Dataset": "~xarray.Dataset", - "Variable": "~xarray.Variable", - "DatasetGroupBy": "~xarray.core.groupby.DatasetGroupBy", - "DataArrayGroupBy": "~xarray.core.groupby.DataArrayGroupBy", - # objects without namespace: numpy - "ndarray": "~numpy.ndarray", - "MaskedArray": "~numpy.ma.MaskedArray", - "dtype": "~numpy.dtype", - "ComplexWarning": "~numpy.ComplexWarning", - # objects without namespace: pandas - "Index": "~pandas.Index", - "MultiIndex": "~pandas.MultiIndex", - 
"CategoricalIndex": "~pandas.CategoricalIndex", - "TimedeltaIndex": "~pandas.TimedeltaIndex", - "DatetimeIndex": "~pandas.DatetimeIndex", - "Series": "~pandas.Series", - "DataFrame": "~pandas.DataFrame", - "Categorical": "~pandas.Categorical", - "Path": "~~pathlib.Path", - # objects with abbreviated namespace (from pandas) - "pd.Index": "~pandas.Index", - "pd.NaT": "~pandas.NaT", -} - -# The suffix of source filenames. -source_suffix = ".rst" - -# The encoding of source files. -# source_encoding = 'utf-8-sig' - -# The master toctree document. -master_doc = "index" - -# General information about the project. -project = "Datatree" -copyright = "2021 onwards, Tom Nicholas and its Contributors" -author = "Tom Nicholas" - -html_show_sourcelink = True -srclink_project = "https://github.com/xarray-contrib/datatree" -srclink_branch = "main" -srclink_src_path = "docs/source" - -# The version info for the project you're documenting, acts as replacement for -# |version| and |release|, also used in various other places throughout the -# built documents. -# -# The short X.Y version. -version = datatree.__version__ -# The full version, including alpha/beta/rc tags. -release = datatree.__version__ - -# The language for content autogenerated by Sphinx. Refer to documentation -# for a list of supported languages. -# language = None - -# There are two options for replacing |today|: either, you set today to some -# non-false value, then it is used: -# today = '' -# Else, today_fmt is used as the format for a strftime call. -# today_fmt = '%B %d, %Y' - -# List of patterns, relative to source directory, that match files and -# directories to ignore when looking for source files. -exclude_patterns = ["_build"] - -# The reST default role (used for this markup: `text`) to use for all documents. -# default_role = None - -# If true, '()' will be appended to :func: etc. cross-reference text. 
-# add_function_parentheses = True - -# If true, the current module name will be prepended to all description -# unit titles (such as .. function::). -# add_module_names = True - -# If true, sectionauthor and moduleauthor directives will be shown in the -# output. They are ignored by default. -# show_authors = False - -# The name of the Pygments (syntax highlighting) style to use. -pygments_style = "sphinx" - -# A list of ignored prefixes for module index sorting. -# modindex_common_prefix = [] - -# If true, keep warnings as "system message" paragraphs in the built documents. -# keep_warnings = False - - -# -- Intersphinx links --------------------------------------------------------- - -intersphinx_mapping = { - "python": ("https://docs.python.org/3.8/", None), - "numpy": ("https://numpy.org/doc/stable", None), - "xarray": ("https://xarray.pydata.org/en/stable/", None), -} - -# -- Options for HTML output --------------------------------------------------- - -# The theme to use for HTML and HTML Help pages. See the documentation for -# a list of builtin themes. -html_theme = "sphinx_book_theme" - -# Theme options are theme-specific and customize the look and feel of a theme -# further. For a list of options available for each theme, see the -# documentation. -html_theme_options = { - "repository_url": "https://github.com/xarray-contrib/datatree", - "repository_branch": "main", - "path_to_docs": "docs/source", - "use_repository_button": True, - "use_issues_button": True, - "use_edit_page_button": True, -} - -# Add any paths that contain custom themes here, relative to this directory. -# html_theme_path = [] - -# The name for this set of Sphinx documents. If None, it defaults to -# " v documentation". -# html_title = None - -# A shorter title for the navigation bar. Default is the same as html_title. -# html_short_title = None - -# The name of an image file (relative to this directory) to place at the top -# of the sidebar. 
-# html_logo = None - -# The name of an image file (within the static path) to use as favicon of the -# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 -# pixels large. -# html_favicon = None - -# If not '', a 'Last updated on:' timestamp is inserted at every page bottom, -# using the given strftime format. -# html_last_updated_fmt = '%b %d, %Y' - -# If true, SmartyPants will be used to convert quotes and dashes to -# typographically correct entities. -# html_use_smartypants = True - -# Custom sidebar templates, maps document names to template names. -# html_sidebars = {} - -# Additional templates that should be rendered to pages, maps page names to -# template names. -# html_additional_pages = {} - -# If false, no module index is generated. -# html_domain_indices = True - -# If false, no index is generated. -# html_use_index = True - -# If true, the index is split into individual pages for each letter. -# html_split_index = False - -# If true, links to the reST sources are added to the pages. -# html_show_sourcelink = True - -# If true, "Created using Sphinx" is shown in the HTML footer. Default is True. -# html_show_sphinx = True - -# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. -# html_show_copyright = True - -# If true, an OpenSearch description file will be output, and all pages will -# contain a tag referring to it. The value of this option must be the -# base URL from which the finished HTML is served. -# html_use_opensearch = '' - -# This is the file name suffix for HTML files (e.g. ".xhtml"). -# html_file_suffix = None - -# Output file base name for HTML help builder. -htmlhelp_basename = "datatree_doc" - - -# -- Options for LaTeX output -------------------------------------------------- - -latex_elements: dict = { - # The paper size ('letterpaper' or 'a4paper'). - # 'papersize': 'letterpaper', - # The font size ('10pt', '11pt' or '12pt'). 
- # 'pointsize': '10pt', - # Additional stuff for the LaTeX preamble. - # 'preamble': '', -} - -# Grouping the document tree into LaTeX files. List of tuples -# (source start file, target name, title, author, documentclass [howto/manual]). -latex_documents = [ - ("index", "datatree.tex", "Datatree Documentation", author, "manual") -] - -# The name of an image file (relative to this directory) to place at the top of -# the title page. -# latex_logo = None - -# For "manual" documents, if this is true, then toplevel headings are parts, -# not chapters. -# latex_use_parts = False - -# If true, show page references after internal links. -# latex_show_pagerefs = False - -# If true, show URL addresses after external links. -# latex_show_urls = False - -# Documents to append as an appendix to all manuals. -# latex_appendices = [] - -# If false, no module index is generated. -# latex_domain_indices = True - - -# -- Options for manual page output -------------------------------------------- - -# One entry per manual page. List of tuples -# (source start file, name, description, authors, manual section). -man_pages = [("index", "datatree", "Datatree Documentation", [author], 1)] - -# If true, show URL addresses after external links. -# man_show_urls = False - - -# -- Options for Texinfo output ------------------------------------------------ - -# Grouping the document tree into Texinfo files. List of tuples -# (source start file, target name, title, author, -# dir menu entry, description, category) -texinfo_documents = [ - ( - "index", - "datatree", - "Datatree Documentation", - author, - "datatree", - "Tree-like hierarchical data structure for xarray.", - "Miscellaneous", - ) -] - -# Documents to append as an appendix to all manuals. -# texinfo_appendices = [] - -# If false, no module index is generated. -# texinfo_domain_indices = True - -# How to display URL addresses: 'footnote', 'no', or 'inline'. 
-# texinfo_show_urls = 'footnote' - -# If true, do not generate a @detailmenu in the "Top" node's menu. -# texinfo_no_detailmenu = False - - -# based on numpy doc/source/conf.py -def linkcode_resolve(domain, info): - """ - Determine the URL corresponding to Python object - """ - if domain != "py": - return None - - modname = info["module"] - fullname = info["fullname"] - - submod = sys.modules.get(modname) - if submod is None: - return None - - obj = submod - for part in fullname.split("."): - try: - obj = getattr(obj, part) - except AttributeError: - return None - - try: - fn = inspect.getsourcefile(inspect.unwrap(obj)) - except TypeError: - fn = None - if not fn: - return None - - try: - source, lineno = inspect.getsourcelines(obj) - except OSError: - lineno = None - - if lineno: - linespec = f"#L{lineno}-L{lineno + len(source) - 1}" - else: - linespec = "" - - fn = os.path.relpath(fn, start=os.path.dirname(datatree.__file__)) - - if "+" in datatree.__version__: - return f"https://github.com/xarray-contrib/datatree/blob/main/datatree/{fn}{linespec}" - else: - return ( - f"https://github.com/xarray-contrib/datatree/blob/" - f"v{datatree.__version__}/datatree/{fn}{linespec}" - ) diff --git a/xarray/datatree_/docs/source/contributing.rst b/xarray/datatree_/docs/source/contributing.rst deleted file mode 100644 index b070c07c867..00000000000 --- a/xarray/datatree_/docs/source/contributing.rst +++ /dev/null @@ -1,136 +0,0 @@ -======================== -Contributing to Datatree -======================== - -Contributions are highly welcomed and appreciated. Every little help counts, -so do not hesitate! - -.. contents:: Contribution links - :depth: 2 - -.. _submitfeedback: - -Feature requests and feedback ------------------------------ - -Do you like Datatree? Share some love on Twitter or in your blog posts! - -We'd also like to hear about your propositions and suggestions. Feel free to -`submit them as issues `_ and: - -* Explain in detail how they should work. 
-* Keep the scope as narrow as possible. This will make it easier to implement. - -.. _reportbugs: - -Report bugs ------------ - -Report bugs for Datatree in the `issue tracker `_. - -If you are reporting a bug, please include: - -* Your operating system name and version. -* Any details about your local setup that might be helpful in troubleshooting, - specifically the Python interpreter version, installed libraries, and Datatree - version. -* Detailed steps to reproduce the bug. - -If you can write a demonstration test that currently fails but should pass -(xfail), that is a very useful commit to make as well, even if you cannot -fix the bug itself. - -.. _fixbugs: - -Fix bugs --------- - -Look through the `GitHub issues for bugs `_. - -Talk to developers to find out how you can fix specific bugs. - -Write documentation -------------------- - -Datatree could always use more documentation. What exactly is needed? - -* More complementary documentation. Have you perhaps found something unclear? -* Docstrings. There can never be too many of them. -* Blog posts, articles and such -- they're all very appreciated. - -You can also edit documentation files directly in the GitHub web interface, -without using a local copy. This can be convenient for small fixes. - -To build the documentation locally, you first need to install the following -tools: - -- `Sphinx `__ -- `sphinx_rtd_theme `__ -- `sphinx-autosummary-accessors `__ - -You can then build the documentation with the following commands:: - - $ cd docs - $ make html - -The built documentation should be available in the ``docs/_build/`` folder. - -.. _`pull requests`: -.. _pull-requests: - -Preparing Pull Requests ------------------------ - -#. Fork the - `Datatree GitHub repository `__. It's - fine to use ``Datatree`` as your fork repository name because it will live - under your user. - -#. 
Clone your fork locally using `git `_ and create a branch:: - - $ git clone git@github.com:{YOUR_GITHUB_USERNAME}/Datatree.git - $ cd Datatree - - # now, to fix a bug or add feature create your own branch off "master": - - $ git checkout -b your-bugfix-feature-branch-name master - -#. Install `pre-commit `_ and its hook on the Datatree repo:: - - $ pip install --user pre-commit - $ pre-commit install - - Afterwards ``pre-commit`` will run whenever you commit. - - https://pre-commit.com/ is a framework for managing and maintaining multi-language pre-commit hooks - to ensure code-style and code formatting is consistent. - -#. Install dependencies into a new conda environment:: - - $ conda env update -f ci/environment.yml - -#. Run all the tests - - Now running tests is as simple as issuing this command:: - - $ conda activate datatree-dev - $ pytest --junitxml=test-reports/junit.xml --cov=./ --verbose - - This command will run tests via the "pytest" tool. - -#. You can now edit your local working copy and run the tests again as necessary. Please follow PEP-8 for naming. - - When committing, ``pre-commit`` will re-format the files if necessary. - -#. Commit and push once your tests pass and you are happy with your change(s):: - - $ git commit -a -m "" - $ git push -u - -#. Finally, submit a pull request through the GitHub website using this data:: - - head-fork: YOUR_GITHUB_USERNAME/Datatree - compare: your-branch-name - - base-fork: TomNicholas/datatree - base: master diff --git a/xarray/datatree_/docs/source/data-structures.rst b/xarray/datatree_/docs/source/data-structures.rst deleted file mode 100644 index 02e4a31f688..00000000000 --- a/xarray/datatree_/docs/source/data-structures.rst +++ /dev/null @@ -1,197 +0,0 @@ -.. currentmodule:: datatree - -.. _data structures: - -Data Structures -=============== - -.. 
ipython:: python - :suppress: - - import numpy as np - import pandas as pd - import xarray as xr - import datatree - - np.random.seed(123456) - np.set_printoptions(threshold=10) - - %xmode minimal - -.. note:: - - This page builds on the information given in xarray's main page on - `data structures `_, so it is suggested that you - are familiar with those first. - -DataTree --------- - -:py:class:`DataTree` is xarray's highest-level data structure, able to organise heterogeneous data which -could not be stored inside a single :py:class:`Dataset` object. This includes representing the recursive structure of multiple -`groups`_ within a netCDF file or `Zarr Store`_. - -.. _groups: https://www.unidata.ucar.edu/software/netcdf/workshops/2011/groups-types/GroupsIntro.html -.. _Zarr Store: https://zarr.readthedocs.io/en/stable/tutorial.html#groups - -Each ``DataTree`` object (or "node") contains the same data that a single ``xarray.Dataset`` would (i.e. ``DataArray`` objects -stored under hashable keys), and so has the same key properties: - -- ``dims``: a dictionary mapping of dimension names to lengths, for the variables in this node, -- ``data_vars``: a dict-like container of DataArrays corresponding to variables in this node, -- ``coords``: another dict-like container of DataArrays, corresponding to coordinate variables in this node, -- ``attrs``: dict to hold arbitary metadata relevant to data in this node. - -A single ``DataTree`` object acts much like a single ``Dataset`` object, and has a similar set of dict-like methods -defined upon it. However, ``DataTree``'s can also contain other ``DataTree`` objects, so they can be thought of as nested dict-like -containers of both ``xarray.DataArray``'s and ``DataTree``'s. - -A single datatree object is known as a "node", and its position relative to other nodes is defined by two more key -properties: - -- ``children``: An ordered dictionary mapping from names to other ``DataTree`` objects, known as its' "child nodes". 
-- ``parent``: The single ``DataTree`` object whose children this datatree is a member of, known as its' "parent node". - -Each child automatically knows about its parent node, and a node without a parent is known as a "root" node -(represented by the ``parent`` attribute pointing to ``None``). -Nodes can have multiple children, but as each child node has at most one parent, there can only ever be one root node in a given tree. - -The overall structure is technically a `connected acyclic undirected rooted graph`, otherwise known as a -`"Tree" `_. - -.. note:: - - Technically a ``DataTree`` with more than one child node forms an `"Ordered Tree" `_, - because the children are stored in an Ordered Dictionary. However, this distinction only really matters for a few - edge cases involving operations on multiple trees simultaneously, and can safely be ignored by most users. - - -``DataTree`` objects can also optionally have a ``name`` as well as ``attrs``, just like a ``DataArray``. -Again these are not normally used unless explicitly accessed by the user. - - -.. _creating a datatree: - -Creating a DataTree -~~~~~~~~~~~~~~~~~~~ - -One way to create a ``DataTree`` from scratch is to create each node individually, -specifying the nodes' relationship to one another as you create each one. - -The ``DataTree`` constructor takes: - -- ``data``: The data that will be stored in this node, represented by a single ``xarray.Dataset``, or a named ``xarray.DataArray``. -- ``parent``: The parent node (if there is one), given as a ``DataTree`` object. -- ``children``: The various child nodes (if there are any), given as a mapping from string keys to ``DataTree`` objects. -- ``name``: A string to use as the name of this node. - -Let's make a single datatree node with some example data in it: - -.. 
ipython:: python - - from datatree import DataTree - - ds1 = xr.Dataset({"foo": "orange"}) - dt = DataTree(name="root", data=ds1) # create root node - - dt - -At this point our node is also the root node, as every tree has a root node. - -We can add a second node to this tree either by referring to the first node in the constructor of the second: - -.. ipython:: python - - ds2 = xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}) - # add a child by referring to the parent node - node2 = DataTree(name="a", parent=dt, data=ds2) - -or by dynamically updating the attributes of one node to refer to another: - -.. ipython:: python - - # add a second child by first creating a new node ... - ds3 = xr.Dataset({"zed": np.NaN}) - node3 = DataTree(name="b", data=ds3) - # ... then updating its .parent property - node3.parent = dt - -Our tree now has three nodes within it: - -.. ipython:: python - - dt - -It is at tree construction time that consistency checks are enforced. For instance, if we try to create a `cycle` the constructor will raise an error: - -.. ipython:: python - :okexcept: - - dt.parent = node3 - -Alternatively you can also create a ``DataTree`` object from - -- An ``xarray.Dataset`` using ``Dataset.to_node()`` (not yet implemented), -- A dictionary mapping directory-like paths to either ``DataTree`` nodes or data, using :py:meth:`DataTree.from_dict()`, -- A netCDF or Zarr file on disk with :py:func:`open_datatree()`. See :ref:`reading and writing files `. - - -DataTree Contents -~~~~~~~~~~~~~~~~~ - -Like ``xarray.Dataset``, ``DataTree`` implements the python mapping interface, but with values given by either ``xarray.DataArray`` objects or other ``DataTree`` objects. - -.. ipython:: python - - dt["a"] - dt["foo"] - -Iterating over keys will iterate over both the names of variables and child nodes. - -We can also access all the data in a single node through a dataset-like view - -.. 
ipython:: python - - dt["a"].ds - -This demonstrates the fact that the data in any one node is equivalent to the contents of a single ``xarray.Dataset`` object. -The ``DataTree.ds`` property returns an immutable view, but we can instead extract the node's data contents as a new (and mutable) -``xarray.Dataset`` object via :py:meth:`DataTree.to_dataset()`: - -.. ipython:: python - - dt["a"].to_dataset() - -Like with ``Dataset``, you can access the data and coordinate variables of a node separately via the ``data_vars`` and ``coords`` attributes: - -.. ipython:: python - - dt["a"].data_vars - dt["a"].coords - - -Dictionary-like methods -~~~~~~~~~~~~~~~~~~~~~~~ - -We can update a datatree in-place using Python's standard dictionary syntax, similar to how we can for Dataset objects. -For example, to create this example datatree from scratch, we could have written: - -# TODO update this example using ``.coords`` and ``.data_vars`` as setters, - -.. ipython:: python - - dt = DataTree(name="root") - dt["foo"] = "orange" - dt["a"] = DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})) - dt["a/b/zed"] = np.NaN - dt - -To change the variables in a node of a ``DataTree``, you can use all the standard dictionary -methods, including ``values``, ``items``, ``__delitem__``, ``get`` and -:py:meth:`DataTree.update`. -Note that assigning a ``DataArray`` object to a ``DataTree`` variable using ``__setitem__`` or ``update`` will -:ref:`automatically align ` the array(s) to the original node's indexes. - -If you copy a ``DataTree`` using the :py:func:`copy` function or the :py:meth:`DataTree.copy` method it will copy the subtree, -meaning that node and children below it, but no parents above it. -Like for ``Dataset``, this copy is shallow by default, but you can copy all the underlying data arrays by calling ``dt.copy(deep=True)``. 
diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst deleted file mode 100644 index a88a5747ada..00000000000 --- a/xarray/datatree_/docs/source/index.rst +++ /dev/null @@ -1,61 +0,0 @@ -.. currentmodule:: datatree - -Datatree -======== - -**Datatree is a prototype implementation of a tree-like hierarchical data structure for xarray.** - -Why Datatree? -~~~~~~~~~~~~~ - -Datatree was born after the xarray team recognised a `need for a new hierarchical data structure `_, -that was more flexible than a single :py:class:`xarray.Dataset` object. -The initial motivation was to represent netCDF files / Zarr stores with multiple nested groups in a single in-memory object, -but :py:class:`~datatree.DataTree` objects have many other uses. - -You might want to use datatree for: - -- Organising many related datasets, e.g. results of the same experiment with different parameters, or simulations of the same system using different models, -- Analysing similar data at multiple resolutions simultaneously, such as when doing a convergence study, -- Comparing heterogenous but related data, such as experimental and theoretical data, -- I/O with nested data formats such as netCDF / Zarr groups. - -Development Roadmap -~~~~~~~~~~~~~~~~~~~ - -Datatree currently lives in a separate repository to the main xarray package. -This allows the datatree developers to make changes to it, experiment, and improve it faster. - -Eventually we plan to fully integrate datatree upstream into xarray's main codebase, at which point the `github.com/xarray-contrib/datatree `_ repository will be archived. -This should not cause much disruption to code that depends on datatree - you will likely only have to change the import line (i.e. from ``from datatree import DataTree`` to ``from xarray import DataTree``). - -However, until this full integration occurs, datatree's API should not be considered to have the same `level of stability as xarray's `_. 
- -User Feedback -~~~~~~~~~~~~~ - -We really really really want to hear your opinions on datatree! -At this point in development, user feedback is critical to help us create something that will suit everyone's needs. -Please raise any thoughts, issues, suggestions or bugs, no matter how small or large, on the `github issue tracker `_. - -.. toctree:: - :maxdepth: 2 - :caption: Documentation Contents - - Installation - Quick Overview - Tutorial - Data Model - Hierarchical Data - Reading and Writing Files - API Reference - Terminology - Contributing Guide - What's New - GitHub repository - -Feedback --------- - -If you encounter any errors, problems with **Datatree**, or have any suggestions, please open an issue -on `GitHub `_. diff --git a/xarray/datatree_/docs/source/installation.rst b/xarray/datatree_/docs/source/installation.rst deleted file mode 100644 index b2682743ade..00000000000 --- a/xarray/datatree_/docs/source/installation.rst +++ /dev/null @@ -1,38 +0,0 @@ -.. currentmodule:: datatree - -============ -Installation -============ - -Datatree can be installed in three ways: - -Using the `conda `__ package manager that comes with the -Anaconda/Miniconda distribution: - -.. code:: bash - - $ conda install xarray-datatree --channel conda-forge - -Using the `pip `__ package manager: - -.. code:: bash - - $ python -m pip install xarray-datatree - -To install a development version from source: - -.. code:: bash - - $ git clone https://github.com/xarray-contrib/datatree - $ cd datatree - $ python -m pip install -e . - - -You will just need xarray as a required dependency, with netcdf4, zarr, and h5netcdf as optional dependencies to allow file I/O. - -.. note:: - - Datatree is very much still in the early stages of development. There may be functions that are present but whose - internals are not yet implemented, or significant changes to the API in future. 
- That said, if you try it out and find some behaviour that looks like a bug to you, please report it on the - `issue tracker `_! diff --git a/xarray/datatree_/docs/source/io.rst b/xarray/datatree_/docs/source/io.rst deleted file mode 100644 index 2f2dabf9948..00000000000 --- a/xarray/datatree_/docs/source/io.rst +++ /dev/null @@ -1,54 +0,0 @@ -.. currentmodule:: datatree - -.. _io: - -Reading and Writing Files -========================= - -.. note:: - - This page builds on the information given in xarray's main page on - `reading and writing files `_, - so it is suggested that you are familiar with those first. - - -netCDF ------- - -Groups -~~~~~~ - -Whilst netCDF groups can only be loaded individually as Dataset objects, a whole file of many nested groups can be loaded -as a single :py:class:`DataTree` object. -To open a whole netCDF file as a tree of groups use the :py:func:`open_datatree` function. -To save a DataTree object as a netCDF file containing many groups, use the :py:meth:`DataTree.to_netcdf` method. - - -.. _netcdf.group.warning: - -.. warning:: - ``DataTree`` objects do not follow the exact same data model as netCDF files, which means that perfect round-tripping - is not always possible. - - In particular in the netCDF data model dimensions are entities that can exist regardless of whether any variable possesses them. - This is in contrast to `xarray's data model `_ - (and hence :ref:`datatree's data model `) in which the dimensions of a (Dataset/Tree) - object are simply the set of dimensions present across all variables in that dataset. - - This means that if a netCDF file contains dimensions but no variables which possess those dimensions, - these dimensions will not be present when that file is opened as a DataTree object. - Saving this DataTree object to file will therefore not preserve these "unused" dimensions. 
- -Zarr ----- - -Groups -~~~~~~ - -Nested groups in zarr stores can be represented by loading the store as a :py:class:`DataTree` object, similarly to netCDF. -To open a whole zarr store as a tree of groups use the :py:func:`open_datatree` function. -To save a DataTree object as a zarr store containing many groups, use the :py:meth:`DataTree.to_zarr()` method. - -.. note:: - Note that perfect round-tripping should always be possible with a zarr store (:ref:`unlike for netCDF files `), - as zarr does not support "unused" dimensions. diff --git a/xarray/datatree_/docs/source/terminology.rst b/xarray/datatree_/docs/source/terminology.rst deleted file mode 100644 index e481a01a6b2..00000000000 --- a/xarray/datatree_/docs/source/terminology.rst +++ /dev/null @@ -1,34 +0,0 @@ -.. currentmodule:: datatree - -.. _terminology: - -This page extends `xarray's page on terminology `_. - -Terminology -=========== - -.. glossary:: - - DataTree - A tree-like collection of ``Dataset`` objects. A *tree* is made up of one or more *nodes*, - each of which can store the same information as a single ``Dataset`` (accessed via `.ds`). - This data is stored in the same way as in a ``Dataset``, i.e. in the form of data variables - (see **Variable** in the `corresponding xarray terminology page `_), - dimensions, coordinates, and attributes. - - The nodes in a tree are linked to one another, and each node is it's own instance of ``DataTree`` object. - Each node can have zero or more *children* (stored in a dictionary-like manner under their corresponding *names*), - and those child nodes can themselves have children. - If a node is a child of another node that other node is said to be its *parent*. Nodes can have a maximum of one parent, - and if a node has no parent it is said to be the *root* node of that *tree*. - - Subtree - A section of a *tree*, consisting of a *node* along with all the child nodes below it - (and the child nodes below them, i.e. all so-called *descendant* nodes). 
- Excludes the parent node and all nodes above. - - Group - Another word for a subtree, reflecting how the hierarchical structure of a ``DataTree`` allows for grouping related data together. - Analogous to a single - `netCDF group `_ or - `Zarr group `_. diff --git a/xarray/datatree_/docs/source/tutorial.rst b/xarray/datatree_/docs/source/tutorial.rst deleted file mode 100644 index 6e33bd36f91..00000000000 --- a/xarray/datatree_/docs/source/tutorial.rst +++ /dev/null @@ -1,7 +0,0 @@ -.. currentmodule:: datatree - -======== -Tutorial -======== - -Coming soon! diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst deleted file mode 100644 index 2f6e4f88fe5..00000000000 --- a/xarray/datatree_/docs/source/whats-new.rst +++ /dev/null @@ -1,426 +0,0 @@ -.. currentmodule:: datatree - -What's New -========== - -.. ipython:: python - :suppress: - - import numpy as np - import pandas as pd - import xarray as xray - import xarray - import xarray as xr - import datatree - - np.random.seed(123456) - -.. _whats-new.v0.0.14: - -v0.0.14 (unreleased) --------------------- - -New Features -~~~~~~~~~~~~ - -Breaking changes -~~~~~~~~~~~~~~~~ - -- Renamed `DataTree.lineage` to `DataTree.parents` to match `pathlib` vocabulary - (:issue:`283`, :pull:`286`) -- Minimum required version of xarray is now 2023.12.0, i.e. the latest version. - This is required to prevent recent changes to xarray's internals from breaking datatree. - (:issue:`293`, :pull:`294`) - By `Tom Nicholas `_. -- Change default write mode of :py:meth:`DataTree.to_zarr` to ``'w-'`` to match ``xarray`` - default and prevent accidental directory overwrites. (:issue:`274`, :pull:`275`) - By `Sam Levang `_. - -Deprecations -~~~~~~~~~~~~ - -- Renamed `DataTree.lineage` to `DataTree.parents` to match `pathlib` vocabulary - (:issue:`283`, :pull:`286`). `lineage` is now deprecated and use of `parents` is encouraged. - By `Etienne Schalk `_. 
- -Bug fixes -~~~~~~~~~ -- Keep attributes on nodes containing no data in :py:func:`map_over_subtree`. (:issue:`278`, :pull:`279`) - By `Sam Levang `_. - -Documentation -~~~~~~~~~~~~~ -- Use ``napoleon`` instead of ``numpydoc`` to align with xarray documentation - (:issue:`284`, :pull:`298`). - By `Etienne Schalk `_. - -Internal Changes -~~~~~~~~~~~~~~~~ - -.. _whats-new.v0.0.13: - -v0.0.13 (27/10/2023) --------------------- - -New Features -~~~~~~~~~~~~ - -- New :py:meth:`DataTree.match` method for glob-like pattern matching of node paths. (:pull:`267`) - By `Tom Nicholas `_. -- New :py:meth:`DataTree.is_hollow` property for checking if data is only contained at the leaf nodes. (:pull:`272`) - By `Tom Nicholas `_. -- Indicate which node caused the problem if error encountered while applying user function using :py:func:`map_over_subtree` - (:issue:`190`, :pull:`264`). Only works when using python 3.11 or later. - By `Tom Nicholas `_. - -Breaking changes -~~~~~~~~~~~~~~~~ - -- Nodes containing only attributes but no data are now ignored by :py:func:`map_over_subtree` (:issue:`262`, :pull:`263`) - By `Tom Nicholas `_. -- Disallow altering of given dataset inside function called by :py:func:`map_over_subtree` (:pull:`269`, reverts part of :pull:`194`). - By `Tom Nicholas `_. - -Bug fixes -~~~~~~~~~ - -- Fix unittests on i386. (:pull:`249`) - By `Antonio Valentino `_. -- Ensure nodepath class is compatible with python 3.12 (:pull:`260`) - By `Max Grover `_. - -Documentation -~~~~~~~~~~~~~ - -- Added new sections to page on ``Working with Hierarchical Data`` (:pull:`180`) - By `Tom Nicholas `_. - -Internal Changes -~~~~~~~~~~~~~~~~ - -* No longer use the deprecated `distutils` package. - -.. _whats-new.v0.0.12: - -v0.0.12 (03/07/2023) --------------------- - -New Features -~~~~~~~~~~~~ - -- Added a :py:func:`DataTree.level`, :py:func:`DataTree.depth`, and :py:func:`DataTree.width` property (:pull:`208`). - By `Tom Nicholas `_. 
-- Allow dot-style (or "attribute-like") access to child nodes and variables, with ipython autocomplete. (:issue:`189`, :pull:`98`) - By `Tom Nicholas `_. - -Breaking changes -~~~~~~~~~~~~~~~~ - -Deprecations -~~~~~~~~~~~~ - -- Dropped support for python 3.8 (:issue:`212`, :pull:`214`) - By `Tom Nicholas `_. - -Bug fixes -~~~~~~~~~ - -- Allow for altering of given dataset inside function called by :py:func:`map_over_subtree` (:issue:`188`, :pull:`194`). - By `Tom Nicholas `_. -- copy subtrees without creating ancestor nodes (:pull:`201`) - By `Justus Magin `_. - -Documentation -~~~~~~~~~~~~~ - -Internal Changes -~~~~~~~~~~~~~~~~ - -.. _whats-new.v0.0.11: - -v0.0.11 (01/09/2023) --------------------- - -Big update with entirely new pages in the docs, -new methods (``.drop_nodes``, ``.filter``, ``.leaves``, ``.descendants``), and bug fixes! - -New Features -~~~~~~~~~~~~ - -- Added a :py:meth:`DataTree.drop_nodes` method (:issue:`161`, :pull:`175`). - By `Tom Nicholas `_. -- New, more specific exception types for tree-related errors (:pull:`169`). - By `Tom Nicholas `_. -- Added a new :py:meth:`DataTree.descendants` property (:pull:`170`). - By `Tom Nicholas `_. -- Added a :py:meth:`DataTree.leaves` property (:pull:`177`). - By `Tom Nicholas `_. -- Added a :py:meth:`DataTree.filter` method (:pull:`184`). - By `Tom Nicholas `_. - -Breaking changes -~~~~~~~~~~~~~~~~ - -- :py:meth:`DataTree.copy` copy method now only copies the subtree, not the parent nodes (:pull:`171`). - By `Tom Nicholas `_. -- Grafting a subtree onto another tree now leaves name of original subtree object unchanged (:issue:`116`, :pull:`172`, :pull:`178`). - By `Tom Nicholas `_. -- Changed the :py:meth:`DataTree.assign` method to just work on the local node (:pull:`181`). - By `Tom Nicholas `_. - -Deprecations -~~~~~~~~~~~~ - -Bug fixes -~~~~~~~~~ - -- Fix bug with :py:meth:`DataTree.relative_to` method (:issue:`133`, :pull:`160`). - By `Tom Nicholas `_. 
-- Fix links to API docs in all documentation (:pull:`183`). - By `Tom Nicholas `_. - -Documentation -~~~~~~~~~~~~~ - -- Changed docs theme to match xarray's main documentation. (:pull:`173`) - By `Tom Nicholas `_. -- Added ``Terminology`` page. (:pull:`174`) - By `Tom Nicholas `_. -- Added page on ``Working with Hierarchical Data`` (:pull:`179`) - By `Tom Nicholas `_. -- Added context content to ``Index`` page (:pull:`182`) - By `Tom Nicholas `_. -- Updated the README (:pull:`187`) - By `Tom Nicholas `_. - -Internal Changes -~~~~~~~~~~~~~~~~ - - -.. _whats-new.v0.0.10: - -v0.0.10 (12/07/2022) --------------------- - -Adds accessors and a `.pipe()` method. - -New Features -~~~~~~~~~~~~ - -- Add the ability to register accessors on ``DataTree`` objects, by using ``register_datatree_accessor``. (:pull:`144`) - By `Tom Nicholas `_. -- Allow method chaining with a new :py:meth:`DataTree.pipe` method (:issue:`151`, :pull:`156`). - By `Justus Magin `_. - -Breaking changes -~~~~~~~~~~~~~~~~ - -Deprecations -~~~~~~~~~~~~ - -Bug fixes -~~~~~~~~~ - -- Allow ``Datatree`` objects as values in :py:meth:`DataTree.from_dict` (:pull:`159`). - By `Justus Magin `_. - -Documentation -~~~~~~~~~~~~~ - -- Added ``Reading and Writing Files`` page. (:pull:`158`) - By `Tom Nicholas `_. - -Internal Changes -~~~~~~~~~~~~~~~~ - -- Avoid reading from same file twice with fsspec3 (:pull:`130`) - By `William Roberts `_. - - -.. _whats-new.v0.0.9: - -v0.0.9 (07/14/2022) -------------------- - -New Features -~~~~~~~~~~~~ - -Breaking changes -~~~~~~~~~~~~~~~~ - -Deprecations -~~~~~~~~~~~~ - -Bug fixes -~~~~~~~~~ - -Documentation -~~~~~~~~~~~~~ -- Switch docs theme (:pull:`123`). - By `JuliusBusecke `_. - -Internal Changes -~~~~~~~~~~~~~~~~ - - -.. _whats-new.v0.0.7: - -v0.0.7 (07/11/2022) -------------------- - -New Features -~~~~~~~~~~~~ - -- Improve the HTML repr by adding tree-style lines connecting groups and sub-groups (:pull:`109`). - By `Benjamin Woods `_. 
- -Breaking changes -~~~~~~~~~~~~~~~~ - -- The ``DataTree.ds`` attribute now returns a view onto an immutable Dataset-like object, instead of an actual instance - of ``xarray.Dataset``. This make break existing ``isinstance`` checks or ``assert`` comparisons. (:pull:`99`) - By `Tom Nicholas `_. - -Deprecations -~~~~~~~~~~~~ - -Bug fixes -~~~~~~~~~ - -- Modifying the contents of a ``DataTree`` object via the ``DataTree.ds`` attribute is now forbidden, which prevents - any possibility of the contents of a ``DataTree`` object and its ``.ds`` attribute diverging. (:issue:`38`, :pull:`99`) - By `Tom Nicholas `_. -- Fixed a bug so that names of children now always match keys under which parents store them (:pull:`99`). - By `Tom Nicholas `_. - -Documentation -~~~~~~~~~~~~~ - -- Added ``Data Structures`` page describing the internal structure of a ``DataTree`` object, and its relation to - ``xarray.Dataset`` objects. (:pull:`103`) - By `Tom Nicholas `_. -- API page updated with all the methods that are copied from ``xarray.Dataset``. (:pull:`41`) - By `Tom Nicholas `_. - -Internal Changes -~~~~~~~~~~~~~~~~ - -- Refactored ``DataTree`` class to store a set of ``xarray.Variable`` objects instead of a single ``xarray.Dataset``. - This approach means that the ``DataTree`` class now effectively copies and extends the internal structure of - ``xarray.Dataset``. (:pull:`41`) - By `Tom Nicholas `_. -- Refactored to use intermediate ``NamedNode`` class, separating implementation of methods requiring a ``name`` - attribute from those not requiring it. - By `Tom Nicholas `_. -- Made ``testing.test_datatree.create_test_datatree`` into a pytest fixture (:pull:`107`). - By `Benjamin Woods `_. - - - -.. _whats-new.v0.0.6: - -v0.0.6 (06/03/2022) -------------------- - -Various small bug fixes, in preparation for more significant changes in the next version. 
- -Bug fixes -~~~~~~~~~ - -- Fixed bug with checking that assigning parent or new children did not create a loop in the tree (:pull:`105`) - By `Tom Nicholas `_. -- Do not call ``__exit__`` on Zarr store when opening (:pull:`90`) - By `Matt McCormick `_. -- Fix netCDF encoding for compression (:pull:`95`) - By `Joe Hamman `_. -- Added validity checking for node names (:pull:`106`) - By `Tom Nicholas `_. - -.. _whats-new.v0.0.5: - -v0.0.5 (05/05/2022) -------------------- - -- Major refactor of internals, moving from the ``DataTree.children`` attribute being a ``Tuple[DataTree]`` to being a - ``OrderedDict[str, DataTree]``. This was necessary in order to integrate better with xarray's dictionary-like API, - solve several issues, simplify the code internally, remove dependencies, and enable new features. (:pull:`76`) - By `Tom Nicholas `_. - -New Features -~~~~~~~~~~~~ - -- Syntax for accessing nodes now supports file-like paths, including parent nodes via ``"../"``, relative paths, the - root node via ``"/"``, and the current node via ``"."``. (Internally it actually uses ``pathlib`` now.) - By `Tom Nicholas `_. -- New path-like API methods, such as ``.relative_to``, ``.find_common_ancestor``, and ``.same_tree``. -- Some new dictionary-like methods, such as ``DataTree.get`` and ``DataTree.update``. (:pull:`76`) - By `Tom Nicholas `_. -- New HTML repr, which will automatically display in a jupyter notebook. (:pull:`78`) - By `Tom Nicholas `_. -- New delitem method so you can delete nodes. (:pull:`88`) - By `Tom Nicholas `_. -- New ``to_dict`` method. (:pull:`82`) - By `Tom Nicholas `_. - -Breaking changes -~~~~~~~~~~~~~~~~ - -- Node names are now optional, which means that the root of the tree can be unnamed. This has knock-on effects for - a lot of the API. -- The ``__init__`` signature for ``DataTree`` has changed, so that ``name`` is now an optional kwarg. 
-- Files will now be loaded as a slightly different tree, because the root group no longer needs to be given a default - name. -- Removed tag-like access to nodes. -- Removes the option to delete all data in a node by assigning None to the node (in favour of deleting data by replacing - the node's ``.ds`` attribute with an empty Dataset), or to create a new empty node in the same way (in favour of - assigning an empty DataTree object instead). -- Removes the ability to create a new node by assigning a ``Dataset`` object to ``DataTree.__setitem__``. -- Several other minor API changes such as ``.pathstr`` -> ``.path``, and ``from_dict``'s dictionary argument now being - required. (:pull:`76`) - By `Tom Nicholas `_. - -Deprecations -~~~~~~~~~~~~ - -- No longer depends on the anytree library (:pull:`76`) - By `Tom Nicholas `_. - -Bug fixes -~~~~~~~~~ - -- Fixed indentation issue with the string repr (:pull:`86`) - By `Tom Nicholas `_. - -Documentation -~~~~~~~~~~~~~ - -- Quick-overview page updated to match change in path syntax (:pull:`76`) - By `Tom Nicholas `_. - -Internal Changes -~~~~~~~~~~~~~~~~ - -- Basically every file was changed in some way to accommodate (:pull:`76`). -- No longer need the utility functions for string manipulation that were defined in ``utils.py``. -- A considerable amount of code copied over from the internals of anytree (e.g. in ``render.py`` and ``iterators.py``). - The Apache license for anytree has now been bundled with datatree. (:pull:`76`). - By `Tom Nicholas `_. - -.. _whats-new.v0.0.4: - -v0.0.4 (31/03/2022) -------------------- - -- Ensure you get the pretty tree-like string representation by default in ipython (:pull:`73`). - By `Tom Nicholas `_. -- Now available on conda-forge (as xarray-datatree)! (:pull:`71`) - By `Anderson Banihirwe `_. -- Allow for python 3.8 (:pull:`70`). - By `Don Setiawan `_. - -.. 
_whats-new.v0.0.3: - -v0.0.3 (30/03/2022) -------------------- - -- First released version available on both pypi (as xarray-datatree)! From eb23afb72f3a459f04cead9d7ad3a242dfe74d19 Mon Sep 17 00:00:00 2001 From: Owen Littlejohns Date: Fri, 17 May 2024 22:42:25 -0400 Subject: [PATCH 02/57] DAS-2155 - Add PR reference to whats-new.rst. --- doc/whats-new.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 6c2658669db..58f87ace52a 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -39,7 +39,7 @@ Bug fixes Documentation ~~~~~~~~~~~~~ -- Migrate documentation for ``datatree`` into main ``xarray`` documentation. +- Migrate documentation for ``datatree`` into main ``xarray`` documentation (:pull:`9033`). For information on previous ``datatree`` releases, please see: `datatree's historical release notes `_. By `Owen Littlejohns `_ and From bf3a1e797459aae96994d38120274f348a36ac15 Mon Sep 17 00:00:00 2001 From: Owen Littlejohns Date: Sat, 18 May 2024 00:56:11 -0400 Subject: [PATCH 03/57] DAS-2155 - Fix DataTree documentation string insertion test. --- xarray/tests/test_datatree.py | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/xarray/tests/test_datatree.py b/xarray/tests/test_datatree.py index 58fec20d4c6..4a57b83cf37 100644 --- a/xarray/tests/test_datatree.py +++ b/xarray/tests/test_datatree.py @@ -861,9 +861,10 @@ def test_standard_doc(self): Unlike compute, the original dataset is modified and returned. .. note:: - This method was copied from xarray.Dataset, but has been altered to - call the method on the Datasets stored in every node of the - subtree. See the `map_over_subtree` function for more details. + This method was copied from :py:class:`xarray.Dataset`, but has + been altered to call the method on the Datasets stored in every + node of the subtree. See the `map_over_subtree` function for more + details. 
Normally, it should not be necessary to call this method in user code, because all xarray functions should either work on deferred data or @@ -891,9 +892,9 @@ def test_one_liner(self): """\ Same as abs(a). - This method was copied from xarray.Dataset, but has been altered to call the - method on the Datasets stored in every node of the subtree. See the - `map_over_subtree` function for more details.""" + This method was copied from :py:class:`xarray.Dataset`, but has been altered to + call the method on the Datasets stored in every node of the subtree. See + the `map_over_subtree` function for more details.""" ) actual_doc = insert_doc_addendum(mixin_doc, _MAPPED_DOCSTRING_ADDENDUM) From 2353d17698dc45d46ee2ffc43fd379dd9eaa0876 Mon Sep 17 00:00:00 2001 From: Owen Littlejohns Date: Mon, 20 May 2024 23:17:07 -0400 Subject: [PATCH 04/57] DAS-2067 - Expose DataTree API in xarray public API. --- doc/api.rst | 286 +++++++++++++-------------- doc/internals/extending-xarray.rst | 2 +- doc/internals/internal-design.rst | 5 +- doc/internals/interoperability.rst | 2 +- doc/roadmap.rst | 2 +- doc/user-guide/data-structures.rst | 22 +-- doc/user-guide/datatree.rst | 9 +- doc/user-guide/hierarchical-data.rst | 97 +++++---- doc/user-guide/io.rst | 10 +- doc/whats-new.rst | 6 + xarray/__init__.py | 8 + xarray/testing/__init__.py | 2 + 12 files changed, 232 insertions(+), 219 deletions(-) diff --git a/doc/api.rst b/doc/api.rst index 773ad6b5664..28b0f5adf2d 100644 --- a/doc/api.rst +++ b/doc/api.rst @@ -599,8 +599,8 @@ Methods of creating a ``DataTree``. .. autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree - xarray.core.datatree.DataTree.from_dict + DataTree + DataTree.from_dict Tree Attributes --------------- @@ -610,24 +610,24 @@ Attributes relating to the recursive tree-like structure of a ``DataTree``. .. 
autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.parent - xarray.core.datatree.DataTree.children - xarray.core.datatree.DataTree.name - xarray.core.datatree.DataTree.path - xarray.core.datatree.DataTree.root - xarray.core.datatree.DataTree.is_root - xarray.core.datatree.DataTree.is_leaf - xarray.core.datatree.DataTree.leaves - xarray.core.datatree.DataTree.level - xarray.core.datatree.DataTree.depth - xarray.core.datatree.DataTree.width - xarray.core.datatree.DataTree.subtree - xarray.core.datatree.DataTree.descendants - xarray.core.datatree.DataTree.siblings - xarray.core.datatree.DataTree.lineage - xarray.core.datatree.DataTree.parents - xarray.core.datatree.DataTree.ancestors - xarray.core.datatree.DataTree.groups + DataTree.parent + DataTree.children + DataTree.name + DataTree.path + DataTree.root + DataTree.is_root + DataTree.is_leaf + DataTree.leaves + DataTree.level + DataTree.depth + DataTree.width + DataTree.subtree + DataTree.descendants + DataTree.siblings + DataTree.lineage + DataTree.parents + DataTree.ancestors + DataTree.groups Data Contents ------------- @@ -638,20 +638,20 @@ This interface echoes that of ``xarray.Dataset``. .. 
autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.dims - xarray.core.datatree.DataTree.sizes - xarray.core.datatree.DataTree.data_vars - xarray.core.datatree.DataTree.coords - xarray.core.datatree.DataTree.attrs - xarray.core.datatree.DataTree.encoding - xarray.core.datatree.DataTree.indexes - xarray.core.datatree.DataTree.nbytes - xarray.core.datatree.DataTree.ds - xarray.core.datatree.DataTree.to_dataset - xarray.core.datatree.DataTree.has_data - xarray.core.datatree.DataTree.has_attrs - xarray.core.datatree.DataTree.is_empty - xarray.core.datatree.DataTree.is_hollow + DataTree.dims + DataTree.sizes + DataTree.data_vars + DataTree.coords + DataTree.attrs + DataTree.encoding + DataTree.indexes + DataTree.nbytes + DataTree.ds + DataTree.to_dataset + DataTree.has_data + DataTree.has_attrs + DataTree.is_empty + DataTree.is_hollow Dictionary Interface -------------------- @@ -661,14 +661,14 @@ Dictionary Interface .. autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.__getitem__ - xarray.core.datatree.DataTree.__setitem__ - xarray.core.datatree.DataTree.__delitem__ - xarray.core.datatree.DataTree.update - xarray.core.datatree.DataTree.get - xarray.core.datatree.DataTree.items - xarray.core.datatree.DataTree.keys - xarray.core.datatree.DataTree.values + DataTree.__getitem__ + DataTree.__setitem__ + DataTree.__delitem__ + DataTree.update + DataTree.get + DataTree.items + DataTree.keys + DataTree.values Tree Manipulation ----------------- @@ -678,15 +678,15 @@ For manipulating, traversing, navigating, or mapping over the tree structure. .. 
autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.orphan - xarray.core.datatree.DataTree.same_tree - xarray.core.datatree.DataTree.relative_to - xarray.core.datatree.DataTree.iter_lineage - xarray.core.datatree.DataTree.find_common_ancestor - xarray.core.datatree.DataTree.map_over_subtree - xarray.core.datatree.DataTree.pipe - xarray.core.datatree.DataTree.match - xarray.core.datatree.DataTree.filter + DataTree.orphan + DataTree.same_tree + DataTree.relative_to + DataTree.iter_lineage + DataTree.find_common_ancestor + DataTree.map_over_subtree + DataTree.pipe + DataTree.match + DataTree.filter Pathlib-like Interface ---------------------- @@ -696,10 +696,10 @@ Pathlib-like Interface .. autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.name - xarray.core.datatree.DataTree.parent - xarray.core.datatree.DataTree.parents - xarray.core.datatree.DataTree.relative_to + DataTree.name + DataTree.parent + DataTree.parents + DataTree.relative_to Missing: @@ -720,18 +720,18 @@ Manipulate the contents of all nodes in a ``DataTree`` simultaneously. .. 
autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.copy - xarray.core.datatree.DataTree.assign_coords - xarray.core.datatree.DataTree.merge - xarray.core.datatree.DataTree.rename - xarray.core.datatree.DataTree.rename_vars - xarray.core.datatree.DataTree.rename_dims - xarray.core.datatree.DataTree.swap_dims - xarray.core.datatree.DataTree.expand_dims - xarray.core.datatree.DataTree.drop_vars - xarray.core.datatree.DataTree.drop_dims - xarray.core.datatree.DataTree.set_coords - xarray.core.datatree.DataTree.reset_coords + DataTree.copy + DataTree.assign_coords + DataTree.merge + DataTree.rename + DataTree.rename_vars + DataTree.rename_dims + DataTree.swap_dims + DataTree.expand_dims + DataTree.drop_vars + DataTree.drop_dims + DataTree.set_coords + DataTree.reset_coords DataTree Node Contents ---------------------- @@ -741,8 +741,8 @@ Manipulate the contents of a single ``DataTree`` node. .. autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.assign - xarray.core.datatree.DataTree.drop_nodes + DataTree.assign + DataTree.drop_nodes Comparisons ----------- @@ -752,9 +752,9 @@ Compare one ``DataTree`` object to another. .. autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.isomorphic - xarray.core.datatree.DataTree.equals - xarray.core.datatree.DataTree.identical + DataTree.isomorphic + DataTree.equals + DataTree.identical Indexing -------- @@ -764,22 +764,22 @@ Index into all nodes in the subtree simultaneously. .. 
autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.isel - xarray.core.datatree.DataTree.sel - xarray.core.datatree.DataTree.drop_sel - xarray.core.datatree.DataTree.drop_isel - xarray.core.datatree.DataTree.head - xarray.core.datatree.DataTree.tail - xarray.core.datatree.DataTree.thin - xarray.core.datatree.DataTree.squeeze - xarray.core.datatree.DataTree.interp - xarray.core.datatree.DataTree.interp_like - xarray.core.datatree.DataTree.reindex - xarray.core.datatree.DataTree.reindex_like - xarray.core.datatree.DataTree.set_index - xarray.core.datatree.DataTree.reset_index - xarray.core.datatree.DataTree.reorder_levels - xarray.core.datatree.DataTree.query + DataTree.isel + DataTree.sel + DataTree.drop_sel + DataTree.drop_isel + DataTree.head + DataTree.tail + DataTree.thin + DataTree.squeeze + DataTree.interp + DataTree.interp_like + DataTree.reindex + DataTree.reindex_like + DataTree.set_index + DataTree.reset_index + DataTree.reorder_levels + DataTree.query .. @@ -793,16 +793,16 @@ Missing Value Handling .. autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.isnull - xarray.core.datatree.DataTree.notnull - xarray.core.datatree.DataTree.combine_first - xarray.core.datatree.DataTree.dropna - xarray.core.datatree.DataTree.fillna - xarray.core.datatree.DataTree.ffill - xarray.core.datatree.DataTree.bfill - xarray.core.datatree.DataTree.interpolate_na - xarray.core.datatree.DataTree.where - xarray.core.datatree.DataTree.isin + DataTree.isnull + DataTree.notnull + DataTree.combine_first + DataTree.dropna + DataTree.fillna + DataTree.ffill + DataTree.bfill + DataTree.interpolate_na + DataTree.where + DataTree.isin Computation ----------- @@ -812,15 +812,15 @@ Apply a computation to the data in all nodes in the subtree simultaneously. .. 
autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.map - xarray.core.datatree.DataTree.reduce - xarray.core.datatree.DataTree.diff - xarray.core.datatree.DataTree.quantile - xarray.core.datatree.DataTree.differentiate - xarray.core.datatree.DataTree.integrate - xarray.core.datatree.DataTree.map_blocks - xarray.core.datatree.DataTree.polyfit - xarray.core.datatree.DataTree.curvefit + DataTree.map + DataTree.reduce + DataTree.diff + DataTree.quantile + DataTree.differentiate + DataTree.integrate + DataTree.map_blocks + DataTree.polyfit + DataTree.curvefit Aggregation ----------- @@ -830,22 +830,22 @@ Aggregate data in all nodes in the subtree simultaneously. .. autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.all - xarray.core.datatree.DataTree.any - xarray.core.datatree.DataTree.argmax - xarray.core.datatree.DataTree.argmin - xarray.core.datatree.DataTree.idxmax - xarray.core.datatree.DataTree.idxmin - xarray.core.datatree.DataTree.max - xarray.core.datatree.DataTree.min - xarray.core.datatree.DataTree.mean - xarray.core.datatree.DataTree.median - xarray.core.datatree.DataTree.prod - xarray.core.datatree.DataTree.sum - xarray.core.datatree.DataTree.std - xarray.core.datatree.DataTree.var - xarray.core.datatree.DataTree.cumsum - xarray.core.datatree.DataTree.cumprod + DataTree.all + DataTree.any + DataTree.argmax + DataTree.argmin + DataTree.idxmax + DataTree.idxmin + DataTree.max + DataTree.min + DataTree.mean + DataTree.median + DataTree.prod + DataTree.sum + DataTree.std + DataTree.var + DataTree.cumsum + DataTree.cumprod ndarray methods --------------- @@ -855,13 +855,13 @@ Methods copied from :py:class:`numpy.ndarray` objects, here applying to the data .. 
autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.argsort - xarray.core.datatree.DataTree.astype - xarray.core.datatree.DataTree.clip - xarray.core.datatree.DataTree.conj - xarray.core.datatree.DataTree.conjugate - xarray.core.datatree.DataTree.round - xarray.core.datatree.DataTree.rank + DataTree.argsort + DataTree.astype + DataTree.clip + DataTree.conj + DataTree.conjugate + DataTree.round + DataTree.rank Reshaping and reorganising -------------------------- @@ -871,14 +871,14 @@ Reshape or reorganise the data in all nodes in the subtree. .. autosummary:: :toctree: generated/ - xarray.core.datatree.DataTree.transpose - xarray.core.datatree.DataTree.stack - xarray.core.datatree.DataTree.unstack - xarray.core.datatree.DataTree.shift - xarray.core.datatree.DataTree.roll - xarray.core.datatree.DataTree.pad - xarray.core.datatree.DataTree.sortby - xarray.core.datatree.DataTree.broadcast_like + DataTree.transpose + DataTree.stack + DataTree.unstack + DataTree.shift + DataTree.roll + DataTree.pad + DataTree.sortby + DataTree.broadcast_like IO / Conversion =============== @@ -951,9 +951,9 @@ DataTree methods :toctree: generated/ xarray.backends.api.open_datatree - xarray.core.datatree.DataTree.to_dict - xarray.core.datatree.DataTree.to_netcdf - xarray.core.datatree.DataTree.to_zarr + DataTree.to_dict + DataTree.to_netcdf + DataTree.to_zarr .. @@ -1384,7 +1384,7 @@ Test that two ``DataTree`` objects are similar. .. 
autosummary:: :toctree: generated/ - testing.assertions.assert_isomorphic + testing.assert_isomorphic testing.assert_equal testing.assert_identical @@ -1439,7 +1439,7 @@ Advanced API Coordinates Dataset.variables DataArray.variable - xarray.core.datatree.DataTree.variables + DataTree.variables Variable IndexVariable as_variable diff --git a/doc/internals/extending-xarray.rst b/doc/internals/extending-xarray.rst index 1f4ad0cd924..6c6ce002a7d 100644 --- a/doc/internals/extending-xarray.rst +++ b/doc/internals/extending-xarray.rst @@ -42,7 +42,7 @@ Writing Custom Accessors To resolve this issue for more complex cases, xarray has the :py:func:`~xarray.register_dataset_accessor`, :py:func:`~xarray.register_dataarray_accessor` and -:py:func:`~xarray.core.extensions.register_datatree_accessor` decorators for adding custom +:py:func:`~xarray.register_datatree_accessor` decorators for adding custom "accessors" on xarray objects, thereby "extending" the functionality of your xarray object. Here's how you might use these decorators to diff --git a/doc/internals/internal-design.rst b/doc/internals/internal-design.rst index 19cb3c6da70..93009b002c4 100644 --- a/doc/internals/internal-design.rst +++ b/doc/internals/internal-design.rst @@ -4,7 +4,6 @@ import numpy as np import pandas as pd import xarray as xr - from xarray.core.datatree import DataTree np.random.seed(123456) np.set_printoptions(threshold=20) @@ -22,11 +21,11 @@ In order of increasing complexity, they are: - :py:class:`xarray.Variable`, - :py:class:`xarray.DataArray`, - :py:class:`xarray.Dataset`, -- :py:class:`xarray.core.datatree.DataTree`. +- :py:class:`xarray.DataTree`. The user guide lists only :py:class:`xarray.DataArray` and :py:class:`xarray.Dataset`, but :py:class:`~xarray.Variable` is the fundamental object internally, -and :py:class:`~xarray.core.datatree.DataTree` is a natural generalisation of :py:class:`xarray.Dataset`. 
+and :py:class:`~xarray.DataTree` is a natural generalisation of :py:class:`xarray.Dataset`. .. note:: diff --git a/doc/internals/interoperability.rst b/doc/internals/interoperability.rst index 66149104f2a..5c14819fa0d 100644 --- a/doc/internals/interoperability.rst +++ b/doc/internals/interoperability.rst @@ -36,7 +36,7 @@ it is entirely possible today to: - track the physical units of the data through computations (e.g via `pint-xarray `_), - query the data via custom index logic optimized for specific applications (e.g. an :py:class:`~xarray.Index` object backed by a KDTree structure), - attach domain-specific logic via accessor methods (e.g. to understand geographic Coordinate Reference System metadata), -- organize hierarchical groups of xarray data in a :py:class:`xarray.core.datatree.DataTree` (e.g. to treat heterogeneous simulation and observational data together during analysis). +- organize hierarchical groups of xarray data in a :py:class:`xarray.DataTree` (e.g. to treat heterogeneous simulation and observational data together during analysis). All of these features can be provided simultaneously, using libraries compatible with the rest of the scientific python ecosystem. In this situation xarray would be essentially a thin wrapper acting as pure-python framework, providing a common interface and diff --git a/doc/roadmap.rst b/doc/roadmap.rst index a0d4ffb685c..da927de6854 100644 --- a/doc/roadmap.rst +++ b/doc/roadmap.rst @@ -225,7 +225,7 @@ multiple netCDF groups (see :issue:`4118`). Currently there are several libraries which have wrapped xarray in order to build domain-specific data structures (e.g. `xarray-multiscale `__.), -but a general ``xarray.core.datatree.DataTree`` object obviates the need for these and] +but a general ``xarray.DataTree`` object obviates the need for these and consolidates effort in a single domain-agnostic tool, much as xarray has already achieved.
Labeled array without coordinates diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index f2c61bfc649..9cd739069c8 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -574,10 +574,8 @@ Let's make a single datatree node with some example data in it: .. ipython:: python - from xarray.core.datatree import DataTree - ds1 = xr.Dataset({"foo": "orange"}) - dt = DataTree(name="root", data=ds1) # create root node + dt = xr.DataTree(name="root", data=ds1) # create root node dt @@ -590,7 +588,7 @@ the constructor of the second: ds2 = xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}) # add a child by referring to the parent node - node2 = DataTree(name="a", parent=dt, data=ds2) + node2 = xr.DataTree(name="a", parent=dt, data=ds2) or by dynamically updating the attributes of one node to refer to another: @@ -598,7 +596,7 @@ or by dynamically updating the attributes of one node to refer to another: # add a second child by first creating a new node ... ds3 = xr.Dataset({"zed": np.NaN}) - node3 = DataTree(name="b", data=ds3) + node3 = xr.DataTree(name="b", data=ds3) # ... then updating its .parent property node3.parent = dt @@ -620,7 +618,7 @@ Alternatively you can also create a ``DataTree`` object from - An ``xarray.Dataset`` using ``Dataset.to_node()`` (not yet implemented), - A dictionary mapping directory-like paths to either ``DataTree`` nodes or - data, using :py:meth:`DataTree.from_dict()`, + data, using :py:meth:`xarray.DataTree.from_dict()`, - A netCDF or Zarr file on disk with :py:func:`open_datatree()`. See :ref:`reading and writing files `. 
@@ -628,7 +626,7 @@ Alternatively you can also create a ``DataTree`` object from DataTree Contents ~~~~~~~~~~~~~~~~~ -Like ``xarray.Dataset``, ``DataTree`` implements the python mapping interface, +Like ``xarray.Dataset``, ``xarray.DataTree`` implements the python mapping interface, but with values given by either ``xarray.DataArray`` objects or other ``DataTree`` objects. @@ -649,7 +647,7 @@ This demonstrates the fact that the data in any one node is equivalent to the contents of a single ``xarray.Dataset`` object. The ``DataTree.ds`` property returns an immutable view, but we can instead extract the node's data contents as a new (and mutable) ``xarray.Dataset`` object via -:py:meth:`xarray.core.datatree.DataTree.to_dataset()`: +:py:meth:`xarray.DataTree.to_dataset()`: .. ipython:: python @@ -673,21 +671,21 @@ datatree from scratch, we could have written: .. ipython:: python - dt = DataTree(name="root") + dt = xr.DataTree(name="root") dt["foo"] = "orange" - dt["a"] = DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})) + dt["a"] = xr.DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})) dt["a/b/zed"] = np.NaN dt To change the variables in a node of a ``DataTree``, you can use all the standard dictionary methods, including ``values``, ``items``, ``__delitem__``, -``get`` and :py:meth:`xarray.core.datatree.DataTree.update`. +``get`` and :py:meth:`xarray.DataTree.update`. Note that assigning a ``DataArray`` object to a ``DataTree`` variable using ``__setitem__`` or ``update`` will :ref:`automatically align ` the array(s) to the original node's indexes. If you copy a ``DataTree`` using the :py:func:`copy` function or the -:py:meth:`xarray.core.datatree.DataTree.copy` method it will copy the subtree, +:py:meth:`xarray.DataTree.copy` method it will copy the subtree, meaning that node and children below it, but no parents above it. 
Like for ``Dataset``, this copy is shallow by default, but you can copy all the underlying data arrays by calling ``dt.copy(deep=True)``. diff --git a/doc/user-guide/datatree.rst b/doc/user-guide/datatree.rst index e336f9e29f7..ef210ef7bd9 100644 --- a/doc/user-guide/datatree.rst +++ b/doc/user-guide/datatree.rst @@ -3,7 +3,7 @@ DataTrees --------- -:py:class:`xarray.core.datatree.DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually allignable groups. +:py:class:`xarray.DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually alignable groups. You can think of it like a (recursive) ``dict`` of :py:class:`xarray.Dataset` objects. Let's first make some example xarray datasets (following on from xarray's @@ -13,7 +13,6 @@ Let's first make some example xarray datasets (following on from xarray's import numpy as np import xarray as xr - from xarray.core.datatree import DataTree data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]}) ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi)) @@ -32,7 +31,9 @@ Now we'll put this data into a multi-group tree: .. ipython:: python - dt = DataTree.from_dict({"simulation/coarse": ds, "simulation/fine": ds2, "/": ds3}) + dt = xr.DataTree.from_dict( + {"simulation/coarse": ds, "simulation/fine": ds2, "/": ds3} + ) dt This creates a datatree with various groups. We have one root group, containing information about individual people. @@ -76,4 +77,4 @@ This allows you to work with multiple groups of non-alignable variables at once. If all of your variables are mutually alignable (i.e. they live on the same grid, such that every common dimension name maps to the same length), - then you probably don't need :py:class:`xarray.core.datatree.DataTree`, and should consider just sticking with ``xarray.Dataset``.
+ then you probably don't need :py:class:`xarray.DataTree`, and should consider just sticking with ``xarray.Dataset``. diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 94b5a7481f2..543b52c2d5d 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -9,7 +9,6 @@ Working With Hierarchical Data import numpy as np import pandas as pd import xarray as xr - from xarray.core.datatree import DataTree np.random.seed(123456) np.set_printoptions(threshold=10) @@ -33,9 +32,9 @@ or even any combination of the above. Often datasets like this cannot easily fit into a single :py:class:`xarray.Dataset` object, or are more usefully thought of as groups of related ``xarray.Dataset`` objects. -For this purpose we provide the :py:class:`xarray.core.datatree.DataTree` class. +For this purpose we provide the :py:class:`xarray.DataTree` class. -This page explains in detail how to understand and use the different features of the :py:class:`xarray.core.datatree.DataTree` class for your own hierarchical data needs. +This page explains in detail how to understand and use the different features of the :py:class:`xarray.DataTree` class for your own hierarchical data needs. .. _node relationships: @@ -54,15 +53,15 @@ Let's start by defining nodes representing the two siblings, Bart and Lisa Simps .. ipython:: python - bart = DataTree(name="Bart") - lisa = DataTree(name="Lisa") + bart = xr.DataTree(name="Bart") + lisa = xr.DataTree(name="Lisa") -Each of these node objects knows their own :py:class:`~xarray.core.datatree.DataTree.name`, but they currently have no relationship to one another. +Each of these node objects knows their own :py:class:`~xarray.DataTree.name`, but they currently have no relationship to one another. We can connect them by creating another node representing a common parent, Homer Simpson: .. 
ipython:: python - homer = DataTree(name="Homer", children={"Bart": bart, "Lisa": lisa}) + homer = xr.DataTree(name="Homer", children={"Bart": bart, "Lisa": lisa}) Here we set the children of Homer in the node's constructor. We now have a small family tree @@ -72,17 +71,17 @@ We now have a small family tree homer where we can see how these individual Simpson family members are related to one another. -The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the :py:class:`~xarray.core.datatree.DataTree.siblings` property: +The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the :py:class:`~xarray.DataTree.siblings` property: .. ipython:: python list(bart.siblings) -But oops, we forgot Homer's third daughter, Maggie! Let's add her by updating Homer's :py:class:`~xarray.core.datatree.DataTree.children` property to include her: +But oops, we forgot Homer's third daughter, Maggie! Let's add her by updating Homer's :py:class:`~xarray.DataTree.children` property to include her: .. ipython:: python - maggie = DataTree(name="Maggie") + maggie = xr.DataTree(name="Maggie") homer.children = {"Bart": bart, "Lisa": lisa, "Maggie": maggie} homer @@ -99,14 +98,14 @@ That's good - updating the properties of our nodes does not break the internal c the fact that distant relatives can mate makes it a directed acyclic graph. Trees of ``DataTree`` objects cannot represent this. -Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his :py:class:`~xarray.core.datatree.DataTree.parent` property: +Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his :py:class:`~xarray.DataTree.parent` property: .. 
ipython:: python - abe = DataTree(name="Abe") + abe = xr.DataTree(name="Abe") homer.parent = abe -Abe is now the "root" of this tree, which we can see by examining the :py:class:`~xarray.core.datatree.DataTree.root` property of any node in the tree +Abe is now the "root" of this tree, which we can see by examining the :py:class:`~xarray.DataTree.root` property of any node in the tree .. ipython:: python @@ -122,11 +121,11 @@ We can see the whole tree by printing Abe's node or just part of the tree by pri We can see that Homer is aware of his parentage, and we say that Homer and his children form a "subtree" of the larger Simpson family tree. In episode 28, Abe Simpson reveals that he had another son, Herbert "Herb" Simpson. -We can add Herbert to the family tree without displacing Homer by :py:meth:`~xarray.core.datatree.DataTree.assign`-ing another child to Abe: +We can add Herbert to the family tree without displacing Homer by :py:meth:`~xarray.DataTree.assign`-ing another child to Abe: .. ipython:: python - herbert = DataTree(name="Herb") + herbert = xr.DataTree(name="Herb") abe.assign({"Herbert": herbert}) .. note:: @@ -154,7 +153,7 @@ Let's use a different example of a tree to discuss more complex relationships be .. ipython:: python - vertebrates = DataTree.from_dict( + vertebrates = xr.DataTree.from_dict( name="Vertebrae", d={ "/Sharks": None, @@ -172,7 +171,7 @@ Let's use a different example of a tree to discuss more complex relationships be "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs" ] -We have used the :py:meth:`~xarray.core.datatree.DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree, +We have used the :py:meth:`~xarray.DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree, and :ref:`filesystem paths` (to be explained shortly) to select two nodes of interest. .. ipython:: python @@ -184,7 +183,7 @@ rather than an evolutionary tree). 
Here both the species and the features used to group them are represented by ``DataTree`` node objects - there is no distinction in types of node. We can however get a list of only the nodes we used to represent species by using the fact that all those nodes have no children - they are "leaf nodes". -We can check if a node is a leaf with :py:meth:`~xarray.core.datatree.DataTree.is_leaf`, and get a list of all leaves with the :py:class:`~xarray.core.datatree.DataTree.leaves` property: +We can check if a node is a leaf with :py:meth:`~xarray.DataTree.is_leaf`, and get a list of all leaves with the :py:class:`~xarray.DataTree.leaves` property: .. ipython:: python @@ -208,7 +207,7 @@ an error will be raised. .. ipython:: python :okexcept: - alien = DataTree(name="Xenomorph") + alien = xr.DataTree(name="Xenomorph") primates.find_common_ancestor(alien) @@ -222,7 +221,7 @@ There are various ways to access the different nodes in a tree. Properties ~~~~~~~~~~ -We can navigate trees using the :py:class:`~xarray.core.datatree.DataTree.parent` and :py:class:`~xarray.core.datatree.DataTree.children` properties of each node, for example: +We can navigate trees using the :py:class:`~xarray.DataTree.parent` and :py:class:`~xarray.DataTree.children` properties of each node, for example: .. ipython:: python @@ -234,24 +233,24 @@ Dictionary-like interface ~~~~~~~~~~~~~~~~~~~~~~~~~ Children are stored on each node as a key-value mapping from name to child node. -They can be accessed and altered via the :py:class:`~xarray.core.datatree.DataTree.__getitem__` and :py:class:`~xarray.core.datatree.DataTree.__setitem__` syntax. 
-In general :py:class:`~xarray.core.datatree.DataTree.DataTree` objects support almost the entire set of dict-like methods, -including :py:meth:`~xarray.core.datatree.DataTree.keys`, :py:class:`~xarray.core.datatree.DataTree.values`, :py:class:`~xarray.core.datatree.DataTree.items`, -:py:meth:`~xarray.core.datatree.DataTree.__delitem__` and :py:meth:`~xarray.core.datatree.DataTree.update`. +They can be accessed and altered via the :py:class:`~xarray.DataTree.__getitem__` and :py:class:`~xarray.DataTree.__setitem__` syntax. +In general :py:class:`~xarray.DataTree` objects support almost the entire set of dict-like methods, +including :py:meth:`~xarray.DataTree.keys`, :py:class:`~xarray.DataTree.values`, :py:class:`~xarray.DataTree.items`, +:py:meth:`~xarray.DataTree.__delitem__` and :py:meth:`~xarray.DataTree.update`. .. ipython:: python vertebrates["Bony Skeleton"]["Ray-finned Fish"] Note that the dict-like interface combines access to child ``DataTree`` nodes and stored ``DataArrays``, -so if we have a node that contains both children and data, calling :py:meth:`~xarray.core.datatree.DataTree.keys` will list both names of child nodes and +so if we have a node that contains both children and data, calling :py:meth:`~xarray.DataTree.keys` will list both names of child nodes and names of data variables: .. ipython:: python - dt = DataTree( + dt = xr.DataTree( data=xr.Dataset({"foo": 0, "bar": 1}), - children={"a": DataTree(), "b": DataTree()}, + children={"a": xr.DataTree(), "b": xr.DataTree()}, ) print(dt) list(dt.keys()) @@ -279,8 +278,8 @@ Each node is like a directory, and each directory can contain both more sub-dire .. note:: Future development will allow you to make the filesystem analogy concrete by - using :py:func:`~xarray.core.datatree.DataTree.open_mfdatatree` or - :py:func:`~xarray.core.datatree.DataTree.save_mfdatatree`. + using :py:func:`~xarray.DataTree.open_mfdatatree` or + :py:func:`~xarray.DataTree.save_mfdatatree`.
(`See related issue in GitHub `_) Datatree objects support a syntax inspired by unix-like filesystems, @@ -311,7 +310,7 @@ We can use this with ``__setitem__`` to add a missing entry to our evolutionary .. ipython:: python - primates["../../Two Fenestrae/Crocodiles"] = DataTree() + primates["../../Two Fenestrae/Crocodiles"] = xr.DataTree() print(vertebrates) Given two nodes in a tree, we can also find their relative path: @@ -332,21 +331,21 @@ we can construct a complex tree quickly using the alternative constructor :py:me "/a/b": xr.Dataset({"zed": np.NaN}), "a/c/d": None, } - dt = DataTree.from_dict(d) + dt = xr.DataTree.from_dict(d) dt .. note:: Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path (i.e. the node labelled `"c"` in this case.) - This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`xarray.core.datatree.DataTree.from_dict`. + This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`xarray.DataTree.from_dict`. .. _iterating over trees: Iterating over trees ~~~~~~~~~~~~~~~~~~~~ -You can iterate over every node in a tree using the subtree :py:class:`~xarray.core.datatree.DataTree.subtree` property. +You can iterate over every node in a tree using the :py:class:`~xarray.DataTree.subtree` property. This returns an iterable of nodes, which yields them in depth-first order. .. ipython:: python @@ -354,11 +353,11 @@ This returns an iterable of nodes, which yields them in depth-first order. for node in vertebrates.subtree: print(node.path) -A very useful pattern is to use :py:class:`~xarray.core.datatree.DataTree.subtree` conjunction with the :py:class:`~xarray.core.datatree.DataTree.path` property to manipulate the nodes however you wish, -then rebuild a new tree using :py:meth:`xarray.core.datatree.DataTree.from_dict()`.
+A very useful pattern is to use :py:class:`~xarray.DataTree.subtree` in conjunction with the :py:class:`~xarray.DataTree.path` property to manipulate the nodes however you wish, +then rebuild a new tree using :py:meth:`xarray.DataTree.from_dict()`. For example, we could keep only the nodes containing data by looping over all nodes, -checking if they contain any data using :py:class:`~xarray.core.datatree.DataTree.has_data`, +checking if they contain any data using :py:class:`~xarray.DataTree.has_data`, then rebuilding a new tree using only the paths of those nodes: .. ipython:: python @@ -381,11 +380,11 @@ Subsetting Tree Nodes We can subset our tree to select only nodes of interest in various ways. Similarly to on a real filesystem, matching nodes by common patterns in their paths is often useful. -We can use :py:meth:`xarray.core.datatree.DataTree.match` for this: +We can use :py:meth:`xarray.DataTree.match` for this: .. ipython:: python - dt = DataTree.from_dict( + dt = xr.DataTree.from_dict( { "/a/A": None, "/a/B": None, @@ -397,13 +396,13 @@ We can use :py:meth:`xarray.core.datatree.DataTree.match` for this: result We can also subset trees by the contents of the nodes. -:py:meth:`xarray.core.datatree.DataTree.filter` retains only the nodes of a tree that meet a certain condition. +:py:meth:`xarray.DataTree.filter` retains only the nodes of a tree that meet a certain condition. For example, we could recreate the Simpson's family tree with the ages of each individual, then filter for only the adults: First lets recreate the tree but with an `age` data variable in every node: .. ipython:: python - simpsons = DataTree.from_dict( + simpsons = xr.DataTree.from_dict( d={ "/": xr.Dataset({"age": 83}), "/Herbert": xr.Dataset({"age": 40}), @@ -424,7 +423,7 @@ Now let's filter out the minors: The result is a new tree, containing only the nodes matching the condition.
-(Yes, under the hood :py:meth:`~xarray.core.datatree.DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees` !) +(Yes, under the hood :py:meth:`~xarray.DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees` !) .. _Tree Contents: @@ -437,7 +436,7 @@ Hollow Trees A concept that can sometimes be useful is that of a "Hollow Tree", which means a tree with data stored only at the leaf nodes. This is useful because certain useful tree manipulation operations only make sense for hollow trees. -You can check if a tree is a hollow tree by using the :py:class:`~xarray.core.datatree.DataTree.is_hollow` property. +You can check if a tree is a hollow tree by using the :py:class:`~xarray.DataTree.is_hollow` property. We can see that the Simpson's family is not hollow because the data variable ``"age"`` is present at some nodes which have children (i.e. Abe and Homer). @@ -475,7 +474,7 @@ let's first create a example scientific dataset. time_stamps1 = time_stamps(n_samples=15, T=1.5) time_stamps2 = time_stamps(n_samples=10, T=1.0) - voltages = DataTree.from_dict( + voltages = xr.DataTree.from_dict( { "/oscilloscope1": xr.Dataset( { @@ -541,7 +540,7 @@ See that the same change (fast-forwarding by adding 10 years to the age of each Mapping Custom Functions Over Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -You can map custom computation over each node in a tree using :py:meth:`xarray.core.datatree.DataTree.map_over_subtree`. +You can map custom computation over each node in a tree using :py:meth:`xarray.DataTree.map_over_subtree`. You can map any function, so long as it takes `xarray.Dataset` objects as one (or more) of the input arguments, and returns one (or more) xarray datasets. @@ -585,14 +584,14 @@ We can check if any two trees are isomorphic using the :py:meth:`DataTree.isomor .. 
ipython:: python :okexcept: - dt1 = DataTree.from_dict({"a": None, "a/b": None}) - dt2 = DataTree.from_dict({"a": None}) + dt1 = xr.DataTree.from_dict({"a": None, "a/b": None}) + dt2 = xr.DataTree.from_dict({"a": None}) dt1.isomorphic(dt2) - dt3 = DataTree.from_dict({"a": None, "b": None}) + dt3 = xr.DataTree.from_dict({"a": None, "b": None}) dt1.isomorphic(dt3) - dt4 = DataTree.from_dict({"A": None, "A/B": xr.Dataset({"foo": 1})}) + dt4 = xr.DataTree.from_dict({"A": None, "A/B": xr.Dataset({"foo": 1})}) dt1.isomorphic(dt4) If the trees are not isomorphic a :py:class:`~TreeIsomorphismError` will be raised. @@ -606,7 +605,7 @@ we can do arithmetic between them. .. ipython:: python - currents = DataTree.from_dict( + currents = xr.DataTree.from_dict( { "/oscilloscope1": xr.Dataset( { diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst index b9133097e32..6a578b917ab 100644 --- a/doc/user-guide/io.rst +++ b/doc/user-guide/io.rst @@ -155,9 +155,9 @@ Groups Whilst netCDF groups can only be loaded individually as ``Dataset`` objects, a whole file of many nested groups can be loaded as a single -:py:class:`xarray.core.datatree.DataTree` object. To open a whole netCDF file as a tree of groups -use the :py:func:`open_datatree` function. To save a DataTree object as a -netCDF file containing many groups, use the :py:meth:`xarray.core.datatree.DataTree.to_netcdf` method. +:py:class:`xarray.DataTree` object. To open a whole netCDF file as a tree of groups +use the :py:func:`xarray.open_datatree` function. To save a DataTree object as a +netCDF file containing many groups, use the :py:meth:`xarray.DataTree.to_netcdf` method. .. _netcdf.group.warning: @@ -919,10 +919,10 @@ Groups ~~~~~~ Nested groups in zarr stores can be represented by loading the store as a -:py:class:`xarray.core.datatree.DataTree` object, similarly to netCDF. To open a whole zarr store as +:py:class:`xarray.DataTree` object, similarly to netCDF. 
To open a whole zarr store as a tree of groups use the :py:func:`open_datatree` function. To save a ``DataTree`` object as a zarr store containing many groups, use the -:py:meth:`xarray.core.datatree.DataTree.to_zarr()` method. +:py:meth:`xarray.DataTree.to_zarr()` method. .. note:: Note that perfect round-tripping should always be possible with a zarr diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 58f87ace52a..90bbfd947b3 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -22,6 +22,12 @@ v2024.05.1 (unreleased) New Features ~~~~~~~~~~~~ +- ``DataTree`` related functionality is now exposed in the main ``xarray`` public + API. This includes: ``xarray.DataTree``, ``xarray.open_datatree``, + ``xarray.map_over_subtree``, ``xarray.register_datatree_accessor`` and + ``xarray.testing.assert_isomorphic``. + By `Owen Littlejohns `_ and + `Tom Nicholas `_. Breaking changes diff --git a/xarray/__init__.py b/xarray/__init__.py index 0c0d5995f72..3a8cad31cc8 100644 --- a/xarray/__init__.py +++ b/xarray/__init__.py @@ -6,6 +6,7 @@ load_dataset, open_dataarray, open_dataset, + open_datatree, open_mfdataset, save_mfdataset, ) @@ -31,9 +32,12 @@ from xarray.core.coordinates import Coordinates from xarray.core.dataarray import DataArray from xarray.core.dataset import Dataset +from xarray.core.datatree import DataTree +from xarray.core.datatree_mapping import map_over_subtree from xarray.core.extensions import ( register_dataarray_accessor, register_dataset_accessor, + register_datatree_accessor, ) from xarray.core.indexes import Index from xarray.core.indexing import IndexSelResult @@ -79,15 +83,18 @@ "load_dataarray", "load_dataset", "map_blocks", + "map_over_subtree", "merge", "ones_like", "open_dataarray", "open_dataset", + "open_datatree", "open_mfdataset", "open_zarr", "polyval", "register_dataarray_accessor", "register_dataset_accessor", + "register_datatree_accessor", "save_mfdataset", "set_options", "show_versions", @@ -100,6 +107,7 @@ "Coordinates", 
"DataArray", "Dataset", + "DataTree", "Index", "IndexSelResult", "IndexVariable", diff --git a/xarray/testing/__init__.py b/xarray/testing/__init__.py index 316b0ea5252..e6aa01659cb 100644 --- a/xarray/testing/__init__.py +++ b/xarray/testing/__init__.py @@ -12,6 +12,7 @@ assert_duckarray_equal, assert_equal, assert_identical, + assert_isomorphic, ) __all__ = [ @@ -21,4 +22,5 @@ "assert_duckarray_allclose", "assert_equal", "assert_identical", + "assert_isomorphic", ] From 8779f8f346a95ecb709cc40da2d2634979415823 Mon Sep 17 00:00:00 2001 From: Owen Littlejohns Date: Mon, 20 May 2024 23:55:33 -0400 Subject: [PATCH 05/57] DAS-2067 - Clean up DataTree imports in tests. --- xarray/tests/conftest.py | 5 ++- xarray/tests/test_datatree_mapping.py | 47 +++++++++++++++++---------- xarray/tests/test_extensions.py | 5 +-- xarray/tests/test_formatting.py | 25 +++++++------- xarray/tests/test_formatting_html.py | 7 ++-- 5 files changed, 47 insertions(+), 42 deletions(-) diff --git a/xarray/tests/conftest.py b/xarray/tests/conftest.py index a32b0e08bea..037e2eb6783 100644 --- a/xarray/tests/conftest.py +++ b/xarray/tests/conftest.py @@ -5,8 +5,7 @@ import pytest import xarray as xr -from xarray import DataArray, Dataset -from xarray.core.datatree import DataTree +from xarray import DataArray, Dataset, DataTree from xarray.tests import create_test_data, requires_dask @@ -144,7 +143,7 @@ def create_test_datatree(): """ Create a test datatree with this structure: - + |-- set1 | |-- | | Dimensions: () diff --git a/xarray/tests/test_datatree_mapping.py b/xarray/tests/test_datatree_mapping.py index b8b55613c4a..09a48fc9a05 100644 --- a/xarray/tests/test_datatree_mapping.py +++ b/xarray/tests/test_datatree_mapping.py @@ -2,7 +2,6 @@ import pytest import xarray as xr -from xarray.core.datatree import DataTree from xarray.core.datatree_mapping import ( TreeIsomorphismError, check_isomorphic, @@ -19,8 +18,8 @@ def test_not_a_tree(self): check_isomorphic("s", 1) # type: ignore[arg-type] def 
test_different_widths(self): - dt1 = DataTree.from_dict(d={"a": empty}) - dt2 = DataTree.from_dict(d={"b": empty, "c": empty}) + dt1 = xr.DataTree.from_dict(d={"a": empty}) + dt2 = xr.DataTree.from_dict(d={"b": empty, "c": empty}) expected_err_str = ( "Number of children on node '/' of the left object: 1\n" "Number of children on node '/' of the right object: 2" @@ -29,8 +28,8 @@ def test_different_widths(self): check_isomorphic(dt1, dt2) def test_different_heights(self): - dt1 = DataTree.from_dict({"a": empty}) - dt2 = DataTree.from_dict({"b": empty, "b/c": empty}) + dt1 = xr.DataTree.from_dict({"a": empty}) + dt2 = xr.DataTree.from_dict({"b": empty, "b/c": empty}) expected_err_str = ( "Number of children on node '/a' of the left object: 0\n" "Number of children on node '/b' of the right object: 1" @@ -39,8 +38,8 @@ def test_different_heights(self): check_isomorphic(dt1, dt2) def test_names_different(self): - dt1 = DataTree.from_dict({"a": xr.Dataset()}) - dt2 = DataTree.from_dict({"b": empty}) + dt1 = xr.DataTree.from_dict({"a": xr.Dataset()}) + dt2 = xr.DataTree.from_dict({"b": empty}) expected_err_str = ( "Node '/a' in the left object has name 'a'\n" "Node '/b' in the right object has name 'b'" @@ -49,31 +48,43 @@ def test_names_different(self): check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_names_equal(self): - dt1 = DataTree.from_dict({"a": empty, "b": empty, "b/c": empty, "b/d": empty}) - dt2 = DataTree.from_dict({"a": empty, "b": empty, "b/c": empty, "b/d": empty}) + dt1 = xr.DataTree.from_dict( + {"a": empty, "b": empty, "b/c": empty, "b/d": empty} + ) + dt2 = xr.DataTree.from_dict( + {"a": empty, "b": empty, "b/c": empty, "b/d": empty} + ) check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_ordering(self): - dt1 = DataTree.from_dict({"a": empty, "b": empty, "b/d": empty, "b/c": empty}) - dt2 = DataTree.from_dict({"a": empty, "b": empty, "b/c": empty, "b/d": empty}) + dt1 = xr.DataTree.from_dict( + {"a": 
empty, "b": empty, "b/d": empty, "b/c": empty} + ) + dt2 = xr.DataTree.from_dict( + {"a": empty, "b": empty, "b/c": empty, "b/d": empty} + ) check_isomorphic(dt1, dt2, require_names_equal=False) def test_isomorphic_names_not_equal(self): - dt1 = DataTree.from_dict({"a": empty, "b": empty, "b/c": empty, "b/d": empty}) - dt2 = DataTree.from_dict({"A": empty, "B": empty, "B/C": empty, "B/D": empty}) + dt1 = xr.DataTree.from_dict( + {"a": empty, "b": empty, "b/c": empty, "b/d": empty} + ) + dt2 = xr.DataTree.from_dict( + {"A": empty, "B": empty, "B/C": empty, "B/D": empty} + ) check_isomorphic(dt1, dt2) def test_not_isomorphic_complex_tree(self, create_test_datatree): dt1 = create_test_datatree() dt2 = create_test_datatree() - dt2["set1/set2/extra"] = DataTree(name="extra") + dt2["set1/set2/extra"] = xr.DataTree(name="extra") with pytest.raises(TreeIsomorphismError, match="/set1/set2"): check_isomorphic(dt1, dt2) def test_checking_from_root(self, create_test_datatree): dt1 = create_test_datatree() dt2 = create_test_datatree() - real_root: DataTree = DataTree(name="real root") + real_root: xr.DataTree = xr.DataTree(name="real root") dt2.name = "not_real_root" dt2.parent = real_root with pytest.raises(TreeIsomorphismError): @@ -92,7 +103,7 @@ def times_ten(ds): def test_not_isomorphic(self, create_test_datatree): dt1 = create_test_datatree() dt2 = create_test_datatree() - dt2["set1/set2/extra"] = DataTree(name="extra") + dt2["set1/set2/extra"] = xr.DataTree(name="extra") @map_over_subtree def times_ten(ds1, ds2): @@ -312,7 +323,7 @@ def test_construct_using_type(self): dims=["x", "y", "time"], coords={"area": (["x", "y"], np.random.rand(2, 6))}, ).to_dataset(name="data") - dt = DataTree.from_dict({"a": a, "b": b}) + dt = xr.DataTree.from_dict({"a": a, "b": b}) def weighted_mean(ds): return ds.weighted(ds.area).mean(["x", "y"]) @@ -320,7 +331,7 @@ def weighted_mean(ds): dt.map_over_subtree(weighted_mean) def test_alter_inplace_forbidden(self): - simpsons = 
DataTree.from_dict( + simpsons = xr.DataTree.from_dict( d={ "/": xr.Dataset({"age": 83}), "/Herbert": xr.Dataset({"age": 40}), diff --git a/xarray/tests/test_extensions.py b/xarray/tests/test_extensions.py index 7cfffd68620..06f84b3b06c 100644 --- a/xarray/tests/test_extensions.py +++ b/xarray/tests/test_extensions.py @@ -5,9 +5,6 @@ import pytest import xarray as xr - -# TODO: Remove imports in favour of xr.DataTree etc, once part of public API -from xarray.core.datatree import DataTree from xarray.core.extensions import register_datatree_accessor from xarray.tests import assert_identical @@ -37,7 +34,7 @@ def __init__(self, xarray_obj): def foo(self): return "bar" - dt: DataTree = DataTree() + dt: xr.DataTree = xr.DataTree() assert dt.demo.foo == "bar" ds = xr.Dataset() diff --git a/xarray/tests/test_formatting.py b/xarray/tests/test_formatting.py index 2c40ac88f98..7f6518de4f2 100644 --- a/xarray/tests/test_formatting.py +++ b/xarray/tests/test_formatting.py @@ -9,7 +9,6 @@ import xarray as xr from xarray.core import formatting -from xarray.core.datatree import DataTree # TODO: Remove when can do xr.DataTree from xarray.tests import requires_cftime, requires_dask, requires_netCDF4 ON_WINDOWS = sys.platform == "win32" @@ -556,13 +555,13 @@ def test_array_scalar_format(self) -> None: assert "Using format_spec is only supported" in str(excinfo.value) def test_datatree_print_empty_node(self): - dt: DataTree = DataTree(name="root") + dt: xr.DataTree = xr.DataTree(name="root") printout = dt.__str__() assert printout == "DataTree('root', parent=None)" def test_datatree_print_empty_node_with_attrs(self): dat = xr.Dataset(attrs={"note": "has attrs"}) - dt: DataTree = DataTree(name="root", data=dat) + dt: xr.DataTree = xr.DataTree(name="root", data=dat) printout = dt.__str__() assert printout == dedent( """\ @@ -576,7 +575,7 @@ def test_datatree_print_empty_node_with_attrs(self): def test_datatree_print_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) - dt: DataTree 
= DataTree(name="root", data=dat) + dt: xr.DataTree = xr.DataTree(name="root", data=dat) printout = dt.__str__() expected = [ "DataTree('root', parent=None)", @@ -591,19 +590,19 @@ def test_datatree_print_node_with_data(self): def test_datatree_printout_nested_node(self): dat = xr.Dataset({"a": [0, 2]}) - root: DataTree = DataTree(name="root") - DataTree(name="results", data=dat, parent=root) + root: xr.DataTree = xr.DataTree(name="root") + xr.DataTree(name="results", data=dat, parent=root) printout = root.__str__() assert printout.splitlines()[2].startswith(" ") def test_datatree_repr_of_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) - dt: DataTree = DataTree(name="root", data=dat) + dt: xr.DataTree = xr.DataTree(name="root", data=dat) assert "Coordinates" in repr(dt) def test_diff_datatree_repr_structure(self): - dt_1: DataTree = DataTree.from_dict({"a": None, "a/b": None, "a/c": None}) - dt_2: DataTree = DataTree.from_dict({"d": None, "d/e": None}) + dt_1: xr.DataTree = xr.DataTree.from_dict({"a": None, "a/b": None, "a/c": None}) + dt_2: xr.DataTree = xr.DataTree.from_dict({"d": None, "d/e": None}) expected = dedent( """\ @@ -616,8 +615,8 @@ def test_diff_datatree_repr_structure(self): assert actual == expected def test_diff_datatree_repr_node_names(self): - dt_1: DataTree = DataTree.from_dict({"a": None}) - dt_2: DataTree = DataTree.from_dict({"b": None}) + dt_1: xr.DataTree = xr.DataTree.from_dict({"a": None}) + dt_2: xr.DataTree = xr.DataTree.from_dict({"b": None}) expected = dedent( """\ @@ -633,10 +632,10 @@ def test_diff_datatree_repr_node_data(self): # casting to int64 explicitly ensures that int64s are created on all architectures ds1 = xr.Dataset({"u": np.int64(0), "v": np.int64(1)}) ds3 = xr.Dataset({"w": np.int64(5)}) - dt_1: DataTree = DataTree.from_dict({"a": ds1, "a/b": ds3}) + dt_1: xr.DataTree = xr.DataTree.from_dict({"a": ds1, "a/b": ds3}) ds2 = xr.Dataset({"u": np.int64(0)}) ds4 = xr.Dataset({"w": np.int64(6)}) - dt_2: DataTree = 
DataTree.from_dict({"a": ds2, "a/b": ds4}) + dt_2: xr.DataTree = xr.DataTree.from_dict({"a": ds2, "a/b": ds4}) expected = dedent( """\ diff --git a/xarray/tests/test_formatting_html.py b/xarray/tests/test_formatting_html.py index ada7f75b21b..1984bc84884 100644 --- a/xarray/tests/test_formatting_html.py +++ b/xarray/tests/test_formatting_html.py @@ -7,7 +7,6 @@ import xarray as xr from xarray.core import formatting_html as fh from xarray.core.coordinates import Coordinates -from xarray.core.datatree import DataTree @pytest.fixture @@ -212,14 +211,14 @@ class Test_summarize_datatree_children: func = staticmethod(fh.summarize_datatree_children) @pytest.fixture(scope="class") - def childfree_tree_factory(self): + def childfree_tree_factory(self) -> xr.DataTree: """ Fixture for a child-free DataTree factory. """ from random import randint def _childfree_tree_factory(): - return DataTree( + return xr.DataTree( data=xr.Dataset({"z": ("y", [randint(1, 100) for _ in range(3)])}) ) @@ -264,7 +263,7 @@ def test_empty_mapping(self): """ Test with an empty mapping of children. """ - children: dict[str, DataTree] = {} + children: dict[str, xr.DataTree] = {} assert self.func(children) == ( "
" "
" From e2e88e365ddd7d5d1f1ae2d988f8667401f3db93 Mon Sep 17 00:00:00 2001 From: Owen Littlejohns Date: Tue, 21 May 2024 00:03:12 -0400 Subject: [PATCH 06/57] DAS-2155 - Revert typing change to Test_summarize_datatree_children.childfree_tree_factory. --- xarray/tests/test_formatting_html.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/tests/test_formatting_html.py b/xarray/tests/test_formatting_html.py index 1984bc84884..349a3a8fcd4 100644 --- a/xarray/tests/test_formatting_html.py +++ b/xarray/tests/test_formatting_html.py @@ -211,7 +211,7 @@ class Test_summarize_datatree_children: func = staticmethod(fh.summarize_datatree_children) @pytest.fixture(scope="class") - def childfree_tree_factory(self) -> xr.DataTree: + def childfree_tree_factory(self): """ Fixture for a child-free DataTree factory. """ From f813910b60d68570a30049421930daf3026c438e Mon Sep 17 00:00:00 2001 From: Owen Littlejohns Date: Tue, 21 May 2024 18:42:41 -0400 Subject: [PATCH 07/57] DAS-2155 - Move quick-start information to getting-started-guide. --- doc/getting-started-guide/quick-overview.rst | 79 +++++++++++++++++++ doc/user-guide/datatree.rst | 80 -------------------- doc/user-guide/index.rst | 1 - 3 files changed, 79 insertions(+), 81 deletions(-) delete mode 100644 doc/user-guide/datatree.rst diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index ee13fea8bf1..eb49a4565a4 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -228,3 +228,82 @@ You can directly read and write xarray objects to disk using :py:meth:`~xarray.D It is common for datasets to be distributed across multiple files (commonly one file per timestep). Xarray supports this use-case by providing the :py:meth:`~xarray.open_mfdataset` and the :py:meth:`~xarray.save_mfdataset` methods. For more, see :ref:`io`. 
+ +DataTrees +--------- + +:py:class:`xarray.DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually allignable groups. +You can think of it like a (recursive) ``dict`` of :py:class:`xarray.Dataset` objects. + +Let's first make some example xarray datasets (following on from xarray's +`quick overview `_ page): + +.. ipython:: python + + import numpy as np + import xarray as xr + + data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]}) + ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi)) + ds + + ds2 = ds.interp(coords={"x": [10, 12, 14, 16, 18, 20]}) + ds2 + + ds3 = xr.Dataset( + dict(people=["alice", "bob"], heights=("people", [1.57, 1.82])), + coords={"species": "human"}, + ) + ds3 + +Now we'll put this data into a multi-group tree: + +.. ipython:: python + + dt = xr.DataTree.from_dict( + {"simulation/coarse": ds, "simulation/fine": ds2, "/": ds3} + ) + dt + +This creates a datatree with various groups. We have one root group, containing information about individual people. +(This root group can be named, but here is unnamed, so is referred to with ``"/"``, same as the root of a unix-like filesystem.) +The root group then has one subgroup ``simulation``, which contains no data itself but does contain another two subgroups, +named ``fine`` and ``coarse``. + +The (sub-)sub-groups ``fine`` and ``coarse`` contain two very similar datasets. +They both have an ``"x"`` dimension, but the dimension is of different lengths in each group, which makes the data in each group unalignable. +In the root group we placed some completely unrelated information, showing how we can use a tree to store heterogenous data. + +The constraints on each group are therefore the same as the constraint on dataarrays within a single dataset. + +We created the sub-groups using a filesystem-like syntax, and accessing groups works the same way. 
+We can access individual dataarrays in a similar fashion + +.. ipython:: python + + dt["simulation/coarse/foo"] + +and we can also pull out the data in a particular group as a ``Dataset`` object using ``.ds``: + +.. ipython:: python + + dt["simulation/coarse"].ds + +Operations map over subtrees, so we can take a mean over the ``x`` dimension of both the ``fine`` and ``coarse`` groups just by + +.. ipython:: python + + avg = dt["simulation"].mean(dim="x") + avg + +Here the ``"x"`` dimension used is always the one local to that sub-group. + +You can do almost everything you can do with ``Dataset`` objects with ``DataTree`` objects +(including indexing and arithmetic), as operations will be mapped over every sub-group in the tree. +This allows you to work with multiple groups of non-alignable variables at once. + +.. note:: + + If all of your variables are mutually alignable + (i.e. they live on the same grid, such that every common dimension name maps to the same length), + then you probably don't need :py:class:`xarray.DataTree`, and should consider just sticking with ``xarray.Dataset``. diff --git a/doc/user-guide/datatree.rst b/doc/user-guide/datatree.rst deleted file mode 100644 index ef210ef7bd9..00000000000 --- a/doc/user-guide/datatree.rst +++ /dev/null @@ -1,80 +0,0 @@ -.. _datatree: - -DataTrees ---------- - -:py:class:`xarray.DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually allignable groups. -You can think of it like a (recursive) ``dict`` of :py:class:`xarray.Dataset` objects. - -Let's first make some example xarray datasets (following on from xarray's -`quick overview `_ page): - -.. 
ipython:: python - - import numpy as np - import xarray as xr - - data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]}) - ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi)) - ds - - ds2 = ds.interp(coords={"x": [10, 12, 14, 16, 18, 20]}) - ds2 - - ds3 = xr.Dataset( - dict(people=["alice", "bob"], heights=("people", [1.57, 1.82])), - coords={"species": "human"}, - ) - ds3 - -Now we'll put this data into a multi-group tree: - -.. ipython:: python - - dt = xr.DataTree.from_dict( - {"simulation/coarse": ds, "simulation/fine": ds2, "/": ds3} - ) - dt - -This creates a datatree with various groups. We have one root group, containing information about individual people. -(This root group can be named, but here is unnamed, so is referred to with ``"/"``, same as the root of a unix-like filesystem.) -The root group then has one subgroup ``simulation``, which contains no data itself but does contain another two subgroups, -named ``fine`` and ``coarse``. - -The (sub-)sub-groups ``fine`` and ``coarse`` contain two very similar datasets. -They both have an ``"x"`` dimension, but the dimension is of different lengths in each group, which makes the data in each group unalignable. -In the root group we placed some completely unrelated information, showing how we can use a tree to store heterogenous data. - -The constraints on each group are therefore the same as the constraint on dataarrays within a single dataset. - -We created the sub-groups using a filesystem-like syntax, and accessing groups works the same way. -We can access individual dataarrays in a similar fashion - -.. ipython:: python - - dt["simulation/coarse/foo"] - -and we can also pull out the data in a particular group as a ``Dataset`` object using ``.ds``: - -.. ipython:: python - - dt["simulation/coarse"].ds - -Operations map over subtrees, so we can take a mean over the ``x`` dimension of both the ``fine`` and ``coarse`` groups just by - -.. 
ipython:: python - - avg = dt["simulation"].mean(dim="x") - avg - -Here the ``"x"`` dimension used is always the one local to that sub-group. - -You can do almost everything you can do with ``Dataset`` objects with ``DataTree`` objects -(including indexing and arithmetic), as operations will be mapped over every sub-group in the tree. -This allows you to work with multiple groups of non-alignable variables at once. - -.. note:: - - If all of your variables are mutually alignable - (i.e. they live on the same grid, such that every common dimension name maps to the same length), - then you probably don't need :py:class:`xarray.DataTree`, and should consider just sticking with ``xarray.Dataset``. diff --git a/doc/user-guide/index.rst b/doc/user-guide/index.rst index 57e3ded3c9a..d8c4964457b 100644 --- a/doc/user-guide/index.rst +++ b/doc/user-guide/index.rst @@ -27,5 +27,4 @@ examples that describe many common tasks that you can accomplish with xarray. options testing duckarrays - datatree hierarchical-data From c38796f9e532aeecd9f724f50473516a8cbd916c Mon Sep 17 00:00:00 2001 From: Owen Littlejohns Date: Thu, 23 May 2024 00:32:14 -0400 Subject: [PATCH 08/57] DAS-2155 - Minor items in documentation. --- doc/getting-started-guide/quick-overview.rst | 7 +++---- doc/roadmap.rst | 14 +++++--------- doc/user-guide/data-structures.rst | 8 ++++---- doc/user-guide/io.rst | 2 +- doc/user-guide/terminology.rst | 6 +++--- 5 files changed, 16 insertions(+), 21 deletions(-) diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index eb49a4565a4..8ca03331092 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -235,8 +235,7 @@ DataTrees :py:class:`xarray.DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually allignable groups. You can think of it like a (recursive) ``dict`` of :py:class:`xarray.Dataset` objects. 
-Let's first make some example xarray datasets (following on from xarray's -`quick overview `_ page): +Let's first make some example xarray datasets: .. ipython:: python @@ -274,7 +273,7 @@ The (sub-)sub-groups ``fine`` and ``coarse`` contain two very similar datasets. They both have an ``"x"`` dimension, but the dimension is of different lengths in each group, which makes the data in each group unalignable. In the root group we placed some completely unrelated information, showing how we can use a tree to store heterogenous data. -The constraints on each group are therefore the same as the constraint on dataarrays within a single dataset. +The constraints on each group are therefore the same as the constraint on DataArrays within a single dataset. We created the sub-groups using a filesystem-like syntax, and accessing groups works the same way. We can access individual dataarrays in a similar fashion @@ -283,7 +282,7 @@ We can access individual dataarrays in a similar fashion dt["simulation/coarse/foo"] -and we can also pull out the data in a particular group as a ``Dataset`` object using ``.ds``: +and we can also view the data in a particular group as a ``Dataset`` object using ``.ds``: .. ipython:: python diff --git a/doc/roadmap.rst b/doc/roadmap.rst index da927de6854..4d6bebc8a5d 100644 --- a/doc/roadmap.rst +++ b/doc/roadmap.rst @@ -201,11 +201,7 @@ extensions. Tree-like data structure ++++++++++++++++++++++++ -.. note:: - Work on merging `DataTree `__ into - xarray is currently underway. - -Xarray’s highest-level object is currently an ``xarray.Dataset``, whose data +Xarray’s highest-level object was previously an ``xarray.Dataset``, whose data model echoes that of a single netCDF group. However real-world datasets are often better represented by a collection of related Datasets. Particular common examples include: @@ -219,13 +215,13 @@ examples include: - Whole netCDF files containing multiple groups. 
- Comparison of output from many similar models (such as in the IPCC's Coupled Model Intercomparison Projects) -A new tree-like data structure which is essentially a structured hierarchical -collection of Datasets could represent these cases, and would instead map to -multiple netCDF groups (see :issue:`4118`). +A new tree-like data structure, ``xarray.DataTree``, which is essentially a +structured hierarchical collection of Datasets, represents these cases and +instead maps to multiple netCDF groups (see :issue:`4118`). Currently there are several libraries which have wrapped xarray in order to build domain-specific data structures (e.g. `xarray-multiscale `__.), -but a general ``xarray.DataTree`` object obviates the need for these and] +but the general ``xarray.DataTree`` object obviates the need for these and consolidates effort in a single domain-agnostic tool, much as xarray has already achieved. Labeled array without coordinates diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 9cd739069c8..fade22c666d 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -527,9 +527,9 @@ A single datatree object is known as a "node", and its position relative to other nodes is defined by two more key properties: - ``children``: An ordered dictionary mapping from names to other ``DataTree`` - objects, known as its' "child nodes". + objects, known as its "child nodes". - ``parent``: The single ``DataTree`` object whose children this datatree is a - member of, known as its' "parent node". + member of, known as its "parent node". Each child automatically knows about its parent node, and a node without a parent is known as a "root" node (represented by the ``parent`` attribute @@ -607,7 +607,8 @@ Our tree now has three nodes within it: dt It is at tree construction time that consistency checks are enforced. 
For -instance, if we try to create a `cycle` the constructor will raise an error: +instance, if we try to create a `cycle` the constructor will raise an error +(``InvalidTreeError``): .. ipython:: python :okexcept: @@ -616,7 +617,6 @@ instance, if we try to create a `cycle` the constructor will raise an error: Alternatively you can also create a ``DataTree`` object from -- An ``xarray.Dataset`` using ``Dataset.to_node()`` (not yet implemented), - A dictionary mapping directory-like paths to either ``DataTree`` nodes or data, using :py:meth:`xarray.DataTree.from_dict()`, - A netCDF or Zarr file on disk with :py:func:`open_datatree()`. See diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst index 6a578b917ab..6f4463a18c3 100644 --- a/doc/user-guide/io.rst +++ b/doc/user-guide/io.rst @@ -169,7 +169,7 @@ netCDF file containing many groups, use the :py:meth:`xarray.DataTree.to_netcdf` In particular in the netCDF data model dimensions are entities that can exist regardless of whether any variable possesses them. This is in contrast to `xarray's data model `_ - (and hence :ref:`datatree's data model `) in which the + (and hence :ref:`DataTree's data model `) in which the dimensions of a (Dataset/Tree) object are simply the set of dimensions present across all variables in that dataset. diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 7adfa16dedd..a22d24c617d 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -259,9 +259,9 @@ complete examples, please consult the relevant documentation.* DataTree A tree-like collection of ``Dataset`` objects. A *tree* is made up of one or more *nodes*, each of which can store the same information as a single ``Dataset`` (accessed via ``.ds``). - This data is stored in the same way as in a ``Dataset``, i.e. in the form of data variables - (see **Variable** in the `corresponding xarray terminology page `_), - dimensions, coordinates, and attributes. 
+ This data is stored in the same way as in a ``Dataset``, i.e. in the form of data + :term:`variables`, :term:`dimensions`, :term:`coordinates`, + and attributes. The nodes in a tree are linked to one another, and each node is it's own instance of ``DataTree`` object. Each node can have zero or more *children* (stored in a dictionary-like From 02287770d38660bf845c485d652d5d8bc4cd83db Mon Sep 17 00:00:00 2001 From: Owen Littlejohns Date: Tue, 28 May 2024 17:14:05 -0400 Subject: [PATCH 09/57] DAS-2155 - Add DataTree related exceptions to public API. --- doc/api.rst | 6 +++--- doc/user-guide/data-structures.rst | 2 +- doc/user-guide/hierarchical-data.rst | 2 +- xarray/__init__.py | 6 +++++- 4 files changed, 10 insertions(+), 6 deletions(-) diff --git a/doc/api.rst b/doc/api.rst index 28b0f5adf2d..f0f884f15a4 100644 --- a/doc/api.rst +++ b/doc/api.rst @@ -1426,9 +1426,9 @@ Exceptions raised when manipulating trees. .. autosummary:: :toctree: generated/ - xarray.core.datatree_mapping.TreeIsomorphismError - xarray.core.treenode.InvalidTreeError - xarray.core.treenode.NotFoundInTreeError + xarray.TreeIsomorphismError + xarray.InvalidTreeError + xarray.NotFoundInTreeError Advanced API ============ diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index fade22c666d..dea5a720cfc 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -608,7 +608,7 @@ Our tree now has three nodes within it: It is at tree construction time that consistency checks are enforced. For instance, if we try to create a `cycle` the constructor will raise an error -(``InvalidTreeError``): +(:py:class:`~xarray.InvalidTreeError`): .. 
ipython:: python :okexcept: diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 543b52c2d5d..b1a7f43f4f5 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -137,7 +137,7 @@ We can add Herbert to the family tree without displacing Homer by :py:meth:`~xar Certain manipulations of our tree are forbidden, if they would create an inconsistent result. In episode 51 of the show Futurama, Philip J. Fry travels back in time and accidentally becomes his own Grandfather. -If we try similar time-travelling hijinks with Homer, we get a :py:class:`InvalidTreeError` raised: +If we try similar time-travelling hijinks with Homer, we get a :py:class:`~xarray.InvalidTreeError` raised: .. ipython:: python :okexcept: diff --git a/xarray/__init__.py b/xarray/__init__.py index 3a8cad31cc8..35ae718da3d 100644 --- a/xarray/__init__.py +++ b/xarray/__init__.py @@ -33,7 +33,7 @@ from xarray.core.dataarray import DataArray from xarray.core.dataset import Dataset from xarray.core.datatree import DataTree -from xarray.core.datatree_mapping import map_over_subtree +from xarray.core.datatree_mapping import TreeIsomorphismError, map_over_subtree from xarray.core.extensions import ( register_dataarray_accessor, register_dataset_accessor, @@ -44,6 +44,7 @@ from xarray.core.merge import Context, MergeError, merge from xarray.core.options import get_options, set_options from xarray.core.parallel import map_blocks +from xarray.core.treenode import InvalidTreeError, NotFoundInTreeError from xarray.core.variable import IndexVariable, Variable, as_variable from xarray.namedarray.core import NamedArray from xarray.util.print_versions import show_versions @@ -114,8 +115,11 @@ "Variable", "NamedArray", # Exceptions + "InvalidTreeError", "MergeError", + "NotFoundInTreeError", "SerializationWarning", + "TreeIsomorphismError", # Constants "__version__", "ALL_DIMS", From 4f8cc1c53a2b8155f69d4db717698ce5e60d8945 Mon Sep 17 
00:00:00 2001 From: Owen Littlejohns Date: Thu, 30 May 2024 17:15:58 -0400 Subject: [PATCH 10/57] DAS-2155 - Add note regarding DataTree.to_netcdf. --- doc/user-guide/io.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst index 6f4463a18c3..824cc06cea1 100644 --- a/doc/user-guide/io.rst +++ b/doc/user-guide/io.rst @@ -109,6 +109,12 @@ string, e.g., to access subgroup 'bar' within group 'foo' pass pass ``mode='a'`` to ``to_netcdf`` to ensure that each call does not delete the file. +.. tip:: + + It is recommended to use :py:class:`~xarray.DataTree` to represent + hierarchical data, and to use the :py:meth:`xarray.DataTree.to_netcdf` method + when writing hierarchical data to a netCDF file. + Data is *always* loaded lazily from netCDF files. You can manipulate, slice and subset Dataset and DataArray objects, and no array values are loaded into memory until you try to perform some sort of actual computation. For an example of how these From 983de0cc861bf03189a45d4f991e1e82854860fc Mon Sep 17 00:00:00 2001 From: owenlittlejohns Date: Mon, 3 Jun 2024 11:29:11 -0400 Subject: [PATCH 11/57] Update doc/getting-started-guide/quick-overview.rst Co-authored-by: Matt Savoie --- doc/getting-started-guide/quick-overview.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index 8ca03331092..052cc503311 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -276,7 +276,7 @@ In the root group we placed some completely unrelated information, showing how w The constraints on each group are therefore the same as the constraint on DataArrays within a single dataset. We created the sub-groups using a filesystem-like syntax, and accessing groups works the same way. 
-We can access individual dataarrays in a similar fashion +We can access individual DataArrays in a similar fashion .. ipython:: python From b303d6255d762f0a82188ff6446b25a7bc82aadb Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Tue, 23 Jul 2024 16:51:53 -0600 Subject: [PATCH 12/57] DAS-2155: Updates imports from benchmark PR. --- asv_bench/benchmarks/dataset_io.py | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/asv_bench/benchmarks/dataset_io.py b/asv_bench/benchmarks/dataset_io.py index 0956be67dad..06035e9ae86 100644 --- a/asv_bench/benchmarks/dataset_io.py +++ b/asv_bench/benchmarks/dataset_io.py @@ -7,8 +7,6 @@ import pandas as pd import xarray as xr -from xarray.backends.api import open_datatree -from xarray.core.datatree import DataTree from . import _skip_slow, parameterized, randint, randn, requires_dask @@ -556,7 +554,7 @@ def make_datatree(self, nchildren=10): for group in range(self.nchildren) } dtree = root | nested_tree1 | nested_tree2 | nested_tree3 - self.dtree = DataTree.from_dict(dtree) + self.dtree = xr.DataTree.from_dict(dtree) class IOReadDataTreeNetCDF4(IONestedDataTree): @@ -574,10 +572,10 @@ def setup(self): dtree.to_netcdf(filepath=self.filepath) def time_load_datatree_netcdf4(self): - open_datatree(self.filepath, engine="netcdf4").load() + xr.open_datatree(self.filepath, engine="netcdf4").load() def time_open_datatree_netcdf4(self): - open_datatree(self.filepath, engine="netcdf4") + xr.open_datatree(self.filepath, engine="netcdf4") class IOWriteNetCDFDask: From bf4413ba0db6ce63cea7f3535d8c489750fb27b8 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 24 Jul 2024 08:54:38 -0600 Subject: [PATCH 13/57] DAS-2155: Oriol suggestions. Thanks --- doc/api.rst | 2 +- doc/conf.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/api.rst b/doc/api.rst index 6d89c6f7c14..b4e573fdec3 100644 --- a/doc/api.rst +++ b/doc/api.rst @@ -658,7 +658,7 @@ This interface echoes that of ``xarray.Dataset``. 
Dictionary Interface -------------------- -``DataTree`` objects also have a dict-like interface mapping keys to either ``xarray.DataArray``s or to child ``DataTree`` nodes. +``DataTree`` objects also have a dict-like interface mapping keys to either ``xarray.DataArray`` or to child ``DataTree`` nodes. .. autosummary:: :toctree: generated/ diff --git a/doc/conf.py b/doc/conf.py index ddf895b71a5..69ea5b1f443 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -156,7 +156,7 @@ "DataArray": "~xarray.DataArray", "Dataset": "~xarray.Dataset", "Variable": "~xarray.Variable", - "DataTree": "~xarray.core.datatree.DataTree", + "DataTree": "~xarray.DataTree", "DatasetGroupBy": "~xarray.core.groupby.DatasetGroupBy", "DataArrayGroupBy": "~xarray.core.groupby.DataArrayGroupBy", "Grouper": "~xarray.core.groupers.Grouper", From 068bab28bcaf8a5af25a229e6ec60b5647f983b2 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 24 Jul 2024 08:59:07 -0600 Subject: [PATCH 14/57] DAS-2155: Fix build errors in docs for numpy 2.0 --- doc/user-guide/data-structures.rst | 4 ++-- doc/user-guide/hierarchical-data.rst | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index dea5a720cfc..16a302f1562 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -595,7 +595,7 @@ or by dynamically updating the attributes of one node to refer to another: .. ipython:: python # add a second child by first creating a new node ... - ds3 = xr.Dataset({"zed": np.NaN}) + ds3 = xr.Dataset({"zed": np.nan}) node3 = xr.DataTree(name="b", data=ds3) # ... 
then updating its .parent property node3.parent = dt @@ -674,7 +674,7 @@ datatree from scratch, we could have written: dt = xr.DataTree(name="root") dt["foo"] = "orange" dt["a"] = xr.DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})) - dt["a/b/zed"] = np.NaN + dt["a/b/zed"] = np.nan dt To change the variables in a node of a ``DataTree``, you can use all the diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index b1a7f43f4f5..e1b748deb3c 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -328,7 +328,7 @@ we can construct a complex tree quickly using the alternative constructor :py:me d = { "/": xr.Dataset({"foo": "orange"}), "/a": xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}), - "/a/b": xr.Dataset({"zed": np.NaN}), + "/a/b": xr.Dataset({"zed": np.nan}), "a/c/d": None, } dt = xr.DataTree.from_dict(d) From 2698681a4483925b488c251ac0dfd57e3e9170b6 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 24 Jul 2024 16:53:32 -0600 Subject: [PATCH 15/57] DAS-2155: First stab at updatating quick-overview.rst for inherited coords. --- doc/api.rst | 5 ++- doc/getting-started-guide/quick-overview.rst | 40 ++++++++++++++++---- xarray/core/datatree.py | 2 +- 3 files changed, 36 insertions(+), 11 deletions(-) diff --git a/doc/api.rst b/doc/api.rst index b4e573fdec3..4e9b64f30c3 100644 --- a/doc/api.rst +++ b/doc/api.rst @@ -658,7 +658,7 @@ This interface echoes that of ``xarray.Dataset``. Dictionary Interface -------------------- -``DataTree`` objects also have a dict-like interface mapping keys to either ``xarray.DataArray`` or to child ``DataTree`` nodes. +``DataTree`` objects also have a dict-like interface mapping keys to either ``xarray.DataArray`` s or to child ``DataTree`` nodes. .. autosummary:: :toctree: generated/ @@ -952,7 +952,8 @@ DataTree methods .. 
autosummary:: :toctree: generated/ - xarray.backends.api.open_datatree + open_datatree + map_over_subtree DataTree.to_dict DataTree.to_netcdf DataTree.to_zarr diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index 052cc503311..74792d70e5b 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -232,8 +232,7 @@ It is common for datasets to be distributed across multiple files (commonly one DataTrees --------- -:py:class:`xarray.DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually allignable groups. -You can think of it like a (recursive) ``dict`` of :py:class:`xarray.Dataset` objects. +:py:class:`xarray.DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually alignable groups. You can think of it like a (recursive) ``dict`` of :py:class:`xarray.Dataset` objects, where coordinate variables and their indexes are inherited down to children. Let's first make some example xarray datasets: @@ -243,19 +242,19 @@ Let's first make some example xarray datasets: import xarray as xr data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]}) - ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi)) + ds = xr.Dataset({"foo": data, "bar": ("x", [1, 2]), "baz": np.pi}) ds ds2 = ds.interp(coords={"x": [10, 12, 14, 16, 18, 20]}) ds2 ds3 = xr.Dataset( - dict(people=["alice", "bob"], heights=("people", [1.57, 1.82])), + {"people": ["alice", "bob"], "heights": ("people", [1.57, 1.82])}, coords={"species": "human"}, ) ds3 -Now we'll put this data into a multi-group tree: +Now we'll put this data into a multi-group DataTree: .. ipython:: python @@ -264,16 +263,17 @@ Now we'll put this data into a multi-group tree: ) dt -This creates a datatree with various groups. We have one root group, containing information about individual people. 
+This creates a datatree with a group hierarchy. We have one root group, containing information about individual people. (This root group can be named, but here is unnamed, so is referred to with ``"/"``, same as the root of a unix-like filesystem.) The root group then has one subgroup ``simulation``, which contains no data itself but does contain another two subgroups, named ``fine`` and ``coarse``. The (sub-)sub-groups ``fine`` and ``coarse`` contain two very similar datasets. They both have an ``"x"`` dimension, but the dimension is of different lengths in each group, which makes the data in each group unalignable. +Remember to keep unalignable coordinates in sibling groups as a DataTree inherits coordinates through its child nodes. All parent/descendent coordinates must be alignable to form a DataTree. In the root group we placed some completely unrelated information, showing how we can use a tree to store heterogenous data. -The constraints on each group are therefore the same as the constraint on DataArrays within a single dataset. +The constraints on each group are the same as the constraint on DataArrays within a single dataset with the addition of requiring parent/descendent coordinate agreement. We created the sub-groups using a filesystem-like syntax, and accessing groups works the same way. We can access individual DataArrays in a similar fashion @@ -282,12 +282,26 @@ We can access individual DataArrays in a similar fashion dt["simulation/coarse/foo"] -and we can also view the data in a particular group as a ``Dataset`` object using ``.ds``: +and we can also view the data in a particular group as a readonly ``DatasetView`` object using ``.ds``: .. ipython:: python dt["simulation/coarse"].ds +We can get a copy of the ``Dataset`` including the inherited coordinates by calling the ``.to_dataset`` method: + +.. 
ipython:: python + + ds_inherited = dt["simulation/coarse"].to_dataset() + ds_inherited + +And you can get a copy of just the node local values of ``Dataset`` by setting the ``inherited`` keyword to ``False``: + +.. ipython:: python + + ds_node_local = dt["simulation/coarse"].to_dataset(inherited=False) + ds_node_local + Operations map over subtrees, so we can take a mean over the ``x`` dimension of both the ``fine`` and ``coarse`` groups just by .. ipython:: python @@ -297,6 +311,16 @@ Operations map over subtrees, so we can take a mean over the ``x`` dimension of Here the ``"x"`` dimension used is always the one local to that sub-group. + +Finally we can try to create an invalid tree by putting the ``coarse`` data under the ``fine`` data: + +.. ipython:: python + + dt = xr.DataTree.from_dict( + {"simulation/fine/coarse": ds, "simulation/fine": ds2, "/": ds3} + ) + dt + You can do almost everything you can do with ``Dataset`` objects with ``DataTree`` objects (including indexing and arithmetic), as operations will be mapped over every sub-group in the tree. This allows you to work with multiple groups of non-alignable variables at once. diff --git a/xarray/core/datatree.py b/xarray/core/datatree.py index 65ff8667cb7..bdba0e15e03 100644 --- a/xarray/core/datatree.py +++ b/xarray/core/datatree.py @@ -569,7 +569,7 @@ def to_dataset(self, inherited: bool = True) -> Dataset: ---------- inherited : bool, optional If False, only include coordinates and indexes defined at the level - of this DataTree node, excluding inherited coordinates. + of this DataTree node, excluding any inherited coordinates and indexes. See Also -------- From 97c183d326f78bca130e5dd7ec70dc46c32d1834 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 25 Jul 2024 08:15:53 -0600 Subject: [PATCH 16/57] DAS-2155: removes erroneous example This was going to be an example of alignment errors, but then found #9276 And accidentally checked this in.
I will look for a better place to describe alignment in detail. --- doc/getting-started-guide/quick-overview.rst | 9 --------- 1 file changed, 9 deletions(-) diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index 74792d70e5b..684fdafa7a7 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -312,15 +312,6 @@ Operations map over subtrees, so we can take a mean over the ``x`` dimension of Here the ``"x"`` dimension used is always the one local to that sub-group. -Finally we can try to create an invalid tree by putting the ``coarse`` data under the ``fine`` data: - -.. ipython:: python - - dt = xr.DataTree.from_dict( - {"simulation/fine/coarse": ds, "simulation/fine": ds2, "/": ds3} - ) - dt - You can do almost everything you can do with ``Dataset`` objects with ``DataTree`` objects (including indexing and arithmetic), as operations will be mapped over every sub-group in the tree. This allows you to work with multiple groups of non-alignable variables at once. From 2507475226e0782c68146ef273b9c6f7835b95c3 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 25 Jul 2024 18:14:04 -0600 Subject: [PATCH 17/57] DAS-2155: tidying for inheritance. --- doc/api.rst | 4 +- doc/getting-started-guide/quick-overview.rst | 48 ++++++++++++-------- doc/user-guide/data-structures.rst | 21 +++++---- 3 files changed, 43 insertions(+), 30 deletions(-) diff --git a/doc/api.rst b/doc/api.rst index 4e9b64f30c3..09fbfdafb4f 100644 --- a/doc/api.rst +++ b/doc/api.rst @@ -658,7 +658,7 @@ This interface echoes that of ``xarray.Dataset``. Dictionary Interface -------------------- -``DataTree`` objects also have a dict-like interface mapping keys to either ``xarray.DataArray`` s or to child ``DataTree`` nodes. +``DataTree`` objects also have a dict-like interface mapping keys to either ``xarray.DataArray``\s or to child ``DataTree`` nodes. ..
autosummary:: :toctree: generated/ @@ -1466,7 +1466,7 @@ Advanced API Context register_dataset_accessor register_dataarray_accessor - xarray.core.extensions.register_datatree_accessor + register_datatree_accessor Dataset.set_close backends.BackendArray backends.BackendEntrypoint diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index 684fdafa7a7..cbe2af97598 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -254,7 +254,7 @@ Let's first make some example xarray datasets: ) ds3 -Now we'll put this data into a multi-group DataTree: +Now we'll put these datasets into a hierarchical DataTree: .. ipython:: python @@ -263,26 +263,35 @@ Now we'll put this data into a multi-group DataTree: ) dt -This creates a datatree with a group hierarchy. We have one root group, containing information about individual people. -(This root group can be named, but here is unnamed, so is referred to with ``"/"``, same as the root of a unix-like filesystem.) -The root group then has one subgroup ``simulation``, which contains no data itself but does contain another two subgroups, -named ``fine`` and ``coarse``. +This created a DataTree with nested groups. We have one root group, containing information about individual +people. This root group can be named, but here is unnamed, so is referred to with ``/``, same as the root of a +unix-like filesystem. The root group then has one subgroup ``simulation``, which contains no data itself but does +contain another two subgroups, named ``fine`` and ``coarse``. -The (sub-)sub-groups ``fine`` and ``coarse`` contain two very similar datasets. -They both have an ``"x"`` dimension, but the dimension is of different lengths in each group, which makes the data in each group unalignable. -Remember to keep unalignable coordinates in sibling groups as a DataTree inherits coordinates through its child nodes. 
All parent/descendent coordinates must be alignable to form a DataTree. -In the root group we placed some completely unrelated information, showing how we can use a tree to store heterogenous data. +The (sub)subgroups ``fine`` and ``coarse`` contain two very similar datasets. They both have an ``x`` +dimension, but the dimension is of different lengths in each group, which makes the data in each group +unalignable. In the root group we placed some completely unrelated information, in order to show how a tree can +store heterogenous data. -The constraints on each group are the same as the constraint on DataArrays within a single dataset with the addition of requiring parent/descendent coordinate agreement. +Remember to keep unalignable dimensions in sibling groups because a DataTree inherits coordinates down through its +child nodes. You can see this inheritance in the above representation of the DataTree. The coordinates +``people`` and ``species`` defined in the root ``/`` node are shown in the child nodes both +``/simulation/coarse`` and ``/simulation/fine``. All coordinates in parent-descendent lineage must be +alignable to form a DataTree. If your input data is not aligned, you can still get a nested ``dict`` of +``Dataset`` objects with :py:func:`~xarray.open_groups` and then apply any required changes to ensure alignment +before converting to a ``DataTree``. -We created the sub-groups using a filesystem-like syntax, and accessing groups works the same way. -We can access individual DataArrays in a similar fashion +The constraints on each group are the same as the constraint on DataArrays within a single dataset with the +addition of requiring parent-descendent coordinate agreement. + +We created the subgroups using a filesystem-like syntax, and accessing groups works the same way. We can access +individual DataArrays in a similar fashion. .. 
ipython:: python dt["simulation/coarse/foo"] -and we can also view the data in a particular group as a readonly ``DatasetView`` object using ``.ds``: +We can also view the data in a particular group as a readonly ``DatasetView`` using ``.ds``: .. ipython:: python @@ -302,22 +311,23 @@ And you can get a copy of just the node local values of ``Dataset`` by setting t ds_node_local = dt["simulation/coarse"].to_dataset(inherited=False) ds_node_local -Operations map over subtrees, so we can take a mean over the ``x`` dimension of both the ``fine`` and ``coarse`` groups just by +Operations map over subtrees, so we can take a mean over the ``x`` dimension of both the ``fine`` and ``coarse`` groups just by: .. ipython:: python avg = dt["simulation"].mean(dim="x") avg -Here the ``"x"`` dimension used is always the one local to that sub-group. +Here the ``x`` dimension used is always the one local to that subgroup. You can do almost everything you can do with ``Dataset`` objects with ``DataTree`` objects -(including indexing and arithmetic), as operations will be mapped over every sub-group in the tree. +(including indexing and arithmetic), as operations will be mapped over every subgroup in the tree. This allows you to work with multiple groups of non-alignable variables at once. .. note:: - If all of your variables are mutually alignable - (i.e. they live on the same grid, such that every common dimension name maps to the same length), - then you probably don't need :py:class:`xarray.DataTree`, and should consider just sticking with ``xarray.Dataset``. + If all of your variables are mutually alignable (i.e. they live on the same + grid, such that every common dimension name maps to the same length), then + you probably don't need :py:class:`xarray.DataTree`, and should consider + just sticking with ``xarray.Dataset``. 
diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 16a302f1562..32ce6fd3248 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -511,17 +511,17 @@ Each ``DataTree`` object (or "node") contains the same data that a single keys), and so has the same key properties: - ``dims``: a dictionary mapping of dimension names to lengths, for the - variables in this node, + variables in this node, and this node's ancestors, - ``data_vars``: a dict-like container of DataArrays corresponding to variables in this node, - ``coords``: another dict-like container of DataArrays, corresponding to - coordinate variables in this node, + coordinate variables in this node, and this node's ancestors, - ``attrs``: dict to hold arbitary metadata relevant to data in this node. A single ``DataTree`` object acts much like a single ``Dataset`` object, and -has a similar set of dict-like methods defined upon it. However, ``DataTree``'s +has a similar set of dict-like methods defined upon it. However, ``DataTree``\s can also contain other ``DataTree`` objects, so they can be thought of as -nested dict-like containers of both ``xarray.DataArray``'s and ``DataTree``'s. +nested dict-like containers of both ``xarray.DataArray``\s and ``DataTree``\s. A single datatree object is known as a "node", and its position relative to other nodes is defined by two more key properties: @@ -576,7 +576,6 @@ Let's make a single datatree node with some example data in it: ds1 = xr.Dataset({"foo": "orange"}) dt = xr.DataTree(name="root", data=ds1) # create root node - dt At this point our node is also the root node, as every tree has a root node. 
@@ -590,7 +589,7 @@ the constructor of the second: # add a child by referring to the parent node node2 = xr.DataTree(name="a", parent=dt, data=ds2) -or by dynamically updating the attributes of one node to refer to another: +or by dynamically updating the properties of one node to refer to another: .. ipython:: python @@ -607,7 +606,7 @@ Our tree now has three nodes within it: dt It is at tree construction time that consistency checks are enforced. For -instance, if we try to create a `cycle` the constructor will raise an error +instance, if we try to create a `cycle`, where the root node is also a child of a decendent, the constructor will raise an (:py:class:`~xarray.InvalidTreeError`): .. ipython:: python @@ -619,8 +618,12 @@ Alternatively you can also create a ``DataTree`` object from - A dictionary mapping directory-like paths to either ``DataTree`` nodes or data, using :py:meth:`xarray.DataTree.from_dict()`, -- A netCDF or Zarr file on disk with :py:func:`open_datatree()`. See - :ref:`reading and writing files `. +- A well formed netCDF or Zarr file on disk with +:py:func:`open_datatree()`. See :ref:`reading and writing files `. For +data files with groups that do not not align see :py:func:`xarray.open_groups()` or use +:py:func:`xarray.open_dataset(group='target_group')` + + DataTree Contents From 032f5cbc8171e8e7c7e28d2a2d68fce374ba04dc Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 25 Jul 2024 18:14:54 -0600 Subject: [PATCH 18/57] DAS-2155: Another first stab at adding a hierarchy structure example. 
I pulled this directly from https://github.com/pydata/xarray/issues/9077 Thanks etienne and shoyer --- doc/user-guide/data-structures.rst | 82 ++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 32ce6fd3248..651b091af59 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -624,6 +624,88 @@ data files with groups that do not not align see :py:func:`xarray.open_groups()` :py:func:`xarray.open_dataset(group='target_group')` +DataTree Inheritence +~~~~~~~~~~~~~~~~~~~~ + +DataTree implements a simple inheritance mechanism. Coordinates and their +associated indices are propagated from each node downward starting from the +root node. Coordinate inheritance was inspired by the NetCDF-CF inherited +dimensions, but DataTree's inheritance is slightly stricter and easier to +reason about. + +The constraint that this puts on a DataTree is that dimensions and indices that +are inherited must be aligned with any child's existing dimension or index. +This allows child nodes to use dimensions defined in ancestor nodes, without +duplicating that information, but on the flip side if a dimension dimname is +defined in on a node and that same dimname dimension in one of it's ancestors, +they must align (have the same index and size). + +Some examples: + +.. 
ipython:: python + + # Set up coordinates + times = xr.DataArray(data=["2022-01", "2023-01"], dims="time") + stations = xr.DataArray(data=list("abcdef"), dims="station") + lon = [-100, -80, -60] + lat = [10, 20, 30] + + # Set up fake data + wind_speed = xr.DataArray(np.ones((2, 6)) * 2, dims=("time", "station")) + pressure = xr.DataArray(np.ones((2, 6)) * 3, dims=("time", "station")) + air_temperature = xr.DataArray(np.ones((2, 6)) * 4, dims=("time", "station")) + dewpoint_temp = xr.DataArray(np.ones((2, 6)) * 5, dims=("time", "station")) + infrared = xr.DataArray(np.ones((2, 3, 3)) * 6, dims=("time", "lon", "lat")) + true_color = xr.DataArray(np.ones((2, 3, 3)) * 7, dims=("time", "lon", "lat")) + + xdt = xr.DataTree.from_dict( + { + "/": xr.Dataset( + coords={"time": times}, + ), + "/weather_data": xr.Dataset( + coords={"station": stations}, + data_vars={ + "wind_speed": wind_speed, + "pressure": pressure, + }, + ), + "/weather_data/temperature": xr.Dataset( + data_vars={ + "air_temperature": air_temperature, + "dewpoint_temp": dewpoint_temp, + }, + ), + "/satellite_image": xr.Dataset( + coords={"lat": lat, "lon": lon}, + data_vars={ + "infrared": infrared, + "true_color": true_color, + }, + ), + }, + ) + + +Here there are four different coordinate variables, which apply to variables in the DataTree in different ways: + +``time`` is a shared coordinate used by both ``weather`` and ``satellite`` variables +``station`` is used only for ``weather`` variables +``lat`` and ``lon`` are only use for ``satellite images`` + +Coordinate variables are inherited to descendent nodes, which means that +variables at different levels of a hierarchical DataTree are always +aligned. Placing the ``time`` variable at the root node automatically indicates +that it applies to all descendent nodes. Similarly, ``station`` is in the base +``weather_data`` node, because it applies to all weather variables, both directly +in ``weather_data`` and in the ``temperature`` sub-tree. 
+ +Accessing any of the lower level trees as an ``xarray.Dataset`` would +automatically include coordinates from higher levels (e.g., time): + +.. ipython:: python + + dt["/weather_data/temperature"].ds DataTree Contents From f3edabbce8c70ebdd5f917dee4f74c20fa5e5378 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 25 Jul 2024 18:24:51 -0600 Subject: [PATCH 19/57] DAS-2155: Fix typo. --- doc/user-guide/data-structures.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 651b091af59..81f5a5cf27c 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -658,7 +658,7 @@ Some examples: infrared = xr.DataArray(np.ones((2, 3, 3)) * 6, dims=("time", "lon", "lat")) true_color = xr.DataArray(np.ones((2, 3, 3)) * 7, dims=("time", "lon", "lat")) - xdt = xr.DataTree.from_dict( + dt = xr.DataTree.from_dict( { "/": xr.Dataset( coords={"time": times}, From b6e61630928ba653aea7ae12bf4894a839197e19 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 25 Jul 2024 18:56:38 -0600 Subject: [PATCH 20/57] DAS-2155: Reorder, and fix bullet list --- doc/user-guide/data-structures.rst | 153 +++++++++++++++-------------- 1 file changed, 80 insertions(+), 73 deletions(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 81f5a5cf27c..250c47d517b 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -619,12 +619,86 @@ Alternatively you can also create a ``DataTree`` object from - A dictionary mapping directory-like paths to either ``DataTree`` nodes or data, using :py:meth:`xarray.DataTree.from_dict()`, - A well formed netCDF or Zarr file on disk with -:py:func:`open_datatree()`. See :ref:`reading and writing files `. 
For -data files with groups that do not not align see :py:func:`xarray.open_groups()` or use -:py:func:`xarray.open_dataset(group='target_group')` +:py:func:`open_datatree()`. See :ref:`reading and writing files `. +For data files with groups that do not not align see +:py:func:`xarray.open_groups()` or use +:py:func:`xarray.open_dataset(group='target_group')`. For more information +about coordinate alignment see :ref:`datatree-inheritance` -DataTree Inheritence + + +DataTree Contents +~~~~~~~~~~~~~~~~~ + +Like ``xarray.Dataset``, ``xarray.DataTree`` implements the python mapping interface, +but with values given by either ``xarray.DataArray`` objects or other +``DataTree`` objects. + +.. ipython:: python + + dt["a"] + dt["foo"] + +Iterating over keys will iterate over both the names of variables and child nodes. + +We can also access all the data in a single node through a dataset-like view + +.. ipython:: python + + dt["a"].ds + +This demonstrates the fact that the data in any one node is equivalent to the +contents of a single ``xarray.Dataset`` object. The ``DataTree.ds`` property +returns an immutable view, but we can instead extract the node's data contents +as a new (and mutable) ``xarray.Dataset`` object via +:py:meth:`xarray.DataTree.to_dataset()`: + +.. ipython:: python + + dt["a"].to_dataset() + +Like with ``Dataset``, you can access the data and coordinate variables of a +node separately via the ``data_vars`` and ``coords`` attributes: + +.. ipython:: python + + dt["a"].data_vars + dt["a"].coords + + +Dictionary-like methods +~~~~~~~~~~~~~~~~~~~~~~~ + +We can update a datatree in-place using Python's standard dictionary syntax, +similar to how we can for Dataset objects. For example, to create this example +datatree from scratch, we could have written: + +.. 
ipython:: python + + dt = xr.DataTree(name="root") + dt["foo"] = "orange" + dt["a"] = xr.DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})) + dt["a/b/zed"] = np.nan + dt + +To change the variables in a node of a ``DataTree``, you can use all the +standard dictionary methods, including ``values``, ``items``, ``__delitem__``, +``get`` and :py:meth:`xarray.DataTree.update`. +Note that assigning a ``DataArray`` object to a ``DataTree`` variable using +``__setitem__`` or ``update`` will :ref:`automatically align ` the +array(s) to the original node's indexes. + +If you copy a ``DataTree`` using the :py:func:`copy` function or the +:py:meth:`xarray.DataTree.copy` method it will copy the subtree, +meaning that node and children below it, but no parents above it. +Like for ``Dataset``, this copy is shallow by default, but you can copy all the +underlying data arrays by calling ``dt.copy(deep=True)``. + + +.. _datatree-inheritance: + +DataTree Inheritance ~~~~~~~~~~~~~~~~~~~~ DataTree implements a simple inheritance mechanism. Coordinates and their @@ -658,7 +732,7 @@ Some examples: infrared = xr.DataArray(np.ones((2, 3, 3)) * 6, dims=("time", "lon", "lat")) true_color = xr.DataArray(np.ones((2, 3, 3)) * 7, dims=("time", "lon", "lat")) - dt = xr.DataTree.from_dict( + dt2 = xr.DataTree.from_dict( { "/": xr.Dataset( coords={"time": times}, @@ -705,75 +779,8 @@ automatically include coordinates from higher levels (e.g., time): .. ipython:: python - dt["/weather_data/temperature"].ds - - -DataTree Contents -~~~~~~~~~~~~~~~~~ - -Like ``xarray.Dataset``, ``xarray.DataTree`` implements the python mapping interface, -but with values given by either ``xarray.DataArray`` objects or other -``DataTree`` objects. - -.. ipython:: python - - dt["a"] - dt["foo"] - -Iterating over keys will iterate over both the names of variables and child nodes. + dt2["/weather_data/temperature"].ds -We can also access all the data in a single node through a dataset-like view - -.. 
ipython:: python - - dt["a"].ds - -This demonstrates the fact that the data in any one node is equivalent to the -contents of a single ``xarray.Dataset`` object. The ``DataTree.ds`` property -returns an immutable view, but we can instead extract the node's data contents -as a new (and mutable) ``xarray.Dataset`` object via -:py:meth:`xarray.DataTree.to_dataset()`: - -.. ipython:: python - - dt["a"].to_dataset() - -Like with ``Dataset``, you can access the data and coordinate variables of a -node separately via the ``data_vars`` and ``coords`` attributes: - -.. ipython:: python - - dt["a"].data_vars - dt["a"].coords - - -Dictionary-like methods -~~~~~~~~~~~~~~~~~~~~~~~ - -We can update a datatree in-place using Python's standard dictionary syntax, -similar to how we can for Dataset objects. For example, to create this example -datatree from scratch, we could have written: - -.. ipython:: python - - dt = xr.DataTree(name="root") - dt["foo"] = "orange" - dt["a"] = xr.DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})) - dt["a/b/zed"] = np.nan - dt - -To change the variables in a node of a ``DataTree``, you can use all the -standard dictionary methods, including ``values``, ``items``, ``__delitem__``, -``get`` and :py:meth:`xarray.DataTree.update`. -Note that assigning a ``DataArray`` object to a ``DataTree`` variable using -``__setitem__`` or ``update`` will :ref:`automatically align ` the -array(s) to the original node's indexes. - -If you copy a ``DataTree`` using the :py:func:`copy` function or the -:py:meth:`xarray.DataTree.copy` method it will copy the subtree, -meaning that node and children below it, but no parents above it. -Like for ``Dataset``, this copy is shallow by default, but you can copy all the -underlying data arrays by calling ``dt.copy(deep=True)``. .. 
_coordinates: From 9e4b737ee0907e16703dab79e118e51b1fc36d46 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Fri, 26 Jul 2024 13:42:05 -0600 Subject: [PATCH 21/57] DAS-2155: Updates datastructures with extensive (too?) example Updates roadmap Squashes a bunch of it's -> its typos. --- doc/getting-started-guide/quick-overview.rst | 3 + doc/roadmap.rst | 17 +++-- doc/user-guide/data-structures.rst | 74 +++++++++++--------- doc/user-guide/terminology.rst | 2 +- xarray/backends/api.py | 2 +- 5 files changed, 58 insertions(+), 40 deletions(-) diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index cbe2af97598..8f363a4c34a 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -229,6 +229,9 @@ You can directly read and write xarray objects to disk using :py:meth:`~xarray.D It is common for datasets to be distributed across multiple files (commonly one file per timestep). Xarray supports this use-case by providing the :py:meth:`~xarray.open_mfdataset` and the :py:meth:`~xarray.save_mfdataset` methods. For more, see :ref:`io`. + +.. _quick-overview-datatrees: + DataTrees --------- diff --git a/doc/roadmap.rst b/doc/roadmap.rst index 4d6bebc8a5d..c065a76a925 100644 --- a/doc/roadmap.rst +++ b/doc/roadmap.rst @@ -201,6 +201,13 @@ extensions. Tree-like data structure ++++++++++++++++++++++++ +.. note:: + + After some time, the community DataTree project has now been updated and + merged into xarray exposing :py:class:`xarray.DataTree`. This is just + released and a bit experimental, but please try it out and let us know what + you think. Take a look at our :ref:`quick-overview-datatrees` quickstart. + Xarray’s highest-level object was previously an ``xarray.Dataset``, whose data model echoes that of a single netCDF group. However real-world datasets are often better represented by a collection of related Datasets. 
Particular common @@ -219,10 +226,12 @@ A new tree-like data structure, ``xarray.DataTree``, which is essentially a structured hierarchical collection of Datasets, represents these cases and instead maps to multiple netCDF groups (see :issue:`4118`). -Currently there are several libraries which have wrapped xarray in order to build -domain-specific data structures (e.g. `xarray-multiscale `__.), -but the general ``xarray.DataTree`` object obviates the need for these and -consolidates effort in a single domain-agnostic tool, much as xarray has already achieved. +Currently there are several libraries which have wrapped xarray in order to +build domain-specific data structures (e.g. `xarray-multiscale +`__.), but the general +``xarray.DataTree`` object obviates the need for these and consolidates effort +in a single domain-agnostic tool, much as xarray has already achieved. + Labeled array without coordinates +++++++++++++++++++++++++++++++++ diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 250c47d517b..16886f7a85a 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -97,7 +97,7 @@ Coordinates can be specified in the following ways: arguments for :py:class:`~xarray.Variable` * A pandas object or scalar value, which is converted into a ``DataArray`` * A 1D array or list, which is interpreted as values for a one dimensional - coordinate variable along the same dimension as it's name + coordinate variable along the same dimension as its name - A dictionary of ``{coord_name: coord}`` where values are of the same form as the list. Supplying coordinates as a dictionary allows other coordinates @@ -260,8 +260,6 @@ In this example, it would be natural to call ``temperature`` and variables" because they label the points along the dimensions. (see [1]_ for more background on this example). -.. 
_dataarray constructor: - Creating a Dataset ~~~~~~~~~~~~~~~~~~ @@ -276,7 +274,7 @@ variables (``data_vars``), coordinates (``coords``) and attributes (``attrs``). arguments for :py:class:`~xarray.Variable` * A pandas object, which is converted into a ``DataArray`` * A 1D array or list, which is interpreted as values for a one dimensional - coordinate variable along the same dimension as it's name + coordinate variable along the same dimension as its name - ``coords`` should be a dictionary of the same form as ``data_vars``. @@ -614,17 +612,15 @@ instance, if we try to create a `cycle`, where the root node is also a child of dt.parent = node3 -Alternatively you can also create a ``DataTree`` object from +Alternatively you can also create a ``DataTree`` object from: -- A dictionary mapping directory-like paths to either ``DataTree`` nodes or - data, using :py:meth:`xarray.DataTree.from_dict()`, -- A well formed netCDF or Zarr file on disk with -:py:func:`open_datatree()`. See :ref:`reading and writing files `. +- A dictionary mapping directory-like paths to either ``DataTree`` nodes or data, using :py:meth:`xarray.DataTree.from_dict()`, +- A well formed netCDF or Zarr file on disk with :py:func:`open_datatree()`. See :ref:`reading and writing files `. For data files with groups that do not not align see -:py:func:`xarray.open_groups()` or use -:py:func:`xarray.open_dataset(group='target_group')`. For more information -about coordinate alignment see :ref:`datatree-inheritance` +:py:func:`xarray.open_groups()` or target each group individually +:py:func:`xarray.open_dataset(group='groupname') `. For +more information about coordinate alignment see :ref:`datatree-inheritance` @@ -642,7 +638,7 @@ but with values given by either ``xarray.DataArray`` objects or other Iterating over keys will iterate over both the names of variables and child nodes. 
-We can also access all the data in a single node through a dataset-like view +We can also access all the data in a single node, and its inerited coordinates, through a dataset-like view .. ipython:: python @@ -658,6 +654,15 @@ as a new (and mutable) ``xarray.Dataset`` object via dt["a"].to_dataset() +This same call can be made to get only the local node variables without any +inherited ones, by setting the inherited keyword to False, but in this example +there are no inherited coordinates so the result is the same as the previous call. + +.. ipython:: python + + dt["a"].to_dataset(inherited=False) + + Like with ``Dataset``, you can access the data and coordinate variables of a node separately via the ``data_vars`` and ``coords`` attributes: @@ -702,24 +707,25 @@ DataTree Inheritance ~~~~~~~~~~~~~~~~~~~~ DataTree implements a simple inheritance mechanism. Coordinates and their -associated indices are propagated from each node downward starting from the -root node. Coordinate inheritance was inspired by the NetCDF-CF inherited -dimensions, but DataTree's inheritance is slightly stricter and easier to -reason about. +associated indices are propagated from downward starting from the root node to +all descendent nodes. Coordinate inheritance was inspired by the NetCDF-CF +inherited dimensions, but DataTree's inheritance is slightly stricter yet +easier to reason about. The constraint that this puts on a DataTree is that dimensions and indices that -are inherited must be aligned with any child's existing dimension or index. -This allows child nodes to use dimensions defined in ancestor nodes, without -duplicating that information, but on the flip side if a dimension dimname is -defined in on a node and that same dimname dimension in one of it's ancestors, -they must align (have the same index and size). +are inherited must be aligned with any direct decendent node's existing +dimension or index. 
This allows descendants to use dimensions defined in +ancestor nodes, without duplicating that information. But as a consequence, if +a dimension with a given name is defined on a node and a dimension with that same name +exists in one of its ancestors, they must align (have the same index and +size). Some examples: .. ipython:: python # Set up coordinates - times = xr.DataArray(data=["2022-01", "2023-01"], dims="time") + time = xr.DataArray(data=["2022-01", "2023-01"], dims="time") stations = xr.DataArray(data=list("abcdef"), dims="station") lon = [-100, -80, -60] lat = [10, 20, 30] @@ -728,29 +734,29 @@ Some examples: wind_speed = xr.DataArray(np.ones((2, 6)) * 2, dims=("time", "station")) pressure = xr.DataArray(np.ones((2, 6)) * 3, dims=("time", "station")) air_temperature = xr.DataArray(np.ones((2, 6)) * 4, dims=("time", "station")) - dewpoint_temp = xr.DataArray(np.ones((2, 6)) * 5, dims=("time", "station")) + dewpoint = xr.DataArray(np.ones((2, 6)) * 5, dims=("time", "station")) infrared = xr.DataArray(np.ones((2, 3, 3)) * 6, dims=("time", "lon", "lat")) true_color = xr.DataArray(np.ones((2, 3, 3)) * 7, dims=("time", "lon", "lat")) dt2 = xr.DataTree.from_dict( { "/": xr.Dataset( - coords={"time": times}, + coords={"time": time}, ), - "/weather_data": xr.Dataset( + "/weather": xr.Dataset( coords={"station": stations}, data_vars={ "wind_speed": wind_speed, "pressure": pressure, }, ), - "/weather_data/temperature": xr.Dataset( + "/weather/temperature": xr.Dataset( data_vars={ "air_temperature": air_temperature, - "dewpoint_temp": dewpoint_temp, + "dewpoint": dewpoint, }, ), - "/satellite_image": xr.Dataset( + "/satellite": xr.Dataset( coords={"lat": lat, "lon": lon}, data_vars={ "infrared": infrared, @@ -765,21 +771,21 @@ Here there are four different coordinate variables, which apply to variables in ``time`` is a shared coordinate used by both ``weather`` and ``satellite`` variables ``station`` is used only for ``weather`` variables -``lat`` and ``lon`` are only use
for ``satellite images`` +``lat`` and ``lon`` are only used for ``satellite`` images Coordinate variables are inherited to descendent nodes, which means that variables at different levels of a hierarchical DataTree are always aligned. Placing the ``time`` variable at the root node automatically indicates that it applies to all descendent nodes. Similarly, ``station`` is in the base -``weather_data`` node, because it applies to all weather variables, both directly -in ``weather_data`` and in the ``temperature`` sub-tree. +``weather`` node, because it applies to all weather variables, both directly +in ``weather`` and in the ``temperature`` sub-tree. Accessing any of the lower level trees as an ``xarray.Dataset`` would -automatically include coordinates from higher levels (e.g., time): +automatically include coordinates from higher levels (e.g., ``time`` and ``station``): .. ipython:: python - dt2["/weather_data/temperature"].ds + dt2["/weather/temperature"].ds .. _coordinates: diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index 4d0d80d8763..773344f16b6 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -263,7 +263,7 @@ complete examples, please consult the relevant documentation.* :term:`variables`, :term:`dimensions`, :term:`coordinates`, and attributes. - The nodes in a tree are linked to one another, and each node is it's own instance of + The nodes in a tree are linked to one another, and each node is its own instance of ``DataTree`` object. Each node can have zero or more *children* (stored in a dictionary-like manner under their corresponding *names*), and those child nodes can themselves have children. If a node is a child of another node that other node is said to be its *parent*.
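The inheritance rule described above can be illustrated with a small, self-contained Python model. This is a hypothetical sketch, not xarray's implementation. ``ToyNode``, ``lineage`` and ``inherited_coords`` are invented for this illustration. The model tracks only coordinate values, not dimension sizes or indexes. It merges coordinates from the root downward and raises ``ValueError`` when a node redefines an inherited coordinate with different values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a DataTree node that only tracks coordinates."""

    name: str
    coords: Dict[str, List[object]] = field(default_factory=dict)
    parent: Optional["ToyNode"] = field(default=None, repr=False)

    def lineage(self) -> List["ToyNode"]:
        """Return the nodes from the root down to (and including) this node."""
        chain: List["ToyNode"] = []
        node: Optional["ToyNode"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain[::-1]

    def inherited_coords(self) -> Dict[str, List[object]]:
        """Merge coordinates from the root downward.

        Raises ValueError when a node redefines a coordinate that an ancestor
        already defines with different values (a toy version of the alignment rule).
        """
        merged: Dict[str, List[object]] = {}
        for node in self.lineage():
            for dim, values in node.coords.items():
                if dim in merged and merged[dim] != values:
                    raise ValueError(
                        f"node {node.name!r}: coordinate {dim!r} does not align with an ancestor"
                    )
                merged[dim] = values
        return merged


# Mirror the weather example: "time" at the root, "station" on "weather".
root = ToyNode("/", {"time": ["2022-01", "2023-01"]})
weather = ToyNode("weather", {"station": list("abcdef")}, parent=root)
temperature = ToyNode("temperature", parent=weather)
print(sorted(temperature.inherited_coords()))  # ['station', 'time']
```

Because the merge runs in root-to-leaf order, ``temperature`` never stores its own copy of ``time`` or ``station``. Only a conflicting redefinition is rejected.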
diff --git a/xarray/backends/api.py b/xarray/backends/api.py index ece60a2b161..2f3002898dd 100644 --- a/xarray/backends/api.py +++ b/xarray/backends/api.py @@ -1669,7 +1669,7 @@ def to_zarr( _validate_dataset_names(dataset) if zarr_version is None: - # default to 2 if store doesn't specify it's version (e.g. a path) + # default to 2 if store doesn't specify its version (e.g. a path) zarr_version = int(getattr(store, "_store_version", 2)) if consolidated is None and zarr_version > 2: From 3529b87153130c362b2dd31ef69b0b85b3e99871 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Mon, 29 Jul 2024 11:37:16 -0600 Subject: [PATCH 22/57] DAS-2155: Cleans what I can from hierarichal-data.rst. --- doc/user-guide/hierarchical-data.rst | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index e1b748deb3c..63392cdb130 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -34,7 +34,8 @@ Often datasets like this cannot easily fit into a single :py:class:`xarray.Datas or are more usefully thought of as groups of related ``xarray.Dataset`` objects. For this purpose we provide the :py:class:`xarray.DataTree` class. -This page explains in detail how to understand and use the different features of the :py:class:`xarray.DataTree` class for your own hierarchical data needs. +This page explains in detail how to understand and use the different features +of the :py:class:`xarray.DataTree` class for your own hierarchical data needs. .. _node relationships: @@ -95,7 +96,7 @@ That's good - updating the properties of our nodes does not break the internal c These children obviously have another parent, Marge Simpson, but ``DataTree`` nodes can only have a maximum of one parent. 
Genealogical `family trees are not even technically trees `_ in the mathematical sense - - the fact that distant relatives can mate makes it a directed acyclic graph. + the fact that distant relatives can mate makes them directed acyclic graphs. Trees of ``DataTree`` objects cannot represent this. Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his :py:class:`~xarray.DataTree.parent` property: @@ -126,7 +127,7 @@ We can add Herbert to the family tree without displacing Homer by :py:meth:`~xar .. ipython:: python herbert = xr.DataTree(name="Herb") - abe.assign({"Herbert": herbert}) + abe = abe.assign({"Herbert": herbert}) .. note:: This example shows a minor subtlety - the returned tree has Homer's brother listed as ``"Herbert"``, @@ -319,9 +320,9 @@ Given two nodes in a tree, we can also find their relative path: bart.relative_to(lisa) -You can use this filepath feature to build a nested tree from a dictionary of filesystem-like paths and corresponding ``xarray.Dataset`` objects in a single step. +You can use this filepath feature to build a nested tree from a dictionary of filesystem-like paths and corresponding :py:class:`~xarray.Dataset` objects in a single step. If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``, -we can construct a complex tree quickly using the alternative constructor :py:meth:`DataTree.from_dict()`: +we can construct a complex tree quickly using the alternative constructor :py:meth:`~xarray.DataTree.from_dict()`: .. ipython:: python @@ -337,7 +338,7 @@ we can construct a complex tree quickly using the alternative constructor :py:me .. note:: Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path - (i.e. the node labelled `"c"` in this case.) + (i.e. the node labelled `"/a/c"` in this case.) 
This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`xarray.DataTree.from_dict`. .. _iterating over trees: @@ -363,7 +364,7 @@ then rebuilding a new tree using only the paths of those nodes: .. ipython:: python non_empty_nodes = {node.path: node.ds for node in dt.subtree if node.has_data} - DataTree.from_dict(non_empty_nodes) + xr.DataTree.from_dict(non_empty_nodes) You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``. @@ -437,7 +438,7 @@ A concept that can sometimes be useful is that of a "Hollow Tree", which means a This is useful because certain useful tree manipulation operations only make sense for hollow trees. You can check if a tree is a hollow tree by using the :py:class:`~xarray.DataTree.is_hollow` property. -We can see that the Simpson's family is not hollow because the data variable ``"age"`` is present at some nodes which +We can see that the Simpson's family is not hollow because the data variable ``age`` is present at some nodes which have children (i.e. Abe and Homer). .. ipython:: python @@ -564,7 +565,7 @@ Then calculate the RMS value of these signals: .. _multiple trees: -We can also use the :py:func:`map_over_subtree` decorator to promote a function which accepts datasets into one which +We can also use the :py:func:`~xarray.map_over_subtree` decorator to promote a function which accepts datasets into one which accepts datatrees. Operating on Multiple Trees @@ -579,7 +580,7 @@ Comparing Trees for Isomorphism For it to make sense to map a single non-unary function over the nodes of multiple trees at once, each tree needs to have the same structure. Specifically two trees can only be considered similar, or "isomorphic", if they have the same number of nodes, and each corresponding node has the same number of children. -We can check if any two trees are isomorphic using the :py:meth:`DataTree.isomorphic` method. 
+We can check if any two trees are isomorphic using the :py:meth:`~xarray.DataTree.isomorphic` method.
+We can check if any two trees are isomorphic using the :py:meth:`~xarray.DataTree.isomorphic` method. .. ipython:: python :okexcept: @@ -594,7 +595,7 @@ We can check if any two trees are isomorphic using the :py:meth:`DataTree.isomor dt4 = xr.DataTree.from_dict({"A": None, "A/B": xr.Dataset({"foo": 1})}) dt1.isomorphic(dt4) -If the trees are not isomorphic a :py:class:`~TreeIsomorphismError` will be raised. +If the trees are not isomorphic a :py:class:`~xarray.TreeIsomorphismError` will be raised. Notice that corresponding tree nodes do not need to have the same name or contain the same data in order to be considered isomorphic. Arithmetic Between Multiple Trees From 87597e92e9b1a164993019e1b6d7b5ce0d541476 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Mon, 29 Jul 2024 11:43:40 -0600 Subject: [PATCH 23/57] DAS-2155: Fix bad merge --- doc/whats-new.rst | 4 ---- 1 file changed, 4 deletions(-) diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 5340c45b256..e3918ba3778 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -242,10 +242,6 @@ Internal Changes consistent with their use of ``dim``. Using the existing kwarg will raise a warning. By `Maximilian Roos `_ - rather than ``dims`` or ``dimensions``. This is the final change to make xarray methods - consistent with their use of ``dim``. Using the existing kwarg will raise a - warning. By `Maximilian Roos `_ - .. _whats-new.2024.03.0: From ee7c3a9a2147f10b47b0278f5d11083fc0860995 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Mon, 29 Jul 2024 11:52:16 -0600 Subject: [PATCH 24/57] DAS-2155: Cleanup removes ``item`` for references. 
--- doc/user-guide/hierarchical-data.rst | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 63392cdb130..270b2120300 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -30,12 +30,12 @@ Examples of data which one might want organise in a grouped or hierarchical mann or even any combination of the above. -Often datasets like this cannot easily fit into a single :py:class:`xarray.Dataset` object, -or are more usefully thought of as groups of related ``xarray.Dataset`` objects. +Often datasets like this cannot easily fit into a single :py:class:`~xarray.Dataset` object, +or are more usefully thought of as groups of related :py:class:`~xarray.Dataset` objects. For this purpose we provide the :py:class:`xarray.DataTree` class. This page explains in detail how to understand and use the different features -of the :py:class:`xarray.DataTree` class for your own hierarchical data needs. +of the :py:class:`~xarray.DataTree` class for your own hierarchical data needs. .. _node relationships: @@ -47,7 +47,7 @@ Node Relationships Creating a Family Tree ~~~~~~~~~~~~~~~~~~~~~~ -The three main ways of creating a ``DataTree`` object are described briefly in :ref:`creating a datatree`. +The three main ways of creating a :py:class:`~xarray.DataTree` object are described briefly in :ref:`creating a datatree`. Here we go into more detail about how to create a tree node-by-node, using a famous family tree from the Simpsons cartoon as an example. Let's start by defining nodes representing the two siblings, Bart and Lisa Simpson: @@ -94,10 +94,10 @@ Let's check that Maggie knows who her Dad is: That's good - updating the properties of our nodes does not break the internal consistency of our tree, as changes of parentage are automatically reflected on both nodes. 
- These children obviously have another parent, Marge Simpson, but ``DataTree`` nodes can only have a maximum of one parent. + These children obviously have another parent, Marge Simpson, but :py:class:`~xarray.DataTree` nodes can only have a maximum of one parent. Genealogical `family trees are not even technically trees `_ in the mathematical sense - the fact that distant relatives can mate makes them directed acyclic graphs. - Trees of ``DataTree`` objects cannot represent this. + Trees of :py:class:`~xarray.DataTree` objects cannot represent this. Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his :py:class:`~xarray.DataTree.parent` property: @@ -134,7 +134,7 @@ We can add Herbert to the family tree without displacing Homer by :py:meth:`~xar but the original node was named "Herbert". Not only are names overriden when stored as keys like this, but the new node is a copy, so that the original node that was reference is unchanged (i.e. ``herbert.name == "Herb"`` still). In other words, nodes are copied into trees, not inserted into them. - This is intentional, and mirrors the behaviour when storing named ``xarray.DataArray`` objects inside datasets. + This is intentional, and mirrors the behaviour when storing named :py:class:`~xarray.DataArray` objects inside datasets. Certain manipulations of our tree are forbidden, if they would create an inconsistent result. In episode 51 of the show Futurama, Philip J. Fry travels back in time and accidentally becomes his own Grandfather. @@ -182,7 +182,7 @@ and :ref:`filesystem paths` (to be explained shortly) to select two nodes of int This tree shows various families of species, grouped by their common features (making it technically a `"Cladogram" `_, rather than an evolutionary tree). -Here both the species and the features used to group them are represented by ``DataTree`` node objects - there is no distinction in types of node. 
+Here both the species and the features used to group them are represented by :py:class:`~xarray.DataTree` node objects - there is no distinction in types of node. We can however get a list of only the nodes we used to represent species by using the fact that all those nodes have no children - they are "leaf nodes". We can check if a node is a leaf with :py:meth:`~xarray.DataTree.is_leaf`, and get a list of all leaves with the :py:class:`~xarray.DataTree.leaves` property: @@ -243,7 +243,7 @@ including :py:meth:`~xarray.DataTree.keys`, :py:class:`~xarray.DataTree.values`, vertebrates["Bony Skeleton"]["Ray-finned Fish"] -Note that the dict-like interface combines access to child ``DataTree`` nodes and stored ``DataArrays``, +Note that the dict-like interface combines access to child :py:class:`~xarray.DataTree` nodes and stored :py:class:`~xarray.DataArray` objects, so if we have a node that contains both children and data, calling :py:meth:`~xarray.DataTree.keys` will list both names of child nodes and names of data variables: @@ -368,7 +368,7 @@ then rebuilding a new tree using only the paths of those nodes: You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``. -(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.) +(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:meth:`~xarray.DataTree.from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.) .. _manipulating trees: @@ -450,7 +450,7 @@ have children (i.e. Abe and Homer). Computation ----------- -``DataTree`` objects are also useful for performing computations, not just for organizing data. +:py:class:`~xarray.DataTree` objects are also useful for performing computations, not just for organizing data.
Operations and Methods on Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 421c404c59c18ab36bfb2ab9fe1db016a154d9ad Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Mon, 29 Jul 2024 14:52:43 -0600 Subject: [PATCH 25/57] DAS-2155: Replaces a bunch of ``item`` with references. Also adds back the ``"/"`` style for some quoted strings. --- doc/getting-started-guide/quick-overview.rst | 22 +++--- doc/user-guide/data-structures.rst | 70 ++++++++++---------- doc/user-guide/hierarchical-data.rst | 2 +- 3 files changed, 47 insertions(+), 47 deletions(-) diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index 8f363a4c34a..76ffb0c8184 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -235,7 +235,7 @@ It is common for datasets to be distributed across multiple files (commonly one DataTrees --------- -:py:class:`xarray.DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually alignable groups. You can think of it like a (recursive) ``dict`` of :py:class:`xarray.Dataset` objects, where coordinate variables and their indexes are inherited down to children. +:py:class:`xarray.DataTree` is a tree-like container of :py:class:`~xarray.DataArray` objects, organised into multiple mutually alignable groups. You can think of it like a (recursive) ``dict`` of :py:class:`~xarray.Dataset` objects, where coordinate variables and their indexes are inherited down to children. Let's first make some example xarray datasets: @@ -267,11 +267,11 @@ Now we'll put these datasets into a hierarchical DataTree: dt This created a DataTree with nested groups. We have one root group, containing information about individual -people. This root group can be named, but here is unnamed, so is referred to with ``/``, same as the root of a +people. 
This root group can be named, but here is unnamed, so is referred to with ``"/"``, same as the root of a unix-like filesystem. The root group then has one subgroup ``simulation``, which contains no data itself but does contain another two subgroups, named ``fine`` and ``coarse``. -The (sub)subgroups ``fine`` and ``coarse`` contain two very similar datasets. They both have an ``x`` +The (sub)subgroups ``fine`` and ``coarse`` contain two very similar datasets. They both have an ``"x"`` dimension, but the dimension is of different lengths in each group, which makes the data in each group unalignable. In the root group we placed some completely unrelated information, in order to show how a tree can store heterogenous data. @@ -281,8 +281,8 @@ child nodes. You can see this inheritance in the above representation of the Da ``people`` and ``species`` defined in the root ``/`` node are shown in the child nodes both ``/simulation/coarse`` and ``/simulation/fine``. All coordinates in parent-descendent lineage must be alignable to form a DataTree. If your input data is not aligned, you can still get a nested ``dict`` of -``Dataset`` objects with :py:func:`~xarray.open_groups` and then apply any required changes to ensure alignment -before converting to a ``DataTree``. +:py:class:`~xarray.Dataset` objects with :py:func:`~xarray.open_groups` and then apply any required changes to ensure alignment +before converting to a :py:class:`~xarray.DataTree`. The constraints on each group are the same as the constraint on DataArrays within a single dataset with the addition of requiring parent-descendent coordinate agreement. @@ -294,20 +294,20 @@ individual DataArrays in a similar fashion. dt["simulation/coarse/foo"] -We can also view the data in a particular group as a readonly ``DatasetView`` using ``.ds``: +We can also view the data in a particular group as a readonly :py:class:`~xarray.core.datatree.DatasetView` using :py:attr:`~xarray.DataTree.ds`: .. 
ipython:: python dt["simulation/coarse"].ds -We can get a copy of the ``Dataset`` including the inherited coordinates by calling the ``.to_dataset`` method: +We can get a copy of the :py:class:`~xarray.Dataset` including the inherited coordinates by calling the :py:meth:`~xarray.DataTree.to_dataset` method: .. ipython:: python ds_inherited = dt["simulation/coarse"].to_dataset() ds_inherited -And you can get a copy of just the node local values of ``Dataset`` by setting the ``inherited`` keyword to ``False``: +And you can get a copy of just the node-local values as a :py:class:`~xarray.Dataset` by setting the ``inherited`` keyword to ``False``: .. ipython:: python @@ -321,10 +321,10 @@ Operations map over subtrees, so we can take a mean over the ``x`` dimension of avg = dt["simulation"].mean(dim="x") avg -Here the ``x`` dimension used is always the one local to that subgroup. +Here the ``"x"`` dimension used is always the one local to that subgroup. -You can do almost everything you can do with ``Dataset`` objects with ``DataTree`` objects +You can do almost everything you can do with :py:class:`~xarray.Dataset` objects with :py:class:`~xarray.DataTree` objects (including indexing and arithmetic), as operations will be mapped over every subgroup in the tree. This allows you to work with multiple groups of non-alignable variables at once. @@ -333,4 +333,4 @@ This allows you to work with multiple groups of non-alignable variables at once. If all of your variables are mutually alignable (i.e. they live on the same grid, such that every common dimension name maps to the same length), then you probably don't need :py:class:`xarray.DataTree`, and should consider - just sticking with ``xarray.Dataset``. + just sticking with :py:class:`xarray.Dataset`.
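The nested layout described in the quick overview has three parts: a root group, a ``simulation`` group with no data of its own, and that group's ``fine`` and ``coarse`` subgroups. This layout is implied by path-like keys such as those accepted by ``xr.DataTree.from_dict``. The helper below, ``expand_paths``, is a hypothetical pure-Python sketch, not xarray code. It only shows how intermediate groups fall out of the paths. The small dictionaries stand in for datasets, and their contents are made up.

```python
from typing import Dict, Optional


def expand_paths(tree_dict: Dict[str, Optional[dict]]) -> Dict[str, Optional[dict]]:
    """Return every node path implied by path-like keys, including implicit parents.

    Hypothetical helper for illustration only; it is not xarray's implementation
    of ``DataTree.from_dict``. Implicit intermediate nodes map to ``None``.
    """
    nodes: Dict[str, Optional[dict]] = {"/": None}
    for path, data in tree_dict.items():
        parts = [part for part in path.split("/") if part]
        # Register each ancestor that was not listed explicitly (keep any data already set).
        for depth in range(1, len(parts)):
            nodes.setdefault("/" + "/".join(parts[:depth]), None)
        nodes["/" + "/".join(parts)] = data
    return nodes


# Same shape as the quick-overview tree: "simulation" is only implied by its children.
layout = expand_paths(
    {
        "/": {"people": ["alice", "bob"]},
        "/simulation/coarse": {"x": [0, 2]},
        "/simulation/fine": {"x": [0, 1, 2]},
    }
)
print(sorted(layout))  # ['/', '/simulation', '/simulation/coarse', '/simulation/fine']
```

The empty ``simulation`` entry is why that group shows up in the tree without holding any data itself.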
diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 16886f7a85a..acbcb2509bf 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -496,16 +496,16 @@ dimension and non-dimension variables: DataTree -------- -:py:class:`DataTree` is ``xarray``'s highest-level data structure, able to +:py:class:`~xarray.DataTree` is ``xarray``'s highest-level data structure, able to organise heterogeneous data which could not be stored inside a single -:py:class:`Dataset` object. This includes representing the recursive structure +:py:class:`~xarray.Dataset` object. This includes representing the recursive structure of multiple `groups`_ within a netCDF file or `Zarr Store`_. .. _groups: https://www.unidata.ucar.edu/software/netcdf/workshops/2011/groups-types/GroupsIntro.html .. _Zarr Store: https://zarr.readthedocs.io/en/stable/tutorial.html#groups -Each ``DataTree`` object (or "node") contains the same data that a single -``xarray.Dataset`` would (i.e. ``DataArray`` objects stored under hashable +Each :py:class:`~xarray.DataTree` object (or "node") contains the same data that a single +:py:class:`xarray.Dataset` would (i.e. :py:class:`~xarray.DataArray` objects stored under hashable keys), and so has the same key properties: - ``dims``: a dictionary mapping of dimension names to lengths, for the @@ -516,17 +516,17 @@ keys), and so has the same key properties: coordinate variables in this node, and this node's ancestors, - ``attrs``: dict to hold arbitary metadata relevant to data in this node. -A single ``DataTree`` object acts much like a single ``Dataset`` object, and -has a similar set of dict-like methods defined upon it. However, ``DataTree``\s -can also contain other ``DataTree`` objects, so they can be thought of as -nested dict-like containers of both ``xarray.DataArray``\s and ``DataTree``\s. 
+A single :py:class:`~xarray.DataTree` object acts much like a single :py:class:`~xarray.Dataset` object, and +has a similar set of dict-like methods defined upon it. However, :py:class:`~xarray.DataTree`\s +can also contain other :py:class:`~xarray.DataTree` objects, so they can be thought of as +nested dict-like containers of both :py:class:`xarray.DataArray`\s and :py:class:`~xarray.DataTree`\s. A single datatree object is known as a "node", and its position relative to other nodes is defined by two more key properties: -- ``children``: An ordered dictionary mapping from names to other ``DataTree`` +- ``children``: An ordered dictionary mapping from names to other :py:class:`~xarray.DataTree` objects, known as its "child nodes". -- ``parent``: The single ``DataTree`` object whose children this datatree is a +- ``parent``: The single :py:class:`~xarray.DataTree` object whose children this datatree is a member of, known as its "parent node". Each child automatically knows about its parent node, and a node without a @@ -539,15 +539,15 @@ otherwise known as a `"Tree" .. note:: - Technically a ``DataTree`` with more than one child node forms an + Technically a :py:class:`~xarray.DataTree` with more than one child node forms an `"Ordered Tree" `_, because the children are stored in an Ordered Dictionary. However, this distinction only really matters for a few edge cases involving operations on multiple trees simultaneously, and can safely be ignored by most users. -``DataTree`` objects can also optionally have a ``name`` as well as ``attrs``, -just like a ``DataArray``. Again these are not normally used unless explicitly +:py:class:`~xarray.DataTree` objects can also optionally have a ``name`` as well as ``attrs``, +just like a :py:class:`~xarray.DataArray`. Again these are not normally used unless explicitly accessed by the user. @@ -556,16 +556,16 @@ accessed by the user.
Creating a DataTree ~~~~~~~~~~~~~~~~~~~ -One way to create a ``DataTree`` from scratch is to create each node individually, +One way to create a :py:class:`~xarray.DataTree` from scratch is to create each node individually, specifying the nodes' relationship to one another as you create each one. -The ``DataTree`` constructor takes: +The :py:class:`~xarray.DataTree` constructor takes: - ``data``: The data that will be stored in this node, represented by a single - ``xarray.Dataset``, or a named ``xarray.DataArray``. -- ``parent``: The parent node (if there is one), given as a ``DataTree`` object. + :py:class:`xarray.Dataset`, or a named :py:class:`xarray.DataArray`. +- ``parent``: The parent node (if there is one), given as a :py:class:`~xarray.DataTree` object. - ``children``: The various child nodes (if there are any), given as a mapping - from string keys to ``DataTree`` objects. + from string keys to :py:class:`~xarray.DataTree` objects. - ``name``: A string to use as the name of this node. Let's make a single datatree node with some example data in it: @@ -612,13 +612,13 @@ instance, if we try to create a `cycle`, where the root node is also a child of dt.parent = node3 -Alternatively you can also create a ``DataTree`` object from: +Alternatively you can also create a :py:class:`~xarray.DataTree` object from: -- A dictionary mapping directory-like paths to either ``DataTree`` nodes or data, using :py:meth:`xarray.DataTree.from_dict()`, -- A well formed netCDF or Zarr file on disk with :py:func:`open_datatree()`. See :ref:`reading and writing files `. +- A dictionary mapping directory-like paths to either :py:class:`~xarray.DataTree` nodes or data, using :py:meth:`xarray.DataTree.from_dict()`, +- A well formed netCDF or Zarr file on disk with :py:func:`~xarray.open_datatree()`. See :ref:`reading and writing files `.
For data files with groups that do not not align see -:py:func:`xarray.open_groups()` or target each group individually +:py:func:`xarray.open_groups` or target each group individually :py:func:`xarray.open_dataset(group='groupname') `. For more information about coordinate alignment see :ref:`datatree-inheritance` @@ -627,9 +627,9 @@ more information about coordinate alignment see :ref:`datatree-inheritance` DataTree Contents ~~~~~~~~~~~~~~~~~ -Like ``xarray.Dataset``, ``xarray.DataTree`` implements the python mapping interface, -but with values given by either ``xarray.DataArray`` objects or other -``DataTree`` objects. +Like :py:class:`~xarray.Dataset`, :py:class:`~xarray.DataTree` implements the python mapping interface, +but with values given by either :py:class:`~xarray.DataArray` objects or other +:py:class:`~xarray.DataTree` objects. .. ipython:: python @@ -645,10 +645,10 @@ We can also access all the data in a single node, and its inerited coordinates, dt["a"].ds This demonstrates the fact that the data in any one node is equivalent to the -contents of a single ``xarray.Dataset`` object. The ``DataTree.ds`` property +contents of a single :py:class:`~xarray.Dataset` object. The :py:attr:`DataTree.ds ` property returns an immutable view, but we can instead extract the node's data contents -as a new (and mutable) ``xarray.Dataset`` object via -:py:meth:`xarray.DataTree.to_dataset()`: +as a new and mutable :py:class:`~xarray.Dataset` object via +:py:meth:`DataTree.to_dataset() `: .. 
ipython:: python @@ -663,8 +663,8 @@ there are no inherited coordinates so the result is the same as the previous cal dt["a"].to_dataset(inherited=False) -Like with ``Dataset``, you can access the data and coordinate variables of a -node separately via the ``data_vars`` and ``coords`` attributes: +Like with :py:class:`~xarray.Dataset`, you can access the data and coordinate variables of a +node separately via the :py:attr:`~xarray.DataTree.data_vars` and :py:attr:`~xarray.DataTree.coords` attributes: .. ipython:: python @@ -687,17 +687,17 @@ datatree from scratch, we could have written: dt["a/b/zed"] = np.nan dt -To change the variables in a node of a ``DataTree``, you can use all the +To change the variables in a node of a :py:class:`~xarray.DataTree`, you can use all the standard dictionary methods, including ``values``, ``items``, ``__delitem__``, ``get`` and :py:meth:`xarray.DataTree.update`. -Note that assigning a ``DataArray`` object to a ``DataTree`` variable using -``__setitem__`` or ``update`` will :ref:`automatically align ` the +Note that assigning a :py:class:`~xarray.DataArray` object to a :py:class:`~xarray.DataTree` variable using +``__setitem__`` or :py:meth:`~xarray.DataTree.update` will :ref:`automatically align ` the array(s) to the original node's indexes. -If you copy a ``DataTree`` using the :py:func:`copy` function or the +If you copy a :py:class:`~xarray.DataTree` using the :py:func:`copy` function or the :py:meth:`xarray.DataTree.copy` method it will copy the subtree, meaning that node and children below it, but no parents above it. -Like for ``Dataset``, this copy is shallow by default, but you can copy all the +Like for :py:class:`~xarray.Dataset`, this copy is shallow by default, but you can copy all the underlying data arrays by calling ``dt.copy(deep=True)``.
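The subtree-copy behaviour described above has three parts: descendants are copied, ancestors are not, and the copy is shallow unless ``deep=True``. It can be sketched with a hypothetical toy class. ``ToyTree`` is invented for illustration and is not xarray code. It stores plain dicts rather than ``DataArray`` objects. A shallow copy here gets new containers that share the same values, and a deep copy duplicates the values as well.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class ToyTree:
    """Hypothetical minimal tree node holding a dict of variables."""

    data: dict = field(default_factory=dict)
    children: Dict[str, "ToyTree"] = field(default_factory=dict)
    parent: Optional["ToyTree"] = field(default=None, repr=False)

    def add_child(self, name: str, child: "ToyTree") -> "ToyTree":
        child.parent = self
        self.children[name] = child
        return child

    def copy(self, deep: bool = False) -> "ToyTree":
        """Copy this node and everything below it, but not its ancestors."""
        # Shallow: a new dict that shares the same value objects.
        # Deep: the values themselves are duplicated as well.
        data = deepcopy(self.data) if deep else dict(self.data)
        new = ToyTree(data=data)  # parent stays None, so ancestors are dropped
        for name, child in self.children.items():
            new.add_child(name, child.copy(deep=deep))
        return new


root = ToyTree({"a": [1, 2]})
b = root.add_child("b", ToyTree({"zed": [0.0]}))
b.add_child("c", ToyTree())
shallow = b.copy()
deep = b.copy(deep=True)
print(shallow.parent is None, shallow.data["zed"] is b.data["zed"])  # True True
print(deep.data["zed"] is b.data["zed"])  # False
```

Dropping the parent link is what keeps the copy a self-contained subtree rooted at the copied node.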
diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 270b2120300..c6a154249b1 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -438,7 +438,7 @@ A concept that can sometimes be useful is that of a "Hollow Tree", which means a This is useful because certain useful tree manipulation operations only make sense for hollow trees. You can check if a tree is a hollow tree by using the :py:class:`~xarray.DataTree.is_hollow` property. -We can see that the Simpson's family is not hollow because the data variable ``age`` is present at some nodes which +We can see that the Simpson's family is not hollow because the data variable ``"age"`` is present at some nodes which have children (i.e. Abe and Homer). .. ipython:: python From 49cc828b3c71b8a0f2f74269fafccd19cdf5f081 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Tue, 30 Jul 2024 08:51:15 -0600 Subject: [PATCH 26/57] DAS-2155: hand merge whats-new.rst --- doc/whats-new.rst | 101 +++++++++++++++++++++++++++++++--------------- 1 file changed, 68 insertions(+), 33 deletions(-) diff --git a/doc/whats-new.rst b/doc/whats-new.rst index b23ddcc7b5c..d05e9ebb512 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -15,20 +15,66 @@ What's New np.random.seed(123456) -.. _whats-new.2024.06.1: +.. _whats-new.2024.07.1: -v2024.06.1 (unreleased) +v2024.07.1 (unreleased) ----------------------- New Features ~~~~~~~~~~~~ +- ``DataTree`` related functionality is now exposed in the main ``xarray`` public + API. This includes: ``xarray.DataTree``, ``xarray.open_datatree``, + ``xarray.map_over_subtree``, ``xarray.register_datatree_accessor`` and + ``xarray.testing.assert_isomorphic``. + By `Owen Littlejohns `_ and + `Tom Nicholas `_. 
+ + +Breaking changes +~~~~~~~~~~~~~~~~ + + +Deprecations +~~~~~~~~~~~~ + + +Bug fixes +~~~~~~~~~ + + +Documentation +~~~~~~~~~~~~~ +- Migrate documentation for ``datatree`` into main ``xarray`` documentation (:pull:`9033`). + For information on previous ``datatree`` releases, please see: + `datatree's historical release notes `_. + By `Owen Littlejohns `_, `Matt Savoie `_, and + `Tom Nicholas `_. + + + +Internal Changes +~~~~~~~~~~~~~~~~ + + +.. _whats-new.2024.07.0: + +v2024.07.0 (Jul 30, 2024) +------------------------- +This release extends the API for groupby operations with various `grouper objects `, and includes improvements to the documentation and numerous bugfixes. + +Thanks to the 22 contributors to this release: +Alfonso Ladino, ChrisCleaner, David Hoese, Deepak Cherian, Dieter Werthmüller, Illviljan, Jessica Scheick, Joel Jaeschke, Justus Magin, K. Arthur Endsley, Kai Mühlbauer, Mark Harfouche, Martin Raspaud, Mathijs Verhaegh, Maximilian Roos, Michael Niklas, Michał Górny, Moritz Schreiber, Pontus Lurcock, Spencer Clark, Stephan Hoyer and Tom Nicholas + +New Features +~~~~~~~~~~~~ - Use fastpath when grouping both montonically increasing and decreasing variable - in :py:class:`GroupBy` (:issue:`6220`, :pull:`7427`). By `Joel Jaeschke `_. + in :py:class:`GroupBy` (:issue:`6220`, :pull:`7427`). + By `Joel Jaeschke `_. - Introduce new :py:class:`groupers.UniqueGrouper`, :py:class:`groupers.BinGrouper`, and :py:class:`groupers.TimeResampler` objects as a step towards supporting grouping by - multiple variables. See the `docs ` and the - `grouper design doc `_ for more. + multiple variables. See the `docs ` and the `grouper design doc + `_ for more. (:issue:`6610`, :pull:`8840`). By `Deepak Cherian `_. - Allow rechunking to a frequency using ``Dataset.chunk(time=TimeResampler("YE"))`` syntax. (:issue:`7559`, :pull:`9109`) @@ -39,12 +85,6 @@ New Features By `Mathijs Verhaegh `_.
- Allow chunking for arrays with duplicated dimension names (:issue:`8759`, :pull:`9099`). By `Martin Raspaud `_. -- ``DataTree`` related functionality is now exposed in the main ``xarray`` public - API. This includes: ``xarray.DataTree``, ``xarray.open_datatree``, - ``xarray.map_over_subtree``, ``xarray.register_datatree_accessor`` and - ``xarray.testing.assert_isomorphic``. - By `Owen Littlejohns `_ and - `Tom Nicholas `_. - Extract the source url from fsspec objects (:issue:`9142`, :pull:`8923`). By `Justus Magin `_. - Add :py:meth:`DataArray.drop_attrs` & :py:meth:`Dataset.drop_attrs` methods, @@ -54,15 +94,17 @@ New Features Breaking changes ~~~~~~~~~~~~~~~~ -- The ``base`` and ``loffset`` parameters to :py:meth:`Dataset.resample` and :py:meth:`DataArray.resample` - is now removed. These parameters has been deprecated since v2023.03.0. Using the - ``origin`` or ``offset`` parameters is recommended as a replacement for using - the ``base`` parameter and using time offset arithmetic is recommended as a - replacement for using the ``loffset`` parameter. -- The ``squeeze`` kwarg to ``groupby`` is completely deprecated. This has been the source of some quite confusing - behaviour and has been deprecated since v2024.01.0. `groupby`` behavior is now always consistent - with the existing ``.groupby(..., squeeze=False)`` behavior. - By `Deepak Cherian `_. (:pull:`9280`) +- The ``base`` and ``loffset`` parameters to :py:meth:`Dataset.resample` and + :py:meth:`DataArray.resample` are now removed. These parameters have been deprecated since + v2023.03.0. Using the ``origin`` or ``offset`` parameters is recommended as a replacement for + using the ``base`` parameter and using time offset arithmetic is recommended as a replacement for + using the ``loffset`` parameter. (:pull:`9233`) + By `Deepak Cherian `_. +- The ``squeeze`` kwarg to ``groupby`` is now ignored. This has been the source of some + quite confusing behaviour and has been deprecated since v2024.01.0. 
``groupby`` behavior is now + always consistent with the existing ``.groupby(..., squeeze=False)`` behavior. No errors will + be raised if ``squeeze=False``. (:pull:`9280`) + By `Deepak Cherian `_. Bug fixes @@ -81,29 +123,22 @@ Bug fixes By `Justus Magin `_. - Address regression introduced in :pull:`9002` that prevented objects returned by py:meth:`DataArray.convert_calendar` to be indexed by a time index in - certain circumstances (:issue:`9138`, :pull:`9192`). By `Mark Harfouche - `_ and `Spencer Clark - `. - -- Fiy static typing of tolerance arguments by allowing `str` type (:issue:`8892`, :pull:`9194`). + certain circumstances (:issue:`9138`, :pull:`9192`). + By `Mark Harfouche `_ and `Spencer Clark `_. +- Fix static typing of tolerance arguments by allowing `str` type (:issue:`8892`, :pull:`9194`). By `Michael Niklas `_. - Dark themes are now properly detected for ``html[data-theme=dark]``-tags (:pull:`9200`). By `Dieter Werthmüller `_. - Reductions no longer fail for ``np.complex_`` dtype arrays when numbagg is - installed. - By `Maximilian Roos `_ + installed. (:pull:`9210`) + By `Maximilian Roos `_. Documentation ~~~~~~~~~~~~~ -- Migrate documentation for ``datatree`` into main ``xarray`` documentation (:pull:`9033`). - For information on previous ``datatree`` releases, please see: - `datatree's historical release notes `_. - By `Owen Littlejohns `_ and - `Tom Nicholas `_. - Adds intro to backend section of docs, including a flow-chart to navigate types of backends (:pull:`9175`). By `Jessica Scheick `_. -- Adds a flow-chart diagram to help users navigate help resources (`Discussion #8990 `_, :pull:`9147`). +- Adds a flow-chart diagram to help users navigate help resources (:discussion:`8990`, :pull:`9147`). By `Jessica Scheick `_. - Improvements to Zarr & chunking docs (:pull:`9139`, :pull:`9140`, :pull:`9132`) By `Maximilian Roos `_.
From fb8eae1ff9c25ddf22413a48f548ac9a9ad86b68 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 31 Jul 2024 09:33:16 -0600 Subject: [PATCH 27/57] DAS-2155: updates developer meeting notes link. (unrelated) --- doc/developers-meeting.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/developers-meeting.rst b/doc/developers-meeting.rst index 153f3520f26..edf8af72059 100644 --- a/doc/developers-meeting.rst +++ b/doc/developers-meeting.rst @@ -5,7 +5,7 @@ Xarray developers meet bi-weekly every other Wednesday. The meeting occurs on `Zoom `__. -Find the `notes for the meeting here `__. +Find the `notes for the meeting here `__. There is a :issue:`GitHub issue for changes to the meeting<4001>`. From ec341ad8b9ea4c362b8c7ae0f9b7ca9aa6782bf9 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 31 Jul 2024 15:14:36 -0600 Subject: [PATCH 28/57] DAS-2155: These are the changes merged from #9298 --- doc/user-guide/io.rst | 10 ++++++++++ xarray/core/datatree.py | 13 ++++++++++++- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst index c48336ffa80..07de0619c73 100644 --- a/doc/user-guide/io.rst +++ b/doc/user-guide/io.rst @@ -241,6 +241,12 @@ use the :py:func:`xarray.open_datatree` function. To save a DataTree object as a netCDF file containing many groups, use the :py:meth:`xarray.DataTree.to_netcdf` method. +.. _netcdf.root_group.note: + +.. note:: + Due to file format specifications the on-disk root group name is always ``"/"``, + overriding any given ``DataTree`` root node name. + .. _netcdf.group.warning: .. warning:: @@ -989,6 +995,10 @@ a tree of groups use the :py:func:`open_datatree` function. To save a store (:ref:`unlike for netCDF files `), as zarr does not support "unused" dimensions. + For the root group the same restrictions (:ref:`as for netCDF files `) apply. 
+ Due to file format specifications the on-disk root group name is always ``"/"`` + overriding any given ``DataTree`` root node name. + .. _io.zarr.consolidated_metadata: diff --git a/xarray/core/datatree.py b/xarray/core/datatree.py index bdba0e15e03..a33b680ac48 100644 --- a/xarray/core/datatree.py +++ b/xarray/core/datatree.py @@ -1546,7 +1546,13 @@ def to_netcdf( ``dask.delayed.Delayed`` object that can be computed later. Currently, ``compute=False`` is not supported. kwargs : - Addional keyword arguments to be passed to ``xarray.Dataset.to_netcdf`` + Additional keyword arguments to be passed to ``xarray.Dataset.to_netcdf`` + + Note + ---- + Due to file format specifications the on-disk root group name + is always ``"/"`` overriding any given ``DataTree`` root node name. + """ from xarray.core.datatree_io import _datatree_to_netcdf @@ -1602,6 +1608,11 @@ def to_zarr( supported. kwargs : Additional keyword arguments to be passed to ``xarray.Dataset.to_zarr`` + + Note + ---- + Due to file format specifications the on-disk root group name + is always ``"/"`` overriding any given ``DataTree`` root node name. """ from xarray.core.datatree_io import _datatree_to_zarr From fa459d9347a4b5cf896afb14b153b0de7e42c9ab Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Fri, 2 Aug 2024 11:54:48 -0600 Subject: [PATCH 29/57] DAS-2155: Format for merge --- xarray/core/datatree.py | 1 - 1 file changed, 1 deletion(-) diff --git a/xarray/core/datatree.py b/xarray/core/datatree.py index 8f10d69ca4f..4d647c9fd8d 100644 --- a/xarray/core/datatree.py +++ b/xarray/core/datatree.py @@ -1557,7 +1557,6 @@ def to_netcdf( ---- Due to file format specifications the on-disk root group name is always ``"/"`` overriding any given ``DataTree`` root node name. 
- """ from xarray.core.datatree_io import _datatree_to_netcdf From 3ce24c5f90339b1bd0e2fb817e3058f7a7d4396e Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Tue, 20 Aug 2024 23:33:30 -0400 Subject: [PATCH 30/57] DAS-2155: Adds open_groups to the API and docs --- doc/api.rst | 1 + xarray/__init__.py | 2 ++ 2 files changed, 3 insertions(+) diff --git a/doc/api.rst b/doc/api.rst index df6d95c1be0..63b53d9ffbe 100644 --- a/doc/api.rst +++ b/doc/api.rst @@ -953,6 +953,7 @@ DataTree methods :toctree: generated/ open_datatree + open_groups map_over_subtree DataTree.to_dict DataTree.to_netcdf diff --git a/xarray/__init__.py b/xarray/__init__.py index 26563fded93..e3b7ec469e9 100644 --- a/xarray/__init__.py +++ b/xarray/__init__.py @@ -7,6 +7,7 @@ open_dataarray, open_dataset, open_datatree, + open_groups, open_mfdataset, save_mfdataset, ) @@ -91,6 +92,7 @@ "open_dataarray", "open_dataset", "open_datatree", + "open_groups", "open_mfdataset", "open_zarr", "polyval", From c98be70751a92ba477cfc64b1013392f3a35fb34 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Mon, 2 Sep 2024 11:12:43 -0600 Subject: [PATCH 31/57] DAS-2155: Drop sphinx back even further. The ci/requirements/doc.yaml wouldn't resolve with the pin to 6.2.1 because docutils requirement docutils >=0.18.1,<0.20 but there are no viable options. 
--- ci/requirements/doc.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ci/requirements/doc.yml b/ci/requirements/doc.yml index ad48490427f..d18dd4a7fef 100644 --- a/ci/requirements/doc.yml +++ b/ci/requirements/doc.yml @@ -39,7 +39,7 @@ dependencies: - sphinx-copybutton - sphinx-design - sphinx-inline-tabs - - sphinx=6.2.1 # sphinx-book-theme issue: 749 + - sphinx>=5.0 - sphinxcontrib-srclinks - sphinx-remove-toctrees - sphinxext-opengraph From 508a5c77f7324a998778440100eedc88f9e2b8a2 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 11 Sep 2024 16:58:57 -0600 Subject: [PATCH 32/57] DAS-2155: Cleaning up data-structures.rst for new shallow parent updates --- doc/user-guide/data-structures.rst | 51 ++++++++++++++++-------------- xarray/datatree_/readthedocs.yml | 7 ---- 2 files changed, 28 insertions(+), 30 deletions(-) delete mode 100644 xarray/datatree_/readthedocs.yml diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index b3bc9984089..06fb9ac8366 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -573,44 +573,47 @@ Let's make a single datatree node with some example data in it: .. ipython:: python ds1 = xr.Dataset({"foo": "orange"}) - dt = xr.DataTree(name="root", data=ds1) # create root node + dt = xr.DataTree(name="root", dataset=ds1) # create root node dt At this point our node is also the root node, as every tree has a root node. -We can add a second node to this tree either by referring to the first node in -the constructor of the second: +We can add (a copy of) a second node to this tree, assigning it to the parent node dt: .. 
ipython:: python - ds2 = xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}) - # add a child by referring to the parent node - node2 = xr.DataTree(name="a", parent=dt, data=ds2) + dataset2 = xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}) + dt2 = xr.DataTree(name="a", dataset=dataset2) + # Make the second datatree a child of the original + dt.children = {"child-node": dt2} + dt + -or by dynamically updating the properties of one node to refer to another: +Or more idomatically you can create a tree from a dictionary of Datasets and +DataTrees. In this case we add a new node under ``dt``s 'child-node' by +providing the explicit path under 'child-node' as the dictionary key: .. ipython:: python - # add a second child by first creating a new node ... + # create a third Dataset ds3 = xr.Dataset({"zed": np.nan}) - node3 = xr.DataTree(name="b", data=ds3) - # ... then updating its .parent property - node3.parent = dt + # create a tree from a dictionary of DataTrees and Datasets + dt = xr.DataTree.from_dict({"/": dt, "/child-node/new-zed-node": ds3}) -Our tree now has three nodes within it: +We have created a tree with three nodes in it: .. ipython:: python dt -It is at tree construction time that consistency checks are enforced. For -instance, if we try to create a `cycle`, where the root node is also a child of a decendent, the constructor will raise an -(:py:class:`~xarray.InvalidTreeError`): +Consistency checks are enforced. For instance, if we try to create a `cycle`, +where the root node is also a child of a decendent, the constructor will raise +an (:py:class:`~xarray.InvalidTreeError`): .. ipython:: python :okexcept: - dt.parent = node3 + dt.children = {"child": dt} Alternatively you can also create a :py:class:`~xarray.DataTree` object from: @@ -633,7 +636,7 @@ but with values given by either :py:class:`~xarray.DataArray` objects or other .. 
ipython:: python - dt["a"] + dt["child-node"] dt["foo"] Iterating over keys will iterate over both the names of variables and child nodes. @@ -642,7 +645,7 @@ We can also access all the data in a single node, and its inerited coordinates, .. ipython:: python - dt["a"].ds + dt["child-node"].dataset This demonstrates the fact that the data in any one node is equivalent to the contents of a single :py:class:`~xarray.Dataset` object. The :py:attr:`DataTree.ds ` property @@ -652,7 +655,7 @@ as a new and mutable :py:class:`~xarray.Dataset` object via .. ipython:: python - dt["a"].to_dataset() + dt["child-node"].to_dataset() This same call can be made to get only the local node variables without any inherited ones, by setting the inherited keyword to False, but in this example @@ -660,7 +663,7 @@ there are no inherited coordinates so the result is the same as the previous cal .. ipython:: python - dt["a"].to_dataset(inherited=False) + dt["child-node"].to_dataset(inherited=False) Like with :py:class:`~xarray.Dataset`, you can access the data and coordinate variables of a @@ -668,8 +671,8 @@ node separately via the :py:attr:`~xarray.DataTree.data_vars` and :py:attr:`~xar .. 
ipython:: python - dt["a"].data_vars - dt["a"].coords + dt["child-node"].data_vars + dt["child-node"].coords Dictionary-like methods @@ -683,7 +686,9 @@ datatree from scratch, we could have written: dt = xr.DataTree(name="root") dt["foo"] = "orange" - dt["a"] = xr.DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})) + dt["a"] = xr.DataTree( + dataset=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}) + ) dt["a/b/zed"] = np.nan dt diff --git a/xarray/datatree_/readthedocs.yml b/xarray/datatree_/readthedocs.yml deleted file mode 100644 index 9b04939c898..00000000000 --- a/xarray/datatree_/readthedocs.yml +++ /dev/null @@ -1,7 +0,0 @@ -version: 2 -conda: - environment: ci/doc.yml -build: - os: 'ubuntu-20.04' - tools: - python: 'mambaforge-4.10' From 3f20d34568faa8ac56748808ef36fed1e3a370e8 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 11 Sep 2024 17:02:47 -0600 Subject: [PATCH 33/57] DAS-2155: fix merge --- doc/whats-new.rst | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 2dcbe671df9..8b3a3ce1794 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -44,6 +44,13 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Migrate documentation for ``datatree`` into main ``xarray`` documentation (:pull:`9033`). + For information on previous ``datatree`` releases, please see: + `datatree's historical release notes `_. + By `Owen Littlejohns `_, `Matt Savoie `_, and + `Tom Nicholas `_. + + Internal Changes ~~~~~~~~~~~~~~~~ @@ -135,12 +142,6 @@ Performance Documentation ~~~~~~~~~~~~~ -- Migrate documentation for ``datatree`` into main ``xarray`` documentation (:pull:`9033`). - For information on previous ``datatree`` releases, please see: - `datatree's historical release notes `_. - By `Owen Littlejohns `_, `Matt Savoie `_, and - `Tom Nicholas `_. 
- Internal Changes ~~~~~~~~~~~~~~~~ From 6bfc31b525509e8ab4f3ad08ae9e726e7a274e53 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 11 Sep 2024 17:25:34 -0600 Subject: [PATCH 34/57] DAS-2155: work around sphinx-book-theme/#749 --- ci/requirements/doc.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ci/requirements/doc.yml b/ci/requirements/doc.yml index d18dd4a7fef..183aa28d703 100644 --- a/ci/requirements/doc.yml +++ b/ci/requirements/doc.yml @@ -39,7 +39,7 @@ dependencies: - sphinx-copybutton - sphinx-design - sphinx-inline-tabs - - sphinx>=5.0 + - sphinx>=5.0,<7.0 # https://github.com/executablebooks/sphinx-book-theme/issues/749 - sphinxcontrib-srclinks - sphinx-remove-toctrees - sphinxext-opengraph From 15d16aa979bd811d4c5e342dc36daade7f4d3531 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 11 Sep 2024 18:42:21 -0600 Subject: [PATCH 35/57] DAS-2155: change datatree.ds -> datatree.dataset Also wording changes in data-structures.rst --- doc/api.rst | 2 +- doc/getting-started-guide/quick-overview.rst | 4 ++-- doc/user-guide/data-structures.rst | 23 +++++++++++++------- doc/user-guide/hierarchical-data.rst | 2 +- doc/user-guide/terminology.rst | 2 +- 5 files changed, 20 insertions(+), 13 deletions(-) diff --git a/doc/api.rst b/doc/api.rst index e10407557b6..87f116514cc 100644 --- a/doc/api.rst +++ b/doc/api.rst @@ -650,7 +650,7 @@ This interface echoes that of ``xarray.Dataset``. DataTree.encoding DataTree.indexes DataTree.nbytes - DataTree.ds + DataTree.dataset DataTree.to_dataset DataTree.has_data DataTree.has_attrs diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index 76ffb0c8184..5d7fb48b23e 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -294,11 +294,11 @@ individual DataArrays in a similar fashion. 
dt["simulation/coarse/foo"] -We can also view the data in a particular group as a readonly :py:class:`~xarray.datatree.DatasetView` using :py:attr:`xarray.datatree.ds`: +We can also view the data in a particular group as a readonly :py:class:`~xarray.datatree.DatasetView` using :py:attr:`xarray.datatree.dataset`: .. ipython:: python - dt["simulation/coarse"].ds + dt["simulation/coarse"].dataset We can get a copy of the :py:class:`~xarray.Dataset` including the inherited coordinates by calling the :py:class:`~xarray.datatree.to_dataset` method: diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 06fb9ac8366..17e18acc4a8 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -539,7 +539,7 @@ otherwise known as a `"Tree" .. note:: - Technically a `:py:class:`~xarray.DataTree` with more than one child node forms an + Technically a :py:class:`~xarray.DataTree` with more than one child node forms an `"Ordered Tree" `_, because the children are stored in an Ordered Dictionary. However, this distinction only really matters for a few edge cases involving operations @@ -559,7 +559,7 @@ Creating a DataTree One way to create a :py:class:`~xarray.DataTree` from scratch is to create each node individually, specifying the nodes' relationship to one another as you create each one. -The :py:class:`~xarray.DataTree`` constructor takes: +The :py:class:`~xarray.DataTree` constructor takes: - ``data``: The data that will be stored in this node, represented by a single :py:class:`xarray.Dataset`, or a named :py:class:`xarray.DataArray`. @@ -576,9 +576,14 @@ Let's make a single datatree node with some example data in it: dt = xr.DataTree(name="root", dataset=ds1) # create root node dt -At this point our node is also the root node, as every tree has a root node. +At this point we have created a single node datatree with no parent and no children. 
-We can add (a copy of) a second node to this tree, assigning it to the parent node dt: +.. ipython:: python + + dt.parent is None + dt.children + +We can add a copy of a second node to this tree, assigning it to the parent node ``dt``: .. ipython:: python @@ -590,8 +595,8 @@ We can add (a copy of) a second node to this tree, assigning it to the parent no Or more idomatically you can create a tree from a dictionary of Datasets and -DataTrees. In this case we add a new node under ``dt``s 'child-node' by -providing the explicit path under 'child-node' as the dictionary key: +DataTrees. In this case we add a new node under ``dt["child-node"]`` by +providing the explicit path under ``"child-node"`` as the dictionary key: .. ipython:: python @@ -606,6 +611,8 @@ We have created a tree with three nodes in it: dt + + Consistency checks are enforced. For instance, if we try to create a `cycle`, where the root node is also a child of a decendent, the constructor will raise an (:py:class:`~xarray.InvalidTreeError`): @@ -648,7 +655,7 @@ We can also access all the data in a single node, and its inerited coordinates, dt["child-node"].dataset This demonstrates the fact that the data in any one node is equivalent to the -contents of a single :py:class:`~xarray.Dataset` object. The :py:attr:`DataTree.ds ` property +contents of a single :py:class:`~xarray.Dataset` object. The :py:attr:`DataTree.dataset ` property returns an immutable view, but we can instead extract the node's data contents as a new and mutable :py:class:`~xarray.Dataset` object via :py:meth:`DataTree.to_dataset() `: @@ -790,7 +797,7 @@ automatically include coordinates from higher levels (e.g., ``time`` and ``stati .. ipython:: python - dt2["/weather/temperature"].ds + dt2["/weather/temperature"].dataset .. 
_coordinates: diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 7f4f05f8280..ceed9f0959a 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -363,7 +363,7 @@ then rebuilding a new tree using only the paths of those nodes: .. ipython:: python - non_empty_nodes = {node.path: node.ds for node in dt.subtree if node.has_data} + non_empty_nodes = {node.path: node.dataset for node in dt.subtree if node.has_data} xr.DataTree.from_dict(non_empty_nodes) You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``. diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst index a11c97de55a..f453fd400d6 100644 --- a/doc/user-guide/terminology.rst +++ b/doc/user-guide/terminology.rst @@ -258,7 +258,7 @@ complete examples, please consult the relevant documentation.* DataTree A tree-like collection of ``Dataset`` objects. A *tree* is made up of one or more *nodes*, - each of which can store the same information as a single ``Dataset`` (accessed via ``.ds``). + each of which can store the same information as a single ``Dataset`` (accessed via ``.dataset``). This data is stored in the same way as in a ``Dataset``, i.e. in the form of data :term:`variables`, :term:`dimensions`, :term:`coordinates`, and attributes. From 06aa8291bd617be158642d555ac547dda5eafafe Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 11 Sep 2024 20:33:28 -0600 Subject: [PATCH 36/57] DAS-2155: Lots of simpsons fixes for shallow parentage. 
--- doc/user-guide/data-structures.rst | 18 ++++++++----- doc/user-guide/hierarchical-data.rst | 40 ++++++++++++++++------------ 2 files changed, 34 insertions(+), 24 deletions(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 17e18acc4a8..6be3f74d6d7 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -13,6 +13,10 @@ Data Structures np.random.seed(123456) np.set_printoptions(threshold=10) + %xmode minimal + + + DataArray --------- @@ -589,13 +593,13 @@ We can add a copy of a second node to this tree, assigning it to the parent node dataset2 = xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}) dt2 = xr.DataTree(name="a", dataset=dataset2) - # Make the second datatree a child of the original + # Add a copy of the second DataTree to the root dt.children = {"child-node": dt2} dt -Or more idomatically you can create a tree from a dictionary of Datasets and -DataTrees. In this case we add a new node under ``dt["child-node"]`` by +Or more idiomatically you can create a tree from a dictionary of ``Datasets`` and +``DataTrees``. In this case we add a new node under ``dt["child-node"]`` by providing the explicit path under ``"child-node"`` as the dictionary key: .. ipython:: python @@ -620,7 +624,7 @@ an (:py:class:`~xarray.InvalidTreeError`): .. ipython:: python :okexcept: - dt.children = {"child": dt} + dt["child-node"].children = {"new-child": dt} Alternatively you can also create a :py:class:`~xarray.DataTree` object from: @@ -628,7 +632,7 @@ Alternatively you can also create a :py:class:`~xarray.DataTree` object from: - A well formed netCDF or Zarr file on disk with :py:func:`~xarray.open_datatree()`. See :ref:`reading and writing files `. For data files with groups that do not not align see -:py:func:`xarray.open_group` or target each group individually +:py:func:`xarray.open_groups` or target each group individually :py:func:`xarray.open_dataset(group='groupname') `.
For more information about coordinate alignment see :ref:`datatree-inheritance` @@ -693,10 +697,10 @@ datatree from scratch, we could have written: dt = xr.DataTree(name="root") dt["foo"] = "orange" - dt["a"] = xr.DataTree( + dt["child-node"] = xr.DataTree( dataset=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}) ) - dt["a/b/zed"] = np.nan + dt["child-node/new-zed-node/zed"] = np.nan dt To change the variables in a node of a :py:class:`~xarray.DataTree`, you can use all the diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index ceed9f0959a..b43d5ec8af0 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -104,7 +104,7 @@ Homer is currently listed as having no parent (the so-called "root node" of this .. ipython:: python abe = xr.DataTree(name="Abe") - homer.parent = abe + abe.children = {"Homer": homer} Abe is now the "root" of this tree, which we can see by examining the :py:class:`~xarray.DataTree.root` property of any node in the tree @@ -117,9 +117,8 @@ We can see the whole tree by printing Abe's node or just part of the tree by pri .. ipython:: python abe - homer + abe["Homer"] -We can see that Homer is aware of his parentage, and we say that Homer and his children form a "subtree" of the larger Simpson family tree. In episode 28, Abe Simpson reveals that he had another son, Herbert "Herb" Simpson. We can add Herbert to the family tree without displacing Homer by :py:meth:`~xarray.DataTree.assign`-ing another child to Abe: @@ -128,10 +127,14 @@ We can add Herbert to the family tree without displacing Homer by :py:meth:`~xar herbert = xr.DataTree(name="Herb") abe = abe.assign({"Herbert": herbert}) + abe + + abe["Herbert"].name + herbert.name .. note:: - This example shows a minor subtlety - the returned tree has Homer's brother listed as ``"Herbert"``, - but the original node was named "Herbert". 
Not only are names overridden when stored as keys like this, + This example shows a subtlety - the returned tree has Homer's brother listed as ``"Herbert"``, + but the original node was named "Herb". Not only are names overridden when stored as keys like this, but the new node is a copy, so that the original node that was reference is unchanged (i.e. ``herbert.name == "Herb"`` still). In other words, nodes are copied into trees, not inserted into them. This is intentional, and mirrors the behaviour when storing named :py:class:`~xarray.DataArray` objects inside datasets. @@ -143,7 +146,7 @@ If we try similar time-travelling hijinks with Homer, we get a :py:class:`~xarra .. ipython:: python :okexcept: - abe.parent = homer + abe["Homer"].children = {"Abe": abe} .. _evolutionary tree: @@ -155,8 +158,7 @@ Let's use a different example of a tree to discuss more complex relationships be .. ipython:: python vertebrates = xr.DataTree.from_dict( - name="Vertebrae", - d={ + { "/Sharks": None, "/Bony Skeleton/Ray-finned Fish": None, "/Bony Skeleton/Four Limbs/Amphibians": None, @@ -165,6 +167,7 @@ Let's use a different example of a tree to discuss more complex relationships be "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs": None, "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds": None, }, + name="Vertebrae", ) primates = vertebrates["/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates"] @@ -172,7 +175,7 @@ Let's use a different example of a tree to discuss more complex relationships be "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs" ] -We have used the :py:meth:`~xarray.DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree, +We have used the :py:meth:`~xarray.DataTree.from_dict` constructor method as a prefered way to quickly create a whole tree, and :ref:`filesystem paths` (to be explained shortly) to select two nodes of interest. .. 
ipython:: python @@ -250,7 +253,7 @@ names of data variables: .. ipython:: python dt = xr.DataTree( - data=xr.Dataset({"foo": 0, "bar": 1}), + dataset=xr.Dataset({"foo": 0, "bar": 1}), children={"a": xr.DataTree(), "b": xr.DataTree()}, ) print(dt) @@ -303,8 +306,11 @@ The root node is referred to by ``"/"``, so the path from the root node to its g .. ipython:: python - # absolute path will start from root node - lisa["/Homer/Bart"].name + # access lisa's sibling by a relative path. + lisa["../Bart"] + # or from absolute path + lisa["/Homer/Bart"] + Relative paths between nodes also support the ``"../"`` syntax to mean the parent of the current node. We can use this with ``__setitem__`` to add a missing entry to our evolutionary tree, but add it relative to a more familiar node of interest: @@ -338,7 +344,7 @@ we can construct a complex tree quickly using the alternative constructor :py:me .. note:: Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path - (i.e. the node labelled `"/a/c"` in this case.) + (i.e. the node labelled ``"/a/c"`` in this case.) This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`xarray.DataTree.from_dict`. .. _iterating over trees: @@ -404,7 +410,7 @@ First lets recreate the tree but with an `age` data variable in every node: .. ipython:: python simpsons = xr.DataTree.from_dict( - d={ + { "/": xr.Dataset({"age": 83}), "/Herbert": xr.Dataset({"age": 40}), "/Homer": xr.Dataset({"age": 39}), @@ -542,13 +548,13 @@ Mapping Custom Functions Over Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can map custom computation over each node in a tree using :py:meth:`xarray.DataTree.map_over_subtree`. 
-You can map any function, so long as it takes `xarray.Dataset` objects as one (or more) of the input arguments, +You can map any function, so long as it takes :py:class:`xarray.Dataset` objects as one (or more) of the input arguments, and returns one (or more) xarray datasets. .. note:: - Functions passed to :py:func:`map_over_subtree` cannot alter nodes in-place. - Instead they must return new `xarray.Dataset` objects. + Functions passed to :py:meth:`~xarray.DataTree.map_over_subtree` cannot alter nodes in-place. + Instead they must return new :py:class:`xarray.Dataset` objects. For example, we can define a function to calculate the Root Mean Square of a timeseries From e8464b3008c6edec86930c9f5b313c19f1e11545 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 11 Sep 2024 20:45:10 -0600 Subject: [PATCH 37/57] DAS-2155: Delete datatree_ directory --- xarray/datatree_/.flake8 | 15 -- xarray/datatree_/.git_archival.txt | 4 - xarray/datatree_/.github/dependabot.yml | 11 - .../.github/pull_request_template.md | 7 - xarray/datatree_/.github/workflows/main.yaml | 97 --------- .../.github/workflows/pypipublish.yaml | 84 -------- xarray/datatree_/.gitignore | 136 ------------ xarray/datatree_/.pre-commit-config.yaml | 58 ----- xarray/datatree_/LICENSE | 201 ------------------ xarray/datatree_/README.md | 95 --------- xarray/datatree_/ci/doc.yml | 25 --- xarray/datatree_/ci/environment.yml | 16 -- xarray/datatree_/codecov.yml | 21 -- xarray/datatree_/conftest.py | 3 - xarray/datatree_/datatree/__init__.py | 7 - xarray/datatree_/datatree/py.typed | 0 16 files changed, 780 deletions(-) delete mode 100644 xarray/datatree_/.flake8 delete mode 100644 xarray/datatree_/.git_archival.txt delete mode 100644 xarray/datatree_/.github/dependabot.yml delete mode 100644 xarray/datatree_/.github/pull_request_template.md delete mode 100644 xarray/datatree_/.github/workflows/main.yaml delete mode 100644 xarray/datatree_/.github/workflows/pypipublish.yaml delete mode 100644
xarray/datatree_/.gitignore delete mode 100644 xarray/datatree_/.pre-commit-config.yaml delete mode 100644 xarray/datatree_/LICENSE delete mode 100644 xarray/datatree_/README.md delete mode 100644 xarray/datatree_/ci/doc.yml delete mode 100644 xarray/datatree_/ci/environment.yml delete mode 100644 xarray/datatree_/codecov.yml delete mode 100644 xarray/datatree_/conftest.py delete mode 100644 xarray/datatree_/datatree/__init__.py delete mode 100644 xarray/datatree_/datatree/py.typed diff --git a/xarray/datatree_/.flake8 b/xarray/datatree_/.flake8 deleted file mode 100644 index f1e3f9271e1..00000000000 --- a/xarray/datatree_/.flake8 +++ /dev/null @@ -1,15 +0,0 @@ -[flake8] -ignore = - # whitespace before ':' - doesn't work well with black - E203 - # module level import not at top of file - E402 - # line too long - let black worry about that - E501 - # do not assign a lambda expression, use a def - E731 - # line break before binary operator - W503 -exclude= - .eggs - doc diff --git a/xarray/datatree_/.git_archival.txt b/xarray/datatree_/.git_archival.txt deleted file mode 100644 index 3994ec0a83e..00000000000 --- a/xarray/datatree_/.git_archival.txt +++ /dev/null @@ -1,4 +0,0 @@ -node: $Format:%H$ -node-date: $Format:%cI$ -describe-name: $Format:%(describe:tags=true)$ -ref-names: $Format:%D$ diff --git a/xarray/datatree_/.github/dependabot.yml b/xarray/datatree_/.github/dependabot.yml deleted file mode 100644 index d1d1190be70..00000000000 --- a/xarray/datatree_/.github/dependabot.yml +++ /dev/null @@ -1,11 +0,0 @@ -version: 2 -updates: - - package-ecosystem: pip - directory: "/" - schedule: - interval: daily - - package-ecosystem: "github-actions" - directory: "/" - schedule: - # Check for updates to GitHub Actions every weekday - interval: "daily" diff --git a/xarray/datatree_/.github/pull_request_template.md b/xarray/datatree_/.github/pull_request_template.md deleted file mode 100644 index 8270498108a..00000000000 --- 
a/xarray/datatree_/.github/pull_request_template.md +++ /dev/null @@ -1,7 +0,0 @@ - - -- [ ] Closes #xxxx -- [ ] Tests added -- [ ] Passes `pre-commit run --all-files` -- [ ] New functions/methods are listed in `api.rst` -- [ ] Changes are summarized in `docs/source/whats-new.rst` diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml deleted file mode 100644 index 37034fc5900..00000000000 --- a/xarray/datatree_/.github/workflows/main.yaml +++ /dev/null @@ -1,97 +0,0 @@ -name: CI - -on: - push: - branches: - - main - pull_request: - branches: - - main - schedule: - - cron: "0 0 * * *" - -jobs: - - test: - name: ${{ matrix.python-version }}-build - runs-on: ubuntu-latest - defaults: - run: - shell: bash -l {0} - strategy: - matrix: - python-version: ["3.9", "3.10", "3.11", "3.12"] - steps: - - uses: actions/checkout@v4 - - - name: Create conda environment - uses: mamba-org/provision-with-micromamba@main - with: - cache-downloads: true - micromamba-version: 'latest' - environment-file: ci/environment.yml - extra-specs: | - python=${{ matrix.python-version }} - - - name: Conda info - run: conda info - - - name: Install datatree - run: | - python -m pip install -e . 
--no-deps --force-reinstall - - - name: Conda list - run: conda list - - - name: Running Tests - run: | - python -m pytest --cov=./ --cov-report=xml --verbose - - - name: Upload code coverage to Codecov - uses: codecov/codecov-action@v3.1.4 - with: - file: ./coverage.xml - flags: unittests - env_vars: OS,PYTHON - name: codecov-umbrella - fail_ci_if_error: false - - - test-upstream: - name: ${{ matrix.python-version }}-dev-build - runs-on: ubuntu-latest - defaults: - run: - shell: bash -l {0} - strategy: - matrix: - python-version: ["3.9", "3.10", "3.11", "3.12"] - steps: - - uses: actions/checkout@v4 - - - name: Create conda environment - uses: mamba-org/provision-with-micromamba@main - with: - cache-downloads: true - micromamba-version: 'latest' - environment-file: ci/environment.yml - extra-specs: | - python=${{ matrix.python-version }} - - - name: Conda info - run: conda info - - - name: Install dev reqs - run: | - python -m pip install --no-deps --upgrade \ - git+https://github.com/pydata/xarray \ - git+https://github.com/Unidata/netcdf4-python - - python -m pip install -e . 
--no-deps --force-reinstall - - - name: Conda list - run: conda list - - - name: Running Tests - run: | - python -m pytest --verbose diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml deleted file mode 100644 index 7dc36d87691..00000000000 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ /dev/null @@ -1,84 +0,0 @@ -name: Build distribution -on: - release: - types: - - published - push: - branches: - - main - pull_request: - branches: - - main - -concurrency: - group: ${{ github.workflow }}-${{ github.ref }} - cancel-in-progress: true - -jobs: - build-artifacts: - runs-on: ubuntu-latest - if: github.repository == 'xarray-contrib/datatree' - steps: - - uses: actions/checkout@v4 - with: - fetch-depth: 0 - - uses: actions/setup-python@v5 - name: Install Python - with: - python-version: 3.9 - - - name: Install dependencies - run: | - python -m pip install --upgrade pip - python -m pip install build - - - name: Build tarball and wheels - run: | - git clean -xdf - git restore -SW . - python -m build --sdist --wheel . 
- - - - uses: actions/upload-artifact@v4 - with: - name: releases - path: dist - - test-built-dist: - needs: build-artifacts - runs-on: ubuntu-latest - steps: - - uses: actions/setup-python@v5 - name: Install Python - with: - python-version: '3.10' - - uses: actions/download-artifact@v4 - with: - name: releases - path: dist - - name: List contents of built dist - run: | - ls -ltrh - ls -ltrh dist - - - name: Verify the built dist/wheel is valid - run: | - python -m pip install --upgrade pip - python -m pip install dist/xarray_datatree*.whl - python -c "import datatree; print(datatree.__version__)" - - upload-to-pypi: - needs: test-built-dist - if: github.event_name == 'release' - runs-on: ubuntu-latest - steps: - - uses: actions/download-artifact@v4 - with: - name: releases - path: dist - - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@v1.8.11 - with: - user: ${{ secrets.PYPI_USERNAME }} - password: ${{ secrets.PYPI_PASSWORD }} - verbose: true diff --git a/xarray/datatree_/.gitignore b/xarray/datatree_/.gitignore deleted file mode 100644 index 88af9943a90..00000000000 --- a/xarray/datatree_/.gitignore +++ /dev/null @@ -1,136 +0,0 @@ -# Byte-compiled / optimized / DLL files -__pycache__/ -*.py[cod] -*$py.class - -# C extensions -*.so - -# Distribution / packaging -.Python -build/ -develop-eggs/ -dist/ -downloads/ -eggs/ -.eggs/ -lib/ -lib64/ -parts/ -sdist/ -var/ -wheels/ -pip-wheel-metadata/ -share/python-wheels/ -*.egg-info/ -.installed.cfg -*.egg -MANIFEST - -# PyInstaller -# Usually these files are written by a python script from a template -# before PyInstaller builds the exe, so as to inject date/other infos into it. 
-*.manifest -*.spec - -# Installer logs -pip-log.txt -pip-delete-this-directory.txt - -# Unit test / coverage reports -htmlcov/ -.tox/ -.nox/ -.coverage -.coverage.* -.cache -nosetests.xml -coverage.xml -*.cover -*.py,cover -.hypothesis/ -.pytest_cache/ - -# Translations -*.mo -*.pot - -# Django stuff: -*.log -local_settings.py -db.sqlite3 -db.sqlite3-journal - -# Flask stuff: -instance/ -.webassets-cache - -# Scrapy stuff: -.scrapy - -# Sphinx documentation -docs/_build/ -docs/source/generated - -# PyBuilder -target/ - -# Jupyter Notebook -.ipynb_checkpoints - -# IPython -profile_default/ -ipython_config.py - -# pyenv -.python-version - -# pipenv -# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. -# However, in case of collaboration, if having platform-specific dependencies or dependencies -# having no cross-platform support, pipenv may install dependencies that don't work, or not -# install all needed dependencies. -#Pipfile.lock - -# PEP 582; used by e.g. 
github.com/David-OConnor/pyflow -__pypackages__/ - -# Celery stuff -celerybeat-schedule -celerybeat.pid - -# SageMath parsed files -*.sage.py - -# Environments -.env -.venv -env/ -venv/ -ENV/ -env.bak/ -venv.bak/ - -# Spyder project settings -.spyderproject -.spyproject - -# Rope project settings -.ropeproject - -# mkdocs documentation -/site - -# mypy -.mypy_cache/ -.dmypy.json -dmypy.json - -# Pyre type checker -.pyre/ - -# version -_version.py - -# Ignore vscode specific settings -.vscode/ diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml deleted file mode 100644 index ea73c38d73e..00000000000 --- a/xarray/datatree_/.pre-commit-config.yaml +++ /dev/null @@ -1,58 +0,0 @@ -# https://pre-commit.com/ -ci: - autoupdate_schedule: monthly -repos: - - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.5.0 - hooks: - - id: trailing-whitespace - - id: end-of-file-fixer - - id: check-yaml - # isort should run before black as black sometimes tweaks the isort output - - repo: https://github.com/PyCQA/isort - rev: 5.13.2 - hooks: - - id: isort - # https://github.com/python/black#version-control-integration - - repo: https://github.com/psf/black - rev: 23.12.1 - hooks: - - id: black - - repo: https://github.com/keewis/blackdoc - rev: v0.3.9 - hooks: - - id: blackdoc - - repo: https://github.com/PyCQA/flake8 - rev: 6.1.0 - hooks: - - id: flake8 - # - repo: https://github.com/Carreau/velin - # rev: 0.0.8 - # hooks: - # - id: velin - # args: ["--write", "--compact"] - - repo: https://github.com/pre-commit/mirrors-mypy - rev: v1.8.0 - hooks: - - id: mypy - # Copied from setup.cfg - exclude: "properties|asv_bench|docs" - additional_dependencies: [ - # Type stubs - types-python-dateutil, - types-pkg_resources, - types-PyYAML, - types-pytz, - # Dependencies that are typed - numpy, - typing-extensions>=4.1.0, - ] - # run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194 - # - repo: 
https://github.com/asottile/pyupgrade - # rev: v1.22.1 - # hooks: - # - id: pyupgrade - # args: - # - "--py3-only" - # # remove on f-strings in Py3.7 - # - "--keep-percent-format" diff --git a/xarray/datatree_/LICENSE b/xarray/datatree_/LICENSE deleted file mode 100644 index d68e7230919..00000000000 --- a/xarray/datatree_/LICENSE +++ /dev/null @@ -1,201 +0,0 @@ - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files. - - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. 
- - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). - - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." - - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. 
Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. 
You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. 
- - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. 
In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. - - END OF TERMS AND CONDITIONS - - APPENDIX: How to apply the Apache License to your work. - - To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "[]" - replaced with your own identifying information. (Don't include - the brackets!) The text should be enclosed in the appropriate - comment syntax for the file format. We also recommend that a - file or class name and description of purpose be included on the - same "printed page" as the copyright notice for easier - identification within third-party archives. 
- - Copyright (c) 2022 onwards, datatree developers - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md deleted file mode 100644 index e41a13b4cb6..00000000000 --- a/xarray/datatree_/README.md +++ /dev/null @@ -1,95 +0,0 @@ -# datatree - -| CI | [![GitHub Workflow Status][github-ci-badge]][github-ci-link] [![Code Coverage Status][codecov-badge]][codecov-link] [![pre-commit.ci status][pre-commit.ci-badge]][pre-commit.ci-link] | -| :---------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| **Docs** | [![Documentation Status][rtd-badge]][rtd-link] | -| **Package** | [![Conda][conda-badge]][conda-link] [![PyPI][pypi-badge]][pypi-link] | -| **License** | [![License][license-badge]][repo-link] | - - -**Datatree is a prototype implementation of a tree-like hierarchical data structure for xarray.** - -Datatree was born after the xarray team recognised a [need for a new hierarchical data structure](https://github.com/pydata/xarray/issues/4118), -that was more flexible than a single `xarray.Dataset` object. -The initial motivation was to represent netCDF files / Zarr stores with multiple nested groups in a single in-memory object, -but `datatree.DataTree` objects have many other uses. 
- -### DEPRECATION NOTICE - -Datatree is in the process of being merged upstream into xarray (as of [v0.0.14](https://github.com/xarray-contrib/datatree/releases/tag/v0.0.14), see xarray issue [#8572](https://github.com/pydata/xarray/issues/8572)). We are aiming to preserve the record of contributions to this repository during the migration process. However whilst we will hapily accept new PRs to this repository, this repo will be deprecated and any PRs since [v0.0.14](https://github.com/xarray-contrib/datatree/releases/tag/v0.0.14) might be later copied across to xarray without full git attribution. - -Hopefully for users the disruption will be minimal - and just mean that in some future version of xarray you only need to do `from xarray import DataTree` rather than `from datatree import DataTree`. Once the migration is complete this repository will be archived. - -### Installation -You can install datatree via pip: -```shell -pip install xarray-datatree -``` - -or via conda-forge -```shell -conda install -c conda-forge xarray-datatree -``` - -### Why Datatree? - -You might want to use datatree for: - -- Organising many related datasets, e.g. results of the same experiment with different parameters, or simulations of the same system using different models, -- Analysing similar data at multiple resolutions simultaneously, such as when doing a convergence study, -- Comparing heterogenous but related data, such as experimental and theoretical data, -- I/O with nested data formats such as netCDF / Zarr groups. - -[**Talk slides on Datatree from AMS-python 2023**](https://speakerdeck.com/tomnicholas/xarray-datatree-hierarchical-data-structures-for-multi-model-science) - -### Features - -The approach used here is based on benbovy's [`DatasetNode` example](https://gist.github.com/benbovy/92e7c76220af1aaa4b3a0b65374e233a) - the basic idea is that each tree node wraps a up to a single `xarray.Dataset`. 
The differences are that this effort: -- Uses a node structure inspired by [anytree](https://github.com/xarray-contrib/datatree/issues/7) for the tree, -- Implements path-like getting and setting, -- Has functions for mapping user-supplied functions over every node in the tree, -- Automatically dispatches *some* of `xarray.Dataset`'s API over every node in the tree (such as `.isel`), -- Has a bunch of tests, -- Has a printable representation that currently looks like this: -drawing - -### Get Started - -You can create a `DataTree` object in 3 ways: -1) Load from a netCDF file (or Zarr store) that has groups via `open_datatree()`. -2) Using the init method of `DataTree`, which creates an individual node. - You can then specify the nodes' relationships to one other, either by setting `.parent` and `.children` attributes, - or through `__get/setitem__` access, e.g. `dt['path/to/node'] = DataTree()`. -3) Create a tree from a dictionary of paths to datasets using `DataTree.from_dict()`. - -### Development Roadmap - -Datatree currently lives in a separate repository to the main xarray package. -This allows the datatree developers to make changes to it, experiment, and improve it faster. - -Eventually we plan to fully integrate datatree upstream into xarray's main codebase, at which point the [github.com/xarray-contrib/datatree](https://github.com/xarray-contrib/datatree>) repository will be archived. -This should not cause much disruption to code that depends on datatree - you will likely only have to change the import line (i.e. from ``from datatree import DataTree`` to ``from xarray import DataTree``). - -However, until this full integration occurs, datatree's API should not be considered to have the same [level of stability as xarray's](https://docs.xarray.dev/en/stable/contributing.html#backwards-compatibility). - -### User Feedback - -We really really really want to hear your opinions on datatree! 
-At this point in development, user feedback is critical to help us create something that will suit everyone's needs. -Please raise any thoughts, issues, suggestions or bugs, no matter how small or large, on the [github issue tracker](https://github.com/xarray-contrib/datatree/issues). - - -[github-ci-badge]: https://img.shields.io/github/actions/workflow/status/xarray-contrib/datatree/main.yaml?branch=main&label=CI&logo=github -[github-ci-link]: https://github.com/xarray-contrib/datatree/actions?query=workflow%3ACI -[codecov-badge]: https://img.shields.io/codecov/c/github/xarray-contrib/datatree.svg?logo=codecov -[codecov-link]: https://codecov.io/gh/xarray-contrib/datatree -[rtd-badge]: https://img.shields.io/readthedocs/xarray-datatree/latest.svg -[rtd-link]: https://xarray-datatree.readthedocs.io/en/latest/?badge=latest -[pypi-badge]: https://img.shields.io/pypi/v/xarray-datatree?logo=pypi -[pypi-link]: https://pypi.org/project/xarray-datatree -[conda-badge]: https://img.shields.io/conda/vn/conda-forge/xarray-datatree?logo=anaconda -[conda-link]: https://anaconda.org/conda-forge/xarray-datatree -[license-badge]: https://img.shields.io/github/license/xarray-contrib/datatree -[repo-link]: https://github.com/xarray-contrib/datatree -[pre-commit.ci-badge]: https://results.pre-commit.ci/badge/github/xarray-contrib/datatree/main.svg -[pre-commit.ci-link]: https://results.pre-commit.ci/latest/github/xarray-contrib/datatree/main diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml deleted file mode 100644 index f3b95f71bd4..00000000000 --- a/xarray/datatree_/ci/doc.yml +++ /dev/null @@ -1,25 +0,0 @@ -name: datatree-doc -channels: - - conda-forge -dependencies: - - pip - - python>=3.9 - - netcdf4 - - scipy - - sphinx>=4.2.0 - - sphinx-copybutton - - sphinx-panels - - sphinx-autosummary-accessors - - sphinx-book-theme >= 0.0.38 - - nbsphinx - - sphinxcontrib-srclinks - - pickleshare - - pydata-sphinx-theme>=0.4.3 - - ipython - - h5netcdf - - zarr - - 
xarray - - pip: - - -e .. - - sphinxext-rediraffe - - sphinxext-opengraph diff --git a/xarray/datatree_/ci/environment.yml b/xarray/datatree_/ci/environment.yml deleted file mode 100644 index fc0c6d97e9f..00000000000 --- a/xarray/datatree_/ci/environment.yml +++ /dev/null @@ -1,16 +0,0 @@ -name: datatree-test -channels: - - conda-forge - - nodefaults -dependencies: - - python>=3.9 - - netcdf4 - - pytest - - flake8 - - black - - codecov - - pytest-cov - - h5netcdf - - zarr - - pip: - - xarray>=2022.05.0.dev0 diff --git a/xarray/datatree_/codecov.yml b/xarray/datatree_/codecov.yml deleted file mode 100644 index 44fd739d417..00000000000 --- a/xarray/datatree_/codecov.yml +++ /dev/null @@ -1,21 +0,0 @@ -codecov: - require_ci_to_pass: false - max_report_age: off - -comment: false - -ignore: - - 'datatree/tests/*' - - 'setup.py' - - 'conftest.py' - -coverage: - precision: 2 - round: down - status: - project: - default: - target: 95 - informational: true - patch: off - changes: false diff --git a/xarray/datatree_/conftest.py b/xarray/datatree_/conftest.py deleted file mode 100644 index 7ef19174298..00000000000 --- a/xarray/datatree_/conftest.py +++ /dev/null @@ -1,3 +0,0 @@ -import pytest - -pytest.register_assert_rewrite("datatree.testing") diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py deleted file mode 100644 index 51c5f1b3073..00000000000 --- a/xarray/datatree_/datatree/__init__.py +++ /dev/null @@ -1,7 +0,0 @@ -# import public API -from xarray.core.treenode import InvalidTreeError, NotFoundInTreeError - -__all__ = ( - "InvalidTreeError", - "NotFoundInTreeError", -) diff --git a/xarray/datatree_/datatree/py.typed b/xarray/datatree_/datatree/py.typed deleted file mode 100644 index e69de29bb2d..00000000000 From c661a5886fb040bae2711f9b5ade99a26d9b9bfb Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 11 Sep 2024 20:46:56 -0600 Subject: [PATCH 38/57] DAS-2155: remove vestiges of datatree_ --- 
.github/workflows/ci-additional.yaml | 3 +-- .pre-commit-config.yaml | 1 - MANIFEST.in | 2 -- 3 files changed, 1 insertion(+), 5 deletions(-) delete mode 100644 MANIFEST.in diff --git a/.github/workflows/ci-additional.yaml b/.github/workflows/ci-additional.yaml index 21981e76cec..39781e345a7 100644 --- a/.github/workflows/ci-additional.yaml +++ b/.github/workflows/ci-additional.yaml @@ -81,8 +81,7 @@ jobs: # # If dependencies emit warnings we can't do anything about, add ignores to # `xarray/tests/__init__.py`. - # [MHS, 01/25/2024] Skip datatree_ documentation remove after #8572 - python -m pytest --doctest-modules xarray --ignore xarray/tests --ignore xarray/datatree_ -Werror + python -m pytest --doctest-modules xarray --ignore xarray/tests -Werror mypy: name: Mypy diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 6ebd66bdf69..a86ae0ac73b 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -2,7 +2,6 @@ ci: autoupdate_schedule: monthly autoupdate_commit_msg: 'Update pre-commit hooks' -exclude: 'xarray/datatree_.*' repos: - repo: https://github.com/pre-commit/pre-commit-hooks rev: v4.6.0 diff --git a/MANIFEST.in b/MANIFEST.in deleted file mode 100644 index a119e7df1fd..00000000000 --- a/MANIFEST.in +++ /dev/null @@ -1,2 +0,0 @@ -prune xarray/datatree_* -recursive-include xarray/datatree_/datatree *.py From bb659f8d15bf42215cc390de39373e7de4273c3f Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 08:57:29 -0600 Subject: [PATCH 39/57] DAS-2155: Some wording changes and link fixes Also started updating copy problems. abandoning for now. 
--- doc/getting-started-guide/quick-overview.rst | 2 +- doc/user-guide/data-structures.rst | 9 +++++---- doc/user-guide/hierarchical-data.rst | 2 +- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index 5d7fb48b23e..082caf535c7 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -294,7 +294,7 @@ individual DataArrays in a similar fashion. dt["simulation/coarse/foo"] -We can also view the data in a particular group as a readonly :py:class:`~xarray.datatree.DatasetView` using :py:attr:`xarray.datatree.dataset`: +We can also view the data in a particular group as a readonly :py:class:`~xarray.Datatree.DatasetView` using :py:attr:`xarray.Datatree.dataset`: .. ipython:: python diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 6be3f74d6d7..2aefa8a9a08 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -577,7 +577,7 @@ Let's make a single datatree node with some example data in it: .. ipython:: python ds1 = xr.Dataset({"foo": "orange"}) - dt = xr.DataTree(name="root", dataset=ds1) # create root node + dt = xr.DataTree(name="root", dataset=ds1) dt At this point we have created a single node datatree with no parent and no children. @@ -587,18 +587,18 @@ At this point we have created a single node datatree with no parent and no child dt.parent is None dt.children -We can add a copy of a second node to this tree, assigning it to the parent node ``dt``: +We can add a second node to this tree, assigning it to the parent node ``dt``: .. 
ipython:: python dataset2 = xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}) dt2 = xr.DataTree(name="a", dataset=dataset2) - # Add a copy of the second Datatree to the root + # Add the child Datatree to the root node dt.children = {"child-node": dt2} dt -Or more idiomatically you can create a tree from a dictionary of ``Datasets`` and +More idiomatically you can create a tree from a dictionary of ``Datasets`` and `DataTrees`. In this case we add a new node under ``dt["child-node"]`` by providing the explicit path under ``"child-node"`` as the dictionary key: @@ -781,6 +781,7 @@ Some examples: ), }, ) + dt2 Here there are four different coordinate variables, which apply to variables in the DataTree in different ways: diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index b43d5ec8af0..4d4f8359425 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -76,7 +76,7 @@ The nodes representing Bart and Lisa are now connected - we can confirm their si .. ipython:: python - list(bart.siblings) + list(homer["Bart"].siblings) But oops, we forgot Homer's third daughter, Maggie! Let's add her by updating Homer's :py:class:`~xarray.DataTree.children` property to include her: From 9ddbfd5e8403e82e6a23c1be7b3463d665f7f394 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 10:58:21 -0600 Subject: [PATCH 40/57] Update doc/getting-started-guide/quick-overview.rst Co-authored-by: Eni <51421921+eni-awowale@users.noreply.github.com> --- doc/getting-started-guide/quick-overview.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index 082caf535c7..b96422b8835 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -281,7 +281,7 @@ child nodes. 
You can see this inheritance in the above representation of the Da ``people`` and ``species`` defined in the root ``/`` node are shown in the child nodes both ``/simulation/coarse`` and ``/simulation/fine``. All coordinates in parent-descendent lineage must be alignable to form a DataTree. If your input data is not aligned, you can still get a nested ``dict`` of -:py:class:`~xarray.Dataset` objects with :py:func:`~xarray.open_group` and then apply any required changes to ensure alignment +:py:class:`~xarray.Dataset` objects with :py:func:`~xarray.open_groups` and then apply any required changes to ensure alignment before converting to a :py:class:`~xarray.DataTree`. The constraints on each group are the same as the constraint on DataArrays within a single dataset with the From 77e4d0ad46e0b12bb44e456ed702c1517fb0c3d4 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 10:58:36 -0600 Subject: [PATCH 41/57] Update doc/user-guide/data-structures.rst Co-authored-by: Eni <51421921+eni-awowale@users.noreply.github.com> --- doc/user-guide/data-structures.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 2aefa8a9a08..779db3d4690 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -691,7 +691,7 @@ Dictionary-like methods We can update a datatree in-place using Python's standard dictionary syntax, similar to how we can for Dataset objects. For example, to create this example -datatree from scratch, we could have written: +DataTree from scratch, we could have written: .. 
ipython:: python From bf3a2fd9df05e4491975cb48a14eb7321258ce66 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 10:58:55 -0600 Subject: [PATCH 42/57] Update doc/user-guide/data-structures.rst Co-authored-by: Eni <51421921+eni-awowale@users.noreply.github.com> --- doc/user-guide/data-structures.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 779db3d4690..86a7e62a9d6 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -652,7 +652,7 @@ but with values given by either :py:class:`~xarray.DataArray` objects or other Iterating over keys will iterate over both the names of variables and child nodes. -We can also access all the data in a single node, and its inerited coordinates, through a dataset-like view +We can also access all the data in a single node, and its inherited coordinates, through a dataset-like view .. ipython:: python From 25543fbf63403477212c013874a96a43ab750caf Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 10:59:09 -0600 Subject: [PATCH 43/57] Update doc/getting-started-guide/quick-overview.rst Co-authored-by: Eni <51421921+eni-awowale@users.noreply.github.com> --- doc/getting-started-guide/quick-overview.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index b96422b8835..844cd0c230b 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -294,7 +294,7 @@ individual DataArrays in a similar fashion. dt["simulation/coarse/foo"] -We can also view the data in a particular group as a readonly :py:class:`~xarray.Datatree.DatasetView` using :py:attr:`xarray.Datatree.dataset`: +We can also view the data in a particular group as a read-only :py:class:`~xarray.core.datatree.DatasetView` using :py:attr:`xarray.DataTree.dataset`: ..
ipython:: python From 3f7a639ffa89f862d2be2b1f2626e9e1236f9448 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 11:00:18 -0600 Subject: [PATCH 44/57] Update doc/getting-started-guide/quick-overview.rst Co-authored-by: Eni <51421921+eni-awowale@users.noreply.github.com> --- doc/getting-started-guide/quick-overview.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/getting-started-guide/quick-overview.rst b/doc/getting-started-guide/quick-overview.rst index 844cd0c230b..5efe3acc609 100644 --- a/doc/getting-started-guide/quick-overview.rst +++ b/doc/getting-started-guide/quick-overview.rst @@ -267,7 +267,7 @@ Now we'll put these datasets into a hierarchical DataTree: dt This created a DataTree with nested groups. We have one root group, containing information about individual -people. This root group can be named, but here is unnamed, so is referred to with ``"/"``, same as the root of a +people. This root group can be named, but here it is unnamed, and is referenced with ``"/"``. This structure is similar to a unix-like filesystem. The root group then has one subgroup ``simulation``, which contains no data itself but does contain another two subgroups, named ``fine`` and ``coarse``. 
From 4cea57fb83873df2b75a6ef01c4ab89dff9c3ee0 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 12:26:22 -0600 Subject: [PATCH 45/57] remove ordered Co-authored-by: Stephan Hoyer --- doc/user-guide/data-structures.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 86a7e62a9d6..04d7f53d94a 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -528,7 +528,7 @@ nested dict-like containers of both :py:class:`xarray.DataArray`\s and :py:class A single datatree object is known as a "node", and its position relative to other nodes is defined by two more key properties: -- ``children``: An ordered dictionary mapping from names to other :py:class:`~xarray.DataTree` +- ``children``: A dictionary mapping from names to other :py:class:`~xarray.DataTree` objects, known as its "child nodes". - ``parent``: The single :py:class:`~xarray.DataTree` object whose children this datatree is a member of, known as its "parent node". From 43a9ecf670e5a76564731e805e170c1444bd0abd Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 12:27:06 -0600 Subject: [PATCH 46/57] data -> dataset Co-authored-by: Stephan Hoyer --- doc/user-guide/data-structures.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 04d7f53d94a..46ba7e95b8a 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -565,7 +565,7 @@ specifying the nodes' relationship to one another as you create each one. The :py:class:`~xarray.DataTree` constructor takes: -- ``data``: The data that will be stored in this node, represented by a single +- ``dataset``: The data that will be stored in this node, represented by a single :py:class:`xarray.Dataset`, or a named :py:class:`xarray.DataArray`.
- ``parent``: The parent node (if there is one), given as a :py:class:`~xarray.DataTree` object. - ``children``: The various child nodes (if there are any), given as a mapping From f5a4993eb7df5a2e3798a31a4a00c40aa9f4b360 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 12:30:23 -0600 Subject: [PATCH 47/57] remove parent from constructor params Co-authored-by: Stephan Hoyer --- doc/user-guide/data-structures.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 46ba7e95b8a..40319ec82a6 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -567,7 +567,6 @@ The :py:class:`~xarray.DataTree` constructor takes: - ``dataset``: The data that will be stored in this node, represented by a single :py:class:`xarray.Dataset`, or a named :py:class:`xarray.DataArray`. -- ``parent``: The parent node (if there is one), given as a :py:class:`~xarray.DataTree` object. - ``children``: The various child nodes (if there are any), given as a mapping from string keys to :py:class:`~xarray.DataTree` objects. - ``name``: A string to use as the name of this node. From 47a363cda122184d384f688938ab3585608d8b8e Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 12:31:59 -0600 Subject: [PATCH 48/57] duplicate word typo Co-authored-by: Stephan Hoyer --- doc/user-guide/data-structures.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 40319ec82a6..855c15bd3bd 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -731,7 +731,7 @@ The constraint that this puts on a DataTree is that dimensions and indices that are inherited must be aligned with any direct decendent node's existing dimension or index. This allows decendents to use dimensions defined in ancestor nodes, without duplicating that information. 
But as a consequence, if -a dimension dimension-name is defined in on a node and that same dimension-name +a dimension name is defined on a node and that same dimension name exists in one of its ancestors, they must align (have the same index and size). From 82bf8f3004042b54d31f7960f90c419929def530 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 12:36:51 -0600 Subject: [PATCH 49/57] DAS-2155: Drop detailed note on ordered trees --- doc/user-guide/data-structures.rst | 9 --------- 1 file changed, 9 deletions(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 855c15bd3bd..bb2abad608b 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -541,15 +541,6 @@ has at most one parent, there can only ever be one root node in a given tree. The overall structure is technically a `connected acyclic undirected rooted graph`, otherwise known as a `"Tree" `_. -.. note:: - - Technically a :py:class:`~xarray.DataTree` with more than one child node forms an - `"Ordered Tree" `_, - because the children are stored in an Ordered Dictionary. However, this - distinction only really matters for a few edge cases involving operations - on multiple trees simultaneously, and can safely be ignored by most users. - - :py:class:`~xarray.DataTree` objects can also optionally have a ``name`` as well as ``attrs``, just like a :py:class:`~xarray.DataArray`. Again these are not normally used unless explicitly accessed by the user.
From d3ee6323ef0db512c6c323979fc5323a1c1ca570 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 12:38:09 -0600 Subject: [PATCH 50/57] DAS-2155: removes .to_dataset inherited flag example --- doc/user-guide/data-structures.rst | 9 --------- 1 file changed, 9 deletions(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index bb2abad608b..5254b2d8654 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -658,15 +658,6 @@ as a new and mutable :py:class:`~xarray.Dataset` object via dt["child-node"].to_dataset() -This same call can be made to get only the local node variables without any -inherited ones, by setting the inherited keyword to False, but in this example -there are no inherited coordinates so the result is the same as the previous call. - -.. ipython:: python - - dt["child-node"].to_dataset(inherited=False) - - Like with :py:class:`~xarray.Dataset`, you can access the data and coordinate variables of a node separately via the :py:attr:`~xarray.DataTree.data_vars` and :py:attr:`~xarray.DataTree.coords` attributes: From 1de00ac254c3a04aad0eac5c8671e36e7625f6db Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 13:47:47 -0600 Subject: [PATCH 51/57] explicitly mention dimensions are inherited Co-authored-by: Stephan Hoyer --- doc/user-guide/data-structures.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 5254b2d8654..a14e9b2db27 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -703,7 +703,7 @@ underlying data arrays by calling ``dt.copy(deep=True)``. DataTree Inheritance ~~~~~~~~~~~~~~~~~~~~ -DataTree implements a simple inheritance mechanism. Coordinates and their +DataTree implements a simple inheritance mechanism. 
Coordinates, dimensions and their associated indices are propagated from downward starting from the root node to all descendent nodes. Coordinate inheritance was inspired by the NetCDF-CF inherited dimensions, but DataTree's inheritance is slightly stricter yet From f705f4cc8af9b8141dcf50f99d999b69c1e3d6da Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 13:48:36 -0600 Subject: [PATCH 52/57] better header for Hierarchical data Co-authored-by: Stephan Hoyer --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 4d4f8359425..55a05a4b6f8 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -1,6 +1,6 @@ .. _hierarchical-data: -Working With Hierarchical Data +Hierarchical data ============================== .. ipython:: python From 97261c101ac592aee17f53b56b15a10e878354d7 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 13:49:12 -0600 Subject: [PATCH 53/57] typo Co-authored-by: Stephan Hoyer --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 55a05a4b6f8..28aea461656 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -135,7 +135,7 @@ We can add Herbert to the family tree without displacing Homer by :py:meth:`~xar .. note:: This example shows a subtlety - the returned tree has Homer's brother listed as ``"Herbert"``, but the original node was named "Herb". Not only are names overridden when stored as keys like this, - but the new node is a copy, so that the original node that was reference is unchanged (i.e. ``herbert.name == "Herb"`` still). + but the new node is a copy, so that the original node that was referenced is unchanged (i.e. ``herbert.name == "Herb"`` still). 
In other words, nodes are copied into trees, not inserted into them. This is intentional, and mirrors the behaviour when storing named :py:class:`~xarray.DataArray` objects inside datasets. From d698bdd3fa4587c2a07cf6f09f7d46e50af8e881 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 14:55:12 -0600 Subject: [PATCH 54/57] DAS-2155: update what's new --- doc/whats-new.rst | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 8b3a3ce1794..a35617ab8e4 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -22,10 +22,13 @@ v2024.09.1 (unreleased) New Features ~~~~~~~~~~~~ - ``DataTree`` related functionality is now exposed in the main ``xarray`` public - API. This includes: ``xarray.DataTree``, ``xarray.open_datatree``, + API. This includes: ``xarray.DataTree``, ``xarray.open_datatree``, ``xarray.open_groups``, ``xarray.map_over_subtree``, ``xarray.register_datatree_accessor`` and ``xarray.testing.assert_isomorphic``. - By `Owen Littlejohns `_ and + By `Owen Littlejohns `_, + `Eni Awowale `_, + `Matt Savoie `_, + `Stephan Hoyer `_ and `Tom Nicholas `_. @@ -134,15 +137,6 @@ Bug fixes (:issue:`9408`, :pull:`9413`). By `Oliver Higgs `_. -Performance -~~~~~~~~~~~ - -- Speed up grouping by avoiding deep-copy of non-dimension coordinates (:issue:`9426`, :pull:`9393`) - By `Deepak Cherian `_. - -Documentation -~~~~~~~~~~~~~ - Internal Changes ~~~~~~~~~~~~~~~~ From a1c2f11fbdfa640685087741a6a04497641b34c6 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 14:56:30 -0600 Subject: [PATCH 55/57] DAS-2155: typo --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 28aea461656..450daf3f06d 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -19,7 +19,7 @@ Why Hierarchical Data? 
---------------------- Many real-world datasets are composed of multiple differing components, -and it can often be be useful to think of these in terms of a hierarchy of related groups of data. +and it can often be useful to think of these in terms of a hierarchy of related groups of data. Examples of data which one might want organise in a grouped or hierarchical manner include: - Simulation data at multiple resolutions, From ae8bb71186708310496dedc715b57a061872a847 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 15:00:08 -0600 Subject: [PATCH 56/57] DAS-2155: Fix earlier bad merge --- doc/whats-new.rst | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/doc/whats-new.rst b/doc/whats-new.rst index c1396eb7ffe..e6caac788b1 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -43,6 +43,11 @@ Deprecations Bug fixes ~~~~~~~~~ +- Make illegal path-like variable names when constructing a DataTree from a Dataset + (:issue:`9339`, :pull:`9378`) + By `Etienne Schalk `_. + + Documentation ~~~~~~~~~~~~~ @@ -116,9 +121,6 @@ Breaking changes Bug fixes ~~~~~~~~~ -- Make illegal path-like variable names when constructing a DataTree from a Dataset - (:issue:`9339`, :pull:`9378`) - By `Etienne Schalk `_. - Fix bug with rechunking to a frequency when some periods contain no data (:issue:`9360`). By `Deepak Cherian `_. - Fix bug causing `DataTree.from_dict` to be sensitive to insertion order (:issue:`9276`, :pull:`9292`). 
From 9e179d2a2bec06cb8412453e61f506609a8f9b76 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Thu, 12 Sep 2024 15:30:59 -0600 Subject: [PATCH 57/57] DAS-2155: Updates to call out inherited coordinates and inherited flag --- doc/user-guide/data-structures.rst | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index a14e9b2db27..b5e83789806 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -775,16 +775,31 @@ Coordinate variables are inherited to descendent nodes, which means that variables at different levels of a hierarchical DataTree are always aligned. Placing the ``time`` variable at the root node automatically indicates that it applies to all descendent nodes. Similarly, ``station`` is in the base -``weather`` node, because it applies to all weather variables, both directly -in ``weather`` and in the ``temperature`` sub-tree. +``weather`` node, because it applies to all weather variables, both directly in +``weather`` and in the ``temperature`` sub-tree. Notice the inherited coordinates are +explicitly shown in the tree representation under ``Inherited coordinates:``. -Accessing any of the lower level trees as an ``xarray.Dataset`` would -automatically include coordinates from higher levels (e.g., ``time`` and ``station``): +.. ipython:: python + + dt2["/weather"] + +Accessing any of the lower-level trees through the :py:attr:`.dataset ` property +automatically includes coordinates from higher levels (e.g., ``time`` and +``station``): .. ipython:: python dt2["/weather/temperature"].dataset +Similarly, when you retrieve a Dataset through :py:meth:`~xarray.DataTree.to_dataset`, the inherited coordinates are +included by default unless you exclude them with the ``inherited`` flag: + +.. ipython:: python + + dt2["/weather/temperature"].to_dataset() + + dt2["/weather/temperature"].to_dataset(inherited=False) + ..
_coordinates: