Loading CICE data is very expensive #287
Comments
Hi Martin. I'm not sure of your specific case, but when loading datasets with xarray.open_mfdataset there are options that make the loading much quicker (see the sketch below). These make some extra assumptions about concat variables etc. It's described in more detail in the "Note" at https://xarray.pydata.org/en/stable/user-guide/io.html#reading-multi-file-datasets. I would have to defer to @angus-g or @aidanheerdegen as to whether these options are/should be implemented in the cookbook.
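For reference, a minimal sketch of the kind of call described in that Note (the file pattern is hypothetical, and the exact options from the original comment are not preserved in this thread):

```python
import xarray as xr

# Sketch of the open_mfdataset options recommended in the xarray "Note" linked above.
ds = xr.open_mfdataset(
    "output*/ice/OUTPUT/iceh.????-??.nc",  # hypothetical CICE file pattern
    concat_dim="time",
    combine="nested",
    data_vars="minimal",   # only concatenate variables that have the concat dimension
    coords="minimal",      # don't compare coordinate variables from every file
    compat="override",     # take coords/vars from the first file instead of checking equality
)
```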
Thanks Adele, decode_coords is what I'd been looking for.
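For anyone else landing here, a hedged sketch of what that might look like (file pattern hypothetical; decode_coords is a standard xarray option, not something specific to the cookbook):

```python
import xarray as xr

# With decode_coords=False, TLON/TLAT stay ordinary data variables rather than
# being promoted to coordinates, so open_mfdataset doesn't eagerly read and
# compare them across every file.
ds = xr.open_mfdataset(
    "output*/ice/OUTPUT/iceh.????-??.nc",  # hypothetical CICE file pattern
    decode_coords=False,
)
```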
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there: https://forum.access-hive.org.au/t/issues-loading-access-om2-01-data-from-cycle-4/418/3
Loading a CICE variable takes much more time and memory than a MOM variable. E.g. loading one CICE variable takes 90 s and several GB of memory (from a notebook on OOD), compared to ~15 s for the equivalent MOM variable. Trying to load the full run for a CICE variable takes a crazy amount of memory.

I think the issue is that the CICE variables have coordinates that include TLON and TLAT, which are 2D variables stored in the CICE files, whereas the corresponding MOM coordinates geolon_t and geolat_t are not in the files. I think this means that xarray.open_mfdataset is reading TLON and TLAT from each file to check whether it has to concatenate on those coordinates. I couldn't see a way of persuading xarray that it should only try to concatenate on the time dimension.