-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Time coordinates are sometimes integers, not datetime64 #2
Comments
@gjoseph92 I believe you're running into pydata/xarray#3291 I'm finally trying out stackstac and just ran into this. basically xarray doesn't like timezones, and since the ite spec has times always is UTC my strategy has been to either 1. rstrip('Z') on datestrings, or 2. use .tz_localize(None) on your pandas datetime index. If you'd like I can put in PR, should be an easy fix! import numpy as np
import xarray as xr
print(xr.__version__) #0.17.0
np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
precipitation = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
# https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#properties-object
# STAC ITEMS have datetime as UTC strings like this:
dates = ['2019-12-30T00:00:01.000Z', '2020-01-05T00:00:01.000Z', '2020-01-11T00:00:01.000Z']
times = pd.to_datetime(dates)
da = xr.DataArray(
data=temperature,
dims=["x", "y", "time"],
coords=dict(
lon=(["x", "y"], lon),
lat=(["x", "y"], lat),
#time=times, #results in dtype 'object'
time = times.tz_localize(None), #results in dype 'datetime64[ns]'
)
)
da |
So, after I looked into this a bit, I couldn't understand why your example with S2 cogs does work as expected. It looks to me like there is an issue in pd.to_datetime() with # NASA CMR-STAC Dates
dates = ['2019-12-30T00:00:01.000Z', '2020-01-05T00:00:01.000Z', '2020-01-11T00:00:01.000Z']
pd.to_datetime(dates, infer_datetime_format=True, errors="coerce")
# S2 dates
dates = ['2020-05-01T18:04:03Z', '2020-04-28T17:54:06Z', '2020-04-26T18:04:09Z']
pd.to_datetime(dates, infer_datetime_format=True, errors="coerce")
the UTC time zone info is there as expected if you don't use infer_datetime_format (default is False) # S2 dates
dates = ['2020-05-01T18:04:03Z', '2020-04-28T17:54:06Z', '2020-04-26T18:04:09Z']
pd.to_datetime(dates, infer_datetime_format=False, errors="coerce")
Seems like a pandas bug? cc @TomAugspurger. Should stackstac expose infer_datetime_format as a keyword? not sure how much of a speedup that gets. In any case, adding a |
It's probably a bug. My understanding of |
@scottyhq thanks a ton for looking into this! Based on pydata/xarray#3291 (comment), it sounds like dropping timezone information before going to xarray is currently the only option. I don't see anything in the STAC spec that explicitly requires UTC; it just says datetimes must follow RFC 3339, which does include How about we add a As for the
|
https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#properties-object states that datetime requires UTC:
@matthewhanson or @m-mohr would greatly appreciate if you could clarify if that is a typo. I'm used to seeing UTC times in metadata, but it's also convenient to have the timezone information somewhere to easily convert to local time without relying on location-based lookups.
yep -> pandas-dev/pandas#41047 . Also might rekindle discussion in xarray for handling timezones since I know there is current effort underway for more flexible indexing, and i feel like passing a pandas index with timezone information should work. cc @dcherian if you have pointers to relevant discussion here. |
I'm not sure myself. I'd suggest opening an issue in stac-spec with a use case description so that we can discuss there. I could imagine that this is a legacy thing that we still want to change, although start_datetime and end_datetime also require UTC. On the other hand, created and updated don't require UTC. |
Duh! Totally missed that, thanks for the careful read. Let's see what comes up on radiantearth/stac-spec#1095, but for now I'd vote for not assuming dates are always UTC, and doing the |
Sounds good. I can start a PR. What's your preference for tests @gjoseph92 ?, It'd be nice to put in a simple framework, if you're ok for it I could copy what we've been using over in intake-stac https://github.com/intake/intake-stac/blob/main/.github/workflows/main.yaml |
Tests are my biggest to-do right now, since IMO it's the main thing blocking contribution. I have a lot of thoughts on how I'd like this to look in #26. I'd prefer to not copy in the intake-stac yaml, since it seems to be conda-oriented to me, and I'd like to keep this only using poetry. I'd say, if you want to write a pytest-friendly test for this case and put it in |
Noticed this in https://gist.github.com/rmg55/b144cb273d9ccfdf979e9843fdf5e651, and I've had it happen before myself:
Pretty sure stackstac is correctly making it into a pandas DatetimeIndex:
stackstac/stackstac/prepare.py
Lines 354 to 358 in b652a07
but something is going weird when xarray receives that, and it reinterprets it as an
object
array.The text was updated successfully, but these errors were encountered: