Add S3 support #160
Conversation
…nda-forge-pinning 2022.12.22.13.58.53
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR.
Do you need an extra test to check if things are working, or are the ones we are already running sufficient?
No idea.
I'm tempted to, but let's wait for @dopplershift, who may have a better idea on the state of the tests. I did see a Zarr S3 cleanup test in the logs. Maybe that is enough, but I have not looked into it. (On mobile only.)
I tried building this PR locally and then loading an ABI file from AWS S3, but it doesn't seem to work:
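For context, the failing access looked roughly like the sketch below; the URL is a placeholder for a GOES ABI object on AWS S3, not the exact file from this report:

```python
# Sketch of the attempted access using the Python netCDF4 package
# (which wraps libnetcdf). The URL is a placeholder, not the exact
# object from this report.
import netCDF4

url = "https://noaa-goes16.s3.amazonaws.com/path/to/some_abi_file.nc"

# Without byte-range support, libnetcdf does not know how to fetch a
# plain HTTPS object and (as discussed below) falls back to treating
# the URL as a DAP endpoint, which fails.
ds = netCDF4.Dataset(url)
print(ds.data_model)
ds.close()
```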
This of course is using the Python netcdf4 library, which uses libnetcdf underneath. I'm getting a different error than before this PR, which said that S3 was not enabled; now it doesn't seem to be able to complete the request because it is looking for a …
Here's the same thing with ncdump and verbose curl output:
Looks for a … Also, if all you're doing is accessing a netcdf4 file in S3, not actually doing any Zarr stuff, compiling against the S3 SDK is unnecessary. netcdf-c has support for HTTP byte-range requests for doing that, which...sigh...isn't enabled by default. I enabled it in #107, but apparently only for the now-removed static builds. Oops. 🐑 (Information on setting this up and using it also seems to be difficult to find.) To use it:
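(The original instructions are not preserved in this copy; a minimal sketch of the byte-range usage that is confirmed later in the thread, from Python, with a placeholder URL:)

```python
# Byte-range access sketch: append "#mode=bytes" to a plain HTTPS URL
# so libnetcdf reads the file via HTTP range requests instead of
# treating the URL as a DAP endpoint. Requires a libnetcdf built with
# byte-range support; the URL is a placeholder.
import netCDF4

url = "https://noaa-goes16.s3.amazonaws.com/path/to/some_abi_file.nc#mode=bytes"
with netCDF4.Dataset(url) as ds:
    print(list(ds.variables))
```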
The tests are probably adequate to test that S3 support is working, …
The fact that it is treating it as a DAP request means that the mode inference …
In looking at the example above, something is unclear.
Is this a netcdf-4/HDF5 formatted file?
@DennisHeimbigner yes:
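(The confirming output is not preserved here; one way to check the on-disk format from Python is via netCDF4's data_model attribute, roughly:)

```python
# Hypothetical format check: a netCDF-4/HDF5 file reports "NETCDF4",
# while a classic file reports e.g. "NETCDF3_CLASSIC".
import netCDF4

with netCDF4.Dataset("some_abi_file.nc") as ds:
    print(ds.data_model)
```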
@DennisHeimbigner the most recent test by @djhoese was on a build without byte-range support, but with S3 support. Would you expect that to work for this case?
Yes, I just tested that case. The above URL suffixed with '#mode=bytes' …
Above I said @djhoese tested a build WITHOUT …
Then I guess you can close this issue.
This PR? What? No. I'm so confused at this point. What is the point of enabling S3 support in netcdf-c? What does it do? Why does it require the AWS C++ library? Does it make no difference for reading data from S3? My assumption is that there would be generally better performance and smarter access to S3 resources.

For byte-range support, does that have to be enabled for S3 URIs to work at all? Do I have to use …? I'm pretty sure I've used …

@dopplershift if byte ranges are not currently enabled then I'd like to include them in this PR.
@djhoese It is my understanding that the direct S3 support (through the SDK) is only used to implement support for accessing (nc)Zarr data that lives in object storage, due (I think) to the need to access different arbitrary keys (@DennisHeimbigner @WardF). If you're accessing a single file, I'm not sure there's anything the direct S3 API offers over the byte-range requests. So the answer to your question "do I need to enable byte-range support?" is yes. As noted above, the way to do that in this PR is to add …
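As a rough illustration of that distinction (both URLs are placeholders, and the mode fragments are my reading of the netcdf-c/nczarr documentation rather than something verified in this PR):

```python
# Sketch only: the two access paths discussed above.
import netCDF4

# A single netCDF-4/HDF5 file in object storage: plain HTTP byte-range
# requests, no AWS SDK required, selected with the "bytes" mode fragment.
ds_file = netCDF4.Dataset(
    "https://some-bucket.s3.amazonaws.com/path/to/file.nc#mode=bytes"
)

# An (nc)Zarr dataset in object storage: this is the case the SDK-based
# S3 support is understood to target; the "nczarr,s3" fragment is an
# assumption based on the nczarr docs.
ds_zarr = netCDF4.Dataset(
    "s3://some-bucket/path/to/dataset.zarr#mode=nczarr,s3"
)
```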
I and others in the pytroll community have definitely been doing benchmarks assuming this functionality was enabled in this feedstock's build. Any idea @dopplershift if netcdf-c will just download the entire file when …
@conda-forge-admin please rerender |
Not sure. As noted above, though, this was previously enabled on the static library in this feedstock, so it's possible it somehow worked that way?
Hi! This is the friendly automated conda-forge-webservice. I tried to rerender for you, but it looks like there was nothing to do. This message was generated by GitHub actions workflow run https://github.com/conda-forge/libnetcdf-feedstock/actions/runs/3849084639. |
@dopplershift Thanks for tagging me in. I'm traveling and am on very limited bandwidth, but I'm getting caught up on the convo now.
To summarize/think out loud, it looks like the initial issue was "Why is this S3 request being treated as a DAP request, despite S3 support being compiled in?" Enabling …

The outstanding question is "What behavior should be expected if …"

Ok, I'm starting to get a better picture. @DennisHeimbigner, the following command is being interpreted as a …

The question being asked is whether this is what should be happening, or what they should expect to see. I will dig into this, but do you have any ideas off the top of your head? Also, if I've misunderstood from my initial read-through, please let me know and I'll delete/revise this comment instead of muddying the water.
If I use a semi-recent conda-forge build of libnetcdf (…):

URL + no mode=bytes

URL + mode=bytes
@WardF This doesn't seem to match what you got. Thanks for the other info though, and you are on the right track with what I'm trying to understand here. Based on your comment, when is an …
Tests are failing after trying to enable httprange:
Ah found Unidata/netcdf-c#2500 |
@djhoese It's my understanding that byterange was enabled in static conda-forge builds; is it possible to tell if 4.8.1 is static or not? If you can find the …

It will be an interesting datapoint; what I observed was generated by testing against the …
Oh, very interesting: it looks like I have some non-static version that also has Byte-Range Support (libnetcdf.settings):
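(For anyone else checking their own build, the installed libnetcdf.settings file records the configure options; a quick sketch for finding the relevant lines in a conda environment, assuming a typical Unix lib/ layout:)

```python
# Print the byte-range and S3 related lines from libnetcdf.settings
# in the active environment (assumes <env>/lib/libnetcdf.settings).
import pathlib
import sys

settings = pathlib.Path(sys.prefix, "lib", "libnetcdf.settings")
for line in settings.read_text().splitlines():
    if "byte" in line.lower() or "s3" in line.lower():
        print(line)
```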
Some comments on the above discussion:
This all sounds good. Thank you for all the info @DennisHeimbigner. So I think we have two cases that are semi-expected given the above discussion, but I'll need to do more testing:
The current CI in this PR is failing due to the failing byte-range test (see the related netcdf-c issue mentioned above). Anybody know of a way to ignore that failed test? Or do we have to wait for upstream fixes and a release?
I couldn't test the package locally because of the failing byterange test, so I made a patch to comment it out, rebuilt the package, and installed it into a test environment. But I keep getting …
Can you give some more info about this failure?
So I built what is currently in this PR, but with an additional patch to disable the byte-range tests (because, as you know, they fail). When I run it locally I get these results:
This work has moved to #180 now. |
To possibly tie up one loose end here, @djhoese, are you working on Windows? I am asking because byte-range support in this feedstock was enabled in #107 for static builds on Unix and the Windows builds, but not the dynamic builds on Unix, so if you are/were on Windows, that would explain why your dynamic build had byte-range enabled. In any case, byte-range is now enabled on all builds since #178. |
Nope. Ubuntu/PopOS. |
Replacement of #125 based on 4.9.0 (main).
Checklist

- Reset the build number to 0 (if the version changed)
- Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)