Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

emscripten_fetch returns id 0 if not used with EMSCRIPTEN_FETCH_REPLACE #23124

Open
slowriot opened this issue Dec 10, 2024 · 6 comments
Open

Comments

@slowriot
Copy link
Contributor

slowriot commented Dec 10, 2024

Tested with version 3.1.73.

When calling emscripten_fetch with any combination of attributes, but without using EMSCRIPTEN_FETCH_REPLACE, the emscripten_fetch_t struct it returns always has the id member set to 0. With EMSCRIPTEN_FETCH_REPLACE, these increment normally.

Callbacks receive the correct id in all cases, just the struct initially returned by emscripten_fetch is wrong.

Minimal example to reproduce:

  emscripten_fetch_attr_t attr;
  emscripten_fetch_attr_init(&attr);
  std::strcpy(attr.requestMethod, "GET");

  attr.attributes = EMSCRIPTEN_FETCH_LOAD_TO_MEMORY | EMSCRIPTEN_FETCH_REPLACE;
  for(unsigned int i{0}; i != 10; ++i} {
    emscripten_fetch_t *fetch{emscripten_fetch(&attr, "example.com")};
    std::cout << "id: " << fetch->id << std::endl;  // incrementing normally: 1, 2, 3, ...
  }

  attr.attributes = EMSCRIPTEN_FETCH_LOAD_TO_MEMORY;
  for(unsigned int i{0}; i != 10; ++i} {
    emscripten_fetch_t *fetch{emscripten_fetch(&attr, "example.com")};
    std::cout << "id: " << fetch->id << std::endl;  // always 0
  }
@slowriot
Copy link
Contributor Author

Looking at the logic flow in Fetch.js, we have:

emscripten/src/Fetch.js

Lines 559 to 562 in d5c419f

} else if (!fetchAttrReplace) {
fetchLoadCachedData(Fetch.dbInstance, fetch, reportSuccess, fetchAttrNoDownload ? reportError : (fetchAttrPersistFile ? performCachedXhr : performUncachedXhr));
} else if (!fetchAttrNoDownload) {
fetchXHR(fetch, fetchAttrPersistFile ? cacheResultAndReportSuccess : reportSuccess, reportError, reportProgress, reportReadyStateChange);

In the case that EMSCRIPTEN_FETCH_REPLACE is set, we go straight to fetchXHR, which then sets id:

var id = Fetch.xhrs.allocate(xhr);

The trouble is that if EMSCRIPTEN_FETCH_REPLACE is not set, and we call fetchLoadCachedData, then we end up in the getRequest.onsuccess callback in the second branch:

emscripten/src/Fetch.js

Lines 165 to 174 in d5c419f

} else {
// Succeeded to load, but the load came back with the value of undefined, treat that as an error since we never store undefined in db.
#if FETCH_DEBUG
dbg(`fetch: File ${pathStr} not found in IndexedDB`);
#endif
{{{ makeSetValue('fetch', C_STRUCTS.emscripten_fetch_t.readyState, 4, 'i16') }}} // Mimic XHR readyState 4 === 'DONE: The operation is complete'
{{{ makeSetValue('fetch', C_STRUCTS.emscripten_fetch_t.status, 404, 'i16') }}} // Mimic XHR HTTP status code 404 "Not Found"
stringToUTF8("Not Found", fetch + {{{ C_STRUCTS.emscripten_fetch_t.statusText }}}, 64);
onerror(fetch, 0, 'no data');
}

Only at this point do we call the onerror callback, which calls either performCachedXhr or performUncachedXhr. But this happens async, so manwhile the fetchLoadCachedData function has returned without setting id to any value.

I think it might make more sense, if there's a possibility that an xhr will be made, to allocate an id anyway with Fetch.xhrs.allocate, so this can be returned immediately.

@slowriot
Copy link
Contributor Author

Note if I set -sFETCH_SUPPORT_INDEXEDDB=0, we don't see this problem, as we go straight to the XHRs without trying to query the db async, so we always non-zero ids in that case.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 10, 2024

Yes, it looks like id tracking is only for outstanding XHR request, not from IDB requests. I don't know why that is.

I'm also not sure why we need an ID at all since the address of the emscripten_fetch_t would make a perfectly good unique id I think. I guess its too late to change that.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 10, 2024

BTW, as the user of the API, why do you care about the ID that is stored in the struct? Don't all the callbakc functions take a unique emscripten_fetch_t pointer?

@slowriot
Copy link
Contributor Author

BTW, as the user of the API, why do you care about the ID that is stored in the struct? Don't all the callbakc functions take a unique emscripten_fetch_t pointer?

The use case for the ID for me is to match up multiple concurrent fetches between their invocation and subsequent callbacks without having to allocate and manage individual userdata for each call. A typical use is as a key in a map, representing the state of multiple ongoing downloads.

You make an interesting point, though - if the address of the emscripten_fetch_t is unique for each call and doesn't change from initial invocation through to the callbacks, then this address could be used as the key instead. This would need to be guaranteed though, and the documentation doesn't make any such guarantees at present - if this is indeed usable as a key, perhaps it's worth updating the docs to reflect this fact. In that case it would render the id partially redundant - but not entirely.

The other aspect of the id member is that it guarantees each id is unique, so it's possible to use it as a key to keep track of all fetches, past and present. Using the address of emscripten_fetch_t would lose that property presumably, as new invocations could reuse previously freed addresses, and you don't get steady increment. For these reasons, I would advocate for making the id usable in all scenarios, as it provides the most ergonomic and intuitive way to track multiple fetches over the life of the program.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 11, 2024

You are correct that that address of the emscripten_fetch_t is only unique for the lifetime of the fetch and its possible that the same memory region could be re-used later.

I would not be averse to patch to make the id allocation earlier in the call stack, but if you want a quick fix that works today I would just go with using the pointer as the key in your map.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants