-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: cudaStreamWaitEvent #1139
Comments
API Proposal
@m4rs-mt what do you think? |
Alternate API Proposal API Proposal
This removes the ability to re-record the event. |
Alternate API Proposal 2 Instead of exposing a new object, introduce an Not sure how this will affect the lifetime of the native handles. |
Both your API proposal and your first Alternate API Proposal seem reasonable. The first more closely mimics the underlying API, while the second strikes me as harder to screw up. The "screw up" I'm thinking of is that when an event is created, I believe the only reasonable action to take with that event is to call RecordEvent() on it (that may, in fact, be the only action that doesn't produce an error). It thus makes some sense to combine creation with the call to RecordEvent(); but I'd be hesitant to make impossible something that's possible in the underlying API, even if most use cases are invalid. Here's the "screw up" I'm imagining. Imagine a scenario with 8 streams, each of which has three tasks to complete, in order: A, B and C. No stream can start C until every stream's A is done. It's natural to want to write code like this, but it fails:
The correct code would look like this for your first API proposal:
The second proposal (Alternate API proposal) would simply replace two lines with I don't think I understand the third proposal (Alternate API proposal 2) well enough to know how it would solve this problem. |
Thanks for the feedback @Ruberik. The first alternate proposal was to hide the concept of re-recording an event. Cuda supports it, but I'm not sure about OpenCL. The second alternate proposal was to hide the concept of an In your example, it would look more like:
|
@MoFtZ, I think that does something different from my code. In my example, a stream finishes A, then works on B, and only then must wait until other streams are done with A. In your code here, nobody can start B until everybody's done A. BTW, I think there's a problem with that code. Don't you need to enqueue all the As before you can have them wait on each other, like this?
You need three loops here, because you have to have enqueued A, but not B or C, when you call WaitForStream. |
Yes, you are correct. Perhaps that proposal is too simplified. It would need to change the meaning from "wait for all Task A to finish before starting C" to "synchronise the start of all Task C". |
As my teammates and I move to multiple, big GPUs with many streams doing interdependent work simultaneously, synchronizing at the GPU level is introducing significant amounts of inefficiency. We'd like to be able to use cudaStreamWaitEvent to make a stream wait on an event. There's already RecordEvent, CreateEvent, DestroyEvent and QueryEvent, so I'm hoping this is a straightforward addition.
The text was updated successfully, but these errors were encountered: