-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate race conditions and fix tests #3
base: master
Are you sure you want to change the base?
Conversation
Since `poll_one` is a blocking call, when service is stopped it throws with `X::AdHoc` due to -1 returned by `zmq_poll`. Unfortunately, `poll_one` doesn't use `zmq_die` and therefore there is no way to determine if the error is caused by socket closing or not. For this reason this PR relies on `$closer` value to find out if the exception must be rethrows or dropped. PR arnsholt/Net-ZMQ#22 is submitted to replace plain `die` with `zmq_die`. But until then this PR should be ok get things straighten out.
`Cro::ZeroMQ::Source::Impure` and `Cro::ZeroMQ::Source::Pure` both suffer from a race where their polling threads are started too late to actually receive first submitted messages. In particular, this caused t/zeromq-xpub-xsub.t to fail with nearly 100% probability. This commit makes sure that the polling thread is actually ready to receive data before completing a supply.
start { | ||
$poller-ready.keep; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sure I forgot a lot of this codebase, but shouldn't the keep call be closer to the poll_one call? This PR makes the race gap smaller, but it can be even smaller, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would. But here is how I see it. The closest point is right before the call itself. But it would have to be like this:
unless ($already-set) { $poller-ready.keep; $already-set = True; }
I don't feel really good about sticking in a piece of code which is basically nop 99.99999% of time. Sure, moar can optimize it away, but it's not guaranteed.
Another way is to
FIRST $poller-ready.keep;
But the only difference this makes with the PR code is that it would be done right after the loop
which makes too little difference to make sense.
The last option is once
. It could be placed right before poll_one
. But there is risk of immediate closing of supply which would result in last
activated before once
is ever reached – and therefore await
will stuck.
Ultimately, I wonder why such a complex approach is used? Why not simple supply { loop { emit poll_one } }
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately, I wonder why such a complex approach is used? Why not simple supply { loop { emit poll_one } }?
I am quite sure it's just that this code was not written with best practices on mind. If you have time and mind to rewrite this piece in a better way (not just patch) with something simpler and more robust - you are welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do it straightforward way, but the thing I missed is that when poll_one
blocks – it blocks the whole supply chain. Combined with non-guaranteed message delivery by zmq in certain scenarios, I currently don't see a completely risk-free solution.
Unfortunately, I only study zmq for I possibly will have tasks for it in the future. It's unlikely I will be able to find a better solution until then. And even then it feels to me that Net::ZMQ
would need to be changed too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have no more comments on this one, maybe @jnthn has.
This PR solves two problems:
X::AdHoc
produced byNet::ZMQ::poll_one
because of a closed socket.Cro::ZeroMQ::Source::Impure
andCro::ZeroMQ::Source::Pure
are not always ready to accept messages by the time the first ones are submitted. It was breaking t/zeromq-xpub-xsub.t test.