-
Hello, I was looking into edge cases in how I store my IPersistentState and I ran into odd implementation behavior. I am concerned about ... but the code does not consider that the append or commit operation may fail! In a real-world scenario it is quite possible that the underlying storage infrastructure is not able to handle the write operation just at that moment. Normally this should not be a problem: if an exception is thrown, or false is returned, or a count of 0 appended logs is reported, then the response to the AppendEntries RPC should be false. But the code only does this (RaftCluster.cs, L431ff):

```csharp
await auditTrail.AppendAndCommitAsync(entries, prevLogIndex + 1L, true, commitIndex, token).ConfigureAwait(false);
result = true;
```

which assumes that both append and commit will succeed. If an exception is thrown in this context, it propagates up through the caller logic, is interpreted as "unable to deliver heartbeat", and the follower is marked as unavailable. You could consider this behavior intended, but I think that is really not how it should work. In my test scenario my persistent state "drops" any incoming append or commit entries on followers. So what can happen is:
I am not sure what the formal flow in the Raft algorithm is when it comes to this. The paper imagines a world without flaws and does not consider the log being unavailable for operations. I wonder how this was dealt with in the existing file storage implementation. You could await access to the resource, BUT if that takes longer than the request timeout, the leader will consider his request failed because he got no answer in time, and the follower is marked as unavailable. The only way around this would be to increase the timeout to such a high amount that it compensates for the worst-case log access times... but this feels like a flaw. Why not simply respond to the AppendEntries RPC, so the leader knows the follower is still there, but say "hey, I could not append any entries, you will need to try this again later"? So my proposal is to update the code to something like this (and other places where this logic might be):

```csharp
long count = await auditTrail.AppendAndCommitAsync(entries, prevLogIndex + 1L, true, commitIndex, token).ConfigureAwait(false);
if (count > 0L)
    result = true;
```

The default fall-through there would be result = false.

Edit: The behavior would be that the leader has to a) deliver the AppendEntries RPC AND b) the RPC must be successfully processed. Instead of listening for the count, it might also make sense to wrap the call in a try/catch. This should give fine-grained control over when a log entry really is appended and when a log entry really is committed.

Let me know what you think about this :) Thanks!
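To make the idea concrete, here is a minimal sketch of the proposed handling. It combines the count check with the try/catch variant mentioned in the edit. The `IAppendLog` interface and the method names around it are hypothetical stand-ins for illustration, not the library's actual API:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the audit-trail surface used in RaftCluster.cs;
// the interface and signature below are illustrative, not the library's API.
public interface IAppendLog<TEntry>
{
    Task<long> AppendAndCommitAsync(
        IReadOnlyList<TEntry> entries, long startIndex,
        bool skipCommitted, long commitIndex, CancellationToken token);
}

public static class AppendEntriesHandler
{
    // Answer the AppendEntries RPC with false instead of letting a storage
    // failure escape as an exception ("unable to deliver heartbeat").
    public static async Task<bool> TryAppendAndCommitAsync<TEntry>(
        IAppendLog<TEntry> auditTrail,
        IReadOnlyList<TEntry> entries,
        long prevLogIndex,
        long commitIndex,
        CancellationToken token)
    {
        try
        {
            long count = await auditTrail
                .AppendAndCommitAsync(entries, prevLogIndex + 1L, true, commitIndex, token)
                .ConfigureAwait(false);
            return count > 0L;  // success only if something was actually appended
        }
        catch (IOException)
        {
            return false;       // storage busy right now: leader should retry later
        }
    }
}
```

With this shape, a temporarily unavailable log answers the RPC with false instead of timing out or surfacing an exception.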
Replies: 5 comments 2 replies
-
If the underlying storage device is inaccessible, then the entire node should be considered unavailable (remember that the follower is able to process read requests). Obviously, the leader should mark this member as unavailable. The commit process, as well as appending uncommitted log entries, requires disk I/O.
-
Returning false forces the leader to decrement NextIndex and repeat replication on the next iteration of heartbeats.
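As an illustration of the standard Raft rule this refers to (a sketch, not the library's actual code): a true reply advances NextIndex past what was sent, while a false reply steps it back so earlier entries are resent on the next heartbeat.

```csharp
using System;

// Illustrative sketch of the leader's reaction to an AppendEntries reply;
// names are hypothetical, not taken from the library.
public static class NextIndexRule
{
    public static long OnAppendEntriesReply(bool success, long nextIndex, long lastSentIndex)
        => success
            ? lastSentIndex + 1L                // follower replicated up to lastSentIndex
            : Math.Max(1L, nextIndex - 1L);     // log mismatch: back up and retry
}
```

So a false answer does not mean "try the same entries again later"; it means "your assumption about my log is wrong, send from an earlier index".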
-
I can see how that makes sense, yes, but I still think it's sub-optimal. Membership availability is reduced to "are members able to write to their Raft log" instead of "are they there". It should not matter whether they can write; saying "yeah, I got your request but did nothing" should still result in another round of replication, with the leader not counting it as a successful write toward the majority. I think the other replication logic could still work.

A follower can of course still answer read requests. Any existing log entries that are in a memory cache and do not need to be loaded from the I/O storage can be served. Only if that also fails would they return no log entries, or throw an exception indicating that any read request fails. Throwing exceptions blows up the entire HTTP middleware as well; they are not caught there.

It would be a shame to have to add your own availability tracking for cluster members on top of the information from the Raft cluster, just because their log is unavailable for a brief moment. Usually followers can do more than just maintain their log. For me they are also task workers, and the log is just their way to communicate tasks and results, to ensure that everybody is aware of the overall cluster state. But judging from your comments, a use case like this does not appear to be in scope for this library. I guess I was expecting it to allow me more control than it does. I saw all the return counts and how I had control over what is committed and what is not.
-
But it will work. The leader advances its state machine if the majority of nodes commits the entries. Generally, in Raft the follower may respond with only two answers: true and false. true marks the follower as replicated and increases NextIndex, while false indicates that the assumption about the state of the follower's log is wrong: NextIndex must be decremented and replication must be repeated on the next iteration of heartbeats. There is no third option. To be more precise, the third option is an unavailable member, because it cannot respond with true nor false. What if the majority could not advance their position due to busy …
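The majority rule mentioned here can be sketched as follows (an illustration with hypothetical names, not the library's code): the leader may advance its commit index only when a strict majority of the cluster, counting the leader itself, has replicated the entry.

```csharp
// Sketch of the commit rule: votes from followers that replicated the
// entry, plus the leader's own copy, must exceed half the cluster size.
public static class MajorityRule
{
    public static bool CanCommit(int followersReplicated, int clusterSize)
        => followersReplicated + 1 > clusterSize / 2;  // +1 for the leader
}
```

For example, in a 5-node cluster, two replicated followers plus the leader form a majority of 3, while a single replicated follower does not.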
-
In case of HTTP transport, you can use custom middleware to handle the specific type of exception and convert it to an appropriate HTTP response.
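A minimal sketch of that idea, assuming an ASP.NET Core pipeline and assuming the persistent state surfaces storage failures as IOException (both are assumptions; adapt the exception type to your own implementation). The mapping is kept in a plain function so it is easy to reuse and test:

```csharp
using System;
using System.IO;

// Hypothetical error-to-status mapping for the Raft HTTP endpoint:
// 503 tells the caller "try again later"; anything else stays a 500.
public static class RaftErrorMapping
{
    public static int StatusFor(Exception error)
        => error is IOException ? 503 : 500;
}

// Hypothetical wiring inside Program.cs, registered before the Raft
// endpoint is mapped:
//
// app.Use(async (context, next) =>
// {
//     try { await next(); }
//     catch (Exception e) { context.Response.StatusCode = RaftErrorMapping.StatusFor(e); }
// });
```

This keeps a transient storage failure from blowing up the pipeline while still signaling failure to the leader.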