Handle the case where the coordinator is replaced with a new host #2

gotascii · 2017-10-26T23:39:24Z

If the group coordinator is replaced with a different host but the broker id remains the same, the client will go into an endless reconnection loop. This PR refreshes the cluster data if there is a ConnectionError when joining a group. The issue is reproducible by following these steps:

Start up a cluster with 3 nodes.
Publish some messages to a topic.
Connect to the topic and start an each_message loop.
A broker, say broker 0, becomes memoized in @coordinator in ConsumerGroup.
Stop the each_message loop but do not exit the process.
Kill broker 0 and bring back a new host with a different IP as broker 0.
With the same consumer instance, run the each_message loop again.

When the above steps are taken:

ConsumerGroup#join is called.
Then coordinator.join_group on ConsumerGroup L:117 fails with ConnectionError.
ConsumerGroup#join sets @coordinator = nil.
Cluster#get_group_coordinator asks a broker for the broker id of the coordinator, which is 0.
connect_to_broker pulls the cached info for id 0 (i.e. the old IP).
Then coordinator.join_group on ConsumerGroup L:117 fails with ConnectionError again, restarting the loop.

Seeing as the retry for a ConnectionError is guarded by a sleep 1, I'm hoping this is a pretty safe place to refresh metadata.
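
To make the loop and the proposed fix concrete, here is a minimal, self-contained sketch. It is not the code in this PR and it does not use ruby-kafka's real classes: SketchCluster, SketchConnectionError, join_group_sketch and the addresses are all hypothetical stand-ins, and the sleep 1 between retries is left out so the sketch runs instantly. It only assumes what is described above: broker addresses are cached by id, and a refresh replaces the cache with the current addresses.

```ruby
# Hypothetical stand-ins for illustration; these are NOT ruby-kafka's classes.
class SketchConnectionError < StandardError; end

# Models a cluster client that caches broker addresses by broker id.
class SketchCluster
  attr_reader :refresh_count

  def initialize(live_addresses)
    @live = live_addresses        # broker id => address that is actually up
    @cached = live_addresses.dup  # broker id => address the client remembers
    @refresh_count = 0
  end

  # Simulates bringing a broker back on a new host under the same id.
  def replace_broker(id, new_address)
    @live[id] = new_address
  end

  # Replaces the cache with the current addresses, as a metadata refresh would.
  def refresh_metadata!
    @cached = @live.dup
    @refresh_count += 1
  end

  # Connects using the cached address; fails if that host is gone.
  def connect_to_broker(id)
    address = @cached.fetch(id)
    raise SketchConnectionError, "cannot reach #{address}" unless @live[id] == address
    address
  end
end

# Tries to reach the coordinator with the given id. The attempt cap exists only
# so the sketch terminates; the loop described above has no such cap.
def join_group_sketch(cluster, id, refresh_on_error:, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    cluster.connect_to_broker(id)
  rescue SketchConnectionError
    return nil if attempts >= max_attempts
    cluster.refresh_metadata! if refresh_on_error
    retry
  end
end

cluster = SketchCluster.new({ 0 => "10.0.0.1:9092" })
cluster.replace_broker(0, "10.0.0.2:9092")

without_refresh = join_group_sketch(cluster, 0, refresh_on_error: false)
with_refresh = join_group_sketch(cluster, 0, refresh_on_error: true)
puts "without refresh: #{without_refresh.inspect}"
puts "with refresh: #{with_refresh.inspect}"
```

Without the refresh, every retry resolves id 0 to the old address and fails. With the refresh, the second attempt picks up the new address and succeeds.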

Justin Marney added 3 commits October 26, 2017 16:37

Handle the case where the coordinator is replaced with a new host

9b6e9d9

Merge branch 'master' into jm-handle-replaced-coordinator

3f82e61

Fix missing !

e5eb76f
