Dear team,

Design question.

The current `Context` implementation in `context.go` populates the template variable namespace with identifiers such as `.Service.ID`, `.Node.ID`, etc. All of them are locally scoped, i.e. they describe the service, the task, or the local node. There is no information about the peer nodes running the other tasks of the same service.
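For reference, this is roughly what the current namespace gives you when a config is created with the golang template driver (file and config names below are made up for illustration): per-task values only, nothing about sibling nodes.

```hcl
# vault.hcl.tmpl -- rendered per task; only locally-scoped values are available today
storage "raft" {
  path    = "/vault/data"
  node_id = "{{ .Node.ID }}"   # the node this particular task landed on
  # no placeholder exposes the *other* nodes running tasks of this service
}
```

```console
$ docker config create --template-driver golang vault-config vault.hcl.tmpl
```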
If a global service built on Raft consensus runs on multiple nodes, its replicas (potentially) depend on correct peer configuration. You know these matters better than I do, so please correct me where I'm wrong :)
Use case: HashiCorp Vault (or its open-source fork OpenBao) running with Raft integrated storage as a global service. Two aspects of it depend on knowing the peer nodes:
- Cluster formation: `retry_join` has to list every possible join source for cluster initialization (read: all swarm nodes except the current one); see the config sketch after this list.
- Cluster node failure: a permanently failed peer has to be removed manually.
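To make the two points above concrete, here is a sketch of what has to be maintained by hand today (addresses are placeholders); removing a dead peer is likewise a manual `vault operator raft remove-peer <node-id>` call.

```hcl
# vault.hcl -- join targets enumerated by hand for every node (sketch)
storage "raft" {
  path    = "/vault/data"
  node_id = "vault-1"

  # every other swarm node has to be listed explicitly
  retry_join {
    leader_api_addr = "https://vault-2:8200"
  }
  retry_join {
    leader_api_addr = "https://vault-3:8200"
  }
}
```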
Frankly, the node-failure case above is an idealized, spherical-cow-in-a-vacuum one. I managed to break my 3-node Vault badly enough that the remaining follower FSMs deadlocked during leader election and the API stopped responding. Another example is offline recovery of a Vault cluster from a single surviving replica (TL;DR: Vault persists its peers in the DB, but the peer list can be overridden via an external JSON file, which initiates recovery and is followed by deleting that JSON. Nice, but manual).
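For context, that offline recovery is driven by a peers.json file placed under the Raft data directory; the port and field names below are from memory, so treat this as a sketch rather than a reference. Each entry names a surviving peer and its cluster address, and Vault rewrites its peer set from it at startup.

```json
[
  { "id": "vault-1", "address": "vault-1:8201", "non_voter": false }
]
```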
IMO the 21st century calls for automation :)
We could of course build automation that subscribes to node state changes, but it would be great to have something like an all-nodes list available in the config template, with the ability to iterate over it (and optionally exclude the current node). Ideally, this list would contain only the node IDs that match the service's placement constraints.
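To illustrate the ask, here is a purely hypothetical template: a `.Service.Nodes` slice (with per-node `ID`/`Hostname` fields) does not exist in the current `Context`; the names are only a strawman for what iteration with self-exclusion could look like.

```hcl
# vault.hcl.tmpl -- HYPOTHETICAL: .Service.Nodes is not part of today's Context
storage "raft" {
  path    = "/vault/data"
  node_id = "{{ .Node.ID }}"
{{- range .Service.Nodes }}
{{- if ne .ID $.Node.ID }}
  retry_join {
    leader_api_addr = "https://{{ .Hostname }}:8200"
  }
{{- end }}
{{- end }}
}
```

Ideally the slice would already be filtered down to the nodes that satisfy the service's placement constraints, as mentioned above.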
Question: is the scenario above, where `Context` is aware of peer nodes, something that aligns with the general design? Or is it considered an anti-pattern that clearly calls for external (off-swarm) automation?
I want to understand whether it is worth further R&D or not.
Many thanks