
[DESIGN] swarm config template context: global service: peering Node.ID awareness #3183

Open
zarganum opened this issue Oct 1, 2024 · 0 comments


Dear team,

Design question.

The current Context implementation in context.go populates the template "variable namespace" with identifiers such as .Service.ID and .Node.ID, which are locally scoped: they describe the service, the task, or the local node. There is no information about the peer nodes on which the service's other tasks are placed.
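For illustration, a templated swarm config today can only interpolate local placeholders, roughly like this (a sketch using the Go template syntax of `docker config create --template-driver golang`; the exact field set comes from context.go):

```
node_id   = "{{ .Node.ID }}"        # ID of the node running this task
service   = "{{ .Service.Name }}"   # name of the service
task_slot = "{{ .Task.Slot }}"      # slot of this particular task
```

None of these fields can enumerate the other nodes running tasks of the same service.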

If a global service built on Raft consensus runs across multiple nodes, each replica potentially depends on correct peer configuration. You know these matters better than I do, so please correct me where I'm wrong :)

Use case: HashiCorp Vault (or its open source fork OpenBao) running with Raft integrated storage as a global service.

  • Cluster formation. The retry_join stanza should list all possible sources for cluster initialization (read: all swarm nodes except the current one).
  • Cluster node failure. A permanently failed peer has to be removed manually.
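For reference, the Vault configuration the cluster-formation case needs to generate looks roughly like this (a sketch; hostnames, ports, and paths are placeholders). Every peer except the current node gets its own retry_join stanza, which is exactly the repetitive, node-dependent part a template would have to produce:

```hcl
storage "raft" {
  path    = "/vault/data"
  node_id = "node-a"

  # one stanza per peer node, excluding this node itself
  retry_join {
    leader_api_addr = "https://vault-node-b:8200"
  }
  retry_join {
    leader_api_addr = "https://vault-node-c:8200"
  }
}
```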

Frankly, the node-failure case above is a bit of a spherical cow in a vacuum. I broke my 3-node Vault badly enough that the remaining follower FSMs deadlocked in leader election and the API stopped responding. Another example is offline recovery of a Vault cluster from a single remaining replica (TL;DR: Vault persists its peers in the DB, but the peer list can be overridden by an external JSON file to initiate recovery, followed by deletion of that JSON. Nice, but manual).

IMO the 21st century calls for automation :)

We could of course build automation that subscribes to node state changes, but it would be great to have something like an all-nodes list in the config template context, with the ability to iterate over it (and optionally exclude the current node). Ideally, this list would contain the Node IDs of nodes that match the service's placement constraints.
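To make the ask concrete, here is a purely hypothetical sketch of what such a template could look like. The .Nodes list and its .Hostname field do not exist today; this is the proposed addition, iterating over peers and skipping the current node:

```
{{ range .Nodes }}{{ if ne .ID $.Node.ID }}
retry_join {
  leader_api_addr = "https://{{ .Hostname }}:8200"
}
{{ end }}{{ end }}
```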

Question: does the scenario above, where the Context is aware of peer nodes, align with the general design?
Or is it considered an anti-pattern that clearly calls for external (off-swarm) automation?
I want to understand whether it is worth further R&D.

Many thanks
