
Backup nodes + httpchk #100

Closed
wants to merge 4 commits

Conversation

@Jaykah (Contributor) commented Nov 23, 2014

This patch makes it possible to mark stub (default) backends as backup servers and adds the ability to use the HAProxy httpchk option with custom check ports for those nodes.
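
(For illustration, a minimal sketch of the HAProxy output this enables for a default server marked as backup with a custom check port; names, addresses, and ports are made up, and "option httpchk" must be enabled for the backend:)

backend database
  option httpchk GET /
  server def-db1 10.0.100.1:3306 backup check port 7201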

@Jaykah (Contributor, Author) commented Jan 8, 2015

Can we please have this merged?

It has been quite a while; is the project still maintained?

Thanks

@clizzin (Contributor) commented Feb 19, 2015

Sorry @Jaykah, the project maintainer @igor47 is really busy with other projects for Airbnb these days. We're doing our best to find time to respond to pull requests, and we're sorry contributors are waiting so long. Please bear with us in the meantime.

@Jaykah (Contributor, Author) commented Apr 15, 2015

Hi,

Any updates on the potential merge?

Thanks!

@Jaykah (Contributor, Author) commented May 28, 2015

@clizzin @igor47 Hi guys, when do you think it will be possible to have this reviewed?

@Jaykah (Contributor, Author) commented Aug 27, 2015

@clizzin @igor47 Just wanted to bump this, since it is a one-line change that, if merged, would make our lives much easier :) There's not much to review either...

@igor47 (Collaborator) commented Aug 27, 2015

Ah, sorry about the delay; this could use a rebase.

Also, we handle this problem for non-default servers with a server_options argument, which can cover all of the use cases you introduce plus any other arbitrary ones. Maybe this PR should just add a server_options argument to the backup servers section instead?
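
(For reference, the existing per-watcher option looks roughly like this in a service's config; the string is appended to every generated server line, and the values here are illustrative:)

services:
  database:
    haproxy:
      server_options: "check inter 2000 rise 3 fall 2"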

@Jaykah (Contributor, Author) commented Aug 27, 2015

Rebase done.

The reason this is done without server_options is to assign the "backup" label to individual backup nodes rather than to the whole group (if I understand your question correctly).

@jolynch (Collaborator) commented Aug 27, 2015

We're adding something similar for the weight change in #131. Perhaps we should add something like server_options to the backend registrations themselves to allow this kind of metadata to be propagated from nerve? If we're going to do that, might it make sense to keep all this frontend-specific metadata in one top-level key?

@Jaykah (Contributor, Author) commented Aug 27, 2015

@jolynch Sure, that's a good approach for ZK-listed servers. However, this particular scenario deals with static default servers, so I don't see an overlap there.

@jolynch (Collaborator) commented Aug 27, 2015

@Jaykah The static registrations from the default servers should (AFAIK) be identical to what nerve is registering, as far as Synapse can tell; in BaseWatcher's set_backends method they are simply merged together.

I guess what I'm saying is that we can solve both use cases with a generic "server_options" top-level key in backend registration hashes.
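
(A sketch of what such a registration hash could look like if this proposal were adopted; host, port, and name are the existing fields, while the server_options key is hypothetical:)

{
  "host": "10.0.100.1",
  "port": 3306,
  "name": "def-db1",
  "server_options": "backup check port 7201"
}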

@Jaykah (Contributor, Author) commented Aug 27, 2015

@jolynch

What I am referring to is the "stub" watcher, rather than any backend discovered via Zookeeper/Docker/etc. My use case is a server in Region A that would use backends from Region B only if all local nodes fail. Those backends are marked as "backup" in the Synapse config. At the same time, region B's backends are primary for their own region (so they won't be marked as "backup" in Nerve), and are discovered through a different path in Zookeeper.

A better option would be to discover backup nodes from another Zookeeper path, but that's a different discussion (we would still need default servers if ZK fails).
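
(For illustration, the desired HAProxy output for this use case would look roughly like the following, with made-up names and addresses; traffic only reaches the Region B server once every non-backup server is down:)

backend database
  option httpchk GET /
  server region-a-db1 10.0.1.1:3306 check port 7201
  server region-a-db2 10.0.1.2:3306 check port 7201
  server region-b-db1 10.1.1.1:3306 backup check port 7201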

@jolynch (Collaborator) commented Aug 28, 2015

Ah, that's good context. We're experimenting with zone failover by using multiple backends and a frontend ACL. We've been relying on the implicit backend name for a bit, but I opened #135 to make it explicit so you can reference backends in ACLs. That being said, the HAProxy guys recommend combining multiple backends + ACLs with backups, so we should probably support it. Definitely interested in coming up with a good way to do failover.
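
(A rough sketch of the multiple-backends-plus-ACL pattern described here, with hypothetical backend names; nbsrv counts the usable servers in a backend, so traffic is routed to the remote backend only when no local server is up:)

frontend database
  bind *:3306
  acl local_up nbsrv(database_local) gt 0
  use_backend database_local if local_up
  default_backend database_remote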

I think I'm starting to see your use case; I'm just trying to find a way to address a few similar feature requests. I feel like adding a bunch of HAProxy-specific keys (like httpchk) is the wrong way going forward, especially if we ever want to add more outputs than HAProxy/file output. I'm not sure what the right abstraction is, though; what do you think?

Also, just FYI, you can use the state cache as an additional guard against ZooKeeper flakiness (it's the only guard we use).

@Jaykah (Contributor, Author) commented Aug 28, 2015

> We've been relying on the implicit backend name for a bit, but I opened #135 to make it explicit so you can reference backends in ACLs.

So if I understand correctly, in that case we would define two (or more) backends (main and backup) in the frontend's Synapse config and fail over to the backup using an ACL? If so, that seems to solve part of what I am trying to achieve, since your example shows ZK discovery for the backup node, which can be done from a different ZK path.

Are there any limitations to such an approach?

> That being said, the HAProxy guys recommend combining multiple backends + ACLs with backups, so we should probably support it.

That's what I meant by

> A better option would be to discover backup nodes from another Zookeeper path, but that's a different discussion (we would still need default servers if ZK fails).

Having the backup option available on a per-node basis for the stub watcher is complementary to #135.

> I feel like adding a bunch of HAProxy-specific keys (like httpchk) is the wrong way going forward, especially if we ever want to add more outputs than HAProxy/file output. I'm not sure what the right abstraction is, though; what do you think?

Sure, having more flexibility is always great, but we should not forget that even per Synapse's readme, "The heart of synapse is actually HAProxy, a stable and proven routing component."

> Also, just FYI, you can use the state cache as an additional guard against ZooKeeper flakiness (it's the only guard we use).

What cache are you referring to? We are not using Curator, if you are talking about the path cache.

@jolynch (Collaborator) commented Sep 7, 2015

@Jaykah
I think we need a general way to inform config generators (HAProxy, file output, etc.) of relevant metadata for backends. @bobtfish is working on adding weights in #131, and I think this change and that one need to be generalized.

> Are there any limitations to such an approach?

From my conversations with the HAProxy folks, ACLs are preferred because they give you more control over when failover occurs, but the blog article I linked to seems to indicate they can be used together. In my opinion the ACL approach strictly dominates backups because I can reason about failover really easily (does the backup backend have any traffic?).

> Having the backup option available on a per-node basis for the stub watcher is complementary to #135.

I agree, I'm just trying to figure out a way that we can unify #100, #131 and other similar features. I really do think that the right way to do this is to attach metadata to backends and we can have well specified behavior per output (HAProxy, files, etc ...)

> Sure, having more flexibility is always great, but we should not forget that even per Synapse's readme, "The heart of synapse is actually HAProxy, a stable and proven routing component."

I can change the readme, but I believe it is in the best interest of the project not to make additional choices that tie us directly to HAProxy. We've already added file output, which has been instrumental in getting things like Cassandra bootstrapping off SmartStack. We had to do some crazy stuff to get HAProxy not to drop traffic during reloads, and HAProxy might not be the only solution we offer going forward.

> What cache are you referring to? We are not using Curator, if you are talking about the path cache.

I'm referring to the state_file_path and state_file_ttl options, which allow you to cache backends across restarts. Those, together with the restart_interval and restart_jitter HAProxy options, have helped us out a ton.
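
(For reference, these are options in the haproxy section of the Synapse config; exact placement and defaults may vary by Synapse version, and the values below are illustrative:)

haproxy:
  state_file_path: "/var/run/synapse/haproxy_state.json"
  state_file_ttl: 86400     # keep cached backends for up to a day
  restart_interval: 60      # reload HAProxy at most once a minute
  restart_jitter: 0.1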

@jolynch (Collaborator) commented Sep 7, 2015

I'm also potentially OK with using a bunch of top-level keys as long as we namespace them or somehow make it obvious that httpchk doesn't do anything unless you're using HAProxy (as opposed to file or nginx or whatever we build in the future).

@jolynch (Collaborator) commented Sep 7, 2015

#83 is also relevant.

@Jaykah (Contributor, Author) commented Sep 10, 2015

@jolynch

> I agree, I'm just trying to figure out a way that we can unify #100, #131 and other similar features. I really do think that the right way to do this is to attach metadata to backends and we can have well specified behavior per output (HAProxy, files, etc ...)

Sure, if we can select main and backup paths per Synapse instance (e.g. a main path for the same region as Synapse, and then a separate "backup" block with a different ZK path for another region), that would cover the need for the backup option.

> I'm also potentially OK with using a bunch of top-level keys as long as we namespace them or somehow make it obvious that httpchk doesn't do anything unless you're using HAProxy (as opposed to file or nginx or whatever we build in the future).

Maybe prefix them with "haproxy"?

haproxy-httpchk: boolean; use the HAProxy httpchk check for this server. Requires "option httpchk /" to be set under the HAProxy options.
haproxy-chkport: custom port for the httpchk health check.

@jolynch (Collaborator) commented Sep 14, 2015

@Jaykah I think keeping it flat with well-named tags can work going forward. Trying to come up with some kind of well-defined nesting structure in the service registrations seems like a losing battle to me, but maybe I'm not creative enough.

Let's do this, but let's add some documentation in both nerve and synapse for what backend registrations can actually contain (e.g. right now it's host, port, name, weight?) and what each field means. Let's try to keep metadata that could only ever be used for one config generator prefixed with that generator's name.

@@ -717,7 +717,7 @@ def generate_backend_stanza(watcher, config)
config.map {|c| "\t#{c}"},
backends.keys.shuffle.map {|backend_name|
backend = backends[backend_name]
-          b = "\tserver #{backend_name} #{backend['host']}:#{backend['port']}"
+          b = "\tserver #{backend_name} #{backend['host']}:#{backend['port']}#{' backup' if backend['backup']}#{' check port ' if backend['httpchk']}#{backend['chkport']}"
@jolynch (Collaborator) commented on the diff:

Thoughts on having a haproxy_server_options catch-all for registration info on backend registrations, similar to the watcher's server_options?

So, instead of haproxy_backup and haproxy_httpchk, you would just have haproxy_server_options, which can contain weights, httpchk settings, check ports, backup status, etc.

@Jaykah (Contributor, Author) commented Sep 14, 2015

@jolynch The issue with server_options (or haproxy_server_options, for that matter) is that those are blanket rules that apply to all backends rather than to individual ones (unless I am missing something).

So in my case it would be:

services:
  database:
    default_servers:
      - name: "def-db1"
        host: "10.0.100.1"
        port: 3306
        haproxy-httpchk: true
        haproxy-chkport: 7201
      - name: "def-db2"
        host: "10.0.100.2"
        port: 3307
        haproxy-httpchk: true
        haproxy-chkport: 7202

which would allow different check ports per instance (useful for a flat infrastructure where two nodes may coexist on the same machine).

However, if we had haproxy_server_options per backend, that would be fine as well.
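
(For comparison, a sketch of the same example using a per-server haproxy_server_options string, assuming such a key were accepted on individual default_servers entries:)

services:
  database:
    default_servers:
      - name: "def-db1"
        host: "10.0.100.1"
        port: 3306
        haproxy_server_options: "check port 7201"
      - name: "def-db2"
        host: "10.0.100.2"
        port: 3307
        haproxy_server_options: "check port 7202"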

@jolynch (Collaborator) commented Sep 14, 2015

@Jaykah, yes, I mean in addition to the service backend's server_options. So you can set options either per backend or per server.

Sweet, let's do that and I'll merge it.

@Jaykah (Contributor, Author) commented Oct 6, 2015

#140

@Jaykah closed this Oct 6, 2015