This is not a bug per se, just something I wanted to bring to your attention -- on the off chance it's unintentional or has the potential to cause anybody any grief.
Pooler grows the member pool in batches of init_count, but that is not strictly enforced. I don't necessarily think it should be enforced, but some interesting behavior might arise in certain cases because it's not.
What I mean by "not strictly enforced" is here:
https://github.com/seth/pooler/blob/master/src/pooler.erl#L560
Here's a sample case to illustrate the point. All members are currently in use, but there is room for growth:
init_count = 10
max_count = 100
free_pids = []
in_use_count = 10
starting_members = []
Somebody tries to take a member. That results in add_members_async(10, ...) starting up another batch of 10, and it returns error_no_members to the caller. All good so far, and now:
starting_members = [...10 elements...]
Immediately thereafter (prior to any of the new batch being fully started/added/accepted), another caller tries to take a member. This gets evaluated:
NumToAdd = max(min(InitCount - NonStaleStartingMemberCount, NumCanAdd), 1)
...which equates to max(min(10 - 10, 80), 1), so NumToAdd = 1. That results in add_members_async(1, ...) starting up another single member, and it returns error_no_members to the caller.
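To make the arithmetic concrete, here's a minimal Erlang sketch of that calculation (not pooler's actual code; the module and function names are made up for illustration):

```erlang
%% pool_growth_demo.erl -- hypothetical illustration of the batch-size
%% calculation quoted above, using the same variable names.
-module(pool_growth_demo).
-export([num_to_add/3]).

%% InitCount: configured batch size (init_count)
%% NonStaleStarting: members currently being started
%% NumCanAdd: remaining headroom, i.e. max_count - (in use + starting)
num_to_add(InitCount, NonStaleStarting, NumCanAdd) ->
    max(min(InitCount - NonStaleStarting, NumCanAdd), 1).
```

With the sample pool state above, the second caller evaluates num_to_add(10, 10, 80), which returns 1, so one more member gets kicked off on top of the batch of 10.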
Now we have 11 members being started. This may not be a problem per se, but if (a) it takes long enough for new members to start, and (b) consumers are trying to take a member at a fast enough rate, the number of starting members can grow quite large. The worst case (I believe) is that you can have up to max_count - init_count members starting at once.
(a) and (b) together are the perfect storm that can potentially overload things.
This may be a non-issue, but I wanted to bring it to your attention. If this ever bites anybody, the fix would be simple: place a configurable hard limit on it, e.g. max_starting_count. It could default to something like max(init_count, 1) out of the box, and users could raise/lower it as they see fit.
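For what it's worth, here's a rough sketch of how such a cap might factor into the calculation. This is hypothetical -- max_starting_count is not an existing pooler option, and this is not a patch against pooler.erl:

```erlang
%% pool_growth_capped.erl -- hypothetical sketch of the proposed
%% max_starting_count limit layered on top of the existing calculation.
-module(pool_growth_capped).
-export([num_to_add/4]).

num_to_add(InitCount, Starting, NumCanAdd, MaxStartingCount) ->
    Uncapped = max(min(InitCount - Starting, NumCanAdd), 1),
    %% Never start more members than the configured limit allows.
    Headroom = max(MaxStartingCount - Starting, 0),
    min(Uncapped, Headroom).
```

With the suggested default of max(init_count, 1) = 10, the second caller in the example above would get min(1, 0) = 0 and would not kick off the extra member.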
What do you think?
I've only had a chance to read your report quickly, but it looks to me like you've found a bug and something that could be a problem. Perhaps it would be worth adding the config you suggest.
@seth @oferrigni FWIW, I've been seeing this "member leakage" behavior in our production environment. We have a setup where init_count is 1000, and we can see our pool grow like:
1000
2001
3003
4008
5017
The overflow is the number of consumers trying to get a member while pooler is off starting up a new batch of members.
If you're not opposed to my suggestion above, I'll submit a PR...