You have two web servers, and one load balancer. Every connection comes into the load balancer, which then decides which of the two web servers to send that connection to; the web server handles it, and the connection is closed. Connections are coming in at a rate of a few hundred a minute, and each of them takes a few seconds to complete, so each web server typically has a few connections open at any given time. The load balancer knows how many connections each server has open.
If the load balancer always picked the server with the fewest current connections for each new connection (or picked at random if both had the same number), then load would be very evenly balanced: each web server would have the same number of open connections, or one would have one more than the other.
However, it may be desirable to avoid sending the same user's connections to different web servers on the same visit to the site. Each user typically makes many connections, seconds or minutes apart. So we change the load balancer algorithm a little bit:
- When a connection comes in from a "new" place, pick a web server as before: either the one with the fewest connections right now, or randomly if they both have the same number.
- Remember where that connection came from, and which server got selected.
- If a connection comes in from a place that already has a web server picked for it, send it to that same web server.
- Forget the association between a place and a web server if no connections have come from that place in the past 20 minutes.
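The rules above can be sketched in a few lines of code. This is only an illustration of the algorithm as described, not the actual load balancer; the names (`StickyBalancer`, `TTL_SECONDS`) are made up:

```python
import random
import time

TTL_SECONDS = 20 * 60  # forget a place after 20 minutes without a connection

class StickyBalancer:
    """Illustrative sketch of the sticky algorithm described above."""

    def __init__(self, servers):
        self.open_count = {s: 0 for s in servers}  # open connections per server
        self.sticky = {}                           # place -> (server, last_seen)

    def pick(self, place, now=None):
        now = time.time() if now is None else now
        entry = self.sticky.get(place)
        if entry is not None and now - entry[1] <= TTL_SECONDS:
            server = entry[0]                       # known place: same server again
        else:
            fewest = min(self.open_count.values())  # new (or forgotten) place:
            tied = [s for s, n in self.open_count.items() if n == fewest]
            server = random.choice(tied)            # fewest connections, ties random
        self.sticky[place] = (server, now)          # remember (and refresh) the pick
        self.open_count[server] += 1
        return server

    def finish(self, server):
        self.open_count[server] -= 1                # a connection completed
```

Note that the last-seen timestamp refreshes on every connection, so a place that keeps connecting stays stuck to its server indefinitely; only 20 minutes of silence clears it.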
At first blush, it seems like if you forget any place that hasn't connected in the past 20 minutes, and you don't have a significant percentage of connections coming from the same place (or the same few places), this should still distribute load fairly evenly. However, I recently observed a pattern like this:
- A much larger number of people than usual visited the site during a half hour period.
- Web server #1 saw a sudden spike from about 2.5 connections per second to about 6-7 connections per second, in less than a minute. The high rate continued for about 20 minutes, then sharply dropped back to the normal rate of about 2.5 connections per second.
- Web server #2 saw a gradual climb, over the course of about five minutes, from 2.5 conn/s to about 5 conn/s. After 5 more minutes it peaked at around 5.5, then slowly went down, and eventually gradually came down to about 2.5 conn/s.
- Over the course of the highest-traffic 20 minutes, Web server #1 received a total of 35% more connections than Web server #2.
Assumptions (aka observed facts):
- Connections were coming in from a wide range of places, with no one place accounting for 1% or more of the total
Variables (things which define the "circumstances" under which the algorithm behaves differently):
- Time to complete a connection can vary between under 1 second and as many as 30 seconds.
- Time to complete a connection could partially depend on the number of current connections
- Distribution of places that make few connections vs. places that make more connections can vary widely. Maybe every place that connects connects 100-400 times; or maybe 50% connect just once or twice each, while the other 50% connect many times each.
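None of this pins down the math, but the variables above are easy to play with in a small simulation. The sketch below is hypothetical (not the real load balancer): `simulate` takes a sorted list of `(time, place)` arrivals and a duration-sampling function, applies the sticky rule with a 20-minute TTL, and returns how many connections each server handled in total:

```python
import heapq
import random

def simulate(arrivals, duration, ttl=1200, seed=0):
    """arrivals: sorted (time, place) pairs; duration: rng -> seconds."""
    rng = random.Random(seed)
    open_count = {"A": 0, "B": 0}
    totals = {"A": 0, "B": 0}
    sticky = {}          # place -> (server, last_seen)
    finishing = []       # min-heap of (finish_time, server)
    for t, place in arrivals:
        while finishing and finishing[0][0] <= t:
            _, done = heapq.heappop(finishing)   # close finished connections
            open_count[done] -= 1
        entry = sticky.get(place)
        if entry is not None and t - entry[1] <= ttl:
            server = entry[0]                    # sticky: reuse the earlier pick
        else:
            fewest = min(open_count.values())    # least connections, ties random
            server = rng.choice([s for s, n in open_count.items() if n == fewest])
        sticky[place] = (server, t)
        open_count[server] += 1
        totals[server] += 1
        heapq.heappush(finishing, (t + duration(rng), server))
    return totals
```

Varying the duration function (fixed 1 s vs. uniform 1–30 s) and the place mix (all one-shot places vs. half heavy repeaters) would let you look for the observed 35% skew empirically.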
no subject
Server A has no active sessions. Server B is currently busy completing responses for 10 ongoing requests.
All of a sudden, the stampeding herd shows up at your door. 5 requests come in all at the same time. Since Server A has fewer open requests than Server B, the load balancer dumps all 5 requests onto Server A at once. Trampled.
Now, with your "sticky session" badness, all 5 of those clients, coming from 5 distinct /28's, make a second request. At the same time. Regardless of whether Server B has finished responding to its 10 requests or not, the load balancer dumps those 5 new requests onto Server A. Trampled.
The moral of this story is: Friends don't let friends use "sticky session" load balancing. It's bad, mmmkay?
no subject
Thus, if the number of hits is large enough, the duration of the transient on Server B is a measure of the average stay. The model I have in mind is decay, for instance from a radioactive source. You may have encountered this problem in a calculus class. Is the average stay about five minutes? I'm guessing a little less, because the actual average stay would be the time at which the remaining gap has decayed to 1/e of its initial size.
It's complicated by the fact that some new sessions go onto Server B which don't represent sessions leaving Server A. It's my bedtime now, but I'll try to come up with a specific mathematical model in the morning. As my theorist friends say, "It's an interesting problem..."
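For what it's worth, the standard exponential-decay arithmetic behind that guess can be checked numerically. This is a sketch of the commenter's model with an assumed 5-minute mean stay, not anything from the observed data:

```python
import math

tau = 5 * 60   # assumed mean session length in seconds (the 5-minute guess)

# If each open session ends independently at rate 1/tau, the fraction still
# open after t seconds is exp(-t / tau), and tau is the mean stay.
remaining_at_tau = math.exp(-1)          # fraction of the gap left at t = tau
covered_at_tau = 1 - remaining_at_tau    # ~63% of the transient is done by then

# Sanity check: summing the survival function (step = 1 s) approximates
# the mean stay, since E[T] is the integral of exp(-t / tau).
mean_stay = sum(math.exp(-t / tau) for t in range(20000))
```

So eyeballing when the curve "gets there" overestimates the mean stay: by one mean-stay's worth of time, only about 63% of the transient is complete.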
no subject
Is that information available as well? Also, in general, how could I see information like where connections are coming from? I remember diaryland.com had a tool that let you see who was looking at your page and for how long. Is there something like that for livejournal, or for any site at all, for anyone?
no subject
That still doesn't try to answer the math question, though it suggests a scenario that could be one piece in a larger answer. Now:
1. How would we get into a situation where A has 0 connections and B has 10?
2. With requests coming in at rates as slow as I described, you're never going to get the kind of "slamming" you describe here. It's true that if A has 0 and B has 10, the next 5 requests will go to A as long as B hasn't finished any of those 10. But if we start with A:0, B:2, and 5 new connections come in, the first 2 will go to A, and the next 3 will be split up, so we'll end with 4 connections on one server and 3 on the other.
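That arithmetic is easy to verify with a few lines of (hypothetical) least-connections code, assuming none of B's two existing connections finish while the five new ones arrive:

```python
import random

def least_connections(open_count, rng):
    """Pick the server with the fewest open connections; break ties randomly."""
    fewest = min(open_count.values())
    return rng.choice([s for s, n in open_count.items() if n == fewest])

open_count = {"A": 0, "B": 2}
rng = random.Random(0)
for _ in range(5):                   # five new connections, none finishing
    open_count[least_connections(open_count, rng)] += 1
# The first two land on A; the remaining three alternate, so the result
# is always a 4-3 split (in some order), never a big pile-up on one side.
```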
So, the scenario you describe depends on an extreme imbalance at its beginning, to create conditions that *might* cause an imbalance almost (but not quite) as extreme a little later on. Your scenario suggests one explanation for why, if we have an imbalance, it won't get rebalanced right away. That much I actually found intuitive about this problem. What it doesn't address is how an imbalance begins, and takes as strange a form as what I observed.
P.S. The moral of the story is totally irrelevant here. Really, I mean it. I'm not looking for a discussion of what other load balancing algorithm is better or worse for some application. I posted this post because I found the consequences of this particular algorithm strange, and am curious about the math that could explain it. I'm looking to explore how this algorithm works, not to judge it against others.
no subject
Unlikely scenario? Sure. It's like centipedes in your ... well ... anyway, it's more likely than you think. :-)
(*waves to