cos

You have two web servers, and one load balancer. Every connection comes into the load balancer, which then decides which of the two web servers to send that connection to; the web server handles it, and the connection is closed. Connections are coming in at a rate of a few hundred a minute, and each of them takes a few seconds to complete, so each web server typically has a few connections open at any given time. The load balancer knows how many connections each server has open.

If the load balancer always picked the server with the fewest current connections, for each new connection (or picked at random if both have the same number), then load would be very evenly balanced - each web server would have the same number of open connections, or one would have 1 more than the other.

However, it may be desireable to avoid sending the same user's connections to different web servers on the same visit to the site. Each user typically makes many connections, seconds or minutes apart. So we change the load balancer algorithm a little bit:

A "place" is a /28 IP range, but if you're not an Internet geek you can get away with just assuming that a "place" is a physical location - a house, an office, a wireless cafe. Multiple people may be browsing from the same place, but the load balancer can't tell the difference.

At first blush, it seems like if you forget any place that hasn't connected in the past 20 minutes, and you don't have a significant percentage of connections coming from the same place (or the same few places), this should still distribute load fairly evening. However, I recently observed a pattern like this:

A much larger number of people than usual visited the site during a half hour period.

Web server #1 saw a sudden spike from about 2.5 connections per second to about 6-7 connections per second, in less than a minute. The high rate continued for about 20 minutes, then sharply dropped back to the normal rate of about 2.5 connections per second.

Web server #2 saw a gradual climb, over the course of about five minutes, from 2.5 conn/s to about 5 conn/s. After 5 more minutes it peaked at around 5.5, then slowly went down, and eventually gradually came down to about 2.5 conn/s.

Over the course of the highest-traffic 20 minutes, Web server #1 received a total of 35% more connections than Web server #2.

Under what circumstances would the load balancing algorithm I describe behave like this?

Assumptions (aka observed facts):
- Connections were coming in from a wide range of places, with no one place accounting for 1% or more

Variables (things which define the "circumstances" under which the algorithm behaves differently):
- Time to complete a connection can vary between under 1 second and as many as 30 seconds.
- Time to complete a connection could partially depend on number of current connections
- Distribution of places that make few connections vs. places that make more connections can vary widely. Maybe every place that connects connects 100-400 times; or maybe 50% connect just once or twice each, while the other 50% connect many times each.

Threaded | Top-Level Comments Only

From:

tisiphone.livejournal.com

That's easy - Slashdot's front page refreshed :)

(Or someone died and New York Times tossed about a million or so breaking news alerts, or something else which generated a brief burst of interest in a particular site.)

Unrelated question: does the overhead of remembering the previous load outweigh the performance benefits from it?

From:

cos

Nonono, the question "why were a lot more people than usual accessing the site at that time" is not interesting (and in this case, the exact cause was obvious and unsurprising). The interesting question is about the uneven distribution of load between the two servers.

From:

dossy.livejournal.com

You've just observed the reason why people avoid "sticky session" support in load balancers, combined with the "stampeeding herd" effect.

From:

tisiphone.livejournal.com

Oh... because load balancers with conditions don't work properly to actually balance loads, basically. Consider if you had two different points of interest - but one was more interesting than the other. Say, a breaking news alert about Anna Nicole's death versus a really awesome, but long, new review for something shiny and technophilic (please excuse my lousy metaphorical setup, I'm really, really exhausted). In this case, Anna Nicole kicking it is likely to generate a spike of general interest, which will result in the Server #1 scenario - a high spike, with traffic remaining high. The second would result in the Server #2 scenario - steady, but less high, traffic (evened out by the fact that there are words of more than one syllable on the page, leading to lower numbers of page refreshes because things actually have to be read). If traffic is referred to one server if it has handled a request for the same site within the last 20 minutes, it's obvious that two separate events, one not as widespread as the other, would create an uneven load balance.

From:

cos

That's still not the question I'm posing, which is a *math* question not a sysadmin question. However, I do want to try to understand what you're saying - I'm not sure that I do (though I understand enough to know it's addressing a different question :).

You seem to be saying that two different kinds of traffic would somehow be sent to different servers, and I don't understand why you suggest that. Load balancer doesn't care what people are looking at. If there's an event with sudden spikey interest, it would still get balanced among the two servers, using the same algorithm. Some people would read about Anna Nicole from server #1, some would read about it from #2 (both serve identical content, of course - that's the whole point of a load balancer). So I don't understand how your last statement follows from the existence of different kinds of events that you describe.

From:

dossy.livejournal.com

I'm guessing my answer didn't help you understand what you were looking at. OK, let me explain:

Server A has no active sessions. Server B is currently busy completing responses for 10 ongoing requests.

All of a sudden, the stampeding herd shows up at your door. 5 requests come in all at the same time. Since Server A has less open requests than server B, the load balancer dumps all 5 requests onto Server A at once. Trampled.

Now, with your "sticky session" badness, all 5 of those requests, coming from 5 distinct /28's, make a second request. At the same time. Regardless of whether Server B completed responding to its 10 responses or not, the load balancer dumps those 5 new requests onto Server A. Trampled.

The moral of this story is: Friends don't let friends use "sticky session" load balancing. It's bad, mmmkay?

From:

477150n.livejournal.com

This makes sense regarding the near-instant slamming of Server A. It also seems to explain the long decay time on Server B. If a user spends about some amount of time tying up Server A, then leaves, a new request not from that person will be sent to Server B instead.

Thus, if the number of hits is large enough, the duration of the transient on Server B is a measure of the average stay. The model I have in mind is decay, for instance from a radioactive source. You may have encountered this problem in a calculus class. Is the average stay about five minutes? I'm guessing a little less, because the actual average stay would be the time at which the value is 1/e of the way there.

It's complicated by the fact that some new sessions go onto Server B which don't represent sessions leaving Server A. It's my bedtime now, but I'll try to come up with a specific mathematical model in the morning. As my theorist friends say, "It's an interesting problem..."

From:

cos

... I am interested in your line of thinking. It comes closest of any response so far to addressing what I'm thinking about. I hope you wake up and have more to say!

From:

cos

or to reword what I'm asking:

n this case, Anna Nicole kicking it is likely to generate a spike of general interest, which will result in the Server #1 scenario - a high spike, with traffic remaining high.

True, but that high spike, with traffic remaining high, would be distributed among both servers, because it's traffic from a large number of new locations most of which have not visitied in the past 20 minutes. Theoretically, you could then get different behavior from the set that got sent to #1 vs. the set that got sent to #2. Probablistically, though, that'd be weird - there's no reason I can think of for such a bias to be likely; usually, people sent to #1 and #2 should show the same average behavior afterward.

From:

cos

(bleh, that comment was sloppily written, trying again)

That still doesn't try to answer the math question, though it suggests a scenario that could be one piece in a larger answer. Now:

1. How would we get into a situation where A has 0 connections and B has 10?

2. With requests coming in at rates as slow as I described, you're never going to get the kind of "slamming" you describe here. It's true that if A has 0 and B has 10, the next 5 requests will go to A as long as B hasn't finished any of those 10. But if we set with A:0, B:2, and 5 new connections come in, the first 2 will go to A, and the next 3 will be split up, such that we'll end with 4 or 5 on each server.

So, the scenario you describe depends on an extreme imbalance at its beginning, to create conditions that *might* cause an imbalance almost (but not quite as) extreme a little later on. Your scenario suggests one explanation for why, if we have an imbalance, it won't get rebalanced right away. That much I actually found intuitive about this problem. What it doesn't address is how an imbalance begins, and takes as strange a form as what I observed.

P.S. The moral of the story is totally irrelevant here. Really, I mean it. I'm not looking for a discussion of what other load balancing algorithm is better or worse for some application. I posted this post because I found the consequences of this particular algorithm strange, and am curious about the math that could explain it. I'm looking to explore how this algorithm works, not to judge it against others.

From:

jes5199.livejournal.com

I don't have an answer, but I have also seen this effect on a farm of webservers. so I'm interested.

From:

tisiphone.livejournal.com

OK, I think I was misreading your question then - it appeared you were balancing from the server side, not the client side. Thanks for the further explanation.

From:

merinslips.livejournal.com

It sounds right that it has something to do with the time a given place is spending looking at the site.

Is that information available as well? Also, in general, how could I see information like where connections are coming from? I remember diaryland.com how a tool that let you see who was looking at your page and for how long. Is there something like that for livejournal or for any site at all, for anyone?

From:

dossy.livejournal.com

You explained that requests complete with varying durations. Is it possible that both server A and server B were each busy responding to 10 requests each (fair load balancing), server A completed more requests than server B at some point in time, then before server B finished responding to its requests, the stampeding herd arrived?

Unlikely scenario? Sure. It's like centipedes in your ... well ... anyway, it's more likely than you think. :-)

(*waves to

merinslips ... small world! :-)

From:

points.livejournal.com

It sounds like #2 got spidered, so a single location made many, many requests, possibly in parallel, over a smaller period of time. Given this, the assigned webserver (#2) will start having a consistently higher load, forcing more 'new' locations over to server #1. Since the spider continues to slam #2, connections will keep being assigned to #1 the larger percentage of the time. That seems to fit the pattern. For the odd starting behavior, you may have been at the tail end of a spider that started the initial load, that bounced the second spider to server #2, as #1 was starting to tail off.

From:

blimix.livejournal.com

If the load balancer always picked the server with the fewest current connections, for each new connection (or picked at random if both have the same number), then load would be very evenly balanced - each web server would have the same number of open connections, or one would have 1 more than the other.

Not precisely true. Either server could finish up a few connections before the other, and before more requests come in, thus creating a temporary imbalance.

As for the uneven distribution, you mentioned that web server #1 received 35% more connections, but you don't say whether there was ever an actual imbalance in the number of open connections. This leads me to speculate that web server #2 took longer, on average, to close its connections, leaving the two servers with the same average number of open connections at any given moment. Possible causes: Maybe the servers run at different speeds (for hardware or firmware reasons); maybe another program or a glitch slowed server #2; maybe one or two users on server #2 kept it busy with processor-intensive requests.

Alternately, your randomizer is broken. But that doesn't account for the different ways in which the rates changed.

From:

struct.livejournal.com

On first blush I'd say that's your huckleberry. That hypothesis accounts for both your spike on server #1 (spider slams your site for ~20 mins) and your increased load on server #2 (concurrent requests from one "place" stick to S1, causing incoming reqs from other "places" to shunt over to S2).

However, I've been burned plenty of times when I've surmised that the clever/elegant answer is the correct one... can you rule out the "stupid"/"obvious" explanation of faulty hardware or misconfigured software on S2 causing an intermittent or transient connect-time latency/lag? In other words, let's say a wumpus rears its ugly head on S2, causing it to take forever to establish connections. Impatient users start whamming on the reload/refresh button, making your S2 load steadily climb. Since S1 can handle traffic at a "healthy" rate, its load spikes. Over the next twenty minutes the S2 wumpus goes away, and everything eventually gets back to happy.

That second hypothesis does not account for why S1's load precipitously drops instead of tapering off, so I'd tend to dismiss it... but that could very well be where I'd get burnt. Perhaps hypothesis #2 is correct but incomplete -- maybe I'm missing something that could account for "phase transition" type behavior in S1.

At any rate, apropos of nothing: networking's one of my weaker subjects, so forgive me if this is a stupid question, but why is a /28 mask defined as a "place" and not a /24?

From:

struct.livejournal.com

Never mind about the "place" def question, I figured it out. For some reason I had a brain fart when I saw /28 and misinterpreted that to mean a range of +/- 127 on the last tuple of a given address, so I figured if you're going to go that wide, why not just allow x.x.x.* -- but of course /28 only allows a +/- 15 range, which makes a lot more sense.

From:

struct.livejournal.com

This must be an interesting problem, because I'm still thinking about it after 1.5 days... I want to simulate a few scenarios to see if I can generate some pretty looking charts, but first I want to make sure I'm understanding your algorithm correctly. Is the following pseudocode accurate?

/* 'place' is a given range of 16 IP addresses 'lastConnect' is the last time a 'place' made a connection 'time' is the current time 'load0' is the current load on server 0 'load1' is the current load on server 1 Returns 0 or 1 corresponding to one of two webservers */ chooseServer( place ) { lastConnect = lookup_last_connect_time( place ); if ( time - lastConnect < 20 minutes ) return lookup_last_server_connected( place ); else if ( load0 == load1 ) return random( 0 or 1 ); else return ( load0 < load1 )? 0 : 1; }

Threaded | Top-Level Comments Only

Profile

cos

CosWeb

February 2025

S	M	T	W	T	F	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

Page Summary

Style Credit

Base style: Tectonic by daemon @ LJ and momijizukamori
Theme: City Dragon by momijizukamori

Expand Cut Tags

No cut tags

Page generated Mar. 16th, 2026 19:08

Practical Puzzle for Math Geeks

Navigation

Practical Puzzle for Math Geeks

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

Profile

February 2025

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags