[personal profile] cos
You have two web servers, and one load balancer. Every connection comes into the load balancer, which then decides which of the two web servers to send that connection to; the web server handles it, and the connection is closed. Connections are coming in at a rate of a few hundred a minute, and each of them takes a few seconds to complete, so each web server typically has a few connections open at any given time. The load balancer knows how many connections each server has open.

If the load balancer always picked the server with the fewest current connections, for each new connection (or picked at random if both have the same number), then load would be very evenly balanced - each web server would have the same number of open connections, or one would have 1 more than the other.
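A minimal sketch of that plain least-connections rule, in Python. The function name and the arrivals-only loop are my own illustration, not anything taken from a real load balancer; with no connections closing, the gap between the two servers never gets past 1.

```python
import random


def pick_least_connections(open_conns, rng=random):
    """Return the index of the server with the fewest open connections.

    Ties are broken at random, as described above.
    """
    fewest = min(open_conns)
    candidates = [i for i, n in enumerate(open_conns) if n == fewest]
    return rng.choice(candidates)


# Hypothetical workload: 1000 arrivals and no completions.
rng = random.Random(0)
open_conns = [0, 0]
max_gap = 0
for _ in range(1000):
    open_conns[pick_least_connections(open_conns, rng)] += 1
    max_gap = max(max_gap, abs(open_conns[0] - open_conns[1]))

print(open_conns, max_gap)
```

Once connections start closing at different times the gap can briefly grow, but each new arrival still pushes the two counts back together.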

However, it may be desirable to avoid sending the same user's connections to different web servers on the same visit to the site. Each user typically makes many connections, seconds or minutes apart. So we change the load balancer algorithm a little bit:
    When a connection comes in from a "new" place, pick a web server as before: either the one with the fewest connections right now, or randomly if they both have the same number.

    Remember where that connection came from, and which server got selected.

    If a connection comes in from a place that already has a web server picked for it, send it to that same web server.

    Forget the association between a place and a web server if no connections have come from that place in the past 20 minutes.
A "place" is a /28 IP range, but if you're not an Internet geek you can get away with just assuming that a "place" is a physical location - a house, an office, a wireless cafe. Multiple people may be browsing from the same place, but the load balancer can't tell the difference.
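Here is a rough sketch of the four rules above, assuming IPv4 addresses and timestamps in seconds. `place_of`, `StickyBalancer`, and `STICKY_SECONDS` are names I made up for illustration; a real load balancer will differ in the details, for instance in how and when it expires old entries.

```python
import ipaddress
import random

STICKY_SECONDS = 20 * 60  # forget a place after 20 idle minutes


def place_of(ip):
    """Map an IPv4 address to its /28 block (16 addresses)."""
    return ipaddress.ip_network(f"{ip}/28", strict=False)


class StickyBalancer:
    def __init__(self, n_servers=2, rng=None):
        self.open_conns = [0] * n_servers
        self.assigned = {}  # place -> (server index, time of last connection)
        self.rng = rng or random.Random()

    def choose(self, ip, now):
        """Pick a server for a connection from `ip` arriving at time `now`."""
        place = place_of(ip)
        entry = self.assigned.get(place)
        if entry is not None and now - entry[1] < STICKY_SECONDS:
            server = entry[0]  # known place: reuse its server
        else:
            # New or forgotten place: fewest connections, random on a tie.
            fewest = min(self.open_conns)
            server = self.rng.choice(
                [i for i, n in enumerate(self.open_conns) if n == fewest])
        self.assigned[place] = (server, now)  # remember / refresh the pairing
        self.open_conns[server] += 1
        return server

    def close(self, server):
        """Record that one connection on `server` has finished."""
        self.open_conns[server] -= 1


lb = StickyBalancer(rng=random.Random(1))
first = lb.choose("192.0.2.17", now=0)
again = lb.choose("192.0.2.30", now=600)    # same /28, 10 minutes later
other = lb.choose("198.51.100.1", now=600)  # a new place
```

In this usage example `again` goes to the same server as `first` even though that server already has a connection open, while the new place is sent to the idle server.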

At first blush, it seems like if you forget any place that hasn't connected in the past 20 minutes, and you don't have a significant percentage of connections coming from the same place (or the same few places), this should still distribute load fairly evenly. However, I recently observed a pattern like this:
  • A much larger number of people than usual visited the site during a half hour period.

  • Web server #1 saw a sudden spike from about 2.5 connections per second to about 6-7 connections per second, in less than a minute. The high rate continued for about 20 minutes, then sharply dropped back to the normal rate of about 2.5 connections per second.

  • Web server #2 saw a gradual climb, over the course of about five minutes, from 2.5 conn/s to about 5 conn/s. After 5 more minutes it peaked at around 5.5, then slowly declined, eventually returning to about 2.5 conn/s.

  • Over the course of the highest-traffic 20 minutes, Web server #1 received a total of 35% more connections than Web server #2.
Under what circumstances would the load balancing algorithm I describe behave like this?

Assumptions (aka observed facts):
- Connections were coming in from a wide range of places, with no one place accounting for 1% or more

Variables (things which define the "circumstances" under which the algorithm behaves differently):
- Time to complete a connection can vary between under 1 second and as many as 30 seconds.
- Time to complete a connection could partially depend on number of current connections
- Distribution of places that make few connections vs. places that make more connections can vary widely. Maybe every place that connects connects 100-400 times; or maybe 50% connect just once or twice each, while the other 50% connect many times each.
Date: 2007-05-21 00:45 (UTC)

From: [identity profile] tisiphone.livejournal.com
That's easy - Slashdot's front page refreshed :)

(Or someone died and New York Times tossed about a million or so breaking news alerts, or something else which generated a brief burst of interest in a particular site.)

Unrelated question: does the overhead of remembering the previous load outweigh the performance benefits from it?
Date: 2007-05-21 00:53 (UTC)

From: [identity profile] dossy.livejournal.com
You've just observed the reason why people avoid "sticky session" support in load balancers, combined with the "stampeding herd" effect.

Date: 2007-05-21 05:40 (UTC)

From: [identity profile] jes5199.livejournal.com
I don't have an answer, but I have also seen this effect on a farm of webservers, so I'm interested.
Date: 2007-05-22 17:53 (UTC)

From: [identity profile] points.livejournal.com
It sounds like #2 got spidered, so a single location made many, many requests, possibly in parallel, over a smaller period of time. Given this, the assigned webserver (#2) will start having a consistently higher load, forcing more 'new' locations over to server #1. Since the spider continues to slam #2, connections will keep being assigned to #1 the larger percentage of the time. That seems to fit the pattern. For the odd starting behavior, you may have been at the tail end of a spider that started the initial load, that bounced the second spider to server #2, as #1 was starting to tail off.
Date: 2007-05-22 19:15 (UTC)

From: [identity profile] blimix.livejournal.com
If the load balancer always picked the server with the fewest current connections, for each new connection (or picked at random if both have the same number), then load would be very evenly balanced - each web server would have the same number of open connections, or one would have 1 more than the other.


Not precisely true. Either server could finish up a few connections before the other, and before more requests come in, thus creating a temporary imbalance.

As for the uneven distribution, you mentioned that web server #1 received 35% more connections, but you don't say whether there was ever an actual imbalance in the number of open connections. This leads me to speculate that web server #2 took longer, on average, to close its connections, leaving the two servers with the same average number of open connections at any given moment. Possible causes: Maybe the servers run at different speeds (for hardware or firmware reasons); maybe another program or a glitch slowed server #2; maybe one or two users on server #2 kept it busy with processor-intensive requests.

Alternately, your randomizer is broken. But that doesn't account for the different ways in which the rates changed.
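The "same open connections, different rates" speculation can be made concrete with Little's law, which relates the average number of open connections L, the arrival rate λ, and the average time a connection stays open W. This is only a sketch: the law describes long-run averages, so it applies only roughly to a 20-minute spike, and the 3-second figure below is a made-up example, not a measurement.

```latex
% Little's law for each server i:
L_i = \lambda_i W_i
% If both servers hold the same average number of open connections:
L_1 = L_2 \;\Longrightarrow\; \lambda_1 W_1 = \lambda_2 W_2
% With server #1 receiving 35% more connections than server #2:
\frac{W_2}{W_1} = \frac{\lambda_1}{\lambda_2} = 1.35
% Hypothetical example: W_1 = 3\,\mathrm{s} \;\Rightarrow\; W_2 = 1.35 \times 3\,\mathrm{s} = 4.05\,\mathrm{s}
```

So under this reading, server #2 would have needed to hold each connection open about 35% longer on average than server #1.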
Date: 2007-05-24 04:02 (UTC)

From: [identity profile] struct.livejournal.com
This must be an interesting problem, because I'm still thinking about it after 1.5 days... I want to simulate a few scenarios to see if I can generate some pretty looking charts, but first I want to make sure I'm understanding your algorithm correctly. Is the following pseudocode accurate?

/*
'place' is a given range of 16 IP addresses
'lastConnect' is the last time a 'place' made a connection
'time' is the current time
'load0' is the current load on server 0
'load1' is the current load on server 1

Returns 0 or 1 corresponding to one of two webservers
*/

chooseServer( place )
{
    lastConnect = lookup_last_connect_time( place );

    if ( time - lastConnect < 20 minutes )
        return lookup_last_server_connected( place );

    else if ( load0 == load1 )
        return random( 0 or 1 );

    else
        return ( load0 < load1 )? 0 : 1;
}
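For anyone who wants to try the simulation idea, here is one possible starting point in Python. It follows the pseudocode above, plus two things the pseudocode leaves out and which I am assuming: every connection refreshes the remembered time and server for its place, and every connection takes a fixed `service_time`. The workload generator and all its numbers are invented for illustration and are not fitted to the observed traffic.

```python
import heapq
import random

STICKY = 20 * 60  # seconds a place stays pinned after its last connection


def simulate(arrivals, service_time=3.0, seed=0):
    """Run the sticky algorithm over `arrivals`, a time-sorted list of
    (time_in_seconds, place) pairs. Returns connections per server."""
    rng = random.Random(seed)
    load = [0, 0]    # open connections on each server
    totals = [0, 0]  # connections each server has received so far
    pinned = {}      # place -> (server, time of last connection)
    closing = []     # min-heap of (finish time, server)
    for now, place in arrivals:
        # Close every connection that has finished by now.
        while closing and closing[0][0] <= now:
            _, done = heapq.heappop(closing)
            load[done] -= 1
        entry = pinned.get(place)
        if entry is not None and now - entry[1] < STICKY:
            server = entry[0]
        elif load[0] == load[1]:
            server = rng.randrange(2)
        else:
            server = 0 if load[0] < load[1] else 1
        pinned[place] = (server, now)  # assumption: refreshed on every hit
        load[server] += 1
        totals[server] += 1
        heapq.heappush(closing, (now + service_time, server))
    return totals


def make_arrivals(n_places=500, visits_per_place=20, duration=1800.0, seed=1):
    """Hypothetical workload: each place starts at a random time and then
    connects repeatedly, about 30 seconds apart on average."""
    rng = random.Random(seed)
    events = []
    for place in range(n_places):
        t = rng.uniform(0, duration)
        for _ in range(visits_per_place):
            events.append((t, place))
            t += rng.expovariate(1 / 30.0)
    events.sort()
    return events


totals = simulate(make_arrivals())
print(totals)
```

Varying `visits_per_place`, `service_time`, or the start-time distribution in `make_arrivals` (for example, bunching most starts into a couple of minutes) is one way to explore the scenarios listed under "Variables" in the post.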
