Load Balancing
What is Load Balancing?
Picture a grocery store with 10 checkout lanes. Without someone directing traffic, everyone crowds into lane 1 while lanes 2-10 sit empty. A load balancer is that person who says "lane 3 is open!" and spreads customers across all lanes evenly.
In tech terms, a load balancer sits between clients (users) and servers. When a request comes in, it decides which server should handle it. The goals are simple:
- Spread the load so no single server gets overwhelmed.
- Improve reliability — if a server dies, the load balancer stops sending traffic to it.
- Increase throughput — more servers working in parallel means more total capacity.
Load balancers can sit at multiple levels: between users and web servers, between web servers and application servers, and between application servers and databases.
Load Balancing Algorithms
How does the load balancer decide which server gets the next request? It has several strategies:
Round Robin — The simplest. Send requests to servers in order: Server 1, Server 2, Server 3, Server 1, Server 2, Server 3... Like dealing cards around a table. Works great when all servers are equally powerful and all requests take roughly the same time.
Weighted Round Robin — Same idea, but some servers get more turns. If Server 1 is twice as powerful as Server 2, give it twice as many requests. Like dealing 2 cards to the strong player for every 1 card to the others.
Least Connections — Send the request to whichever server is currently handling the fewest requests. This adapts naturally to slower requests — if Server 1 is stuck processing a big file upload, new requests go to less busy servers instead.
IP Hash — Hash the client's IP address to determine which server to use. The same user always goes to the same server, which is useful for session persistence (more on that later). Note that a simple hash modulo the pool size remaps most clients whenever a server is added or removed; consistent hashing is the related technique that reduces that churn.
Least Response Time — Send to the server with the fastest recent response time AND fewest active connections. The smartest but most complex approach.
Random — Pick a server randomly. Surprisingly effective with large server pools! Statistically, it balances well without any bookkeeping.
Load Balancing Algorithms in Python
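Here is a minimal, illustrative sketch of the first four strategies. The class and function names are my own, not from any library, and real load balancers add locking, health awareness, and far more bookkeeping:

```python
import hashlib
import itertools

class RoundRobin:
    """Cycle through servers in order: s1, s2, s3, s1, ..."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class WeightedRoundRobin:
    """Servers with a higher weight get proportionally more turns."""
    def __init__(self, weights):  # e.g. {"s1": 2, "s2": 1}
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Route to whichever server has the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count stays accurate.
        self.active[server] -= 1

def ip_hash(client_ip, servers):
    """The same client IP always maps to the same server (modulo pool size)."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Random is the one-liner `random.choice(servers)`, and Least Response Time would extend `LeastConnections` with a moving average of each server's recent latencies.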
Layer 4 vs Layer 7 Load Balancing
Load balancers can work at different levels of the network stack. The two most common are Layer 4 and Layer 7 (from the OSI model).
Layer 4 (Transport Layer) — Makes decisions based on IP addresses and TCP/UDP port numbers. It doesn't look inside the request — it has no idea if you're loading a webpage or uploading a video. It just sees "traffic from IP X on port 443" and routes it. Fast and efficient because it doesn't need to decrypt or parse anything.
Layer 7 (Application Layer) — Makes decisions based on the actual content of the request: URL path, headers, cookies, HTTP method, even the request body. Much smarter routing:
- Send /api/* requests to the API servers
- Send /images/* requests to the media servers
- Route based on cookies for session affinity
- Route based on the Accept-Language header for localization
Layer 7 is slower (it must parse the request) but far more flexible. Most modern load balancers like Nginx, HAProxy, and AWS ALB support Layer 7.
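Layer 7 routing rules like the ones above boil down to inspecting the parsed request and picking a server pool. A toy sketch (the pool names and rules are illustrative, not from any real configuration):

```python
def route(path, headers):
    """Pick a server pool from the request path and headers (Layer 7)."""
    if path.startswith("/api/"):
        return "api-pool"       # API servers
    if path.startswith("/images/"):
        return "media-pool"     # media servers
    if headers.get("Accept-Language", "").startswith("de"):
        return "eu-pool"        # localized servers
    return "web-pool"           # default pool
```

A Layer 4 balancer cannot do any of this, because by the time the path and headers are visible the request has already been parsed (and, for HTTPS, decrypted).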
Health Checks
A load balancer is useless if it sends traffic to a dead server. That's why load balancers constantly health-check their servers.
There are two types:
- Passive health checks — The load balancer monitors responses. If a server starts returning errors or timing out, it's marked as unhealthy and removed from the pool.
- Active health checks — The load balancer periodically pings each server (e.g., GET /health every 10 seconds). If a server fails to respond a certain number of times, it's taken out of rotation.
When a failed server recovers and starts passing health checks again, the load balancer automatically adds it back to the pool. No human intervention needed at 3 AM!
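The evict-and-recover logic can be sketched as a small state machine. This is a simplified illustration (the probe uses a hypothetical /health endpoint, and thresholds are arbitrary):

```python
import urllib.request

def probe(server_url, timeout=2.0):
    """Active check: GET /health, healthy means HTTP 200."""
    try:
        with urllib.request.urlopen(server_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

class HealthTracker:
    """Evict a server after N consecutive failures; re-add on recovery."""
    def __init__(self, servers, failures_to_evict=3):
        self.failures = {s: 0 for s in servers}
        self.healthy = set(servers)
        self.threshold = failures_to_evict

    def record(self, server, ok):
        if ok:
            self.failures[server] = 0
            self.healthy.add(server)   # recovery is automatic
        else:
            self.failures[server] += 1
            if self.failures[server] >= self.threshold:
                self.healthy.discard(server)
```

A background loop would call `probe` for each server every interval and feed the result into `record`; the load balancer only ever picks from `healthy`.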
Sticky Sessions
Sometimes you need the same user to always go to the same server. Maybe the server stores session data in memory, or you're doing a file upload in chunks. This is called sticky sessions (or session affinity).
The load balancer achieves this by:
- Setting a cookie (e.g., SERVERID=server-2) on the first response
- Using IP hashing to consistently map users to servers
- Reading a session ID from the request and routing accordingly
The downside? Sticky sessions hurt scalability. If one server gets all the "heavy" users stuck to it, it becomes overloaded while others sit idle. It also makes failover harder — if that server dies, those users lose their session state.
The better solution is usually to make your servers stateless and store session data externally (Redis, database). Then you don't need sticky sessions at all.
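The stateless approach looks like this in miniature. A plain dict stands in for Redis or a database, and the function names are illustrative:

```python
# Shared session store; in production this would be Redis or a database,
# reachable by every server.
session_store = {}

def save_session(session_id, data):
    session_store[session_id] = data  # e.g. redis.set(session_id, ...)

def load_session(session_id):
    return session_store.get(session_id, {})

def handle_request(server_name, session_id):
    """Any server can serve any user, because state lives in the store."""
    data = load_session(session_id)
    data["last_server"] = server_name
    save_session(session_id, data)
    return data
```

Because no server holds the session in memory, the load balancer is free to use any algorithm, and a server dying costs nothing but its in-flight requests.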
Hardware vs Software Load Balancers
Hardware load balancers (like F5, Citrix) are dedicated physical devices. They're extremely fast and can handle millions of connections, but they cost $10,000-$100,000+ and are hard to configure.
Software load balancers (like Nginx, HAProxy, Envoy, AWS ELB/ALB) run on standard servers. They're cheaper, more flexible, and easier to update. Most companies today use software load balancers.
Cloud providers also offer managed load balancers:
- AWS ELB/ALB/NLB — Elastic/Application/Network Load Balancer
- Google Cloud Load Balancing — Global load balancing across regions
- Azure Load Balancer — Microsoft's offering
These managed solutions handle scaling, health checks, and SSL termination for you — one less thing to worry about.