Load Balancing
What is Load Balancing?
Picture a grocery store with 10 checkout lanes. Without someone directing traffic, everyone crowds into lane 1 while lanes 2-10 sit empty. A load balancer is that person who says "lane 3 is open!" and spreads customers across all lanes evenly.
In tech terms, a load balancer sits between clients (users) and servers. When a request comes in, it decides which server should handle it. The goals are simple:
- Spread the load so no single server gets overwhelmed.
- Improve reliability — if a server dies, the load balancer stops sending traffic to it.
- Increase throughput — more servers working in parallel means more total capacity.
Load balancers can sit at multiple levels: between users and web servers, between web servers and application servers, and between application servers and databases.
Load Balancing Algorithms
How does the load balancer decide which server gets the next request? It has several strategies:
Round Robin — The simplest. Send requests to servers in order: Server 1, Server 2, Server 3, Server 1, Server 2, Server 3... Like dealing cards around a table. Works great when all servers are equally powerful and all requests take roughly the same time.
Weighted Round Robin — Same idea, but some servers get more turns. If Server 1 is twice as powerful as Server 2, give it twice as many requests. Like dealing 2 cards to the strong player for every 1 card to the others.
Least Connections — Send the request to whichever server is currently handling the fewest requests. This adapts naturally to slower requests — if Server 1 is stuck processing a big file upload, new requests go to less busy servers instead.
IP Hash — Hash the client's IP address to determine which server to use. The same user always goes to the same server, which is useful for session persistence (more on that later). Note that a simple hash modulo the pool size remaps most clients whenever a server is added or removed; consistent hashing is the related technique that reduces that churn.
Least Response Time — Send to the server with the fastest recent response time AND fewest active connections. The smartest but most complex approach.
Random — Pick a server randomly. Surprisingly effective with large server pools! Statistically, it balances well without any bookkeeping.
Load Balancing Algorithms in Python
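Here is a minimal, illustrative sketch of the first four strategies. The class and function names are my own, not from any library, and real load balancers add locking, health awareness, and far more bookkeeping:

```python
import hashlib
import itertools

class RoundRobin:
    """Cycle through servers in order: s1, s2, s3, s1, ..."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class WeightedRoundRobin:
    """Servers with a higher weight get proportionally more turns."""
    def __init__(self, weights):  # e.g. {"s1": 2, "s2": 1}
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Route to whichever server has the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count stays accurate.
        self.active[server] -= 1

def ip_hash(client_ip, servers):
    """The same client IP always maps to the same server (modulo pool size)."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Random is the one-liner `random.choice(servers)`, and Least Response Time would extend `LeastConnections` with a moving average of each server's recent latencies.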
Layer 4 vs Layer 7 Load Balancing
Load balancers can work at different levels of the network stack. The two most common are Layer 4 and Layer 7 (from the OSI model).
Layer 4 (Transport Layer) — Makes decisions based on IP addresses and TCP/UDP port numbers. It doesn't look inside the request — it has no idea if you're loading a webpage or uploading a video. It just sees "traffic from IP X on port 443" and routes it. Fast and efficient because it doesn't need to decrypt or parse anything.
Layer 7 (Application Layer) — Makes decisions based on the actual content of the request: URL path, headers, cookies, HTTP method, even the request body. Much smarter routing:
- Send /api/* requests to the API servers
- Send /images/* requests to the media servers
- Route based on cookies for session affinity
- Route based on the Accept-Language header for localization
Layer 7 is slower (it must parse the request) but far more flexible. Most modern load balancers like Nginx, HAProxy, and AWS ALB support Layer 7.
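Layer 7 routing rules like the ones above boil down to inspecting the parsed request and picking a server pool. A toy sketch (the pool names and rules are illustrative, not from any real configuration):

```python
def route(path, headers):
    """Pick a server pool from the request path and headers (Layer 7)."""
    if path.startswith("/api/"):
        return "api-pool"       # API servers
    if path.startswith("/images/"):
        return "media-pool"     # media servers
    if headers.get("Accept-Language", "").startswith("de"):
        return "eu-pool"        # localized servers
    return "web-pool"           # default pool
```

A Layer 4 balancer cannot do any of this, because by the time the path and headers are visible the request has already been parsed (and, for HTTPS, decrypted).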
Health Checks
A load balancer is useless if it sends traffic to a dead server. That's why load balancers constantly health-check their servers.
There are two types:
- Passive health checks — The load balancer monitors responses. If a server starts returning errors or timing out, it's marked as unhealthy and removed from the pool.
- Active health checks — The load balancer periodically pings each server (e.g., GET /health every 10 seconds). If a server fails to respond a certain number of times, it's taken out of rotation.
When a failed server recovers and starts passing health checks again, the load balancer automatically adds it back to the pool. No human intervention needed at 3 AM!
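The evict-and-recover logic can be sketched as a small state machine. This is a simplified illustration (the probe uses a hypothetical /health endpoint, and thresholds are arbitrary):

```python
import urllib.request

def probe(server_url, timeout=2.0):
    """Active check: GET /health, healthy means HTTP 200."""
    try:
        with urllib.request.urlopen(server_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

class HealthTracker:
    """Evict a server after N consecutive failures; re-add on recovery."""
    def __init__(self, servers, failures_to_evict=3):
        self.failures = {s: 0 for s in servers}
        self.healthy = set(servers)
        self.threshold = failures_to_evict

    def record(self, server, ok):
        if ok:
            self.failures[server] = 0
            self.healthy.add(server)   # recovery is automatic
        else:
            self.failures[server] += 1
            if self.failures[server] >= self.threshold:
                self.healthy.discard(server)
```

A background loop would call `probe` for each server every interval and feed the result into `record`; the load balancer only ever picks from `healthy`.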
Sticky Sessions
Sometimes you need the same user to always go to the same server. Maybe the server stores session data in memory, or you're doing a file upload in chunks. This is called sticky sessions (or session affinity).
The load balancer achieves this by:
- Setting a cookie (e.g., SERVERID=server-2) on the first response
- Using IP hashing to consistently map users to servers
- Reading a session ID from the request and routing accordingly
The downside? Sticky sessions hurt scalability. If one server gets all the "heavy" users stuck to it, it becomes overloaded while others sit idle. It also makes failover harder — if that server dies, those users lose their session state.
The better solution is usually to make your servers stateless and store session data externally (Redis, database). Then you don't need sticky sessions at all.
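The stateless approach looks like this in miniature. A plain dict stands in for Redis or a database, and the function names are illustrative:

```python
# Shared session store; in production this would be Redis or a database,
# reachable by every server.
session_store = {}

def save_session(session_id, data):
    session_store[session_id] = data  # e.g. redis.set(session_id, ...)

def load_session(session_id):
    return session_store.get(session_id, {})

def handle_request(server_name, session_id):
    """Any server can serve any user, because state lives in the store."""
    data = load_session(session_id)
    data["last_server"] = server_name
    save_session(session_id, data)
    return data
```

Because no server holds the session in memory, the load balancer is free to use any algorithm, and a server dying costs nothing but its in-flight requests.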
Hardware vs Software Load Balancers
Hardware load balancers (like F5, Citrix) are dedicated physical devices. They're extremely fast and can handle millions of connections, but they cost $10,000-$100,000+ and are hard to configure.
Software load balancers (like Nginx, HAProxy, Envoy, AWS ELB/ALB) run on standard servers. They're cheaper, more flexible, and easier to update. Most companies today use software load balancers.
Cloud providers also offer managed load balancers:
- AWS ELB/ALB/NLB — Elastic/Application/Network Load Balancer
- Google Cloud Load Balancing — Global load balancing across regions
- Azure Load Balancer — Microsoft's offering
These managed solutions handle scaling, health checks, and SSL termination for you — one less thing to worry about.