
Load Balancing: How Modern Systems Distribute Traffic at Scale

Kirtesh Admute
April 5, 2026
8 min read

Every time you open Netflix, book a flight, or send a message, your request doesn't land on a single server. It's intelligently routed to one of many — the one best positioned to handle it right now. That intelligence is load balancing.

Without it, a single overloaded server would become your system's Achilles heel. With it, your system becomes elastic, resilient, and fast — regardless of traffic spikes.


What Is a Load Balancer?

A load balancer is a component that sits between clients and your backend servers, distributing incoming requests across a pool of servers based on defined rules.

Client → Load Balancer → [ Server 1 | Server 2 | Server 3 ]

Its three core responsibilities:

  1. Traffic distribution — spread load across healthy servers
  2. Health checking — route away from unhealthy instances
  3. Session management — optionally maintain sticky sessions

Why Load Balancing Is Non-Negotiable

Eliminate single points of failure. If one server crashes, the load balancer routes around it automatically. Your users see nothing.

Horizontal scalability. Add more servers behind the balancer instead of upgrading to a bigger machine (vertical scaling). This is cheaper and more resilient.

Improve response times. Requests go to the least-busy server, reducing queue depth and latency.

Zero-downtime deployments. Rolling deploys work by taking servers out of rotation one at a time, updating them, then re-adding — users experience no interruption.


Types of Load Balancers

Layer 4 (Transport Layer)

Routes based on IP and TCP/UDP information — fast but limited context.

  • Operates at the transport layer (TCP/UDP)
  • Cannot inspect request content (no URL, no headers)
  • Very low latency overhead
  • Best for: high-throughput TCP connections, database proxies, gaming servers

Layer 7 (Application Layer)

Routes based on HTTP attributes — flexible and powerful.

  • Can inspect URL path, headers, cookies, body
  • Enables content-based routing (/api → API servers, /static → CDN origin)
  • Supports A/B testing, canary deployments, and header-based routing
  • Best for: web apps, APIs, microservices
| Feature               | Layer 4      | Layer 7            |
|-----------------------|--------------|--------------------|
| Routing basis         | IP + port    | URL, headers, body |
| Speed                 | Faster       | Slightly slower    |
| TLS termination       | Pass-through | Yes                |
| Content-based routing | No           | Yes                |
| Use case              | TCP/UDP apps | HTTP/HTTPS apps    |

Load Balancing Algorithms

Choosing the right algorithm is critical for optimal distribution.

Round Robin

Requests are distributed sequentially across servers.

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)

✅ Simple, predictable
❌ Ignores server capacity or current load
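
The cycle above can be sketched in a few lines. This is a minimal illustration (the server names are made up, and real balancers track far more state):

```typescript
// Minimal round-robin selector: cycle through the pool in order.
class RoundRobin {
  private next = 0;
  constructor(private servers: string[]) {}

  pick(): string {
    const server = this.servers[this.next];
    this.next = (this.next + 1) % this.servers.length; // wrap back to the start
    return server;
  }
}

const lb = new RoundRobin(['server-a', 'server-b', 'server-c']);
console.log(lb.pick(), lb.pick(), lb.pick(), lb.pick());
// → server-a server-b server-c server-a (cycle repeats)
```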

Weighted Round Robin

Servers with more capacity get proportionally more traffic.

Server A (weight: 3) → 3 out of every 5 requests
Server B (weight: 2) → 2 out of every 5 requests

✅ Accounts for heterogeneous hardware
❌ Still ignores real-time load
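
One simple way to realize those weights is to expand the pool into a rotation list, then round-robin over it. A sketch (names illustrative; production balancers such as NGINX use a "smooth" weighted variant that interleaves servers rather than grouping them):

```typescript
// Expand { server: weight } into a rotation list for round-robin traversal.
function buildRotation(weights: Record<string, number>): string[] {
  const rotation: string[] = [];
  for (const [server, weight] of Object.entries(weights)) {
    for (let i = 0; i < weight; i++) rotation.push(server); // weight copies each
  }
  return rotation;
}

const rotation = buildRotation({ 'server-a': 3, 'server-b': 2 });
console.log(rotation);
// → ['server-a', 'server-a', 'server-a', 'server-b', 'server-b']
// Out of every 5 requests, 3 reach server-a and 2 reach server-b.
```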

Least Connections

Routes to the server with the fewest active connections.

Server A: 150 connections
Server B: 60 connections  ← new request goes here
Server C: 200 connections

✅ Dynamically adapts to load
✅ Great for long-lived connections (WebSockets, streaming)
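
The selection itself is a single scan for the minimum. A sketch, assuming the balancer already tracks a connection count per backend (the numbers mirror the example above):

```typescript
// Pick the backend with the fewest active connections.
interface Backend {
  name: string;
  activeConnections: number;
}

function leastConnections(backends: Backend[]): Backend {
  return backends.reduce((best, b) =>
    b.activeConnections < best.activeConnections ? b : best,
  );
}

const pool: Backend[] = [
  { name: 'server-a', activeConnections: 150 },
  { name: 'server-b', activeConnections: 60 },
  { name: 'server-c', activeConnections: 200 },
];
console.log(leastConnections(pool).name); // → 'server-b'
```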

IP Hash

Hashes the client's IP to always route to the same server.

hash(client_ip) % num_servers → consistent server selection

✅ Achieves session stickiness without cookies
❌ Uneven distribution if many clients share an IP (e.g., behind NAT)
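
The `hash(client_ip) % num_servers` formula can be made concrete with any deterministic string hash; FNV-1a is used below purely to keep the example self-contained:

```typescript
// FNV-1a: a simple, deterministic 32-bit string hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash;
}

function pickByIp(clientIp: string, servers: string[]): string {
  return servers[fnv1a(clientIp) % servers.length];
}

const servers = ['server-a', 'server-b', 'server-c'];
const choice = pickByIp('203.0.113.7', servers);
// Repeat calls with the same IP always return the same server.
```

Note a further drawback of the plain modulo form: when `num_servers` changes, most clients remap to a different server. Consistent hashing exists precisely to limit that reshuffling.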

Least Response Time

Routes to the server with the fewest active connections and the fastest average response time.

✅ Most sophisticated dynamic algorithm
✅ Available in NGINX Plus (the least_time method); AWS ALB offers the related least outstanding requests algorithm
❌ Requires tracking response time metrics
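
Exact scoring formulas differ by product; one plausible heuristic is to rank backends by active connections multiplied by average latency, so both signals count. A sketch under that assumption (field names are illustrative):

```typescript
// Score = active connections × average response time; lower score wins.
// This is one heuristic, not any specific product's formula.
interface BackendStats {
  name: string;
  active: number;        // current in-flight requests
  avgResponseMs: number; // rolling average response time
}

function leastResponseTime(backends: BackendStats[]): BackendStats {
  return backends.reduce((best, b) =>
    b.active * b.avgResponseMs < best.active * best.avgResponseMs ? b : best,
  );
}

const winner = leastResponseTime([
  { name: 'server-a', active: 10, avgResponseMs: 120 },
  { name: 'server-b', active: 8, avgResponseMs: 40 },
]);
console.log(winner.name); // → 'server-b'
```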


Health Checks

A load balancer is only as smart as its health checking.

Every 10 seconds:
  GET /health → HTTP 200 → ✅ healthy
  GET /health → HTTP 500 → ❌ remove from pool
  GET /health → timeout  → ❌ remove from pool

Types of health checks:

| Type     | What it checks                      |
|----------|-------------------------------------|
| TCP      | Can connect on the port             |
| HTTP     | Returns expected status code        |
| Custom   | Returns specific body or JSON field |
| Database | App can reach its dependencies      |

Best practice: your /health endpoint should verify database connectivity, cache availability, and any critical dependency — not just return 200 OK unconditionally.

```typescript
// Example health check endpoint (Next.js API route)
export async function GET() {
  try {
    await db.query('SELECT 1'); // verify DB is reachable
    return Response.json({ status: 'ok' }, { status: 200 });
  } catch {
    return Response.json({ status: 'error' }, { status: 500 });
  }
}
```
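
The balancer's side of that loop can be sketched too. The probe function is injected here purely so the polling logic stays self-contained; the `Pool` and `Probe` types are illustrative, not any particular product's API:

```typescript
type Pool = Map<string, boolean>;              // backend URL → currently healthy?
type Probe = (url: string) => Promise<number>; // returns HTTP status, throws on timeout

// One polling pass: probe every backend in parallel and update its health flag.
async function checkOnce(pool: Pool, probe: Probe): Promise<void> {
  await Promise.all(
    [...pool.keys()].map(async (url) => {
      try {
        pool.set(url, (await probe(url)) === 200); // HTTP 200 → healthy
      } catch {
        pool.set(url, false); // timeout / connection refused → remove from pool
      }
    }),
  );
}

// A real probe would be an HTTP GET with a deadline, e.g. on Node 18+:
// const probe: Probe = async (url) =>
//   (await fetch(`${url}/health`, { signal: AbortSignal.timeout(2000) })).status;
// setInterval(() => checkOnce(pool, probe), 10_000); // every 10 seconds
```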

Session Persistence (Sticky Sessions)

Some applications require a user to always reach the same server — for example, when session data is stored in-memory (not in Redis).

Cookie-based stickiness: The load balancer sets a cookie (e.g., AWSALB) on the first response. Subsequent requests from that client carry the cookie, allowing the balancer to route to the same server.

Problems with sticky sessions:

  • Uneven distribution — one server gets all requests from a heavy user
  • If the server dies, the session is lost anyway
  • Contradicts the purpose of horizontal scaling

✅ Better approach: store session state in a shared layer (Redis, database) so any server can handle any request.
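
The shared-layer pattern needs very little code on the application side. In the sketch below the Map-backed class is only a stand-in for a real Redis client, so the interface (and the names) are illustrative:

```typescript
// Any server can handle any request if sessions live behind a shared store.
interface SessionStore {
  get(id: string): Promise<string | null>;
  set(id: string, data: string): Promise<void>;
}

// In-memory stand-in for development; in production this would wrap a shared
// Redis instance so the data outlives any single app server.
class MapStore implements SessionStore {
  private data = new Map<string, string>();
  async get(id: string) { return this.data.get(id) ?? null; }
  async set(id: string, data: string) { this.data.set(id, data); }
}

const store: SessionStore = new MapStore();
await store.set('sess-42', JSON.stringify({ user: 'kirtesh' }));
// Whichever server receives the next request can now load 'sess-42'.
```

Because every server talks to the same store, the balancer is free to use any algorithm it likes, and losing a server loses no sessions.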


Deployment Patterns

Single Load Balancer (Simple)

              ┌──────────┐
Client ──────▶│    LB    │──▶ [ Server 1 | Server 2 | Server 3 ]
              └──────────┘

⚠️ The load balancer itself becomes a single point of failure.

Active-Passive HA Pair

              ┌──────────────┐
              │  Active LB   │──▶ [ Servers... ]
Client ──────▶│              │
              │  Passive LB  │ (takes over if active fails)
              └──────────────┘

✅ High availability — if the active node fails, passive takes over via VIP (virtual IP).

DNS-Based Load Balancing

Multiple A records point to different load balancers or server IPs; DNS resolvers pick among them (often rotating the answer order), spreading clients across endpoints.

api.example.com → 203.0.113.1
api.example.com → 203.0.113.2
api.example.com → 203.0.113.3

✅ Global geographic distribution
❌ TTL means clients may hold stale IPs for minutes
❌ No real-time health awareness

Global Server Load Balancing (GSLB)

Routes users to the geographically closest healthy data center.

User in Mumbai    → Asia-Pacific region servers
User in New York  → US-East region servers
User in Frankfurt → EU-West region servers

Used by: Cloudflare, AWS Route 53, Akamai


Real-World Tools

| Tool          | Type          | Best For                                |
|---------------|---------------|-----------------------------------------|
| NGINX         | L7 (software) | Web apps, reverse proxy, TLS offloading |
| HAProxy       | L4/L7         | High-performance TCP/HTTP balancing     |
| AWS ALB       | L7 (managed)  | AWS-native apps, path-based routing     |
| AWS NLB       | L4 (managed)  | Ultra-low latency, TCP/UDP              |
| Cloudflare LB | L7 (global)   | Global distribution, DDoS protection    |
| Traefik       | L7 (software) | Kubernetes, Docker-native dynamic LB    |
| Envoy         | L7 (software) | Service meshes (Istio), microservices   |

Load Balancing in Microservices

In a microservices architecture, load balancing happens at two levels:

External (north-south): An API gateway or edge load balancer handles traffic from outside the cluster.

Internal (east-west): Service-to-service calls are load-balanced via a service mesh (e.g., Istio with Envoy sidecars) or client-side load balancing (e.g., Ribbon in Spring Cloud).

External Client
      │  (north-south)
      ▼
API Gateway (Nginx/ALB)
      │
      ▼
Service A ──(east-west LB via Envoy)──▶ Service B

Common Pitfalls

Not monitoring backend health granularly. A server that responds to /health with 200 OK but has an exhausted DB connection pool will still fail real requests. Make health checks reflect actual readiness.

Ignoring connection draining. When removing a server from rotation (deploy/scale-in), connections in-flight should complete gracefully before the server is terminated. Most managed LBs support a configurable drain timeout (e.g., 30 seconds).
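
The drain step reduces to: stop accepting new work, then wait for in-flight requests to finish or a deadline to pass. A self-contained sketch of that logic (the `Drainer` class is illustrative; managed load balancers implement this for you via a configurable deregistration delay):

```typescript
// Track in-flight requests; refuse new ones while draining, and report
// whether the pool emptied before the timeout.
class Drainer {
  private inFlight = 0;
  private draining = false;

  tryAccept(): boolean {
    if (this.draining) return false; // taken out of rotation: reject new work
    this.inFlight++;
    return true;
  }

  finish(): void {
    this.inFlight--; // a request completed
  }

  async drain(timeoutMs: number): Promise<boolean> {
    this.draining = true;
    const deadline = Date.now() + timeoutMs;
    while (this.inFlight > 0 && Date.now() < deadline) {
      await new Promise((r) => setTimeout(r, 10)); // poll until idle or timeout
    }
    return this.inFlight === 0; // true → safe to terminate the server
  }
}
```

Only when `drain()` resolves `true` (or the timeout is accepted as a forced cutoff) should the instance actually be terminated.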

Choosing the wrong algorithm for long-lived connections. Round robin works poorly for WebSocket or gRPC streaming connections — least-connections or least-response-time is a better fit.

SSL termination confusion. Decide upfront where TLS terminates. Terminating at the load balancer is simpler (backend uses plain HTTP); re-encryption (LB → backend also uses HTTPS) is more secure but adds CPU overhead.


Choosing the Right Strategy

| Scenario                                  | Recommended Approach                    |
|-------------------------------------------|-----------------------------------------|
| Simple web app, uniform servers           | Round robin                             |
| Mixed hardware capacities                 | Weighted round robin                    |
| Long-lived connections (WebSocket, gRPC)  | Least connections                       |
| High variability in response times        | Least response time                     |
| Stateful app (legacy, in-memory sessions) | IP hash or cookie-based sticky sessions |
| Global multi-region                       | GSLB + Route 53 / Cloudflare            |
| Kubernetes / microservices                | Envoy / Traefik with service mesh       |

Conclusion

Load balancing is not a single technology — it's a discipline. Getting it right means understanding your traffic patterns, connection types, hardware topology, and failure modes.

The best load balancers are invisible: users never think about them because they just work. But behind the scenes, they're the quiet orchestrators keeping your system responsive, resilient, and ready to scale to 10x traffic without breaking a sweat.

A well-designed load balancing strategy is the difference between a system that survives Black Friday and one that collapses under it.

Written by

Kirtesh Admute

Full-stack engineer and digital architect — building scalable, production-grade systems with real-world impact.
