Design a Load Balancer
- In general system design, a load balancer is the component that distributes scaled traffic horizontally across multiple backend nodes. Less common, but still possible, is that teams within (typically cloud) companies actually build load balancers themselves, and interviewers from such teams may ask this gotcha-style system design question.
Requirements
- Some clarifying questions -
- Candidate: Is it an L4 or L7 load balancer?
- Interviewer: Great question, let's keep both the options. Can you also explain the difference between them?
- Candidate: Sure. An L4 load balancer balances traffic at the transport layer (Layer 4 of the OSI model), using TCP/UDP connection information. An L7 load balancer balances application-level traffic at Layer 7 of the OSI model. Another difference is that an L4 load balancer can handle more traffic than an L7 one: because L7 performs deeper inspection of requests to route traffic, it is much slower.
- Candidate: What traffic can I assume our LB will be handling?
- Interviewer: You can expect the traffic to be Millions of requests per second.
- Candidate: Ok, so there have to be multiple instances to handle such traffic. With that much traffic, I estimate we will need network bandwidth of multiple Gbps for our L4 load balancer. In our design we should prioritize making the load balancer highly available over keeping it strictly consistent across load balancer instances after changes to traffic configuration.
- Candidate: What other features should our load balancer provide?
- Interviewer: You tell me — what features do modern load balancers provide?
- Candidate: I believe modern load balancers offer a variety of features -
- Load balancing
- TLS termination
- Authentication gateway
- Proxy / Reverse proxy
- Rate limiting
- IP Whitelisting / ACLs
- Interviewer: What sort of load balancing algorithms does your load balancer provide?
- Candidate: Load balancing algorithms -
| Algorithm | Layer | Description | When to Use |
|---|---|---|---|
| Round Robin | L4/L7 | Evenly distributes across all backends | Simple, stateless backends |
| Least Connections | L4/L7 | Chooses backend with fewest active connections | Long-lived connections (e.g., WebSocket) |
| Weighted Round Robin | L4/L7 | Like RR, but favors higher-weight servers | Heterogeneous backend capacity |
| Random | L4/L7 | Random selection | When backend is stateless and symmetrical |
| Hash-Based | L4/L7 | Hashes client IP, session ID, etc. | Sticky sessions |
| Consistent Hashing | L7 | Used in sharded systems | Distributed caches or sharded DBs |
| Least Latency / Performance-based | L7 | Route based on observed response times | Adaptive load balancing |
| IP Hash | L4 | Route based on source IP hash | Basic stickiness without cookies |
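The Consistent Hashing row above is worth illustrating, since it is what lets sticky routing survive backend churn: removing a backend only remaps the keys that were on it. A minimal sketch in Python (backend names and the virtual-node count are made up for illustration):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash of a string key onto a large integer ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps keys to backends; each backend owns many virtual nodes on the ring,
    so load spreads evenly and removing one backend remaps only its own keys."""

    def __init__(self, backends, vnodes=100):
        self._ring = []  # sorted list of (hash, backend)
        for b in backends:
            for i in range(vnodes):
                self._ring.append((_hash(f"{b}#vnode-{i}"), b))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    def pick(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

A key property to note: rebuilding the ring without one backend leaves every key that was *not* on that backend mapped exactly where it was.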
- Interviewer: What sort of routing algorithms does your load balancer provide?
- Candidate: Routing algorithms -
| Algorithm / Method | Layer | Description | Example |
|---|---|---|---|
| Path-based Routing | L7 | Route based on URL path | /api/v1/* → service-A |
| Host-based Routing | L7 | Route based on Host header | api.example.com → API LB |
| Header-based Routing | L7 | Inspect request headers | X-Tenant-ID → tenant-specific svc |
| Cookie-based Routing | L7 | Sticky routing or A/B testing | cookie=variant-B → version-B backend |
| Canary Routing | L7 | Send % of traffic to new version | 90% → v1, 10% → v2 |
| Geo-based Routing | L7 (sometimes L3/L4) | Route based on client location | Client from EU → EU datacenter |
| User/Session Routing | L7 | Route based on JWT token/user ID | userID % 10 → shard-2 |
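To make the L7 routing rows concrete, here is a minimal sketch of first-match rule evaluation over an ordered rule list, assuming glob-style host and path patterns (the hosts, paths, and pool names are hypothetical, not part of any real config):

```python
import fnmatch

# Hypothetical ordered rule set, most specific first, mirroring the table above.
RULES = [
    {"host": "api.example.com", "path": "/api/v1/*", "pool": "service-A"},
    {"host": "api.example.com", "path": "/api/*",    "pool": "service-B"},
    {"host": "*",               "path": "/*",        "pool": "default-pool"},
]

def route(host: str, path: str) -> str:
    """Return the backend pool for the first rule whose host and path
    patterns both match; L7 proxies commonly evaluate rules in order."""
    for rule in RULES:
        if fnmatch.fnmatch(host, rule["host"]) and fnmatch.fnmatch(path, rule["path"]):
            return rule["pool"]
    raise LookupError("no rule matched")
```

Rule ordering matters here: if the `/api/*` rule came first, it would shadow the more specific `/api/v1/*` rule.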
- Since the problem is very vague, we can continue asking about -
- Resilience & Fault Tolerance
- Observability
- Scalability
- Configuration Management
- Zero downtime deployments
- CDN Integration
- Rate-aware routing
Estimation
- We need to store the load balancer's routing, load-balancing, and resource configuration in our storage.
- We can expect configuration files in JSON to average 250 KB each.
- If we support approximately 1 million such configurations, we need about 250 GB of storage.
- We also want to replicate this across multiple data center (DC) regions.
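The arithmetic behind the 250 GB figure, spelled out (decimal units assumed, i.e. 1 GB = 10^6 KB):

```python
avg_config_size_kb = 250        # average JSON config size from the estimate above
num_configs = 1_000_000         # number of supported configurations
total_kb = avg_config_size_kb * num_configs
total_gb = total_kb / 1_000_000 # convert KB -> GB in decimal units
print(total_gb)                 # 250.0 GB per region, before replication
```

Replicating across N regions then multiplies the total footprint by N.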
Design the service
Configurations: Allow service owners to set the load balancer configuration via an API
Request body
{
  "loadbalancer_id": "{loadbalancerid}",
  "type": "l4 | l7",
  "resources": [
    "dns-name / containerid / ip"
  ],
  "lb_algorithm": "ecmp | geo-based",
  "health_check": "heartbeat | latency"
}
Data Model
1. Load balancer configuration
{
"lb_id": "lb-12345",
"name": "prod-app-lb",
"type": "L7", // or L4
"algorithm": "round_robin", // or least_conn, hash, etc.
"listener_port": 443,
"protocol": "HTTPS",
"ssl_cert_id": "cert-abc",
"backend_pool_ids": ["pool-1", "pool-2"],
"health_check_id": "hc-001",
"created_at": "2025-06-09T09:00:00Z"
}
2. Backend pools
{
"pool_id": "pool-1",
"targets": [
{"ip": "10.0.0.12", "port": 8080, "zone": "us-west1-a"},
{"ip": "10.0.0.15", "port": 8080, "zone": "us-west1-b"}
],
"max_connections": 1000,
"health_check_id": "hc-001"
}
3. Health checks
{
"hc_id": "hc-001",
"protocol": "HTTP",
"path": "/health",
"interval_secs": 5,
"timeout_secs": 2,
"unhealthy_threshold": 3,
"healthy_threshold": 2
}
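The `unhealthy_threshold` / `healthy_threshold` pair above implements hysteresis: a target must fail several consecutive probes before being ejected, and succeed several before being readmitted, which avoids flapping. A minimal sketch of the state transitions (probe transport omitted; the class name is an assumption):

```python
class HealthChecker:
    """Tracks one target's health using consecutive-probe thresholds,
    matching the hc-001 example: 3 failures mark it unhealthy,
    2 successes mark it healthy again."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True   # targets start in rotation
        self._fails = 0
        self._oks = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the target's current state."""
        if probe_ok:
            self._oks += 1
            self._fails = 0   # any success resets the failure streak
            if not self.healthy and self._oks >= self.healthy_threshold:
                self.healthy = True
        else:
            self._fails += 1
            self._oks = 0     # any failure resets the success streak
            if self.healthy and self._fails >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

The load balancer's data plane would consult `healthy` when selecting targets from a backend pool.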
4. Routing rules (for L7)
{
"rule_id": "rule-77",
"path_pattern": "/api/v1/*",
"methods": ["GET", "POST"],
"forward_to_pool": "pool-1"
}
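Canary routing from the earlier table (90% → v1, 10% → v2) can be sketched with a stable hash bucket, so a given user consistently lands on one version rather than bouncing between them on each request (the function and pool names are assumptions):

```python
import hashlib

def canary_pool(request_key: str, canary_percent: int = 10) -> str:
    """Deterministically send ~canary_percent of traffic to the v2 pool by
    hashing a stable request key (e.g. user ID) into one of 100 buckets."""
    bucket = int(hashlib.sha256(request_key.encode()).hexdigest(), 16) % 100
    return "pool-v2" if bucket < canary_percent else "pool-v1"
```

Rolling the canary forward is then just raising `canary_percent`; users already bucketed into v2 stay there.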
5. Observability
{
"log_id": "req-8817",
"lb_id": "lb-12345",
"timestamp": "2025-06-09T09:20:00Z",
"source_ip": "192.168.1.77",
"target_ip": "10.0.0.15",
"status_code": 200,
"latency_ms": 123,
"region": "us-west1"
}
Architecture
Scaling
- Horizontally scalable
- To achieve horizontal scalability we need to replicate the configuration across all regional datacenters, or in the worst case to at least one datacenter in each availability zone.
- The most feasible replicated databases for this key-value store are etcd or Consul.
- Feature comparison
| Feature | etcd | Consul |
|---|---|---|
| Persistence | BoltDB on disk | Persistent storage on disk |
| Consensus Algo | Raft | Raft |
| Usage | Kubernetes, config storage | Service mesh, discovery, config |
| Strong Consistency | Yes | Yes (Raft) |
| APIs | gRPC/HTTP | HTTP RESTful |
| Built-in UI | No | Yes |
