Design a Load Balancer

In most system design interviews, the load balancer is just a component used to distribute horizontally scaled traffic across multiple backend nodes. Less commonly, teams (typically at cloud companies) that actually build load balancers may ask you to design one from scratch as a gotcha-style system design question.

Requirements

Some clarifying questions -

Candidate: Is it an L4 or an L7 load balancer?
Interviewer: Great question. Let's support both options. Can you also explain the difference between them?

Candidate: Sure. An L4 load balancer routes traffic at the transport layer (Layer 4 of the OSI model), using only TCP/UDP information such as IP addresses and ports. An L7 load balancer routes application-level traffic at Layer 7, so it can inspect request content such as URL paths, headers, and cookies. Another difference is throughput: because an L7 load balancer does deeper inspection of every request, it is slower and can handle less traffic than an L4 load balancer on the same hardware.
Candidate: How much traffic should I assume our LB will handle?
Interviewer: You can expect millions of requests per second.
Candidate: OK, so we need multiple instances to handle that much traffic. At that volume, I estimate the L4 load balancer tier will need multiple Gbps of network bandwidth. In our design, we should prioritize availability over consistency: after a traffic configuration change, it is acceptable for load balancer instances to be briefly out of sync, as long as every instance keeps serving traffic.
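To sanity-check the "multiple Gbps" figure, here is a back-of-the-envelope calculation. It assumes (hypothetically) about 1 KB of request plus response bytes per request and 10 Gbps of usable capacity per instance, run at 50% utilization for headroom; none of these numbers come from the interviewer.

```python
import math

# ASSUMPTIONS (hypothetical): bytes per request, per-instance capacity,
# and the headroom factor are illustrative, not given by the interviewer.


def required_gbps(requests_per_sec: float, avg_bytes_per_request: float) -> float:
    """Aggregate wire throughput in gigabits per second."""
    return requests_per_sec * avg_bytes_per_request * 8 / 1e9


def instances_needed(total_gbps: float, per_instance_gbps: float,
                     headroom: float = 0.5) -> int:
    """LB instances needed if each runs at most at `headroom` of its capacity,
    leaving spare capacity for instance failures and traffic spikes."""
    return math.ceil(total_gbps / (per_instance_gbps * headroom))


if __name__ == "__main__":
    total = required_gbps(1_000_000, 1_000)                # 1M rps x ~1 KB (assumed)
    print(total)                                             # 8.0
    print(instances_needed(total, per_instance_gbps=10))     # 2
```

At 1 million requests per second, even 1 KB per request works out to 8 Gbps, so realistic payload sizes push the tier well beyond that.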
Candidate: What other features should our load balancer provide?
Interviewer: You tell me: what features do modern load balancers provide?
Candidate: I believe modern load balancers offer a variety of features -

Load balancing
TLS termination
Authentication gateway
Proxy / Reverse proxy
Rate limiting
IP Whitelisting / ACLs
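Rate limiting is a good example of a feature that is simple to state but needs a concrete algorithm. The sketch below uses a token bucket, one common choice (not necessarily what any particular load balancer uses), with one bucket per client key such as the source IP. It assumes buckets live in local memory on each LB instance, so limits are enforced per instance rather than globally, and it takes an injectable clock so it can be tested deterministically. `TokenBucket` and `RateLimiter` are hypothetical names.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # a new bucket starts full
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request costs one token
            return True
        return False


class RateLimiter:
    """One bucket per client key (e.g. source IP); in-memory, per LB instance."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_key: str) -> bool:
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_key] = bucket
        return bucket.allow()


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = RateLimiter(rate=1, capacity=2, clock=lambda: fake_now[0])
    print([limiter.allow("10.1.1.1") for _ in range(3)])  # [True, True, False]
    fake_now[0] = 1.0                                     # one token refilled
    print(limiter.allow("10.1.1.1"))                      # True
```

With `rate=1` and `capacity=2`, a client can burst two requests immediately, is then rejected, and regains one request per second afterward.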

Interviewer: What sort of load balancing algorithms does your load balancer provide?
Candidate: Load balancing algorithms -

Algorithm	Layer	Description	When to Use
Round Robin	L4/L7	Evenly distributes across all backends	Simple, stateless backends
Least Connections	L4/L7	Chooses backend with fewest active connections	Long-lived connections (e.g., WebSocket)
Weighted Round Robin	L4/L7	Like RR, but favors higher-weight servers	Heterogeneous backend capacity
Random	L4/L7	Random selection	When backend is stateless and symmetrical
Hash-Based	L4/L7	Hashes client IP, session ID, etc.	Sticky sessions
Consistent Hashing	L7	Used in sharded systems	Distributed caches or sharded DBs
Least Latency / Performance-based	L7	Route based on observed response times	Adaptive load balancing
IP Hash	L4	Route based on source IP hash	Basic stickiness without cookies
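Several of the algorithms in the table fit in a few lines each. This is a minimal, single-threaded illustration that assumes backends are plain string names. Weighted round robin here is the "smooth" variant used by nginx, and the consistent hash ring uses MD5 with virtual nodes; both are illustrative choices, and the class names are hypothetical.

```python
import bisect
import hashlib
import itertools
from typing import Dict, List


class RoundRobin:
    """Cycles through backends in a fixed order."""
    def __init__(self, backends: List[str]):
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


class SmoothWeightedRoundRobin:
    """Each pick: add every backend's weight to its running score, choose the
    highest score, then subtract the total weight from the winner. This
    interleaves picks instead of sending a heavy backend's share in one run."""
    def __init__(self, weights: Dict[str, int]):
        self.weights = weights
        self.current = {b: 0 for b in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        for b, w in self.weights.items():
            self.current[b] += w
        best = max(self.current, key=self.current.get)  # ties: first in order
        self.current[best] -= self.total
        return best


class LeastConnections:
    """Chooses the backend with the fewest in-flight connections."""
    def __init__(self, backends: List[str]):
        self.active = {b: 0 for b in backends}

    def acquire(self) -> str:
        b = min(self.active, key=self.active.get)       # ties: first in order
        self.active[b] += 1
        return b

    def release(self, b: str) -> None:
        self.active[b] -= 1


class ConsistentHashRing:
    """Maps keys to backends; removing a backend only remaps the keys it owned."""
    def __init__(self, backends: List[str], vnodes: int = 100):
        self.ring = sorted(
            (self._hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def lookup(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self.hashes, self._hash(key)) % len(self.ring)
        return self.ring[i][1]
```

With weights 5:1:1 for `a`, `b`, `c`, the smooth variant yields `a, a, b, a, c, a, a` per cycle of seven picks rather than five `a`s in a row.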

Interviewer: What sort of routing algorithms does your load balancer provide?
Candidate: Routing algorithms -

Algorithm / Method	Layer	Description	Example
Path-based Routing	L7	Route based on URL path	`/api/v1/* → service-A`
Host-based Routing	L7	Route based on Host header	`api.example.com → API LB`
Header-based Routing	L7	Inspect request headers	`X-Tenant-ID → tenant-specific svc`
Cookie-based Routing	L7	Sticky routing or A/B testing	`cookie=variant-B → version-B backend`
Canary Routing	L7	Send % of traffic to new version	`90% → v1, 10% → v2`
Geo-based Routing	L7 (sometimes L3/L4)	Route based on client location	Client from EU → EU datacenter
User/Session Routing	L7	Route based on JWT token/user ID	`userID % 10 → shard-2`
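Host-based, path-based, and canary routing compose naturally: match on host and path prefix first, then split the matched traffic by weight. A minimal sketch; the `Route` structure, first-match-wins ordering, prefix matching, and hashing the user ID so each user consistently lands on the same version are all illustrative assumptions, not a specific product's behavior.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Route:
    host: Optional[str]             # None matches any Host header
    path_prefix: str                # e.g. "/api/v1/"
    splits: List[Tuple[str, int]]   # (backend, percent of matched traffic)

    def __post_init__(self):
        if sum(p for _, p in self.splits) != 100:
            raise ValueError("split percentages must sum to 100")


def bucket(user_id: str) -> int:
    """Stable 0..99 bucket so a given user always sees the same version."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(routes: List[Route], host: str, path: str, user_id: str) -> Optional[str]:
    # First matching route wins, so list more specific routes first.
    for r in routes:
        if (r.host is None or r.host == host) and path.startswith(r.path_prefix):
            b = bucket(user_id)
            cumulative = 0
            for backend, percent in r.splits:
                cumulative += percent
                if b < cumulative:
                    return backend
    return None  # no match: the caller might return 404 or use a default pool


ROUTES = [
    Route("api.example.com", "/api/v1/", [("v1", 90), ("v2", 10)]),  # canary
    Route(None, "/", [("web", 100)]),                               # catch-all
]
```

Hashing the user ID rather than picking randomly per request keeps each user on one version for the duration of the canary, which makes errors easier to attribute.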

Since the problem is very open-ended, we can continue by asking about -

Resilience & Fault Tolerance
Observability
Scalability
Configuration Management
Zero-downtime deployments
CDN Integration
Rate-aware routing

Estimation

We need to store each load balancer's configuration: its routing settings, load balancing settings, and backend resources.

We can expect each configuration file, stored as JSON, to be about 250 KB on average.

If we support roughly 1 million such configurations, we need about 250 GB of storage (1,000,000 × 250 KB = 250 GB).

We also want to replicate this data across multiple datacenter (DC) regions.
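The same estimate extended to replication, assuming (hypothetically) 5 regions with 3 replicas per region, a typical size for a Raft cluster. The region and replica counts are illustrative only.

```python
# Storage estimate for LB configurations, including cross-region replication.
# ASSUMPTIONS (hypothetical): number of regions and replicas per region.

KB = 1_000              # decimal units, matching the estimate above
GB = 1_000_000_000


def config_storage_gb(num_configs: int, avg_config_kb: float,
                      regions: int = 1, replicas_per_region: int = 1) -> float:
    """Total bytes stored across all copies, expressed in GB."""
    raw_bytes = num_configs * avg_config_kb * KB
    return raw_bytes * regions * replicas_per_region / GB


if __name__ == "__main__":
    print(config_storage_gb(1_000_000, 250))         # 250.0 (single copy)
    print(config_storage_gb(1_000_000, 250, 5, 3))   # 3750.0 (15 copies)
```

Replication multiplies the total footprint, but each replica still holds the full ~250 GB, so per-node capacity is the figure that constrains the choice of store.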

Design the service

Configuration: allow service owners to set their load balancer configuration.

API

POST, PUT, GET /api/v1/user/{userid}/loadbalancer/{loadbalancerid}

Request body

{
  "loadbalancerid": "{loadbalancerid}",
  "type": "l4 | l7",
  "resources": ["<dns-name | containerid | ip>", ...],
  "lb-algorithm": "ecmp | geo-based",
  "health-check": "heartbeat | latency"
}
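Before persisting a request like this, the configuration service would validate it. A minimal sketch, assuming the body has been parsed into a dict with the field names shown; the allowed values mirror the options listed in the body, while the rule that the body's `loadbalancerid` must match the one in the URL path, the function name, and the error messages are hypothetical.

```python
from typing import Any, Dict, List

# Allowed values, mirroring the options listed in the request body.
ALLOWED_TYPES = {"l4", "l7"}
ALLOWED_ALGORITHMS = {"ecmp", "geo-based"}
ALLOWED_HEALTH_CHECKS = {"heartbeat", "latency"}


def validate_lb_request(path_lb_id: str, body: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the body is valid."""
    errors = []
    if body.get("loadbalancerid") != path_lb_id:
        errors.append("loadbalancerid must match the id in the URL path")
    if str(body.get("type", "")).lower() not in ALLOWED_TYPES:
        errors.append("type must be l4 or l7")
    resources = body.get("resources")
    if not isinstance(resources, list) or not resources:
        errors.append("resources must be a non-empty list")
    if body.get("lb-algorithm") not in ALLOWED_ALGORITHMS:
        errors.append("unsupported lb-algorithm")
    if body.get("health-check") not in ALLOWED_HEALTH_CHECKS:
        errors.append("unsupported health-check")
    return errors
```

Returning all errors at once, rather than failing on the first, lets the API respond with a single 400 that lists everything the caller needs to fix.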

Data Model

1. Load balancer configuration

{
  "lb_id": "lb-12345",
  "name": "prod-app-lb",
  "type": "L7",                  // or L4
  "algorithm": "round_robin",    // or least_conn, hash, etc.
  "listener_port": 443,
  "protocol": "HTTPS",
  "ssl_cert_id": "cert-abc",
  "backend_pool_ids": ["pool-1", "pool-2"],
  "health_check_id": "hc-001",
  "created_at": "2025-06-09T09:00:00Z"
}

2. Backend pools

{
  "pool_id": "pool-1",
  "targets": [
    {"ip": "10.0.0.12", "port": 8080, "zone": "us-west1-a"},
    {"ip": "10.0.0.15", "port": 8080, "zone": "us-west1-b"}
  ],
  "max_connections": 1000,
  "health_check_id": "hc-001"
}

3. Health checks

{
  "hc_id": "hc-001",
  "protocol": "HTTP",
  "path": "/health",
  "interval_secs": 5,
  "timeout_secs": 2,
  "unhealthy_threshold": 3,
  "healthy_threshold": 2
}
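The `unhealthy_threshold` and `healthy_threshold` fields suggest the usual hysteresis: a target is marked down only after N consecutive failed probes and back up only after M consecutive successes, so a single slow probe does not make it flap. A minimal sketch of that state machine, assuming this interpretation of the two fields; the probe itself (e.g. an HTTP request to `path` with `timeout_secs`) is omitted, and `TargetHealth` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Tracks one backend target's health with hysteresis (assumed semantics)."""
    unhealthy_threshold: int = 3   # consecutive failures before marking down
    healthy_threshold: int = 2     # consecutive successes before marking back up
    healthy: bool = True           # new targets start healthy (an assumption)
    streak: int = 0                # consecutive results that disagree with `healthy`

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result; return the (possibly updated) health state."""
        if probe_ok == self.healthy:
            self.streak = 0        # result agrees with current state: reset
            return self.healthy
        self.streak += 1
        limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self.streak >= limit:
            self.healthy = not self.healthy
            self.streak = 0
        return self.healthy
```

With the example values, and probes every 5 seconds, a dead target is removed after about 15 seconds and a recovered one is restored after about 10.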

4. Routing rules (for L7)

{
  "rule_id": "rule-77",
  "path_pattern": "/api/v1/*",
  "methods": ["GET", "POST"],
  "forward_to_pool": "pool-1"
}
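A sketch of how an L7 instance might evaluate rules of this shape against a request. It assumes `path_pattern` uses shell-style wildcards (interpreted here with Python's `fnmatch`) and that rules are checked in order with the first match winning; the data model does not specify match semantics or rule priority, so both are assumptions, as is the `match_rule` name.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

RULES: List[Dict] = [
    {"rule_id": "rule-77", "path_pattern": "/api/v1/*",
     "methods": ["GET", "POST"], "forward_to_pool": "pool-1"},
]


def match_rule(rules: List[Dict], method: str, path: str) -> Optional[str]:
    """Return the pool of the first rule matching both method and path."""
    for rule in rules:
        # fnmatchcase is case-sensitive; its "*" also matches "/", so
        # "/api/v1/*" covers nested paths such as "/api/v1/users/42".
        if method.upper() in rule["methods"] and fnmatchcase(path, rule["path_pattern"]):
            return rule["forward_to_pool"]
    return None  # no rule matched: fall back to a default pool or return 404
```

Because `*` crosses path segments here, a production matcher might prefer explicit prefix or segment-aware patterns to avoid surprising matches.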

5. Observability

{
  "log_id": "req-8817",
  "lb_id": "lb-12345",
  "timestamp": "2025-06-09T09:20:00Z",
  "source_ip": "192.168.1.77",
  "target_ip": "10.0.0.15",
  "status_code": 200,
  "latency_ms": 123,
  "region": "us-west1"
}
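Per-request records like this are usually rolled up into metrics such as request count, error rate, and tail latency per load balancer. A minimal sketch, assuming records arrive as dicts with the fields above and treating 5xx status codes as errors (an assumption); percentiles use the nearest-rank method, and `summarize` is a hypothetical name.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile over an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(records: Iterable[Dict]) -> Dict[str, Dict[str, float]]:
    """Group records by lb_id and compute count, 5xx error rate, p50 and p99."""
    latencies = defaultdict(list)
    errors = defaultdict(int)
    for r in records:
        latencies[r["lb_id"]].append(r["latency_ms"])
        if r["status_code"] >= 500:
            errors[r["lb_id"]] += 1
    summary = {}
    for lb_id, values in latencies.items():
        values.sort()
        summary[lb_id] = {
            "count": len(values),
            "error_rate": errors[lb_id] / len(values),
            "p50_ms": percentile(values, 50),
            "p99_ms": percentile(values, 99),
        }
    return summary
```

At millions of requests per second this exact computation would typically be replaced by streaming approximations, but the outputs are the same kind of per-LB metrics.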

Architecture

Scaling

Horizontally scalable

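To achieve horizontal scalability, we need to replicate the configuration to every regional datacenter or, at a minimum, to a datacenter in each availability zone.
The most practical replicated key-value stores for this are etcd and Consul. One caveat: etcd documents a default storage quota of 2 GiB (with 8 GiB suggested as the maximum), so the ~250 GB estimate above would have to be split across multiple clusters or the bulk of each configuration kept in separate storage.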
Feature comparison

Feature	etcd	Consul
Persistence	BoltDB on disk	Persistent storage on disk
Consensus Algo	Raft	Raft
Usage	Kubernetes, config storage	Service mesh, discovery, config
Strong Consistency	Yes	Yes (Raft)
APIs	gRPC/HTTP	HTTP RESTful
Built-in UI	No	Yes
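Both stores are strongly consistent (Raft), so the availability goal from the requirements has to come from the data plane: each LB instance can keep serving its last known-good configuration if the store is unreachable, and adopt updates only when they carry a newer revision. A minimal in-memory sketch, assuming each update carries a monotonically increasing revision number (etcd exposes a per-key `mod_revision`, for example). `ConfigCache`, `ConfigSnapshot`, and the `validate` hook are hypothetical, and no real client library is used.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ConfigSnapshot:
    revision: int
    config: Dict


class ConfigCache:
    """Holds the last known-good config for one LB; reads never hit the store."""

    def __init__(self, validate: Callable[[Dict], bool]):
        self._validate = validate
        self._lock = threading.Lock()
        self._current: Optional[ConfigSnapshot] = None

    def apply(self, revision: int, config: Dict) -> bool:
        """Apply an update from the store's watch stream; return True if adopted."""
        with self._lock:
            if self._current is not None and revision <= self._current.revision:
                return False          # stale or duplicate event: ignore
            if not self._validate(config):
                return False          # keep serving the previous good config
            self._current = ConfigSnapshot(revision, config)
            return True

    def get(self) -> Optional[ConfigSnapshot]:
        # Snapshots are replaced, never mutated, so readers can use the
        # returned reference without holding the lock.
        return self._current
```

Rejecting invalid configs at the edge means a bad push degrades to "no change" instead of an outage, which matches the availability-over-consistency choice made in the requirements.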

thetinkerer

Friday, June 13, 2025