Load balancing is the process of distributing incoming network or application traffic across multiple servers so that no single server becomes overwhelmed, which improves the performance, reliability, and availability of services. A load balancer acts as a traffic manager, routing each request to a server that is currently able to handle it.

Why It’s Used

  • To scale horizontally by adding more servers
  • To increase fault tolerance: if one server fails, others continue to serve traffic
  • To reduce latency by routing requests based on proximity or performance
  • To optimize resource usage

Types of Load Balancing

Layer 4 Load Balancing

Operates at the Transport Layer (TCP/UDP). Makes routing decisions based on IP address and port information.

  • Typically faster and cheaper per connection than Layer 7, since the payload is never parsed
  • Cannot inspect application-level content
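
As a rough illustration of a decision that uses only IP address and port information, the sketch below hashes a connection's transport-level fields to pick a backend. The `Connection` type, the `pick_backend_l4` function, and the choice of hashing are assumptions made for this example; real Layer 4 balancers may use other strategies.

```python
import hashlib
from typing import NamedTuple


class Connection(NamedTuple):
    """Transport-level fields only (hypothetical type for this sketch)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # "tcp" or "udp"


def pick_backend_l4(conn: Connection, backends: list[str]) -> str:
    """Choose a backend without looking at the application payload."""
    key = f"{conn.protocol}|{conn.src_ip}|{conn.src_port}|{conn.dst_ip}|{conn.dst_port}"
    # hashlib gives a hash that is stable across processes, unlike built-in hash().
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
conn = Connection("203.0.113.7", 51234, "198.51.100.1", 443, "tcp")
chosen = pick_backend_l4(conn, backends)
```

Because the function sees only addresses, ports, and protocol, it cannot route on anything like a URL path or a cookie.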

Layer 7 Load Balancing

Operates at the Application Layer. Makes decisions based on HTTP headers, cookies, URL paths, etc.

  • Allows for intelligent routing (e.g. A/B testing, URL-based routing)
  • Supports content switching and user session awareness
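
URL-based routing and A/B testing can be sketched as a function that reads the request path and cookies. The pool names, the `experiment` cookie, and the `HttpRequest` type below are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HttpRequest:
    """Minimal stand-in for a parsed HTTP request (hypothetical type)."""
    path: str
    headers: dict[str, str] = field(default_factory=dict)
    cookies: dict[str, str] = field(default_factory=dict)


# Hypothetical path-prefix rules mapping to backend pool names.
ROUTES = [("/api/", "api-pool"), ("/static/", "static-pool")]
DEFAULT_POOL = "web-pool"
EXPERIMENT_POOL = "web-pool-b"


def pick_pool_l7(request: HttpRequest) -> str:
    """Choose a backend pool from application-level content."""
    # URL-based routing: the first matching prefix wins.
    for prefix, pool in ROUTES:
        if request.path.startswith(prefix):
            return pool
    # A/B testing: users tagged with the experiment cookie see variant B.
    if request.cookies.get("experiment") == "b":
        return EXPERIMENT_POOL
    return DEFAULT_POOL
```

For example, `pick_pool_l7(HttpRequest("/api/users"))` selects the API pool, while a request for `/` carrying the cookie `experiment=b` is sent to the variant pool.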

Load Balancing Algorithms

Algorithm              Description
Round Robin            Requests are distributed evenly across all servers
Least Connections      Traffic is sent to the server with the fewest active connections
IP Hash                Uses client IP to determine which server handles the request
Weighted Round Robin   Servers are assigned weights based on capacity
Random                 Requests are distributed randomly
Response Time-Based    Sends traffic to the server with the lowest response time
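
The selection rules above can be sketched in a few lines each. This is a minimal illustration rather than how any particular load balancer implements them; all function names are made up for the example, and the weighted variant assumes small positive integer weights.

```python
import hashlib
import itertools
import random


def round_robin(servers):
    """Return an iterator that yields servers in a repeating cycle."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Cycle through servers, repeating each one according to its integer weight."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Pick the server with the fewest active connections (first one on a tie)."""
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server; the same IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def random_choice(servers):
    """Pick a server uniformly at random."""
    return random.choice(servers)


def fastest_response(avg_response_ms):
    """Pick the server with the lowest recorded average response time."""
    return min(avg_response_ms, key=avg_response_ms.get)


rr = round_robin(["a", "b", "c"])
first_four = [next(rr) for _ in range(4)]          # a, b, c, then back to a

wrr = weighted_round_robin({"big": 2, "small": 1})
first_six = [next(wrr) for _ in range(6)]          # big, big, small, repeated
```

The naive weighted version sends consecutive requests to the heavier server in bursts; some balancers interleave the picks more smoothly. Note also that `ip_hash` remaps most clients whenever the server list changes length.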

Health Checks

Load balancers perform regular health checks on backend servers to ensure they’re available. If a server fails, the balancer stops sending traffic to it until it recovers.
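
A minimal sketch of this behaviour follows. It assumes a caller-supplied `probe` function that returns True when a server responds, and it adds failure and recovery thresholds so that a single dropped check does not flip a server's state; the thresholds and all names are assumptions of the example, not something the text above prescribes.

```python
from typing import Callable


class HealthChecker:
    """Tracks which servers are eligible for traffic (illustrative sketch)."""

    def __init__(self, servers, probe: Callable[[str], bool],
                 fail_threshold: int = 3, rise_threshold: int = 2):
        self.probe = probe
        self.fail_threshold = fail_threshold  # consecutive failures before removal
        self.rise_threshold = rise_threshold  # consecutive passes before re-adding
        self.healthy = {server: True for server in servers}
        self._fails = {server: 0 for server in servers}
        self._passes = {server: 0 for server in servers}

    def run_once(self) -> None:
        """Probe every server once and update its state."""
        for server in self.healthy:
            if self.probe(server):
                self._fails[server] = 0
                self._passes[server] += 1
                if not self.healthy[server] and self._passes[server] >= self.rise_threshold:
                    self.healthy[server] = True
            else:
                self._passes[server] = 0
                self._fails[server] += 1
                if self.healthy[server] and self._fails[server] >= self.fail_threshold:
                    self.healthy[server] = False

    def available(self) -> list[str]:
        """Servers that should currently receive traffic."""
        return [server for server, ok in self.healthy.items() if ok]
```

In practice `run_once` would be called on a timer, and `probe` might open a TCP connection or request a dedicated health endpoint; the selection algorithm then chooses only among `available()`.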

Advanced Features

  • SSL termination: Decrypt HTTPS traffic at the load balancer level
  • Sticky sessions: Keep users connected to the same backend server
  • Autoscaling: Add or remove backend servers in response to traffic/load metrics
  • Geographic routing: Send users to the closest regional server
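
Sticky sessions, for instance, can be sketched as a lookup table in front of an ordinary selection algorithm. The `session_id` below is a hypothetical identifier (in practice it is often carried in a cookie), and the class is an illustration under that assumption rather than a description of any specific product.

```python
import itertools


class StickyBalancer:
    """Round robin for new sessions, fixed assignment for known ones (sketch)."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))
        self._assignments: dict[str, str] = {}

    def route(self, session_id: str) -> str:
        """Return the server for this session, assigning one on first contact."""
        server = self._assignments.get(session_id)
        if server is None:
            server = next(self._cycle)
            self._assignments[session_id] = server
        return server


lb = StickyBalancer(["a", "b"])
first = lb.route("user-1")    # new session, assigned the next server in the cycle
second = lb.route("user-2")   # new session, assigned the following server
again = lb.route("user-1")    # known session, same server as before
```

This sketch never expires assignments and does not react to a backend failing; a fuller version would drop or reassign sessions that point at an unavailable server.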