Load balancing is an essential part of distributed applications. It spreads incoming requests across a cluster of services, which is a prerequisite for scaling, and it can also improve the responsiveness and availability of the application. A smart load balancer can gather metrics to react to patterns in incoming traffic, monitor the state of the servers in its cluster, and forward requests to the least loaded, fastest-responding nodes while avoiding those that are currently unhealthy.
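As a rough illustration of that selection logic, here is a minimal sketch in Python. It assumes the balancer already keeps a health flag, an in-flight request count, and an average latency for each server. The `Backend` class, its field names, and the `pick_backend` function are hypothetical names chosen for this example, not part of any particular load balancer.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical view of one server as seen by the balancer."""
    name: str
    healthy: bool = True          # result of the latest health check
    active_requests: int = 0      # requests currently in flight
    avg_latency_ms: float = 0.0   # recent average response time


def pick_backend(backends):
    """Return the healthy backend with the fewest in-flight requests.

    Ties are broken by the lower average latency. Raises RuntimeError
    when no backend is healthy.
    """
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backend available")
    return min(candidates, key=lambda b: (b.active_requests, b.avg_latency_ms))


# Illustrative data only: four made-up nodes with made-up metrics.
cluster = [
    Backend("node-a", healthy=True, active_requests=12, avg_latency_ms=40.0),
    Backend("node-b", healthy=False, active_requests=0, avg_latency_ms=5.0),
    Backend("node-c", healthy=True, active_requests=3, avg_latency_ms=85.0),
    Backend("node-d", healthy=True, active_requests=3, avg_latency_ms=20.0),
]

chosen = pick_backend(cluster)
chosen.active_requests += 1  # a real balancer would decrement this when the response completes
print(chosen.name)  # node-d
```

Here `node-b` is skipped because it is unhealthy even though it looks idle, and `node-d` wins over `node-c` on latency because both have the same number of in-flight requests. Real load balancers typically offer several such strategies (round robin, least connections, and others); this sketch shows only one of them.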
Load balancing brings more throughput and less downtime. Because requests are forwarded to many servers, no individual server is a single point of failure. The load balancer itself can be made redundant as well, for example by running multiple load balancers in an active-passive scheme.
Load balancers can be used anywhere in your architecture: you can balance the requests coming from the web, requests made by web servers to other services, requests to cache or database servers, and whatever else suits your requirements.