The problem is straightforward: we have too many users, and a single machine can no longer handle the concurrent requests (QPS) they send to some stateless service.
The fix is to spawn multiple machines and share the load among them. This alone won't solve the problem if the workers share a resource, such as a database they all query, which then becomes the new bottleneck. Load balancing solves the scaling problem when the workers are identical and each request is stateless.
Notice that the service must be stateless; otherwise the LB would have to maintain state itself, which is usually a sign of bad design at a higher level.
Using async IO (epoll, for example), a single load-balancer server can hold up to 100K connections concurrently. The server simply forwards each request to a real worker and sends the response back to the client; the client is never aware of the internal workers.
Very few applications today need 100K concurrent connections, but those that need more can use a two-layer LB: a front LB spreads connections across a second layer of LBs. Even with each LB handling only 10K connections, two layers can hold 10K × 10K = 100M concurrent connections. I don't know of any application with such high load.
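The fan-out arithmetic is just multiplication: capacity compounds per layer, assuming each LB at every layer handles the same number of connections.

```python
# Capacity of a layered LB tree: each layer multiplies capacity,
# assuming every LB handles per_lb connections.
per_lb = 10_000
layers = 2
total = per_lb ** layers
print(total)  # → 100000000, i.e. 100M concurrent connections
```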
Nginx is a typical choice for the LB. For a small number of workers, simple DNS round-robin may be enough.
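For reference, a minimal nginx setup for this looks like the sketch below; the worker addresses are placeholders, and nginx round-robins across the `upstream` group by default.

```nginx
# Hypothetical worker pool; nginx balances across it round-robin by default.
upstream workers {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://workers;
    }
}
```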