Scaling Services
How fast is the service responding? Is the service limited by CPU cores or memory? Based on user load, when is it useful to start more server instances? If you run too many compute resources, or if they’re too big, you pay more than is necessary. If the resources you use are too small, the response time increases, or the applications might not be available at all. As a result, you lose customers, and your income is reduced. You should know how to find bottlenecks and which knobs to turn to scale the resources as needed.
In Chapter 10, we created load tests to see how the service behaves under load, and in Chapter 11, we extended the service by adding telemetry data. Now, we’ll use both the load tests and the telemetry data to find out which scaling option is best.
In this chapter, we’ll start by reducing the response time with the help of telemetry data before analyzing the load that a single instance can handle. Finally, we’ll define...