Once the application is deployed, but before it is publicly announced or used, it is a good idea to estimate how many requests it can handle. Usually, you can roughly predict the service's requirements by estimating the number of requests it must handle at peak periods, how long those periods last, how fast it should respond, and so on. Once you're clear on the requirements, you'll need to load-test your application.
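As a starting point, a load test can be as simple as firing many concurrent requests at an endpoint and measuring throughput, success rate, and tail latency. The sketch below uses only the Python standard library; the tiny in-process server, the `BASE_URL` value, and the request counts are illustrative stand-ins, not part of any real deployment — in practice you would point `BASE_URL` at your deployed service (or use a dedicated tool such as `wrk` or Locust):

```python
import time
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Tiny stand-in server so the sketch is self-contained;
# replace BASE_URL with your deployed service's URL.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
BASE_URL = f"http://127.0.0.1:{server.server_port}/"

def one_request(_):
    """Issue one GET and return (success, latency in seconds)."""
    start = time.perf_counter()
    with urlopen(BASE_URL) as resp:
        resp.read()
        ok = resp.status == 200
    return ok, time.perf_counter() - start

N_REQUESTS, CONCURRENCY = 200, 20  # illustrative numbers
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(one_request, range(N_REQUESTS)))
elapsed = time.perf_counter() - t0

latencies = sorted(lat for _, lat in results)
success = sum(ok for ok, _ in results)
print(f"throughput:  {N_REQUESTS / elapsed:.1f} req/s")
print(f"success:     {success}/{N_REQUESTS}")
print(f"p95 latency: {latencies[int(0.95 * len(latencies))] * 1000:.1f} ms")
server.shutdown()
```

Comparing the measured throughput and p95 latency against the peak-load estimates from the previous step tells you whether the service needs more capacity before launch.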
Load tests should be run against the actual, deployed server, not your localhost. Here, we skip over the whole topic of deploying your model. We also didn't use nginx or any similar gateway server, which would cache responses and significantly boost the API's performance. Deployment of the application deserves a separate book and can be achieved in many ways, depending on your existing...