Summary
In this chapter on deploying GenAI in the cloud, we learned how to design and scale a robust, enterprise-ready GenAI cloud solution. We covered the limits that apply to each model and how to overcome them by adding more (Azure) OpenAI accounts, by using an Azure API Management (APIM) service, or by combining both.
APIM, with its important exponential interval retry setting, is another way to help organizations scale to meet business and user requirements.
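To make the idea behind exponential interval retries concrete, here is a minimal client-side sketch in Python. It is not APIM's retry policy or its exact interval formula. It assumes a hypothetical `call` function that returns an (HTTP status, body) pair and retries only on HTTP 429 (throttling), doubling the wait each time up to a cap.

```python
import time
from typing import Callable, List, Tuple


def backoff_delays(base: float, max_delay: float, retries: int) -> List[float]:
    # Wait before retry n is base * 2**n seconds, capped at max_delay.
    return [min(base * 2 ** n, max_delay) for n in range(retries)]


def call_with_retry(call: Callable[[], Tuple[int, str]],
                    retries: int = 5,
                    base: float = 1.0,
                    max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    # `call` is a hypothetical function returning (HTTP status, body).
    status, body = call()
    for delay in backoff_delays(base, max_delay, retries):
        if status != 429:  # only retry when the service is throttling us
            break
        sleep(delay)       # back off exponentially before the next attempt
        status, body = call()
    return status, body
```

For example, `backoff_delays(1.0, 30.0, 6)` yields waits of 1, 2, 4, 8, 16 and then 30 seconds (capped rather than 32), which spreads retries out so that a throttled endpoint has time to recover.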
Reserved capacity, known in Microsoft Azure as provisioned throughput units (PTUs), is another way an enterprise can scale to meet business requirements. We described how to scale this reserved capacity by increasing the number of PTUs.
During our cloud scaling journey, we learned how to scale across multiple geographies (multi-region deployments) to support broader global scale, while also supporting our enterprise disaster recovery (DR) scenarios.
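A multi-region setup can be illustrated with a simple priority-order failover. The Python sketch below is only an illustration under stated assumptions. The endpoint URLs and the `call` function are hypothetical, and it treats HTTP 429 and any 5xx status as a signal to try the next region.

```python
from typing import Callable, Sequence, Tuple


def call_with_failover(endpoints: Sequence[str],
                       call: Callable[[str], Tuple[int, str]]) -> Tuple[str, int, str]:
    # Try each regional endpoint in priority order. Stop at the first response
    # that is neither throttled (429) nor a server error (5xx).
    last = ("", 0, "")
    for endpoint in endpoints:
        status, body = call(endpoint)
        last = (endpoint, status, body)
        if status != 429 and status < 500:
            return last
    return last  # every region failed; surface the last response seen


# Hypothetical regional endpoints, primary region first.
REGIONS = ["https://eastus.example.com", "https://westeurope.example.com"]
```

Keeping the primary region first and secondary regions after it means normal traffic stays local, while an outage or sustained throttling in one region shifts requests elsewhere, which is the behavior a DR plan typically relies on.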
We now understand how to handle various response and error codes when making API calls against our generative...