Cloud scaling and design patterns
Now that you have learned about some of the limits imposed by Azure OpenAI and OpenAI in the previous section, we will look at how to overcome them.
Overcoming these limits through a well-designed architecture or design pattern is critical for businesses that need to meet internal service-level agreements (SLAs) and provide a robust service without noticeable latency, or delay, in the user or application experience.
What is scaling?
As we described earlier, limits are imposed on any cloud architecture, just as there are hardware limits on your laptop (such as the amount of RAM or disk space), in on-premises data centers, and so on. Resources are finite, so we have come to expect these limits, even in cloud services. However, there are a few techniques we can use to work around these limitations so that we can meet our business requirements and keep up with user behavior and demand.
Understanding TPM, RPM, and PTUs
As we scale, we will need to understand...