Deploying ChatGPT in the Cloud: Architecture Design and Scaling Strategies
In the previous chapters, you learned how to fine-tune LLMs and augment them with external data, gained a deeper understanding of how prompts and completions work under the hood, and developed GenAI applications using popular programming frameworks for the various LLMs. As we continue building on this foundation for GenAI/ChatGPT cloud solutions, we will find that these cloud services place limits on how many tokens they will process for prompts and completions. Because large-scale deployments need to be “enterprise-ready,” we must take advantage of the cloud to provide the services and support an enterprise solution requires, with far less effort than building those services from the ground up on our own. Capabilities such as security (covered in more detail in the next chapter) and identity are pre-baked into the cloud platform, and thus into the cloud solution we build on top of it.
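To make the token-limit point concrete, the short sketch below (not taken from this book's own code) shows one common way to work within a hosted service's tokens-per-minute and requests-per-minute quotas: cap the completion size on each request and back off when the service returns a rate-limit error. It assumes the OpenAI Python SDK (v1.x) with an API key in the OPENAI_API_KEY environment variable; the model name and retry settings are purely illustrative, not quota values from any specific cloud plan.

```python
# Minimal sketch of respecting a hosted ChatGPT-style service's token limits.
# Assumes the OpenAI Python SDK (v1.x); model name and limits are illustrative.
import time

from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def complete_with_backoff(prompt: str, max_retries: int = 5) -> str:
    """Send a chat completion, backing off when the service's token or
    request quota is exceeded (signaled by a rate-limit error)."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",  # illustrative model name
                messages=[{"role": "user", "content": prompt}],
                max_tokens=256,  # cap completion tokens per request
            )
            return response.choices[0].message.content
        except RateLimitError:
            # Tokens-per-minute or requests-per-minute quota hit: wait and retry.
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
    raise RuntimeError("Rate limit persisted after retries")


if __name__ == "__main__":
    print(complete_with_backoff("Summarize why enterprise deployments need quotas."))
```

Capping max_tokens and retrying with exponential backoff are the two levers most deployments start with; later in this chapter we will look at architecture-level strategies, such as scaling out across deployments, that go beyond per-request handling.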