Generative AI Batch and Real-Time Integration Patterns
This chapter covers the two primary patterns for designing systems around large language models (LLMs): batch and real-time. Whether to architect a batch or a real-time application depends on the use case at hand. In general, batch use cases are formulated around generating data to be consumed later; for example, you might use LLMs to extract data points from a large corpus and then generate summaries for business analysts to review on a daily basis. In real-time use cases, data is processed as it becomes available; for example, you might deploy LLMs as online agents that answer questions from customers or employees through a chat or voice interface.
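The real-time pattern described above can be sketched minimally as a per-turn request handler. This is an illustration, not a specific vendor API: `answer_stub` is a hypothetical stand-in for whatever chat-completion call your system uses, and the history list is one simple way to carry conversational context.

```python
from typing import Callable

def answer_stub(question: str) -> str:
    # Placeholder for a real LLM call; substitute any chat-completion
    # API here (this stub is an assumption for illustration only).
    return f"Answer to: {question}"

def handle_turn(history: list[tuple[str, str]], question: str,
                llm: Callable[[str], str] = answer_stub) -> str:
    # Real-time pattern: each user message is answered as soon as it
    # arrives, prioritizing low per-request latency over bulk throughput.
    answer = llm(question)
    history.append((question, answer))  # retain context for follow-up turns
    return answer
```

Each incoming message triggers one model call, so latency per request is what matters; throughput is bounded by how many such calls the serving layer can handle concurrently.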
Diving deeper into batch mode: queries are sent in bulk, trading latency for higher throughput. This makes it better suited to long-running production workloads that consume large volumes of data...
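A minimal sketch of the batch pattern, assuming a generic `llm` callable (`summarize_stub` is a hypothetical placeholder, not a real API): documents are grouped into fixed-size chunks and each chunk is processed as one bulk unit of work.

```python
from typing import Callable

def summarize_stub(text: str) -> str:
    # Placeholder for a real LLM summarization call (assumption for
    # illustration; swap in your provider's batch or completion API).
    return f"summary of {text[:20]}"

def run_batch(documents: list[str], llm: Callable[[str], str],
              chunk_size: int = 8) -> list[str]:
    """Process documents in fixed-size chunks, trading latency for throughput."""
    results: list[str] = []
    for i in range(0, len(documents), chunk_size):
        chunk = documents[i : i + chunk_size]
        # In production, each chunk would typically be submitted as one
        # bulk request or scheduled job rather than looped over inline.
        results.extend(llm(doc) for doc in chunk)
    return results
```

No caller waits on an individual response here; the whole run completes on a schedule (for example, overnight) and analysts consume the accumulated summaries later.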