Summary
This chapter covered the two primary patterns for designing systems around LLMs: batch and real-time. The choice depends on your organization's use case requirements. We learned that batch mode sends queries in bulk, achieving higher throughput at the expense of higher latency. It is better suited to long-running workloads and the consumption of a large corpus of data.
Results are not immediately exposed to users, allowing for additional review pipelines before or after model inference.
We also learned that real-time mode offers back-and-forth querying at a faster rate, providing quicker feedback to and from the end user. It has lower throughput but suits low-latency requirements; the opportunities to review results are reduced to avoid adding latency.
In this chapter, we addressed the implications of batch versus real-time processing for each component of the integration pipeline. For entry points, real-time optimizes for streamlined user prompting, while batch handles data pipeline inputs.
In pre-processing, real-time employs lighter techniques to minimize latency, whereas batch allows for heavier enrichment. Inference in real-time focuses on low latency per request, while batch processes requests in groups for improved throughput.
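The throughput-versus-latency trade-off at the inference stage can be sketched in a few lines. This is a minimal illustration, not code from the chapter: `fake_generate` is a hypothetical stand-in for a real model endpoint, and the batch size is an arbitrary example value.

```python
from typing import Callable


def fake_generate(prompts: list[str]) -> list[str]:
    """Hypothetical stand-in for a model call; a real system would hit an LLM endpoint."""
    return [f"answer:{p}" for p in prompts]


def realtime_infer(prompt: str, generate: Callable[[list[str]], list[str]]) -> str:
    # Real-time mode: one request per call, minimizing per-request latency.
    return generate([prompt])[0]


def batch_infer(
    prompts: list[str],
    generate: Callable[[list[str]], list[str]],
    batch_size: int = 8,
) -> list[str]:
    # Batch mode: group requests so each model invocation amortizes its
    # overhead across many prompts, trading latency for throughput.
    results: list[str] = []
    for i in range(0, len(prompts), batch_size):
        results.extend(generate(prompts[i : i + batch_size]))
    return results
```

In practice, the `generate` callable would wrap a provider SDK or a self-hosted model server; the structural difference between the two modes is only whether prompts are grouped before invocation.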
Post-processing in real-time involves quicker formatting and filtering, but batch processing allows for more complex transformations. In terms of presentation, real-time offers instantaneous UI updates, while batch exports results asynchronously.
Additionally, the chapter provided an example use case of using GenAI to enhance a website search, with document ingestion occurring in batch mode and search/response generation in real-time mode, transforming the end-user experience with more relevant and personalized answers.
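The split in that use case can be sketched as two functions: a batch-mode `ingest` step that pre-computes an index offline, and a real-time `search` step that queries it per user request. This is a simplified illustration with a toy token-overlap score standing in for the embedding-based retrieval a real system would use; all names here are hypothetical.

```python
def ingest(documents: dict[str, str]) -> dict[str, set[str]]:
    """Batch step: pre-compute a token set per document.

    A production pipeline would compute embeddings here instead;
    either way, the heavy work happens offline, in bulk.
    """
    return {doc_id: set(text.lower().split()) for doc_id, text in documents.items()}


def search(query: str, index: dict[str, set[str]]) -> str:
    """Real-time step: score documents by token overlap, return the best match.

    Only this lightweight lookup runs on the user's request path.
    """
    query_tokens = set(query.lower().split())
    return max(index, key=lambda doc_id: len(query_tokens & index[doc_id]))
```

The design point is the same regardless of the scoring method: ingestion tolerates latency and benefits from batching, while the query path stays minimal so responses return quickly.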
In the next chapter, we will dive deep into a use case that leverages GenAI to extract data from 10-K documents.
Join our community on Discord
Join our community’s Discord space for discussions with the authors and other readers:
https://packt.link/genpat