Generative AI Application Integration Patterns

You're reading from Generative AI Application Integration Patterns: Integrate large language models into your applications

Product type: Paperback
Published: Sep 2024
Publisher: Packt
ISBN-13: 9781835887608
Length: 218 pages
Edition: 1st
Authors (2): Luis Lopez Soria, Juan Pablo Bustos
Table of Contents (13 chapters)

Preface
1. Introduction to Generative AI Patterns
2. Identifying Generative AI Use Cases (free chapter)
3. Designing Patterns for Interacting with Generative AI
4. Generative AI Batch and Real-Time Integration Patterns
5. Integration Pattern: Batch Metadata Extraction
6. Integration Pattern: Batch Summarization
7. Integration Pattern: Real-Time Intent Classification
8. Integration Pattern: Real-Time Retrieval Augmented Generation
9. Operationalizing Generative AI Integration Patterns
10. Embedding Responsible AI into Your GenAI Applications
11. Other Books You May Enjoy
12. Index

Batch and real-time integration patterns

The first key decision when evaluating batch versus real-time integration approaches concerns data immediacy: when exactly do you need the GenAI outputs? This boils down to whether you require responsive, just-in-time results, such as servicing on-demand user queries, or whether insights from model outputs can accumulate over time before being consumed.

Let's illustrate this with retrieval-augmented generation (RAG), where an LLM evaluates search results to formulate a human-friendly response to a query. This is a real-time use case: you need AI-generated answers with minimal latency to deliver a quality user experience in applications such as conversational assistants or search engines. The data has to be put into action as soon as it is produced.
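The real-time RAG path can be sketched as a single request-scoped function that retrieves context and prompts the model within one synchronous call. This is a minimal illustration only: `retrieve` and `generate` are hypothetical stand-ins for a vector search and an LLM call, not APIs from the book, and the in-memory corpus is invented for the example.

```python
# Minimal real-time RAG sketch. `retrieve` and `generate` are hypothetical
# placeholders for a real vector search and a low-latency LLM call.
DOCS = {
    "returns": "Items may be returned within 30 days of delivery.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def retrieve(query: str) -> list[str]:
    """Toy keyword retrieval over an in-memory corpus."""
    return [text for key, text in DOCS.items() if key in query.lower()]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; a real client would go here."""
    return f"Answer based on: {prompt}"

def answer_query(query: str) -> str:
    """Request-scoped path: retrieve, build the prompt, and respond in one call."""
    context = " ".join(retrieve(query)) or "no matching documents"
    return generate(f"Context: {context}\nQuestion: {query}")

print(answer_query("What is your returns policy?"))
```

In a production system, each step in this function sits on the critical path of the user request, which is why retrieval and generation latency dominate the design.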

Contrast that with automated content generation workflows, for example, extracting metadata from a product catalog. While you still want that content quickly, there is more flexibility around when model outputs are ingested downstream. You could run generative models in batches, queueing up prompts and processing them asynchronously based on available capacity. The generated text then flows into your e-commerce databases on its own schedule.
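The queueing idea above can be sketched with a simple in-process queue that is drained in fixed-size batches when capacity allows. The `extract_metadata` function here is a hypothetical placeholder for a batched LLM call, and the batch size and sample catalog entries are illustrative assumptions.

```python
from queue import Queue

def extract_metadata(description: str) -> dict:
    """Placeholder for a batched LLM metadata-extraction call."""
    words = description.split()
    return {"title": words[0], "word_count": len(words)}

def process_batch(jobs: Queue, batch_size: int = 2) -> list[dict]:
    """Drain up to batch_size queued prompts and return their outputs.

    Anything left in the queue waits for the next scheduled run,
    decoupling prompt submission from result consumption.
    """
    results = []
    while not jobs.empty() and len(results) < batch_size:
        results.append(extract_metadata(jobs.get()))
    return results

jobs = Queue()
for item in ["Widget deluxe edition", "Gadget compact model travel"]:
    jobs.put(item)

print(process_batch(jobs))
```

In practice, the queue would be a durable message broker or a provider's batch endpoint rather than an in-memory `Queue`, but the decoupling of submission from processing is the same.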

The real-time interactive integration mode prioritizes low latency and responsive experiences above all else: users receive AI results virtually instantly through request/response application interfaces. Batch mode breaks that coupling, sacrificing instantaneous interactivity for higher overall throughput and cost efficiency at scale, since longer-running jobs optimize utilization across pooled models.

So, the batch versus real-time decision depends on analyzing data freshness requirements. For experiences demanding perceptibly instant responses, like querying information or iterating on creative ideation, you'll want a request-scoped interactive architecture. When the goal is instead maximizing generative model output volume under flexible latency tolerances, batching prompts yields better economics. Getting this integration pattern right is key to harnessing GenAI's value.
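The decision criterion above can be captured as a simple rule of thumb keyed on the latency budget. The function name and the two-second threshold are illustrative assumptions, not values from the book; real systems weigh cost, throughput, and freshness together.

```python
# Hedged rule-of-thumb helper; the threshold is an assumed example value.
def choose_integration_mode(max_latency_seconds: float) -> str:
    """Pick real-time for tight latency budgets, batch otherwise."""
    return "real-time" if max_latency_seconds <= 2.0 else "batch"

print(choose_integration_mode(0.5))   # user-facing chat query
print(choose_integration_mode(3600))  # nightly catalog enrichment
```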
