Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Generative AI Application Integration Patterns

You're reading from   Generative AI Application Integration Patterns Integrate large language models into your applications

Arrow left icon
Product type Paperback
Published in Sep 2024
Publisher Packt
ISBN-13 9781835887608
Length 218 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (2):
Arrow left icon
Luis Lopez Soria Luis Lopez Soria
Author Profile Icon Luis Lopez Soria
Luis Lopez Soria
Juan Pablo Bustos Juan Pablo Bustos
Author Profile Icon Juan Pablo Bustos
Juan Pablo Bustos
Arrow right icon
View More author details
Toc

Table of Contents (13) Chapters Close

Preface 1. Introduction to Generative AI Patterns 2. Identifying Generative AI Use Cases FREE CHAPTER 3. Designing Patterns for Interacting with Generative AI 4. Generative AI Batch and Real-Time Integration Patterns 5. Integration Pattern: Batch Metadata Extraction 6. Integration Pattern: Batch Summarization 7. Integration Pattern: Real-Time Intent Classification 8. Integration Pattern: Real-Time Retrieval Augmented Generation 9. Operationalizing Generative AI Integration Patterns 10. Embedding Responsible AI into Your GenAI Applications 11. Other Books You May Enjoy
12. Index

General generative AI concepts

When integrating generative AI into practical applications, it is important to have an understanding of concepts such as model architecture and training. In this section, we cover an overview of prominent concepts, including transformers, diffusion models, pre-training, and prompt engineering, that enable systems to generate impressively accurate text, images, audio, and more.

Understanding these core concepts will equip you to make informed decisions when selecting foundations for your use cases. However, putting models into production requires further architectural considerations. We will be highlighting these decision points in the rest of the chapters in the book and in practical examples.

Generative AI model architectures

Generative AI models are based on specialized neural network architectures optimized for generative tasks. The two more widely known models are transformers and diffusion models.

Transformer models are not a new concept. They were first introduced by Google in a 2017 paper called Attention Is All You Need (https://arxiv.org/pdf/1706.03762.pdf). The paper explains the Transformer neural network architecture, which is entirely based on attention mechanisms using the encoder and decoder concepts. This architecture enables models to identify relationships across an input text. By having these relationships, the model predicts the next token, leveraging its previous prediction as an input, creating this recursive loop to generate new content.

Diffusion models have drawn considerable interest as generative models due to their foundation in the physical processes of non-equilibrium thermodynamics. In physics, diffusion refers to the motion of particles from areas of high concentration to low concentration over time. Diffusion models try to mimic this concept in their training process. These models are trained through two phases: the forward diffusion process adds “noise” to the original training data, followed by a reverse conditioning process, which then learns how to remove noise in the reverse diffusion process. By learning this process, these models can produce samples by starting from pure noise and letting the reverse diffusion model clear away unnecessary “noise” and preserving the desired “generated” content.

Other types of deep learning architectures, such as Generative Adversarial Networks (GANs), allow you to generate synthetic data based on existing data. GANs are useful because they leverage two models: one to generate a synthetic output and another one that tries to predict if this output is real or fake.

Through this iterative process, we can generate data that is indistinguishable from the real data but different enough to be used to enhance our training datasets. Another example of data generation architectures is Variational Autoencoders (VAEs), which use an encoder-decoder approach to generate new data samples resembling their training datasets.

Techniques available to optimize foundational models

There are several techniques used to develop and optimize foundational models that have driven significant gains in AI capabilities, some of which are more complex than others from a technical and monetary perspective:

  • Pre-training refers to fully training a model on a large dataset. It allows models to learn very broad representations from billions of data points, which help the model adapt to other closely related tasks. Popular methods include contrastive self-supervised pre-training on unlabeled data and pre-training on vast supervised data like the internet.
  • Fine-tuning adapts a pre-trained model’s already learned feature representations to perform a specific task. This only tunes some higher-level model layers rather than training from scratch. On the other hand, adapter tuning equips models with small, lightweight adapters that can rapidly tune to new tasks without interfering with existing capabilities. These pluggable adapters give a parameter-efficient way of accumulating knowledge across multiple tasks by learning task-specific behaviors while reusing the bulk of model weights. They help mitigate forgetting previous tasks and simplify personalization. For example, models may first be pre-trained on billions of text webpages to acquire general linguistic knowledge, before being fine-tuned on more domain-specific datasets for question answering, classification, etc.
  • Distillation uses a “teacher” model to train a smaller “student” model to reproduce the performance of the larger pre-trained model at a lower cost and latency. Quantizing and compressing large models into efficient forms for deployments also helps optimize performance and cost.

The combination of comprehensive pre-training followed by specialized fine-tuning, adapter tuning, and portable distillation has enabled unprecedented versatility of deep learning across domains. Each approach smartly reuses and transfers available knowledge, enabling the customization and scaling of generative AI.

Techniques to augment your foundational model responses

In addition to architecture and training advances, progress in generative AI has been fueled by innovations in how these models are augmented by external data at inference time.

Prompt engineering tunes the text prompts provided to models to steer their generation quality, capabilities, and properties. Well-designed prompts guide the model to produce the desired output format, reduce ambiguity, and provide helpful contextual constraints. This allows simpler model architectures to solve complex problems by encoding human knowledge into the prompts.

Retrieval augmented generation, also known as RAG, enhances text generation through efficient retrieval of relevant knowledge from external stores. Models receive contextual pieces of information as “context” to be considered as additional input before generating its output. Grounding LLMs (large language models) refers to providing model-specific factual knowledge rather than just model parameters, enabling more accurate, knowledgeable, and specific language generation.

Together, these approaches augment basic predictive language models to become far more versatile, robust, and scalable. They reduce brittleness via tight integration of human knowledge and grounded learning rather than just statistical patterns. RAG handles the breadth and real-time retrieval of information, prompts provide depth and rules to the desired outputs, and grounding binds them to reality. We would highly encourage readers to get familiar with this topic, as it is an industry best practice to perform RAG and to ground your model to prevent it from hallucinating. A good start is the following paper: Retrieval-Augmented Generation for Large Language Models: A Survey (https://arxiv.org/pdf/2312.10997).

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image