Organizing RAG in a pipeline
A RAG pipeline typically collects data and prepares it: cleaning it, chunking the documents, embedding the chunks, and storing them in a vector store dataset. The vector dataset is then queried to augment the user input of a generative AI model and produce an output. However, when a vector store is involved, it is highly recommended not to run this entire sequence in a single program. We should separate the process into at least three components:
- Data collection and preparation
- Data embedding and loading into the dataset of a vector store
- Querying the vectorized dataset to augment the input of a generative AI model to produce a response
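The three components above can be sketched as standalone functions, each of which could live in its own program. Everything in this sketch is an illustrative assumption rather than a specific library's API: the toy character-count embedding stands in for a real embedding model, and an in-memory list stands in for a vector store.

```python
# Sketch of the three RAG components as separate, independently runnable units.
# The embedding and "vector store" are toy stand-ins, not a real implementation.
import math

# Component 1: data collection and preparation (clean + chunk).
def prepare(documents, chunk_size=40):
    chunks = []
    for doc in documents:
        text = " ".join(doc.split())  # basic cleaning: normalize whitespace
        words = text.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

# Component 2: embedding and loading into the vector store.
# A toy bag-of-characters vector stands in for a real embedding model.
def embed(text, dim=26):
    vec = [0.0] * dim
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def load_store(chunks):
    # An in-memory list of (chunk, vector) pairs stands in for a vector store.
    return [(chunk, embed(chunk)) for chunk in chunks]

# Component 3: querying the store to augment the generative model's input.
def retrieve(store, query, k=1):
    q = embed(query)
    scored = sorted(store, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [chunk for chunk, _ in scored[:k]]

def augmented_prompt(store, query):
    context = "\n".join(retrieve(store, query))
    return f"Context:\n{context}\n\nQuestion: {query}"  # passed to the generative model

store = load_store(prepare(["RAG retrieves relevant chunks to ground generation."]))
print(augmented_prompt(store, "What does RAG retrieve?"))
```

Because each component only exchanges plain data (documents in, chunks out; chunks in, vectors out; a query in, a prompt out), each can be replaced or rescaled without touching the others.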
Let’s go through the main reasons for this component approach:
- Specialization, which allows each member of a team to do what they are best at, whether that is collecting and cleaning data, running embedding models, managing vector stores, or tweaking generative AI models.
- Scalability, making it easier to upgrade each component as the technology evolves and to scale components independently with specialized methods. Raw data, for example, can be stored on a different server from the cloud platform that hosts the vectorized dataset of embedded vectors.
- Parallel development, which allows each team to advance at its own pace without waiting for the others. One component can be improved continually without disrupting the processes of the other components.
- Maintenance is component-independent. One team can work on a component without affecting the other parts of the system. For example, if the RAG pipeline is in production, users can continue querying the vector store and running generative AI while a team fixes the data collection component.
- Security and privacy risks are minimized because each team works separately, with specific authorizations, access rights, and roles for each component.
As we can see, in real-life production environments and large-scale projects, it is rare for a single program or team to manage the entire end-to-end process. We are now ready to draw the blueprint of the RAG pipeline that we will build in Python in this chapter.