A RAG-driven generative AI pipeline
Let’s dive into what a real-life RAG pipeline looks like. Imagine we’re a team that has to deliver a whole system in just a few weeks. Right off the bat, we’re bombarded with questions like:
- Who’s going to gather and clean up all the data?
- Who’s going to handle setting up OpenAI’s embedding model?
- Who’s writing the code to run those embeddings and manage the vector store?
- Who’s going to take care of implementing GPT-4 and managing what it spits out?
Within a few minutes, everyone starts looking pretty worried. The whole thing feels overwhelming—like, seriously, who would even think about tackling all that alone?
So here’s what we do. We split into three groups, each of us taking on different parts of the pipeline, as shown in Figure 2.3:
Figure 2.3: RAG pipeline components
Each of the three groups has one component to implement:
- Data Collection and Prep (D1 and D2): One team takes on collecting the data and cleaning it.
- Data Embedding and Storage (D2 and D3): Another team runs the data through OpenAI’s embedding model and stores the resulting vectors in an Activeloop Deep Lake dataset (a rough sketch of this step follows the list).
- Augmented Generation (D4, G1-G4, and E1): The last team handles the big job of generating content based on user input and retrieval queries. They use GPT-4 for this, and even though it sounds like a lot, it’s actually a bit easier because they aren’t waiting on anyone else; they only need to run the model and evaluate the output (see the second sketch below).
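To make the split concrete, here is a rough idea of what the second team’s job might look like in code: embedding the cleaned chunks and storing them in Deep Lake. This is a minimal sketch, assuming the openai 1.x Python SDK and the deeplake 3.x VectorStore interface, with an OPENAI_API_KEY set in the environment; the dataset path, model name, and sample chunks are placeholders, not the project’s actual values.

```python
# Sketch of the embedding-and-storage component (D2 and D3).
# Assumes: openai 1.x SDK, deeplake 3.x VectorStore API, and an
# OPENAI_API_KEY in the environment. Paths, model names, and the
# sample chunks are placeholders for illustration only.
from openai import OpenAI
from deeplake.core.vectorstore import VectorStore

client = OpenAI()  # picks up OPENAI_API_KEY from the environment


def embed(texts):
    """Embed a list of strings with an OpenAI embedding model."""
    response = client.embeddings.create(
        model="text-embedding-3-small",  # placeholder embedding model
        input=texts,
    )
    return [item.embedding for item in response.data]


# Cleaned chunks handed over by the data collection and prep team.
chunks = [
    "RAG augments a prompt with documents retrieved from a vector store.",
    "Deep Lake stores text, metadata, and embedding vectors together.",
]

# A local path is used here; a hub:// path would work the same way.
vector_store = VectorStore(path="./rag_demo_vector_store")
vector_store.add(
    text=chunks,
    embedding=embed(chunks),
    metadata=[{"source": "demo"} for _ in chunks],
)
```

And here is a matching sketch of the third team’s job: retrieving the chunks closest to the user input and asking GPT-4 to generate an answer from them. It reuses the names defined above, and the return format of vector_store.search is assumed from the deeplake 3.x documentation.

```python
# Sketch of the augmented generation component (D4, G1-G4, and E1),
# reusing the client, embed(), and vector_store defined above. The
# dict-with-a-"text"-key return format of search() is assumed from
# the deeplake 3.x VectorStore documentation.
user_input = "What does RAG add to a prompt?"

# Retrieve the chunks closest to the user input (the retrieval query).
results = vector_store.search(embedding=embed([user_input])[0], k=2)
context = "\n".join(results["text"])

# Generate a response grounded in the retrieved context.
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Answer using only the provided context."},
        {"role": "user",
         "content": f"Context:\n{context}\n\nQuestion: {user_input}"},
    ],
)
print(completion.choices[0].message.content)  # the output to evaluate (E1)
```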
Suddenly, the project doesn’t seem so scary. Everyone has a part to focus on, and each team can work without being distracted by the others. That way, we all move faster and get the job done without the hold-ups that usually slow projects down.
The organization of the project, represented in Figure 2.3, is a variant of the RAG ecosystem framework presented in Figure 1.3 of Chapter 1, Why Retrieval Augmented Generation?
We can now begin building a RAG pipeline.