RAG-Driven Generative AI

You're reading from RAG-Driven Generative AI: Build custom retrieval augmented generation pipelines with LlamaIndex, Deep Lake, and Pinecone.

Product type: Paperback
Published in: Sep 2024
Publisher: Packt
ISBN-13: 9781836200918
Length: 334 pages
Edition: 1st Edition
Author: Denis Rothman
Table of Contents (14)

Preface
1. Why Retrieval Augmented Generation?
2. RAG Embedding Vector Stores with Deep Lake and OpenAI
3. Building Index-Based RAG with LlamaIndex, Deep Lake, and OpenAI
4. Multimodal Modular RAG for Drone Technology
5. Boosting RAG Performance with Expert Human Feedback
6. Scaling RAG Bank Customer Data with Pinecone
7. Building Scalable Knowledge-Graph-Based RAG with Wikipedia API and LlamaIndex
8. Dynamic RAG with Chroma and Hugging Face Llama
9. Empowering AI Models: Fine-Tuning RAG Data and Human Feedback
10. RAG for Video Stock Production with Pinecone and OpenAI
11. Other Books You May Enjoy
12. Index
Appendix

The RAG ecosystem

RAG-driven generative AI is a framework that can be implemented in many configurations. RAG’s framework runs within a broad ecosystem, as shown in Figure 1.3. However, no matter how many retrieval and generation frameworks you encounter, it all boils down to the following four domains and questions that go with them:

  • Data: Where is the data coming from? Is it reliable? Is it sufficient? Are there copyright, privacy, and security issues?
  • Storage: How is the data going to be stored before or after processing it? What amount of data will be stored?
  • Retrieval: How will the correct data be retrieved to augment the user’s input so that it is sufficient for the generative model? What type of RAG framework will be successful for the project?
  • Generation: Which generative AI model will fit into the type of RAG framework chosen?

The data, storage, and generation domains depend heavily on the type of RAG framework you choose. Before making that choice, we need to evaluate the proportion of parametric and non-parametric knowledge in the ecosystem we are implementing. Figure 1.3 represents the RAG framework, which includes the main components regardless of the types of RAG implemented:


Figure 1.3: The Generative RAG-ecosystem

  • The Retriever (D) handles data collection, processing, storage, and retrieval
  • The Generator (G) handles input augmentation, prompt engineering, and generation
  • The Evaluator (E) handles mathematical metrics, human evaluation, and feedback
  • The Trainer (T) handles the initial pre-trained model and fine-tuning the model

Each of these four components relies on its respective ecosystem, and together they form the overall RAG-driven generative AI pipeline. We will refer to the domains D, G, E, and T in the following sections. Let’s begin with the retriever.
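The four components above can be sketched as a minimal pipeline skeleton. The class and method names below are illustrative stand-ins, not the book’s actual code; a real system would replace the keyword match with embeddings and the string template with a call to a generative model:

```python
# Minimal sketch of the D-G-E layout (illustrative names only).
class Retriever:            # D: collect, process, store, retrieve
    def __init__(self):
        self.store = []     # D3: in-memory stand-in for a vector store

    def add(self, doc):     # D1/D2: collect and process a document
        self.store.append(doc)

    def retrieve(self, query):              # D4: naive keyword match
        return [d for d in self.store if query.lower() in d.lower()]

class Generator:            # G: augment the input and generate
    def generate(self, user_input, context):        # G2/G3/G4
        prompt = f"Context: {context}\nQuestion: {user_input}"
        return prompt       # a real system would call an LLM here

class Evaluator:            # E: metrics and human feedback
    def score(self, response, expected):            # E1: toy metric
        return 1.0 if expected in response else 0.0

retriever = Retriever()
retriever.add("RAG augments user input with retrieved data.")
docs = retriever.retrieve("RAG")
prompt = Generator().generate("What is RAG?", docs[0])
print(Evaluator().score(prompt, "retrieved data"))  # 1.0
```

The Trainer (T) is absent here because fine-tuning happens offline, outside the request path.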

The retriever (D)

The retriever component of a RAG ecosystem collects, processes, stores, and retrieves data. The starting point of a RAG ecosystem is thus a data ingestion process, the first step of which is to collect data.

Collect (D1)

In today’s world, AI data is as diverse as our media playlists. It can be anything from a chunk of text in a blog post to a meme or even the latest hit song streamed through headphones. And it doesn’t stop there—the files themselves come in all shapes and sizes. Think of PDFs filled with all kinds of details, web pages, plain text files that get straight to the point, neatly organized JSON files, catchy MP3 tunes, videos in MP4 format, or images in PNG and JPG.

Furthermore, a large proportion of this data is unstructured and found in unpredictable and complex ways. Fortunately, many platforms, such as Pinecone, OpenAI, Chroma, and Activeloop, provide ready-to-use tools to process and store this jungle of data.
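One common way to tame this variety at collection time is to route each file to a loader by extension. The sketch below handles only text and JSON with the standard library; the function name is illustrative, and real pipelines plug in dedicated parsers for PDFs, audio, video, and images:

```python
import json
from pathlib import Path

# Sketch: dispatch each collected file to a loader by extension.
def load_document(path: str):
    suffix = Path(path).suffix.lower()
    if suffix in {".txt", ".md"}:
        return Path(path).read_text(encoding="utf-8")
    if suffix == ".json":
        return json.loads(Path(path).read_text(encoding="utf-8"))
    # PDFs, MP3s, MP4s, PNGs, and JPGs need dedicated parsers
    # (e.g., a PDF text extractor or a speech-to-text model).
    raise ValueError(f"No loader registered for {suffix}")
```

For example, `load_document("notes.txt")` returns the raw text, while an unregistered format raises an error instead of silently producing garbage.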

Process (D2)

In the data collection phase (D1) of multimodal data processing, various types of data, such as text, images, and videos, can be extracted from websites using web scraping techniques or any other source of information. These data objects are then transformed to create uniform feature representations. For example, data can be chunked (broken into smaller parts), embedded (transformed into vectors), and indexed to enhance searchability and retrieval efficiency.

We will introduce these techniques, starting with the Building Hybrid Adaptive RAG in Python section of this chapter. In the following chapters, we will continue building more complex data processing functions.
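As a preview of the chunking step, here is a minimal sketch that splits text into fixed-size pieces with a small overlap, so no piece is cut off from all of its surrounding context. The function and parameter names are illustrative:

```python
# Sketch of D2 chunking: fixed-size windows with overlap.
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap   # step back by `overlap` characters
    return chunks

parts = chunk_text("RAG pipelines chunk documents before embedding them.", 20, 5)
print(parts[0])  # 'RAG pipelines chunk '
```

Production systems usually chunk on sentence or token boundaries rather than raw characters, but the window-plus-overlap idea is the same.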

Storage (D3)

At this stage of the pipeline, we have collected and begun processing a large amount of diverse data from the internet—videos, pictures, texts, you name it. Now, what can we do with all that data to make it useful?

That’s where vector stores like Deep Lake, Pinecone, and Chroma come into play. Think of these as super smart libraries that don’t just store your data but convert it into vectors, mathematical entities that enable powerful computations. They can also apply a variety of indexing methods and other techniques for rapid access.

Instead of keeping the data in static spreadsheets and files, we turn it into a dynamic, searchable system that can power anything from chatbots to search engines.
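To make the idea concrete, here is a toy in-memory stand-in for a vector store such as Deep Lake, Pinecone, or Chroma. It stores (vector, payload) pairs and ranks them by cosine similarity; the class name and two-dimensional vectors are purely illustrative:

```python
import math

# A toy in-memory vector store (stand-in for a real vector database).
class TinyVectorStore:
    def __init__(self):
        self.items = []                      # (vector, payload) pairs

    def add(self, vector, payload):
        self.items.append((vector, payload))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (math.sqrt(sum(x * x for x in a))
                * math.sqrt(sum(y * y for y in b)))
        return dot / norm if norm else 0.0

    def search(self, query_vector, top_k=1):
        ranked = sorted(self.items,
                        key=lambda it: self._cosine(query_vector, it[0]),
                        reverse=True)
        return [payload for _, payload in ranked[:top_k]]

store = TinyVectorStore()
store.add([1.0, 0.0], "doc about drones")
store.add([0.0, 1.0], "doc about banking")
print(store.search([0.9, 0.1]))  # ['doc about drones']
```

Real vector stores add persistence, approximate-nearest-neighbor indexes, and metadata filters on top of this basic shape.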

Retrieval query (D4)

The retrieval process is triggered by the user input or automated input (G1).

To retrieve data quickly, we load it into vector stores and datasets after transforming it into a suitable format. Then, using a combination of keyword searches, smart embeddings, and indexing, we can retrieve the data efficiently. Cosine similarity, for example, finds items that are closely related, ensuring that the search results are not just fast but also highly relevant.

Once the data is retrieved, we then augment the input.
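A minimal retrieval sketch, assuming a toy bag-of-words "embedding" in place of a learned one (all names here are illustrative): embed the query, score each document by cosine similarity, and return the best matches.

```python
from collections import Counter
import math

# Toy "embedding": bag-of-words counts (real systems use learned embeddings).
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

documents = [
    "Drones use lidar sensors for obstacle avoidance.",
    "Banks score customers with transaction histories.",
]

def retrieve(query, docs, top_k=1):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

print(retrieve("How do drones avoid obstacles?", documents))
```

The drone document wins because it shares the token "drones" with the query; with learned embeddings, semantically related documents would rank highly even without exact word overlap.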

The generator (G)

In the RAG ecosystem, the lines between input and retrieval are blurred, as shown in Figure 1.3. The user input (G1), automated or human, interacts with the retrieval query (D4) to augment the input before sending it to the generative model.

The generative flow begins with an input.

Input (G1)

The input can be a batch of automated tasks (processing emails, for example) or human prompts through a User Interface (UI). This flexibility allows you to seamlessly integrate AI into various professional environments, enhancing productivity across industries.

Augmented input with HF (G2)

Human feedback (HF) can be added to the input, as described in the Human feedback (E2) section under The evaluator (E). Human feedback makes a RAG ecosystem considerably more adaptable and provides full control over data retrieval and generative AI inputs. In the Building Hybrid Adaptive RAG in Python section of this chapter, we will build augmented input with human feedback.
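One simple way to fold human feedback into the augmented input is to keep a log of reviewer notes keyed by topic and append any matching notes to the prompt. The function, the log structure, and its contents below are illustrative assumptions, not the book’s implementation:

```python
# Sketch of G2: fold stored human feedback into the augmented input.
feedback_log = {
    "drone": "Reviewers flagged that answers must name the sensor type.",
}

def augment_input(user_input, retrieved, feedback_log):
    notes = [note for key, note in feedback_log.items()
             if key in user_input.lower()]
    parts = [user_input, "Context: " + " ".join(retrieved)]
    if notes:
        parts.append("Human feedback: " + " ".join(notes))
    return "\n".join(parts)

augmented = augment_input(
    "What sensors does the drone use?",
    ["Drones use lidar sensors."],
    feedback_log,
)
print(augmented)
```

Because the feedback lives in data rather than code, reviewers can steer retrieval and generation without redeploying the pipeline.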

Prompt engineering (G3)

Both the retriever (D) and the generator (G) rely heavily on prompt engineering to prepare the standard and augmented message that the generative AI model will have to process. Prompt engineering brings the retriever’s output and the user input together.
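A prompt template is the usual meeting point. The sketch below merges retrieved passages and the user question into one engineered prompt; the template wording is an illustrative example, not a prescribed format:

```python
# Sketch of G3: a template merging retrieved passages and the user input.
PROMPT_TEMPLATE = (
    "You are an assistant. Answer using only the context below.\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)

def build_prompt(question, passages):
    context = "\n".join(f"- {p}" for p in passages)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt("What is RAG?",
                      ["RAG augments prompts with retrieved data."])
print(prompt)
```

Instructing the model to answer "using only the context" is a common prompt-engineering device for keeping generation grounded in the retrieved data.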

Generation and output (G4)

The choice of a generative AI model depends on the goals of a project. Llama, Gemini, GPT, and other models can fit various requirements. However, the prompt must meet each model’s specifications. Frameworks such as LangChain, which we will implement in this book, help streamline the integration of various AI models into applications by providing adaptable interfaces and tools.
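Whatever model you choose, the engineered prompt is typically packaged into a chat-style request. The sketch below builds such a payload; the message format mirrors common chat-completion APIs, the model name is an illustrative placeholder, and the actual network call is deliberately omitted:

```python
# Sketch of G4: package the engineered prompt as a chat-style payload.
def build_request(prompt, model="gpt-4o"):
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer from the provided context only."},
            {"role": "user", "content": prompt},
        ],
    }

request = build_request("Context: ...\nQuestion: What is RAG?")
print(request["messages"][1]["content"])
```

Keeping payload construction separate from the call itself makes it easy to swap Llama, Gemini, or GPT behind the same interface, which is exactly the adaptability frameworks like LangChain provide.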

The evaluator (E)

We often rely on mathematical metrics to assess the performance of a generative AI model. However, these metrics only give us part of the picture. It’s important to remember that the ultimate test of an AI’s effectiveness comes down to human evaluation.

Metrics (E1)

As with any AI system, a model cannot be evaluated without mathematical metrics, such as cosine similarity. These metrics ensure that the retrieved data is relevant and accurate. By quantifying the relationships and relevance of data points, they provide a solid foundation for assessing the model’s performance and reliability.
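As a concrete example, cosine similarity over token counts can score a generated answer against a reference answer. This is a deliberately simple metric sketch; real evaluations would compare embedding vectors rather than raw word counts:

```python
from collections import Counter
import math

# Sketch of E1: cosine similarity between generated and reference answers.
def cosine_score(generated, reference):
    a = Counter(generated.lower().split())
    b = Counter(reference.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

score = cosine_score("RAG augments prompts", "RAG augments prompts")
print(round(score, 2))  # 1.0
```

A score of 1.0 means the answers share the same word distribution; unrelated answers score near 0. The next section explains why even a perfect metric score is not the final word.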

Human feedback (E2)

No generative AI system, whether RAG-driven or not, and whether the mathematical metrics seem sufficient or not, can elude human evaluation. It is ultimately human evaluation that decides if a system designed for human users will be accepted or rejected, praised or criticized.

Adaptive RAG introduces the human, real-life, pragmatic feedback factor that will improve a RAG-driven generative AI ecosystem. We will implement adaptive RAG in Chapter 5, Boosting RAG Performance with Expert Human Feedback.

The trainer (T)

A standard generative AI model is pre-trained with a vast amount of general-purpose data. Then, we can fine-tune (T2) the model with domain-specific data.

We will take this further in Chapter 9, Empowering AI Models: Fine-Tuning RAG Data and Human Feedback, by integrating static RAG data into the fine-tuning process. We will also incorporate human feedback, which provides valuable information for fine-tuning in a variant of Reinforcement Learning from Human Feedback (RLHF).

We are now ready to code entry-level naïve, advanced, and modular RAG in Python.
