You're reading from RAG-Driven Generative AI Build custom retrieval augmented generation pipelines with LlamaIndex, Deep Lake, and Pinecone

Product type Paperback

Published in Sep 2024

Publisher Packt

ISBN-13 9781836200918

Length 334 pages

Edition 1st Edition

Languages

Python

Tools

Docker

Concepts

GPT/LLMs

Author (1):

Denis Rothman

View More author details

Table of Contents (14) Chapters

Preface

1. Why Retrieval Augmented Generation? FREE CHAPTER

2. RAG Embedding Vector Stores with Deep Lake and OpenAI

3. Building Index-Based RAG with LlamaIndex, Deep Lake, and OpenAI

4. Multimodal Modular RAG for Drone Technology

5. Boosting RAG Performance with Expert Human Feedback

6. Scaling RAG Bank Customer Data with Pinecone

7. Building Scalable Knowledge-Graph-Based RAG with Wikipedia API and LlamaIndex

8. Dynamic RAG with Chroma and Hugging Face Llama

9. Empowering AI Models: Fine-Tuning RAG Data and Human Feedback

10. RAG for Video Stock Production with Pinecone and OpenAI

11. Other Books You May Enjoy

12. Index

Appendix

Why Retrieval Augmented Generation?

Even the most advanced generative AI models can only generate responses based on the data they have been trained on. They cannot provide accurate answers to questions about information outside their training data. Generative AI models simply don’t know that they don’t know! This leads to inaccurate or inappropriate outputs, sometimes called hallucinations, bias, or, simply said, nonsense.

Retrieval Augmented Generation (RAG) is a framework that addresses this limitation by combining retrieval-based approaches with generative models. It retrieves relevant data from external sources in real time and uses this data to generate more accurate and contextually relevant responses. Generative AI models integrated with RAG retrievers are revolutionizing the field with their unprecedented efficiency and power. One of the key strengths of RAG is its adaptability. It can be seamlessly applied to any type of data, be it text, images, or audio. This versatility makes RAG ecosystems a reliable and efficient tool for enhancing generative AI capabilities.

A project manager, however, already encounters a wide range of generative AI platforms, frameworks, and models such as Hugging Face, Google Vertex AI, OpenAI, LangChain, and more. An additional layer of emerging RAG frameworks and platforms will only add complexity with Pinecone, Chroma, Activeloop, LlamaIndex, and so on. All these Generative AI and RAG frameworks often overlap, creating an incredible number of possible configurations. Finding the right configuration of models and RAG resources for a specific project, therefore, can be challenging for a project manager. There is no silver bullet. The challenge is tremendous, but the rewards, when achieved, are immense!

We will begin this chapter by defining the RAG framework at a high level. Then, we will define the three main RAG configurations: naïve RAG, advanced RAG, and modular RAG. We will also compare RAG and fine-tuning and determine when to use these approaches. RAG can only exist within an ecosystem, and we will design and describe one in this chapter. Data needs to come from somewhere and be processed. Retrieval requires an organized environment to retrieve data, and generative AI models have input constraints.

Finally, we will dive into the practical aspect of this chapter. We will build a Python program from scratch to run entry-level naïve RAG with keyword search and matching. We will also code an advanced RAG system with vector search and index-based retrieval. Finally, we will build a modular RAG that takes both naïve and advanced RAG into account. By the end of this chapter, you will acquire a theoretical understanding of the RAG framework and practical experience in building a RAG-driven generative AI program. This hands-on approach will deepen your understanding and equip you for the following chapters.

In a nutshell, this chapter covers the following topics: