Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
RAG-Driven Generative AI
RAG-Driven Generative AI

RAG-Driven Generative AI: Build custom retrieval augmented generation pipelines with LlamaIndex, Deep Lake, and Pinecone

Arrow left icon
Profile Icon Denis Rothman
Arrow right icon
$19.99 per month
Full star icon Full star icon Full star icon Full star icon Half star icon 4.6 (14 Ratings)
Paperback Sep 2024 334 pages 1st Edition
eBook
$24.99 $35.99
Paperback
$43.99
Subscription
Free Trial
Renews at $19.99p/m
Arrow left icon
Profile Icon Denis Rothman
Arrow right icon
$19.99 per month
Full star icon Full star icon Full star icon Full star icon Half star icon 4.6 (14 Ratings)
Paperback Sep 2024 334 pages 1st Edition
eBook
$24.99 $35.99
Paperback
$43.99
Subscription
Free Trial
Renews at $19.99p/m
eBook
$24.99 $35.99
Paperback
$43.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing
Table of content icon View table of contents Preview book icon Preview Book

RAG-Driven Generative AI

Why Retrieval Augmented Generation?

Even the most advanced generative AI models can only generate responses based on the data they have been trained on. They cannot provide accurate answers to questions about information outside their training data. Generative AI models simply don’t know that they don’t know! This leads to inaccurate or inappropriate outputs, sometimes called hallucinations, bias, or, simply said, nonsense.

Retrieval Augmented Generation (RAG) is a framework that addresses this limitation by combining retrieval-based approaches with generative models. It retrieves relevant data from external sources in real time and uses this data to generate more accurate and contextually relevant responses. Generative AI models integrated with RAG retrievers are revolutionizing the field with their unprecedented efficiency and power. One of the key strengths of RAG is its adaptability. It can be seamlessly applied to any type of data, be it text, images, or audio. This versatility makes RAG ecosystems a reliable and efficient tool for enhancing generative AI capabilities.

A project manager, however, already encounters a wide range of generative AI platforms, frameworks, and models such as Hugging Face, Google Vertex AI, OpenAI, LangChain, and more. An additional layer of emerging RAG frameworks and platforms will only add complexity with Pinecone, Chroma, Activeloop, LlamaIndex, and so on. All these Generative AI and RAG frameworks often overlap, creating an incredible number of possible configurations. Finding the right configuration of models and RAG resources for a specific project, therefore, can be challenging for a project manager. There is no silver bullet. The challenge is tremendous, but the rewards, when achieved, are immense!

We will begin this chapter by defining the RAG framework at a high level. Then, we will define the three main RAG configurations: naïve RAG, advanced RAG, and modular RAG. We will also compare RAG and fine-tuning and determine when to use these approaches. RAG can only exist within an ecosystem, and we will design and describe one in this chapter. Data needs to come from somewhere and be processed. Retrieval requires an organized environment to retrieve data, and generative AI models have input constraints.

Finally, we will dive into the practical aspect of this chapter. We will build a Python program from scratch to run entry-level naïve RAG with keyword search and matching. We will also code an advanced RAG system with vector search and index-based retrieval. Finally, we will build a modular RAG that takes both naïve and advanced RAG into account. By the end of this chapter, you will acquire a theoretical understanding of the RAG framework and practical experience in building a RAG-driven generative AI program. This hands-on approach will deepen your understanding and equip you for the following chapters.

In a nutshell, this chapter covers the following topics:

  • Defining the RAG framework
  • The RAG ecosystem
  • Naïve keyword search and match RAG in Python
  • Advanced RAG with vector-search and index-based RAG in Python
  • Building a modular RAG program

Let’s begin by defining RAG.

What is RAG?

When a generative AI model doesn’t know how to answer accurately, some say it is hallucinating or producing bias. Simply said, it just produces nonsense. However, it all boils down to the impossibility of providing an adequate response when the model’s training didn’t include the information requested beyond the classical model configuration issues. This confusion often leads to random sequences of the most probable outputs, not the most accurate ones.

RAG begins where generative AI ends by providing the information an LLM model lacks to answer accurately. RAG was designed (Lewis et al., 2020) for LLMs. The RAG framework will perform optimized information retrieval tasks, and the generation ecosystem will add this information to the input (user query or automated prompt) to produce improved output. The RAG framework can be summed up at a high level in the following figure:

A diagram of a library

Description automatically generated

Figure 1.1: The two main components of RAG-driven generative AI

Think of yourself as a student in a library. You have an essay to write on RAG. Like ChatGPT, for example, or any other AI copilot, you have learned how to read and write. As with any Large Language Model (LLM), you are sufficiently trained to read advanced information, summarize it, and write content. However, like any superhuman AI you will find from Hugging Face, Vertex AI, or OpenAI, there are many things you don’t know.

In the retrieval phase, you search the library for books on the topic you need (the left side of Figure 1.1). Then, you go back to your seat, perform a retrieval task by yourself or a co-student, and extract the information you need from those books. In the generation phase (the right side of Figure 1.1), you begin to write your essay. You are a RAG-driven generative human agent, much like a RAG-driven generative AI framework.

As you continue to write your essay on RAG, you stumble across some tough topics. You don’t have the time to go through all the information available physically! You, as a generative human agent, are stuck, just as a generative AI model would be. You may try to write something, just as a generative AI model does when its output makes little sense. But you, like the generative AI agent, will not realize whether the content is accurate or not until somebody corrects your essay and you get a grade that will rank your essay.

At this point, you have reached your limit and decide to turn to a RAG generative AI copilot to ensure you get the correct answers. However, you are puzzled by the number of LLM models and RAG configurations available. You need first to understand the resources available and how RAG is organized. Let’s go through the main RAG configurations.

Naïve, advanced, and modular RAG configurations

A RAG framework necessarily contains two main components: a retriever and a generator. The generator can be any LLM or foundation multimodal AI platform or model, such as GPT-4o, Gemini, Llama, or one of the hundreds of variations of the initial architectures. The retriever can be any of the emerging frameworks, methods, and tools such as Activeloop, Pinecone, LlamaIndex, LangChain, Chroma, and many more.

The issue now is to decide which of the three types of RAG frameworks (Gao et al., 2024) will fit the needs of a project. We will illustrate these three approaches in code in the Naïve, advanced, and modular RAG in code section of this chapter:

  • Naïve RAG: This type of RAG framework doesn’t involve complex data embedding and indexing. It can be efficient to access reasonable amounts of data through keywords, for example, to augment a user’s input and obtain a satisfactory response.
  • Advanced RAG: This type of RAG involves more complex scenarios, such as with vector search and indexed-base retrieval applied. Advanced RAG can be implemented with a wide range of methods. It can process multiple data types, as well as multimodal data, which can be structured or unstructured.
  • Modular RAG: Modular RAG broadens the horizon to include any scenario that involves naïve RAG, advanced RAG, machine learning, and any algorithm needed to complete a complex project.

However, before going further, we need to decide if we should implement RAG or fine-tune a model.

RAG versus fine-tuning

RAG is not always an alternative to fine-tuning, and fine-tuning cannot always replace RAG. If we accumulate too much data in RAG datasets, the system may become too cumbersome to manage. On the other hand, we cannot fine-tune a model with dynamic, ever-changing data such as daily weather forecasts, stock market values, corporate news, and all forms of daily events.

The decision of whether to implement RAG or fine-tune a model relies on the proportion of parametric versus non-parametric information. The fundamental difference between a model trained from scratch or fine-tuned and RAG can be summed up in terms of parametric and non-parametric knowledge:

  • Parametric: In a RAG-driven generative AI ecosystem, the parametric part refers to the generative AI model’s parameters (weights) learned through training data. This means the model’s knowledge is stored in these learned weights and biases. The original training data is transformed into a mathematical form, which we call a parametric representation. Essentially, the model “remembers” what it learned from the data, but the data itself is not stored explicitly.
  • Non-Parametric: In contrast, the non-parametric part of a RAG ecosystem involves storing explicit data that can be accessed directly. This means that the data remains available and can be queried whenever needed. Unlike parametric models, where knowledge is embedded indirectly in the weights, non-parametric data in RAG allows us to see and use the actual data for each output.

The difference between RAG and fine-tuning relies on the amount of static (parametric) and dynamic (non-parametric) ever-evolving data the generative AI model must process. A system that relies too heavily on RAG might become overloaded and cumbersome to manage. A system that relies too much on fine-tuning a generative model will display its inability to adapt to daily information updates.

There is a decision-making threshold illustrated in Figure 1.2 that shows that a RAG-driven generative AI project manager will have to evaluate the potential of the ecosystem’s trained parametric generative AI model before implementing a non-parametric (explicit data) RAG framework. The potential of the RAG component requires careful evaluation as well.

A diagram of a temperature measurement

Description automatically generated

Figure 1.2: The decision-making threshold between enhancing RAG or fine-tuning an LLM

In the end, the balance between enhancing the retriever and the generator in a RAG-driven generative AI ecosystem depends on a project’s specific requirements and goals. RAG and fine-tuning are not mutually exclusive.

RAG can be used to improve a model’s overall efficiency, together with fine-tuning, which serves as a method to enhance the performance of both the retrieval and generation components within the RAG framework. We will fine-tune a proportion of the retrieval data in Chapter 9, Empowering AI Models: Fine-Tuning RAG Data and Human Feedback.

We will now see how a RAG-driven generative AI involves an ecosystem with many components.

The RAG ecosystem

RAG-driven generative AI is a framework that can be implemented in many configurations. RAG’s framework runs within a broad ecosystem, as shown in Figure 1.3. However, no matter how many retrieval and generation frameworks you encounter, it all boils down to the following four domains and questions that go with them:

  • Data: Where is the data coming from? Is it reliable? Is it sufficient? Are there copyright, privacy, and security issues?
  • Storage: How is the data going to be stored before or after processing it? What amount of data will be stored?
  • Retrieval: How will the correct data be retrieved to augment the user’s input before it is sufficient for the generative model? What type of RAG framework will be successful for a project?
  • Generation: Which generative AI model will fit into the type of RAG framework chosen?

The data, storage, and generation domains depend heavily on the type of RAG framework you choose. Before making that choice, we need to evaluate the proportion of parametric and non-parametric knowledge in the ecosystem we are implementing. Figure 1.3 represents the RAG framework, which includes the main components regardless of the types of RAG implemented:

A diagram of a process

Description automatically generated

Figure 1.3: The Generative RAG-ecosystem

  • The Retriever (D) handles data collection, processing, storage, and retrieval
  • The Generator (G) handles input augmentation, prompt engineering, and generation
  • The Evaluator (E) handles mathematical metrics, human evaluation, and feedback
  • The Trainer (T) handles the initial pre-trained model and fine-tuning the model

Each of these four components relies on their respective ecosystems, which form the overall RAG-driven generative AI pipeline. We will refer to the domains D, G, E, and T in the following sections. Let’s begin with the retriever.

The retriever (D)

The retriever component of a RAG ecosystem collects, processes, stores, and retrieves data. The starting point of a RAG ecosystem is thus an ingestion data process, of which the first step is to collect data.

Collect (D1)

In today’s world, AI data is as diverse as our media playlists. It can be anything from a chunk of text in a blog post to a meme or even the latest hit song streamed through headphones. And it doesn’t stop there—the files themselves come in all shapes and sizes. Think of PDFs filled with all kinds of details, web pages, plain text files that get straight to the point, neatly organized JSON files, catchy MP3 tunes, videos in MP4 format, or images in PNG and JPG.

Furthermore, a large proportion of this data is unstructured and found in unpredictable and complex ways. Fortunately, many platforms, such as Pinecone, OpenAI, Chroma, and Activeloop, provide ready-to-use tools to process and store this jungle of data.

Process (D2)

In the data collection phase (D1) of multimodal data processing, various types of data, such as text, images, and videos, can be extracted from websites using web scraping techniques or any other source of information. These data objects are then transformed to create uniform feature representations. For example, data can be chunked (broken into smaller parts), embedded (transformed into vectors), and indexed to enhance searchability and retrieval efficiency.

We will introduce these techniques, starting with the Building Hybrid Adaptive RAG in Python section of this chapter. In the following chapters, we will continue building more complex data processing functions.

Storage (D3)

At this stage of the pipeline, we have collected and begun processing a large amount of diverse data from the internet—videos, pictures, texts, you name it. Now, what can we do with all that data to make it useful?

That’s where vector stores like Deep Lake, Pinecone, and Chroma come into play. Think of these as super smart libraries that don’t just store your data but convert it into mathematical entities as vectors, enabling powerful computations. They can also apply a variety of indexing methods and other techniques for rapid access.

Instead of keeping the data in static spreadsheets and files, we turn it into a dynamic, searchable system that can power anything from chatbots to search engines.

Retrieval query (D4)

The retrieval process is triggered by the user input or automated input (G1).

To retrieve data quickly, we load it into vector stores and datasets after transforming it into a suitable format. Then, using a combination of keyword searches, smart embeddings, and indexing, we can retrieve the data efficiently. Cosine similarity, for example, finds items that are closely related, ensuring that the search results are not just fast but also highly relevant.

Once the data is retrieved, we then augment the input.

The generator (G)

The lines are blurred in the RAG ecosystem between input and retrieval, as shown in Figure 1.3, representing the RAG framework and ecosystem. The user input (G1), automated or human, interacts with the retrieval query (D4) to augment the input before sending it to the generative model.

The generative flow begins with an input.

Input (G1)

The input can be a batch of automated tasks (processing emails, for example) or human prompts through a User Interface (UI). This flexibility allows you to seamlessly integrate AI into various professional environments, enhancing productivity across industries.

Augmented input with HF (G2)

Human feedback (HF) can be added to the input, as described in the Human feedback (E2) under Evaluator (E) section. Human feedback will make a RAG ecosystem considerably adaptable and provide full control over data retrieval and generative AI inputs. In the Building hybrid adaptive RAG in Python section of this chapter, we will build augmented input with human feedback.

Prompt engineering (G3)

Both the retriever (D) and the generator (G) rely heavily on prompt engineering to prepare the standard and augmented message that the generative AI model will have to process. Prompt engineering brings the retriever’s output and the user input together.

Generation and output (G4)

The choice of a generative AI model depends on the goals of a project. Llama, Gemini, GPT, and other models can fit various requirements. However, the prompt must meet each model’s specifications. Frameworks such as LangChain, which we will implement in this book, help streamline the integration of various AI models into applications by providing adaptable interfaces and tools.

The evaluator (E)

We often rely on mathematical metrics to assess the performance of a generative AI model. However, these metrics only give us part of the picture. It’s important to remember that the ultimate test of an AI’s effectiveness comes down to human evaluation.

Metrics (E1)

A model cannot be evaluated without mathematical metrics, such as cosine similarity, as with any AI system. These metrics ensure that the retrieved data is relevant and accurate. By quantifying the relationships and relevance of data points, they provide a solid foundation for assessing the model’s performance and reliability.

Human feedback (E2)

No generative AI system, whether RAG-driven or not, and whether the mathematical metrics seem sufficient or not, can elude human evaluation. It is ultimately human evaluation that decides if a system designed for human users will be accepted or rejected, praised or criticized.

Adaptive RAG introduces the human, real-life, pragmatic feedback factor that will improve a RAG-driven generative AI ecosystem. We will implement adaptive RAG in Chapter 5, Boosting RAG Performance with Expert Human Feedback.

The trainer (T)

A standard generative AI model is pre-trained with a vast amount of general-purpose data. Then, we can fine-tune (T2) the model with domain-specific data.

We will take this further by integrating static RAG data into the fine-tuning process in Chapter 9, Empowering AI Models: Fine-Tuning RAG Data and Human Feedback. We will also integrate human feedback, which provides valuable information that can be integrated into the fine-tuning process in a variant of Reinforcement Learning from Human Feedback (RLHF).

We are now ready to code entry-level naïve, advanced, and modular RAG in Python.

Naïve, advanced, and modular RAG in code

This section introduces naïve, advanced, and modular RAG through basic educational examples. The program builds keyword matching, vector search, and index-based retrieval methods. Using OpenAI’s GPT models, it generates responses based on input queries and retrieved documents.

The goal of the notebook is for a conversational agent to answer questions on RAG in general. We will build the retriever from the bottom up, from scratch, in Python and run the generator with OpenAI GPT-4o in eight sections of code divided into two parts:

Part 1: Foundations and Basic Implementation

  1. Environment setup for OpenAI API integration
  2. Generator function using GPT-4o
  3. Data setup with a list of documents (db_records)
  4. Query for user input

Part 2: Advanced Techniques and Evaluation

  1. Retrieval metrics to measure retrieval responses
  2. Naïve RAG with a keyword search and matching function
  3. Advanced RAG with vector search and index-based search
  4. Modular RAG implementing flexible retrieval methods

To get started, open RAG_Overview.ipynb in the GitHub repository. We will begin by establishing the foundations of the notebook and exploring the basic implementation.

Part 1: Foundations and basic implementation

In this section, we will set up the environment, create a function for the generator, define a function to print a formatted response, and define the user query.

The first step is to install the environment.

The section titles of the following implementation of the notebook follow the structure in the code. Thus, you can follow the code in the notebook or read this self-contained section.

1. Environment

The main package to install is OpenAI to access GPT-4o through an API:

!pip install openai==1.40.3

Make sure to freeze the OpenAI version you install. In RAG framework ecosystems, we will have to install several packages to run advanced RAG configurations. Once we have stabilized an installation, we will freeze the version of the packages installed to minimize potential conflicts between the libraries and modules we implement.

Once you have installed openai, you will have to create an account on OpenAI (if you don’t have one) and obtain an API key. Make sure to check the costs and payment plans before running the API.

Once you have a key, store it in a safe place and retrieve it as follows from Google Drive, for example, as shown in the following code:

#API Key
#Store you key in a file and read it(you can type it directly in the notebook but it will be visible for somebody next to you)
from google.colab import drive
drive.mount('/content/drive')

You can use Google Drive or any other method you choose to store your key. You can read the key from a file, or you can also choose to enter the key directly in the code:

f = open("drive/MyDrive/files/api_key.txt", "r")
API_KEY=f.readline().strip()
f.close()
 
#The OpenAI Key
import os
import openai
os.environ['OPENAI_API_KEY'] =API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")

With that, we have set up the main resources for our project. We will now write a generation function for the OpenAI model.

2. The generator

The code imports openai to generate content and time to measure the time the requests take:

import openai
from openai import OpenAI
import time
client = OpenAI()
gptmodel="gpt-4o"
start_time = time.time()  # Start timing before the request

We now create a function that creates a prompt with an instruction and the user input:

def call_llm_with_full_text(itext):
    # Join all lines to form a single string
    text_input = '\n'.join(itext)
    prompt = f"Please elaborate on the following content:\n{text_input}"

The function will try to call gpt-4o, adding additional information for the model:

    try:
      response = client.chat.completions.create(
         model=gptmodel,
         messages=[
            {"role": "system", "content": "You are an expert Natural Language Processing exercise expert."},
            {"role": "assistant", "content": "1.You can explain read the input and answer in detail"},
            {"role": "user", "content": prompt}
         ],
         temperature=0.1  # Add the temperature parameter here and other parameters you need
        )
      return response.choices[0].message.content.strip()
    except Exception as e:
        return str(e)

Note that the instruction messages remain general in this scenario so that the model remains flexible. The temperature is low (more precise) and set to 0.1. If you wish for the system to be more creative, you can set temperature to a higher value, such as 0.7. However, in this case, it is recommended to ask for precise responses.

We can add textwrap to format the response as a nice paragraph when we call the generative AI model:

import textwrap
def print_formatted_response(response):
    # Define the width for wrapping the text
    wrapper = textwrap.TextWrapper(width=80)  # Set to 80 columns wide, but adjust as needed
    wrapped_text = wrapper.fill(text=response)
    # Print the formatted response with a header and footer
    print("Response:")
    print("---------------")
    print(wrapped_text)
    print("---------------\n")

The generator is now ready to be called when we need it. Due to the probabilistic nature of generative AI models, it might produce different outputs each time we call it.

The program now implements the data retrieval functionality.

3. The Data

Data collection includes text, images, audio, and video. In this notebook, we will focus on data retrieval through naïve, advanced, and modular configurations, not data collection. We will collect and embed data later in Chapter 2, RAG Embedding Vector Stores with Deep Lake and OpenAI. As such, we will assume that the data we need has been processed and thus collected, cleaned, and split into sentences. We will also assume that the process included loading the sentences into a Python list named db_records.

This approach illustrates three aspects of the RAG ecosystem we described in The RAG ecosystem section and the components of the system described in Figure 1.3:

  • The retriever (D) has three data processing components, collect (D1), process (D2), and storage (D3), which are preparatory phases of the retriever.
  • The retriever query (D4) is thus independent of the first three phases (collect, process, and storage) of the retriever.
  • The data processing phase will often be done independently and prior to activating the retriever query, as we will implement starting in Chapter 2.

This program assumes that data processing has been completed and the dataset is ready:

db_records = [
    "Retrieval Augmented Generation (RAG) represents a sophisticated hybrid approach in the field of artificial intelligence, particularly within the realm of natural language processing (NLP).",
…/…

We can display a formatted version of the dataset:

import textwrap
paragraph = ' '.join(db_records)
wrapped_text = textwrap.fill(paragraph, width=80)
print(wrapped_text)

The output joins the sentences in db_records for display, as printed in this excerpt, but db_records remains unchanged:

Retrieval Augmented Generation (RAG) represents a sophisticated hybrid approach in the field of artificial intelligence, particularly within the realm of natural language processing (NLP)…

The program is now ready to process a query.

4.The query

The retriever (D4 in Figure 1.3) query process depends on how the data was processed, but the query itself is simply user input or automated input from another AI agent. We all dream of users who introduce the best input into software systems, but unfortunately, in real life, unexpected inputs lead to unpredictable behaviors. We must, therefore, build systems that take imprecise inputs into account.

In this section, we will imagine a situation in which hundreds of users in an organization have heard the word “RAG” associated with “LLM” and “vector stores.” Many of them would like to understand what these terms mean to keep up with a software team that’s deploying a conversational agent in their department. After a couple of days, the terms they heard become fuzzy in their memory, so they ask the conversational agent, GPT-4o in this case, to explain what they remember with the following query:

query = "define a rag store"

In this case, we will simply store the main query of the topic of this program in query, which represents the junction between the retriever and the generator. It will trigger a configuration of RAG (naïve, advanced, and modular). The choice of configuration will depend on the goals of each project.

The program takes the query and sends it to a GPT-4o model to be processed and then displays the formatted output:

# Call the function and print the result
llm_response = call_llm_with_full_text(query)
print_formatted_response(llm_response)

The output is revealing. Even the most powerful generative AI models cannot guess what a user, who knows nothing about AI, is trying to find out in good faith. In this case, GPT-4o will answer as shown in this excerpt of the output:

Response:
---------------
Certainly! The content you've provided appears to be a sequence of characters
that, when combined, form the phrase "define a rag store." Let's break it down
step by step:…
… This is an indefinite article used before words that begin with a consonant sound.    - **rag**: This is a noun that typically refers to a pieceof old, often torn, cloth.    - **store**: This is a noun that refers to a place where goods are sold.  4. **Contextual Meaning**:    - **"Define a rag store"**: This phrase is asking for an explanation or definition of what a "rag store" is. 5. **Possible Definition**:    - A "rag store" could be a shop or retail establishment that specializes in selling rags,…

The output will seem like a hallucination, but is it really? The user wrote the query with the good intentions of every beginner trying to learn a new topic. GPT-4o, in good faith, did what it could with the limited context it had with its probabilistic algorithm, which might even produce a different response each time we run it. However, GPT-4o is being wary of the query. It wasn’t very clear, so it ends the response with the following output that asks the user for more context:

…Would you like more information or a different type of elaboration on this content?…

The user is puzzled, not knowing what to do, and GPT-4o is awaiting further instructions. The software team has to do something!

Generative AI is based on probabilistic algorithms. As such, the response provided might vary from one run to another, providing similar (but not identical) responses.

That is when RAG comes in to save the situation. We will leave this query as it is for the whole notebook and see if a RAG-driven GPT-4o system can do better.

Part 2: Advanced techniques and evaluation

In Part 2, we will introduce naïve, advanced, and modular RAG. The goal is to introduce the three methods, not to process complex documents, which we will implement throughout the following chapters of this book.

Let’s first begin by defining retrieval metrics to measure the accuracy of the documents we retrieve.

1. Retrieval metrics

This section explores retrieval metrics, first focusing on the role of cosine similarity in assessing the relevance of text documents. Then we will implement enhanced similarity metrics by incorporating synonym expansion and text preprocessing to improve the accuracy of similarity calculations between texts.

We will explore more metrics in the Metrics calculation and display section in Chapter 7, Building Scalable Knowledge-Graph-Based RAG with Wikipedia API and LlamaIndex.

In this chapter, let’s begin with cosine similarity.

Cosine similarity

Cosine similarity measures the cosine of the angle between two vectors. In our case, the two vectors are the user query and each document in a corpus.

The program first imports the class and function we need:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

TfidfVectorizer imports the class that converts text documents into a matrix of TF-IDF features. Term Frequency-Inverse Document Frequency (TF-IDF) quantifies the relevance of a word to a document in a collection, distinguishing common words from those significant to specific texts. TF-IDF will thus quantify word relevance in documents using frequency across the document and inverse frequency across the corpus. cosine_similarity imports the function we will use to calculate the similarity between vectors.

calculate_cosine_similarity(text1, text2) then calculates the cosine similarity between the query (text1) and each record of the dataset.

The function converts the query text (text1) and each record (text2) in the dataset into a vector with a vectorizer. Then, it calculates and returns the cosine similarity between the two vectors:

def calculate_cosine_similarity(text1, text2):
    vectorizer = TfidfVectorizer(
        stop_words='english',
        use_idf=True,
        norm='l2',
        ngram_range=(1, 2),  # Use unigrams and bigrams
        sublinear_tf=True,   # Apply sublinear TF scaling
        analyzer='word'      # You could also experiment with 'char' or 'char_wb' for character-level features
    )
    tfidf = vectorizer.fit_transform([text1, text2])
    similarity = cosine_similarity(tfidf[0:1], tfidf[1:2])
    return similarity[0][0]

The key parameters of this function are:

  • stop_words='english: Ignores common English words to focus on meaningful content
  • use_idf=True: Enables inverse document frequency weighting
  • norm='l2': Applies L2 normalization to each output vector
  • ngram_range=(1, 2): Considers both single words and two-word combinations
  • sublinear_tf=True: Applies logarithmic term frequency scaling
  • analyzer='word': Analyzes text at the word level

Cosine similarity can be limited in some cases. Cosine similarity has limitations when dealing with ambiguous queries because it strictly measures the similarity based on the angle between vector representations of text. If a user asks a vague question like “What is rag?” in the program of this chapter and the database primarily contains information on “RAG” as in “retrieval-augmented generation” for AI, not “rag cloths,” the cosine similarity score might be low. This low score occurs because the mathematical model lacks contextual understanding to differentiate between the different meanings of “rag.” It only computes similarity based on the presence and frequency of similar words in the text, without grasping the user’s intent or the broader context of the query. Thus, even if the answers provided are technically accurate within the available dataset, the cosine similarity may not reflect the relevance accurately if the query’s context isn’t well-represented in the data.

In this case, we can try enhanced similarity.

Enhanced similarity

Enhanced similarity introduces calculations that leverage natural language processing tools to better capture semantic relationships between words. Using libraries like spaCy and NLTK, it preprocesses texts to reduce noise, expands terms with synonyms from WordNet, and computes similarity based on the semantic richness of the expanded vocabulary. This method aims to improve the accuracy of similarity assessments between two texts by considering a broader context than typical direct comparison methods.

The code contains four main functions:

  • get_synonyms(word): Retrieves synonyms for a given word from WordNet
  • preprocess_text(text): Converts all text to lowercase, lemmatizes gets the (roots of words), and filters stopwords (common words) and punctuation from text
  • expand_with_synonyms(words): Enhances a list of words by adding their synonyms
  • calculate_enhanced_similarity(text1, text2): Computes cosine similarity between preprocessed and synonym-expanded text vectors

The calculate_enhanced_similarity(text1, text2) function takes two texts and ultimately returns the cosine similarity score between two processed and synonym-expanded texts. This score quantifies the textual similarity based on their semantic content and enhanced word sets.

The code begins by downloading and importing the necessary libraries and then runs the four functions beginning with calculate_enhanced_similarity(text1, text2):

import spacy
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet
from collections import Counter
import numpy as np
# Load spaCy model
nlp = spacy.load("en_core_web_sm")
…

Enhanced similarity takes this a bit further in terms of metrics. However, integrating RAG with generative AI presents multiple challenges.

No matter which metric we implement, we will face the following limitations:

  • Input versus Document Length: User queries are often short, while retrieved documents are longer and richer, complicating direct similarity evaluations.
  • Creative Retrieval: Systems may creatively select longer documents that meet user expectations but yield poor metric scores due to unexpected content alignment.
  • Need for Human Feedback: Often, human judgment is crucial to accurately assess the relevance and effectiveness of retrieved content, as automated metrics may not fully capture user satisfaction. We will explore this critical aspect of RAG in Chapter 5, Boosting RAG Performance with Expert Human Feedback.

We will always have to find the right balance between mathematical metrics and human feedback.

We are now ready to create an example with naïve RAG.

2. Naïve RAG

Naïve RAG with keyword search and matching can prove efficient with well-defined documents within an organization, such as legal and medical documents. These documents generally have clear titles or labels for images, for example. In this naïve RAG function, we will implement keyword search and matching. To achieve this, we will apply a straightforward retrieval method in the code:

  1. Split the query into individual keywords
  2. Split each record in the dataset into keywords
  3. Determine the length of the common matches
  4. Choose the record with the best score

The generation method will:

  • Augment the user input with the result of the retrieval query
  • Request the generation model, which is gpt-4o in this case
  • Display the response

Let’s write the keyword search and matching function.

Keyword search and matching

The best matching function first initializes the best scores:

def find_best_match_keyword_search(query, db_records):
    best_score = 0
    best_record = None

The query is then split into keywords. Each record is also split into words to find the common words, measure the length of common content, and find the best match:

# Split the query into individual keywords
    query_keywords = set(query.lower().split())
    # Iterate through each record in db_records
    for record in db_records:
        # Split the record into keywords
        record_keywords = set(record.lower().split())
        # Calculate the number of common keywords
        common_keywords = query_keywords.intersection(record_keywords)
        current_score = len(common_keywords)
        # Update the best score and record if the current score is higher
        if current_score > best_score:
            best_score = current_score
            best_record = record
    return best_score, best_record

We now call the function, format the response, and print it:

# Assuming 'query' and 'db_records' are defined in previous cells in your Colab notebook
best_keyword_score, best_matching_record = find_best_match_keyword_search(query, db_records)
print(f"Best Keyword Score: {best_keyword_score}")
#print(f"Best Matching Record: {best_matching_record}")
print_formatted_response(best_matching_record)

The main query of this notebook will be query = "define a rag store" to see if each RAG method produces an acceptable output.

The keyword search finds the best record in the list of sentences in the dataset:

Best Keyword Score: 3
Response:
---------------
A RAG vector store is a database or dataset that contains vectorized data points.
---------------

Let’s run the metrics.

Metrics

We created the similarity metrics in the 1. Retrieval metrics section of this chapter. We will first apply cosine similarity:

# Cosine Similarity
score = calculate_cosine_similarity(query, best_matching_record)
print(f"Best Cosine Similarity Score: {score:.3f}")

The output similarity is low, as explained in the 1. Retrieval metrics section of this chapter. The user input is short and the response is longer and complete:

Best Cosine Similarity Score: 0.126

Enhanced similarity will produce a better score:

# Enhanced Similarity
response = best_matching_record
print(query,": ", response)
similarity_score = calculate_enhanced_similarity(query, response)
print(f"Enhanced Similarity:, {similarity_score:.3f}")

The score produced is higher with enhanced functionality:

define a rag store :  A RAG vector store is a database or dataset that contains vectorized data points.
Enhanced Similarity:, 0.642

The output of the query will now augment the user input.

Augmented input

The augmented input is the concatenation of the user input and the best matching record of the dataset detected with the keyword search:

augmented_input=query+ ": "+ best_matching_record

The augmented input is displayed if necessary for maintenance reasons:

print_formatted_response(augmented_input)

The output then shows that the augmented input is ready:

Response:
---------------
define a rag store: A RAG vector store is a database or dataset that contains
vectorized data points.
---------------

The input is now ready for the generation process.

Generation

We are now ready to call GPT-4o and display the formatted response:

llm_response = call_llm_with_full_text(augmented_input)
print_formatted_response(llm_response)

The following excerpt of the response shows that GPT-4o understands the input and provides an interesting, pertinent response:

Response:
---------------
Certainly! Let's break down and elaborate on the provided content:  ### Define a
RAG Store:  A **RAG (Retrieval-Augmented Generation) vector store** is a
specialized type of database or dataset that is designed to store and manage
vectorized data points…

Naïve RAG can be sufficient in many situations. However, if the volume of documents becomes too large or the content becomes more complex, then advanced RAG configurations will provide better results. Let’s now explore advanced RAG.

3. Advanced RAG

As datasets grow larger, keyword search methods might prove too long to run. For instance, if we have hundreds of documents and each document contains hundreds of sentences, it will become challenging to use keyword search only. Using an index will reduce the computational load to just a fraction of the total data.

In this section, we will go beyond searching text with keywords. We will see how RAG transforms text data into numerical representations, enhancing search efficiency and processing speed. Unlike traditional methods that directly parse text, RAG first converts documents and user queries into vectors, numerical forms that speed up calculations. In simple terms, a vector is a list of numbers representing various features of text. Simple vectors might count word occurrences (term frequency), while more complex vectors, known as embeddings, capture deeper linguistic patterns.

In this section, we will implement vector search and index-based search:

  • Vector Search: We will convert each sentence in our dataset into a numerical vector. By calculating the cosine similarity between the query vector (the user query) and these document vectors, we can quickly find the most relevant documents.
  • Index-Based Search: In this case, all sentences are converted into vectors using TF-IDF (Term Frequency-Inverse Document Frequency), a statistical measure used to evaluate how important a word is to a document in a collection. These vectors act as indices in a matrix, allowing quick similarity comparisons without parsing each document fully.

Let’s start with vector search and see these concepts in action.

3.1.Vector search

Vector search converts the user query and the documents into numerical values as vectors, enabling mathematical calculations that retrieve relevant data faster when dealing with large volumes of data.

The program runs through each record of the dataset to find the best matching document by computing the cosine similarity of the query vector and each record in the dataset:

def find_best_match(text_input, records):
    best_score = 0
    best_record = None
    for record in records:
        current_score = calculate_cosine_similarity(text_input, record)
        if current_score > best_score:
            best_score = current_score
            best_record = record
    return best_score, best_record

The code then calls the vector search function and displays the best record found:

best_similarity_score, best_matching_record = find_best_match(query, db_records)
print_formatted_response(best_matching_record)

The output is satisfactory:

Response:
---------------
A RAG vector store is a database or dataset that contains vectorized data
points.

The response is the best one found, like with naïve RAG. This shows that there is no silver bullet. Each RAG technique has its merits. The metrics will confirm this observation.

Metrics

The metrics are the same for both similarity methods as for naïve RAG because the same document was retrieved:

print(f"Best Cosine Similarity Score: {best_similarity_score:.3f}")

The output is:

Best Cosine Similarity Score: 0.126

And with enhanced similarity, we obtain the same output as for naïve RAG:

# Enhanced Similarity
response = best_matching_record
print(query,": ", response)
similarity_score = calculate_enhanced_similarity(query, best_matching_record)
print(f"Enhanced Similarity:, {similarity_score:.3f}")

The output confirms the trend:

define a rag store :  A RAG vector store is a database or dataset that contains vectorized data points.
Enhanced Similarity:, 0.642

So why use vector search if it produces the same outputs as naïve RAG? Well, in a small dataset, everything looks easy. But when we’re dealing with datasets of millions of complex documents, keyword search will not capture subtleties that vectors can. Let’s now augment the user query with this information retrieved.

Augmented input

We add the information retrieved to the user query with no other aid and display the result:

# Call the function and print the result
augmented_input=query+": "+best_matching_record
print_formatted_response(augmented_input)

We only added a space between the user query and the retrieved information; nothing else. The output is satisfactory:

Response:
---------------
define a rag store: A RAG vector store is a database or dataset that contains
vectorized data points.
---------------

Let’s now see how the generative AI model reacts to this augmented input.

Generation

We now call GPT-4o with the augmented input and display the formatted output:

# Call the function and print the result
augmented_input=query+best_matching_record
llm_response = call_llm_with_full_text(augmented_input)
print_formatted_response(llm_response)

The response makes sense, as shown in the following excerpt:

Response:
---------------
Certainly! Let's break down and elaborate on the provided content:  ### Define a RAG Store:  A **RAG (Retrieval-Augmented Generation) vector store** is a specialized type of database or dataset that is designed to store and manage vectorized data points…

While vector search significantly speeds up the process of finding relevant documents by sequentially going through each record, its efficiency can decrease as the dataset size increases. To address this scalability issue, indexed search offers a more advanced solution. Let’s now see how index-based search can accelerate document retrieval.

3.2. Index-based search

Index-based search compares the vector of a user query not with the direct vector of a document’s content but with an indexed vector that represents this content.

The program first imports the class and function we need:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

TfidfVectorizer imports the class that converts text documents into a matrix of TF-IDF features. TF-IDF will quantify word relevance in documents using frequency across the document. The function finds the best matches using the cosine similarity function to calculate the similarity between the query and the weighted vectors of the matrix:

def find_best_match(query, vectorizer, tfidf_matrix):
    query_tfidf = vectorizer.transform([query])
    similarities = cosine_similarity(query_tfidf, tfidf_matrix)
    best_index = similarities.argmax()  # Get the index of the highest similarity score
    best_score = similarities[0, best_index]
    return best_score, best_index

The function’s main tasks are:

  • Transform Query: Converts the input query into TF-IDF vector format using the provided vectorizer
  • Calculate Similarities: Computes the cosine similarity between the query vector and all vectors in the tfidf_matrix
  • Identify Best Match: Finds the index (best_index) of the highest similarity score in the results
  • Retrieve Best Score: Extracts the highest cosine similarity score (best_score)

The output is the best similarity score found and the best index.

The following code first calls the dataset vectorizer and then searches for the most similar record through its index:

vectorizer, tfidf_matrix = setup_vectorizer(db_records)
best_similarity_score, best_index = find_best_match(query, vectorizer, tfidf_matrix)
best_matching_record = db_records[best_index]

Finally, the results are displayed:

print_formatted_response(best_matching_record)

The system finds the best similar document to the user’s input query:

Response:
---------------
A RAG vector store is a database or dataset that contains vectorized data
points.
---------------

We can see that the fuzzy user query produced a reliable output at the retrieval level before running GPT-4o.

The metrics that follow in the program are the same as for naïve and advanced RAG with vector search. This is normal because the document found is the closest to the user’s input query. We will be introducing more complex documents for RAG starting in Chapter 2, RAG Embedding Vector Stores with Deep Lake and OpenAI. For now, let’s have a look at the features that influence how the words are represented in vectors.

Feature extraction

Before augmenting the input with this document, run the following cell, which calls the setup_vectorizer(records) function again but displays the matrix so that you can see its format. This is shown in the following excerpt for the words “accurate” and “additional” in one of the sentences:

A black and white image of a number

Description automatically generated with medium confidence

Figure 1.4: Format of the matrix

Let’s now augment the input.

Augmented input

We will simply add the query to the best matching record in a minimal way to see how GPT-4o will react and display the output:

augmented_input=query+": "+best_matching_record
print_formatted_response(augmented_input)

The output is close to or the same as with vector search, but the retrieval method is faster:

Response:
---------------
define a rag store: A RAG vector store is a database or dataset that contains
vectorized data points.
---------------

We will now plug this augmented input into the generative AI model.

Generation

We now call GPT-4o with the augmented input and display the output:

# Call the function and print the result
llm_response = call_llm_with_full_text(augmented_input)
print_formatted_response(llm_response)

The output makes sense for the user who entered the initial fuzzy query:

Response:
---------------
Certainly! Let's break down and elaborate on the given content:  ---  **Define a RAG store:**  A **RAG vector store** is a **database** or **dataset** that contains **vectorized data points**.  ---  ### Detailed Explanation:  1. **RAG Store**:    - **RAG** stands for **Retrieval-Augmented Generation**. It is a technique used in natural language processing (NLP) where a model retrieves relevant information from a database or dataset to augment its generation capabilities…

This approach worked well in a closed environment within an organization in a specific domain. In an open environment, the user might have to elaborate before submitting a request.

In this section, we saw that a TF-IDF matrix pre-computes document vectors, enabling faster, simultaneous comparisons without repeated vector transformations. We have seen how vector and index-based search can improve retrieval. However, in one project, we may need to apply naïve and advanced RAG depending on the documents we need to retrieve. Let’s now see how modular RAG can improve our system.

4. Modular RAG

Should we use keyword search, vector search, or index-based search when implementing RAG? Each approach has its merits. The choice will depend on several factors:

  • Keyword search suits simple retrieval
  • Vector search is ideal for semantic-rich documents
  • Index-based search offers speed with large data.

However, all three methods can perfectly fit together in a project. In one scenario, for example, a keyword search can help find clearly defined document labels, such as the titles of PDF files and labeled images, before they are processed. Then, indexed search will group the documents into indexed subsets. Finally, the retrieval program can search the indexed dataset, find a subset, and only use vector search to go through a limited number of documents to find the most relevant one.

In this section, we will create a RetrievalComponent class that can be called at each step of a project to perform the task required. The code sums up the three methods we have built in this chapter and that we can sum for the RetrievalComponent through its main members.

The following code initializes the class with search method choice and prepares a vectorizer if needed. self refers to the current instance of the class to access its variables, methods, and functions:

def __init__(self, method='vector'):
        self.method = method
        if self.method == 'vector' or self.method == 'indexed':
            self.vectorizer = TfidfVectorizer()
            self.tfidf_matrix = None

In this case, the vector search is activated.

The fit method builds a TF-IDF matrix from records, and is applicable for vector or indexed search methods:

    def fit(self, records):
        if self.method == 'vector' or self.method == 'indexed':
            self.tfidf_matrix = self.vectorizer.fit_transform(records)

The retrieve method directs the query to the appropriate search method:

    def retrieve(self, query):
        if self.method == 'keyword':
            return self.keyword_search(query)
        elif self.method == 'vector':
            return self.vector_search(query)
        elif self.method == 'indexed':
            return self.indexed_search(query)

The keyword search method finds the best match by counting common keywords between queries and documents:

    def keyword_search(self, query):
        best_score = 0
        best_record = None
        query_keywords = set(query.lower().split())
        for index, doc in enumerate(self.documents):
            doc_keywords = set(doc.lower().split())
            common_keywords = query_keywords.intersection(doc_keywords)
            score = len(common_keywords)
            if score > best_score:
                best_score = score
                best_record = self.documents[index]
        return best_record

The vector search method computes similarities between query TF-IDF and document matrix and returns the best match:

    def vector_search(self, query):
        query_tfidf = self.vectorizer.transform([query])
        similarities = cosine_similarity(query_tfidf, self.tfidf_matrix)
        best_index = similarities.argmax()
        return db_records[best_index]

The indexed search method uses a precomputed TF-IDF matrix for fast retrieval of the best-matching document:

    def indexed_search(self, query):
        # Assuming the tfidf_matrix is precomputed and stored
        query_tfidf = self.vectorizer.transform([query])
        similarities = cosine_similarity(query_tfidf, self.tfidf_matrix)
        best_index = similarities.argmax()
        return db_records[best_index]

We can now activate modular RAG strategies.

Modular RAG strategies

We can call the retrieval component for any RAG configuration we wish when needed:

# Usage example
retrieval = RetrievalComponent(method='vector')  # Choose from 'keyword', 'vector', 'indexed'
retrieval.fit(db_records)
best_matching_record = retrieval.retrieve(query)
print_formatted_response(best_matching_record)

In this case, the vector search method was activated.

The following cells select the best record, as in the 3.1. Vector search section, augment the input, call the generative model, and display the output as shown in the following excerpt:

Response:
---------------
Certainly! Let's break down and elaborate on the content provided:  ---
**Define a RAG store:**  A **RAG (Retrieval-Augmented Generation) store** is a specialized type of data storage system designed to support the retrieval and generation of information...

We have built a program that demonstrated how different search methodologies—keyword, vector, and index-based—can be effectively integrated into a RAG system. Each method has its unique strengths and addresses specific needs within a data retrieval context. The choice of method depends on the dataset size, query type, and performance requirements, which we will explore in the following chapters.

It’s now time to summarize our explorations in this chapter and move to the next level!

Summary

RAG for generative AI relies on two main components: a retriever and a generator. The retriever processes data and defines a search method, such as fetching labeled documents with keywords—the generator’s input, an LLM, benefits from augmented information when producing sequences. We went through the three main configurations of the RAG framework: naïve RAG, which accesses datasets through keywords and other entry-level search methods; advanced RAG, which introduces embeddings and indexes to improve the search methods; and modular RAG, which can combine naïve and advanced RAG as well as other ML methods.

The RAG framework relies on datasets that can contain dynamic data. A generative AI model relies on parametric data through its weights. These two approaches are not mutually exclusive. If the RAG datasets become too cumbersome, fine-tuning can prove useful. When fine-tuned models cannot respond to everyday information, RAG can come in handy. RAG frameworks also rely heavily on the ecosystem that provides the critical functionality to make the systems work. We went through the main components of the RAG ecosystem, from the retriever to the generator, for which the trainer is necessary, and the evaluator. Finally, we built an entry-level naïve, advanced, and modular RAG program in Python, leveraging keyword matching, vector search, and index-based retrieval, augmenting the input of GPT-4o.

Our next step in Chapter 2, RAG Embedding Vector Stores with Deep Lake and OpenAI, is to embed data in vectors. We will store the vectors in vector stores to enhance the speed and precision of the retrieval functions of a RAG ecosystem.

Questions

Answer the following questions with Yes or No:

  1. Is RAG designed to improve the accuracy of generative AI models?
  2. Does a naïve RAG configuration rely on complex data embedding?
  3. Is fine-tuning always a better option than using RAG?
  4. Does RAG retrieve data from external sources in real time to enhance responses?
  5. Can RAG be applied only to text-based data?
  6. Is the retrieval process in RAG triggered by a user or automated input?
  7. Are cosine similarity and TF-IDF both metrics used in advanced RAG configurations?
  8. Does the RAG ecosystem include only data collection and generation components?
  9. Can advanced RAG configurations process multimodal data such as images and audio?
  10. Is human feedback irrelevant in evaluating RAG systems?

References

Further reading

Join our community on Discord

Join our community’s Discord space for discussions with the author and other readers:

https://www.packt.link/rag

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Implement RAG’s traceable outputs, linking each response to its source document to build reliable multimodal conversational agents
  • Deliver accurate generative AI models in pipelines integrating RAG, real-time human feedback improvements, and knowledge graphs
  • Balance cost and performance between dynamic retrieval datasets and fine-tuning static data

Description

RAG-Driven Generative AI provides a roadmap for building effective LLM, computer vision, and generative AI systems that balance performance and costs. This book offers a detailed exploration of RAG and how to design, manage, and control multimodal AI pipelines. By connecting outputs to traceable source documents, RAG improves output accuracy and contextual relevance, offering a dynamic approach to managing large volumes of information. This AI book shows you how to build a RAG framework, providing practical knowledge on vector stores, chunking, indexing, and ranking. You’ll discover techniques to optimize your project’s performance and better understand your data, including using adaptive RAG and human feedback to refine retrieval accuracy, balancing RAG with fine-tuning, implementing dynamic RAG to enhance real-time decision-making, and visualizing complex data with knowledge graphs. You’ll be exposed to a hands-on blend of frameworks like LlamaIndex and Deep Lake, vector databases such as Pinecone and Chroma, and models from Hugging Face and OpenAI. By the end of this book, you will have acquired the skills to implement intelligent solutions, keeping you competitive in fields from production to customer service across any project.

Who is this book for?

This book is ideal for data scientists, AI engineers, machine learning engineers, and MLOps engineers. If you are a solutions architect, software developer, product manager, or project manager looking to enhance the decision-making process of building RAG applications, then you’ll find this book useful.

What you will learn

  • Scale RAG pipelines to handle large datasets efficiently
  • Employ techniques that minimize hallucinations and ensure accurate responses
  • Implement indexing techniques to improve AI accuracy with traceable and transparent outputs
  • Customize and scale RAG-driven generative AI systems across domains
  • Find out how to use Deep Lake and Pinecone for efficient and fast data retrieval
  • Control and build robust generative AI systems grounded in real-world data
  • Combine text and image data for richer, more informative AI responses

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Sep 30, 2024
Length: 334 pages
Edition : 1st
Language : English
ISBN-13 : 9781836200918
Vendor :
Facebook , Docker , OpenAI
Category :
Languages :
Concepts :
Tools :

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details

Publication date : Sep 30, 2024
Length: 334 pages
Edition : 1st
Language : English
ISBN-13 : 9781836200918
Vendor :
Facebook , Docker , OpenAI
Category :
Languages :
Concepts :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 138.97
Unlocking Data with Generative AI and RAG
$44.99
Building LLM Powered  Applications
$49.99
RAG-Driven Generative AI
$43.99
Total $ 138.97 Stars icon

Table of Contents

12 Chapters
Why Retrieval Augmented Generation? Chevron down icon Chevron up icon
RAG Embedding Vector Stores with Deep Lake and OpenAI Chevron down icon Chevron up icon
Building Index-Based RAG with LlamaIndex, Deep Lake, and OpenAI Chevron down icon Chevron up icon
Multimodal Modular RAG for Drone Technology Chevron down icon Chevron up icon
Boosting RAG Performance with Expert Human Feedback Chevron down icon Chevron up icon
Scaling RAG Bank Customer Data with Pinecone Chevron down icon Chevron up icon
Building Scalable Knowledge-Graph-Based RAG with Wikipedia API and LlamaIndex Chevron down icon Chevron up icon
Dynamic RAG with Chroma and Hugging Face Llama Chevron down icon Chevron up icon
Empowering AI Models: Fine-Tuning RAG Data and Human Feedback Chevron down icon Chevron up icon
RAG for Video Stock Production with Pinecone and OpenAI Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.6
(14 Ratings)
5 star 78.6%
4 star 14.3%
3 star 0%
2 star 0%
1 star 7.1%
Filter icon Filter
Top Reviews

Filter reviews by




Melvin Ng Nov 21, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Nice book
Feefo Verified review Feefo
dr t Oct 04, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Rothman has once again delivered something exceptional with RAG-Driven Generative AI. As expected from Rothman, this book shines in its ability to make complex topics accessible and practical, making it a standout in the growing literature on RAG systems. If you're looking for one of the best resources on RAG, packed with Python code and real-world applications, this book will not let you down.For readers keen to get hands-on, the book does not disappoint. Rothman provides a wealth of Python code throughout, with step-by-step examples that make it easy to follow along and implement RAG-driven solutions. Each chapter concludes with questions to test your understanding, reinforcing key concepts and ensuring that you grasp the material before moving on. For beginners and experienced practitioners alike, this interactive approach adds immense value to the learning experience.Chapter 4, Building a RAG Pipeline, is particularly valuable, offering clear instructions on how to build an end-to-end RAG system. The chapter walks readers through the process of designing a robust RAG pipeline. In addition, Rothman explores cutting-edge tools such as LlamaIndex, Deep Lake, and OpenAI to illustrate how to leverage them effectively for RAG-based projects. The comprehensive nature of this chapter makes it an essential guide for anyone looking to develop RAG systems from scratch or optimise existing ones.However, the most enlightening part of the book for this reader was Chapter 5: Boosting RAG Performance with Expert Human Feedback. This chapter delves into the creation of an adaptive RAG system that can evolve based on user feedback. Rothman guides readers through building a hybrid adaptive RAG program in Python on Google Colab. This hands-on project not only gives readers a solid grasp of adaptive RAG processes but also demonstrates how to adjust a system when predefined models fail to meet user expectations. Rothman goes further to show how human feedback, gathered through user rankings, can be integrated to fine-tune RAG systems, ensuring that the AI continues to meet users' needs. The chapter concludes with the implementation of an automated ranking system to enhance the generative model's performance, making it highly applicable to real-world business settings.In conclusion, RAG-Driven Generative AI is a must-read for anyone involved with LLMs. Rothman has delivered an insightful, practical, and highly recommended resource for anyone looking to explore RAG systems. Highly recommended.
Amazon Verified review Amazon
Jorge Deflon Oct 10, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I have been reading this new book on generative artificial intelligence complemented with RAG (Retrieval-Augmented Generation) and I find it quite useful and interesting.LLM models are advanced artificial intelligence systems designed to process and generate human language.They are trained with enormous amounts of text from several sources, to understand and respond coherently to a wide variety of questions and requests, but this also carries the disadvantage that they may not have the most relevant information for an organization, since it was not available when the model was trained, either due to time or confidentiality issues.Retrieval enhanced generation (RAG) is the process of optimizing the output so that it references an personalized knowledge base before generating a response.This allows the GAI to produce more useful and reliable responses to the organization's users.This book is one of the most complete and up-to-date references on how to use RAG techniques to improve the responses that GAI tools provide to organizational users.The book contains many examples on how use the different types of RAG, including the necessary code to incorporate it into your projects quickly and efficiently.Highly recommended for all practitioners, developers, and students of the topic of generative artificial intelligence.
Amazon Verified review Amazon
Subhayan Roy Oct 11, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
RAG being in the forefront of Gen AI LLM models is a highly sought after skill or knowledge to have.This book covers the theory part of RAG, vectorization, Vector databases.Yet what I found most fascinating was the code snippets, applications that you can directly use in your GenAI application with a bit of modification.Just one advice be clear on Transformer and language models before learning RAG.For this I would recommend Denis's other book Transformers for NLP.
Amazon Verified review Amazon
Siddhartha Vemuganti Oct 15, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Denis Rothman's "RAG-Driven Gen AI" offers a comprehensive exploration of Retrieval-Augmented Generation systems, addressing a critical need in the rapidly evolving field of artificial intelligence. This book stands out for its practical approach, bridging the gap between theoretical concepts and real-world applications.Rothman's writing style is accessible yet thorough, guiding readers from foundational principles to advanced implementations of RAG systems. The book's structure feels well-considered, allowing readers to build their understanding progressively. While it assumes some prior knowledge of machine learning and Python, making it less suitable for complete beginners, it offers valuable insights for software engineers, developers, and data scientists looking to expand their AI toolkit.One of the book's strengths lies in its diverse range of practical examples. By covering applications from drone technology to customer retention, Rothman effectively demonstrates the versatility of RAG systems. The chapter on multimodal RAG for drone technology is particularly intriguing, opening up new possibilities that many readers might not have previously considered.A standout feature is the book's attention to often-overlooked aspects of AI development, such as software versioning and package management. Rothman's detailed guidance on version control and dependency management addresses real challenges faced by practitioners, potentially saving readers significant time and frustration.The hands-on approach, complete with projects and source code, transforms the book from a mere reference into a practical learning tool. Rothman doesn't shy away from discussing performance optimization and cost management – crucial considerations for implementing AI solutions in production environments.However, readers should be aware that the rapid pace of AI advancement may necessitate supplementing this book with current research and developments. Some cutting-edge concepts discussed may evolve quickly."RAG-Driven Gen AI" serves as a valuable resource for those looking to understand and implement RAG systems. While it may not be the only book you'll need on the subject, it provides a solid foundation and practical insights that many readers will find useful. Rothman's work effectively captures the current state of RAG technology while offering guidance that should remain relevant as the field continues to evolve.For professionals aiming to leverage the power of RAG systems or enhance their AI capabilities, this book is a worthwhile addition to their technical library. It offers a balanced mix of theoretical understanding and practical application, making it a useful companion for those navigating the complex landscape of modern AI development.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.