The environment of the video production ecosystem
The Chapter10 directory on GitHub contains the environment for all four notebooks in this chapter:
Videos_dataset_visualization.ipynb
Pipeline_1_The_Generator_and_the_Commentator.ipynb
Pipeline_2_The_Vector_Store_Administrator.ipynb
Pipeline_3_The_Video_Expert.ipynb
Each notebook includes an Installing the environment section, which contains the following sub-sections that are identical across all notebooks:
- Importing modules and libraries
- GitHub
- Video download and display functions
- OpenAI
- Pinecone
This chapter aims to establish a common pre-production installation policy so that we can focus on the pipelines’ content once we dive into the RAG for video production code. This policy is limited to the scenario described in this chapter and will vary depending on the requirements of each real-life production environment.
The notebooks in this chapter only require a CPU, limited memory, and limited disk space. As such, the whole process can run continuously, one video at a time, in an optimized, scalable environment.
Let’s begin by importing the modules and libraries we need for our project.
Importing modules and libraries
The goal is to prepare a pre-production global environment common to all the notebooks. As such, the modules and libraries are present in all four notebooks regardless of whether they are used or not in a specific program:
from IPython.display import HTML # to display videos
import base64 # to encode videos as base64
from base64 import b64encode # to encode videos as base64
import os # to interact with the operating system
import subprocess # to run commands
import time # to measure execution time
import csv # to save comments
import uuid # to generate unique ids
import cv2 # to split videos
from PIL import Image # to display videos
import pandas as pd # to display comments
import numpy as np # to use Numerical Python
from io import BytesIO # to manage a binary stream of data in memory
Each of the four notebooks contains these modules and libraries, as shown in the following table:
| Code | Comment |
| --- | --- |
| from IPython.display import HTML | To display videos |
| import base64 | To encode videos as base64 |
| from base64 import b64encode | To encode videos as base64 |
| import os | To interact with the operating system |
| import subprocess | To run commands |
| import time | To measure execution time |
| import csv | To save comments |
| import uuid | To generate unique IDs |
| import cv2 | To split videos (open source computer vision library) |
| from PIL import Image | To display videos |
| import pandas as pd | To display comments |
| import numpy as np | To use Numerical Python |
| from io import BytesIO | For a binary stream of data in memory |
Table 10.1: Modules and libraries for our video production system
The Code column contains the module or library name, while the Comment column provides a brief description of its usage. Let’s move on to the GitHub commands.
GitHub
The download(directory, filename) function is present in all four notebooks. Its purpose is to download the files we need from the book’s GitHub repository:
def download(directory, filename):
    # The base URL of the files in the GitHub repository
    base_url = 'https://raw.githubusercontent.com/Denis2054/RAG-Driven-Generative-AI/main/'
    # Complete URL for the file
    file_url = f"{base_url}{directory}/{filename}"
    # Use curl to download the file
    try:
        # Prepare the curl command
        curl_command = f'curl -o {filename} {file_url}'
        # Execute the curl command
        subprocess.run(curl_command, check=True, shell=True)
        print(f"Downloaded '{filename}' successfully.")
    except subprocess.CalledProcessError:
        print(f"Failed to download '{filename}'. Check the URL, your internet connection, and if the token is correct and has appropriate permissions.")
The preceding function takes two arguments:
- directory, which is the GitHub directory where the file we want to download is located
- filename, which is the name of the file we want to download
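For example, a notebook can fetch one of the chapter’s files with a single call. The directory and file name below are placeholders, not the exact paths used in the pipelines:
# Hypothetical call: fetch a file from the repository
# Replace the directory and file name with the ones referenced in the notebook you are running
download("Chapter10", "sample_video.mp4")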
OpenAI
The OpenAI package is installed in all three pipeline notebooks but not in Videos_dataset_visualization.ipynb, which doesn’t require an LLM. You can retrieve the API key from a file or enter it manually (but it will be visible):
#You can retrieve your API key from a file (1)
# or enter it manually (2)
#Comment this cell if you want to enter your key manually.
#(1) Retrieve the API key from a file
#Store your key in a file and read it (you can type it directly in the notebook, but it will be visible to somebody next to you)
from google.colab import drive
drive.mount('/content/drive')
f = open("drive/MyDrive/files/api_key.txt", "r")
API_KEY = f.readline()
f.close()
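If you would rather not store the key in a file, a minimal alternative sketch (not part of the notebooks) is to prompt for it with Python’s standard getpass module, which hides the key as you type it:
import getpass  # standard library; the input is not echoed on screen
API_KEY = getpass.getpass("Enter your OpenAI API key: ")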
You will need to sign up at www.openai.com and obtain an API key before running the code. The program installs the openai package:
try:
    import openai
except:
    # Install openai if it is not already available, then import it
    !pip install openai==1.45.0
    import openai
Finally, we set an environment variable for the API key:
#(2) Enter your key manually by
# replacing API_KEY with your key.
#The OpenAI API key
os.environ['OPENAI_API_KEY'] = API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")
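As an optional sanity check, assuming the openai 1.x client installed above, you can confirm that the key is picked up from the environment before running the pipelines; the model name here is only an illustrative assumption:
# Optional check (not in the notebooks): send a minimal request with the 1.x client
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice for this test
    messages=[{"role": "user", "content": "Reply with OK"}]
)
print(response.choices[0].message.content)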
Pinecone
The Pinecone section is only present in Pipeline_2_The_Vector_Store_Administrator.ipynb and Pipeline_3_The_Video_Expert.ipynb, the two notebooks that require the Pinecone vector store. The following command installs Pinecone, which is then imported:
!pip install pinecone-client==4.1.1
import pinecone
The program then retrieves the key from a file (or you can enter it manually):
f = open("drive/MyDrive/files/pinecone.txt", "r")
PINECONE_API_KEY=f.readline()
f.close()
In production, you can set an environment variable or implement the method that best fits your project so that the API key is never visible.
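For instance, a minimal sketch of that approach with the pinecone-client 4.x API (the environment variable handling here is an assumption, not code from the notebooks) looks like this:
# Keep the key in an environment variable instead of printing or hardcoding it
os.environ['PINECONE_API_KEY'] = PINECONE_API_KEY
from pinecone import Pinecone
pc = Pinecone(api_key=os.environ['PINECONE_API_KEY'])
# pc.list_indexes().names()  # optional check that the connection works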
The Evaluator section of Pipeline_3_The_Video_Expert.ipynb contains its own requirements and installations.
With that, we have defined the environment for all four notebooks, which contain the same sub-sections we just described in their respective Installing the environment sections. We can now fully focus on the processes involved in the video production programs. We will begin with the Generator and Commentator.