The environment of the video production ecosystem
The Chapter10 directory on GitHub contains the environment for all four notebooks in this chapter:
Videos_dataset_visualization.ipynb
Pipeline_1_The_Generator_and_the_Commentator.ipynb
Pipeline_2_The_Vector_Store_Administrator.ipynb
Pipeline_3_The_Video_Expert.ipynb
Each notebook includes an Installing the environment section, which contains the following sub-sections that are identical across all notebooks:
- Importing modules and libraries
- GitHub
- Video download and display functions
- OpenAI
- Pinecone
This chapter aims to establish a common pre-production installation policy so that we can focus on the pipelines’ content once we dive into the RAG for video production code. This policy is limited to the scenario described in this chapter and will vary depending on the requirements of each real-life production environment.
The notebooks in this chapter only require a CPU, limited memory, and limited disk space. As such, the whole process can run continuously, one video at a time, in an optimized, scalable environment.
Let’s begin by importing the modules and libraries we need for our project.
Importing modules and libraries
The goal is to prepare a pre-production global environment common to all the notebooks. As such, the modules and libraries are present in all four notebooks regardless of whether they are used or not in a specific program:
from IPython.display import HTML # to display videos
import base64 # to encode videos as base64
from base64 import b64encode # to encode videos as base64
import os # to interact with the operating system
import subprocess # to run commands
import time # to measure execution time
import csv # to save comments
import uuid # to generate unique ids
import cv2 # to split videos
from PIL import Image # to display videos
import pandas as pd # to display comments
import numpy as np # to use Numerical Python
from io import BytesIO # to manage a binary stream of data in memory
Each of the four notebooks contains these modules and libraries, as shown in the following table:
| Code | Comment |
| --- | --- |
| from IPython.display import HTML | To display videos |
| import base64 | To encode videos as base64 |
| from base64 import b64encode | To encode videos as base64 |
| import os | To interact with the operating system |
| import subprocess | To run commands |
| import time | To measure execution time |
| import csv | To save comments |
| import uuid | To generate unique IDs |
| import cv2 | To split videos (open source computer vision library) |
| from PIL import Image | To display videos |
| import pandas as pd | To display comments |
| import numpy as np | To use Numerical Python |
| from io import BytesIO | For a binary stream of data in memory |
Table 10.1: Modules and libraries for our video production system
The Code column contains the module or library name, while the Comment column provides a brief description of its usage. Let’s move on to the GitHub commands.
GitHub
The download(directory, filename) function is present in all four notebooks. Its purpose is to download the files we need from the book’s GitHub repository:
def download(directory, filename):
    # The base URL of the files in the GitHub repository
    base_url = 'https://raw.githubusercontent.com/Denis2054/RAG-Driven-Generative-AI/main/'
    # Complete URL for the file
    file_url = f"{base_url}{directory}/{filename}"
    # Use curl to download the file
    try:
        # Prepare the curl command
        curl_command = f'curl -o {filename} {file_url}'
        # Execute the curl command
        subprocess.run(curl_command, check=True, shell=True)
        print(f"Downloaded '{filename}' successfully.")
    except subprocess.CalledProcessError:
        print(f"Failed to download '{filename}'. Check the URL, your internet connection, and if the token is correct and has appropriate permissions.")
The preceding function takes two arguments:
- directory, which is the GitHub directory where the file we want to download is located
- filename, which is the name of the file we want to download
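For example, a notebook can fetch one of the chapter’s files with a single call. The directory and file name below are placeholders, not the exact paths used in the pipelines:
# Hypothetical call: fetch a file from the repository
# Replace the directory and file name with the ones referenced in the notebook you are running
download("Chapter10", "sample_video.mp4")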
OpenAI
The OpenAI package is installed in all three pipeline notebooks but not in Videos_dataset_visualization.ipynb, which doesn’t require an LLM. You can retrieve the API key from a file or enter it manually (but it will be visible):
#You can retrieve your API key from a file (1)
# or enter it manually (2)
#Comment this cell if you want to enter your key manually.
#(1) Retrieve the API key from a file
#Store your key in a file and read it (you can type it directly in the notebook, but it will be visible to somebody next to you)
from google.colab import drive
drive.mount('/content/drive')
f = open("drive/MyDrive/files/api_key.txt", "r")
API_KEY = f.readline()
f.close()
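If you would rather not store the key in a file, a minimal alternative sketch (not part of the notebooks) is to prompt for it with Python’s standard getpass module, which hides the key as you type it:
import getpass  # standard library; the input is not echoed on screen
API_KEY = getpass.getpass("Enter your OpenAI API key: ")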
You will need to sign up at www.openai.com and obtain an API key before running the code. The program installs the openai package:
try:
    import openai
except:
    # Install openai if it is not already available, then import it
    !pip install openai==1.45.0
    import openai
Finally, we set an environment variable for the API key:
#(2) Enter your key manually by
# replacing API_KEY with your key.
#The OpenAI API key
os.environ['OPENAI_API_KEY'] = API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")
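As an optional sanity check, assuming the openai 1.x client installed above, you can confirm that the key is picked up from the environment before running the pipelines; the model name here is only an illustrative assumption:
# Optional check (not in the notebooks): send a minimal request with the 1.x client
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice for this test
    messages=[{"role": "user", "content": "Reply with OK"}]
)
print(response.choices[0].message.content)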
Pinecone
The Pinecone section is only present in Pipeline_2_The_Vector_Store_Administrator.ipynb and Pipeline_3_The_Video_Expert.ipynb, the two notebooks that require the Pinecone vector store. The following command installs Pinecone, which is then imported:
!pip install pinecone-client==4.1.1
import pinecone
The program then retrieves the key from a file (or you can enter it manually):
f = open("drive/MyDrive/files/pinecone.txt", "r")
PINECONE_API_KEY=f.readline()
f.close()
In production, you can set an environment variable or implement the method that best fits your project so that the API key is never visible.
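For instance, a minimal sketch of that approach with the pinecone-client 4.x API (the environment variable handling here is an assumption, not code from the notebooks) looks like this:
# Keep the key in an environment variable instead of printing or hardcoding it
os.environ['PINECONE_API_KEY'] = PINECONE_API_KEY
from pinecone import Pinecone
pc = Pinecone(api_key=os.environ['PINECONE_API_KEY'])
# pc.list_indexes().names()  # optional check that the connection works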
The Evaluator section of Pipeline_3_The_Video_Expert.ipynb contains its own requirements and installations.
With that, we have defined the environment for all four notebooks, which contain the same sub-sections we just described in their respective Installing the environment sections. We can now fully focus on the processes involved in the video production programs. We will begin with the Generator and Commentator.