This article is an excerpt from the book, "RAG-Driven Generative AI", by Denis Rothman. Explore the transformative potential of RAG-driven LLMs, computer vision, and generative AI with this comprehensive guide, from basics to building a complex RAG pipeline.
On a bright Monday morning, Dakota sits down to get to work and is called by the CEO of their software company, who looks quite worried. An important fire department needs a conversational AI agent to train hundreds of rookie firefighters nationwide on drone technology. The CEO looks dismayed because the data provided is spread over many websites around the country. Worse, the fire department’s management is coming over at 2 PM for a demonstration before deciding whether to work with Dakota’s company. Dakota is smiling. The CEO is puzzled.
Dakota explains that the AI team can put a prototype together in a few hours and be more than ready by 2 PM. The strategy is to divide the AI team into three sub-teams that will work in parallel on three pipelines based on the reference Deep Lake, LlamaIndex, and OpenAI RAG program* they had tested and approved a few weeks back.
The team gets to work at around 9:30 AM after devising their strategy. The Pipeline 1 team begins by fetching and cleaning a batch of documents. They run Python functions to remove punctuation, except for periods, and to strip noisy references from the content. Leveraging the automated functions they already have from the educational program, the result is satisfactory.
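The team’s cleaning functions are not reproduced in the excerpt; a minimal sketch of this kind of cleanup, with a hypothetical clean_text helper and regular expressions standing in for their functions, could look like this:
import re

def clean_text(text: str) -> str:
    """Remove noisy references and punctuation (except periods) from raw text."""
    text = re.sub(r"\[[^\]]*\]", " ", text)   # drop bracketed reference markers such as [12]
    text = re.sub(r"[^\w\s.]", " ", text)     # remove punctuation except periods
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace

# Hypothetical sample batch
raw_documents = ["Drones use thermal sensors [1] to detect heat, smoke, and movement."]
cleaned_documents = [clean_text(doc) for doc in raw_documents]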
By 10 AM, the Pipeline 2 team sees the first batch of documents appear on their file server. They run the code they got from the RAG program* to create a Deep Lake vector store and seamlessly populate it with an OpenAI embedding model, as shown in the following excerpt:
from llama_index.core import StorageContext
from llama_index.vector_stores.deeplake import DeepLakeVectorStore

vector_store_path = "hub://denis76/drone_v2"
dataset_path = "hub://denis76/drone_v2"
# overwrite=True will overwrite the dataset; False will append to it
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True)
Note that the dataset path points to the online Deep Lake vector store. The fact that the vector store is serverless is a huge advantage: there is no need to manage its size or storage process, and you can begin populating it within seconds!
Also, for the first batch of documents, overwrite=True forces the system to write the initial data. Then, from the second batch onward, the Pipeline 2 team can set overwrite=False to append the following documents; a sketch of this appending step follows the index-creation excerpt below.
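The documents variable used in the next step is not shown in this excerpt; it is assumed to come from a standard LlamaIndex loader reading the cleaned files that Pipeline 1 dropped on the file server, for example:
from llama_index.core import SimpleDirectoryReader

# Hypothetical local folder holding the cleaned batch delivered by Pipeline 1
documents = SimpleDirectoryReader("./data/drone_batch_1").load_data()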
Finally, LlamaIndex automatically creates a vector store index:
from llama_index.core import VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Create an index over the documents
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
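As noted above, subsequent batches are appended rather than overwritten. A minimal sketch of that appending step, assuming a hypothetical new_documents list holding the next cleaned batch:
# Reopen the same Deep Lake dataset without overwriting it
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=False)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# from_documents then appends the new batch to the existing dataset
index = VectorStoreIndex.from_documents(new_documents, storage_context=storage_context)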
By 10:30 AM, the Pipeline 3 team can visualize the vectorized (embedded) dataset in their Deep Lake vector store. They create a LlamaIndex query engine on the dataset:
from llama_index.core import VectorStoreIndex
vector_store_index = VectorStoreIndex.from_documents(documents)
…
vector_query_engine = vector_store_index.as_query_engine(similarity_top_k=k, temperature=temp, num_output=mt)
Note that the OpenAI Large Language Model is seamlessly integrated into LlamaIndex with the following parameters:

k, in this case k=3, specifies the number of documents to retrieve from the vector store. The retrieval is based on the similarity between the embedded user input and the embedded vectors within the dataset.

temp, in this case temp=0.1, determines the randomness of the output. A low value such as 0.1 forces the similarity search to be precise; a higher value would allow for more diverse responses, which we do not want for this technological conversational agent.

mt, in this case mt=1024, determines the maximum number of tokens in the output.

A cosine similarity function was added to make sure that the outputs matched the sample user inputs:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')

def calculate_cosine_similarity_with_embeddings(text1, text2):
    # Encode both texts into sentence embeddings
    embeddings1 = model.encode(text1)
    embeddings2 = model.encode(text2)
    # Cosine similarity between the two embedding vectors
    similarity = cosine_similarity([embeddings1], [embeddings2])
    return similarity[0][0]
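For example, a quick check might look like the following sketch (the sample question is hypothetical; vector_query_engine is the engine defined earlier):
# Compare a sample user input with the engine's response text
sample_input = "How do drones detect a wildfire?"
sample_response = str(vector_query_engine.query(sample_input))
print(calculate_cosine_similarity_with_embeddings(sample_input, sample_response))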
By 11:00 AM, all three pipeline teams are warmed up and ready to go full throttle! While the Pipeline 2 team was creating the vector store and populating it with the first batch of documents, the Pipeline 1 team prepared the next several batches.
At 11:00 AM, Dakota gave the green light to run all three pipelines simultaneously. Within a few minutes, the whole RAG-driven generative AI system was humming like a beehive!
By 1:00 PM, Dakota and the three pipeline teams were working on a PowerPoint slideshow with a copilot. Within a few minutes, it was automatically generated based on their scenario. At 1:30 PM, they had time to grab a quick lunch.
At 2:00 PM, the fire department management, Dakota’s team, and the CEO of their software company were in the meeting room.
Dakota’s team ran the PowerPoint slide show and began the demonstration with a simple input:
user_input="Explain how drones employ real-time image processing and machine learning algorithms to accurately detect events in various environmental conditions."
The response displayed was satisfactory:
Drones utilize real-time image processing and machine learning algorithms to accurately detect events in various environmental conditions by analyzing data captured by their sensors and cameras. This technology allows drones to process visual information quickly and efficiently, enabling them to identify specific objects, patterns, or changes in the environment in real-time. By employing these advanced algorithms, drones can effectively monitor and respond to different situations, such as wildfires, wildlife surveys, disaster relief efforts, and agricultural monitoring with precision and accuracy.
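Behind the scenes, producing this answer and tracing the chunks it was grounded on comes down to a few lines; a minimal sketch using the query engine defined earlier (source_nodes is LlamaIndex's standard attribute holding the retrieved chunks):
response = vector_query_engine.query(user_input)
print(response.response)  # the generated answer shown above

# List the retrieved source chunks the answer was based on
for source in response.source_nodes:
    print(source.score, source.node.metadata)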
Dakota’s team then showed that the program could track and display the original documents the response was based on. At one point, the fire department’s top manager, Taylor, exclaimed, “Wow, this is impressive! It’s exactly what we were looking for!” Of course, Dakota’s CEO began discussing the number of users, cost, and timelines with Taylor.
In the meantime, Dakota’s team and the rest of the fire department’s team went out for coffee and got to know each other. Fire departments respond to emergencies efficiently and on short notice. So can expert-level AI teams!
In facing a high-stakes, time-sensitive challenge, Dakota and their AI team demonstrated the power and efficiency of RAG-driven generative AI. By leveraging a structured, multi-pipeline approach with tools like Deep Lake, LlamaIndex, and OpenAI’s advanced models, the team was able to integrate scattered data sources quickly and effectively, delivering a sophisticated, real-time conversational AI prototype tailored for firefighter training on drone technology. Their success showcases how expert planning, resourceful use of AI tools, and teamwork can transform a complex project into a streamlined solution that meets client needs. This case underscores the potential of generative AI to create responsive, practical solutions for critical industries, setting a new standard for rapid, high-quality AI deployment in real-world applications.
Denis Rothman graduated from Sorbonne University and Paris-Diderot University, and as a student, he wrote and registered a patent for one of the earliest word2vector embeddings and word piece tokenization solutions. He started a company focused on deploying AI and went on to author one of the first AI cognitive NLP chatbots, applied as a language teaching tool for Moët et Chandon (part of LVMH) and more. Denis rapidly became an expert in explainable AI, incorporating interpretable, acceptance-based explanation data and interfaces into solutions implemented for major corporate projects in the aerospace, apparel, and supply chain sectors. His core belief is that you only really know something once you have taught somebody how to do it.