What this book covers
Chapter 1, What Is Retrieval-Augmented Generation (RAG), introduces RAG, a technique that combines large language models (LLMs) with a company’s internal data to enhance the accuracy, relevance, and customization of AI-generated outputs. It discusses the advantages of RAG, such as improved performance and flexibility, as well as challenges such as data quality and complexity. The chapter also covers key RAG vocabulary, the importance of vectors, and real-world applications across various industries. It compares RAG to conventional generative AI and fine-tuning and outlines the architecture of RAG systems, which consists of indexing, retrieval, and generation stages.
Chapter 2, Code Lab – An Entire RAG Pipeline, provides a comprehensive code lab that walks through the implementation of a complete RAG pipeline using Python, LangChain, and Chroma. It covers installing necessary packages, setting up an OpenAI API key, loading and preprocessing documents from a web page, splitting them into manageable chunks, embedding them into vector representations, and storing them in a vector database. The chapter then demonstrates how to perform a vector similarity search, retrieve relevant documents based on a query, and generate a response using a pre-built prompt template and a language model within a LangChain chain. Finally, it shows how to submit a question to the RAG pipeline and receive an informative response.
Chapter 3, Practical Applications of RAG, explores various practical applications of RAG in business, including enhancing customer support chatbots, automated reporting, e-commerce product descriptions and recommendations, utilizing internal and external knowledge bases, innovation scouting, trend analysis, content personalization, and employee training. It highlights how RAG can transform unstructured data into actionable insights, improve decision-making, and deliver personalized experiences across different sectors. The chapter concludes with a code example demonstrating how to add sources to RAG-generated responses, emphasizing the importance of citing information for credibility and support in applications such as legal document analysis or scientific research.
Chapter 4, Components of a RAG System, provides a comprehensive overview of the key components that make up a RAG system. It covers the three main stages: indexing, retrieval, and generation, explaining how they work together to deliver enhanced responses to user queries. The chapter also highlights the importance of the user interface (UI) and evaluation components, with the UI serving as the primary point of interaction between the user and the system, and evaluation is crucial for assessing and improving the RAG system’s performance through metrics and user feedback. While not exhaustive, these components form the foundation of most successful RAG systems.
Chapter 5, Managing Security in RAG Applications, explores security aspects specific to RAG applications. It discusses how RAG can be leveraged as a security solution by limiting data access, ensuring reliable responses, and providing transparency of sources. However, it also acknowledges the challenges posed by the black-box nature of LLMs and the importance of protecting user data and privacy. It introduces the concept of red teaming to proactively identify and mitigate vulnerabilities, and through hands-on code labs, it demonstrates how to implement security best practices, such as securely storing API keys and defending against prompt injection attacks using a red team versus blue team exercise. The chapter emphasizes the importance of ongoing vigilance and adaptation in the face of ever-evolving security threats.
Chapter 6, Interfacing with RAG and Gradio, provides a practical guide on creating interactive applications using RAG and Gradio as the UI. It covers setting up the Gradio environment, integrating RAG models, and creating a user-friendly interface that allows users to interact with the RAG system like a typical web application. The chapter discusses the benefits of using Gradio, such as its open source nature, integration with popular machine learning frameworks, and collaboration features, as well as its integration with Hugging Face for hosting demos. The code lab demonstrates how to add a Gradio interface to a RAG application, creating a process question function that invokes the RAG pipeline and displays the relevance score, final answer, and sources returned by the system.
Chapter 7, The Key Role Vectors and Vector Stores Play in RAG, addresses the crucial role of vectors and vector stores in RAG systems. It explains what vectors are, how they’re created through various embedding techniques, and their importance in representing semantic information. The chapter covers different vectorization methods, from traditional TF-IDF to modern transformer-based models, such as BERT and OpenAI’s embeddings. It discusses factors to consider when selecting a vectorization option, including quality, cost, network availability, speed, and compatibility. The chapter also explores vector stores, their architecture, and popular options such as Chroma, Pinecone, and pgvector. It concludes by outlining key considerations for choosing the right vector store for a RAG system, emphasizing the need to align with specific project requirements and existing infrastructure.
Chapter 8, Similarity Searching with Vectors, focuses on the intricacies of similarity searching with vectors in RAG systems. It covers distance metrics, vector spaces, and similarity search algorithms, such as k-NN and ANN. The chapter explains indexing techniques to enhance search efficiency, including LSH, tree-based indexing, and HNSW. It discusses dense (semantic) and sparse (keyword) vector types, introducing hybrid search methods that combine both approaches. Through code labs, the chapter demonstrates a custom hybrid search implementation and the use of LangChain’s EnsembleRetriever
as a hybrid retriever. Finally, it provides an overview of various vector search tool options, such as pgvector, Elasticsearch, FAISS, and Chroma DB, highlighting their features and use cases to help select the most suitable solution for RAG projects.
Chapter 9, Evaluating RAG Quantitatively and with Visualizations, focuses on the crucial role of evaluation in building and maintaining RAG pipelines. It covers evaluation during development and after deployment, emphasizing its importance in optimizing performance and ensuring reliability. The chapter discusses standardized evaluation frameworks for various RAG components and the significance of ground truth data. A code lab demonstrates the integration of the ragas evaluation platform, generating synthetic ground truth and establishing comprehensive metrics to compare hybrid search with dense vector semantic-based search. The chapter explores retrieval, generation, and end-to-end evaluation stages, analyzing results and visualizing them. It also presents insights from a ragas co-founder and discusses additional evaluation techniques, such as BLEU, ROUGE, semantic similarity, and human evaluation, highlighting the importance of using multiple metrics for a holistic assessment of RAG systems.
Chapter 10, Key RAG Components in LangChain, explores key RAG components in LangChain: vector stores, retrievers, and LLMs. It discusses various vector store options, such as Chroma, Weaviate, and FAISS, highlighting their features and integration with LangChain. The chapter then covers different retriever types, including dense, sparse (BM25), ensemble, and specialized retrievers, such as WikipediaRetriever
and KNNRetriever
. It explains how these retrievers work and their applications in RAG systems. Finally, the chapter examines LLM integration in LangChain, focusing on OpenAI and Together AI models. It demonstrates how to switch between different LLMs and discusses extended capabilities, such as async, streaming, and batch support. The chapter provides code examples and practical insights for implementing these components in RAG applications using LangChain.
Chapter 11, Using LangChain to Get More from RAG, explores how to enhance RAG applications using LangChain components. It covers document loaders for processing various file formats, text splitters for dividing documents into manageable chunks, and output parsers for structuring LLM responses. The chapter provides code labs demonstrating the implementation of different document loaders (HTML, PDF, Word, JSON), text splitters (character, recursive character, semantic), and output parsers (string, JSON). It highlights the importance of choosing appropriate splitters based on document characteristics and semantic relationships. The chapter also shows how to integrate these components into a RAG pipeline, emphasizing the flexibility and power of LangChain in customizing RAG applications. Overall, it provides practical insights into optimizing RAG systems using LangChain’s diverse toolkit.
Chapter 12, Combining RAG with the Power of AI Agents and LangGraph, explores integrating AI agents and LangGraph into RAG applications. It explains that AI agents are LLMs with a decision-making loop, allowing for more complex task handling. The chapter introduces LangGraph, an extension of LCEL, which enables graph-based agent orchestration. Key concepts covered include agent state, tools, toolkits, and graph theory elements, such as nodes and edges. A code lab demonstrates building a LangGraph retrieval agent for RAG, showcasing how to create tools, define agent state, set up prompts, and establish cyclical graphs. The chapter emphasizes how this approach enhances RAG applications by allowing agents to reason, use tools, and break down complex tasks, ultimately providing more thorough responses to user queries.
Chapter 13, Using Prompt Engineering to Improve RAG Efforts, focuses on utilizing prompt engineering to enhance RAG systems. It covers key concepts, such as prompt parameters (temperature, top-p, seed), shot design, and the fundamentals of prompt design. The chapter emphasizes the importance of adapting prompts for different LLMs and iterating to improve results. Through code labs, it demonstrates creating custom prompt templates and applying various prompting techniques, such as iterating, summarizing, inferring, transforming, and expanding. These techniques are illustrated with practical examples, showing how to refine prompts for different purposes, such as tone adjustment, language translation, and data extraction. The chapter underscores the significance of strategic prompt formulation in improving information retrieval and text generation quality in RAG applications.
Chapter 14, Advanced RAG-Related Techniques for Improving Results, explores advanced techniques to enhance RAG applications. It covers query expansion, query decomposition, and multi-modal RAG (MM-RAG), demonstrating their implementation through code labs. Query expansion augments the original query to improve retrieval, while query decomposition breaks complex questions into smaller, manageable sub-questions. MM-RAG incorporates multiple data types, including images, to provide more comprehensive responses. The chapter also discusses other advanced techniques for improving the indexing, retrieval, and generation stages of RAG pipelines. These include deep chunking, embedding adapters, hypothetical document embeddings, and self-reflective RAG. The chapter emphasizes the rapid evolution of RAG techniques and encourages readers to explore and adapt these methods to their specific applications.