Unlocking Data with Generative AI and RAG

By Keith Bourne (Packt, September 2024)

Comparing RAG with model fine-tuning

LLMs can be adapted to your data in two ways:

  • Fine-tuning: With fine-tuning, you adjust the weights and/or biases that define the model’s intelligence based on new training data. This directly and permanently changes the model, altering how it responds to all future inputs.
  • Input/prompts: Here, the model itself is left unchanged; you introduce new knowledge at inference time through the prompt/input, and the LLM acts on it for that interaction only (see the sketch after this list).
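
To make the second approach concrete, here is a minimal sketch of prompt-based knowledge injection using OpenAI’s Python SDK. The model name, the hardcoded retrieved_context string, and the sample question are all illustrative placeholders; in a real RAG pipeline, the context would come from a retriever.

```python
# A minimal sketch of prompt-based knowledge injection, assuming the
# openai Python SDK (v1+) and an OPENAI_API_KEY in the environment.
# The context string and question are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()

# In a real RAG pipeline, this would come from a retriever, not a literal:
retrieved_context = "Acme Corp's Q3 revenue was $12.4M, up 18% year over year."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model works here
    messages=[
        {"role": "system",
         "content": f"Answer using only this context:\n{retrieved_context}"},
        {"role": "user", "content": "What was Acme Corp's Q3 revenue?"},
    ],
)
print(response.choices[0].message.content)
# The model's weights are untouched; remove the context and the
# knowledge is gone, unlike fine-tuning.
```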

Why not use fine-tuning in all situations? Once you have introduced the new knowledge, the LLM will always have it! And training on data is how the model was created in the first place, right? That sounds right in theory, but in practice, fine-tuning has proven more reliable for teaching a model specialized tasks (such as how to converse in a certain style) and less reliable for factual recall.

The reason is complicated, but in general, a model’s knowledge of facts is like a human’s long-term memory. If you memorize a long passage from a speech or book and then try to recall it a few months later, you will likely still understand the context of the information, but you may forget specific details. On the other hand, adding knowledge through the model’s input is like our short-term memory, where the facts, details, and even the order of the wording are all fresh and available for recall. This latter scenario is better suited to situations where successful factual recall is the goal. And given how much more expensive fine-tuning can be, that makes RAG all the more important to consider.

There is a trade-off, though. While there are generally ways to feed all the data you have to a model for fine-tuning, inputs are limited by the model’s context window. This is an area that is being actively addressed. For example, early versions of GPT-3.5 had a 4,096-token context window, the equivalent of about five pages of text. When GPT-4 was released, the context window was expanded to 8,192 tokens (about 10 pages), and a GPT-4-32K variant offered a 32,768-token context window (about 40 pages). This issue is so important that OpenAI included the context window size in the model’s name. That is a strong indicator of how important the context window is!
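
If you want to check whether your text fits within a given context window, a tokenizer library makes this easy. Here is a small sketch using OpenAI’s tiktoken library; the 4,096-token default mirrors the early GPT-3.5 window mentioned above, and the sample text is a placeholder.

```python
# A quick context-window check using tiktoken (pip install tiktoken).
# Adjust max_tokens for your model's actual window.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # used by GPT-3.5/GPT-4

def fits_in_context(text: str, max_tokens: int = 4096) -> bool:
    n_tokens = len(encoding.encode(text))
    print(f"{n_tokens:,} tokens (limit {max_tokens:,})")
    return n_tokens <= max_tokens

fits_in_context("Your document text here...")  # placeholder input
```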

Interesting fact!

What about the latest Gemini 1.5 model? It has a 1-million-token context window, which equates to over 1,000 pages!

As context windows expand, another issue arises. Early models with expanded context windows were shown to lose many details, especially from the middle of the text. This issue is also being addressed. The Gemini 1.5 model, with its 1-million-token context window, has performed well in so-called needle-in-a-haystack tests, which check whether a model can recall a single planted detail from anywhere in the text it takes as input. Unfortunately, the model did not perform as well in multiple-needles-in-a-haystack tests, where several details must be recalled at once. Expect more effort in this area as context windows become larger, and keep this in mind if you need to work with large amounts of text at a time. A simple version of such a test is sketched below.
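
Here is a hedged, toy version of a single-needle test to illustrate the idea. The filler text, the needle, and the ask_llm callable are all placeholders; pass in a wrapper around your own model call to run it.

```python
# A toy single-needle test. FILLER, NEEDLE, and ask_llm are placeholders.
from typing import Callable

FILLER = "The meeting ran long and nothing of note was decided. " * 2000
NEEDLE = "The secret launch code is MAGENTA-42."

def needle_test(ask_llm: Callable[[str], str], depth: float) -> bool:
    """Plant the needle at a relative depth (0.0 = start, 1.0 = end)."""
    cut = int(len(FILLER) * depth)
    haystack = FILLER[:cut] + NEEDLE + FILLER[cut:]
    answer = ask_llm(haystack + "\n\nWhat is the secret launch code?")
    return "MAGENTA-42" in answer

# Sweep the needle through the context; failures often cluster mid-context:
# for d in (0.0, 0.25, 0.5, 0.75, 1.0):
#     print(d, needle_test(my_model_call, d))
```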

Note

It is important to note that token counts differ from word counts, as tokens include punctuation, symbols, numbers, and other text representations. How a compound term such as ice cream is tokenized depends on the tokenization scheme, and it can vary across LLMs, but most well-known LLMs (such as ChatGPT and Gemini) treat ice cream as two tokens. In certain NLP contexts, you might argue that it should be one token, on the grounds that a token should represent a useful semantic unit for processing, but that is not how these models work. You can verify this yourself, as the snippet below shows.
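
As a quick illustration (not from the book’s code), here is how you could inspect the tokenization with tiktoken, which provides the tokenizers used by OpenAI’s models; other LLMs’ tokenizers may split the same text differently.

```python
# Inspecting how "ice cream" is tokenized with an OpenAI tokenizer.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
tokens = encoding.encode("ice cream")

print(len(tokens))                             # 2
print([encoding.decode([t]) for t in tokens])  # ['ice', ' cream']
```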

Fine-tuning can also be quite expensive, depending on the environment and resources you have available. In recent years, fine-tuning costs have come down substantially thanks to new techniques such as representation fine-tuning (ReFT), LoRA-related techniques, and quantization. But in many RAG development efforts, fine-tuning is an additional cost on top of an already expensive RAG effort, so it is often treated as an optional, more expensive addition.
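
As a rough illustration of why LoRA-style techniques cut costs, the following sketch uses Hugging Face’s peft library to wrap a small model so that only a tiny fraction of its parameters are trainable. The base model and hyperparameters here are illustrative assumptions, not recommendations from the book.

```python
# A minimal LoRA sketch using Hugging Face's transformers and peft.
# gpt2 is chosen only because it is small enough to run anywhere.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=32,              # scaling applied to the updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()
# Prints something like: trainable params are well under 1% of the
# total, which is the source of the cost reduction mentioned above.
```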

Ultimately, when deciding between RAG and fine-tuning, consider your specific use case and requirements. RAG is generally superior for retrieving factual information that is not present in the LLM’s training data or is private. It allows you to dynamically integrate external knowledge without modifying the model’s weights. Fine-tuning, on the other hand, is better suited to teaching the model specialized tasks or adapting it to a specific domain. Keep in mind the limitations of context window sizes for RAG, and the potential for overfitting when fine-tuning on a specific dataset.
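
One way to summarize the trade-offs above is as a simple rule-of-thumb helper. This is just an illustrative condensation of the guidance in this section, not an official decision procedure.

```python
# An illustrative rule-of-thumb condensing this section's guidance.
def choose_approach(needs_private_or_fresh_facts: bool,
                    needs_new_skill_or_style: bool) -> str:
    if needs_private_or_fresh_facts and needs_new_skill_or_style:
        return "fine-tune for the skill, add RAG for the facts"
    if needs_private_or_fresh_facts:
        return "RAG: inject knowledge via the prompt, weights untouched"
    if needs_new_skill_or_style:
        return "fine-tuning: adjust weights on task-specific data"
    return "use the base model as-is"

print(choose_approach(needs_private_or_fresh_facts=True,
                      needs_new_skill_or_style=False))
```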

Now that we have defined what RAG is, particularly when compared to other approaches that use generative AI, let’s review the general architecture of RAG systems.
