Augmenting an LLM with external data
In the following recipes, we will learn how to get an LLM to answer questions about information it was not trained on, such as content created after the model's training cutoff. New content is added to the World Wide Web daily, and no single LLM can be retrained on it every day. Retrieval-Augmented Generation (RAG) frameworks let us augment an LLM with additional content, supplied as part of its input, that it can use when generating output for downstream tasks. This also saves cost, since we do not spend time and compute retraining a model on updated content.

As a basic introduction to RAG, we will augment an LLM with content from a few web pages and ask questions pertaining to the content of those pages. For this recipe, we will first load the LLM and ask it a few questions without providing it any context. We will then augment this LLM with additional context...
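The core mechanism described above, retrieving relevant text and prepending it to the prompt, can be sketched in a few lines. This is a minimal illustration, not the recipe's actual implementation: real RAG pipelines rank chunks with embedding similarity and call an LLM API, whereas here relevance is a naive word-overlap score and the generation step is left out. All function names are hypothetical.

```python
# Minimal sketch of the RAG idea: score candidate chunks against the
# question, keep the best ones, and build an augmented prompt for the LLM.
# A production system would use embedding-based retrieval instead of the
# naive word-overlap score used here.

def score(question: str, chunk: str) -> int:
    """Count how many question words also appear in the chunk (naive relevance)."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest overlap score."""
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:k]

def augment_prompt(question: str, chunks: list[str]) -> str:
    """Build the prompt the LLM would receive: retrieved context + question."""
    context = "\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy corpus standing in for text scraped from a few web pages.
chunks = [
    "The 2024 release added streaming support to the API.",
    "Bananas are rich in potassium.",
    "Streaming responses reduce perceived latency for users.",
]
prompt = augment_prompt("What did the 2024 release add to the API?", chunks)
print(prompt)
```

The augmented prompt would then be sent to the LLM in place of the bare question, which is exactly the augmentation step this recipe walks through with real web-page content.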