Retrieval-augmented generation
RAG is an approach that combines the power of pre-trained language models with information retrieval to generate responses grounded in a large corpus of documents. This is particularly useful for producing informed responses that rely on external knowledge not contained in the model’s training dataset.
RAG involves three steps:
- Retrieval: Given an input query (for example, a question or a prompt), you use a system to retrieve relevant documents or passages from your data sources. This is typically done using embeddings.
- Augmentation: The retrieved documents are then used to augment the input prompt. Usually, this means creating a prompt that incorporates the data from the retrieval step and adds some prompt engineering.
- Generation: The augmented prompt is then fed into a generative model, usually GPT, which generates the output. Because the prompt contains relevant information from the retrieved documents, the model can generate a response grounded in that external knowledge rather than relying solely on what it learned during training. A minimal end-to-end sketch of these three steps follows.
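The sketch below walks through the three steps with the OpenAI Python client (v1.x): embeddings for retrieval, a context-stuffed prompt for augmentation, and a chat completion for generation. The tiny in-memory document list, the chosen model names, and the brute-force cosine-similarity search are illustrative assumptions; a real system would typically store embeddings in a vector database.

```python
# A minimal RAG sketch. The documents, model names, and in-memory
# similarity search are placeholders, not a production pipeline.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = [
    "The Eiffel Tower is 330 metres tall.",
    "The Great Wall of China is over 21,000 km long.",
    "Mount Everest's summit is 8,849 metres above sea level.",
]

def embed(texts):
    """Return one embedding vector per input text."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

# Embed the corpus once up front (a vector database would normally hold these).
doc_vectors = embed(documents)

def retrieve(query, k=1):
    """Step 1 - Retrieval: return the k documents most similar to the query."""
    q = embed([query])[0]
    scores = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query):
    # Step 2 - Augmentation: build a prompt that includes the retrieved context.
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    # Step 3 - Generation: the model answers from the augmented prompt.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(answer("How tall is the Eiffel Tower?"))
```

Asked "How tall is the Eiffel Tower?", the model sees the retrieved sentence about the tower in its prompt and can answer from it directly, which is the essential point of the augmentation step.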