Implementing the LLM Twin’s RAG inference pipeline
As explained at the beginning of this chapter, the RAG inference pipeline can be divided into three main parts: the retrieval module, the prompt creation, and the answer generation, which boils down to calling an LLM with the augmented prompt. In this section, our primary focus will be on implementing the retrieval module, where most of the code and logic reside. Afterward, we will look at how to build the final prompt using the retrieved context and the user query.
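Before diving into each component, here is a minimal sketch of how the three parts fit together. The helper names (retrieve_context, build_prompt, call_llm, rag_inference) are hypothetical placeholders chosen for illustration, not the book's actual interfaces; each is stubbed so the flow runs without a vector database or a deployed model.

```python
# Minimal sketch of the three-part RAG inference flow described above.
# All helpers are hypothetical stubs, not the real LLM Twin interfaces.


def retrieve_context(query: str, top_k: int = 3) -> list[str]:
    """Retrieval module: fetch the most relevant chunks for the query.
    In the real pipeline this queries the vector DB; here it is stubbed."""
    return ["<retrieved chunk 1>", "<retrieved chunk 2>", "<retrieved chunk 3>"][:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Prompt creation: inject the retrieved context and the user query
    into a single augmented prompt."""
    joined_context = "\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined_context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def call_llm(prompt: str) -> str:
    """Answer generation: call the LLM with the augmented prompt.
    Stubbed here, since the fine-tuned model is only deployed in Chapter 10."""
    return f"<LLM answer for a prompt of {len(prompt)} characters>"


def rag_inference(query: str) -> str:
    """End-to-end RAG workflow: retrieval -> prompt creation -> generation."""
    context = retrieve_context(query)
    prompt = build_prompt(query, context)
    return call_llm(prompt)


if __name__ == "__main__":
    print(rag_inference("What is an LLM Twin?"))
```

The rest of this section fleshes out each of these steps, starting with the retrieval module.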
Ultimately, we will examine how to combine the retrieval module, the prompt creation logic, and the LLM into an end-to-end RAG workflow. Unfortunately, we won't be able to test the LLM until we finish Chapter 10, as we haven't yet deployed our fine-tuned LLM Twin model to AWS SageMaker.
Thus, by the end of this section, you will know how to implement the RAG inference pipeline, even though you can only test it end to end after finishing Chapter 10. Now...