Exploring the LLM Twin’s RAG feature pipeline architecture
Now that you have a solid intuition of RAG and how it works, we will continue exploring our particular LLM Twin use case. The goal is to provide a hands-on, end-to-end example that solidifies the theory presented in this chapter.
Any RAG system is split into two independent components, sketched in code after this list:
- The ingestion pipeline takes in raw data, then cleans, chunks, and embeds it and loads the result into a vector DB.
- The inference pipeline queries the vector DB for relevant context and ultimately generates an answer by leveraging an LLM.
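To make this split concrete, here is a minimal, self-contained Python sketch of both components. All names (`clean`, `chunk`, `embed`, `ingest`, `retrieve`, `answer`) are hypothetical, and the bag-of-words "embedding" with an in-memory list standing in for the vector DB is a deliberate simplification: it illustrates the flow of data through the two pipelines, not the actual implementation we build in this book.

```python
import math
from collections import Counter

# --- Ingestion pipeline (hypothetical sketch) ---

def clean(raw: str) -> str:
    # Normalize whitespace; real cleaning would also strip markup, etc.
    return " ".join(raw.split())

def chunk(text: str, size: int = 50) -> list[str]:
    # Naive fixed-size chunking by word count; production systems use
    # smarter, structure-aware splitters.
    words = text.split()
    return [" ".join(words[i : i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    # Stand-in embedding: a sparse bag-of-words vector. A real pipeline
    # would call an embedding model instead.
    return Counter(text.lower().split())

# Stand-in for a vector DB: a list of (embedding, chunk) pairs.
vector_db: list[tuple[Counter, str]] = []

def ingest(raw: str) -> None:
    # clean -> chunk -> embed -> load
    for piece in chunk(clean(raw)):
        vector_db.append((embed(piece), piece))

# --- Inference pipeline (hypothetical sketch) ---

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 3) -> list[str]:
    # Embed the query and return the k most similar chunks.
    q = embed(query)
    ranked = sorted(vector_db, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def answer(query: str) -> str:
    # Build a retrieval-augmented prompt; in a real system, this prompt
    # would be sent to an LLM to generate the final answer.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Calling `ingest()` on each raw document populates the store once, offline; `answer()` can then be called independently at query time, which is precisely why the two pipelines can be deployed and scaled as separate components.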
In this chapter, we will focus on implementing the RAG ingestion pipeline, and in Chapter 9, we will continue developing the inference pipeline.
With that in mind, let’s have a quick refresher on the problem we are trying to solve and where we get our raw data. Remember that we are building an end-to-end ML system. Thus, all the components talk to each other through...