Organizing RAG in a pipeline
A RAG pipeline typically collects data and prepares it: cleaning it, chunking the documents, embedding the chunks, and storing them in a vector store dataset. The vector dataset is then queried to augment the user input of a generative AI model and produce an output. However, when a vector store is involved, it is highly recommended not to run this entire sequence in a single program. We should separate the process into at least three components:
- Data collection and preparation
- Data embedding and loading into the dataset of a vector store
- Querying the vectorized dataset to augment the input of a generative AI model to produce a response
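The three components above can be sketched as standalone functions, each of which could live in its own program. Everything in this sketch is an illustrative assumption rather than a specific library's API: the toy character-count embedding stands in for a real embedding model, and an in-memory list stands in for a vector store.

```python
# Sketch of the three RAG components as separate, independently runnable units.
# The embedding and "vector store" are toy stand-ins, not a real implementation.
import math

# Component 1: data collection and preparation (clean + chunk).
def prepare(documents, chunk_size=40):
    chunks = []
    for doc in documents:
        text = " ".join(doc.split())  # basic cleaning: normalize whitespace
        words = text.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

# Component 2: embedding and loading into the vector store.
# A toy bag-of-characters vector stands in for a real embedding model.
def embed(text, dim=26):
    vec = [0.0] * dim
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def load_store(chunks):
    # An in-memory list of (chunk, vector) pairs stands in for a vector store.
    return [(chunk, embed(chunk)) for chunk in chunks]

# Component 3: querying the store to augment the generative model's input.
def retrieve(store, query, k=1):
    q = embed(query)
    scored = sorted(store, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [chunk for chunk, _ in scored[:k]]

def augmented_prompt(store, query):
    context = "\n".join(retrieve(store, query))
    return f"Context:\n{context}\n\nQuestion: {query}"  # passed to the generative model

store = load_store(prepare(["RAG retrieves relevant chunks to ground generation."]))
print(augmented_prompt(store, "What does RAG retrieve?"))
```

Because each component only exchanges plain data (documents in, chunks out; chunks in, vectors out; a query in, a prompt out), each can be replaced or rescaled without touching the others.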
Let’s go through the main reasons for this component approach:
- Specialization, which allows each member of a team to do what they are best at, whether that is collecting and cleaning data, running embedding models, managing vector stores, or tweaking generative AI models.
- Scalability, making it easier to upgrade each component as the technology evolves and to scale components independently with specialized methods. Raw data, for example, can be stored on a different server from the cloud platform that hosts the vectorized dataset of embedded vectors.
- Parallel development, which allows each team to advance at its own pace without waiting for the others. One component can be improved continually without disrupting the processes of the other components.
- Maintenance is component-independent. One team can work on a component without affecting the other parts of the system. For example, if the RAG pipeline is in production, users can continue querying the vector store and running generative AI while a team fixes the data collection component.
- Security and privacy risks are minimized because each team works separately, with specific authorizations, access rights, and roles for each component.
As we can see, in real-life production environments and large-scale projects, it is rare for a single program or team to manage the entire end-to-end process. We are now ready to draw the blueprint of the RAG pipeline that we will build in Python in this chapter.