Using the ingestion pipeline to increase efficiency
Starting with version 0.9, the LlamaIndex framework introduced a really neat concept: the so-called ingestion pipeline.
A simple analogy
An ingestion pipeline is a bit like a conveyor belt in a factory. In the context of LlamaIndex, it’s a setup that takes your raw data and gets it ready to be integrated into your RAG workflow. It does this by running the data through a series of steps – called transformations – one by one. The key idea is to break the ingestion process into a series of reusable transformations that are applied to input data. This helps standardize and customize ingestion flows for different use cases. Think of transformations as different workstations along this conveyor belt. As your raw data moves along, it hits different stations where something specific happens. It might be split into sentences at one station – that’s your SentenceSplitter – and have a title extracted...
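To make the conveyor-belt picture concrete, here is a minimal sketch in plain Python. It is not the LlamaIndex API: `Node`, `split_sentences`, `add_title`, and `run_pipeline` are hypothetical names invented for illustration, and the "title" is simply the first five words of the text rather than anything a real extractor would produce. The only point is to show transformations applied one after another, each receiving the output of the previous station.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """A chunk of text plus metadata (toy stand-in, not LlamaIndex's node class)."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


# A transformation takes a list of nodes and returns a new list of nodes.
Transformation = Callable[[List[Node]], List[Node]]


def split_sentences(nodes: List[Node]) -> List[Node]:
    """Station 1: turn each node into one node per sentence."""
    out: List[Node] = []
    for node in nodes:
        # Split on whitespace that follows '.', '!' or '?'.
        for sentence in re.split(r"(?<=[.!?])\s+", node.text.strip()):
            if sentence:
                out.append(Node(sentence, dict(node.metadata)))
    return out


def add_title(nodes: List[Node]) -> List[Node]:
    """Station 2: tag every node with a naive title (first five words of the first node)."""
    if not nodes:
        return nodes
    title = " ".join(nodes[0].text.split()[:5])
    return [Node(n.text, {**n.metadata, "title": title}) for n in nodes]


def run_pipeline(nodes: List[Node], transformations: List[Transformation]) -> List[Node]:
    """The conveyor belt: pass the nodes through each station in order."""
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


raw = [Node("Ingestion pipelines prepare raw data. Each step is reusable! Does order matter?")]
result = run_pipeline(raw, [split_sentences, add_title])

for node in result:
    print(node.metadata["title"], "|", node.text)
```

Because every station shares the same shape (nodes in, nodes out), stations can be reordered, removed, or reused in another pipeline without touching the rest, which is the property the analogy is meant to convey.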