Estimating the potential cost of building and querying Indexes
In a similar manner to metadata extractors, Indexes pose issues related to costs and data privacy. That is because, as we have seen in this chapter, most Indexes rely on LLMs to some extent – during building and/or querying.

Repeatedly calling LLMs to process large volumes of text can quickly break your budget if you're not paying attention to your potential costs. For example, if you are building a TreeIndex or KeywordTableIndex from thousands of documents, those constant LLM invocations during Index construction will carry a significant cost. Embeddings can also rely on calls to external models; therefore, the VectorStoreIndex is another important source of costs. In my experience, prevention and prediction are the best ways to avoid nasty surprises and keep your expenses low.
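Before committing to a full build, it can help to run a quick back-of-the-envelope estimate. The sketch below is a minimal, self-contained example of that idea – the ~4-characters-per-token heuristic, the per-1K-token price, and the `passes_over_corpus` parameter are all illustrative assumptions, not real pricing; substitute your own model's tokenizer and rates for anything serious:

```python
# Rough cost estimator for LLM-backed Index construction.
# Both the token heuristic and the price are placeholder assumptions.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English."""
    return max(1, len(text) // 4)

def estimate_build_cost(documents: list[str],
                        price_per_1k_tokens: float = 0.0005,
                        passes_over_corpus: int = 1) -> float:
    """Estimate the LLM cost of processing every document.

    Tree- or keyword-style indexes may process the same text more than
    once (e.g., summarizing at several levels), hence passes_over_corpus.
    """
    total_tokens = sum(estimate_tokens(doc) for doc in documents)
    return total_tokens * passes_over_corpus * price_per_1k_tokens / 1000

# 1,000 mid-sized documents of ~3,600 characters each
docs = ["A sample document " * 200] * 1000
print(f"Estimated build cost: ${estimate_build_cost(docs):.2f}")
# → Estimated build cost: $0.45
```

Even a crude estimate like this tells you whether a build will cost cents or hundreds of dollars, and whether a cheaper Index type (or a local model) is worth considering before you press go.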
Just like with metadata extraction, I'd start by observing and applying some best practices:
- Use Indexes...