Factors in selecting a vectorization option
Selecting the right vectorization option is a crucial decision when building a RAG system. Key considerations include the quality of the embeddings for your specific application, the associated costs, network availability, the speed of embedding generation, and compatibility between embedding models. The options we covered above are far from exhaustive, and there are many others you can explore when choosing an embedding model for your specific needs. Let’s review these considerations.
Quality of the embedding
When considering the quality of your embeddings, you cannot rely on just the generic metrics published for each model. For example, OpenAI's models have been evaluated on the Massive Text Embedding Benchmark (MTEB): the 'text-embedding-ada-002' model scored 61.0%, whereas the 'text-embedding-3-large' model scored 64.6%. These metrics can be useful, especially when trying to home in on a model of a certain quality...
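One way to look beyond generic benchmark scores is to measure retrieval quality on a small labeled sample of your own data. The following is a minimal sketch of that idea, not a definitive evaluation method. Everything in it is an assumption made for illustration: the names `recall_at_k`, `toy_embed`, `VOCAB`, `DOCS`, and `LABELED_QUERIES` are hypothetical, and `toy_embed` is a simple word-count stand-in for whichever real embedding model you are comparing.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(
    embed: Callable[[str], Vector],
    docs: List[str],
    labeled_queries: List[Tuple[str, int]],
    k: int = 1,
) -> float:
    """Fraction of queries whose known-relevant document is in the top k."""
    doc_vecs = [embed(d) for d in docs]
    hits = 0
    for query, relevant_idx in labeled_queries:
        q_vec = embed(query)
        # Rank document indices from most to least similar to the query.
        ranked = sorted(
            range(len(docs)),
            key=lambda i: cosine(q_vec, doc_vecs[i]),
            reverse=True,
        )
        if relevant_idx in ranked[:k]:
            hits += 1
    return hits / len(labeled_queries)


# Hypothetical stand-in for a real embedding model: counts of a fixed
# vocabulary. Replace this with a call to the model under evaluation.
VOCAB = ["refund", "shipping", "password", "reset",
         "returned", "international", "orders", "account"]


def toy_embed(text: str) -> List[float]:
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


# Illustrative data only: each query is paired with the index of the
# document a human judged to be the correct match.
DOCS = [
    "refund policy for returned items",
    "shipping times for international orders",
    "reset your account password",
]
LABELED_QUERIES = [
    ("how do I reset my password", 2),
    ("refund for returned order", 0),
]

if __name__ == "__main__":
    score = recall_at_k(toy_embed, DOCS, LABELED_QUERIES, k=1)
    print(f"recall@1 = {score:.2f}")
```

Running the same `recall_at_k` call with each candidate model's embedding function in place of `toy_embed` gives a like-for-like comparison on your own content, which can then be weighed alongside the published benchmark numbers.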