Storage efficiency strategies
As your production vector search dataset grows, so do the resources required to store the vectors and search through them in a timely fashion. This section discusses several strategies for reducing those resources. Each strategy has trade-offs and should be carefully evaluated and thoroughly tested before being put into production.
Reducing dimensionality
Reducing dimensionality means transforming high-dimensional data into a lower-dimensional representation. It is often used to mitigate the challenges that arise when working with high-dimensional data, such as the curse of dimensionality (https://en.wikipedia.org/wiki/Curse_of_dimensionality). Dimensionality reduction techniques, such as Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE), can help improve the efficiency and effectiveness of kNN vector search, since lower-dimensional vectors take less space to store and are cheaper to compare. However, there are advantages...
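As a rough illustration of the idea, the following is a minimal sketch of PCA written with NumPy only. It is not taken from any particular search engine's tooling: the function names `fit_pca` and `project`, the synthetic data, and the choice of 64 input dimensions reduced to 8 are all assumptions made for the example. The sketch fits the projection on a sample of vectors, reports how much variance the kept components retain, and applies the same projection to a query vector.

```python
import numpy as np


def fit_pca(vectors, n_components):
    """Fit PCA on row vectors; return the mean, the top principal
    directions, and the fraction of variance those directions retain."""
    mean = vectors.mean(axis=0)
    centered = vectors - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    retained = variance[:n_components].sum() / variance.sum()
    return mean, vt[:n_components], retained


def project(vectors, mean, components):
    """Map vectors into the reduced space defined by fit_pca."""
    return (vectors - mean) @ components.T


# Hypothetical data: 1000 vectors of 64 dimensions that mostly lie
# in an 8-dimensional subspace, plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 8))
mixing = rng.normal(size=(8, 64))
vectors = latent @ mixing + 0.01 * rng.normal(size=(1000, 64))

mean, components, retained = fit_pca(vectors, n_components=8)
reduced = project(vectors, mean, components)

# A query must go through the same projection as the indexed vectors.
query = vectors[:1]
reduced_query = project(query, mean, components)

print(reduced.shape, reduced_query.shape, round(float(retained), 4))
```

Real embeddings rarely compress this cleanly, so the retained variance and, more importantly, the search recall on your own queries should be measured before adopting a reduced dimension. Queries have to be projected with the same mean and components that were used for the indexed vectors.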