The ability to harness vast amounts of data is essential to build an AI-based application. Whether you're building a search engine, creating tabular question-answering systems, or developing a frequently asked questions (FAQ) search, you need efficient tools to manage and retrieve information. Vector databases have emerged as an invaluable resource for these tasks. In this article, we'll delve into the world of vector databases, focusing on Pinecone, a cloud-based option with high performance. We'll also discuss other noteworthy choices like Milvus and Qdrant. So, buckle up as we explore the world of vector databases and their applications.
Vector databases serve two primary purposes: facilitating semantic search and acting as a long-term memory for Large Language Models (LLMs). Semantic search is widely employed across various applications, including search engines, tabular question answering, and FAQ-based question answering. This search method relies on embedding-based similarity search. The core idea is to find the closest embedding that represents a query from a source document. Essentially, it's about matching the question to the answer by comparing their embeddings.
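The embedding-based matching described above can be sketched in a few lines of plain Python. The toy embeddings and the `cosine_similarity` helper below are illustrative assumptions, not output from any real embedding model:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real model output.
documents = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-info": [0.1, 0.8, 0.2],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "How do I get my money back?"

# Pick the document whose embedding is closest to the query.
best = max(documents, key=lambda doc_id: cosine_similarity(query, documents[doc_id]))
print(best)  # refund-policy is the closest match
```

A real system would compute these embeddings with a model and search over millions of vectors, which is exactly the part a vector database optimizes.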
Vector databases are pivotal in the context of LLMs, particularly in architectures like Retrieval-Augmented Generation (RAG). In RAG, a knowledge base is essential, and vector databases step in to store all the sources of truth. They convert this information into embeddings and perform similarity searches to retrieve the most relevant documents. In a nutshell, vector databases become the knowledge base for your LLM, acting as an external long-term memory.
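The retrieval step of RAG can be sketched with an in-memory stand-in for the vector database. The `knowledge_base` contents, the stand-in embeddings, and the prompt template are all illustrative assumptions:

```python
# A toy in-memory "vector database": id -> (embedding, source text).
knowledge_base = {
    "doc1": ([1.0, 0.0], "Pinecone is a cloud-based vector database."),
    "doc2": ([0.0, 1.0], "Milvus is built on the Faiss library."),
}

def retrieve(query_embedding, k=1):
    # Rank documents by dot-product similarity and keep the top k texts.
    scored = sorted(
        knowledge_base.items(),
        key=lambda item: sum(q * d for q, d in zip(query_embedding, item[1][0])),
        reverse=True,
    )
    return [text for _, (_, text) in scored[:k]]

# Augment the LLM prompt with the retrieved context.
query_embedding = [0.9, 0.1]  # stand-in embedding for "What is Pinecone?"
context = "\n".join(retrieve(query_embedding))
prompt = f"Answer using this context:\n{context}\n\nQuestion: What is Pinecone?"
```

In a production RAG pipeline, the dictionary and the `retrieve` function are replaced by the vector database's query API, and `prompt` is sent to the LLM.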
Now that we've established the importance of vector databases, let's explore your options. The market offers a variety of vector databases to suit different use cases. Among the prominent contenders, three stand out: Milvus, Pinecone, and Qdrant.
Milvus and Pinecone are renowned for their exceptional performance. Milvus, based on the Faiss library, offers a highly optimized vector similarity search. It's a powerhouse for demanding applications. Pinecone, on the other hand, is a cloud-based vector database designed for real-time similarity searches. Both of these options excel in speed and reliability, making them ideal for intensive use cases.
If you're on the lookout for a free and open-source vector storage database, Qdrant is a compelling choice. However, it's not as fast or scalable as Milvus and Pinecone. Qdrant is a valuable option when you need an economical solution without sacrificing core functionality. You can check another article about Qdrant here.
Scalability is a crucial factor when considering vector databases, especially for large-scale applications. Milvus stands out for its ability to scale horizontally to handle billions of vectors and thousands of queries per second. Pinecone, being cloud-based, automatically scales as your needs grow. Qdrant, as mentioned earlier, may not be the go-to option for extreme scalability.
In this article, we’ll dive deeper into Pinecone. We’ll discuss everything you need to know about Pinecone, starting from Pinecone’s architecture, how to set it up, the pricing, and several examples of how to use Pinecone. Without wasting any more time, let’s take a deep breath, make yourselves comfortable, and be ready to learn all you need to know about Pinecone!
Indexes are at the core of Pinecone's functionality. They serve as the repositories for your vector embeddings and metadata. Each project can have one or more indexes. The structure of an index is highly flexible and can be tailored to your specific use case and resource requirements.
Pods are the units of cloud resources that provide storage and compute for each index. They are equipped with vCPU, RAM, and disk space to handle the processing and storage needs of your data. The choice of pod type is crucial as it directly impacts the performance and scalability of your Pinecone index.
When it comes to selecting the appropriate pod type, it's essential to align your choice with your specific use case. Pinecone offers various pod types designed to cater to different resource needs. Whether you require substantial storage capacity, high computational power, or a balance between the two, there's a pod type that suits your requirements.
As your data grows, you have the flexibility to scale your storage capacity. This can be achieved by increasing the size and number of pods allocated to your index. Pinecone ensures that you have the necessary resources to manage and access your expanding dataset efficiently.
In addition to scaling storage, Pinecone allows you to control throughput as well. You can fine-tune the performance of your index by adding replicas. These replicas can help distribute the workload and handle increased query traffic, providing a seamless and responsive experience for your end users.
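Using the Python client (introduced below), scaling an existing index can be sketched as follows. This requires a live Pinecone account and index, and the pod type and replica count are illustrative assumptions, so treat it as a configuration sketch rather than runnable-as-is code:

```python
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

# Vertically scale by moving to a larger pod size within the same pod type
# (e.g. p1.x1 -> p1.x2), and add replicas to increase query throughput.
# Both values here are example choices, not recommendations.
pinecone.configure_index("hello_pinecone", replicas=2, pod_type="p1.x2")
```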
Pinecone is available not only as a Python client but also as a TypeScript/Node client. In this article, we’ll focus only on the Python client. Setting up Pinecone in Python is very straightforward; we just need to install it via pip, like the following.
pip3 install pinecone-client
Once it’s installed, we can directly exploit the power of Pinecone. However, remember that Pinecone is a commercial product, so we need to provide our API key when using it.
import pinecone
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
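Hard-coding API keys into source files is risky. A common pattern is to read the key from an environment variable instead; `PINECONE_API_KEY` is a conventional name here, not one the client requires:

```python
import os

# Fall back to a placeholder so this sketch runs even when the variable is unset.
api_key = os.environ.get("PINECONE_API_KEY", "YOUR_API_KEY")

# The key would then be passed to the client as usual:
# pinecone.init(api_key=api_key, environment="us-west1-gcp")
```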
First things first, we need to create an index in Pinecone to be able to exploit the power of a vector database. Remember that a vector database essentially stores embeddings, or vectors. Thus, configuring the dimension of the vectors and the distance metric to be used is important when creating an index. The command below creates an index named "hello_pinecone" for performing an approximate nearest-neighbor search using the cosine distance metric on 10-dimensional vectors. Creating an index usually takes around 60 seconds.
pinecone.create_index("hello_pinecone", dimension=10, metric="cosine")
Once the index is created, we can get all the information about the index by calling the `.describe_index()` method. This includes configuration information and the deployment status of the index. The operation requires the name of the index as a parameter. The response includes details about the database and its status.
index_description = pinecone.describe_index("hello_pinecone")
You can also check which indexes have been created by calling the `.list_indexes()` method.
active_indexes = pinecone.list_indexes()
Creating an index is just the first step before you can insert or query the data. The next step involves creating a client instance that targets the index you just created.
index = pinecone.Index("hello_pinecone")
Once the client instance is created, we can start inserting the data we want to store. To ingest vectors into your index, use the "upsert" operation, which inserts new records into the index or updates existing records if a record with the same ID is already present. Below is the command to upsert five 10-dimensional vectors into your index.
index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
])
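For larger datasets, upserts are typically sent in batches rather than one giant call (Pinecone's documentation suggests around 100 vectors per request). The `batched` helper below is a plain-Python sketch; only the commented-out `index.upsert` line would touch Pinecone itself:

```python
def batched(items, batch_size=100):
    # Yield successive fixed-size slices of a list of (id, vector) tuples.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Toy data: 250 four-dimensional vectors.
vectors = [(f"vec-{i}", [float(i)] * 4) for i in range(250)]

batches = list(batched(vectors, batch_size=100))
print(len(batches))  # 3 batches: 100 + 100 + 50

# With a live index, each batch would be upserted like this:
# for batch in batched(vectors):
#     index.upsert(batch)
```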
We can also check the statistics of our index by running the following command.
index.describe_index_stats()
# Returns:
# {'dimension': 10, 'index_fullness': 0.0, 'namespaces': {'': {'vector_count': 5}}}
Now, we can start interacting with our index. Let’s perform a query operation by giving a 10-dimensional vector as the query and returning the top-3 most similar vectors from the index. Note that Pinecone judges the similarity between the query vector and each record in the index using the similarity metric provided during index creation, which in this case is the cosine metric.
index.query(
    vector=[0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6],
    top_k=3,
    include_values=True
)
# Returns:
# {'matches': [{'id': 'E',
# 'score': 0.0,
# 'values': [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]},
# {'id': 'D',
# 'score': 0.0799999237,
# 'values': [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]},
# {'id': 'C',
# 'score': 0.0800000429,
# 'values': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]}],
# 'namespace': ''}
Finally, if you are done using the index, you can also delete the index by running the following command.
pinecone.delete_index("hello_pinecone")
Pinecone offers both a paid licensing option and a free-tier plan. The free-tier plan is an excellent starting point for those who want to dip their toes into Pinecone's capabilities without immediate financial commitment. However, it comes with certain limitations. Under this plan, users are restricted to creating one index and one project.
For users opting for the paid plans, hourly billing is an important aspect to consider. The billing is determined by the per-hour price of a pod, multiplied by the number of pods that your index uses. Essentially, you will be charged based on the resources your index consumes, and this billing structure ensures that you only pay for what you use.
It's important to note that, regardless of the activity on your indexes, you will be sent an invoice at the end of the month. This invoice is generated based on the total minutes your indexes have been running. Pinecone's billing approach ensures transparency and aligns with your actual resource consumption.
Congratulations on making it to this point! Throughout this article, you have learned all you need to know about Pinecone, starting from Pinecone’s architecture, how to set it up, the pricing, and several examples of how to use Pinecone. I wish you the best in your experiments creating your own vector database with Pinecone, and see you in the next article!
Louis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects.
Currently, Louis is an NLP Research Engineer at Yellow.ai, the world’s leading CX automation platform. Check out Louis’ website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.