
Hands-On Vector Similarity Search with Milvus

  • 14 min read
  • 21 Aug 2023


Introduction

In the realm of AI and machine learning, effective management of vast high-dimensional vector data is critical. Milvus, an open-source vector database, tackles this challenge using advanced indexing for swift similarity search and analytics, catering to AI-driven applications.

Milvus operates on vectorization and quantization, converting complex raw data into streamlined high-dimensional vectors for efficient indexing and querying. Its scope spans recommendation, image recognition, natural language processing, and bioinformatics, boosting result precision and overall efficiency.

Milvus impresses not only with its capabilities but also with its design flexibility, supporting diverse object storage backends such as MinIO, Ceph, AWS S3, and Google Cloud Storage, alongside etcd for metadata storage.

Deploying Milvus locally becomes straightforward with Docker Compose, which manages multi-container Docker applications and is well suited to Milvus' distributed architecture.

The next section details deploying Milvus locally via Docker Compose. The simplicity of this approach underscores Milvus' user-centric design, delivering robust capabilities within an accessible framework. Let's get started.

Standalone Milvus with Docker Compose

Setting up a local instance of Milvus involves a multi-service architecture that consists of the Milvus server, metadata storage, and object storage server. Docker Compose provides an ideal environment to manage such a configuration in a convenient and efficient way.

The Docker Compose file for deploying Milvus locally consists of three services: etcd, minio, and milvus itself. etcd provides metadata storage, minio functions as the object storage server, and milvus handles vector data processing and search. By specifying service dependencies and environment variables, we can establish seamless communication between these components. The milvus, etcd, and minio services each run in their own container, keeping the components isolated from one another and enhancing security.

To launch the Milvus application, all you need to do is execute the Docker Compose file. Docker Compose manages the initialization sequence based on service dependencies and launches the entire stack with a single command. The following docker-compose.yml specifies all of the aforementioned components:

version: '3'
services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls=http://0.0.0.0:2379 --data-dir /etcd
 
  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2022-03-17T06-34-49Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "<http://localhost:9000/minio/health/live>"]
      interval: 30s
      timeout: 20s
      retries: 3
 
  milvus:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.3.0-beta
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

After we have defined the docker-compose file, we can deploy the services by running docker compose up -d. All three services use prebuilt images, so no separate build step is required; Docker Compose pulls the images and starts the containers in dependency order.
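Once the containers are running, it is worth verifying that the server is reachable before moving on. Here is a minimal sketch, assuming the default port from the compose file above and that the pymilvus client (installed in the next section) is available:

from pymilvus import connections

try:
    # Attempt to open a connection to the standalone Milvus instance
    connections.connect("default", host="localhost", port="19530")
    print("Milvus is up and reachable")
except Exception as exc:
    # The server can take a short while to become ready after docker compose up
    print(f"Could not connect to Milvus: {exc}")

If the connection fails immediately after startup, give the containers a few seconds and retry.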

In the next section, we'll move on to a practical example — creating sentence embeddings. This process leverages Transformer models to convert sentences into high-dimensional vectors. These embeddings capture the semantic essence of the sentences and serve as an excellent demonstration of the sort of data that can be stored and processed with Milvus.

Creating sentence embeddings

Creating sentence embeddings involves a few steps: preparing your environment, importing the necessary libraries, and finally generating and processing the embeddings. We'll walk through each step in this section, assuming the code is executed in a Python environment that can reach the running Milvus server.

First, let’s start with the requirements.txt file:

transformers==4.25.1
pymilvus==2.1.0
torch==2.0.1
protobuf==3.18.0

Install the dependencies with pip install -r requirements.txt. Now let's import the packages:

import numpy as np
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from pymilvus import (
    connections,
    utility,
    FieldSchema, CollectionSchema, DataType,
    Collection,
)

Here, we're importing all the necessary libraries for our task. numpy and torch are used for mathematical operations and transformations, transformers is for language model-related tasks, and pymilvus is for interacting with the Milvus server.

This Python code block sets up the transformer model we will be using and lists the sentences for which we will generate embeddings. We first specify a model checkpoint ("sentence-transformers/all-MiniLM-L6-v2") that will serve as our base model for sentence embeddings. We then define a list of sentences to generate embeddings for. To facilitate our task, we initialize a tokenizer and model using the model checkpoint. The tokenizer will convert our sentences into tokens suitable for the model, and the model will use these tokens to generate embeddings:

# Transformer model checkpoint
model_ckpt = "sentence-transformers/all-MiniLM-L6-v2"
 
# Sentences for which we will compute embeddings
sentences = [
    "I took my dog for a walk",
    "Today is going to rain",
    "I took my cat for a walk",
]
 
# Initialize tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModel.from_pretrained(model_ckpt)

The all-MiniLM-L6-v2 checkpoint is a compact sentence-transformers model that maps sentences to 384-dimensional vectors, which keeps both computation and storage requirements modest while still capturing sentence semantics.

The model produces one embedding per token, but we need a single vector per sentence. To aggregate the token embeddings into sentence-level embeddings, we'll use a mean pooling operation; the next part of the guide defines a function to accomplish this.

Mean Pooling Function Definition

This function is used to aggregate the token embeddings into sentence embeddings. The token embeddings and the attention mask (which indicates which tokens are not padding and should be considered for pooling) are passed as inputs to this function. The function performs a weighted average of the token embeddings according to the attention mask and returns the aggregated sentence embeddings:

# Mean pooling function to aggregate token embeddings into sentence embeddings
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output.last_hidden_state
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    )
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )

Note the torch.clamp call: it guards against division by zero for inputs whose attention mask would otherwise sum to zero.
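To see the function in action, here is a small sanity check on dummy tensors (the shapes are arbitrary and purely illustrative; the real model_output comes from the transformer below):

import torch
from types import SimpleNamespace

# Dummy batch: 2 sequences, 4 tokens each, hidden size 8 (illustrative shapes only)
dummy_output = SimpleNamespace(last_hidden_state=torch.randn(2, 4, 8))
# The second sequence ends in a padding token (mask value 0), which pooling ignores
dummy_mask = torch.tensor([[1, 1, 1, 1], [1, 1, 1, 0]])

pooled = mean_pooling(dummy_output, dummy_mask)
print(pooled.shape)  # torch.Size([2, 8]) -- one vector per sequence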

Generating Sentence Embeddings

This code snippet first tokenizes the sentences, padding and truncating them as necessary. We then use the transformer model to generate token embeddings. These token embeddings are pooled using the previously defined mean pooling function to create sentence embeddings. The embeddings are normalized to ensure consistency and finally transformed into Python lists to make them compatible with Milvus:

# Tokenize the sentences and compute their embeddings
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
 
with torch.no_grad():
    model_output = model(**encoded_input)
 
sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
 
# Normalize the embeddings
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
 
# Convert the sentence embeddings into a format suitable for Milvus
embeddings = sentence_embeddings.numpy().tolist()

In this section, we used the transformer model to tokenize the sentences and generate token embeddings, pooled the token embeddings into sentence embeddings with the function defined above, normalized them, and converted them to plain Python lists, a format suitable for insertion into Milvus.
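Because the embeddings are L2-normalized, the L2 distance that Milvus will use for search relates directly to cosine similarity: for unit vectors, the squared L2 distance equals 2 minus twice the cosine similarity. As a quick illustrative check (a sketch reusing the sentences and embeddings variables from above), we can compute the pairwise distances ourselves; the two "walk" sentences should come out closest:

import numpy as np

vecs = np.array(embeddings)  # shape: (3, embedding_dim)

# Pairwise squared L2 distances; for unit vectors this equals 2 - 2 * cosine similarity
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        dist = np.sum((vecs[i] - vecs[j]) ** 2)
        print(f"{sentences[i]!r} vs {sentences[j]!r}: squared L2 = {dist:.4f}")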

Inserting vector embeddings into Milvus

We're now ready to interact with Milvus. In this section, we will connect to our locally deployed Milvus server, define a schema describing the structure of our data, and create a collection in the Milvus database to store our sentence embeddings:

# Establish a connection to the Milvus server
connections.connect("default", host="localhost", port="19530")
 
# Define the schema for our collection
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="sentences", dtype=DataType.VARCHAR, is_primary=False, description="The actual sentences",
                max_length=256),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, is_primary=False, description="The sentence embeddings",
                dim=sentence_embeddings.size()[1]) 
]
 
schema = CollectionSchema(fields, "A collection to store sentence embeddings")

We establish a connection to the Milvus server and then define the schema for our collection in Milvus. The schema includes a primary key field, a field for the sentences, and a field for the sentence embeddings.

With our connection established and schema defined, we can now create our collection, insert our data, and build an index to enable efficient search operations.
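One practical note: if you re-run the script, the collection from a previous run will still exist, and inserting again would add duplicate entities. A small guard (a sketch, assuming the collection name "sentence_embeddings" used below) keeps runs reproducible:

# Drop any leftover collection from a previous run before creating a fresh one
if utility.has_collection("sentence_embeddings"):
    utility.drop_collection("sentence_embeddings")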

Create Collection, Insert Data, and Create Index

In this snippet, we first create the collection in Milvus using the previously defined schema. We then organize our data to match our collection's schema and insert it into our collection. After the data is inserted, we create an index on the embeddings to optimize search operations. Finally, we print the number of entities in the collection to confirm the insertion was successful:

# Create the collection in Milvus
sentence_embeddings_collection = Collection("sentence_embeddings", schema)
 
# Organize our data to match our collection's schema
entities = [
    sentences,  # The actual sentences
    embeddings,  # The sentence embeddings
]
 
# Insert our data into the collection
insert_result = sentence_embeddings_collection.insert(entities)
 
# Create an index to make future search queries faster
index = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128},
}
 
sentence_embeddings_collection.create_index("embeddings", index)
 
print(f"Number of entities in Milvus: {sentence_embeddings_collection.num_entities}")

We create a collection in Milvus using the previously defined schema, organize our data (the sentences and their embeddings) to match it, and insert the data into the collection. Primary keys are generated automatically (auto_id=True), so we don't need to supply them. Finally, we print the number of entities in the collection to confirm the insertion succeeded.


This way, the sentences and their corresponding embeddings are stored in a Milvus collection, ready to be used for similarity searches or other tasks.

Now that we've stored our embeddings in Milvus, let's make use of them. We will search for similar vectors in our collection based on similarity to sample vectors.

Search Based on Vector Similarity

First, we load the data in our collection into memory; Milvus requires this before conducting a search or a query:

# Load the data into memory
sentence_embeddings_collection.load()

With the data loaded, we set the search parameters, specifying the metric used to calculate similarity (L2 distance in this case) and nprobe, the number of clusters to examine during the search operation. The search is then performed, and the results are printed out:

# Vectors to search
vectors_to_search = embeddings[-2:]
 
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 10},
}
 
# Perform the search
result = sentence_embeddings_collection.search(vectors_to_search, "embeddings", search_params, limit=3, output_fields=["sentences"])
 
# Print the search results
for hits in result:
    for hit in hits:
        print(f"hit: {hit}, sentence field: {hit.entity.get('sentences')}")

Here, we use the last two embeddings in our list as query vectors. For each query, the top three matches (limit=3) are returned, and the sentences corresponding to these matches are printed out.
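Each hit also exposes its primary key and distance, which is useful for ranking or thresholding results. A small variation of the loop above (a sketch using the same result object):

# Inspect the id and L2 distance of each hit alongside the matched sentence
for hits in result:
    for hit in hits:
        print(f"id={hit.id}, distance={hit.distance:.4f}, sentence={hit.entity.get('sentences')!r}")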


Once we're done with our data, it's a good practice to clean up. In this section, we'll explore how to delete entities from our collection using their primary keys.

Delete Entities by Primary Key

This code first gets the primary keys of the entities that we want to delete. We then query the collection before the deletion operation to show the entities that will be deleted. The deletion operation is performed, and the same query is run after the deletion operation to confirm that the entities have been deleted:

# Get the primary keys of the entities we want to delete
ids = insert_result.primary_keys
expr = f'pk in [{ids[0]}, {ids[1]}]'
 
# Query before deletion
result = sentence_embeddings_collection.query(expr=expr, output_fields=["sentences", "embeddings"])
print(f"Query before delete by expr=`{expr}` -> result: \\\\n-{result[0]}\\\\n-{result[1]}\\\\n")
 
# Delete entities
sentence_embeddings_collection.delete(expr)
 
# Query after deletion
result = sentence_embeddings_collection.query(expr=expr, output_fields=["sentences", "embeddings"])
print(f"Query after delete by expr=`{expr}` -> result: {result}\\\\n")

Here, we're deleting the entities corresponding to the first two primary keys in our collection. Before and after the deletion, we run the same query to confirm the result of the deletion operation.


Finally, we drop the entire collection from the Milvus server:

# Drop the collection
utility.drop_collection("sentence_embeddings")

Dropping the collection removes its data, schema, and index from the server, leaving the deployment clean for future experiments.
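Finally, if the script is part of a larger application, it is also tidy to close the client connection once you are done (a minimal sketch, matching the "default" alias used when connecting):

# Close the connection opened at the start of the session
connections.disconnect("default")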

Conclusion

Congratulations on completing this hands-on tutorial with Milvus! You've learned how to harness the power of an open-source vector database that simplifies and accelerates AI and ML applications. Throughout this journey, you set up Milvus locally using Docker Compose, transformed sentences into high-dimensional embeddings, and conducted vector similarity searches for practical use cases.

Milvus' advanced indexing techniques have empowered you to efficiently store, search, and analyze large volumes of vector data. Its user-friendly design and seamless integration capabilities ensure that you can leverage its powerful features without unnecessary complexity.

As you continue exploring Milvus, you'll uncover even more possibilities for its application in diverse fields, such as recommendation systems, image recognition, and natural language processing. The high-performance similarity search and analytics offered by Milvus open doors to cutting-edge AI-driven solutions.

With your newfound expertise in Milvus, you are equipped to embark on your own AI adventures, leveraging the potential of vector databases to tackle real-world challenges. Continue experimenting, innovating, and building AI-driven applications that push the boundaries of what's possible. Happy coding!

Author Bio:

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst & Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, founded startups, and later earned a Master's degree from the Faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.
