This article is an excerpt from the book Vector Search for Practitioners with Elastic by Bahaaldine Azarmi and Jeff Vestal. The book shows you how to optimize your search capabilities in Elastic by operationalizing and fine-tuning vector search, enhancing search relevance while improving overall search performance.
Vector similarity search plays a crucial role in image search. After images are transformed into vectors, a search query (also represented as a vector) is compared against the database of image vectors to find the most similar matches. This process is known as k-Nearest Neighbor (kNN) search, where “k” represents the number of similar items to retrieve.
Several algorithms can be used for kNN search, including brute-force search and more efficient methods such as the Hierarchical Navigable Small World (HNSW) algorithm (see Chapter 7, Next Generation of Observability Powered by Vectors, for a more in-depth discussion on HNSW). Brute-force search involves comparing the query vector with every vector in the database, which can be computationally expensive for large databases. On the other hand, HNSW is an optimized algorithm that can quickly find the nearest neighbors in a large-scale database, making it particularly useful for vector similarity search in image search systems.
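To make the brute-force approach concrete, here is a minimal Python sketch (not taken from the book's code; the data is randomly generated for illustration) that compares a query vector against every stored vector and returns the indices of the k nearest by Euclidean distance:

import numpy as np

def brute_force_knn(query, vectors, k=5):
    # Exact kNN: measure the distance from the query to every indexed vector
    distances = np.linalg.norm(vectors - query, axis=1)
    # Return the indices of the k smallest distances (the k nearest neighbors)
    return np.argsort(distances)[:k]

# Hypothetical example: 10,000 indexed 512-dimensional image vectors
vectors = np.random.rand(10_000, 512).astype("float32")
query = np.random.rand(512).astype("float32")
print(brute_force_knn(query, vectors, k=3))

Because every query touches every stored vector, this approach scales poorly, which is why graph-based structures such as HNSW are preferred for large collections.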
The tangible benefits of image search are observed across industries. Its flexibility and adaptability make it a tool of choice for enhancing user experiences, ensuring digital security, or even revolutionizing digital content interactions.
Applications of image search are varied and far-reaching. In e-commerce, for example, reverse image search allows customers to upload a photo of a product and find similar items for sale. In the field of digital forensics, image search can be used to find visually similar images across a database to detect illicit content. It is also used in the realm of social media for face recognition, image tagging, and content recommendation.
As we continue to generate and share more visual content, the need for effective and efficient image search technology will only grow. The combination of artificial intelligence, machine learning, and vector similarity search provides a powerful toolkit to meet this demand, powering a new generation of image search capabilities that can analyze and understand visual content.
Traditionally, image search engines use text-based metadata associated with images, such as the image’s filename, alt text, and surrounding text context, to understand the content of an image. This approach, however, is limited by the accuracy and completeness of the metadata, and it fails to analyze the actual visual content of the image itself.
Over time, with advancements in artificial intelligence and machine learning, more sophisticated methods of image search have been developed that can analyze the visual content of images directly. This technique, known as content-based image retrieval (CBIR), involves extracting feature vectors from images and using these vectors to find visually similar images.
Feature vectors are a numerical representation of an image’s visual content. They are generated by applying a feature extraction algorithm to the image. The specifics of the feature extraction process can vary, but in general, it involves analyzing the image’s colors, textures, and shapes. In recent years, convolutional neural networks (CNNs) have become a popular tool for feature extraction due to their ability to capture complex patterns in image data.
Once feature vectors have been extracted from a set of images, these vectors can be indexed in a database. When a new query image is submitted, its feature vector is compared to the indexed vectors, and the images with the most similar vectors are returned as the search results. The similarity between vectors is typically measured using distance metrics such as Euclidean distance or cosine similarity.
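As an illustration, both metrics can be computed in a few lines of Python (the vector values here are made up):

import numpy as np

def euclidean_distance(a, b):
    # Smaller values mean the vectors (and therefore the images) are more similar
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    # Values closer to 1 mean the vectors point in nearly the same direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = np.array([0.2, 0.8, 0.4])
v2 = np.array([0.25, 0.75, 0.5])
print(euclidean_distance(v1, v2), cosine_similarity(v1, v2))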
Despite the impressive capabilities of CBIR systems, there are several challenges in implementing them. For instance, interpreting and understanding the semantic meaning of images is a complex task due to the subjective nature of visual perception. Furthermore, the high dimensionality of image data can make the search process computationally expensive, particularly for large databases.
To address these challenges, approximate nearest neighbor (ANN) search algorithms, such as the HNSW graph, are often used to optimize the search process. These algorithms sacrifice a small amount of accuracy for a significant increase in search speed, making them a practical choice for large-scale image search applications.
With the advent of Elasticsearch’s dense vector field type, it is now possible to index and search high-dimensional vectors directly within an Elasticsearch cluster. This functionality, combined with an appropriate feature extraction model, provides a powerful toolset for building efficient and scalable image search systems.
In the following sections, we will delve into the details of image feature extraction, vector indexing, and search techniques. We will also demonstrate how to implement an image search system using Elasticsearch and a pre-trained CNN model for feature extraction. The overarching goal is to provide a comprehensive guide for building and optimizing image search systems using state-of-the-art technology.
Vector search is a transformative feature of Elasticsearch and other vector stores that makes it possible to search within complex data types such as images. Through this approach, images are converted into vectors that can be indexed, searched, and compared against each other, revolutionizing the way we retrieve and analyze image data. The same embedding approach applies to other media types as well. This section provides an in-depth overview of the vector search process with images, including image vectorization, vector indexing in Elasticsearch, kNN search, vector similarity metrics, and fine-tuning the kNN algorithm.
The first phase of the vector search process involves transforming the image data into a vector, a process known as image vectorization. Deep learning models, specifically CNNs, are typically employed for this task. CNNs are designed to understand and capture the intricate features of an image, such as color distribution, shapes, textures, and patterns. By processing an image through a sequence of convolutional, pooling, and fully connected layers, a CNN can represent the image as a high-dimensional vector. This vector encapsulates the key features of the image, serving as its numerical representation.
The output of one of the final layers of a pre-trained CNN, commonly referred to as an embedding or feature vector, is typically used for this purpose. Each dimension in this vector represents some learned feature of the image. For instance, one dimension might correspond to the presence of a particular color or texture pattern, and the values in the vector quantify the extent to which these features are present in the image.
Figure 1: Layers of a CNN model
As seen in the preceding diagram, these are the layers of a CNN model:
1. Input layer: accepts the raw pixel values of the image.
2. Convolutional layers: each layer extracts specific features such as edges, corners, and textures.
3. Activation layers (for example, ReLU): introduce non-linearity, allowing the network to learn and approximate more complex functions.
4. Pooling layers: reduce the dimensions of the feature maps through down-sampling to decrease the computational complexity.
5. Fully connected layers: combine the weights and biases from the previous layers so that classification can take place.
6. Output layer: produces a probability distribution over classes.
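As an illustration of extracting an embedding from such a network (this is a generic sketch using a torchvision ResNet, not the CLIP model used later in this chapter, and the image filename is hypothetical), you can drop the classification head of a pre-trained CNN and use the remaining layers as a feature extractor:

import torch
from torchvision import models
from PIL import Image

# Load a pre-trained ResNet and remove its final classification layer,
# so the network outputs the penultimate-layer embedding instead of class scores
weights = models.ResNet18_Weights.DEFAULT
resnet = models.resnet18(weights=weights)
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])
feature_extractor.eval()

preprocess = weights.transforms()  # the preprocessing the model was trained with

img = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    embedding = feature_extractor(preprocess(img).unsqueeze(0)).flatten()

print(embedding.shape)  # 512-dimensional feature vector for ResNet-18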
Once the image vectors have been obtained, the next step is to index these vectors in Elasticsearch for future searching. Elasticsearch provides a special field type, the dense_vector field, to handle the storage of these high-dimensional vectors.
A dense_vector field is defined as an array of numeric values, typically floating-point numbers, with a specified number of dimensions (dims). The maximum number of dimensions allowed for indexed vectors is currently 2,048, though this may be further increased in the future. It’s essential to note that each dense_vector field is single-valued, meaning that it is not possible to store multiple values in one such field.
In the context of image search, each image (now represented as a vector) is indexed into an Elasticsearch document. A document can hold a single image vector or several vectors, each stored in its own dense_vector field. Additionally, other relevant information or metadata about the image can be stored in other fields within the same document.
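For reference, a mapping for such an index might look like the following sketch (the index name, field names, and dimension count are illustrative; dims must match the output size of your embedding model):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

index_name = "image-index"
mappings = {
    "properties": {
        "filename": {"type": "keyword"},
        "image_vector": {
            "type": "dense_vector",
            "dims": 512,            # must match the embedding model's output size
            "index": True,          # enable kNN search on this field
            "similarity": "cosine"  # vector similarity metric used at search time
        }
    }
}
es.indices.create(index=index_name, mappings=mappings)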
The full example code can be found in the Jupyter Notebook available in the chapter 5 folder of this book’s GitHub repository at https://github.com/PacktPublishing/VectorSearch-for-Practitioners-with-Elastic/tree/main/chapter5, but we’ll discuss the relevant parts here.
First, we will initialize a pre-trained model using the SentenceTransformer library.
The clip-ViT-B-32-multilingual-v1 model is discussed in detail later in this chapter:
from sentence_transformers import SentenceTransformer

# Load a pre-trained CLIP-based model for generating embeddings
model = SentenceTransformer('clip-ViT-B-32-multilingual-v1')
Next, we will prepare the image transformation function:
from torchvision import transforms

# Preprocessing pipeline: resize, center-crop, convert to RGB,
# convert to a tensor, and normalize pixel values
transform = transforms.Compose([
transforms.Resize(224),
transforms.CenterCrop(224),
lambda image: image.convert("RGB"),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
transforms.Compose() chains these transformations into a single callable: the image is resized so its shorter side is 224 pixels, center-cropped to 224 x 224, converted to RGB, turned into a tensor, and normalized.
We can use the following code to apply the transform to an image file and then generate an image vector using the model. See the Python notebook for this chapter to run against actual image files:
from PIL import Image

# Load the image, apply the transform, and add a batch dimension
img = Image.open("image_file.jpg")
image = transform(img).unsqueeze(0)
# Generate the image's embedding (feature vector) with the model
image_vector = model.encode(image)
The vector and other associated data can then be indexed into Elasticsearch for use with kNN search:
# Create document
document = {'_index': index_name,
            '_source': {"filename": filename,
                        "image_vector": vector}}
See the complete code in the chapter 5 folder of this book’s GitHub repository.
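As a minimal sketch (assuming an Elasticsearch client named es and a list named documents containing dicts shaped like the one above), the documents can be sent to the cluster with the bulk helper:

from elasticsearch import helpers

# `documents` is a list of dicts, each with an '_index' and '_source' entry
helpers.bulk(es, documents)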
With vectors generated and indexed into Elasticsearch, we can move on to searching for similar images.
With the vectors now indexed in Elasticsearch, the next step is to make use of kNN search. You can refer back to Chapter 2, Getting Started with Vector Search in Elastic, for a full discussion on kNN and HNSW search.
As with text-based vector search, when performing vector search with images, we first need to convert our query image to a vector. The process is the same as we used to convert images to vectors at index time.
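For example, reusing the transform and model defined earlier in this chapter (the query image filename here is hypothetical):

# Vectorize the query image exactly as at index time
query_img = Image.open("query_image.jpg")
search_image_vector = model.encode(transform(query_img).unsqueeze(0))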
We convert the image to a vector and include that vector in the query_vector parameter of the knn search function:
knn = {
"field": "image_vector",
"query_vector": search_image_vector[0],
"k": 1,
"num_candidates": 10
}
Here, we specify the following:
- field: the dense_vector field holding the image vectors (image_vector)
- query_vector: the vector representation of the query image
- k: the number of nearest neighbors to return
- num_candidates: the number of candidates to examine per shard before selecting the top k results
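Putting it together, a minimal search sketch (assuming the es client and index_name from earlier) might look like this:

response = es.search(
    index=index_name,
    knn=knn,
    source=["filename"]  # return only the filename metadata for each hit
)

for hit in response["hits"]["hits"]:
    print(hit["_source"]["filename"], hit["_score"])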
With an understanding of how to convert an image to a vector representation and perform an approximate nearest neighbor search, let’s discuss some of the challenges.
While vector search with images offers powerful capabilities for image retrieval, it also comes with certain challenges and limitations. One of the main challenges is the high dimensionality of image vectors, which can lead to computational inefficiencies and difficulties in visualizing and interpreting the data.
Additionally, while pre-trained models for feature extraction can capture a wide range of features, they may not always align with the specific features that are relevant to a particular use case. This can lead to suboptimal search results. One potential solution, not limited to image search, is to use transfer learning to fine-tune the feature extraction model on a specific task, although this requires additional data and computational resources.
In conclusion, vector similarity search revolutionizes image retrieval by harnessing advanced algorithms and machine learning. From e-commerce to digital forensics, its impact is profound, enhancing user experiences and content discovery. Leveraging techniques like k-Nearest Neighbor search and Elasticsearch's dense vector field, image search becomes more efficient and scalable. Despite challenges, such as high dimensionality and feature alignment, ongoing advancements promise even greater insights into visual data. As technology evolves, so does our ability to navigate and understand the vast landscape of images, ensuring a future of enhanced digital interactions and insights.
Bahaaldine Azarmi, Global VP Customer Engineering at Elastic, guides companies as they leverage data architecture, distributed systems, machine learning, and generative AI. He leads the customer engineering team, focusing on cloud consumption, and is passionate about sharing knowledge to build and inspire a community skilled in AI.
Jeff Vestal has a rich background spanning over a decade in financial trading firms and extensive experience with Elasticsearch. He offers a unique blend of operational acumen, engineering skills, and machine learning expertise. As a Principal Customer Enterprise Architect, he excels at crafting innovative solutions, leveraging Elasticsearch's advanced search capabilities, machine learning features, and generative AI integrations, adeptly guiding users to transform complex data challenges into actionable insights.