Building AI Intensive Python Applications

Embedding models and vector databases – semantic long-term memory

In addition to the reasoning capabilities provided by LLMs, intelligent applications require semantic long-term memory for storing and retrieving information.

Semantic memory typically consists of two core components—AI vector embedding models and vector databases. Vector embedding models represent the semantic meaning of unstructured data, such as text or images, in large arrays of numbers. Vector databases efficiently store and retrieve these vectors to support semantic search and context retrieval. These components work together to enable the reasoning engine to access relevant context and information as needed.

Embedding models

Embedding models are AI models that map text and other data types, such as images and audio, into high-dimensional vector representations. These vector representations capture the semantic meaning of the input data, allowing for efficient similarity comparisons and semantic search, typically using cosine similarity as the similarity measure.

Embedding models encode semantic meaning into a machine-interpretable format. By representing similar concepts as nearby points in the vector space, embedding models let us measure the semantic similarity between pieces of unstructured data and perform semantic search across a large corpus.
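
To make this concrete, here is a minimal sketch of embedding a few sentences and comparing them with cosine similarity. It assumes the open source sentence-transformers library and the all-MiniLM-L6-v2 model, both illustrative choices rather than recommendations made by this chapter:

    # Minimal embedding-similarity sketch. Assumes `pip install sentence-transformers`;
    # the model name is an illustrative choice, not one prescribed by this chapter.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    sentences = [
        "How do I reset my password?",
        "I forgot my login credentials.",
        "What is the capital of France?",
    ]
    embeddings = model.encode(sentences)  # shape: (3, 384)

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine of the angle between two vectors; higher means more similar."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # The two password-related sentences score far closer to each other
    # than either does to the unrelated geography question.
    print(cosine_similarity(embeddings[0], embeddings[1]))
    print(cosine_similarity(embeddings[0], embeddings[2]))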

Pre-trained embedding models are widely available and can be fine-tuned for specific domains or use cases. Compared to LLMs, embedding models tend to be more affordable and can run on limited hardware, making them accessible to a wider range of applications.

Some common applications of embedding models include the following:

  • Semantic search and retrieval: Embedding models can be used as a component in larger AI systems to retrieve relevant context for LLMs, especially in RAG architectures. RAG is a particularly important use case for the intelligent applications discussed in this book and will be covered in more detail in Chapter 8, Implementing Vector Search in AI Applications.
  • Recommendation systems: By representing items and user preferences as embeddings, recommendation systems can identify similar items and generate personalized recommendations.
  • Clustering and topic modeling: Embedding models can help discover latent topics and themes in large datasets, which can be useful for analyzing user interactions with intelligent applications, such as identifying frequently asked questions in a chatbot (see the clustering sketch after this list).
  • Anomaly detection: By identifying outlier vectors that are semantically distant from the norm, embedding models can be used for anomaly detection in various domains.
  • Analyzing relationships between entities: Embedding models can uncover hidden relationships and connections between entities based on their semantic similarity.
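
As an illustration of the clustering use case above, the following sketch groups embedding vectors with k-means. It assumes scikit-learn, and the random array is a stand-in for real embeddings such as those produced by the earlier snippet:

    # Clustering sketch: k-means over embedding vectors. Assumes scikit-learn;
    # the random array stands in for real embeddings (e.g., chatbot questions).
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(seed=42)
    embeddings = rng.random((100, 384))  # placeholder for 100 real embedding vectors

    kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(embeddings)
    print(kmeans.labels_[:10])  # cluster assignments for the first 10 items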

You will explore the technical details and practical considerations of embedding models in Chapter 4, Embedding Models.

Vector databases

Vector databases are specialized data stores optimized for storing and searching high-dimensional vectors. They provide fast, approximate nearest neighbor (ANN) search capabilities that allow intelligent applications to quickly store and retrieve relevant information based on spatial proximity.

ANN search is necessary because performing exact similarity calculations against every vector in the database becomes computationally expensive as the database grows in size. Vector databases use algorithms, such as hierarchical navigable small worlds (HNSW), to efficiently find approximate nearest neighbors, making vector search feasible at scale.
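
To illustrate how ANN indexing works in practice, the sketch below builds a small HNSW index with the open source hnswlib library. This is a standalone ANN library used here purely for illustration; vector database products implement comparable indexes internally:

    # HNSW approximate-nearest-neighbor sketch. Assumes `pip install hnswlib`;
    # real vector databases build and manage similar indexes behind the scenes.
    import hnswlib
    import numpy as np

    dim = 384
    rng = np.random.default_rng(seed=0)
    vectors = rng.random((10_000, dim)).astype(np.float32)  # stand-in embeddings

    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=10_000, ef_construction=200, M=16)
    index.add_items(vectors, np.arange(10_000))
    index.set_ef(50)  # query-time trade-off between accuracy and speed

    query = rng.random((1, dim)).astype(np.float32)
    labels, distances = index.knn_query(query, k=5)  # approximate top-5 neighbors
    print(labels, distances)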

In addition to ANN search, vector databases typically support filtering and exact search on metadata associated with the vectors. The exact functionality and performance of these features vary across different vector database products.

Vector databases provide an intelligent application with low-latency retrieval of relevant information given a query. Because they search on the semantic meaning of content, vector databases align with the way LLMs reason about information, letting the application use the same unstructured data representation for long-term memory that it uses for reasoning.

In applications that use RAG, the vector database plays a crucial role. The application generates a query embedding, which is used to retrieve relevant context from the vector database. Multiple relevant chunks are then provided as context to the LLM, which uses this information to generate informed and relevant responses.
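
A minimal sketch of that flow follows, assuming the official OpenAI Python client with an API key in the environment; a plain Python list stands in for a real vector database, and the model names are illustrative assumptions rather than choices made by this chapter:

    # Minimal RAG sketch. Assumes `pip install openai` and OPENAI_API_KEY set;
    # a plain list stands in for a real vector database, and the model names
    # are illustrative assumptions.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(text: str) -> np.ndarray:
        resp = client.embeddings.create(model="text-embedding-3-small", input=text)
        return np.array(resp.data[0].embedding)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # 1. Embed and "store" some document chunks (a real app would use a vector DB).
    chunks = ["Our returns window is 30 days.", "Shipping takes 3-5 business days."]
    store = [(chunk, embed(chunk)) for chunk in chunks]

    # 2. Embed the user query and retrieve the most similar chunks.
    question = "How long do I have to return an item?"
    q_vec = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    context = "\n".join(chunk for chunk, _ in ranked[:2])

    # 3. Provide the retrieved context to the LLM to ground its answer.
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    print(answer.choices[0].message.content)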

You will learn about the technical details and practical considerations of vector databases in Chapter 5, Vector Databases.

Model hosting

To implement AI models in your intelligent application, you must host them on computers, either in a data center or in the cloud. This process is known as model hosting. Hosting AI models presents a different set of requirements compared to hosting traditional software: running AI models at scale requires powerful graphics processing units (GPUs) and a software environment configured to load and execute the model efficiently.

The key challenges in model hosting include high computational requirements and hardware costs, limited availability of GPU resources, complexity in managing and scaling the hosting infrastructure, and potential vendor lock-in or limited flexibility when using proprietary solutions. As a result, hardware and cost constraints must be factored into the application design process more than ever.

Self-hosting models

The term self-hosting models refers to the practice of deploying and running AI models, such as LLMs, on an organization’s own infrastructure and hardware resources. In this approach, the organization is responsible for setting up and maintaining the necessary computational resources, software environment, and infrastructure required to load and execute the models.
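
At its simplest, self-hosting can look like the sketch below, which loads an open model onto local hardware with the Hugging Face transformers library. The model name is an illustrative assumption, and a production deployment would add a serving layer, batching, and GPU management on top:

    # Minimal self-hosting sketch using Hugging Face transformers.
    # Assumes `pip install transformers torch`; the model name is illustrative.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # small open model for demo purposes
        device_map="auto",  # place weights on a GPU if available (requires `accelerate`)
    )

    output = generator("Explain vector databases in one sentence.", max_new_tokens=60)
    print(output[0]["generated_text"])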

Self-hosting AI models requires a significant upfront investment in specialized hardware, which can be cost-prohibitive for many organizations. Managing the model infrastructure also imposes an operational burden that requires ML expertise, which many software teams lack. This can divert the focus from the core application and business logic.

Scaling self-hosted models to ensure availability can be challenging, as models can be large and take time to load into memory. Organizations may need to provision significant excess capacity to handle peak loads. Additionally, maintaining and updating models is a complex task, as models can go stale over time and require retraining or fine-tuning. Given the pace of research in the field, new models and techniques constantly emerge, making it difficult for organizations to keep up.

Model hosting providers

The challenges associated with self-hosting have made model hosting providers a popular choice for intelligent application development.

Model hosting providers are cloud-based services that offer a platform for deploying, running, and managing AI models, such as LLMs, on their infrastructure. These providers handle the complexities of setting up, maintaining, and scaling the infrastructure required to load and execute the models.

Model hosting providers offer several benefits:

  • Outsourced hardware and infrastructure management: Model hosting providers handle provisioning, scaling, availability, security, and other infrastructure concerns, allowing application teams to focus on their core product.
  • Cost efficiency and flexible pricing: With model hosting providers, organizations pay only for what they use and can scale resources up and down as needed, reducing upfront investment.
  • Access to a wide range of models: Providers curate and host many state-of-the-art models, continuously integrating the latest research, and often layer extra features and optimizations on top of the raw models.
  • Support and expertise: Providers can offer consultation on model selection, prompt engineering, application architecture, and assistance with fine-tuning, data preparation, evaluation, and other aspects of AI development.
  • Rapid prototyping and experimentation: Model hosting providers enable developers to quickly test different models and approaches, adapting to new developments in the fast-moving AI/ML space.
  • Scalability and reliability: Providers build robust, highly available, and auto-scaling infrastructure to meet the demands of production-scale intelligent applications.

Examples of model hosting providers include services from model developers such as OpenAI, Anthropic, and Cohere, as well as cloud offerings such as Amazon Bedrock, Google Vertex AI, and Azure AI Studio.
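
In day-to-day development, calling a hosted model is typically just a few lines against the provider's API. The sketch below uses the Anthropic Python client, with the model name as an illustrative assumption:

    # Hosted-model sketch using the Anthropic Python client. Assumes
    # `pip install anthropic` and ANTHROPIC_API_KEY set in the environment;
    # the model name is an illustrative assumption.
    import anthropic

    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=200,
        messages=[{"role": "user", "content": "Summarize what a vector database does."}],
    )
    print(message.content[0].text)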

You have been reading a chapter from Building AI Intensive Python Applications (Packt, September 2024, ISBN-13: 9781836207252).