
Your (soon-to-be) intelligent app

With LLMs, embedding models, vector databases, and model hosting, you have the key building blocks for creating intelligent applications. While the specific architecture will vary depending on your use case, a common pattern emerges:

  • LLMs for reasoning and generation
  • Embeddings and vector search for retrieval and memory
  • Model hosting to serve these components at scale

This AI stack is integrated with traditional application components, such as backend services, APIs, frontend user interfaces, databases, and data pipelines. Additionally, intelligent applications often include components for AI-specific concerns, such as prompt management and optimization, data preparation and embedding generation, and AI safety, testing, and monitoring.

The rest of this section walks through an example architecture for a RAG-powered chatbot, showcasing how these components work together. The subsequent chapters will dive deeper into the end-to-end process of building production-grade intelligent applications.

Sample application – RAG chatbot

Consider a simple chatbot application that leverages RAG to let users chat with a set of documentation.

There are seven key components of this application:

  • Chatbot UI: A website with a simple chatbot UI that communicates with the web server
  • Web server: A Python Flask server to manage conversations between the user and the LLM
  • Data ingestion extract, transform, load (ETL) pipeline: A Python script that ingests data from the data sources
  • Embedding model: The OpenAI text-embedding-3-small model, hosted by OpenAI
  • LLM: The OpenAI gpt-4-turbo model, hosted by OpenAI
  • Vector store: MongoDB Atlas Vector Search
  • MongoDB Atlas: A database-as-a-service for persisting conversations
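
To make the component list concrete, the following is a minimal sketch of the web server component, assuming Flask and a hypothetical `answer` helper (a version of which appears in the chat-flow sketch later in this section); the route and payload shape are illustrative, not prescribed by the sample application:

```python
# Minimal sketch of the Flask web server component. The answer()
# helper and the rag_logic module are hypothetical placeholders for
# the RAG logic shown later in this section.
from flask import Flask, jsonify, request

from rag_logic import answer  # hypothetical module wrapping the RAG chat flow

app = Flask(__name__)

@app.route("/chat", methods=["POST"])
def chat():
    # The web UI posts the user's message and a conversation ID.
    payload = request.get_json()
    reply = answer(payload["message"], payload["conversation_id"])
    return jsonify({"reply": reply})

if __name__ == "__main__":
    app.run(port=5000)
```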

Note

This simple example application does not include evaluation or observability modules.

In this architecture, there are two key data flows:

  • Chat interaction: The user converses with the chatbot, which uses RAG to ground its responses
  • Data ingestion: Bringing data from its original sources into the vector database

In the chat interaction, the chatbot UI communicates with the chatbot web server, which in turn interacts with the LLM, embedding model, and vector store. This occurs for every message that the user sends to the chatbot. Figure 2.1 shows the data flow for the chatbot application:

Figure 2.1: An example of a basic RAG chatbot conversation data flow

The data flow illustrated in Figure 2.1 can be described as follows:

  1. The user sends a message to the chatbot from the web UI.
  2. The web UI creates a request to the server with the user’s message.
  3. The web server sends a request to the embedding model API to create a vector embedding for the user query. The embedding model API responds with the corresponding vector embedding.
  4. The web server performs a vector search in the vector database using the query vector embedding. The vector store responds with the matching vector search results.
  5. The server constructs a message that the LLM will respond to. This message consists of a system prompt and a new message that includes the user’s original message and the content retrieved from the vector search. The LLM then responds to the user message.
  6. The server saves the conversation state to the database.
  7. The server returns the LLM-generated message to the user in a response to the original request from the web UI.
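
The flow above can be sketched in Python. The following is a minimal, illustrative implementation of steps 3 through 7, assuming the `openai` and `pymongo` client libraries, a pre-built Atlas Vector Search index named `vector_index`, and hypothetical collection and field names; error handling and multi-turn conversation history are omitted:

```python
# Minimal sketch of the chat request flow (steps 3-7). The collection
# and field names are illustrative, not prescribed by the book.
from openai import OpenAI
from pymongo import MongoClient

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
mongo_client = MongoClient("<YOUR_ATLAS_CONNECTION_STRING>")
db = mongo_client["chatbot"]

def answer(user_message: str, conversation_id: str) -> str:
    # Step 3: embed the user query.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=user_message
    ).data[0].embedding

    # Step 4: vector search for the most relevant documentation chunks.
    results = db["chunks"].aggregate([
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": embedding,
                "numCandidates": 100,
                "limit": 5,
            }
        },
        {"$project": {"text": 1, "_id": 0}},
    ])
    context = "\n\n".join(doc["text"] for doc in results)

    # Step 5: ask the LLM to answer using the retrieved context.
    completion = openai_client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_message}"},
        ],
    )
    reply = completion.choices[0].message.content

    # Step 6: persist the conversation turn.
    db["conversations"].update_one(
        {"_id": conversation_id},
        {"$push": {"messages": {"user": user_message, "assistant": reply}}},
        upsert=True,
    )

    # Step 7: return the generated reply to the web UI.
    return reply
```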

A data ingestion pipeline prepares and enriches data, generates embeddings using the embedding model, and populates the vector store and traditional database. This pipeline runs as a batch job every 24 hours. Figure 2.2 shows an example of a data ingestion pipeline:

Figure 2.2: An example of a RAG chatbot data ingestion ETL pipeline

Let’s look at the data flow shown in Figure 2.2:

  1. The data ingestion ETL pipeline pulls in data from various data sources.
  2. The ETL pipeline cleans the data into a consistent format and breaks it into chunks.
  3. The ETL pipeline calls the embedding model API to generate a vector embedding for each data chunk.
  4. The ETL pipeline stores the chunks along with their vector embeddings in a vector database.
  5. The vector database indexes the embeddings for use with vector search.
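
A minimal sketch of this pipeline, under the same assumptions as the chat-flow sketch (the `openai` and `pymongo` libraries, hypothetical collection and field names), might look like this; the fixed-size chunking is a deliberate simplification:

```python
# Minimal sketch of the ingestion ETL steps. The loader, chunking
# strategy, and collection name are illustrative placeholders.
from openai import OpenAI
from pymongo import MongoClient

openai_client = OpenAI()
chunks_collection = MongoClient("<YOUR_ATLAS_CONNECTION_STRING>")["chatbot"]["chunks"]

def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    # Step 2 (simplified): fixed-size character chunks with overlap.
    return [text[i : i + size] for i in range(0, len(text), size - overlap)]

def ingest(documents: list[str]) -> None:
    for doc in documents:  # Step 1: documents pulled from your sources
        chunks = chunk_text(doc)
        # Step 3: one embedding per chunk (the API accepts batched input).
        response = openai_client.embeddings.create(
            model="text-embedding-3-small", input=chunks
        )
        # Step 4: store each chunk alongside its embedding.
        chunks_collection.insert_many(
            [
                {"text": chunk, "embedding": item.embedding}
                for chunk, item in zip(chunks, response.data)
            ]
        )
    # Step 5 happens inside Atlas: the pre-configured vector search
    # index picks up the new embeddings automatically.
```

Real pipelines typically chunk on semantic boundaries (headings, paragraphs) and attach metadata to each chunk, but the shape of the flow is the same.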

While a simple architecture like this can be used to build compelling prototypes, transitioning from prototype to production and continuously iterating on the application requires addressing many additional considerations:

  • Data ingestion strategy: Acquiring, cleaning, and preparing the data that will be ingested into the vector store or database for retrieval.
  • Advanced retrieval patterns: Incorporating techniques for efficient and accurate retrieval of relevant information from the vector store or database, such as combining semantic search with traditional filtering, AI-based reranking, and query mutation (a sketch of filtered semantic search follows this list).
  • Evaluation and testing: Adding modules for evaluating model outputs, testing end-to-end application flows, and monitoring for potential biases or errors.
  • Scalability and performance optimization: Implementing optimizations such as caching, load balancing, and efficient resource management to handle increasing workloads and ensure consistent responsiveness.
  • Security and privacy: Securing the application so that users can only interact with data they have permission to access, and ensuring that user data is handled in accordance with relevant policies, standards, and laws.
  • User experience and interaction design: Incorporating new generative AI interfaces and interaction patterns, such as streaming responses, answer confidence, and source citation.
  • Continuous improvement and model updates: Building processes and systems to safely and reliably update the AI models and hyperparameters in the intelligent application.
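
As one illustration of the advanced retrieval patterns mentioned above, Atlas Vector Search supports combining semantic search with traditional filtering via a `filter` clause. The sketch below narrows a query to a single documentation version; the `version` metadata field, its presence in the index definition, and the `db` and `query_embedding` variables (carried over from the earlier chat-flow sketch) are assumptions:

```python
# Sketch of semantic search combined with a traditional metadata
# filter in Atlas Vector Search. Assumes each chunk document stores a
# "version" field that is declared as a filter field in the index.
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_embedding,  # produced by the embedding model
            "filter": {"version": {"$eq": "v2.0"}},  # traditional filter
            "numCandidates": 200,
            "limit": 5,
        }
    },
    {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
]
results = db["chunks"].aggregate(pipeline)
```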

Implications of intelligent applications for software engineering

The rise of intelligent applications has significant implications for how software is made. Developing these intelligent applications requires an extension of traditional development skills. The AI engineer must possess an understanding of prompt engineering, vector search, and evaluation, as well as familiarity with the latest AI techniques and architectures. While a complete understanding of the underlying neural networks is not necessary, basic knowledge of natural language processing (NLP) is helpful.

Intelligent application development also introduces new challenges and considerations, such as data management and integration with AI components, testing and debugging of AI-driven functionality, and addressing the ethical, safety, and security implications of AI outputs. The compute-heavy nature of AI workloads also necessitates focusing on scalability and cost optimization. Developers building traditional software generally do not have to contend with such concerns.

To address these challenges, software development teams must adapt their processes and adopt novel approaches and best practices. This entails implementing AI governance, bridging the gap between software and ML/AI teams, and adjusting the development lifecycle for intelligent app needs.
