Improving AI Context with RAG Using Azure Machine Learning prompt flow

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

Retrieval Augmented Generation, or RAG, is a method that can expand the breadth or depth of information for Large Language Models (LLM). Retrieving and delivering more data to an LLM will result in applications with more contextual, relevant information. Unlike traditional web or mobile applications designed to retrieve structured data, RAG (Retrieval-Augmented Generation) requires data to be structured and indexed, and stored differently, most commonly in a vector database. The resulting experience should provide more contextual information with the added ability to cite the source of information. This narrower scope of information can result in a higher degree of accuracy and utility for your enterprise. To summarize RAG:

Retrieval: When a question or formulated to an LLM-powered chat bot, RAG reviews the index to find relevant facts. This is like searching through an index where all the information is neatly summarized for quick access.
Augmentation: It then takes these facts and feeds them to the language model, essentially giving it a brief on the subject matter at hand.
Generation: With this briefing, the language model is now ready to craft a response that's not just based on what it already knows but also on the latest information it has just pulled in.

In short, RAG keeps language models up-to-date and relevant, providing answers that are informed by the latest available data. It's a practical way to ensure AI remains accurate and useful for your organization, especially when dealing with current and evolving topics that may not be public knowledge. In this article, we will explore the differences between RAG and Fine tuning and how you can organize your RAG solution using Azure Machine Learning prompt flow.

improving-ai-context-with-rag-using-azure-machine-learning-prompt-flow-img-0

Retrieval Augmented Generation (RAG) vs Fine Tuning

When working with large language models through chat interfaces like ChatGPT, you will see that its foundational knowledge is point-in-time data. Fine-tuning, on the other hand, is akin to customizing the model with a new layer of knowledge that reflects your specific data, which becomes part of the model's intelligence. As of this article, OpenAI released GPT 4 Turbo which is based on a data set through April 2023. Extending an LLM’s body of knowledge can involve fine-tuning or RAG.

improving-ai-context-with-rag-using-azure-machine-learning-prompt-flow-img-1

Fine Tuning Foundational Models

Fine-tuning involves training a foundational model on a dataset specific to your application, effectively customizing the model to perform better for certain tasks or styles. To fine-tune a model, you need to have a dataset that represents the task or style you are aiming for and the computational resources to perform the training.

Once fine-tuned, the model's knowledge is enhanced, and these changes are permanent unless the model is fine-tuned again with additional data. Fine-tuning is ideal for tasks needing deep customization where the information may be specialized but require re-training infrequently. While OpenAI has started to offer fine-tuning for certain models like GPT-3.5, not all foundational models or versions can be fine-tuned due to access restrictions to their parameters and training regimes.

Retrieval Augmented Generation

RAG is like adding a live feed of information to the foundational model, enabling it to respond with the latest data without modifying the model itself. It involves augmenting a foundational language model’s response by dynamically integrating information retrieved from an external database, typically a vector database, at the time of the query.

No Model Training Required: The foundational model's core parameters remain unchanged. Instead, RAG serves as a real-time data layer that the model queries to inform its responses.
Real-time and Up to Date: Because RAG queries external data sources in real-time, it ensures that the language model's responses are enhanced by the most current and relevant information available.

improving-ai-context-with-rag-using-azure-machine-learning-prompt-flow-img-2

Image Credit: https://medium.com/@minh.hoque/retrieval-augmented-generation-grounding-ai-responses-in-factual-data-b7855c059322

RAG is widely adopted as the best starting point due to the following factors:

Data Dynamics: Choose RAG for frequently changing data and fine-tuning for static, specialized domains. Like any data wrangling and model training problem, the results are only as good as your data quality.
Resource Availability: With RAG you do not need expansive computational resources and budget like fine-tuning. You will still need skilled resources to implement and test RAG.
Flexibility and Scalability: RAG offers adaptability to continuously add current information and ease of maintenance.

Approaching RAG with Azure Machine Learning Prompt Flow

With a solid foundation of RAG vs fine-tuning, we will dive into the details of an approach to Retrieval Augmented Generation within Azure. Azure provides multiple solutions for creating and accessing vector indexes per Microsoft’s latest documentation. Azure offers 3 methods currently:

Azure AI Studio, use a vector index and retrieval augmentation.
Azure OpenAI Studio, use a search index with or without vectors.
Azure Machine Learning, use a search index as a vector store in a prompt flow.

Azure's approach to RAG lets you tailor the model to your business needs and integrate in private or public facing applications. What remains consistent is the ability to prepare and feed your data into the LLM of your choice. Within Azure Machine Learning Prompt Flow, Microsoft includes a number of practical features including a fact-checking layer alongside the existing model to ensure accuracy. Additionally, you can feed supplementary data directly to your large language models as prompts, enriching their responses with up-to-date and relevant information. Azure Machine Learning simplifies the process to augment your AI powered app with the latest data without the time and financial burdens often associated with comprehensive model retraining. A benefit of using these services is the scalability and security and compliance functions that are native to Azure. A standard feature of Azure Machine Learning for ML models or LLMs is a point and click flow or notebook code interface to build your AI pipelines

1. Data Acquisition and Preparation with Azure Services for Immediate LLM Access:

Azure Blob Storage for Data Storage is perfect for staging your data. These files can be anything from text files to PDFs.

2. Vectorization and Indexing of your Data Using AI Studio and Azure AI Search

This is a step that can be completed using one of multiple approaches including both open source and Azure native. Azure AI Studio significantly simplifies the creation and integration of a vector index for Retrieval-Augmented Generation (RAG) applications. Here are the main steps in the process:

Initialization: Users start by selecting their data sources in Azure AI Studio, choosing from blob storage for easier testing and local file uploads.
Index Creation: The platform guides users through configuring search settings and choosing an index storage location, with a focus on ease of use and minimal need for manual coding.

This is one of many examples of how Azure AI Studio is democratizing the use of advanced RAG applications by merging and integrating different services in the Azure cloud together.

improving-ai-context-with-rag-using-azure-machine-learning-prompt-flow-img-3

SOURCE: Microsoft

3. Constructing RAG Pipelines with Azure Machine Learning:

Simplified RAG Pipeline Creation: With your index created, you can integrate it along with AI search as a plug-and-play component into your Prompt flow. With no/low code interface, you can drag and drop components to create your RAG pipeline.

improving-ai-context-with-rag-using-azure-machine-learning-prompt-flow-img-4

Image Source: Microsoft.com

Customization with Jupyter Notebooks: For those who are comfortable coding in Jupyter notebooks, Azure ML offers the flexibility to utilize Jupyter Notebooks natively. This will provide more control over the RAG pipeline to fit your project's unique needs. Additionally, there are other alternative flows that you can construct using libraries like LangChain as an alternative to using the Azure services.

3. Manage AI Pipeline Operations

Azure Machine Learning provides a foundation designed for iterative and continuous updates. The full lifecycle for model deployment includes test data generation and prompt evaluation. ML and AI operations are needed to understand certain adjustments. For organizations already running Azure ML, prompt flow fits nicely into broader machine learning operations.

Integrating RAG workflow into MLOPs pipeline through codes

from azureml.core import Experiment 
from azureml.pipeline.core import Pipeline 
from azureml.pipeline.steps import PythonScriptStep 
 
# Create an Azure Machine Learning experiment 
experiment_name = 'rag_experiment' 
experiment = Experiment(ws, experiment_name) 
 
# Define a PythonScriptStep for RAG workflow integration 
rag_step = PythonScriptStep(name='RAG Step', 
                            script_name='rag_workflow.py', 
                            compute_target='your_compute_target', 
                            source_directory='your_source_directory', 
                            inputs=[rag_dataset.as_named_input('rag_data')], 
                            outputs=[], 
                            arguments=['--input_data', rag_dataset], 
                            allow_reuse=True) 
 
# Create an Azure Machine Learning pipeline with the RAG step 
rag_pipeline = Pipeline(workspace=ws, steps=[rag_step]) 
 
# Run the pipeline as an experiment 
pipeline_run = experiment.submit(rag_pipeline) 
pipeline_run.wait_for_completion(show_output=True) 
 
Here is the code snippets to create and manage data using Azure. 
 
from azureml.core import Dataset 
 
# Assuming you have a dataset named 'rag_dataset' in your Azure Machine Learning workspace 
rag_dataset = Dataset.get_by_name(ws, 'rag_dataset') 
 
# Split the dataset into training and testing sets 
train_data, test_data = rag_dataset.random_split(percentage=0.8, seed=42) 
 
# Convert the datasets to pandas DataFrames for easy manipulation 
train_df = train_data.to_pandas_dataframe() 
test_df = test_data.to_pandas_dataframe()

Conclusion

It is important to note that the world of AI and LLMs is evolving at a rapid pace where months make a difference. Azure Machine Learning for Retrieval Augmented Generation offers a transformative approach to leveraging Large Language Models and provides a compelling solution for enterprises that already have a competency center. Azure ML machine learning pipelines for data ingestion, robust training, management, and deployment capabilities for RAG is lowering the barrier for dynamic data integration with LLMs like OpenAI. As adoption continues to grow, we will see lots of exciting new use cases and success stories coming from organizations that adopt early and iterate fast. The benefit of Microsoft Azure is a single, managed and supported suite of services some of which already may be deployed within your organization. Azure services to support new AI adoption demands, Retrieval Augmented Generation included!

Author Bio

Ryan Goodman has dedicated 20 years to the business of data and analytics, working as a practitioner, executive, and entrepreneur. He recently founded DataTools Pro after 4 years at Reliant Funding, where he served as the VP of Analytics and BI. There, he implemented a modern data stack, utilized data sciences, integrated cloud analytics, and established a governance structure. Drawing from his experiences as a customer, Ryan is now collaborating with his team to develop rapid deployment industry solutions. These solutions utilize machine learning, LLMs, and modern data platforms to significantly reduce the time to value for data and analytics teams.