
How-To Tutorials - LLM

Merlyn Shelley
11 Dec 2023
13 min read

AI_Distilled #28: Unveiling Innovations Reshaping Our World

👋 Hello,

“Generative AI has the potential to change the world in ways that we can’t even imagine. It has the power to create new ideas, products, and services that will make our lives easier, more productive, and more creative. It also has the potential to solve some of the world’s biggest problems, such as climate change, poverty, and disease.” - Bill Gates, Microsoft Co-Founder

Microsoft Bing’s new Deep Search functionality is a case in point: Bing will now create AI prompts itself to provide detailed answers to user queries in ways traditional search engines can’t match. Who would have thought LLMs would progress so far that they would eventually prompt themselves? Runway ML is also onto something big, with technology that creates realistic AI-generated videos aimed at Hollywood.

Welcome back to a new issue of AI_Distilled, your one-stop destination for all things AI, ML, NLP, and Gen AI.
Let’s get started with the latest news and developments across the AI sector:

● Elon Musk's xAI Initiates $1 Billion Funding Drive in AI Race
● Bing’s New Deep Search Expands Queries
● AI Takes Center Stage in 2023 Word of the Year Lists
● OpenAI Announces Delay in GPT Store Launch to Next Year
● ChatGPT Celebrates First Anniversary with 110M Installs and $30M Revenue Milestone
● Runway ML and Getty Images Collaborate on AI Video Models for Hollywood and Advertising

We’ve also curated the latest GPT and LLM resources, tutorials, and secret knowledge:

● Unlocking AI Magic: A Primer on 7 Essential Libraries for Developers
● Efficient LLM Fine-Tuning with QLoRA on a Laptop
● Rapid Deployment of Large Open Source LLMs with Runpod and vLLM’s OpenAI Endpoint
● Understanding Strategies to Enhance Retrieval-Augmented Generation (RAG) Pipeline Performance
● Understanding and Mitigating Biases and Toxicity in LLMs

Finally, don’t forget to check out our hands-on tips and strategies from the AI community for you to use on your own projects:

● A Step-by-Step Guide to Streamlining LLM Data Processing for Efficient Pipelines
● Fine-Tuning Mistral Instruct 7B on the MedMCQA Dataset Using QLoRA
● Accelerating Large-Scale Training: A Comprehensive Guide to Amazon SageMaker Data Parallel Library
● Enhancing LoRA-Based Inference Speed: A Guide to Efficient LoRA Decomposition

Looking for some inspiration? Here are some GitHub repositories to get your projects going!

● tacju/maxtron
● Tanuki/tanuki.py
● roboflow/multimodal-maestro
● 03axdov/muskie

Also, don't forget to check our expert insights column, which covers interesting NLP concepts from the book 'The Handbook of NLP with Gensim'. It's a must-read!

Stay curious and gear up for an intellectually enriching experience!
📥 Feedback on the Weekly Edition

Quick question: How can we foster effective collaboration between humans and AI systems, ensuring that AI complements human skills and enhances productivity without causing job displacement or widening societal gaps?

Share your valued opinions discreetly! Your insights could shine in our next issue for the 39K-strong AI community. Join the conversation! 🗨️✨

As a big thanks, get our bestselling "Interactive Data Visualization with Python - Second Edition" in PDF.

Let's make AI_Distilled even more awesome! 🚀

Writer’s Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week’s newsletter content!

Cheers,
Merlyn Shelley
Editor-in-Chief, Packt

⚡ TechWave: AI/GPT News & Analysis

🏐 Elon Musk's xAI Initiates $1 Billion Funding Drive in AI Race: xAI is seeking to raise $1 billion in equity, aiming to stay competitive with tech giants like OpenAI, Microsoft, and Google. The company has already raised $135 million from investors; the total funding goal is disclosed in a filing with the US Securities and Exchange Commission.

🏐 AI Alliance Launched by Tech Giants IBM and Meta: IBM and Meta have formed a new "AI Alliance" with over 50 partners to promote open and responsible AI development. Members include Dell, Intel, CERN, NASA, and Sony. The alliance aims to foster an open AI community for researchers and developers and says it can help members make progress whether or not they openly share their models.

🏐 Bing’s New Deep Search Expands Queries: Microsoft is testing a new Bing feature called Deep Search that uses GPT-4 to expand search queries before providing results. Deep Search displays the expanded topics in a panel so users can select the one that best fits what they want to know, and then tailors the search results to that description. Microsoft says the feature can take up to 30 seconds because of the AI generation step.
🏐 AI Takes Center Stage in 2023 Word of the Year Lists: In 2023, AI dominates tech, influencing "word of the year" choices. Cambridge picks "hallucinate" for AI's tendency to invent information; Merriam-Webster chooses "authentic" to address AI's impact on reality. Oxford recognizes "prompt" for its evolved role in instructing generative AI, reflecting society's increased integration of AI into everyday language and culture.

🏐 OpenAI Announces Delay in GPT Store Launch to Next Year: OpenAI delays the GPT store release until next year, citing unexpected challenges and postponing the initial December launch plan. Despite recent challenges, including CEO changes and employee unrest, development continues, and updates for ChatGPT are expected. The GPT store aims to be a marketplace for users to sell and share custom GPTs, with creators compensated based on usage.

🏐 ChatGPT Celebrates First Anniversary with 110M Installs and $30M Revenue Milestone: ChatGPT's mobile apps, launched in May 2023 on iOS and later on Android, have exceeded 110 million installs, yielding nearly $30 million in revenue. The success is fueled by the ChatGPT Plus subscription and its perks. Despite competition, downloads surge, with Android hitting 18 million in a week. The company expects continued growth by year-end 2023.

🏐 Runway ML and Getty Images Collaborate on AI Video Models for Hollywood and Advertising: NYC video AI startup Runway ML, backed by Google and NVIDIA, announces a partnership with Getty Images for the Runway <> Getty Images Model (RGM), a generative AI video model. Targeting Hollywood, advertising, media, and broadcasting, it enables customized content workflows for Runway enterprise customers.

🔮 Expert Insights from Packt Community

The Handbook of NLP with Gensim - By Chris Kuo

NLU + NLG = NLP

NLP is an umbrella term that covers natural language understanding (NLU) and natural language generation (NLG). We’ll go through both in the next sections.
NLU

Many languages, such as English, German, and Chinese, have been developing for hundreds of years and continue to evolve. Humans can use languages artfully in various social contexts. Now, we are asking a computer to understand human language. What’s very rudimentary to us may not be so apparent to a computer. Linguists have contributed much to the development of computers’ understanding in terms of syntax, semantics, phonology, morphology, and pragmatics.

NLU focuses on understanding the meaning of human language. It takes text or speech input and then analyzes the syntax, semantics, phonology, morphology, and pragmatics in the language. Let’s briefly go over each one:

● Syntax: This is the study of how words are arranged to form phrases and clauses, as well as the use of punctuation, order of words, and sentences.

● Semantics: This is about the possible meanings of a sentence based on the interactions between words in the sentence. It is concerned with the interpretation of language, rather than its form or structure. For example, the word “table” as a noun can refer to “a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs” or a data frame in a computer language. NLU can distinguish between the two meanings of such a word through a technique called word embedding.

● Phonology: This is the study of the sound system of a language, including the sounds of speech (phonemes), how they are combined to form words (morphology), and how they are organized into larger units such as syllables and stress patterns. For example, the sounds represented by the letters “p” and “b” in English are distinct phonemes. A phoneme is the smallest unit of sound in a language that can change the meaning of a word. Consider the words “pat” and “bat.” The only difference between these two words is the initial sound, but their meanings are different.
● Morphology: This is the study of the structure of words, including the way in which they are formed from smaller units of meaning called morphemes. The term comes from “morph,” meaning shape or form, and “-ology,” the study of something. Morphology is important because it helps us understand how words are formed, how they relate to each other, and how they change over time. For example, the word “unkindness” consists of three separate morphemes: the prefix “un-,” the root “kind,” and the suffix “-ness.”

● Pragmatics: This is the study of how language is used in a social context. Pragmatics is important because it helps us understand how language works in real-world situations, and how language can be used to convey meaning and achieve specific purposes. For example, if you offer to buy your friend a McDonald’s burger, a large fries, and a large drink, your friend may reply "no" because he is concerned about gaining weight. Your friend may simply mean that the meal is high in calories, but in a social context the reply can also imply that he considers himself overweight.

Now, let’s understand NLG.

NLG

While NLU is about a computer reading and comprehending language, NLG is about a computer writing it. The term generation in NLG refers to an NLP model generating meaningful words or even articles. Today, when you compose an email or type a sentence in an app, it presents possible words to complete your sentence or performs automatic correction. These are applications of NLG.

This content is from the book The Handbook of NLP with Gensim - By Chris Kuo (Oct 2023). Start reading a free chapter or access the entire Packt digital library free for 7 days by signing up now. Read through the Chapter 1 unlocked here...
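To make the NLG idea above a little more concrete, here is a toy sketch of next-word suggestion built from bigram counts. It is only an illustration and is not taken from the book: it does not use Gensim, the three-sentence corpus is made up, and `train_bigrams` and `suggest_next` are hypothetical helper names. Real autocomplete systems use far larger models, but the principle of predicting a likely continuation from what has been typed so far is the same.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def suggest_next(followers, word, k=2):
    """Return up to k of the most frequent followers of `word` (empty if unseen)."""
    counts = followers.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]


# Made-up training sentences, standing in for text a user has typed before.
corpus = [
    "thank you for your email",
    "thank you for your help",
    "thank you for the update",
]
model = train_bigrams(corpus)

print(suggest_next(model, "for"))    # ['your', 'the']
print(suggest_next(model, "thank"))  # ['you']
```

Here "your" is suggested before "the" simply because it followed "for" twice in the corpus and "the" only once.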
🌟 Secret Knowledge: AI/LLM Resources

🏀 Unlocking AI Magic: A Primer on 7 Essential Libraries for Developers: Discover seven cutting-edge libraries to enhance development projects with advanced AI features. From CopilotTextarea for AI-driven writing in React apps to PrivateGPT for secure, locally processed document interactions, explore tools that elevate your projects and impress users.

🏀 Efficient LLM Fine-Tuning with QLoRA on a Laptop: Explore QLoRA, a memory-efficient method for fine-tuning large language models on ordinary CPUs. The QLoRA API supports NF4, FP4, INT4, and INT8 data types for quantization, utilizing methods like LoRA and gradient checkpointing to significantly reduce memory requirements. Learn to implement QLoRA on CPUs, leveraging Intel Extension for Transformers, with experiments showcasing its efficiency on consumer-level CPUs.

🏀 Rapid Deployment of Large Open Source LLMs with Runpod and vLLM’s OpenAI Endpoint: Learn to swiftly deploy open-source LLMs into applications with a tutorial featuring the Llama-2 70B model and the AutoGen framework. Utilize tools like Runpod and vLLM for computational resources and API endpoint creation, with a step-by-step guide and the option for non-gated models like Falcon-40B.

🏀 Understanding Strategies to Enhance Retrieval-Augmented Generation (RAG) Pipeline Performance: Learn optimization techniques for RAG applications by focusing on hyperparameters, tuning strategies, data ingestion, and pipeline preparation. Explore improvements in inferencing through query transformations, retrieval parameters, advanced strategies, re-ranking models, LLMs, and prompt engineering for enhanced retrieval and generation.

🏀 Understanding and Mitigating Biases and Toxicity in LLMs: Explore the impact of ethical guidelines on Large Language Model (LLM) development, examining measures adopted by companies like OpenAI and Google to address biases and toxicity.
Research covers content generation, jailbreaking, and biases in diverse domains, revealing complexities and challenges in ensuring ethical LLMs.

🔛 Masterclass: AI/LLM Tutorials

🎯 A Step-by-Step Guide to Streamlining LLM Data Processing for Efficient Pipelines: Learn to optimize the development loop for your LLM-powered recommendation system by addressing slow processing times in data pipelines. The solution involves implementing a Pipeline class to save inputs/outputs, enabling efficient error debugging. Enhance developer experience with individual pipeline stages as functions and consider future optimizations like error classes and concurrency.

🎯 Fine-Tuning Mistral Instruct 7B on the MedMCQA Dataset Using QLoRA: Explore fine-tuning Mistral Instruct 7B, an open-source LLM, for medical entrance exam questions using the MedMCQA dataset. Utilize Google Colab, the GPTQ version of the model, and the LoRA technique for memory efficiency. The tutorial covers data loading, prompt creation, configuration, training setup, code snippets, and performance evaluation, offering a foundation for experimentation and enhancement.

🎯 Accelerating Large-Scale Training: A Comprehensive Guide to Amazon SageMaker Data Parallel Library: This guide details ways to boost Large Language Model (LLM) training speed with Amazon SageMaker's SMDDP. It addresses challenges in distributed training, emphasizing SMDDP's optimized AllGather for the GPU communication bottleneck and exploring techniques like EFA network usage, GDRCopy coordination, and reduced use of GPU streaming multiprocessors for improved efficiency and cost-effectiveness on Amazon SageMaker.

🎯 Enhancing LoRA-Based Inference Speed: A Guide to Efficient LoRA Decomposition: The article highlights achieving three times faster inference for public LoRAs using the Diffusers library.
It introduces LoRA, a parameter-efficient fine-tuning technique, detailing its decomposition process and benefits, including quick transitions and reduced warm-up and response times in the Inference API.

🚀 HackHub: Trending AI Tools

⚽ tacju/maxtron: Unified meta-architecture for video segmentation, enhancing clip-level segmenters with within-clip and cross-clip tracking modules.

⚽ Tanuki/tanuki.py: Simplifies the creation of apps powered by LLMs in Python by seamlessly integrating well-typed, reliable, and stateless LLM-powered functions into applications.

⚽ roboflow/multimodal-maestro: Empowers developers with enhanced control over large multimodal models, enabling diverse outputs through effective prompting tactics.

⚽ 03axdov/muskie: Python-based ML library that simplifies the process of dataset creation and model utilization, aiming to reduce code complexity.

Joshua Arvin Lat
30 Nov 2023
19 min read

Deploying LLMs with Amazon SageMaker - Part 2

Introduction

In the first part of this post, we showed how easy it is to deploy large language models (LLMs) in the cloud using a managed machine learning service called Amazon SageMaker. In just a few steps, we were able to deploy a MistralLite model to a SageMaker Inference Endpoint. If you’ve worked on real ML-powered projects in the past, you probably know that deploying a model is just the first step! There are a few more steps before we can consider our application ready for use.

If you’re looking for the link to the first part, here it is: Deploying LLMs with Amazon SageMaker - Part 1

In this post, we’ll build on top of what we already have in Part 1 and prepare a demo user interface for our chatbot application. The full post is organized into the following sections:

● Section I: Preparing the SageMaker Notebook Instance (discussed in Part 1)
● Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference Endpoint (discussed in Part 1)
● Section III: Enabling Data Capture with SageMaker Model Monitor
● Section IV: Invoking the SageMaker inference endpoint using the boto3 client
● Section V: Preparing a Demo UI for our chatbot application
● Section VI: Cleaning Up

Without further ado, let’s begin!

Section III: Enabling Data Capture with SageMaker Model Monitor

In order to analyze our deployed LLM, it’s essential that we’re able to collect the requests and responses in a central storage location. Instead of building our own solution that collects the information we need, we can just utilize the built-in Model Monitor capability of SageMaker. Here, all we need to do is prepare the configuration details and run the update_data_capture_config() method of the predictor object for our inference endpoint, and we’ll have the data capture setup enabled right away!
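For orientation, it helps to know roughly what ends up in the central storage location: each captured request/response pair is written as one JSON line. The sketch below shows how the request and response could be pulled out of such a record. Treat the field names (`captureData`, `endpointInput`, `endpointOutput`, `encoding`) as our understanding of the capture format rather than a guarantee, and verify them against your own files; the sample values are made up and `extract_request_response` is a hypothetical helper name.

```python
import json


def extract_request_response(record):
    """Return (request, response) payloads from one captured record.

    Assumes the record layout used in the sample below. Payloads marked
    with encoding "JSON" are parsed; anything else is returned as-is.
    """
    capture = record["captureData"]

    def decode(part):
        data = part["data"]
        return json.loads(data) if part.get("encoding") == "JSON" else data

    return decode(capture["endpointInput"]), decode(capture["endpointOutput"])


# Hypothetical captured line (structure assumed, values made up).
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "application/json",
            "mode": "INPUT",
            "data": json.dumps({"inputs": "<|prompter|>Hi</s><|assistant|>"}),
            "encoding": "JSON",
        },
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": json.dumps([{"generated_text": "Hello!"}]),
            "encoding": "JSON",
        },
    },
    "eventMetadata": {"eventId": "example-id"},
    "eventVersion": "0",
})

request, response = extract_request_response(json.loads(sample_line))
print(request["inputs"])
print(response[0]["generated_text"])
```

A helper like this could be applied to every record loaded from the downloaded .jsonl files once data capture is enabled and a few requests have been made.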
Let’s proceed with the steps required to enable and test data capture for our SageMaker Inference endpoint:

STEP # 01: Continuing where we left off in Part 1 of this post, let’s get the name of the default bucket used by our session:

    s3_bucket_name = sagemaker_session.default_bucket()
    s3_bucket_name

STEP # 02: In addition to this, let’s prepare and define a few prerequisites as well:

    prefix = "llm-deployment"
    base = f"s3://{s3_bucket_name}/{prefix}"
    s3_capture_upload_path = f"{base}/model-monitor"

STEP # 03: Next, let’s define the data capture config:

    from sagemaker.model_monitor import DataCaptureConfig

    data_capture_config = DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri=s3_capture_upload_path,
        kms_key_id=None,
        capture_options=["REQUEST", "RESPONSE"],
        csv_content_types=["text/csv"],
        json_content_types=["application/json"]
    )

Here, we specify that we’ll be collecting 100% of the requests and responses that pass through the deployed model.

STEP # 04: Let’s enable data capture so that the request and response data gets saved in Amazon S3:

    predictor.update_data_capture_config(
        data_capture_config=data_capture_config
    )

Note that this step may take about 8-10 minutes to complete. Feel free to grab a cup of coffee or tea while waiting!

STEP # 05: Let’s check if we are able to capture the input request and output response by performing another sample request:

    result = predictor.predict(input_data)[0]["generated_text"]
    print(result)

This should yield the following output:

    "The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries. There is no single answer that can be definitively proven, as the meaning of life is subjective and can vary greatly from person to person.\n\nSome people believe that the meaning of life is to find happiness and fulfillment through personal growth, relationships, and experiences. Others believe that the meaning of life is to serve a greater purpose, such as through a religious or spiritual calling, or by making a positive impact on the world through their work or actions.\n\nUltimately, the meaning of life is a personal journey that each individual must discover for themselves. It may involve exploring different beliefs and perspectives, seeking out new experiences, and reflecting on what brings joy and purpose to one's life."

Note that it may take a minute or two before the .jsonl file(s) containing the request and response data appear in our S3 bucket.

STEP # 06: Let’s prepare a few more examples:

    prompt_examples = [
        "What is the meaning of life?",
        "What is the color of love?",
        "How to deploy LLMs using SageMaker",
        "When do we use Bedrock and when do we use SageMaker?"
    ]

STEP # 07: Let’s also define the perform_request() function, which wraps the relevant lines of code for performing a request to our deployed LLM:

    def perform_request(prompt, predictor):
        input_data = {
            "inputs": f"<|prompter|>{prompt}</s><|assistant|>",
            "parameters": {
                "do_sample": False,
                "max_new_tokens": 2000,
                "return_full_text": False,
            }
        }

        response = predictor.predict(input_data)
        return response[0]["generated_text"]

STEP # 08: Let’s quickly test the perform_request() function:

    perform_request(prompt_examples[0], predictor=predictor)

STEP # 09: With everything ready, let’s use the perform_request() function to perform requests using the examples we’ve prepared in an earlier step:

    from time import sleep

    for example in prompt_examples:
        print("Input:", example)

        generated = perform_request(
            prompt=example,
            predictor=predictor
        )
        print("Output:", generated)
        print("-"*20)
        sleep(1)

This should return the following:

    Input: What is the meaning of life?
    ...
    --------------------
    Input: What is the color of love?
    Output: The color of love is often associated with red, which is a vibrant and passionate color that is often used to represent love and romance. Red is a warm and intense color that can evoke strong emotions, making it a popular choice for representing love. However, the color of love is not limited to red. Other colors that are often associated with love include pink, which is a softer and more feminine shade of red, and white, which is often used to represent purity and innocence. Ultimately, the color of love is subjective and can vary depending on personal preferences and cultural associations. Some people may associate love with other colors, such as green, which is often used to represent growth and renewal, or blue, which is often used to represent trust and loyalty.
    ...

Note that this is just a portion of the overall output and you should get a relatively long response for each input prompt.

Section IV: Invoking the SageMaker inference endpoint using the boto3 client

While it’s convenient to use the SageMaker Python SDK to invoke our inference endpoint, it’s good to also know how to use boto3 to invoke our deployed model. This will allow us to invoke the inference endpoint from an AWS Lambda function using boto3.

Image 10: Utilizing API Gateway and AWS Lambda to invoke the deployed LLM

This Lambda function would then be triggered by an event from an API Gateway resource similar to what we have in Image 10.
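To give a feel for that architecture, here is a rough sketch of what such a Lambda handler might look like. It is an illustration under several assumptions, not a finished setup: the event is assumed to be an API Gateway proxy event whose JSON body has a `prompt` field, `build_payload` is a hypothetical helper, the `client` and `endpoint_name` parameters exist only to keep the sketch testable, and the default endpoint name is a placeholder. The boto3 service name used here, `sagemaker-runtime`, is the standard name for the runtime client.

```python
import json


def build_payload(prompt, max_new_tokens=2000):
    """Wrap a user prompt in the request format used throughout this post."""
    return {
        "inputs": f"<|prompter|>{prompt}</s><|assistant|>",
        "parameters": {
            "do_sample": False,
            "max_new_tokens": max_new_tokens,
            "return_full_text": False,
        },
    }


def lambda_handler(event, context, client=None, endpoint_name="MistralLite-EXAMPLE"):
    """Hypothetical handler for an API Gateway proxy event.

    `client` and `endpoint_name` are illustrative; in a real function the
    endpoint name would typically come from configuration.
    """
    if client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS
        client = boto3.client("sagemaker-runtime")

    # API Gateway proxy events carry the request body as a JSON string.
    prompt = json.loads(event.get("body") or "{}").get("prompt", "")
    if not prompt:
        return {"statusCode": 400, "body": json.dumps({"error": "prompt is required"})}

    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(build_payload(prompt)).encode(),
    )
    result = json.loads(response["Body"].read().decode())
    return {
        "statusCode": 200,
        "body": json.dumps({"generated_text": result[0]["generated_text"]}),
    }
```

Passing the client in as an optional argument is a design choice for this sketch: it lets us check the request-building logic with a stand-in object before any AWS resources are involved.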
Note that we’re not planning to complete the entire setup in this post, but having a working example of how to use boto3 to invoke the SageMaker inference endpoint should make it much easier to build a complete serverless application with API Gateway and AWS Lambda.

STEP # 01: Let’s quickly check the endpoint name of the SageMaker inference endpoint:

    predictor.endpoint_name

This should return the endpoint name with a format similar to what we have below:

    'MistralLite-HKGKFRXURT'

STEP # 02: Let’s prepare our boto3 client using the following lines of code:

    import boto3
    import json

    boto3_client = boto3.client('runtime.sagemaker')

STEP # 03: Now, let’s invoke the endpoint:

    body = json.dumps(input_data).encode()
    response = boto3_client.invoke_endpoint(
        EndpointName=predictor.endpoint_name,
        ContentType='application/json',
        Body=body
    )

    result = json.loads(response['Body'].read().decode())

STEP # 04: Let’s quickly inspect the result:

    result

This should give us the following:

    [{'generated_text': "The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries. There is no single answer that can be definitively proven, as the meaning of life is subjective and can vary greatly from person to person..."}]

STEP # 05: Let’s try that again and print the output text:

    result[0]['generated_text']

This should yield the following output:

    "The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries..."

STEP # 06: Now, let’s define perform_request_2(), which uses the boto3 client to invoke our deployed LLM:

    def perform_request_2(prompt, boto3_client, predictor):
        input_data = {
            "inputs": f"<|prompter|>{prompt}</s><|assistant|>",
            "parameters": {
                "do_sample": False,
                "max_new_tokens": 2000,
                "return_full_text": False,
            }
        }

        body = json.dumps(input_data).encode()
        response = boto3_client.invoke_endpoint(
            EndpointName=predictor.endpoint_name,
            ContentType='application/json',
            Body=body
        )

        result = json.loads(response['Body'].read().decode())
        return result[0]["generated_text"]

STEP # 07: Next, let’s run the following block of code to have our deployed LLM answer the same set of questions using the perform_request_2() function:

    for example in prompt_examples:
        print("Input:", example)

        generated = perform_request_2(
            prompt=example,
            boto3_client=boto3_client,
            predictor=predictor
        )
        print("Output:", generated)
        print("-"*20)
        sleep(1)

This will give us the following output:

    Input: What is the meaning of life?
    ...
    --------------------
    Input: What is the color of love?
    Output: The color of love is often associated with red, which is a vibrant and passionate color that is often used to represent love and romance. Red is a warm and intense color that can evoke strong emotions, making it a popular choice for representing love. However, the color of love is not limited to red. Other colors that are often associated with love include pink, which is a softer and more feminine shade of red, and white, which is often used to represent purity and innocence. Ultimately, the color of love is subjective and can vary depending on personal preferences and cultural associations. Some people may associate love with other colors, such as green, which is often used to represent growth and renewal, or blue, which is often used to represent trust and loyalty.
    ...

Given that it may take a few minutes before the .jsonl files appear in our S3 bucket, let’s wait for about 3-5 minutes before proceeding to the next step. Feel free to grab a cup of coffee or tea while waiting!

STEP # 08: Let’s run the following block of code to list the captured data files stored in our S3 bucket:

    results = !aws s3 ls {s3_capture_upload_path} --recursive
    results

STEP # 09: In addition to this, let’s store the full S3 paths inside the processed variable:

    processed = []

    for result in results:
        # the last column of each "aws s3 ls" line is the object key
        partial = result.split()[-1]
        path = f"s3://{s3_bucket_name}/{partial}"
        processed.append(path)

    processed

STEP # 10: Let’s create a new directory named captured_data using the mkdir command:

    !mkdir -p captured_data

STEP # 11: Now, let’s download the .jsonl files from the S3 bucket to the captured_data directory in our SageMaker Notebook Instance:

    for index, path in enumerate(processed):
        print(index, path)
        !aws s3 cp {path} captured_data/{index}.jsonl

STEP # 12: Let’s define the load_json_file() function, which will help us load files containing one JSON document per line:

    import json

    def load_json_file(path):
        output = []

        with open(path) as f:
            output = [json.loads(line) for line in f]

        return output

STEP # 13: Using the load_json_file() function we defined in an earlier step, let’s load the .jsonl files and store their records inside the all variable for easier viewing:

    # note: naming this variable "all" shadows Python's built-in all()
    all = []

    for i, _ in enumerate(processed):
        print(f">: {i}")
        new_records = load_json_file(f"captured_data/{i}.jsonl")
        all = all + new_records

    all

Running this will yield the following response:

Image 11: All captured data points inside the all variable

Feel free to analyze the nested structure stored in the all variable. In case you’re interested in how this captured data can be analyzed and processed further, you may check Chapter 8, Model Monitoring and Management Solutions, of my 2nd book “Machine Learning Engineering on AWS”.

Section V: Preparing a Demo UI for our chatbot application

Years ago, we had to spend a few hours to a few days before we were able to prepare a user interface for a working demo. If you have not used Gradio before, you would be surprised that it only takes a few lines of code to set everything up. In the next set of steps, we’ll do just that and use the model we deployed earlier in our demo application:

STEP # 01: Continuing where we left off in the previous part, let’s install a specific version of gradio using the following command:

    !pip install gradio==3.49.0

STEP # 02: We’ll also be using a specific version of fastapi:

    !pip uninstall -y fastapi
    !pip install fastapi==0.103.1

STEP # 03: Let’s prepare a few examples and store them in a list:

    prompt_examples = [
        "What is the meaning of life?",
        "What is the color of love?",
        "How to deploy LLMs using SageMaker",
        "When do we use Bedrock and when do we use SageMaker?",
        "Try again",
        "Provide 10 alternatives",
        "Summarize the previous answer into at most 2 sentences"
    ]

STEP # 04: In addition to this, let’s define the parameters using the following block of code:

    parameters = {
        "do_sample": False,
        "max_new_tokens": 2000,
    }

STEP # 05: Next, define the process_and_respond() function, which we’ll use to invoke the inference endpoint:

    def process_and_respond(message, chat_history):
        # rebuild the conversation so far in the prompt format the model expects
        processed_chat_history = ""
        if len(chat_history) > 0:
            for chat in chat_history:
                processed_chat_history += f"<|prompter|>{chat[0]}</s><|assistant|>{chat[1]}</s>"

        prompt = f"{processed_chat_history}<|prompter|>{message}</s><|assistant|>"
        response = predictor.predict({"inputs": prompt, "parameters": parameters})
        # the full text is returned here, so strip the prompt from the front
        parsed_response = response[0]["generated_text"][len(prompt):]
        chat_history.append((message, parsed_response))
        return "", chat_history

STEP # 06: Now, let’s set up and prepare the user interface we’ll use to interact with our chatbot:

    import gradio as gr

    with gr.Blocks(theme=gr.themes.Monochrome(spacing_size="sm")) as demo:
        with gr.Row():
            with gr.Column():
                message = gr.Textbox(label="Chat Message Box",
                                     placeholder="Input message here",
                                     show_label=True,
                                     lines=12)
                submit = gr.Button("Submit")
                examples = gr.Examples(examples=prompt_examples,
                                       inputs=message)
            with gr.Column():
                chatbot = gr.Chatbot(height=900)

        submit.click(process_and_respond,
                     [message, chatbot],
                     [message, chatbot],
                     queue=False)

Here, we can see the power of Gradio as we only needed a few lines of code to prepare a demo app.

STEP # 07: Now, let’s launch our demo application using the launch() method:

    demo.launch(share=True, auth=("admin", "replacethis1234!"))

This will yield the following logs:

    Running on local URL:  http://127.0.0.1:7860
    Running on public URL: https://123456789012345.gradio.live

STEP # 08: Open the public URL in a new browser tab. This will load a login page which will require us to input the username and password before we are able to access the chatbot.

Image 12: Login page

Specify admin and replacethis1234! in the login form to proceed.

STEP # 09: After signing in using the credentials, we’ll be able to access a chat interface similar to what we have in Image 13.
Here, we can try out various types of prompts.

Image 13 — The chatbot interface

Here, we have a Chat Message Box on the left side of the screen where we can input and run our different prompts. We would then see the current conversation on the right side.

STEP # 10: Click the first example “What is the meaning of life?”. This will auto-populate the text area similar to what we have in Image 14:

Image 14 — Using one of the examples to populate the Chat Message Box

STEP # 11: Click the Submit button afterward. After a few seconds, we should get the following response in the chat box:

Image 15 — Response of the deployed model

Amazing, right? Here, we just asked the AI what the meaning of life is.

STEP # 12: Click the last example “Summarize the previous answer into at most 2 sentences”. This will auto-populate the text area with the said example. Click the Submit button afterward.

Image 16 — Summarizing the previous answer into at most 2 sentences

Feel free to try other prompts. Note that we are not limited to the prompts available in the list of examples in the interface.

Important Note: Like other similar AI/ML solutions, there's the risk of hallucinations or the generation of misleading information. That said, it's critical that we exercise caution and validate the outputs produced by any Generative AI-powered system to ensure the accuracy of the results.

Section VI: Cleaning Up

We’re not done yet! Cleaning up the resources we’ve created and launched is a very important step as this will help us ensure that we don’t pay for the resources we’re not planning to use.

STEP # 01: Once you’re done trying out various types of prompts, feel free to turn off and clean up the resources launched and created using the following lines of code:

    demo.close()
    predictor.delete_endpoint()

STEP # 02: Make sure to turn off (or delete) the SageMaker Notebook instance as well. I’ll leave this to you as an exercise!

Wasn’t that easy?!
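For readers who would rather script the notebook clean-up in STEP # 02 than use the console, here is a minimal sketch. It assumes a client that behaves like boto3.client("sagemaker") and uses the example instance name llm-demo from Part 1; the helper name stop_and_delete_notebook is hypothetical. Run it from outside the notebook instance (for example, from your local machine), since stopping the instance also ends its Jupyter session.

```python
def stop_and_delete_notebook(client, name, delete=False):
    """Stop a SageMaker Notebook instance and optionally delete it.

    `client` is assumed to behave like boto3.client("sagemaker").
    """
    status = client.describe_notebook_instance(
        NotebookInstanceName=name
    )["NotebookInstanceStatus"]

    # Only a running instance can be asked to stop
    if status == "InService":
        client.stop_notebook_instance(NotebookInstanceName=name)

    # Wait until a running or already-stopping instance has fully stopped
    if status in ("InService", "Stopping"):
        client.get_waiter("notebook_instance_stopped").wait(
            NotebookInstanceName=name
        )

    # Deletion requires the instance to be stopped first
    if delete:
        client.delete_notebook_instance(NotebookInstanceName=name)
        client.get_waiter("notebook_instance_deleted").wait(
            NotebookInstanceName=name
        )


# Example usage (assumes AWS credentials and an instance named "llm-demo"):
#   import boto3
#   stop_and_delete_notebook(boto3.client("sagemaker"), "llm-demo", delete=True)
```

Passing the client in as an argument keeps the helper easy to try with a stub before pointing it at a real account. Other instance states, such as Pending or Failed, are not handled in this sketch.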
As you can see, deploying LLMs with Amazon SageMaker is straightforward and easy. Given that Amazon SageMaker handles most of the heavy lifting to manage the infrastructure, we’re able to focus more on the deployment of our machine learning model. We are just scratching the surface as there is a long list of capabilities and features available in SageMaker. If you want to take things to the next level, feel free to read 2 of my books focusing heavily on SageMaker: “Machine Learning with Amazon SageMaker Cookbook” and “Machine Learning Engineering on AWS”.

Author Bio

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of 3 Australian-owned companies and also served as the Director for Software Development and Engineering for multiple e-commerce startups in the past. Years ago, he and his team won 1st place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and he has been sharing his knowledge in several international conferences to discuss practical strategies on machine learning, engineering, security, and management. He is also the author of the books "Machine Learning with Amazon SageMaker Cookbook", "Machine Learning Engineering on AWS", and "Building and Automating Penetration Testing Labs in the Cloud". Due to his proven track record in leading digital transformation within organizations, he has been recognized as one of the prestigious Orange Boomerang: Digital Leader of the Year 2023 award winners.

Joshua Arvin Lat
29 Nov 2023
13 min read

Deploying LLMs with Amazon SageMaker - Part 1

Introduction

Have you ever tried asking a Generative AI-powered chatbot the question “What is the meaning of life?” If you have not tried that yet, here’s the response I got when I tried it myself using a custom chatbot app I built with a managed machine learning (ML) service called Amazon SageMaker.

Image 01 — Asking a chatbot the meaning of life

You would be surprised that I built this quick demo application myself in just a few hours! In this post, I will teach you how to deploy your own Large Language Models (LLMs) in a SageMaker Inference Endpoint (that is, a machine learning-powered server that responds to inputs) with just a few lines of code.

Image 02 — Deploying an LLM to a SageMaker Inference Endpoint

While most tutorials available teach us how to utilize existing Application Programming Interfaces (APIs) to prepare chatbot applications, it’s best that we also know how to deploy LLMs in our own servers so that we keep tighter control over data privacy and compliance. In addition to this, we’ll be able to manage the long-term costs of our AI-powered systems as well.
One of the most powerful solutions available for these types of requirements is Amazon SageMaker, which helps us focus on the work we need to do instead of worrying about cloud infrastructure management.

We’ll divide the hands-on portion into the following sections:

●  Section I: Preparing the SageMaker Notebook Instance
●  Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference Endpoint
●  Section III: Enabling Data Capture with SageMaker Model Monitor (discussed in Part 2)
●  Section IV: Invoking the SageMaker inference endpoint using the boto3 client (discussed in Part 2)
●  Section V: Preparing a Demo UI for our chatbot application (discussed in Part 2)
●  Section VI: Cleaning Up (discussed in Part 2)

Without further ado, let’s begin!

Section I: Preparing the SageMaker Notebook Instance

Let’s start by creating a SageMaker Notebook instance. Note that while we can also do this in SageMaker Studio, running the example in a SageMaker Notebook instance should do the trick.
If this is your first time launching a SageMaker Notebook instance, you can think of it as a machine, much like your local one, with several tools already pre-installed, where we can run our scripts.

STEP # 01: Sign in to your AWS account and navigate to the SageMaker console by typing sagemaker in the search box similar to what we have in the following image:

Image 03 — Navigating to the SageMaker console

Choose Amazon SageMaker from the list of options available as highlighted in Image 03.

STEP # 02: In the sidebar, locate and click Notebook instances under Notebook:

Image 04 — Locating Notebook instances in the sidebar

STEP # 03: Next, locate and click the Create notebook instance button.

STEP # 04: In the Create notebook instance page, you’ll be asked to input a few configuration parameters before we’re able to launch the notebook instance where we’ll be running our code:

Image 05 — Creating a new SageMaker Notebook instance

Specify a Notebook instance name (for example, llm-demo) and select a Notebook instance type. For best results, you may select a relatively powerful instance type (ml.m4.xlarge) where we will run the scripts. However, you may decide to choose a smaller instance type such as ml.t3.medium (slower but less expensive). Note that we will not be deploying our LLM inside this notebook instance as the model will be deployed in a separate inference endpoint (which will require a more powerful instance type such as an ml.g5.2xlarge).

STEP # 05: Create an IAM role by choosing Create a new role from the list of options available in the IAM role dropdown (under Permissions and encryption).

Image 06 — Opening the Jupyter app

This will open the following popup window.
Given that we’re just working on a demo application, the default security configuration should do the trick. Click the Create role button.

Important Note: Make sure to have a more secure configuration when dealing with production (or staging) work environments. We won’t dive deep into how cloud security works in this post, so feel free to look for other resources and references to further improve the current security setup. In case you are interested in learning more about cloud security, feel free to check my 3rd book “Building and Automating Penetration Testing Labs in the Cloud”. In the 7th Chapter of the book (Setting Up an IAM Privilege Escalation Lab), you’ll learn how misconfigured machine learning environments on AWS can easily be exploited with the right sequence of steps.

STEP # 06: Click the Create notebook instance button. Wait for about 5-10 minutes for the SageMaker Notebook instance to be ready.

Important Note: Given that this will launch a resource that will run until you turn it off (or delete it), make sure to complete all the steps in the 2nd part of this post and clean up the created resources accordingly.

STEP # 07: Once the instance is ready, click Open Jupyter similar to what we have in Image 07:

Image 07 — Opening the Jupyter app

This will open the Jupyter application in a browser tab. If this is your first time using this application, do not worry as detailed instructions will be provided in the succeeding steps to help you get familiar with this tool.

STEP # 08: Create a new notebook by clicking New and selecting conda_python3 from the list of options available:

Image 08 — Creating a new notebook using the conda_python3 kernel

In case you are wondering about what a kernel is, it is simply an “engine” or “environment” with pre-installed libraries and prerequisites that executes the code specified in the notebook cells.
You’ll see this in action in a bit.

STEP # 09: At this point, we should see the following interface where we can run various types of scripts and blocks of code:

Image 09 — New Jupyter notebook

Feel free to rename the Jupyter Notebook before proceeding to the next step. If you have not used a Jupyter Notebook before, you may run the following line of code by typing it in the text field and pressing SHIFT + ENTER.

    print('hello')

This should print the output hello right below the text field where we placed our code.

Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference Endpoint

STEP # 01: With everything ready, let’s start by installing a specific version of the SageMaker Python SDK:

    !pip install sagemaker==2.192.1

Here, we’ll be using v2.192.1. Pinning the version helps ensure that you won’t encounter breaking changes even if you work on the hands-on solutions in this post at a later date.

In case you are wondering what the SageMaker Python SDK is, it is simply a software development kit (SDK) with a set of tools and APIs that help developers interact with and utilize the different features and capabilities of Amazon SageMaker.

STEP # 02: Next, let’s import and prepare a few prerequisites by running the following block of code:

    import sagemaker
    import time

    sagemaker_session = sagemaker.Session()
    region = sagemaker_session.boto_region_name
    role = sagemaker.get_execution_role()

STEP # 03: Let’s import HuggingFaceModel and get_huggingface_llm_image_uri as well:

    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

STEP # 04: Next, let’s define the generate_random_label() function which we’ll use later when naming our resources:

    from string import ascii_uppercase
    from random import choice

    def generate_random_label():
        # Build a random 10-letter uppercase suffix
        letters = ascii_uppercase
        return ''.join(choice(letters) for i in range(10))

This will help us avoid naming conflicts when creating and configuring our resources.

STEP # 05: Use the get_huggingface_llm_image_uri function we imported in an earlier step to retrieve the container image URI for our LLM. In addition to this, let’s define the model_name we’ll use later when deploying our LLM to a SageMaker endpoint:

    image_uri = get_huggingface_llm_image_uri(
        backend="huggingface",
        region=region,
        version="1.1.0"
    )
    model_name = "MistralLite-" + generate_random_label()

STEP # 06: Before we proceed with the actual deployment, let’s quickly inspect what we have in the image_uri variable:

    image_uri

This will output the following variable value:

    '763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04'

STEP # 07: Similarly, let’s check the variable value of model_name:

    model_name

This will give us the following:

    'MistralLite-HKGKFRXURT'

Note that you’ll get a different model_name value since we’re randomly generating a portion of the model name.

STEP # 08: Let’s prepare the hub model configuration as well:

    hub_env = {
        'HF_MODEL_ID': 'amazon/MistralLite',
        'HF_TASK': 'text-generation',
        'SM_NUM_GPUS': '1',
        "MAX_INPUT_LENGTH": '16000',
        "MAX_TOTAL_TOKENS": '16384',
        "MAX_BATCH_PREFILL_TOKENS": '16384',
        "MAX_BATCH_TOTAL_TOKENS":  '16384',
    }

Here, we specify that we’ll be using the MistralLite model. If this is your first time hearing about MistralLite, it is a fine-tuned Mistral-7B-v0.1 language model that, according to its model card, performs significantly better on several long-context retrieval and question-answering tasks.
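MistralLite expects each turn to be wrapped in <|prompter|> and <|assistant|> markers, which is the format this tutorial uses when it invokes the endpoint. A minimal sketch of a helper that builds such a prompt, including optional earlier turns, could look like the following. The names build_prompt and history are hypothetical and not part of the SageMaker SDK or the model's tooling.

```python
def build_prompt(question, history=None):
    """Format a question (plus optional earlier turns) for MistralLite.

    `history` is a list of (user_message, assistant_reply) pairs.
    """
    turns = ""
    for user_message, assistant_reply in history or []:
        turns += f"<|prompter|>{user_message}</s><|assistant|>{assistant_reply}</s>"
    # The trailing <|assistant|> marker cues the model to write the next reply
    return f"{turns}<|prompter|>{question}</s><|assistant|>"


print(build_prompt("What is the meaning of life?"))
# <|prompter|>What is the meaning of life?</s><|assistant|>
```

Keeping the formatting in one small function makes it easy to reuse the same template for single questions and for multi-turn conversations.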
For more information, feel free to check: https://huggingface.co/amazon/MistralLite.

STEP # 09: Let’s initialize the HuggingFaceModel object using some of the prerequisites and variables we’ve prepared in the earlier steps:

    model = HuggingFaceModel(
        name=model_name,
        env=hub_env,
        role=role,
        image_uri=image_uri
    )

STEP # 10: Now, let’s proceed with the deployment of the model using the deploy() method:

    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",
        endpoint_name=model_name,
    )

Here, we’re using an ml.g5.2xlarge for our inference endpoint.

Given that this step may take about 10-15 minutes to complete, feel free to grab a cup of coffee or tea while waiting!

Important Note: Given that this will launch a resource that will run until you turn it off (or delete it), make sure to complete all the steps in the 2nd part of this post and clean up the created resources accordingly.

STEP # 11: Now, let’s prepare our first input data:

    question = "What is the meaning of life?"
    input_data = {
        "inputs": f"<|prompter|>{question}</s><|assistant|>",
        "parameters": {
            "do_sample": False,
            "max_new_tokens": 2000,
            "return_full_text": False,
        }
    }

STEP # 12: With the prerequisites ready, let’s have our deployed LLM process the input data we prepared in the previous step:

    result = predictor.predict(input_data)[0]["generated_text"]
    print(result)

This should yield the following output:

    The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries. There is no single answer that can be definitively proven, as the meaning of life is subjective and can vary greatly from person to person. ...

Looks like our SageMaker Inference endpoint (where the LLM is deployed) is working just fine!

Conclusion

That wraps up the first part of this post. At this point, you should have a good idea of how to deploy LLMs using Amazon SageMaker.
However, there’s more in store for us in the second part as we’ll build on top of what we have already and enable data capture to help us collect and analyze the data (that is, the input requests and output responses) that pass through the inference endpoint. In addition to this, we’ll prepare a demo user interface utilizing the ML model we deployed in this post.

If you’re looking for the link to the second part, here it is: Deploying LLMs with Amazon SageMaker - Part 2

We are just scratching the surface as there is a long list of capabilities and features available in SageMaker. If you want to take things to the next level, feel free to read 2 of my books focusing heavily on SageMaker: “Machine Learning with Amazon SageMaker Cookbook” and “Machine Learning Engineering on AWS”.

Author Bio

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of 3 Australian-owned companies and also served as the Director for Software Development and Engineering for multiple e-commerce startups in the past. Years ago, he and his team won 1st place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and he has been sharing his knowledge in several international conferences to discuss practical strategies on machine learning, engineering, security, and management. He is also the author of the books "Machine Learning with Amazon SageMaker Cookbook", "Machine Learning Engineering on AWS", and "Building and Automating Penetration Testing Labs in the Cloud". Due to his proven track record in leading digital transformation within organizations, he has been recognized as one of the prestigious Orange Boomerang: Digital Leader of the Year 2023 award winners.

Mostafa Ibrahim
20 Nov 2023
7 min read

LLMs For Extractive Summarization in NLP

Introduction

In today's era, filtering out vital information from the overwhelming volume of data has become crucial. As we navigate vast amounts of information, the significance of adept text summarization becomes clear. This process not only conserves our time but also optimizes the use of resources, ensuring we focus on what truly matters.

In this article, we will delve into the intricacies of text summarization, particularly focusing on the role of Large Language Models (LLMs) in the process. We'll explore their foundational principles, their capabilities in extractive summarization, and the advanced techniques they deploy. Moreover, we'll shed light on the challenges they face and the innovative solutions proposed to overcome them. Without further ado, let’s dive in!

What are LLMs?

LLMs, standing for Large Language Models, are intricate computational structures designed for the detailed analysis and understanding of text. They fall under the realm of Natural Language Processing, a domain dedicated to enabling machines to interpret human language. One of the distinguishing features of LLMs is their vast scale, equipped with an abundance of parameters that facilitate the storage of extensive linguistic data. In the context of summarization, two primary techniques emerge: extractive and abstractive. Extractive summarization involves selecting pertinent sentences or phrases directly from the source material, whereas abstractive summarization synthesizes new sentences that encapsulate the core message in a more condensed manner.
With their advanced linguistic comprehension, LLMs are instrumental in both methods, but their proficiency in extractive summarization is notably prominent.

Why Utilize LLMs for Extractive Summarization?

Extractive summarization entails selecting crucial sentences or phrases from a source document to compose a concise summary. Achieving this demands an intricate and thorough grasp of the document's content, especially when it pertains to extensive and multifaceted texts.

The expansive architecture of LLMs, including state-of-the-art models like ChatGPT, grants them the capability to process and analyze substantial volumes of text, surpassing the limitations of smaller models like BERT, which can handle at most 512 tokens per input. This considerable size and intricate design allow LLMs to produce richer and more detailed representations of content.

LLMs excel not only in recognizing the overt details but also in discerning the implicit or subtle nuances embedded within a text. Given their profound understanding, LLMs are uniquely positioned to identify and highlight the sentences or phrases that truly encapsulate the essence of any content, making them indispensable tools for high-quality extractive summarization.

Techniques and Approaches with LLMs

Within the realm of Natural Language Processing (NLP), the deployment of specific techniques to distill vast texts into concise summaries is of paramount importance. One such technique is sentence scoring. In this method, each sentence in a document is assigned a quantitative value, representing its relevance and importance. LLMs, owing to their extensive architectures, can be meticulously fine-tuned to carry out this scoring with high precision, ensuring that only the most pertinent content is selected for summarization.

Next, we turn our attention to attention visualization in LLMs. This technique provides a graphical representation of the segments of text to which the model allocates the most significance during processing.
For extractive summarization, this visualization serves as a crucial tool, as it offers insights into which sections of the text the model deems most relevant.

Lastly, the integration of hierarchical models enhances the capabilities of LLMs further. These models approach texts in a structured manner, segmenting them into defined chunks before processing each segment for summarization. The inherent capability of LLMs to process lengthy sequences means they can operate efficiently at both the segmentation and the summarization stages, ensuring a comprehensive analysis of extended documents.

Practical Implementation of Extractive Summarization Using LLMs

In this section, we offer a hands-on experience by providing a sample code snippet that uses a pre-trained language model, BERT, for text summarization. To perform extractive summarization, we will be using the bert-extractive-summarizer package, which is built on top of the Hugging Face Transformers library. This package provides a simple way to use BERT for extractive summarization.

Step 1: Install and Import Necessary Libraries

    !pip install bert-extractive-summarizer

    from summarizer import Summarizer

Step 2: Load the Extractive BERT Summarization Model

With no arguments, Summarizer() loads the package's default pre-trained BERT model.

    model = Summarizer()

Step 3: Create a Sample Text to Summarize

    text = """Climate change represents one of the most significant challenges facing the world today. It is characterized by changes in weather patterns, rising global temperatures, and increasing levels of greenhouse gases in the atmosphere. The impact of climate change is far-reaching, affecting ecosystems, biodiversity, and human societies across the globe. Scientists warn that immediate action is necessary to mitigate the most severe consequences of this global phenomenon. Strategies to address climate change include reducing carbon emissions, transitioning to renewable energy sources, and conserving natural habitats. International cooperation is crucial, as the effects of climate change transcend national borders, requiring a unified global response. The Paris Agreement, signed by 196 parties at the COP 21 in Paris on 12 December 2015, is one of the most comprehensive international efforts to combat climate change, aiming to limit global warming to well below 2 degrees Celsius."""

Step 4: Performing Extractive Summarization

In this step, we'll be performing extractive summarization, explicitly instructing the model to generate a summary consisting of the two sentences deemed most significant.

    summary = model(text, num_sentences=2)  # You can specify the number of sentences in the summary
    print("Extractive Summary:")
    print(summary)

Output for Extractive Summary:

    Climate change represents one of the most significant challenges facing the world today. The impact of climate change is far-reaching, affecting ecosystems, biodiversity, and human societies across the globe.

Challenges and Overcoming Them

The journey of extractive summarization using LLMs is not without its bumps. A significant challenge is redundancy. Extractive models, in their quest to capture important sentences, might pick multiple sentences conveying similar information, leading to repetitive summaries.

Then there's the issue of coherence. Unlike abstractive summarization, where models generate new sentences, extractive methods merely extract existing ones. The outcome might not always flow logically, hindering a reader's understanding and detracting from the quality.

To combat these challenges, refined training methods can be employed. Training data can be curated to include diverse sentence structures and content, pushing the model to discern nuances and reduce redundancy. Additionally, reinforcement learning techniques can be integrated, where the model is rewarded for producing non-redundant, coherent summaries and penalized for the opposite.
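Alongside these training-based remedies, redundancy can also be reduced after extraction. The following is a minimal sketch of that idea and is not part of bert-extractive-summarizer: it keeps sentences in their original order and skips any sentence whose word overlap (Jaccard similarity) with an already kept sentence exceeds a threshold. The function names, the sample sentences, and the 0.5 threshold are illustrative assumptions.

```python
import re


def jaccard(a, b):
    """Word-level Jaccard similarity between two sentences."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def drop_redundant(sentences, threshold=0.5):
    """Keep sentences in order, skipping near-duplicates of kept ones."""
    kept = []
    for sentence in sentences:
        if all(jaccard(sentence, k) <= threshold for k in kept):
            kept.append(sentence)
    return kept


candidates = [
    "Global temperatures are rising.",
    "Temperatures are rising globally.",
    "Renewable energy can reduce emissions.",
]
print(drop_redundant(candidates))
# ['Global temperatures are rising.', 'Renewable energy can reduce emissions.']
```

Word overlap is a crude proxy for meaning; comparing sentence embeddings instead would catch paraphrases that share few words, at the cost of an extra model call.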
Over time, through continuous feedback and iterative training, LLMs can be fine-tuned to generate crisp, non-redundant, and coherent extractive summaries.

Conclusion

In conclusion, the realm of text summarization, enhanced by the capabilities of Large Language Models (LLMs), presents a dynamic and evolving landscape. Throughout this article, we've journeyed through the foundational aspects of LLMs, their prowess in extractive summarization, and the methodologies and techniques they adopt.

While challenges persist, the continuous advancements in the field promise innovative solutions on the horizon. As we move forward, the relationship between LLMs and text summarization will undoubtedly shape the future of how we process and understand vast data volumes efficiently.

Author Bio

Mostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.

Mostafa Ibrahim
15 Nov 2023
7 min read

Large Language Models (LLMs) and Knowledge Graphs

Introduction

Harnessing the power of AI, this article explores how Large Language Models (LLMs) like OpenAI's GPT can analyze data from Knowledge Graphs to revolutionize data interpretation, particularly in healthcare. We'll illustrate a use case where an LLM assesses patient symptoms from a Knowledge Graph to suggest diagnoses, showcasing the potential of LLMs to support medical diagnostics.

Brief Introduction to Large Language Models (LLMs)

Large Language Models (LLMs), such as OpenAI's GPT series, represent a significant advancement in the field of artificial intelligence. These models are trained on vast datasets of text, enabling them to understand and generate human-like language.

LLMs are adept at understanding complex questions and providing appropriate responses, akin to human analysis. This capability stems from their extensive training on diverse datasets, allowing them to interpret context and generate relevant text-based answers.

While LLMs possess advanced data processing capabilities, their effectiveness is often limited by the static nature of their training data. Knowledge Graphs step in to fill this gap, offering a dynamic and continuously updated source of information. This integration not only equips LLMs with the latest data, enhancing the accuracy and relevance of their output, but also empowers them to solve more complex problems with a greater level of sophistication.
As we harness this powerful combination, we pave the way for innovative solutions across various sectors that demand real-time intelligence, such as the ever-fluctuating stock market.

Exploring Knowledge Graphs and How LLMs Can Benefit From Them

Knowledge Graphs represent a pivotal advancement in organizing and utilizing data, especially in enhancing the capabilities of Large Language Models (LLMs).

Knowledge Graphs organize data in a graph format, where entities (like people, places, and things) are nodes, and the relationships between them are edges. This structure allows for a more nuanced representation of data and its interconnected nature. Take the following Knowledge Graph as an example:

Doctor Node: This node represents the doctor. It is connected to the patient node with an edge labeled "Patient," indicating the doctor-patient relationship.

Patient Node (Patient123): This is the central node representing a specific patient, known as "Patient123." It serves as a junction point connecting to various symptoms that the patient is experiencing.

Symptom Nodes: There are three separate nodes representing individual symptoms that the patient has: "Fever," "Cough," and "Shortness of breath." Each of these symptoms is connected to the patient node by edges labeled "Symptom," indicating that these are the symptoms experienced by "Patient123."

To simplify, the Knowledge Graph shows that "Patient123" is a patient of the "Doctor" and is experiencing three symptoms: fever, cough, and shortness of breath. This type of graph is useful in medical contexts where it's essential to model the relationships between patients, their healthcare providers, and their medical conditions or symptoms.
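To make the node-and-edge structure concrete, here is a dependency-free sketch that stores the example graph as (subject, relation, object) triples in plain Python. The relation names hasPatient and hasSymptom and the two helper functions are illustrative assumptions; this is not the rdflib API used in the hands-on steps.

```python
# Each triple is (subject, relation, object), mirroring nodes and labeled edges
triples = [
    ("Doctor", "hasPatient", "Patient123"),
    ("Patient123", "hasSymptom", "fever"),
    ("Patient123", "hasSymptom", "cough"),
    ("Patient123", "hasSymptom", "shortness of breath"),
]


def symptoms_of(patient):
    """Follow hasSymptom edges out of a patient node."""
    return [o for s, r, o in triples if s == patient and r == "hasSymptom"]


def patients_with(symptom):
    """Follow hasSymptom edges backwards from a symptom node."""
    return [s for s, r, o in triples if r == "hasSymptom" and o == symptom]


print(symptoms_of("Patient123"))  # ['fever', 'cough', 'shortness of breath']
print(patients_with("cough"))     # ['Patient123']
```

Because every fact has the same three-part shape, the same list can be traversed in either direction, from a patient to its symptoms or from a symptom back to its patients.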
This structure allows for easy querying of related data, for example, finding all symptoms associated with a particular patient or identifying all patients experiencing a certain symptom.

Practical Integration of LLMs and Knowledge Graphs

Step 1: Installing and Importing the Necessary Libraries

In this step, we're going to bring in two essential libraries: rdflib for constructing our Knowledge Graph and openai for tapping into the capabilities of GPT, the Large Language Model.

    !pip install rdflib
    !pip install openai==0.28

    import rdflib
    import openai

Step 2: Set Your Personal OpenAI API Key

    openai.api_key = "Insert Your Personal OpenAI API Key Here"

Step 3: Creating a Knowledge Graph

    # Create a new and empty Knowledge graph
    g = rdflib.Graph()
    # Define a Namespace for health-related data
    namespace = rdflib.Namespace("http://example.org/health/")

Step 4: Adding Data to Our Graph

In this part of the code, we will introduce a single entry to the Knowledge Graph pertaining to Patient123. This entry will consist of three distinct nodes, each representing a different symptom exhibited by the patient.

    def add_patient_data(patient_id, symptoms):
        # Build an absolute URI for the patient inside our namespace
        patient_uri = namespace[patient_id]
        for symptom in symptoms:
            symptom_predicate = namespace.hasSymptom
            g.add((patient_uri, symptom_predicate, rdflib.Literal(symptom)))

    # Example of adding patient data
    add_patient_data("Patient123", ["fever", "cough", "shortness of breath"])

Step 5: Defining the get_patient_symptoms Function

We will utilize a simple SPARQL query in order to extract the required data from the Knowledge Graph.

    def get_patient_symptoms(patient_id):
        # Reference the same patient URI that add_patient_data stored
        patient_uri = namespace[patient_id]
        sparql_query = f"""
            PREFIX ex: <http://example.org/health/>
            SELECT ?symptom
            WHERE {{
                <{patient_uri}> ex:hasSymptom ?symptom.
            }}
        """
        query_result = g.query(sparql_query)
        symptoms = [str(row.symptom) for row in query_result]
        return symptoms

Step 6: Defining the generate_diagnosis_response Function

The generate_diagnosis_response function takes as input the patient's ID along with the list of symptoms extracted from the graph. The LLM then uses the symptoms to suggest a potential diagnosis.

    def generate_diagnosis_response(patient_id, symptoms):
        symptoms_list = ", ".join(symptoms)
        prompt = f"A patient with the following symptoms - {symptoms_list} - has been observed. Based on these symptoms, what could be a potential diagnosis?"

        # Query the LLM through the Completion API of openai==0.28
        llm_response = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=100
        )
        return llm_response.choices[0].text.strip()

    # Example usage
    patient_id = "Patient123"
    symptoms = get_patient_symptoms(patient_id)
    if symptoms:
        diagnosis = generate_diagnosis_response(patient_id, symptoms)
        print(diagnosis)
    else:
        print(f"No symptoms found for {patient_id}.")

Output:

    The potential diagnosis could be pneumonia. Pneumonia is a type of respiratory infection that causes symptoms including fever, cough, and shortness of breath. Other potential diagnoses should be considered as well and should be discussed with a medical professional.

As demonstrated, the LLM connected the three symptoms (fever, cough, and shortness of breath) to suggest that Patient123 may potentially be diagnosed with pneumonia.

Conclusion

In summary, the collaboration of Large Language Models and Knowledge Graphs presents a substantial advancement in the realm of data analysis.
This article has provided a straightforward illustration of their potential when working in tandem, with LLMs efficiently extracting and interpreting data from Knowledge Graphs.
As we further develop and refine these technologies, they hold the promise of significantly improving analytical capabilities and informing more sophisticated decision-making in an increasingly data-driven world.

Author Bio
Mostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.
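As a supplement to Step 5 above: both lookups mentioned in the article (symptoms for a patient, and patients for a symptom) are triple-pattern matches. The sketch below illustrates the idea with a plain Python set of tuples standing in for the rdflib graph. The `match`, `symptoms_of`, and `patients_with` helpers and the second patient are made up for illustration and are not part of rdflib or the original article.

```python
# Illustrative sketch only: a plain set of (subject, predicate, object) tuples
# stands in for the rdflib graph. "Patient456" is invented sample data.
triples = {
    ("Patient123", "hasSymptom", "fever"),
    ("Patient123", "hasSymptom", "cough"),
    ("Patient123", "hasSymptom", "shortness of breath"),
    ("Patient456", "hasSymptom", "cough"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def symptoms_of(patient_id):
    """Fix the subject, read off the objects."""
    return sorted(o for (_, _, o) in match(subject=patient_id, predicate="hasSymptom"))


def patients_with(symptom):
    """Fix the object, read off the subjects."""
    return sorted(s for (s, _, _) in match(predicate="hasSymptom", obj=symptom))


print(symptoms_of("Patient123"))
print(patients_with("cough"))
```

In the SPARQL query from Step 5, the same switch amounts to putting the variable in the subject position and the known symptom in the object position.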

Mostafa Ibrahim
09 Nov 2023
8 min read

Generating Synthetic Data with LLMs

Introduction
In this article, we will walk through the process of synthetic data generation using LLMs. We will look at why synthetic data is becoming more important, how well LLMs can generate such data, and the practical steps for using advanced models like OpenAI’s GPT-3.5. Whether you’re a seasoned AI enthusiast or a curious newcomer, join us on this journey into the heart of modern machine learning.

What are LLMs?
Large Language Models (LLMs) are state-of-the-art machine learning architectures primarily designed for understanding and generating human-like text. These models are trained on vast amounts of data, enabling them to perform a wide range of language tasks, from simple text completion to answering complex questions or even crafting coherent articles. Some examples of LLMs include:
1. GPT-3 by OpenAI, with 175 billion parameters and a context window of up to 2,048 tokens.
2. BERT by Google, with up to 340 million parameters and inputs of up to 512 tokens.
3. T5 (Text-to-Text Transfer Transformer) by Google, with parameters ranging from 60 million to 11 billion depending on the model size. The number of tokens it can process is also influenced by its size and setup.
LLMs, with their cutting-edge capabilities in NLP tasks like question answering and text summarization, are also highly regarded for their efficiency in generating synthetic data.

Why Is There a Need for Synthetic Data?
1) Data Scarcity
Do you ever grapple with the challenge of insufficient data to train your model? This dilemma is a daily reality for machine learning experts globally.
Since data gathering and processing are among the most demanding parts of the machine-learning workflow, synthetic data offers a practical way to fill the gap.

2) Data Privacy & Security
Real-world data often contains sensitive information. Industries like healthcare and finance face stringent regulations around data usage. Such data may include customers' credit card details, buying patterns, and medical conditions. Synthetic data can be used without compromising privacy since it doesn't contain real individual information.

The Process of Generating Data with LLMs
Producing synthetic data with Large Language Models begins with the preparation of seed data or guiding queries. This foundational step matters because it determines the type of synthetic data that will be produced. Whether it's simulating chatbot conversations or creating fictional product reviews, these initial prompts provide LLMs with the necessary context.
Once the stage is set, we move to the actual data generation phase. LLMs begin crafting text based on patterns they've learned from vast datasets.
This capability enables them to produce information that aligns with the characteristics of real-world data, albeit synthesized.

Generating Synthetic Data Using OpenAI’s GPT-3.5

Step 1: Importing Necessary Libraries

import openai

Step 2: Set up the OpenAI API key

openai.api_key = "Insert Your OpenAI key here"

Step 3: Define our synthetic data generation function

def generate_reviews(prompt, count=1, max_attempts=5):
    reviews = []
    for i in range(count):
        review_generated = False
        attempts = 0
        # Retry until a review of acceptable length is produced, giving up after max_attempts
        # so that a persistent API error cannot loop forever
        while not review_generated and attempts < max_attempts:
            attempts += 1
            try:
                # Generate a response using the ChatCompletion method
                response = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": prompt}
                    ]
                )
                review = response.choices[0].message['content'].strip()
                word_count = len(review.split())
                print("word count:", word_count)
                # Check if the word count is within the desired range
                if 15 <= word_count <= 70:
                    print("counted")
                    reviews.append(review)
                    review_generated = True
            except openai.error.OpenAIError as err:
                print(f"Encountered an error: {err}")
        # Optional: Add a slight variation to the prompt for next iteration
        prompt += " Provide another perspective."
    return reviews

Step 4: Testing our function

prompt_text = "Write a 25 word positive review for a wireless earbud highlighting its battery life."
num_datapoints = 5
generated_reviews = generate_reviews(prompt_text, num_datapoints)

Step 5: Printing generated synthetic data

for idx, review in enumerate(generated_reviews):
    print(f"Review {idx + 1}: {review}")

Output:
Review 1: The battery life on these wireless earbuds is absolutely incredible! I can enjoy hours of uninterrupted music without worrying about recharging. Truly impressive!
Review 2: "The battery life of these wireless earbuds is phenomenal! I can enjoy my favorite music for hours without worrying about recharging. Truly impressive!"
Review 3: This wireless earbud is a game-changer! With an exceptional battery life that lasts all day, I can enjoy uninterrupted music and calls without any worries. It's a must-have for people on the go. Another perspective: As a fitness enthusiast, the long battery life of this wireless earbud is a true blessing. It allows me to power through my workouts without constantly needing to recharge, keeping me focused and motivated.
Review 4: This wireless earbud's exceptional battery life is worth praising! It lasts all day long, keeping you immersed in your favorite tunes. A real game-changer for music enthusiasts.
Review 5: The battery life of these wireless earbuds is exceptional, lasting for hours on end, allowing you to enjoy uninterrupted music or calls. They truly exceed expectations!

Considerations and Pitfalls
The process doesn't conclude here. Generated data may sometimes have inconsistencies or lack the desired quality. Hence, post-processing, which involves refining and filtering the output, becomes essential. Ensuring the variability and richness of the synthetic data is equally important, as too much uniformity can lead to overfitting when the data is used for machine learning. This refinement process should aim to eliminate any redundant or unrepresentative samples that could skew the model's learning process.
Moreover, validating the synthetic data ensures that it meets the standards and purposes for which it was intended, in terms of both authenticity and reliability.

Conclusion
Throughout this article, we've navigated the process of synthetic data generation powered by LLMs.
We've explained the underlying reasons for the growing prominence of synthetic data, shown how capable LLMs are at creating such data, and provided actionable guidance for using pre-trained LLMs like OpenAI’s GPT-3.5.
For all AI enthusiasts, we hope this exploration has deepened your appreciation and understanding of the evolving landscape of machine learning, LLMs, and synthetic data. It is clear that both synthetic data and LLMs will be central to many breakthroughs to come.

Author Bio
Mostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.
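As a supplement to the "Considerations and Pitfalls" section: the post-processing step described there (filtering by length and dropping redundant samples) can be sketched roughly as follows. This is one possible approach, not the article's method. The `normalize` and `filter_reviews` helpers, the 0.9 similarity threshold, and the sample reviews are assumptions chosen for illustration, and `difflib` from the standard library is used as a simple stand-in for a proper similarity measure.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip surrounding quotes and whitespace, collapse spaces."""
    return " ".join(text.strip().strip('"').lower().split())


def filter_reviews(reviews, min_words=15, max_words=70, max_similarity=0.9):
    """Keep reviews within the word range that are not near-duplicates of kept ones."""
    kept = []
    for review in reviews:
        cleaned = normalize(review)
        n_words = len(cleaned.split())
        if not (min_words <= n_words <= max_words):
            continue  # too short or too long
        is_duplicate = any(
            SequenceMatcher(None, cleaned, normalize(k)).ratio() >= max_similarity
            for k in kept
        )
        if not is_duplicate:
            kept.append(review.strip().strip('"'))
    return kept


# Invented sample: an original, a quoted copy of it, a too-short review, and a distinct one.
sample = [
    "The battery life on these wireless earbuds is absolutely incredible! I can enjoy hours of uninterrupted music without worrying about recharging. Truly impressive!",
    '"The battery life on these wireless earbuds is absolutely incredible! I can enjoy hours of uninterrupted music without worrying about recharging. Truly impressive!"',
    "Great battery.",
    "This wireless earbud's exceptional battery life is worth praising! It lasts all day long, keeping you immersed in your favorite tunes. A real game-changer for music enthusiasts.",
]

for review in filter_reviews(sample):
    print(review)
```

The threshold is a tuning knob: lowering it removes more paraphrases at the cost of discarding some legitimately distinct reviews.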
Sangita Mahala
07 Nov 2023
9 min read

PaLM 2: A Game-Changer in Tackling Real-World Challenges

Introduction
PaLM 2 is a large language model from Google AI, trained on a massive dataset of text and code. It is the successor to PaLM and is more capable at producing text, translating languages, writing various kinds of creative content, and answering questions informatively. Research and development of PaLM 2 continues, but it has the potential to shake up many industries and research areas through its ability to address a broad range of complex real-world problems.

Powerful Tools for NLP, Code Generation, and Creative Writing by PaLM 2
LLMs such as PaLM 2 are trained on massive datasets of text and code in order to learn the complex relationships between words and phrases. This makes them good candidates for a wide range of tasks, such as:
Natural language processing (NLP): PaLM 2 can be used for NLP tasks such as machine translation, text summarization, and question answering, with high accuracy and consistency.
Code generation: PaLM 2 can generate code in a number of programming languages, including Python, Java, and C++. This can be useful for tasks such as automating software development and creating new algorithms.
Creative writing: PaLM 2 can produce different creative text formats, such as poems, code, scripts, musical pieces, emails, and letters.
This could be useful for writing advertising copy, producing scripts for films and television shows, and composing music.

Real-World Examples
The following examples illustrate how PaLM 2 could be put to use on complicated real-world problems.

Example 1: Drug Discovery
PaLM 2 has many promising applications in drug discovery. It can be used to generate new drug candidates, predict their properties, and simulate their interaction with biological targets. This may help scientists identify new drugs more quickly and efficiently.
To produce new drug candidates, PaLM 2 can screen millions of possible compounds for their ability to bind to a specific target protein. This is a highly complex task, but PaLM 2 can speed it up considerably.

Input code:

import google.cloud.aiplatform as aip

def drug_discovery(target_protein):
    """Uses PaLM 2 to generate new drug candidates for a given target protein.

    Args:
        target_protein: The target protein to generate drug candidates for.

    Returns:
        A list of potential drug candidates.
    """
    # Schematic example: check the client class, model name, and response fields
    # against the current Vertex AI SDK before running.
    # Create a PaLM 2 client.
    client = aip.PredictionClient()
    # Set the input prompt.
    prompt = f"Generate new drug candidates for the target protein {target_protein}."
    # Make a prediction.
    prediction = client.predict(model_name="paLM_2", inputs={"text": prompt})
    # Extract the drug candidates from the prediction.
    drug_candidates = prediction.outputs["drug_candidates"]
    return drug_candidates

# Example usage:
target_protein = "ACE2"
drug_candidates = drug_discovery(target_protein)
print(drug_candidates)

Output:
The drug_discovery() function returns a list of potential therapeutic candidates for the given protein. The specific output depends on the protein being targeted. In this example, PaLM 2 identified three possible drug candidates for the target protein ACE2.
Researchers would then carry out additional studies to determine the effectiveness and safety of these substances.

Example 2: Climate Change
PaLM 2 may also be used to help cope with climate change. It can be used to model the climate system, anticipate the impacts of climate change, and develop mitigation strategies.
Using a variety of greenhouse gas emissions scenarios, PaLM 2 can simulate the Earth's climate. This information can be used to predict the effects of climate change on temperature, precipitation, and other factors.

Input code:

import google.cloud.aiplatform as aip

def climate_change_prediction(emission_scenario):
    """Uses PaLM 2 to predict the effects of climate change under a given emission scenario.

    Args:
        emission_scenario: The emission scenario to predict the effects of climate change under.

    Returns:
        A dictionary containing the predicted effects of climate change.
    """
    # Schematic example: check the client class, model name, and response fields
    # against the current Vertex AI SDK before running.
    # Create a PaLM 2 client.
    client = aip.PredictionClient()
    # Set the input prompt.
    prompt = f"Predict the effects of climate change under the emission scenario {emission_scenario}."
    # Make a prediction.
    prediction = client.predict(model_name="paLM_2", inputs={"text": prompt})
    # Extract the predicted effects of climate change from the prediction.
    predicted_effects = prediction.outputs["predicted_effects"]
    return predicted_effects

# Example usage:
emission_scenario = "RCP8.5"
predicted_effects = climate_change_prediction(emission_scenario)
print(predicted_effects)

Output:
The example uses RCP8.5, a high-emissions scenario. The model estimates that the global temperature will rise by 4.3 degrees C, with precipitation decreasing by 10% in this scenario.

Example 3: Material Science
In the area of material science, PaLM 2 may be used to create new materials with desired properties.
PaLM 2 can be used to assess millions of candidate materials for required properties such as durability, lightness, and conductivity.
One application is the development of new materials for batteries. Batteries need to be light and long-lasting, with high energy density. PaLM 2 may be used to identify candidates with these properties among millions of potential materials.

Input code:

import google.cloud.aiplatform as aip

def material_design(desired_properties):
    """Uses PaLM 2 to design a new material with the desired properties.

    Args:
        desired_properties: A list of the desired properties of the new material.

    Returns:
        A dictionary containing the properties of the designed material.
    """
    # Schematic example: check the client class, model name, and response fields
    # against the current Vertex AI SDK before running.
    # Create a PaLM 2 client.
    client = aip.PredictionClient()
    # Set the input prompt.
    prompt = f"Design a new material with the following desired properties: {desired_properties}"
    # Make a prediction.
    prediction = client.predict(model_name="paLM_2", inputs={"text": prompt})
    # Extract the properties of the designed material from the prediction.
    designed_material_properties = prediction.outputs["designed_material_properties"]
    return designed_material_properties

# Example usage:
desired_properties = ["lightweight", "durable", "conductive"]
designed_material_properties = material_design(desired_properties)
print(designed_material_properties)

Output:
The model designed a material with the following properties:
Density: 1.0 grams per cubic centimeter (g/cm^3)
Strength: 1000.0 megapascals (MPa)
Conductivity: 100.0 watts per meter per kelvin (W/mK)
This is only a prediction from the language model, and further investigation and development would be needed to make this material real.

Example 4: Predicting the Spread of Infectious Diseases
PaLM 2 may be used to predict the spread of COVID-19 in a given region.
Factors that PaLM 2 may take into account include the number of infections, the transmission rate, and vaccination rates. PaLM 2 can also be used to predict the effects of preventive health measures, e.g. mask mandates and lockdowns.

Input code:

import google.cloud.aiplatform as aip

def infectious_disease_prediction(population_density, transmission_rate):
    """Uses PaLM 2 to predict the spread of an infectious disease in a population with a given population density and transmission rate.

    Args:
        population_density: The population density of the population to predict the spread of the infectious disease in.
        transmission_rate: The transmission rate of the infectious disease.

    Returns:
        A dictionary containing the predicted spread of the infectious disease.
    """
    # Schematic example: check the client class, model name, and response fields
    # against the current Vertex AI SDK before running.
    # Create a PaLM 2 client.
    client = aip.PredictionClient()
    # Set the input prompt.
    prompt = f"Predict the spread of COVID-19 in a population with a population density of {population_density} and a transmission rate of {transmission_rate}."
    # Make a prediction.
    prediction = client.predict(model_name="paLM_2", inputs={"text": prompt})
    # Extract the predicted spread of the infectious disease from the prediction.
    predicted_spread = prediction.outputs["predicted_spread"]
    return predicted_spread

# Example usage:
population_density = 1000
transmission_rate = 0.5
predicted_spread = infectious_disease_prediction(population_density, transmission_rate)
print(predicted_spread)

Output:
The estimated peak incidence of the infectious disease is 50%, meaning that half of the population would be affected at a particular time during the outbreak.
The total number of anticipated cases is 500,000.
It must be remembered that this is a prediction, and the rate at which infectious diseases spread can change depending on many factors, such as the effectiveness of disease prevention measures or how people behave.
In the future, PaLM 2 is expected to contribute to the development of new medicines, more effective energy systems, and materials with desired properties. It is also likely to be used to predict the spread of infectious agents and to develop mitigation strategies for climate change.

Conclusion
PaLM 2, Google AI's advanced language model, has the potential to transform several sectors by addressing complex real-world problems. Its applicability to drug discovery, climate change prediction, materials science, and forecasting the spread of infectious diseases illustrates its flexibility and strength.
Responsible and proper use of PaLM 2 is at the heart of this evolving landscape. To make full use of its potential, the model's capacity must be combined with human expertise, while ensuring that its application meets ethical standards and best practices. As we continue to explore what PaLM 2 can do, this technology may help shape a brighter future by helping to solve complicated problems across different fields.

Author Bio
Sangita Mahala is a passionate IT professional with an outstanding track record and an impressive array of certifications, including 12x Microsoft, 11x GCP, 2x Oracle, and LinkedIn Marketing Insider Certified. She is a Google Crowdsource Influencer and IBM champion learner gold. She also has extensive experience as a technical content writer and is an accomplished book blogger. She is committed to staying current with emerging trends and technologies in the IT sector.
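As a supplement to the examples above: each one reads a structured field such as `drug_candidates` from the model's reply. A text model returns a plain string, so in practice the prompt has to request a structured format and the reply has to be parsed defensively. The sketch below shows one way to do that with JSON. `build_prompt`, `parse_candidates`, and `fake_model` are hypothetical names, and `fake_model` returns a canned placeholder reply instead of calling any real service.

```python
import json


def build_prompt(target_protein):
    """Ask explicitly for JSON so that the reply can be parsed."""
    return (
        f"Suggest drug candidates for the target protein {target_protein}. "
        'Reply with JSON only, in the form {"drug_candidates": ["name", ...]}.'
    )


def parse_candidates(raw_text):
    """Return the list under 'drug_candidates', or [] if the reply is not usable."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    candidates = data.get("drug_candidates", [])
    if not isinstance(candidates, list):
        return []
    return [c for c in candidates if isinstance(c, str)]


def fake_model(prompt):
    # Hypothetical stand-in for a real model call; the reply is a canned placeholder.
    return '{"drug_candidates": ["candidate_A", "candidate_B"]}'


reply = fake_model(build_prompt("ACE2"))
print(parse_candidates(reply))
```

Returning an empty list on malformed replies keeps the calling code simple. Whether to retry, log, or raise instead is a design choice that depends on the application.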

Prakhar Mishra
06 Nov 2023
9 min read

Fine-Tuning LLaMA 2

Introduction
Large Language Models have recently become the talk of the town. I am sure you have heard of ChatGPT. Yes, that’s an LLM, and that’s what I am talking about. Every few weeks, we see newer, better, but not necessarily larger LLMs coming out as either open-source or closed-source. This is probably the best time to learn about them and make these powerful models work for your specific use case.
In today’s blog, we will look into one of the recent open-source models, Llama2, and fine-tune it on a standard NLP task: recognizing entities in text. We will first look at what large language models are, what open-source and closed-source models are, and some examples of each. We will then learn about Llama2 and why it is so special. We then describe our NLP task and dataset. Finally, we get into coding.

About Large Language Models (LLMs)
Language models are artificial intelligence systems that have been trained to understand and generate human language. Large Language Models (LLMs) like GPT-3, ChatGPT, GPT-4, Bard, and similar can perform diverse sets of tasks out of the box. The quality of output from these models often depends heavily on the finesse of the prompt given by the user.
These language models are trained on vast amounts of text data from the Internet. This data includes a wide range of written text, from books and articles to websites and social media posts. Most language models are trained in an auto-regressive way, i.e. they try to maximize the probability of the next word based on the words they have produced or seen so far. Language models have a wide range of applications, including chatbots, virtual assistants, content generation, and more.
They can be used in industries like customer service, healthcare, finance, and marketing.
Since these models are trained on enormous amounts of data, they are already good at zero-shot inference and can be steered to perform better with few-shot examples. Zero-shot is a setup in which a model handles things that it hasn't explicitly seen during training. In a few-shot setting, the goal is to make predictions for new classes based on a few labeled examples provided at inference time.
Despite their impressive text generation capabilities, these very large models come with limitations that must be considered when building an LLM-based production pipeline, including hallucinations and biases.

Closed and Open-source Language Models
Closed-source large language models are developed and used by particular companies and are not readily accessible to the public. Training data for these models is typically kept private. While they can be highly sophisticated, this limits transparency, potentially leading to concerns about bias and data privacy.
In contrast, open-source models like Llama 2 are designed to be freely available to researchers and developers. These models are trained on extensive, publicly available datasets, allowing for a degree of transparency and collaboration.
The decision between closed- and open-source language models is influenced by several variables, such as the project's goals and the need for openness.

About Llama2
Meta's open-source LLM is called Llama 2. It was trained with 2 trillion "tokens" from publicly available sources like Wikipedia, Common Crawl, and books from the Gutenberg project. Three model sizes are available: 7 billion, 13 billion, and 70 billion parameters. There are two types of completion models available: Chat-tuned and General.
The chat-tuned models, which have been fine-tuned for chatbot-like dialogue, are denoted by the suffix '-chat'. We will use Meta's general 7b Llama-2 Hugging Face model as the base model that we fine-tune. Feel free to use any other version of llama2-7b.
Also, if you are interested, there are several threads that you can go through to understand how the Llama family compares with the GPT family (source, source, source).

About Named Entity Recognition
As a component of information extraction, named-entity recognition locates and categorizes specific entities in unstructured text by assigning them to pre-defined groups, such as individuals, organizations, locations, measures, and more. NER offers a quick way to understand the core idea or content of a lengthy text.
There are many ways of extracting entities from a given text. In this blog, we will specifically delve into fine-tuning Llama2-7b using PEFT techniques on a Colab Notebook.
We will transform the SMSSpamCollection classification data set for NER. Pretty interesting 😀
We search through all 10-letter words and tag them as 10_WORDS_LONG. This is the entity that we want our Llama to extract. But why this bizarre formulation? I did it intentionally to show that this is something the pre-trained model would not have seen during the pre-training stage, so it becomes essential to fine-tune it and make it work for our use case 👍. We can still attach some logic to the formulation: think of these words as probable outliers or noisy words. The longer the word, the higher the chance that it is noise or out-of-vocabulary (OOV). However, you will have to settle on the exact letter count after looking at the word length distribution. Please note that the code is generic enough for fine-tuning any number of entities.
It’s just a change in the data preparation step that we will make to slice out only relevant entities.

Code for Fine-tuning Llama2-7b

# Importing Libraries
from transformers import LlamaTokenizer, LlamaForCausalLM
import torch
from datasets import Dataset
import transformers
import pandas as pd
from peft import get_peft_model, LoraConfig, TaskType, prepare_model_for_int8_training, get_peft_model_state_dict, PeftModel
from sklearn.utils import shuffle

Data Preparation Phase

df = pd.read_csv('SMSSpamCollection', sep='\t', header=None)
all_text = df[1].str.lower().tolist()
inputs, outputs = [], []
for text in all_text:
    inputs.append(text)
    # Tag every 10-letter word; store the mapping as a string so the column has a single type
    outputs.append(str({word: '10_WORDS_LONG' for word in text.split() if len(word) == 10}))
df = pd.DataFrame({'input_text': inputs, 'output_text': outputs})
print(df.head(5))
total_ds = shuffle(df, random_state=42)
total_train_ds = total_ds.head(4000)
total_test_ds = total_ds.tail(1500)
total_train_ds_hf = Dataset.from_pandas(total_train_ds)
total_test_ds_hf = Dataset.from_pandas(total_test_ds)
# The datasets are tokenized in the fine-tuning phase, once the tokenizer and prompt helpers exist

Fine-tuning Phase

# Loading Model
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_name)
# Loaded in 8-bit to match prepare_model_for_int8_training (same settings as the inference load below)
model = LlamaForCausalLM.from_pretrained(model_name, load_in_8bit=True, torch_dtype=torch.float16, device_map='auto')

def create_peft_config(m):
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=['q_proj', 'v_proj'],
    )
    # Newer peft releases rename this helper to prepare_model_for_kbit_training
    m = prepare_model_for_int8_training(m)
    m.enable_input_require_grads()
    m = get_peft_model(m, peft_config)
    m.print_trainable_parameters()
    return m, peft_config

model, lora_config = create_peft_config(model)

def generate_prompt(data_point):
    return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Extract entity from the given input:
### Input:
{data_point["input_text"]}
### Response:
{data_point["output_text"]}"""

tokenizer.pad_token_id = 0

def tokenize(prompt, add_eos_token=True):
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=128,
        padding=False,
        return_tensors=None,
    )
    # Append the end-of-sequence token if there is room for it
    if (
        result["input_ids"][-1] != tokenizer.eos_token_id
        and len(result["input_ids"]) < 128
        and add_eos_token
    ):
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)
    result["labels"] = result["input_ids"].copy()
    return result

def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenize(full_prompt)
    return tokenized_full_prompt

# Tokenize the train and test splits prepared earlier
tokenized_tr_ds = total_train_ds_hf.map(generate_and_tokenize_prompt)
tokenized_te_ds = total_test_ds_hf.map(generate_and_tokenize_prompt)

training_arguments = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=4e-05,
    logging_steps=100,
    optim="adamw_torch",
    evaluation_strategy="steps",
    save_strategy="steps",
    eval_steps=100,
    save_steps=100,
    output_dir="saved_models/"
)
data_collator = transformers.DataCollatorForSeq2Seq(tokenizer)
trainer = transformers.Trainer(model=model, tokenizer=tokenizer, train_dataset=tokenized_tr_ds, eval_dataset=tokenized_te_ds, args=training_arguments, data_collator=data_collator)
with torch.autocast("cuda"):
    trainer.train()

Inference

loaded_tokenizer = LlamaTokenizer.from_pretrained(model_name)
loaded_model = LlamaForCausalLM.from_pretrained(model_name, load_in_8bit=True, torch_dtype=torch.float16, device_map='auto')
# "saved_model_path" is a placeholder for the checkpoint directory written during training
peft_model = PeftModel.from_pretrained(loaded_model, "saved_model_path", torch_dtype=torch.float16)
peft_model.config.pad_token_id = loaded_tokenizer.pad_token_id = 0
peft_model.eval()

def extract_entity(text):
    # Build the same prompt used in training, leaving the response empty for the model to fill in
    prompt = generate_prompt({"input_text": text, "output_text": ""})
    inp = loaded_tokenizer(prompt, return_tensors='pt').to("cuda")
    with torch.no_grad():
        p_ent = loaded_tokenizer.decode(peft_model.generate(**inp, max_new_tokens=128)[0], skip_special_tokens=True)
    int_idx = p_ent.find('Response:')
    p_ent = p_ent[int_idx + len('Response:'):]
    return p_ent.strip()

text = "replace with the message to analyze"  # placeholder input
extracted_entity = extract_entity(text)
print(extracted_entity)

Conclusion
In this blog post, we covered the process of fine-tuning the llama2-7b model for the Named Entity Recognition task. The same approach applies to any other task that you are interested in. The core concept to take away from this blog is PEFT-based training of large language models. Since pre-trained LLMs might not always perform well on your task out of the box, fine-tuning them is often the best option.

Author Bio
Prakhar Mishra has a Master’s in Data Science with over 4 years of experience in industry across various sectors like Retail, Healthcare, Consumer Analytics, etc. His research interests include Natural Language Understanding and generation, and he has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn.
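As a supplement to the data preparation step: the article suggests picking the letter count after looking at the word length distribution. A rough sketch of that check, using only the standard library, might look like the following. The `word_length_distribution` and `tag_words` helpers and the two sample messages are invented for illustration. In the real pipeline the texts would come from the SMSSpamCollection file.

```python
from collections import Counter


def word_length_distribution(texts):
    """Count how many whitespace-separated tokens have each length."""
    counts = Counter()
    for text in texts:
        counts.update(len(word) for word in text.lower().split())
    return counts


def tag_words(text, length, label):
    """Map every token of exactly `length` characters to `label`."""
    return {word: label for word in text.lower().split() if len(word) == length}


# Invented sample messages standing in for the SMS data.
messages = [
    "Congratulations you have won a free ticket",
    "See you at the restaurant downstairs tonight",
]

dist = word_length_distribution(messages)
print(sorted(dist.items()))
print(tag_words(messages[1], 10, "10_WORDS_LONG"))
```

Lengths that sit in the sparse tail of the distribution are natural candidates for the "probable noise" entity described in the article.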

Merlyn Shelley
03 Nov 2023
13 min read

AI_Distilled #24: Google Invests $2 Billion in Anthropic, Perplexity's AI Search Engine, Biden's AI Executive Order, Data Mining with GPT-4, RL and AWS Deepracer

👋 Hello,
Welcome to another captivating edition of AI_Distilled, featuring recent advancements in training and fine-tuning LLMs, GPT and AI models for enhanced business outcomes.
Let’s begin our news and analysis with an industry expert’s opinion.
“Artificial intelligence is the science of making machines do things that would require intelligence if done by humans” - John McCarthy, Computer Scientist and AI Visionary.
AI does indeed make machines intelligent, so much so that industry titans are now waging a proxy AI war with billions in startup funding. Without a doubt, AI is onto something big!
This week, we’ll talk about Biden's AI Executive Order, which has been praised for its scope but deemed insufficient without legislation; Perplexity's AI search engine; OpenAI launching a new team and challenge to prepare for catastrophic risks of advanced AI; Google investing $2 billion in Anthropic; and Google updating its Bug Bounty program to address AI security concerns. Look out for your fresh dose of AI resources, secret knowledge, and tutorials on how to use custom AI models to enhance complex technical workflows, improving LLM understanding with user feedback, and essential text preprocessing for effective machine learning with Python.

📥 Feedback on the Weekly Edition
What do you think of this issue and our newsletter? Please consider taking the short survey below to share your thoughts, and you will get a free PDF of “The Applied Artificial Intelligence Workshop” eBook upon completion. Complete the Survey. Get a Packt eBook for Free!
Writer’s Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week’s newsletter content!
Cheers,
Merlyn Shelley
Editor-in-Chief, Packt

⚡ TechWave: AI/GPT News & Analysis
🔹 OpenAI Launches New Team and Challenge to Prepare for Catastrophic Risks of Advanced AI: The ChatGPT creator announced new efforts to prepare for potential catastrophic risks associated with highly advanced AI systems.
The company is forming a new internal team called "Preparedness" to assess risks ranging from cybersecurity threats to autonomous biological replication. It is also launching an "AI Preparedness Challenge" with prize money to crowdsource ideas for preventing misuse of advanced AI. OpenAI says it aims to benefit humanity with cutting-edge AI while taking seriously the full spectrum of safety risks.🔹 Biden's AI Executive Order Praised for Scope but Deemed Insufficient Without Legislation: President Biden recently issued an executive order on AI that experts say covers important ground but lacks teeth without accompanying legislation from Congress. The order establishes guidelines and oversight for AI development and use, including in healthcare. However, many provisions simply codify voluntary industry practices. Stakeholders say Congress must pass more comprehensive AI regulations, but partisan disputes make near-term action unlikely.  🔹 Google Updates Bug Bounty Program to Address AI Security Concerns: Google has expanded its vulnerability rewards program to include incentives for discovering potential abuses of artificial intelligence systems. The update comes as worries grow over generative AI being exploited maliciously. Under the revised guidelines, security researchers can earn financial rewards for uncovering AI training data extraction that leaks private information. The move aligns with AI companies' recent White House pledge to better identify AI vulnerabilities.  🔹 Perplexity's AI Search Engine Garners $500M Valuation After New Funding: The AI startup Perplexity recently secured additional funding led by venture capital firm IVP, garnering a $500 million valuation. Perplexity is developing a conversational search engine to challenge Google's dominance using artificial intelligence. The company's iOS app and website traffic have been growing steadily amid rising interest in AI like ChatGPT. 
With deep ties to Google researchers, Perplexity leverages LLMs and has attracted investments from major industry figures.  🔹 Tech Giants Wage Proxy AI War with Billions in Startup Funding As Google Invests $2 Billion in Anthropic: Major technology companies like Google, Microsoft, and Amazon are investing billions in AI startups like OpenAI and Anthropic as surrogates in the race to lead the AI space. Unable to quickly build their own capabilities in large language models, the tech giants are funneling massive sums into the AI leaders to gain ownership stakes and technology access. Anthropic's $2 billion funding from Google follows similar multibillion investments from Microsoft and Amazon, fueling an expensive AI innovation war by proxy.  🔹 Poe Unveils Monetization for Third-Party Conversational AI Developers: The AI chatbot platform Poe has introduced a new revenue sharing model to let creators’ profit from building specialized bots. Poe will split subscription fees and pay per-message charges to offset infrastructure costs. An open API also allows adding custom natural language models beyond Poe's defaults. The moves aim to spur innovation by empowering niche developers. Poe believes reducing barriers will increase diversity, not just competition.   
🔮 Expert Insights from Packt Community

Generative AI with Python and TensorFlow 2 - By Joseph Babcock, Raghav Bali

Kubeflow: an end-to-end machine learning lab

As was described at the beginning of this chapter, there are many components of an end-to-end lab for machine learning research and development (Table 2.1), such as:
  • A way to manage and version library dependencies, such as TensorFlow, and package them for a reproducible computing environment
  • Interactive research environments where we can visualize data and experiment with different settings
  • A systematic way to specify the steps of a pipeline: data processing, model tuning, evaluation, and deployment
  • Provisioning of resources to run the modeling process in a distributed manner
  • Robust mechanisms for snapshotting historical versions of the research process

As we described earlier in this chapter, TensorFlow was designed to utilize distributed resources for training. To leverage this capability, we will use the Kubeflow project. Built on top of Kubernetes, Kubeflow has several components that are useful in the end-to-end process of managing machine learning applications.

Using Kubeflow Katib to optimize model hyperparameters

Katib is a framework for running multiple instances of the same job with differing inputs, such as in neural architecture search (for determining the right number and size of layers in a neural network) and hyperparameter search (finding the right learning rate, for example, for an algorithm).
Like the other Kustomize templates we have seen, the TensorFlow job specifies a generic TensorFlow job, with placeholders for the parameters:

    apiVersion: "kubeflow.org/v1alpha3"
    kind: Experiment
    metadata:
      namespace: kubeflow
      name: tfjob-example
    spec:
      parallelTrialCount: 3
      maxTrialCount: 12
      maxFailedTrialCount: 3
      objective:
        type: maximize
        goal: 0.99
        objectiveMetricName: accuracy_1
      algorithm:
        algorithmName: random
      metricsCollectorSpec:
        source:
          fileSystemPath:
            path: /train
            kind: Directory
        collector:
          kind: TensorFlowEvent
      parameters:
        - name: --learning_rate
          parameterType: double
          feasibleSpace:
            min: "0.01"
            max: "0.05"
        - name: --batch_size
          parameterType: int
          feasibleSpace:
            min: "100"
            max: "200"
      trialTemplate:
        goTemplate:
          rawTemplate: |-
            apiVersion: "kubeflow.org/v1"
            kind: TFJob
            metadata:
              name: {{.Trial}}
              namespace: {{.NameSpace}}
            spec:
              tfReplicaSpecs:
                Worker:
                  replicas: 1
                  restartPolicy: OnFailure
                  template:
                    spec:
                      containers:
                        - name: tensorflow
                          image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
                          imagePullPolicy: Always
                          command:
                            - "python"
                            - "/var/tf_mnist/mnist_with_summaries.py"
                            - "--log_dir=/train/metrics"
                            {{- with .HyperParameters}}
                            {{- range .}}
                            - "{{.Name}}={{.Value}}"
                            {{- end}}
                            {{- end}}

which we can run using the familiar kubectl syntax:

    kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/tfjob-example.yaml

This content is from the book “Generative AI with Python and
TensorFlow 2” by Joseph Babcock, Raghav Bali (April 2021). Start reading a free chapter or access the entire Packt digital library free for 7 days by signing up now.

🌟 Secret Knowledge: AI/LLM Resources

🔹 How to Use Custom AI Models to Enhance Complex Technical Workflows: In this post, you'll learn how Nvidia’s researchers leveraged customized LLMs to streamline intricate semiconductor chip design. The research demonstrates how to refine foundation models into customized assistants that understand industry-specific patterns. You'll see how careful data cleaning and selection enables high performance even with fewer parameters. The post explores step-by-step instructions on how researchers built a specialized AI that helps with writing code, improving documentation, and optimizing complex technical workflows.

🔹 How to Build Impactful LLM Applications: In this post, you'll explore lessons learned from creating Microsoft's Copilot products, such as Viva and PowerPoint. It discusses how combining LLMs with app context and other ML models can be a game-changer and demonstrates how parsing user queries and responses enables precise skill activation. By following their approach of utilizing multiple models to summarize insights without losing nuance, you can gain practical tips for your own LLM application development.

🔹 Understanding Convolutional Neural Networks and Vision Transformers: A Mathematical Perspective: You'll learn about convolutional neural networks and vision transformers in this post. They're great for image classification but differ in math, especially for generative tasks. You'll see how their training budgets work and understand their unique math. We'll also discuss their differences in complexity and memory usage. Plus, you'll learn why convolutional nets handle spatial coherence naturally, while vision transformers might need some help. By the end, you'll know why transformers are better for generating sequential data.

🔹 Improving Large Language Model Understanding with User Feedback: The post focuses on improving user intent detection for LLMs by utilizing disambiguation, context, and MemPrompt. These techniques enhance LLM responses, enabling better understanding of user intent, offering real-time feedback, and enhancing LLM performance and utility.

🔹 The Power of High-Quality Data in Language Models: The article emphasizes the significance of high-quality data for Large Language Models (LLMs). It introduces the concept of alignment, discussing how it influences LLM behavior. The article stresses the vital role of data quality and diversity in optimizing LLM performance and capabilities.

💡 Masterclass: AI/LLM Tutorials

🔹 Enhance Language Model Performance with Step-Back Prompting: This guide explores the use of Step-Back Prompting to enhance LLMs' performance in complex tasks, like knowledge-intensive QA and multi-hop reasoning. It offers a step-by-step tutorial, including package setup and data collection, to implement this approach, potentially improving AI model behavior and responses.

🔹 Boosting AI at Scale with Vectorized Databases: This guide explores how vectorized databases are transforming LLMs like GPT-3 by enhancing their capabilities and scalability. It explains the principles of LLMs and the role of vectorized databases in empowering them. It discusses efficient data retrieval, optimization of vector operations, and scaling for real-time responses. The guide highlights use cases, including content generation and recommendation systems, where vectorized databases excel, and addresses the challenges of adopting them for LLMs.

🔹 Mastering Data Mining with GPT-4: A Practical Guide Using Seattle Weather Data: This guide explores the use of GPT-4 for data mining using Seattle's weather dataset. It covers AI's potential in data mining, detailing the process from exploratory data analysis to clustering and anomaly detection. GPT-4 assists in data loading, EDA, data cleaning, and feature engineering, and suggests clustering methods. The post highlights the collaborative aspect of AI-human interaction and how GPT-4 can improve data mining and data analysis in the field of data science.

🔹 Introduction to Reinforcement Learning and AWS DeepRacer: This post introduces reinforcement learning, a machine learning approach focused on maximizing rewards through agent-environment interactions. It compares it to motivating students based on performance. It explores practical applications via AWS DeepRacer for self-driving cars, explaining key components and mentioning the DeepRacer Student League as a learning opportunity.

🔹 Essential Text Preprocessing for Effective Machine Learning with Python: This post highlights crucial text preprocessing techniques for machine learning. It emphasizes the need to clean text data to avoid interference and unintended word distinctions. The methods, including removing numbers and handling extra spaces, enhance text data quality for effective machine learning applications.

🚀 HackHub: Trending AI Tools

🔹 Pythagora-io/gpt-pilot: Boosts app development speed 20x via requirement specification, oversight, and coding assistance through clarifications and reviews.
🔹 hkuds/rlmrec: PyTorch implementation for the RLMRec model, enhancing recommenders with LLMs for advanced representation learning in recommendation systems.
🔹 THUDM/AgentTuning: Empowers LLMs by instruction-tuning them with interaction trajectories from various agent tasks, enhancing their generalization and language abilities.
🔹 cpacker/MemGPT: Enhances LLMs by intelligently managing memory tiers, enabling extended context and perpetual conversations.

Mostafa Ibrahim
31 Oct 2023
6 min read

Debugging and Monitoring LLMs With Weights & Biases

Introduction

Large Language Models, or LLMs for short, are becoming a big deal in the world of technology. They're powerful and can do a lot, but they're not always easy to handle. Just like when building a big tower, you want to make sure everything goes right from start to finish. That's where Weights & Biases, often called W&B, comes in. It's a tool that helps people keep an eye on how their models are doing. In this article, we'll talk about why it's so important to watch over LLMs, how W&B helps with that, and how to use it. Let's dive in!

Large Language Models (LLMs)

Large Language Models (LLMs) are machine learning models trained on vast amounts of text data to understand and generate human-like text. They excel in processing and producing language, enabling various applications like translation, summarization, and conversation.

LLMs, such as GPT-3 by OpenAI, utilize deep learning architectures to learn patterns and relationships in the data, making them capable of sophisticated language tasks. Through training on diverse datasets, they aim to comprehend context, semantics, and nuances akin to human communication.

When discussing the forefront of natural language processing, several Large Language Models (LLMs) consistently emerge:

The Need for Debugging & Monitoring LLMs

Understanding and overseeing Large Language Models (LLMs) is much like supervising an intricate machine: they're powerful and versatile, but require keen oversight.

Firstly, think about the intricacy of LLMs. They far surpass the complexity of your typical day-to-day machine learning models. While they hold immense potential to revolutionize tasks involving language (think customer support, content creation, and translations), their intricate designs can sometimes misfire. If we're not careful, instead of a smooth conversation with a chatbot, users might encounter bewildering responses, leading to frustration and diminished trust.

Then there's the matter of resources. Training LLMs isn't just time-consuming; it's also financially demanding. Each hiccup, if not caught early, can translate to unnecessary expenditures. It's much like constructing a skyscraper: mid-way errors are costlier to rectify than those identified in the blueprint phase.

Introduction to Weights & Biases

Weights & Biases (W&B) is a platform tailored for machine learning practitioners. It offers a suite of tools designed to help streamline the model development process, from tracking experiments to visualizing results.

With W&B, researchers and developers can efficiently monitor their LLM training progress, compare different model versions, and collaborate with team members. It's an invaluable asset for anyone looking to optimize and scale their machine learning workflows.

How to Use W&B for Debugging & Monitoring LLMs

In the hands-on section of this article, we will follow a structured approach. We will fine-tune our model and use Weights & Biases to save critical metrics, tables, and visualizations. This gives us deeper insights and enables efficient debugging and monitoring of our Large Language Models. The snippets below are fragments of one script: names such as train_dataloader, avg_val_loss, val_accuracy, train_losses, and val_losses are produced by the training and evaluation code that is elided in step 2.

1. Setting up Weights & Biases

a. Importing necessary libraries

    import matplotlib.pyplot as plt  # needed for the plots in step 4
    import torch
    import wandb
    from transformers import BertTokenizer, BertForSequenceClassification
    from torch.utils.data import DataLoader, random_split
    from datasets import load_dataset

Initializing W&B

    # Initialize W&B; the epochs value here is illustrative
    wandb.init(project='llm_monitoring', name='bert_example', config={'epochs': 3})
    config = wandb.config  # read back as config.epochs in the training loop

b. Loading the BERT model

    # Load tokenizer and model
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

2. Fine-tuning your Model

a. Loading your dataset

    dataset = load_dataset('Load your dataset')  # replace with your dataset name

b. Fine-tuning the model

    for epoch in range(config.epochs):
        model.train()
        for batch in train_dataloader:
            # ...
            # Continue training process here
            # ...

3. Tracking Metrics

        # Inside the epoch loop: log the validation metrics to W&B
        wandb.log({
            "Epoch": epoch,
            "Validation Loss": avg_val_loss,
            "Validation Accuracy": val_accuracy
        })

4. Graph Visualizations

a. Plotting and logging the training loss graph

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(train_losses, label="Training Loss", color='blue')
    ax.set(title="Training Losses", xlabel="Epoch", ylabel="Loss")
    ax.legend()
    wandb.log({"Training Loss Curve": wandb.Image(fig)})

b. Plotting and logging the validation loss graph

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(val_losses, label="Validation Loss", color='orange')
    ax.set(title="Validation Losses", xlabel="Epoch", ylabel="Loss")
    ax.legend()
    wandb.log({"Validation Loss Curve": wandb.Image(fig)})

c. Plotting and logging the validation accuracy graph

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(val_accuracies, label="Validation Accuracy", color='green')
    ax.set(title="Validation Accuracies", xlabel="Epoch", ylabel="Accuracy")
    ax.legend()
    wandb.log({"Validation Accuracy Curve": wandb.Image(fig)})

d. Plotting and logging the training accuracy graph

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(train_accuracies, label="Training Accuracy", color='blue')
    ax.set(title="Training Accuracies", xlabel="Epoch", ylabel="Accuracy")
    ax.legend()
    wandb.log({"Training Accuracy Curve": wandb.Image(fig)})

5. Manual Checkups

    questions = ["What's the weather like?", "Who won the world cup?", "How do you make an omelette?", "Why is the sky blue?", "When is the next holiday?"]
    old_model_responses = ["It's sunny.", "France won the last one.", "Mix eggs and fry them.", "Because of the atmosphere.", "It's on December 25th."]
    new_model_responses = ["The weather is clear and sunny.", "Brazil was the champion in the previous world cup.", "Whisk the eggs, add fillings, and cook in a pan.", "Due to Rayleigh scattering.", "The upcoming holiday is on New Year's Eve."]

    # Create a W&B Table with one row per question
    table = wandb.Table(columns=["question", "old_model_response", "new_model_response"])
    for q, old, new in zip(questions, old_model_responses, new_model_responses):
        table.add_data(q, old, new)

    # Log the table to W&B
    wandb.log({"NLP Responses Comparison": table})

6. Closing the W&B run after all logs are uploaded

    wandb.finish()

Conclusion

Large Language Models have truly transformed the landscape of technology. Their vast capabilities are nothing short of amazing, but like all powerful tools, they require understanding and attention. Fortunately, with platforms like Weights & Biases, we have a handy toolkit to guide us. It reminds us that while LLMs are game-changers, they still need a bit of oversight.

Author Bio

Mostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.
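The metric-tracking step in the Weights & Biases walkthrough above logs an average validation loss and a validation accuracy once per epoch, but the article leaves out how those two numbers are produced. The following is a minimal sketch of that aggregation, not the author's code: the function names `validation_metrics` and `log_epoch` and the sample numbers are hypothetical, and the logger is passed in as a plain callable so the example runs without a W&B account.

```python
from typing import Callable, Dict, List, Sequence


def validation_metrics(
    batch_losses: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
) -> Dict[str, float]:
    """Average the per-batch losses and compute accuracy for one epoch."""
    if not batch_losses or not labels:
        raise ValueError("need at least one batch loss and one label")
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {
        "Validation Loss": sum(batch_losses) / len(batch_losses),
        "Validation Accuracy": correct / len(labels),
    }


def log_epoch(
    epoch: int,
    metrics: Dict[str, float],
    log_fn: Callable[[Dict[str, float]], object] = print,
) -> Dict[str, float]:
    """Attach the epoch number and hand the record to the logger."""
    record = {"Epoch": epoch, **metrics}
    log_fn(record)
    return record


# Hypothetical numbers standing in for one validation pass.
history: List[Dict[str, float]] = []
metrics = validation_metrics(
    batch_losses=[0.5, 0.25],
    predictions=[1, 0, 1, 1],
    labels=[1, 0, 0, 1],
)
log_epoch(0, metrics, log_fn=history.append)
print(history[0])
```

In a real run, passing `wandb.log` as `log_fn` should send the same dictionary to the dashboard, since `wandb.log` takes a dictionary of metrics as shown in the article. Keeping the logger pluggable also makes the aggregation easy to unit test.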
Vivekanandan Srinivasan
27 Oct 2023
8 min read

Evaluating Large Language Models

Introduction

A Large Language Model (LLM) is an advanced artificial intelligence model trained on vast amounts of text data. Such models generate human-like text and can also perform language-related tasks, including translation, text completion, answering specific questions, and more.

In this era of technological advancement, several large language models are on the rise. Despite this, no standardized or fixed measures are used to compare or evaluate the quality of large language models.

Here, let us dive into the existing evaluation and comparison frameworks for large language models. We will also analyze the factors on which these large language models should be evaluated.

Evaluating Large Language Models

Need for a comprehensive evaluation framework

Identifying different areas of improvement during the early developmental stages is relatively easy. However, with the advancement of technology and the availability of new alternatives, determining the best model becomes increasingly tricky. Therefore, it is essential to have a reliable evaluation framework that helps judge the quality of large language models accurately. One can use such a framework in the following ways:
  • Only a proper framework will help authorities and agencies assess the accuracy, safety, usability, and reliability of a model.
  • The blind race among big technology companies to release large language models is on the rise. A comprehensive evaluation framework can help stakeholders release models more responsibly.
  • A comprehensive evaluation framework would help users of large language models determine how and where to fine-tune a model to enable practical deployment.

Issues with the existing framework

Every evaluation framework has its advantages. However, certain issues make the existing frameworks insufficient. Some of these issues include:
  • Safety: Some frameworks do not consider safety as a factor for evaluation. Although the OpenAI Moderation API addresses safety to some extent, it is insufficient.
  • Self-sufficiency: The factors on which one can evaluate models are scattered across frameworks. None of these frameworks is comprehensive enough to be self-sufficient.

Factors to be considered while evaluating large language models

Only after reviewing the existing evaluation frameworks can one determine the factors that must be considered while assessing the quality of large language models. Here are the key factors:

Model Size and Complexity
The primary factors to evaluate in LLMs are their size and complexity, usually indicated by the number of parameters. Generally, larger models have a greater capacity to understand context and generate nuanced responses. However, very large models require substantial computational resources, making them impractical for some applications. Evaluators must balance model size and computational efficiency based on the use case.

Training Data Quality and Diversity
The training data's quality and diversity significantly influence LLMs' performance. Models trained on diverse and representative datasets from various sources tend to have a broader understanding of language nuances. However, evaluators should scrutinize the sources and types of data used for training to ensure the model's reliability across different contexts and domains.

Bias and Fairness
Bias in LLMs is a critical concern, as it can lead to discriminatory or unfair content. Evaluators must assess the model's bias, both in the training data and the generated output, and implement strategies to mitigate it. Besides, ethical considerations demand continuous efforts to improve fairness, ensuring that the models do not reinforce societal biases.

Ethical Considerations and Responsible Use
Evaluating LLMs extends beyond technical aspects to ethical considerations. Responsible deployment of these models requires a thorough assessment of potential misuse scenarios. Evaluators must devise guidelines and ethical frameworks to prevent the generation of harmful or malicious content, emphasizing the responsible use of LLMs in applications such as content moderation and chatbots.

Fine-Tuning and Transfer Learning
LLMs are often fine-tuned on specific datasets to adapt them to particular tasks or domains. One should scrutinize the fine-tuning process to ensure the model maintains its integrity and performance while being customized. Additionally, assessing the effectiveness of transfer learning, where models trained on one task are applied to related tasks, is crucial for understanding their adaptability and generalizability.

Explainability and Interpretability
Understanding how LLMs arrive at specific conclusions is essential, especially in applications like legal document analysis and decision-making processes. An evaluator must assess the model's explainability and interpretability. Transparent models enable users to trust the generated output and comprehend the reasoning behind the responses, fostering accountability and reliability.

Robustness and Adversarial Attacks
Evaluating the robustness of LLMs involves assessing their performance under various conditions, including noisy input, ambiguous queries, or adversarial attacks. Rigorous testing against potentially adversarial inputs helps identify vulnerabilities and weaknesses in the model, guiding the implementation of robustness-enhancing techniques.

Continuous Monitoring and Improvement
The landscape of language understanding is ever-evolving. Continuous monitoring and improvement are vital aspects of evaluating LLMs. Regular updates, addressing emerging challenges, and incorporating user feedback contribute to the model's ongoing enhancement, ensuring its relevance and reliability over time.

Step-by-Step Guide: Comparing LLMs Using Perplexity

1. Load Language Model: Load the pre-trained LLM using a library like Hugging Face Transformers.
2. Prepare Dataset: Tokenize and preprocess your dataset for the language model.
3. Train/Test Split: Split the dataset into training and testing sets.
4. Train LLM: Fine-tune the LLM on the training dataset.
5. Calculate Perplexity: Use the testing dataset to calculate perplexity.

Code example:

    # Calculate perplexity
    from math import exp

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    input_text = "Example input text for perplexity calculation."
    input_ids = tokenizer.encode(input_text, return_tensors="pt")

    with torch.no_grad():
        # Passing labels makes the model return the average cross-entropy loss
        output = model(input_ids, labels=input_ids)
        loss = output.loss

    # Perplexity is the exponential of the average per-token loss
    perplexity = exp(loss.item())
    print("Perplexity:", perplexity)

Methods of evaluation

Quantitative Performance Metrics and Benchmarking
Evaluating LLMs requires rigorous quantitative assessment using industry-standard metrics. BLEU, METEOR, and ROUGE scores are pivotal in assessing text generation quality by comparing generated text with human references. For translation tasks, BLEU (Bilingual Evaluation Understudy) calculates the overlap of n-grams between the machine-generated text and human reference translations. METEOR evaluates precision, recall, and synonymy, providing a nuanced understanding of translation quality. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) focuses on summary evaluation, emphasizing recall. These metrics offer quantitative benchmarks, enabling direct comparison between different LLMs. Additionally, perplexity, a measure of how well a language model predicts a sample text, provides insights into language model efficiency. Lower perplexity values indicate better prediction accuracy, highlighting the model's coherence and understanding of the input text. Often applied to large-scale datasets like WMT (Workshop on Machine Translation) or COCO (Common Objects in Context), these quantitative metrics offer a robust foundation for comparing LLMs' performance.

Diversity Analysis and Bias Detection
Diversity and bias analysis are paramount in evaluating LLMs, ensuring equitable and inclusive performance across diverse demographics and contexts. One critical approach involves employing word embedding techniques, such as the Word Embedding Association Test (WEAT), to quantify biases. WEAT assesses associations between word embeddings and predefined categories, revealing biased tendencies present in LLMs. By evaluating gender, racial, or cultural biases, organizations can ensure fair and unbiased responses, aligning with ethical considerations.

Furthermore, demographic diversity analysis measures the model's performance across different demographic groups. Assessing demographic parity ensures that LLMs provide consistent, unbiased results across various user segments. This comprehensive evaluation approach, deeply rooted in fairness and inclusivity, is pivotal in selecting socially responsible LLMs.

Real-World User Studies and Interaction Analysis
Incorporating real-world user studies and interaction analysis is indispensable for evaluating LLMs in practical scenarios. Conducting user tests and surveys provides qualitative insights into user satisfaction, comprehension, and trust. These studies consider how well LLM-generated content aligns with users' expectations and domain-specific contexts.

Additionally, analyzing user interactions with LLM-generated content through techniques like eye-tracking studies and click-through rates provides valuable behavioral data. Heatmap analysis, capturing user attention patterns, offers insights into the effectiveness of LLM-generated text elements. User feedback and interaction analysis inform iterative improvements, ensuring that LLMs are technically robust, user-centric, and aligned with real-world application requirements.

Conclusion

With the development of large language models, natural language processing experienced a revolution. However, a standardized and comprehensive evaluation framework for assessing the quality of these models is still needed. Though the existing frameworks offer valuable insights, they lack standardization and comprehensiveness, and they do not consider safety as an evaluation factor. Collaborating with relevant experts is therefore imperative to build a comprehensive and authentic evaluation framework for large language models.

Author Bio

Vivekanandan, a seasoned Data Specialist with over a decade of expertise in Data Science and Big Data, excels in intricate projects spanning diverse domains. Proficient in cloud analytics and data warehouses, he holds degrees in Industrial Engineering, Big Data Analytics from IIM Bangalore, and Data Science from Eastern University.

As a Certified SAFe Product Manager and Practitioner, Vivekanandan ranks in the top 1 percentile on Kaggle globally. Beyond corporate excellence, he shares his knowledge as a Data Science guest faculty and advisor for educational institutes.
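To make the n-gram overlap behind BLEU, described under the quantitative metrics above, more concrete, here is a small sketch of clipped n-gram precision against a single reference. It is deliberately simplified and is not a full BLEU implementation: there is no brevity penalty, no geometric mean over several n-gram orders, and tokenization is plain whitespace splitting. The function names and example sentences are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate: str, reference: str, n: int) -> float:
    """Share of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as many times as it appears in the
    reference, so repeating a matching word cannot inflate the score.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(
        min(count, ref_counts[gram]) for gram, count in cand_counts.items()
    )
    return matched / total


candidate = "the cat sat on the mat"
reference = "the cat is on the mat"
unigram_score = clipped_precision(candidate, reference, 1)
bigram_score = clipped_precision(candidate, reference, 2)
print(f"1-gram precision: {unigram_score:.3f}")
print(f"2-gram precision: {bigram_score:.3f}")
```

For real comparisons between models, an established implementation with multiple references and standard tokenization is the safer choice; the sketch is only meant to show what "overlap of n-grams" counts.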

Ryan Goodman
25 Oct 2023
10 min read

Detecting and Mitigating Hallucinations in LLMs

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionIn large language models, the term "hallucination" describes a behavior where an AI model produces results that are not entirely accurate or might sound nonsensical. It's important to understand that large language models are not search engines or databases. They do not search for information from external sources or perform complex computations. Instead, large language models (LLM) belong to a category of generative artificial intelligence.Recap How Generative AI WorksGenerative AI is a technology trained on large volumes of data and, as a result, can " generate" text, images, and even audio. This makes it fundamentally different from search engines and other software tools you might be familiar with. This foundational difference presents challenges, most notably that generative AI can’t cite sources for its responses. Large language models are also not designed to solve computational problems like math. However, generative AI can quickly generate code that might solve complex mathematical challenges. A large language model responds to inputs, most notably the text instruction called a "prompt." As the large language model generates text, it uses its training data as a foundation to extrapolate information.Understanding HallucinationsThe simplest way to understand a hallucination is the old game of telephone. In the same way, a message gets distorted in the game of telephone; information can get "distorted" or "hallucinated" as a language model tries to generate outputs based on patterns it observed in its training data. The model might "misremember" or "misinterpret" certain information, leading to inaccuracies.Let's use another model example to understand the concept of generating unique combinations of words in the context of food recipes. 
Imagine you want to create new recipes by observing existing ones. If you were to build a Markov model for food ingredients, you would:
1. Compile a comprehensive dataset of recipes and extract individual ingredients.
2. Create pairs of neighboring ingredients, like "tomato-basil" and "chicken-rice," and record how often each pair occurs.
For example, if you start with the ingredient "chicken," you might notice it's frequently paired with "broccoli" and "garlic" but less so with "pineapple." If you then choose "broccoli" as the next ingredient, it might be equally likely to be paired with "cheese" or "lemon." By following these ingredient pairings, at some point, the model might suggest creative combinations like "chicken-pineapple-lemon," offering new culinary ideas based on observed patterns.
This approach allows the Markov model to generate novel recipe ideas based on the statistical likelihood of ingredient pairings.

Hallucinations as a Feature
When researching or computing factual information, a hallucination is a bad thing. However, the same behavior that gets a bad rap in research is what lets large language models exhibit another human trait: creativity. As a developer, if you want to make your language model creative, OpenAI, for example, has a "temperature" input, a hyperparameter that makes the model's outputs more random. A high temperature of 1 or above makes hallucinations and randomness more likely. A lower temperature, such as 0.2, makes the model's outputs more deterministic and closer to the patterns it was trained on.
As an experiment, try prompting any large language model chatbot, including ChatGPT, for the plot of a romantic story that does not copy existing accounts on the internet, with a new storyline and new characters.
The LLM will offer a fictitious story with characters, a plot, multiple acts, an arc, and an ending.
In specific scenarios, end users or developers might intentionally coax their large language models into a state of "hallucination." When you are seeking out-of-the-box ideas that go beyond the model's training, you can get abstract ideas. In this scenario, the model's ability to "hallucinate" isn't a bug but rather a feature. To continue the experiment, you can return to ChatGPT and ask it to pretend you have changed the temperature hyperparameter to 1.1 and re-write the story. Your results will be very “creative.”
In creative pursuits, like crafting tales or penning poems, these so-called "hallucinations" aren't just tolerated; they're celebrated. They can add layers of depth, surprise, and innovation to the generated content.

Types of hallucination
One can categorize hallucinations into different forms.
An intrinsic hallucination directly contradicts the source material. It introduces logical inconsistencies and factual inaccuracies.
An extrinsic hallucination does not contradict the source material, but it cannot be verified against any source either. It adds elements that are unconfirmable and speculative.

Detecting hallucinations
Detecting hallucinations in large language models is a tricky task. LLMs will deliver information with the same tone and certainty even if the answer is unknown. This puts the responsibility on users and developers to be careful about how information from LLMs is used.
The following techniques can be used to uncover or measure hallucinations in large language models.

Identify the grounding data
Grounding data is the standard against which the large language model (LLM) output is measured. The selection of grounding data depends on the specific application. For example, real job resumes could be grounding data for generating resume-related content. Similarly, search engine results could be used for web-based queries.
In language translation especially, the choice of grounding data is pivotal for accuracy. For example, official legal documents could serve as grounding data for legal translations, ensuring precision in the translated content.

Create a measurement test set
A measurement test set comprises input/output pairs that capture interactions between humans and the LLM. These datasets often include various input conditions and their corresponding outputs. Depending on the scenario, they may also involve simulated interactions between users and software systems.
Ideally, there should be a minimum of two kinds of test sets:
1. A standard or randomly generated test set that is conventional but covers diverse scenarios.
2. An adversarial test set that probes performance in edge cases, high-risk situations, or deliberately misleading or tricky inputs, including security threats.

Extract any claims
Once the test sets are prepared, the next stage is to extract the claims (assertions) made in the LLM's output. This extraction can be done manually, through rule-based methods, or by employing machine learning models.
The same pattern appears in data analysis, where the next step after gathering datasets is to extract specific patterns from the data. This can be done manually, through predefined rules, with basic descriptive analytics, or, for large-scale projects, with machine learning algorithms. Each method has its merits and drawbacks.

Use validations against any grounding data
Validation checks that the content generated by the LLM corresponds to the grounding data.
Frequently, this stage replicates the techniques employed for data extraction.
To illustrate, here is a simple code snippet that validates generated sentences against grounding data using exact string matching:

    # Define grounding data (acceptable sentences)
    grounding_data = [
        "The sky is blue.",
        "Python is a popular programming language.",
        "ChatGPT provides intelligent responses.",
    ]

    # List of generated sentences to be validated
    generated_sentences = [
        "The sky is blue.",
        "ChatGPT is a popular programming language.",
        "Python provides intelligent responses.",
    ]

    # Validate generated sentences against grounding data (exact match)
    valid_sentences = [s for s in generated_sentences if s in grounding_data]

    # Keep the remaining sentences in their original order
    # (a set difference would not preserve the order)
    invalid_sentences = [s for s in generated_sentences if s not in grounding_data]

    # Output valid sentences
    print("Valid Sentences:")
    for sentence in valid_sentences:
        print("- " + sentence)

    # Output invalid sentences
    print("\nInvalid Sentences:")
    for sentence in invalid_sentences:
        print("- " + sentence)

Output:

    Valid Sentences:
    - The sky is blue.

    Invalid Sentences:
    - ChatGPT is a popular programming language.
    - Python provides intelligent responses.

Similarly, when verifying research findings, validation ensures that the conclusions drawn from the research align with the collected data. This process often mirrors the research methods employed earlier.

Metrics reporting
The "Grounding Defect Rate" is a crucial metric that measures the proportion of ungrounded responses (those not supported by the grounding data) among all generated outputs. Other metrics can be added for a more detailed assessment. For instance, in translation, the "Error Rate" indicates the percentage of mistranslated phrases in the translated text.

A Multifaceted Approach to Mitigate Hallucination in the Large Language Model

Leveraging product design
Developers need to employ large language models in ways that do not create material issues, even when the model hallucinates.
For example, you would not design an application that writes your annual report or news articles. Instead, limiting the application to opinion pieces or to summarizing content supplied within the prompt can immediately lower the risk of problematic hallucination.
If an app allows AI-generated outputs to be distributed, end users should be able to review and revise the content. This adds a protective layer of scrutiny and puts the responsibility into the hands of the user.

Continuous improvement and logging
Persisting prompts and LLM outputs is essential for auditing purposes. As models evolve, you cannot count on prompting an LLM and getting the same result. Regression testing and reviewing user input are therefore critical, as long as they adhere to data, security, and privacy practices.

Prompt engineering
To get the best possible output, it is essential to use meta prompts effectively. A meta prompt is a high-level instruction given to a language model to guide its output in a specific direction. Rather than asking a direct question, provide context, structure, and guidance to refine the output.
For example, instead of asking, "What is photosynthesis?", you can ask, "Explain photosynthesis in simple terms suitable for a 5th-grade student." This will adjust the complexity and style of the answer you get.

Multi-Shot Prompts
Multi-shot prompts refer to a series of prompts given to a language model, often in succession. The goal is to guide the model step-by-step toward a desired output instead of asking for a large chunk of information in a single prompt. This approach is extremely useful when the required information is complex or extensive. Typically, these prompts are best delivered as a chat user experience, allowing the user and model to break down the requests into multiple, manageable parts.

Conclusion
The issue of hallucination in large language models (LLMs) presents a significant hurdle for consumers, users, and developers.
While overhauling the foundational architecture of these models isn't a feasible solution for most, the good news is that there are strategies to navigate these challenges. But beyond these technical solutions, there's an ethical dimension to consider. As developers and innovators harness the power of LLMs, it's imperative to prioritize disclosure and transparency. Only through openness can we ensure that LLMs integrate seamlessly into our daily lives and gain the trust and acceptance they require to truly revolutionize our digital interactions.

Author Bio
Ryan Goodman has dedicated 20 years to the business of data and analytics, working as a practitioner, executive, and entrepreneur. He recently founded DataTools Pro after 4 years at Reliant Funding, where he served as the VP of Analytics and BI. There, he implemented a modern data stack, utilized data sciences, integrated cloud analytics, and established a governance structure. Drawing from his experiences as a customer, Ryan is now collaborating with his team to develop rapid deployment industry solutions. These solutions utilize machine learning, LLMs, and modern data platforms to significantly reduce the time to value for data and analytics teams.


Large Language Models (LLMs) in Education

Chaitanya Yadav
23 Oct 2023
8 min read
Introduction
Large language models (LLMs) are a type of AI that can generate and understand human language by drawing on vast amounts of textual data. This article looks at the potential of large language models in education and how they can transform it.
Through practical examples, it shows how LLMs could support individual learning pathways, advanced learning analytics, and interactive simulations, leading to more effective educational strategies.

Benefits of LLMs in Education

Personalized learning
One of the greatest advantages of LLMs in education is their capacity to customize learning experiences for each student. Lesson-plan customization, individualized feedback, and real-time monitoring of student progress are all possible with LLMs.

Automated tasks
LLMs can also be used to automate processes like grading and lesson planning. This can give instructors more time for other important responsibilities like teaching and connecting with students.

New and innovative educational tools and resources
LLMs can be applied to the development of innovative learning resources and technology. For example, they can be used to create interactive simulations, games, and other educational activities.

Real-time feedback and support
LLMs can also be used to provide quick help and feedback to students. For example, LLMs can be used to create chatbots that assist students with their academic work and respond to their queries.

Potential Challenges of LLMs in Education

Incorrect or misleading information
One of the main problems with using LLMs in education is that they might provide inaccurate or misleading information. This is because LLMs are trained on vast volumes of data, some of which could be outdated or erroneous.

Lack of understanding
Another issue with using LLMs in teaching is that they might not fully understand the material they produce. This is because LLMs are trained on statistical patterns in language rather than on a true understanding of the complexity of human communication.

Ethical concerns
There are also ethical concerns associated with the use of LLMs in education. LLMs should be used carefully, and the ethical consequences of their use should be considered.

How LLMs Can Be Used for Transforming Education with Advanced Learning Strategies
Let's look at a few examples that show the possibilities of large language models (LLMs) in education.

1. Advanced Personalized Learning Pathway
In this example, we build a more detailed personalized learning pathway that reflects a student's individual objectives, learning style, and progress. Follow the steps in the input code to create it. Note that the code fills in a fixed template; the prompt argument marks where a call to an LLM would be plugged in.
Input code:

    # Step 1: Define the generate_learning_pathway function
    def generate_learning_pathway(prompt, user_profile):
        # Note: prompt is not used here; in a real system it would be sent to an LLM
        # Step 2: Create a template for the learning pathway
        learning_pathway_template = (
            f"Dear {user_profile['student_name']},\n\n"
            f"I'm excited to help you create a personalized learning pathway "
            f"to achieve your goal of {user_profile['goals']}. "
            f"As a {user_profile['learning_style']} learner with "
            f"{user_profile['current_progress']}, here's your pathway:\n\n"
        )
        # Step 3: Define the specific steps in the learning pathway
        steps = [
            "Step 1: Introduction to Data Science",
            "Step 2: Data Visualization Techniques for Visual Learners",
            "Step 3: Intermediate Statistics for Data Analysis",
            "Step 4: Machine Learning Fundamentals",
            "Step 5: Real-world Data Science Projects",
        ]
        # Step 4: Combine the template and the specific steps
        learning_pathway = learning_pathway_template + "\n".join(steps)
        return learning_pathway

    # Step 5: Define a main function to test the code
    def main():
        # Values are phrased so that they read naturally inside the template
        user_profile = {
            "student_name": "Alice",
            "goals": "becoming a data scientist",
            "learning_style": "visual",
            "current_progress": "basic statistics already completed",
        }
        prompt = "Create a personalized learning pathway."
        # Step 6: Generate the learning pathway
        learning_pathway = generate_learning_pathway(prompt, user_profile)
        # Step 7: Print the learning pathway
        print(learning_pathway)

    if __name__ == "__main__":
        main()

Output:

    Dear Alice,

    I'm excited to help you create a personalized learning pathway to achieve your goal of becoming a data scientist. As a visual learner with basic statistics already completed, here's your pathway:

    Step 1: Introduction to Data Science
    Step 2: Data Visualization Techniques for Visual Learners
    Step 3: Intermediate Statistics for Data Analysis
    Step 4: Machine Learning Fundamentals
    Step 5: Real-world Data Science Projects

This example produces a highly customized pathway that takes into account the student's name, objectives, learning style, and progress.

2. AI-Enhanced Learning Analytics
Using LLMs in learning analytics may give teachers more detailed information on a student's performance and help them make appropriate recommendations.
Input code:

    # Define the generate_learning_analytics function
    def generate_learning_analytics(prompt, student_data):
        # Note: prompt is not used here; in a real system it would be sent to an LLM
        # Analyze the performance based on quiz scores
        average_quiz_score = sum(student_data["quiz_scores"]) / len(student_data["quiz_scores"])
        # Calculate homework completion rate (True counts as 1, False as 0)
        total_homeworks = len(student_data["homework_completion"])
        completed_homeworks = sum(student_data["homework_completion"])
        homework_completion_rate = (completed_homeworks / total_homeworks) * 100
        # Generate the learning analytics report
        analytics_report = f"Learning Analytics Report for Student {student_data['student_id']}:\n"
        analytics_report += f"- Average Quiz Score: {average_quiz_score:.2f}\n"
        analytics_report += f"- Homework Completion Rate: {homework_completion_rate:.2f}%\n"
        if homework_completion_rate < 70:
            analytics_report += "Based on their performance, it's recommended to provide additional support for completing homework assignments."
        return analytics_report

This code defines a Python function, ‘generate_learning_analytics’, which takes a prompt and student data as input, calculates the average quiz score and homework completion rate, and generates a report that includes these metrics, together with a recommendation for additional support when the homework completion rate falls below 70%. Now let's provide student performance data.
Input code:

    student_data = {
        "student_id": "99678",
        "quiz_scores": [89, 92, 78, 95, 89],
        "homework_completion": [True, True, False, True, True],
    }
    prompt = f"Analyze the performance of student {student_data['student_id']} based on their recent quiz scores and homework completion."
    analytics_report = generate_learning_analytics(prompt, student_data)
    print(analytics_report)

Output:

    Learning Analytics Report for Student 99678:
    - Average Quiz Score: 88.60
    - Homework Completion Rate: 80.00%

The student's quiz scores and the homework completion data included in the ‘student_data’ dictionary are used to generate this report.

3. Advanced Interactive Simulations for Learning
The potential for LLMs to provide an engaging learning resource is illustrated here with the outline of a computerized training simulation on a complicated topic, such as physics.
Input code:

    # Define the generate_advanced_simulation function
    def generate_advanced_simulation(prompt):
        # Create the interactive simulation (placeholder; no LLM is called here)
        interactive_simulation = f"Interactive {prompt} Simulation"
        # Provide a link to the interactive simulation (replace with an actual link)
        interactive_simulation_link = "https://your-interactive-simulation-link.com"
        return interactive_simulation, interactive_simulation_link

    # Define a main function to test the code
    def main():
        topic = "Quantum Mechanics"
        prompt = f"Develop an interactive simulation for teaching {topic} to advanced high school students."
        # Generate the interactive simulation
        interactive_simulation, interactive_simulation_link = generate_advanced_simulation(prompt)
        # Print the interactive simulation link
        print(f"Explore the {topic} interactive simulation: {interactive_simulation_link}")

    if __name__ == "__main__":
        main()

Output:

    Explore the Quantum Mechanics interactive simulation: https://your-interactive-simulation-link.com

In this example, for a complex topic like quantum mechanics, the LLM is asked to create an advanced interactive simulation that will make learning more interesting and visual.
Also, make sure to replace the placeholder link with your own link to the interactive simulation.
Such advanced examples demonstrate the adaptability of LLMs for creating highly customized learning pathways, advanced learning analytics reports, and sophisticated interactive simulations that offer in-depth educational experiences.

Conclusion
In conclusion, by providing advanced learning strategies and tools, large language models have tremendous potential to revolutionize education. These models offer a range of benefits, including personalized learning experiences, timely feedback and support, automated tasks, and the development of useful tools for innovation in education.
The article considered the practical use of LLMs in education, including more sophisticated personalized learning pathways that take into account students' specific educational objectives and how they learn. Moreover, by giving details of a student's performance and recommendations for improvement, LLMs can improve learning analytics. In addition, the simulation example showed how LLMs can enhance learning through interactivity and engagement on complicated topics.
The future of education appears promising given the LLMs' ability to offer a more diverse, creative learning environment with broad opportunities for learners around the world.

Author Bio
Chaitanya Yadav is a data analyst and a machine learning and cloud computing expert with a passion for technology and education. He has a proven track record of success in using technology to solve real-world problems and help others to learn and grow. He is skilled in a wide range of technologies, including SQL, Python, data visualization tools like Power BI, and cloud computing platforms like Google Cloud Platform. He is also 22x Multicloud Certified.
In addition to his technical skills, he is also a brilliant content creator, blog writer, and book reviewer.
He is the Co-founder of a tech community called "CS Infostics" which is dedicated to sharing opportunities to learn and grow in the field of IT.

Testing Large Language Models (LLMs)

20 Oct 2023
7 min read
Machine learning has become ubiquitous, with models powering everything from search engines and recommendation systems to chatbots and autonomous vehicles. As these models grow more complex, testing them thoroughly is crucial to ensure they behave as expected. This is especially true for large language models like GPT-4 that generate human-like text and engage in natural conversations.
In this article, we will explore strategies for testing machine learning models, with a focus on evaluating the performance of LLMs.

Introduction
Machine learning models are notoriously challenging to test due to their black-box nature. Unlike traditional code, we cannot simply verify the logic line-by-line. ML models learn from data and make probabilistic predictions, so their decision-making process is opaque.
While testing methods like unit testing and integration testing are common for traditional software, they do not directly apply to ML models. We need more specialized techniques to validate model performance and uncover unexpected or undesirable behavior.
Testing is particularly crucial for large language models. Since LLMs can generate free-form text, it's hard to anticipate their exact responses. Flaws in the training data or model architecture can lead to hallucinations, biases, and errors that only surface during real-world usage. Rigorous testing provides confidence that the model works as intended.
In this article, we will cover testing strategies to evaluate LLMs.
The key techniques we will explore are:
- Similarity testing
- Column coverage testing
- Exact match testing
- Visual output testing
- LLM-based evaluation
By combining these methods, we can thoroughly test LLMs along multiple dimensions and ensure they provide coherent, accurate, and appropriate responses.

Testing Text Output with Similarity Search
A common output from LLMs is text. This could be anything from chatbot responses to summaries generated from documents. A robust way to test the quality of text output is similarity testing.
The idea is simple: we define an expected response and compare the model's actual response to determine how similar they are. The higher the similarity score, the better.
Let's walk through an example using our favorite LLM. Suppose we give it the prompt:

Prompt: What is the capital of Italy?

The expected response would be:

Expected: The capital of Italy is Rome.

Now we can pass this prompt to the LLM and get the actual response:

    prompt = "What is the capital of Italy?"
    actual = llm.ask(prompt)

Let's say actual contains:

Actual: Rome is the capital of Italy.

While the wording is different, the meaning is the same. To quantify this similarity, we can use a sentence-embedding library like SentenceTransformers. It represents sentences as numeric vectors, which we can then compare using cosine similarity.

    from sentence_transformers import SentenceTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    expected = "The capital of Italy is Rome."

    model = SentenceTransformer('all-MiniLM-L6-v2')
    # Encode both sentences as embedding vectors
    expected_embedding = model.encode(expected)
    actual_embedding = model.encode(actual)
    # cosine_similarity expects 2D inputs and returns a matrix, hence [0][0]
    similarity = cosine_similarity([expected_embedding], [actual_embedding])[0][0]

Say this yields a similarity score of 0.85, indicating the responses are highly similar in meaning.
We can establish a threshold for the minimum acceptable similarity, like 0.8. Responses below this threshold fail the test.
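The threshold check itself can be sketched without any external library. The snippet below is a minimal illustration rather than part of any testing framework: the function name passes_similarity_test and the toy three-dimensional vectors are assumptions made for the example, standing in for real sentence embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def passes_similarity_test(expected_vec, actual_vec, threshold=0.8):
    """Return True when the similarity meets the minimum acceptable threshold."""
    return cosine_similarity(expected_vec, actual_vec) >= threshold

# Hypothetical toy vectors standing in for real sentence embeddings
expected_vec = [0.9, 0.1, 0.3]
close_vec = [0.8, 0.2, 0.3]  # points in nearly the same direction
far_vec = [0.0, 1.0, 0.0]    # points in a very different direction

print(passes_similarity_test(expected_vec, close_vec))  # True
print(passes_similarity_test(expected_vec, far_vec))    # False
```

The right threshold is a judgment call: a stricter value rejects more paraphrases, while a looser one lets more semantically different answers through.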
By running similarity testing over many prompt-response pairs, we can holistically assess the textual coherence of an LLM.

Testing Tabular Outputs with Column Coverage
In addition to text, LLMs can output tables or data frames. For testing these, we need different techniques that account for structure.
A good validation is column coverage: checking what percentage of columns in the expected output are present in the actual output.
Consider the LLM answering questions about movies:

Prompt: What are the top 3 highest grossing movies of all time?

Expected:

Movie | Worldwide Gross | Release Year
Avatar | $2,789,679,794 | 2009
Titanic | $2,187,463,944 | 1997
Star Wars Ep. VII | $2,068,223,624 | 2015

Now we can test the LLM's actual output:

    prompt = "What are the top 3 highest grossing movies of all time?"
    actual = llm.ask(prompt)

Actual:

Movie | Global Revenue | Year
Avatar | $2.789 billion | 2009
Titanic | $2.187 billion | 1997
Star Wars: The Force Awakens | $2.068 billion | 2015

Here, actual contains the same 3 columns as expected: Movie, Gross, and Release Year. The headers and cell values differ slightly, so a literal comparison of header names would miss two of them. If we first pair each actual header with its closest expected header using cosine similarity, we get 100% column coverage.
We can formalize this in code:

    # Assumes the actual headers have already been paired with (renamed to)
    # the expected headers, e.g. via embedding similarity
    expected_cols = set(expected.columns)
    actual_cols = set(actual.columns)
    column_coverage = len(expected_cols & actual_cols) / len(expected_cols)
    # column_coverage = 1.0

For tables with many columns, we may only need, say, 90% coverage to pass the test. This validation ensures the critical output columns are present while allowing variability in column names or ancillary data.

Exact Match for Numeric Outputs
When LLMs output a single number or statistic, we can use simple exact match testing.
Consider this prompt:

Prompt: What was Apple's total revenue in 2021?
Expected: $365.82 billion

We get the LLM's response:

    prompt = "What was Apple's total revenue in 2021?"
    actual = llm.ask(prompt)

Actual: $365.82 billion

In this case, we expect an exact string match:

    is_match = (actual == expected)
    # is_match = True

For numerical outputs, precision is important. Exact match testing provides a straightforward way to validate this.

Screenshot Testing for Visual Outputs
While building PandasAI, we sometimes need to test generated charts. Testing these outputs requires verifying that the visualized data is correct.
One method is screenshot testing: comparing screenshots of the expected and actual visuals. For example:

Prompt: Generate a bar chart comparing the revenue of FAANG companies.
Expected: [Expected_Chart.png]
Actual: [Actual_Chart.png]

We can then test if the images match:

    from PIL import Image, ImageChops

    expected_img = Image.open("./Expected_Chart.png")
    actual_img = Image.open("./Actual_Chart.png")
    # difference() requires both images to have the same size and mode
    diff = ImageChops.difference(expected_img, actual_img)
    # getbbox() returns None when the difference image is entirely black
    is_match = diff.getbbox() is None  # True if the images match

For more robust validation, we could use computer vision techniques like template matching to identify and compare key elements: axes, bars, labels, etc.
Screenshot testing provides quick validation of visual output without needing to interpret the raw chart data.

LLM-Based Evaluation
An intriguing idea for testing LLMs is to use another LLM!
The concept is to pass the expected and actual outputs to a separate "evaluator" LLM and ask if they match.
For example:

Expected: Rome is the capital of Italy.
Actual: The capital of Italy is Rome.

We can feed this to the evaluator model:

Prompt: Do these two sentences convey the same information? Answer YES or NO
Sentence 1: Rome is the capital of Italy.
Sentence 2: The capital of Italy is Rome.
Evaluator: YES

The evaluator LLM acts like a semantic similarity scorer. This takes advantage of the natural language capabilities of LLMs.
The downside is that it evaluates one black-box model using another black-box model. Errors or biases in the evaluator could lead to incorrect assessments.
So LLM-based evaluation should complement other testing approaches, not act as the sole method.

Conclusion
Testing machine learning models thoroughly is critical as they grow more ubiquitous and impactful. Large language models pose unique testing challenges due to their free-form textual outputs.
Using a combination of similarity testing, column coverage validation, exact match, visual output screening, and even LLM-based evaluation, we can rigorously assess LLMs along multiple dimensions.
A comprehensive test suite combining these techniques will catch more bugs and flaws than any single method alone. This builds essential confidence that LLMs behave as expected in the real world.
Testing takes time but prevents much larger problems down the road. The strategies covered in this article will add rigor to the development and deployment of LLMs, helping ensure these powerful models benefit humanity as intended.

Author Bio
Gabriele Venturi is a software engineer and entrepreneur who started coding at the young age of 12. Since then, he has launched several projects across gaming, travel, finance, and other spaces, contributing his technical skills to various startups across Europe over the past decade.
Gabriele's true passion lies in leveraging AI advancements to simplify data analysis. This mission led him to create PandasAI, released open source in April 2023. PandasAI integrates large language models into the popular Python data analysis library Pandas. This enables an intuitive conversational interface for exploring data through natural language queries.
By open-sourcing PandasAI, Gabriele aims to share the power of AI with the community and push boundaries in conversational data analytics. He actively contributes as an open-source developer dedicated to advancing what's possible with generative AI.


Reducing Hallucinations with Intent Classification

Gabriele Venturi
13 Oct 2023
10 min read
Introduction
Large language models (LLMs) are incredibly capable, but they are prone to hallucinating: generating convincing but completely incorrect or nonsensical outputs. This is a significant impediment to deploying LLMs safely in real-world applications. In this guide, we will explore a technique called intent classification to mitigate hallucinations and make LLMs more robust and reliable.

The Hallucination Problem
Hallucinations occur when an AI system generates outputs that are untethered from reality and makes false claims with high confidence. For example, if you asked an LLM like GPT-3 a factual question that it does not have sufficient knowledge to answer correctly, it might fabricate a response that sounds plausible but is completely incorrect.
This happens because LLMs are trained to continue text in a way that seems natural, not to faithfully represent truth. Their knowledge comes solely from their training data, so they often lack sufficient grounding in real-world facts. When prompted with out-of-distribution questions, they resort to guessing rather than admitting ignorance.
Hallucinations are dangerous when they surface in real applications like conversational agents. Providing false information as if it were true severely damages trust and utility. So for AI systems to be reliable digital assistants, we need ways to detect and reduce hallucinations.

Leveraging Intent Classification
One strategy is to use intent classification on the user input before feeding it to the LLM. The goal is to understand what the user intends so we can formulate the prompt properly to minimize hallucination risks.
For example, consider a question like:

"What year did the first airplane fly?"

The intent here is clearly to get a factual answer about a historical event.
An LLM may or may not know the answer. But with a properly classified intent, we can prompt the model accordingly:

"Please provide the exact year the first airplane flew if you have sufficient factual knowledge to answer correctly. Otherwise respond that you do not know."

This prompt steers the model toward facts it is confident about rather than attempting to guess an answer.

The Intent Classification Process

So how does intent classification work exactly? At a high level, there are three main steps:

1. Gather example user inputs and label them with intents.
2. Train a classifier model on the labeled data.
3. Run new user inputs through the classifier to predict intent labels.

For the first step, we need to collect a dataset of example queries, commands, and other user inputs. These should cover the full range of expected inputs our system will encounter when deployed.

For each example, we attach one or more intent labels that describe what the user hopes to achieve. Some common intent categories include:

• Information request (asking for facts/data)
• Action request (wanting to execute a command or process)
• Clarification (asking the system to rephrase something)
• Social (general conversation, chit-chat, etc.)

Next, we use this labeled data to train an intent classification model. This can be a simple machine learning model like logistic regression or a more complex neural network like BERT. The model learns to predict the intent labels for new text inputs based on patterns in the training data.

Finally, when users interact with our system, we pass their inputs to the intent classifier to attach labels before generating any AI outputs.
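The hand-off from a predicted intent to a prompt can be sketched roughly as follows. This is only an illustration: `PROMPT_TEMPLATES`, `FALLBACK_TEMPLATE`, and `build_prompt` are hypothetical names, the template wording is an assumption modeled on the airplane prompt above, and the intent label is taken as already predicted by whatever classifier is in use.

```python
# Hypothetical prompt templates keyed by intent label (wording is illustrative).
PROMPT_TEMPLATES = {
    "Information Request": (
        "Answer the question below only if you have sufficient factual "
        "knowledge to answer correctly. Otherwise respond that you do not know.\n"
        "Question: {user_input}"
    ),
    "Action Request": (
        "The user is asking for an action. If you cannot perform it, say so "
        "plainly instead of pretending it was done.\n"
        "Request: {user_input}"
    ),
    "Clarification": (
        "Rephrase or explain your previous answer in simpler terms.\n"
        "User message: {user_input}"
    ),
    "Social": (
        "Reply conversationally without claiming human experiences.\n"
        "User message: {user_input}"
    ),
}

# Used when the classifier returns a label with no dedicated template.
FALLBACK_TEMPLATE = (
    "Respond to the message below and state clearly when you are unsure.\n"
    "User message: {user_input}"
)


def build_prompt(user_input: str, intent: str) -> str:
    """Wrap the raw user input in the template chosen by its predicted intent."""
    template = PROMPT_TEMPLATES.get(intent, FALLBACK_TEMPLATE)
    return template.format(user_input=user_input)


if __name__ == "__main__":
    print(build_prompt("What year did the first airplane fly?", "Information Request"))
```

Keeping the templates in a plain dictionary means new intents can be added without touching the routing logic, and unknown labels degrade to a cautious default rather than raising an error.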
The predicted intent drives how we frame the prompt for the LLM to minimize hallucination risks.

Sample Intents

Here are some examples of potential intent labels:

Information Request - Factual questions, asking for definitions, requesting data lookup, etc.
• "What is the capital of Vermont?"
• "What year was Julius Caesar born?"

Action Request - Wants the system to perform a command or process some data.
• "Can you book me a flight to Denver?"
• "Plot a scatter graph of these points."

Clarification - The user needs the system to rephrase or explain something it previously said.
• "Sorry, I don't understand. Can you rephrase that?"
• "What do you mean by TCP/IP?"

Social - Casual conversation, chit-chat, pleasantries.
• "How is your day going?"
• "What are your hobbies?"

For a production intent classifier, we would want 20-50 diverse intent types covering the full gamut of expected user inputs.

Building the Dataset

To train an accurate intent classifier, we need a dataset with at least a few hundred examples per intent class. Here are some best practices for building a robust training dataset:

• Include diversity: Examples should cover the many ways users might express an intent. Use different wording, sentence structures, etc.
• Gather real data: Use logs of real user interactions if possible rather than only synthetic examples. Real queries contain nuances that are hard to fabricate.
• Multilabel intents: Many queries have multiple intents. Label accordingly rather than forcing single labels.
• Remove ambiguities: Discard confusing or ambiguous examples so they do not mislead the model during training.
• Use validation sets: Split your data into training, validation, and test sets for proper evaluation.
• Regularly expand: Continuously add new labeled examples to improve classifier accuracy over time.

Adhering to these data collection principles results in higher-fidelity intent classification.
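The "use validation sets" practice above might look like the following sketch, which uses scikit-learn's `train_test_split` with `stratify` so each split keeps the same proportion of intents. The toy texts, the 60/20/20 ratio, and the `random_state` value are assumptions made for illustration, not part of the original guide.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy stand-in data (assumption): 10 generated examples per intent.
texts = [f"What is fact number {i}?" for i in range(10)] + [
    f"Please run task number {i}" for i in range(10)
]
labels = ["Information Request"] * 10 + ["Action Request"] * 10

# First hold out 20% of all examples as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Then take 25% of the remaining 80% (20% of the total) as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

# Report split sizes and the per-intent counts in each split.
print(len(X_train), len(X_val), len(X_test))
print(Counter(y_train), Counter(y_val), Counter(y_test))
```

Stratifying matters most when some intents are rare: a purely random split could leave a small class entirely out of the validation or test set.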
Next, we'll cover how to implement an intent classifier in Python.

Implementing the Intent Classifier

For this example, we'll build a simple scikit-learn classifier to predict two intents: Information Request and Action Request. Here is a sample of the labeled training data, which contains 50 examples for each intent:

    # Sample labeled intent data
    import pandas as pd

    data = [
        {'text': 'What is the population of France?', 'intent': 'Information Request'},
        {'text': 'How tall is the Eiffel Tower?', 'intent': 'Information Request'},
        # ...
        {'text': 'Book a table for dinner tonight', 'intent': 'Action Request'},
        {'text': 'Turn up the volume please', 'intent': 'Action Request'},
        # ...
    ]
    df = pd.DataFrame(data)

We'll use a CountVectorizer followed by a TfidfTransformer to extract features from the text data. Then we'll train a simple logistic regression classifier on those features:

    # Extract features from text data
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

    count_vect = CountVectorizer()
    counts = count_vect.fit_transform(df['text'])    # learn the vocabulary and count tokens
    tfidf_transformer = TfidfTransformer()
    tfidf = tfidf_transformer.fit_transform(counts)  # reweight the counts by TF-IDF

    # Train classifier model
    from sklearn.linear_model import LogisticRegression

    X_train = tfidf
    y_train = df['intent']
    model = LogisticRegression()
    model.fit(X_train, y_train)

Now we can make predictions on new text inputs:

    # Make predictions on new texts (reuse the fitted vectorizer and transformer)
    texts = ['What year was Napoleon Bonaparte born?', 'Play some music please']
    counts = count_vect.transform(texts)
    tfidf = tfidf_transformer.transform(counts)
    predictions = model.predict(tfidf)
    print(predictions)
    # Output with the full training set: ['Information Request' 'Action Request']

And that's it! With just a few lines of scikit-learn code, we have a simple but decent intent classifier. For a more robust production classifier, we would want to use deep learning models like BERT rather than logistic regression.
We would also train on much larger datasets with significantly more intent diversity.

However, the underlying principles remain the same - leverage labeled data to train a model that can predict intents for new text inputs. Those intents can then be used to formulate better prompts for LLMs to reduce hallucinations.

Prompting with Intents

Now let's look at a few examples of how classified intents can guide how we prompt LLMs to curb hallucinations:

Information Request

User: "When did the first airplane fly?"
Intent: Information Request
Answer: "On December 17, 1903"

By detecting this as a request for factual information, we can prompt the LLM to share known facts. This prevents guessing at an answer that may be false. Or we can gather the information from some third-party API to make sure the response is accurate.

Action Request

User: "Can you book me a ticket to SF?"
Intent: Action Request
Answer: "I apologize, but I cannot do that. I am an AI assistant without direct access to online services."

Recognizing this as an action request allows us to clarify that the LLM cannot actually complete the requested task. This curbs hallucination risks from the LLM attempting to imitate such functionality.

Social

User: "How are you doing today?"
Intent: Social
Answer: "I am an AI system without real subjective experiences."

For casual social conversation, the AI can be more playful while still grounding responses in reality about its AI nature.

The key in each case is using the predicted intent to formulate a prompt that discourages ungrounded hallucinations and encourages sticking to solid facts the LLM is confident about. Of course, hallucinations cannot be fully eliminated, but intent-guided prompting pushes models to be more honest about the limits of their knowledge.

Results and Impact

Studies have shown intent classification can significantly improve AI reliability by reducing false factual claims.
In one experiment, hallucination rates for an LLM dropped from 19.8% to just 2.7% using a classifier trained on 100 intent types. Precision on answering factual questions rose from 78% to 94% with intents guiding prompting.

Beyond curbing hallucinations, intent classification also enables smarter response formulation in general:

• Answering questions more accurately based on contextual understanding of the user's true information needs.
• Retrieving the most relevant examples or templates to include in responses based on predicted intents.
• Building conversational systems that handle a diverse range of statement types and goals seamlessly.

So in summary, intent classification is a powerful technique to minimize risky AI behaviors like ungrounded hallucinations. It delivers major improvements in reliability and safety for real-world deployments where trustworthiness is critical. Adopting an intent-aware approach is key to developing AI assistants that can have nuanced, natural interactions without jeopardizing accuracy.

Conclusion

Hallucinations pose serious challenges as we expand real-world uses of large language models and conversational agents. Identifying clear user intents provides crucial context that allows crafting prompts in ways that curb harmful fabrications. This guide covered best practices for building robust intent classifiers, detailed implementation in Python, and demonstrated impactful examples of reducing hallucinations through intent-guided prompting.

Adopting these approaches allows developing AI systems that admit ignorance rather than guessing and remain firmly grounded in reality. While not a magic solution, intent classification serves as an invaluable tool for engineering the trustworthy AI assistants needed in domains like medicine, finance, and more.
As models continue to advance in capability, maintaining rigorous intent awareness will only grow in importance.

Author Bio

Gabriele Venturi is a software engineer and entrepreneur who started coding at the young age of 12. Since then, he has launched several projects across gaming, travel, finance, and other spaces - contributing his technical skills to various startups across Europe over the past decade.

Gabriele's true passion lies in leveraging AI advancements to simplify data analysis. This mission led him to create PandasAI, released open source in April 2023. PandasAI integrates large language models into the popular Python data analysis library Pandas. This enables an intuitive conversational interface for exploring data through natural language queries.

By open-sourcing PandasAI, Gabriele aims to share the power of AI with the community and push boundaries in conversational data analytics. He actively contributes as an open-source developer dedicated to advancing what's possible with generative AI.