LLM | 81 articles | Tech News, Tutorials & Expert Insights

11 Dec 2023

13 min read

AI_Distilled #28: Unveiling Innovations Reshaping Our World

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

👋 Hello,

“Generative AI has the potential to change the world in ways that we can’t even imagine. It has the power to create new ideas, products, and services that will make our lives easier, more productive, and more creative. It also has the potential to solve some of the world’s biggest problems, such as climate change, poverty, and disease.”
-Bill Gates, Microsoft Co-Founder

Microsoft Bing’s new Deep Search functionality is a case in point. Bing now writes its own expanded prompts with GPT-4, aiming to give more detailed answers to user queries than a traditional search engine can. Who would have thought LLMs would progress so far that they would eventually prompt themselves? Runway ML is also onto something big, with technology that creates realistic AI-generated videos that may soon find their way to Hollywood.

Welcome back to a new issue of AI_Distilled, your one-stop destination for all things AI, ML, NLP, and Gen AI.
Let’s get started with the latest news and developments across the AI sector:

Elon Musk's xAI Initiates $1 Billion Funding Drive in AI Race
Bing’s New Deep Search Expands Queries
AI Takes Center Stage in 2023 Word of the Year Lists
OpenAI Announces Delay in GPT Store Launch to Next Year
ChatGPT Celebrates First Anniversary with 110M Installs and $30M Revenue Milestone
Runway ML and Getty Images Collaborate on AI Video Models for Hollywood and Advertising

We’ve also curated the latest GPT and LLM resources, tutorials, and secret knowledge:

Unlocking AI Magic: A Primer on 7 Essential Libraries for Developers
Efficient LLM Fine-Tuning with QLoRA on a Laptop
Rapid Deployment of Large Open Source LLMs with Runpod and vLLM’s OpenAI Endpoint
Understanding Strategies to Enhance Retrieval-Augmented Generation (RAG) Pipeline Performance
Understanding and Mitigating Biases and Toxicity in LLMs

Finally, don’t forget to check out our hands-on tips and strategies from the AI community for you to use on your own projects:

A Step-by-Step Guide to Streamlining LLM Data Processing for Efficient Pipelines
Fine-Tuning Mistral Instruct 7B on the MedMCQA Dataset Using QLoRA
Accelerating Large-Scale Training: A Comprehensive Guide to Amazon SageMaker Data Parallel Library
Enhancing LoRA-Based Inference Speed: A Guide to Efficient LoRA Decomposition

Looking for some inspiration? Here are some GitHub repositories to get your projects going!

tacju/maxtron
Tanuki/tanuki.py
roboflow/multimodal-maestro
03axdov/muskie

Also, don't forget to check our expert insights column, which covers interesting NLP concepts from the book 'The Handbook of NLP with Gensim'. It's a must-read!

Stay curious and gear up for an intellectually enriching experience!
📥 Feedback on the Weekly Edition

Quick question: How can we foster effective collaboration between humans and AI systems, ensuring that AI complements human skills and enhances productivity without causing job displacement or widening societal gaps?

Share your valued opinions discreetly! Your insights could shine in our next issue for the 39K-strong AI community. Join the conversation! 🗨️✨

As a big thanks, get our bestselling "Interactive Data Visualization with Python - Second Edition" in PDF. Let's make AI_Distilled even more awesome! 🚀 Jump on in! Share your thoughts and opinions here!

Writer’s Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week’s newsletter content!

Cheers,
Merlyn Shelley
Editor-in-Chief, Packt

⚡ TechWave: AI/GPT News & Analysis

🏐 Elon Musk's xAI Initiates $1 Billion Funding Drive in AI Race: xAI is seeking to raise $1 billion in equity to stay competitive with tech giants such as OpenAI, Microsoft, and Google in the fast-moving AI landscape. xAI has already raised $135 million from investors, and its total funding goal was disclosed in a filing with the US Securities and Exchange Commission.

🏐 AI Alliance Launched by Tech Giants IBM and Meta: IBM and Meta have formed a new "AI Alliance" with more than 50 partners to promote open and responsible AI development. Members include Dell, Intel, CERN, NASA, and Sony. The alliance aims to foster an open AI community for researchers and developers and to help members make progress whether or not they openly share their models.

🏐 Bing’s New Deep Search Expands Queries: Microsoft is testing a new Bing feature called Deep Search that uses GPT-4 to expand search queries before providing results. Deep Search displays the expanded topics in a panel so users can select the one that best fits what they want to know. It then tailors the search results to that description. Microsoft says the feature can take up to 30 seconds because of the AI generation step.
🏐 AI Takes Center Stage in 2023 Word of the Year Lists: In 2023, AI dominates tech and influences "word of the year" choices. Cambridge picks "hallucinate" for AI's tendency to invent information, and Merriam-Webster chooses "authentic" in light of AI's impact on what we perceive as real. Oxford recognizes "prompt" for its evolved role in instructing generative AI, reflecting how deeply AI has entered everyday language and culture.

🏐 OpenAI Announces Delay in GPT Store Launch to Next Year: OpenAI has delayed the GPT Store release until next year, citing unexpected challenges and postponing its initial December launch plan. Despite recent turmoil, including CEO changes and employee unrest, development continues, and updates for ChatGPT are expected. The GPT Store aims to be a marketplace where users can sell and share custom GPTs, with creators compensated based on usage.

🏐 ChatGPT Celebrates First Anniversary with 110M Installs and $30M Revenue Milestone: ChatGPT's mobile apps, launched in May 2023 on iOS and later on Android, have exceeded 110 million installs, yielding nearly $30 million in revenue. The revenue is driven by the ChatGPT Plus subscription, which offers extra perks to paying users. Despite competition, downloads keep surging, with Android installs hitting 18 million in a week. The company expects continued growth through the end of 2023.

🏐 Runway ML and Getty Images Collaborate on AI Video Models for Hollywood and Advertising: NYC video AI startup Runway ML, backed by Google and NVIDIA, has announced a partnership with Getty Images for the Runway <> Getty Images Model (RGM), a generative AI video model. Targeting Hollywood, advertising, media, and broadcasting, it enables customized content workflows for Runway enterprise customers.

🔮 Expert Insights from Packt Community

The Handbook of NLP with Gensim - By Chris Kuo

NLU + NLG = NLP

NLP is an umbrella term that covers natural language understanding (NLU) and natural language generation (NLG). We’ll go through both in the next sections.
NLU

Many languages, such as English, German, and Chinese, have been developing for hundreds of years and continue to evolve. Humans can use languages artfully in various social contexts. Now, we are asking a computer to understand human language. What’s very rudimentary to us may not be so apparent to a computer. Linguists have contributed much to the development of computers’ understanding in terms of syntax, semantics, phonology, morphology, and pragmatics.

NLU focuses on understanding the meaning of human language. It takes text or speech input and then analyzes the syntax, semantics, phonology, morphology, and pragmatics of the language. Let’s briefly go over each one:

Syntax: This is the study of how words are arranged to form phrases, clauses, and sentences, including word order and the use of punctuation.

Semantics: This is about the possible meanings of a sentence based on the interactions between the words in it. It is concerned with the interpretation of language rather than its form or structure. For example, the noun “table” can refer to “a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs” or to a data frame in a programming language. NLU can tell such meanings of a word apart using techniques such as contextual word embeddings.

Phonology: This is the study of the sound system of a language, including the sounds of speech (phonemes), how they combine to form words, and how they are organized into larger units such as syllables and stress patterns. For example, the sounds represented by the letters “p” and “b” in English are distinct phonemes. A phoneme is the smallest unit of sound in a language that can change the meaning of a word. Consider the words “pat” and “bat.” The only difference between these two words is the initial sound, but their meanings are different.
Morphology: This is the study of the structure of words, including the way they are formed from smaller units of meaning called morphemes. The term comes from “morph,” the shape or form, and “ology,” the study of something. Morphology is important because it helps us understand how words are formed, how they change over time, and how they relate to other words in a language. For example, the word “unkindness” consists of three separate morphemes: the prefix “un-,” the root “kind,” and the suffix “-ness.”

Pragmatics: This is the study of how language is used in a social context. Pragmatics is important because it helps us understand how language works in real-world situations and how it can be used to convey meaning and achieve specific purposes. For example, if you offer to buy your friend a McDonald’s burger, a large fries, and a large drink, your friend may reply "no" because he is concerned about becoming fat. Literally, your friend may simply mean that the meal is high in calories, but in a social context the conversation can also imply that he may be fat.

Now, let’s understand NLG.

NLG

While NLU is concerned with a computer reading and comprehending language, NLG is concerned with a computer writing it. The term generation in NLG refers to an NLP model generating meaningful words or even entire articles. Today, when you compose an email or type a sentence in an app, it suggests words to complete your sentence or performs automatic correction. These are applications of NLG.

This content is from the book The Handbook of NLP with Gensim - By Chris Kuo (Oct 2023). Start reading a free chapter or access the entire Packt digital library free for 7 days by signing up now.
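The sentence-completion example in the NLG section can be made concrete with a toy next-word suggester built from bigram counts. This is our own minimal sketch, not code from the book. The function names and the tiny corpus are made up for illustration, and real email and keyboard apps use far larger neural language models.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count how often each word follows another across a list of sentences."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def suggest(follows, prev_word, k=3):
    """Return up to k of the most frequent words seen after prev_word."""
    counts = follows.get(prev_word.lower(), Counter())
    return [word for word, _ in counts.most_common(k)]

# Tiny made-up corpus, for illustration only
corpus = [
    "thank you for your email",
    "thank you for the update",
    "thank you so much",
    "see you soon",
]
model = train_bigrams(corpus)
print(suggest(model, "you"))    # ['for', 'so', 'soon']
print(suggest(model, "thank"))  # ['you']
```

"for" ranks first because it follows "you" twice. Ties such as "so" and "soon" keep the order in which they were first seen, because `Counter.most_common` sorts stably.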
🌟 Secret Knowledge: AI/LLM Resources

🏀 Unlocking AI Magic: A Primer on 7 Essential Libraries for Developers: Discover seven cutting-edge libraries that add advanced AI features to development projects. They range from CopilotTextarea, for AI-driven writing in React apps, to PrivateGPT, for secure, locally processed document interactions.

🏀 Efficient LLM Fine-Tuning with QLoRA on a Laptop: Explore QLoRA, a memory-efficient method for fine-tuning large language models, and how to run it on ordinary CPUs. The QLoRA API supports the NF4, FP4, INT4, and INT8 data types for quantization. It combines quantization with methods like LoRA and gradient checkpointing to significantly reduce memory requirements. Learn to implement QLoRA on CPUs with Intel Extension for Transformers, with experiments showcasing its efficiency on consumer-level CPUs.

🏀 Rapid Deployment of Large Open Source LLMs with Runpod and vLLM’s OpenAI Endpoint: Learn to quickly deploy open-source LLMs into applications with a tutorial featuring the Llama-2 70B model and the AutoGen framework. It uses Runpod for computational resources and vLLM for creating an OpenAI-compatible API endpoint. It includes a step-by-step guide and the option to use non-gated models like Falcon-40B.

🏀 Understanding Strategies to Enhance Retrieval-Augmented Generation (RAG) Pipeline Performance: Learn optimization techniques for RAG applications, focusing on hyperparameters, tuning strategies, data ingestion, and pipeline preparation. Explore inference-time improvements through query transformations, retrieval parameters, advanced strategies, re-ranking models, LLMs, and prompt engineering for better retrieval and generation.

🏀 Understanding and Mitigating Biases and Toxicity in LLMs: Explore the impact of ethical guidelines on Large Language Model (LLM) development, examining measures adopted by companies like OpenAI and Google to address biases and toxicity. The research covered spans content generation, jailbreaking, and biases in diverse domains, revealing the complexities and challenges of ensuring ethical LLMs.

🔛 Masterclass: AI/LLM Tutorials

🎯 A Step-by-Step Guide to Streamlining LLM Data Processing for Efficient Pipelines: Learn to speed up the development loop for your LLM-powered recommendation system by addressing slow processing times in data pipelines. The solution implements a Pipeline class that saves inputs and outputs, enabling efficient error debugging. It improves the developer experience by writing individual pipeline stages as functions, and it suggests future optimizations such as error classes and concurrency.

🎯 Fine-Tuning Mistral Instruct 7B on the MedMCQA Dataset Using QLoRA: Explore fine-tuning Mistral Instruct 7B, an open-source LLM, on medical entrance exam questions from the MedMCQA dataset. The tutorial uses Google Colab, the GPTQ version of the model, and the LoRA technique for memory efficiency. It covers data loading, prompt creation, configuration, training setup, code snippets, and performance evaluation, offering a foundation for experimentation and enhancement.

🎯 Accelerating Large-Scale Training: A Comprehensive Guide to Amazon SageMaker Data Parallel Library: This guide details ways to speed up Large Language Model (LLM) training with Amazon SageMaker's distributed data parallel library (SMDDP). It addresses challenges in distributed training and emphasizes SMDDP's optimized AllGather collective, which targets the GPU communication bottleneck. It also explores techniques such as using the EFA network, coordinating transfers with GDRCopy, and using fewer GPU streaming multiprocessors for communication, to improve efficiency and cost-effectiveness on Amazon SageMaker.

🎯 Enhancing LoRA-Based Inference Speed: A Guide to Efficient LoRA Decomposition: The article describes achieving three times faster inference for public LoRAs using the Diffusers library. It introduces LoRA, a parameter-efficient fine-tuning technique, and details its decomposition process and benefits, including quick transitions between LoRAs and reduced warm-up and response times in the Inference API.

🚀 HackHub: Trending AI Tools

⚽ tacju/maxtron: A unified meta-architecture for video segmentation that enhances clip-level segmenters with within-clip and cross-clip tracking modules.
⚽ Tanuki/tanuki.py: Simplifies the creation of LLM-powered apps in Python by integrating well-typed, reliable, and stateless LLM-powered functions into applications.
⚽ roboflow/multimodal-maestro: Gives developers more control over large multimodal models, enabling diverse outputs through effective prompting tactics.
⚽ 03axdov/muskie: A Python-based ML library that simplifies dataset creation and model usage, aiming to reduce code complexity.
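LoRA comes up in several of the tutorials above, so here is a minimal, dependency-free sketch of its core idea. Instead of updating a full d×k weight matrix W, LoRA trains two small matrices, B (d×r) and A (r×k). It applies W + (alpha/r)·BA, and that sum can be merged back into a single matrix for inference. The matrix values, the rank, and the helper names below are made up for illustration. Real implementations, such as those in the PEFT or Diffusers libraries, operate on tensors rather than Python lists.

```python
def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_merge(W, B, A, alpha, r):
    """Return the merged weight W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    delta = matmul(B, A)  # low-rank update, same shape as W (d x k)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def param_counts(d, k, r):
    """Trainable parameters: full fine-tuning of W vs. a rank-r LoRA update."""
    return d * k, r * (d + k)

# Toy example: a frozen 3x2 weight and a rank-1 update (made-up numbers)
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [[1.0], [0.0], [2.0]]  # d x r = 3 x 1
A = [[0.5, -0.5]]          # r x k = 1 x 2
print(lora_merge(W, B, A, alpha=2, r=1))  # [[2.0, -1.0], [0.0, 1.0], [3.0, -1.0]]
print(param_counts(4096, 4096, 8))        # (16777216, 65536)
```

For a 4096×4096 layer, a rank-8 update trains about 0.4% as many parameters as full fine-tuning. Once merged, the layer costs the same at inference time as the original.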

Joshua Arvin Lat

30 Nov 2023

19 min read

Deploying LLMs with Amazon SageMaker - Part 2

Joshua Arvin Lat

29 Nov 2023

13 min read

Deploying LLMs with Amazon SageMaker - Part 1

Introduction

Have you ever tried asking a Generative AI-powered chatbot the question “What is the meaning of life?” In case you have not tried that yet, here’s the response I got when I tried it myself using a custom chatbot app I built with a managed machine learning (ML) service called Amazon SageMaker.

Image 01 — Asking a chatbot the meaning of life

You might be surprised to learn that I built this quick demo application myself in just a few hours! In this post, I will teach you how to deploy your own Large Language Models (LLMs) in a SageMaker Inference Endpoint (that is, a machine learning-powered server that responds to inputs) with just a few lines of code.

Image 02 — Deploying an LLM to a SageMaker Inference Endpoint

Most available tutorials teach us how to use existing Application Programming Interfaces (APIs) to build chatbot applications. It’s also worth knowing how to deploy LLMs on our own servers, which gives us more control over data privacy and compliance. In addition to this, we’ll be able to manage the long-term costs of our AI-powered systems as well.
One of the most powerful solutions available for these types of requirements is Amazon SageMaker, which helps us focus on the work we need to do instead of worrying about cloud infrastructure management.

We’ll divide the hands-on portion into the following sections:
● Section I: Preparing the SageMaker Notebook Instance
● Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference Endpoint
● Section III: Enabling Data Capture with SageMaker Model Monitor (discussed in Part 2)
● Section IV: Invoking the SageMaker inference endpoint using the boto3 client (discussed in Part 2)
● Section V: Preparing a Demo UI for our chatbot application (discussed in Part 2)
● Section VI: Cleaning Up (discussed in Part 2)

Without further ado, let’s begin!

Section I: Preparing the SageMaker Notebook Instance

Let’s start by creating a SageMaker Notebook instance. We could also do this in SageMaker Studio, but running the example in a SageMaker Notebook instance should do the trick. If this is your first time launching a SageMaker Notebook instance, you can think of it as your local machine with several tools pre-installed already where we can run our scripts.

STEP # 01: Sign in to your AWS account and navigate to the SageMaker console by typing sagemaker in the search box, similar to what we have in the following image:

Image 03 — Navigating to the SageMaker console

Choose Amazon SageMaker from the list of options available, as highlighted in Image 03.

STEP # 02: In the sidebar, locate and click Notebook instances under Notebook:

Image 04 — Locating Notebook instances in the sidebar

STEP # 03: Next, locate and click the Create notebook instance button.

STEP # 04: On the Create notebook instance page, you’ll be asked to input a few configuration parameters before we’re able to launch the notebook instance where we’ll be running our code:

Image 05 — Creating a new SageMaker Notebook instance

Specify a Notebook instance name (for example, llm-demo) and select a Notebook instance type. For best results, you may select a relatively powerful instance type (ml.m4.xlarge) where we will run the scripts. However, you may decide to choose a smaller instance type such as ml.t3.medium (slower but less expensive). Note that we will not be deploying our LLM inside this notebook instance. The model will be deployed in a separate inference endpoint, which will require a more powerful instance type such as an ml.g5.2xlarge.

STEP # 05: Create an IAM role by choosing Create a new role from the list of options available in the IAM role dropdown (under Permissions and encryption).

Image 06 — Opening the Jupyter app

This will open the following popup window. Given that we’re just working on a demo application, the default security configuration should do the trick. Click the Create role button.

Important Note: Make sure to use a more secure configuration when dealing with production (or staging) work environments.

We won’t dive deep into how cloud security works in this post, so feel free to look for other resources and references to further improve the current security setup. If you are interested in learning more about cloud security, feel free to check my 3rd book, “Building and Automating Penetration Testing Labs in the Cloud”. In the 7th chapter of the book (Setting Up an IAM Privilege Escalation Lab), you’ll learn how misconfigured machine learning environments on AWS can easily be exploited with the right sequence of steps.

STEP # 06: Click the Create notebook instance button. Wait about 5-10 minutes for the SageMaker Notebook instance to be ready.

Important Note: Given that this will launch a resource that will run until you turn it off (or delete it), make sure to complete all the steps in the 2nd part of this post and clean up the created resources accordingly.

STEP # 07: Once the instance is ready, click Open Jupyter, similar to what we have in Image 07:

Image 07 — Opening the Jupyter app

This will open the Jupyter application in a browser tab.
If this is your first time using this application, do not worry, as detailed instructions will be provided in the succeeding steps to help you get familiar with this tool.

STEP # 08: Create a new notebook by clicking New and selecting conda_python3 from the list of options available:

Image 08 — Creating a new notebook using the conda_python3 kernel

In case you are wondering what a kernel is, it is simply an “engine” or “environment” with pre-installed libraries and prerequisites that executes the code specified in the notebook cells. You’ll see this in action in a bit.

STEP # 09: At this point, we should see the following interface, where we can run various types of scripts and blocks of code:

Image 09 — New Jupyter notebook

Feel free to rename the Jupyter Notebook before proceeding to the next step. If you have not used a Jupyter Notebook before, type the following line of code in the text field and press SHIFT + ENTER to run it:

print('hello')

This should print the output hello right below the text field where we placed our code.

Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference Endpoint

STEP # 01: With everything ready, let’s start by installing a specific version of the SageMaker Python SDK:

!pip install sagemaker==2.192.1

Here, we’ll be using v2.192.1.
Pinning this version helps ensure that you won’t encounter breaking changes even if you work on the hands-on solutions in this post at a later date. In case you are wondering what the SageMaker Python SDK is, it is simply a software development kit (SDK) with a set of tools and APIs to help developers interact with and utilize the different features and capabilities of Amazon SageMaker.

STEP # 02: Next, let’s import and prepare a few prerequisites by running the following block of code:

import sagemaker
import time

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
role = sagemaker.get_execution_role()

STEP # 03: Let’s import HuggingFaceModel and get_huggingface_llm_image_uri as well:

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

STEP # 04: Next, let’s define the generate_random_label() function, which we’ll use later when naming our resources:

from string import ascii_uppercase
from random import choice

def generate_random_label():
    # Build a 10-character string of random uppercase letters
    letters = ascii_uppercase
    return ''.join(choice(letters) for _ in range(10))

This will help us avoid naming conflicts when creating and configuring our resources.

STEP # 05: Use the get_huggingface_llm_image_uri function we imported in an earlier step to retrieve the container image URI for our LLM.
In addition to this, let’s define the model_name we’ll use later when deploying our LLM to a SageMaker endpoint:

image_uri = get_huggingface_llm_image_uri(
    backend="huggingface",
    region=region,
    version="1.1.0"
)
model_name = "MistralLite-" + generate_random_label()

STEP # 06: Before we proceed with the actual deployment, let’s quickly inspect what we have in the image_uri variable:

image_uri

This will output the following variable value:

'763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04'

STEP # 07: Similarly, let’s check the value of model_name:

model_name

This will give us the following:

'MistralLite-HKGKFRXURT'

Note that you’ll get a different model_name value since we’re randomly generating a portion of the model name.

STEP # 08: Let’s prepare the hub model configuration as well:

hub_env = {
    'HF_MODEL_ID': 'amazon/MistralLite',
    'HF_TASK': 'text-generation',
    'SM_NUM_GPUS': '1',
    "MAX_INPUT_LENGTH": '16000',
    "MAX_TOTAL_TOKENS": '16384',
    "MAX_BATCH_PREFILL_TOKENS": '16384',
    "MAX_BATCH_TOTAL_TOKENS": '16384',
}

Here, we specify that we’ll be using the MistralLite model. If this is your first time hearing about MistralLite, it is a fine-tuned version of the Mistral-7B-v0.1 language model. It can perform significantly better on several long-context retrieval and question-answering tasks.
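The token limits in hub_env interact with the max_new_tokens value sent with each request. As we understand the request validation in the Text Generation Inference (TGI) container, two limits apply. The prompt must fit within MAX_INPUT_LENGTH, and prompt tokens plus max_new_tokens must not exceed MAX_TOTAL_TOKENS. Requests that break these limits are expected to be rejected. The sketch below is a quick local sanity check. Both helper names are hypothetical and are not part of the SageMaker SDK or TGI, and the token counts are made-up examples rather than real tokenizer output.

```python
# Excerpt of the limits from hub_env (stored as strings there, so we convert them)
hub_env_limits = {
    "MAX_INPUT_LENGTH": "16000",
    "MAX_TOTAL_TOKENS": "16384",
}

def max_new_tokens_allowed(input_tokens: int, limits: dict) -> int:
    """Largest max_new_tokens that keeps a request within the configured budget.

    Assumed constraint: input_tokens <= MAX_INPUT_LENGTH and
    input_tokens + max_new_tokens <= MAX_TOTAL_TOKENS.
    """
    max_input = int(limits["MAX_INPUT_LENGTH"])
    max_total = int(limits["MAX_TOTAL_TOKENS"])
    if input_tokens > max_input:
        raise ValueError(f"prompt has {input_tokens} tokens; limit is {max_input}")
    return max_total - input_tokens

def request_fits(input_tokens: int, max_new_tokens: int, limits: dict) -> bool:
    """True if a request with these sizes stays within both limits."""
    try:
        return max_new_tokens <= max_new_tokens_allowed(input_tokens, limits)
    except ValueError:
        return False

print(max_new_tokens_allowed(14384, hub_env_limits))  # 2000
print(request_fits(15000, 2000, hub_env_limits))      # False: 15000 + 2000 > 16384
print(request_fits(500, 2000, hub_env_limits))        # True
```

Take the max_new_tokens value of 2000 used in the request payload in this post. With that setting, the prompt itself could use at most 14,384 tokens.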
For more information about MistralLite, feel free to check: https://huggingface.co/amazon/MistralLite.

STEP # 09: Let’s initialize the HuggingFaceModel object using some of the prerequisites and variables we’ve prepared in the earlier steps:

model = HuggingFaceModel(
    name=model_name,
    env=hub_env,
    role=role,
    image_uri=image_uri
)

STEP # 10: Now, let’s proceed with the deployment of the model using the deploy() method:

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name=model_name,
)

Here, we’re using an ml.g5.2xlarge instance for our inference endpoint. Given that this step may take about 10-15 minutes to complete, feel free to grab a cup of coffee or tea while waiting!

Important Note: Given that this will launch a resource that will run until you turn it off (or delete it), make sure to complete all the steps in the 2nd part of this post and clean up the created resources accordingly.

STEP # 11: Now, let’s prepare our first input data:

question = "What is the meaning of life?"
input_data = {
    "inputs": f"<|prompter|>{question}</s><|assistant|>",
    "parameters": {
        "do_sample": False,
        "max_new_tokens": 2000,
        "return_full_text": False,
    }
}

STEP # 12: With the prerequisites ready, let’s have our deployed LLM process the input data we prepared in the previous step:

result = predictor.predict(input_data)[0]["generated_text"]
print(result)

This should yield the following output:

The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries. There is no single answer that can be definitively proven, as the meaning of life is subjective and can vary greatly from person to person. ...

Looks like our SageMaker Inference endpoint (where the LLM is deployed) is working just fine!

Conclusion

That wraps up the first part of this post. At this point, you should have a good idea of how to deploy LLMs using Amazon SageMaker.
However, there’s more in store for us in the second part. We’ll build on top of what we have already and enable data capture to help us collect and analyze the data (that is, the input requests and output responses) that pass through the inference endpoint. In addition to this, we’ll prepare a demo user interface utilizing the ML model we deployed in this post.

If you’re looking for the link to the second part, here it is: Deploying LLMs with Amazon SageMaker - Part 2

We are just scratching the surface, as there is a long list of capabilities and features available in SageMaker. If you want to take things to the next level, feel free to read 2 of my books focusing heavily on SageMaker: “Machine Learning with Amazon SageMaker Cookbook” and “Machine Learning Engineering on AWS”.

Author Bio

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of 3 Australian-owned companies and also served as the Director for Software Development and Engineering for multiple e-commerce startups in the past. Years ago, he and his team won 1st place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero, and he has been sharing his knowledge at several international conferences to discuss practical strategies on machine learning, engineering, security, and management. He is also the author of the books "Machine Learning with Amazon SageMaker Cookbook", "Machine Learning Engineering on AWS", and "Building and Automating Penetration Testing Labs in the Cloud". Due to his proven track record in leading digital transformation within organizations, he has been recognized as one of the prestigious Orange Boomerang: Digital Leader of the Year 2023 award winners.

Mostafa Ibrahim

20 Nov 2023

7 min read

LLMs For Extractive Summarization in NLP

Mostafa Ibrahim

15 Nov 2023

7 min read

Large Language Models (LLMs) and Knowledge Graphs

Introduction

Harnessing the power of AI, this article explores how Large Language Models (LLMs) like OpenAI's GPT can analyze data from Knowledge Graphs to transform data interpretation, particularly in healthcare. We'll illustrate a use case where an LLM assesses patient symptoms stored in a Knowledge Graph to suggest possible diagnoses, showcasing the potential of LLMs to support medical diagnostics.

Brief Introduction to Large Language Models (LLMs)

Large Language Models (LLMs), such as OpenAI's GPT series, represent a significant advancement in the field of artificial intelligence. These models are trained on vast datasets of text, enabling them to understand and generate human-like language.

LLMs are adept at understanding complex questions and providing appropriate responses, akin to human analysis. This capability stems from their extensive training on diverse datasets, which allows them to interpret context and generate relevant text-based answers.

While LLMs possess advanced data processing capabilities, their effectiveness is often limited by the static nature of their training data. Knowledge Graphs can fill this gap by offering a dynamic, continuously updatable source of information. This integration gives LLMs access to the latest data, which enhances the accuracy and relevance of their output. It also lets them tackle more complex problems with a greater level of sophistication.
As we harness this powerful combination, we pave the way for innovative solutions across various sectors that demand real-time intelligence, such as the ever-fluctuating stock market.

Exploring Knowledge Graphs and How LLMs Can Benefit From Them

Knowledge Graphs represent a pivotal advancement in organizing and utilizing data, especially in enhancing the capabilities of Large Language Models (LLMs).

Knowledge Graphs organize data in a graph format, where entities (like people, places, and things) are nodes and the relationships between them are edges. This structure allows for a more nuanced representation of data and its interconnected nature. Take the above Knowledge Graph as an example.

Doctor Node: This node represents the doctor. It is connected to the patient node with an edge labeled "Patient," indicating the doctor-patient relationship.

Patient Node (Patient123): This is the central node representing a specific patient, known as "Patient123." It serves as a junction point connecting to the various symptoms that the patient is experiencing.

Symptom Nodes: There are three separate nodes representing individual symptoms that the patient has: "Fever," "Cough," and "Shortness of breath." Each of these symptoms is connected to the patient node by edges labeled "Symptom," indicating that these are the symptoms experienced by "Patient123."

To simplify, the Knowledge Graph shows that "Patient123" is a patient of the "Doctor" and is experiencing three symptoms: fever, cough, and shortness of breath. This type of graph is useful in medical contexts where it's essential to model the relationships between patients, their healthcare providers, and their medical conditions or symptoms.
A graph like this makes related data easy to query. For example, you can find all symptoms associated with a particular patient, or identify all patients who are experiencing a certain symptom.

Practical Integration of LLMs and Knowledge Graphs
Step 1: Installing and Importing the Necessary Libraries
In this step, we bring in two essential libraries: rdflib, for constructing our Knowledge Graph, and openai, for tapping into the capabilities of GPT, the Large Language Model. The code uses the pre-1.0 openai SDK, so the version is pinned.

```python
!pip install rdflib
!pip install openai==0.28

import rdflib
import openai
```

Step 2: Setting Your Personal OpenAI API Key

```python
openai.api_key = "Insert Your Personal OpenAI API Key Here"
```

Step 3: Creating a Knowledge Graph

```python
# Create a new, empty Knowledge Graph
g = rdflib.Graph()

# Define a namespace for health-related data
namespace = rdflib.Namespace("http://example.org/health/")
```

Step 4: Adding Data to Our Graph
In this part of the code, we add a single entry to the Knowledge Graph for Patient123. The entry consists of three triples, and each one links the patient to a different symptom.

```python
def add_patient_data(patient_id, symptoms):
    # Build a full URI inside the namespace (http://example.org/health/Patient123)
    # so that the SPARQL query in Step 5 can match it exactly
    patient_uri = namespace[patient_id]
    for symptom in symptoms:
        g.add((patient_uri, namespace.hasSymptom, rdflib.Literal(symptom)))

# Example of adding patient data
add_patient_data("Patient123", ["fever", "cough", "shortness of breath"])
```

Step 5: Defining the get_patient_symptoms Function
We use a simple SPARQL query to extract the required data from the Knowledge Graph.

```python
def get_patient_symptoms(patient_id):
    # Reference the patient by the same full URI that was used when the data was added
    patient_uri = namespace[patient_id]
    sparql_query = f"""
    PREFIX ex: <http://example.org/health/>
    SELECT ?symptom WHERE {{
        <{patient_uri}> ex:hasSymptom ?symptom .
    }}
    """
    query_result = g.query(sparql_query)
    symptoms = [str(row.symptom) for row in query_result]
    return symptoms
```

Step 6: Defining the generate_diagnosis_response Function
The generate_diagnosis_response function takes the patient's ID along with the list of symptoms extracted from the graph. The LLM then uses this data to suggest the most likely diagnosis.

```python
def generate_diagnosis_response(patient_id, symptoms):
    symptoms_list = ", ".join(symptoms)
    prompt = (
        f"A patient with the following symptoms - {symptoms_list} - has been observed. "
        "Based on these symptoms, what could be a potential diagnosis?"
    )
    # Call the OpenAI Completion endpoint (pre-1.0 SDK). gpt-3.5-turbo-instruct is
    # OpenAI's replacement for the retired text-davinci-003 completion model.
    llm_response = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=100,
    )
    return llm_response.choices[0].text.strip()

# Example usage
patient_id = "Patient123"
symptoms = get_patient_symptoms(patient_id)
if symptoms:
    diagnosis = generate_diagnosis_response(patient_id, symptoms)
    print(diagnosis)
else:
    print(f"No symptoms found for {patient_id}.")
```

Output:
The potential diagnosis could be pneumonia. Pneumonia is a type of respiratory infection that causes symptoms including fever, cough, and shortness of breath. Other potential diagnoses should be considered as well and should be discussed with a medical professional.

As demonstrated, the LLM connected the three symptoms (fever, cough, and shortness of breath) and suggested that Patient123 may have pneumonia.

Conclusion
In summary, combining Large Language Models with Knowledge Graphs is a substantial advancement in data analysis.
This article has given a straightforward illustration of what the two can do when they work in tandem: the LLM efficiently interprets data extracted from a Knowledge Graph. As these technologies are developed and refined further, they promise to significantly improve analytical capabilities and to inform more sophisticated decision-making in an increasingly data-driven world.

Author Bio
Mostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.


Mostafa Ibrahim

09 Nov 2023

8 min read

Generating Synthetic Data with LLMs


Introduction
In this article, we will delve into the process of generating synthetic data with LLMs. We will explain why synthetic data is becoming increasingly important and why LLMs are so good at generating it. We will also cover the practical steps for harnessing advanced models such as OpenAI's GPT-3.5. Whether you're a seasoned AI enthusiast or a curious newcomer, join us on this journey into the heart of modern machine learning.

What Are LLMs?
Large Language Models (LLMs) are state-of-the-art machine learning architectures designed primarily for understanding and generating human-like text. These models are trained on vast amounts of data. This enables them to perform a wide range of language tasks, from simple text completion to answering complex questions or even writing coherent articles. Some examples of LLMs include:

1. GPT-3 by OpenAI, with 175 billion parameters and a context window of up to 2,048 tokens.
2. BERT by Google, with up to 340 million parameters (BERT-Large) and a maximum input length of 512 tokens.
3. T5 (Text-to-Text Transfer Transformer) by Google, with 60 million to 11 billion parameters depending on the model size. The number of tokens it can process also depends on its size and configuration.

LLMs are strong at NLP tasks such as question answering and text summarization, and they are also highly regarded for their efficiency in generating synthetic data.

Why Is There a Need for Synthetic Data?
1) Data Scarcity
Do you ever struggle with having too little data to train your model? This dilemma is a daily reality for machine learning practitioners around the world.
Gathering and processing data are among the most daunting parts of the entire machine learning workflow, so the value of synthetic data is hard to overstate.

2) Data Privacy & Security
Real-world data often contains sensitive information, such as customers' credit card details, buying patterns, and diseases. Industries like healthcare and finance face stringent regulations on how such data can be used. Synthetic data can be used without compromising privacy, because it does not contain real individuals' information.

The Process of Generating Data with LLMs
Producing synthetic data with Large Language Models begins with preparing seed data or guiding prompts. This foundational step is paramount, because it determines the type of synthetic data you will get. You might be simulating chatbot conversations or creating fictional product reviews. In either case, these initial prompts give the LLM the necessary context.

Once the stage is set, the actual data generation phase begins. LLMs craft text based on the patterns they have learned from vast datasets.
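The seed prompts set the trajectory of the whole dataset, so it can help to generate them systematically rather than write a single prompt by hand. The sketch below is one way to do that under stated assumptions; it is not the article's pipeline. It combines hypothetical product, aspect, and sentiment lists into one prompt per combination, which makes the resulting reviews vary along known dimensions. The attribute lists, `TEMPLATE`, and `build_seed_prompts` are all assumptions for this example. The first prompt it produces happens to match the prompt used later in the article.

```python
from itertools import product

# Hypothetical seed attributes; replace them with the dimensions that matter for your data.
PRODUCTS = ["wireless earbud", "smartwatch"]
ASPECTS = ["battery life", "sound quality", "comfort"]
SENTIMENTS = ["positive", "negative"]

TEMPLATE = "Write a {words} word {sentiment} review for a {product} highlighting its {aspect}."


def build_seed_prompts(products, aspects, sentiments, words=25):
    """Return one prompt per (product, aspect, sentiment) combination."""
    return [
        TEMPLATE.format(words=words, sentiment=s, product=p, aspect=a)
        for p, a, s in product(products, aspects, sentiments)
    ]


prompts = build_seed_prompts(PRODUCTS, ASPECTS, SENTIMENTS)
print(len(prompts))  # 12 = 2 products x 3 aspects x 2 sentiments
print(prompts[0])    # Write a 25 word positive review for a wireless earbud highlighting its battery life.
```

Each prompt would then be sent to the LLMs, which generate text from the patterns they learned during training.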
This capability enables LLMs to produce information that matches the characteristics of real-world data, even though it is synthesized.

Generating Synthetic Data Using OpenAI's GPT-3.5
Step 1: Importing the Necessary Libraries

```python
# This tutorial uses the pre-1.0 openai SDK (e.g. pip install openai==0.28),
# which provides openai.ChatCompletion and openai.error
import openai
```

Step 2: Setting Up the OpenAI API Key

```python
openai.api_key = "Insert Your OpenAI key here"
```

Step 3: Defining Our Synthetic Data Generation Function

```python
def generate_reviews(prompt, count=1, min_words=15, max_words=70, max_attempts=5):
    reviews = []
    for i in range(count):
        # Optional: add a slight variation to the prompt after the first review,
        # without appending the phrase again on every iteration
        current_prompt = prompt if i == 0 else prompt + " Provide another perspective."
        # Bounded retries: after max_attempts the function moves on instead of looping forever
        for attempt in range(max_attempts):
            try:
                # Generate a response using the ChatCompletion method
                response = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": current_prompt},
                    ],
                )
            except openai.error.OpenAIError as err:
                print(f"Encountered an error: {err}")
                continue
            review = response.choices[0].message["content"].strip()
            word_count = len(review.split())
            print("word count:", word_count)
            # Keep the review only if its word count is within the desired range
            if min_words <= word_count <= max_words:
                reviews.append(review)
                break
    return reviews
```

Step 4: Testing Our Function

```python
prompt_text = "Write a 25 word positive review for a wireless earbud highlighting its battery life."
num_datapoints = 5
generated_reviews = generate_reviews(prompt_text, num_datapoints)
```

Step 5: Printing the Generated Synthetic Data

```python
for idx, review in enumerate(generated_reviews):
    print(f"Review {idx + 1}: {review}")
```

Output:
Review 1: The battery life on these wireless earbuds is absolutely incredible! I can enjoy hours of uninterrupted music without worrying about recharging. Truly impressive!
Review 2: "The battery life of these wireless earbuds is phenomenal! I can enjoy my favorite music for hours without worrying about recharging. Truly impressive!"
Review 3: This wireless earbud is a game-changer! With an exceptional battery life that lasts all day, I can enjoy uninterrupted music and calls without any worries. It's a must-have for people on the go. Another perspective: As a fitness enthusiast, the long battery life of this wireless earbud is a true blessing. It allows me to power through my workouts without constantly needing to recharge, keeping me focused and motivated.
Review 4: This wireless earbud's exceptional battery life is worth praising! It lasts all day long, keeping you immersed in your favorite tunes. A real game-changer for music enthusiasts.
Review 5: The battery life of these wireless earbuds is exceptional, lasting for hours on end, allowing you to enjoy uninterrupted music or calls. They truly exceed expectations!

Considerations and Pitfalls
However, the process doesn't end here. Generated data may contain inconsistencies or lack the desired quality, so post-processing (refining and filtering the output) is essential. The synthetic data also needs enough variability and richness. Too much uniformity can lead to overfitting when the data is used to train machine learning models; Reviews 1 and 2 above, for example, are near-duplicates. This refinement process should aim to remove redundant or unrepresentative samples that could skew the model's learning.

Finally, validating the synthetic data confirms that it meets the standards and purposes it was intended for, so that it is both authentic and reliable.

Conclusion
Throughout this article, we've walked through the process of generating synthetic data with LLMs.
We've explained why synthetic data is becoming increasingly prominent and shown how capable LLMs are at creating it. We've also given actionable guidance on using pre-trained models such as OpenAI's GPT-3.5. We hope this exploration has deepened your understanding of how machine learning, LLMs, and synthetic data fit together. Both synthetic data and LLMs look set to be central to many breakthroughs to come.

Author Bio
Mostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.


Sangita Mahala

07 Nov 2023

9 min read

PaLM 2: A Game-Changer in Tackling Real-World Challenges


Introduction
PaLM 2 is a new large language model from Google AI, trained on a massive dataset of text and code. It is the successor to PaLM and is even more capable at generating text, translating languages, writing various kinds of creative content, and answering questions informatively. Research and development on PaLM 2 are ongoing. Because it can address a broad range of complex real-world problems, it has the potential to shake up many industries and research areas.

Powerful Tools for NLP, Code Generation, and Creative Writing by PaLM 2
LLMs such as PaLM 2 are trained on massive datasets of text and code so that they learn the complex relationships between words and phrases. This makes them excellent candidates for a wide range of tasks, such as:

Natural language processing (NLP): PaLM 2 can perform NLP tasks such as machine translation, text summarization, and question answering with high accuracy and consistency.
Code generation: PaLM 2 can generate code in a number of programming languages, including Python, Java, and C++. This can help with tasks such as automating software development and creating new algorithms.
Creative writing: PaLM 2 can create different creative text formats, such as poems, code, scripts, musical notes, emails, and letters.
This could be useful for tasks such as writing advertising copy, producing scripts for films and television shows, and composing music.

Real-World Examples
Here are some specific examples of how PaLM 2 can be applied to complicated real-world problems.

Example 1: Drug Discovery
PaLM 2 has many promising applications in drug discovery. It can be used to generate new drug candidates, predict their properties, and simulate their interactions with biological targets. This could help scientists identify new drugs more quickly and efficiently.

To produce new drug candidates, PaLM 2 could help screen millions of possible compounds for ones that bind to a specific target protein. This is a highly complex task, but PaLM 2 can speed it up considerably.

Input code:

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Assumes a Google Cloud project with the Vertex AI API enabled; replace the placeholders
vertexai.init(project="your-project-id", location="us-central1")


def drug_discovery(target_protein):
    """Uses PaLM 2 to generate new drug candidates for a given target protein.

    Args:
        target_protein: The target protein to generate drug candidates for.

    Returns:
        A list of potential drug candidates, one per line of the model's answer.
    """
    # Load the PaLM 2 text model (text-bison) on Vertex AI
    model = TextGenerationModel.from_pretrained("text-bison")

    # Set the input prompt
    prompt = (
        f"Generate new drug candidates for the target protein {target_protein}. "
        "List one candidate per line."
    )

    # Make a prediction; PaLM 2 returns free text rather than structured fields
    response = model.predict(prompt, max_output_tokens=256)

    # Turn the free-text answer into a list of drug candidates
    drug_candidates = [
        line.strip("-* ").strip() for line in response.text.splitlines() if line.strip()
    ]
    return drug_candidates


# Example usage:
target_protein = "ACE2"
drug_candidates = drug_discovery(target_protein)
print(drug_candidates)
```

Output:
The function drug_discovery() returns a list of potential therapeutic candidates for the given protein. The specific output depends on the protein being targeted; for this example, it was as follows:
This indicates that PaLM 2 identified three possible drug candidates for the target protein ACE2.
Researchers would then need to carry out further studies to determine the effectiveness and safety of these substances.

Example 2: Climate Change
PaLM 2 may also help in coping with climate change. It can be used to model the climate system, anticipate the impacts of climate change, and develop mitigation strategies.

For example, PaLM 2 can be given different greenhouse gas emissions scenarios and asked how the Earth's climate would respond to each. This information can be used to anticipate the effects of climate change on temperature, precipitation, and other factors.

Input code:

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Assumes a Google Cloud project with the Vertex AI API enabled; replace the placeholders
vertexai.init(project="your-project-id", location="us-central1")


def climate_change_prediction(emission_scenario):
    """Uses PaLM 2 to predict the effects of climate change under a given emission scenario.

    Args:
        emission_scenario: The emission scenario to predict the effects of climate change under.

    Returns:
        The model's description of the predicted effects of climate change, as text.
    """
    # Load the PaLM 2 text model (text-bison) on Vertex AI
    model = TextGenerationModel.from_pretrained("text-bison")

    # Set the input prompt
    prompt = f"Predict the effects of climate change under the emission scenario {emission_scenario}."

    # Make a prediction and return the generated text
    response = model.predict(prompt, max_output_tokens=256)
    return response.text


# Example usage:
emission_scenario = "RCP8.5"
predicted_effects = climate_change_prediction(emission_scenario)
print(predicted_effects)
```

Output:
The example uses RCP8.5, a high-emissions scenario. In this run, the model estimated that the global temperature would rise by 4.3 °C and that precipitation would decrease by 10% under this scenario.

Example 3: Material Science
In material science, PaLM 2 may be used to create new materials with desired properties.
PaLM 2 could be used to assess millions of possible materials for required properties such as durability, lightness, and conductivity.

New battery materials are one example. Batteries need to be light and long-lasting and have high energy density, and PaLM 2 could help identify potential materials with these properties among millions of possibilities.

Input code:

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Assumes a Google Cloud project with the Vertex AI API enabled; replace the placeholders
vertexai.init(project="your-project-id", location="us-central1")


def material_design(desired_properties):
    """Uses PaLM 2 to design a new material with the desired properties.

    Args:
        desired_properties: A list of the desired properties of the new material.

    Returns:
        The model's description of the designed material's properties, as text.
    """
    # Load the PaLM 2 text model (text-bison) on Vertex AI
    model = TextGenerationModel.from_pretrained("text-bison")

    # Set the input prompt
    prompt = f"Design a new material with the following desired properties: {', '.join(desired_properties)}"

    # Make a prediction and return the generated text
    response = model.predict(prompt, max_output_tokens=256)
    return response.text


# Example usage:
desired_properties = ["lightweight", "durable", "conductive"]
designed_material_properties = material_design(desired_properties)
print(designed_material_properties)
```

Output:
In this run, the model designed a material with the following properties:
Density: 1.0 grams per cubic centimeter (g/cm^3)
Strength: 1000.0 megapascals (MPa)
Conductivity: 100.0 watts per meter per kelvin (W/mK, a unit of thermal conductivity)
This is only a prediction generated by the language model. Further investigation and development would be needed to make this material real.

Example 4: Predicting the Spread of Infectious Diseases
PaLM 2 may be used to predict the spread of COVID-19 in a given region.
PaLM 2 can take into account factors such as the number of infections, the transmission rate, and vaccination rates. It can also be used to predict the effects of preventive health measures, such as mask mandates and lockdowns.

Input code:

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Assumes a Google Cloud project with the Vertex AI API enabled; replace the placeholders
vertexai.init(project="your-project-id", location="us-central1")


def infectious_disease_prediction(population_density, transmission_rate):
    """Uses PaLM 2 to predict the spread of an infectious disease in a population
    with a given population density and transmission rate.

    Args:
        population_density: The population density of the population to predict the spread in.
        transmission_rate: The transmission rate of the infectious disease.

    Returns:
        The model's description of the predicted spread of the disease, as text.
    """
    # Load the PaLM 2 text model (text-bison) on Vertex AI
    model = TextGenerationModel.from_pretrained("text-bison")

    # Set the input prompt
    prompt = (
        "Predict the spread of COVID-19 in a population with a population density of "
        f"{population_density} and a transmission rate of {transmission_rate}."
    )

    # Make a prediction and return the generated text
    response = model.predict(prompt, max_output_tokens=256)
    return response.text


# Example usage:
population_density = 1000
transmission_rate = 0.5
predicted_spread = infectious_disease_prediction(population_density, transmission_rate)
print(predicted_spread)
```

Output:
According to the model's output, the estimated peak incidence is 50%, meaning that half of the population would be infected at one point during the outbreak.
The total number of anticipated cases is 500,000.

Remember that this is a prediction. The rate at which an infectious disease spreads can change depending on many factors, such as the effectiveness of prevention measures and how people behave.

In the future, PaLM 2 is expected to be used to develop new medicines, more efficient energy systems, and materials with desired properties. It is also likely to be used to predict the spread of infectious agents and to develop mitigation strategies for climate change.

Conclusion
In conclusion, PaLM 2, Google AI's advanced language model, has the potential to transform several sectors by addressing the complex problems of today's world. Its applicability to drug discovery, climate change prediction, materials science, and forecasting the spread of infectious disease shows its flexibility and strength.

Responsible and proper use of PaLM 2 is at the heart of this evolving landscape. To make full use of its potential, the model's capabilities must be combined with human expertise, and its applications must meet ethical standards and best practices. As we continue to explore what PaLM 2 can do, this technology may help shape a brighter future by helping to solve complicated problems across many fields.

Author Bio
Sangita Mahala is a passionate IT professional with an outstanding track record and an impressive array of certifications, including 12x Microsoft, 11x GCP, 2x Oracle, and LinkedIn Marketing Insider Certified. She is a Google Crowdsource Influencer and an IBM Champion Learner Gold. She also has extensive experience as a technical content writer and is an accomplished book blogger. She is committed to keeping up with emerging trends and technologies in the IT sector.


Prakhar Mishra

06 Nov 2023

9 min read

Fine-Tuning LLaMA 2


Introduction
Large Language Models have recently become the talk of the town. You have almost certainly heard of ChatGPT. Yes, that's an LLM, and that's what I am talking about. Every few weeks, newer and better (though not necessarily larger) LLMs are released as either open-source or closed-source models. This is probably the best time to learn about them and make these powerful models work for your specific use case.

In today's blog, we will look at one of the recent open-source models, Llama 2, and fine-tune it on a standard NLP task: recognizing entities in text. We will first look at what large language models are, what open-source and closed-source models are, and some examples of each. We will then learn about Llama 2 and why it is so special, and describe our NLP task and dataset. Finally, we will get into the code.

About Large Language Models (LLMs)
Language models are artificial intelligence systems that have been trained to understand and generate human language. Large Language Models (LLMs) such as GPT-3, ChatGPT, GPT-4, and Bard can perform diverse sets of tasks out of the box. The quality of their output often depends heavily on how well the user's prompt is crafted.

These language models are trained on vast amounts of text data from the Internet. This data covers a wide range of written text, from books and articles to websites and social media posts. Most language models are trained auto-regressively, i.e. they learn to maximize the probability of the next word given the words that came before it. Language models have a wide range of applications, including chatbots, virtual assistants, content generation, and more.
They can be used in industries such as customer service, healthcare, finance, and marketing.

These models are trained on enormous amounts of data, so they are already good at zero-shot inference and can be steered to perform better with few-shot examples. In a zero-shot setting, the model performs a task without being shown any examples of it and relies only on the instructions in the prompt. In a few-shot setting, the prompt also includes a handful of labeled examples, provided at inference time, that show the model what is expected.

Despite their impressive text-generation capabilities, these huge models have limitations, such as hallucinations and biases, that must be considered when building an LLM-based production pipeline.

Closed and Open-source Language Models
Closed-source large language models are developed by companies that do not release them publicly. Access is typically provided only through a product or an API, and the training data is usually kept private. These models can be highly sophisticated, but the lack of transparency can raise concerns about bias and data privacy.

In contrast, open-source models such as Llama 2 are designed to be freely available to researchers and developers. They are often trained on extensive, publicly available datasets, which allows for a degree of transparency and collaboration.

The choice between closed- and open-source language models depends on several factors, such as the project's goals and the need for openness.

About Llama 2
Llama 2 is Meta's open-source LLM. It was trained on 2 trillion tokens from publicly available sources. It comes in three sizes: 7 billion, 13 billion, and 70 billion parameters. There are two types of models: general (pre-trained) and chat-tuned.
The chat-tuned models have been fine-tuned for chatbot-style dialogue and are denoted by the suffix '-chat'. We will use Meta's general 7B Llama 2 model from Hugging Face as the base model for fine-tuning. Feel free to use any other Llama 2 7B variant.

If you are interested, there are also several online discussion threads that compare the Llama family with the GPT family.

About Named Entity Recognition
Named-entity recognition (NER) is a component of information extraction. It locates specific entities in unstructured text and assigns them to predefined categories, such as individuals, organizations, locations, measures, and more. NER offers a quick way to understand the core idea or content of a lengthy text.

There are many ways to extract entities from a given text. In this blog, we will specifically fine-tune Llama2-7b using PEFT (parameter-efficient fine-tuning) techniques in a Colab notebook.

We will transform the SMSSpamCollection classification dataset for NER. Pretty interesting 😀

We search for all 10-letter words and tag them as 10_WORDS_LONG, and this is the entity we want our Llama to extract. But why this bizarre formulation? I chose it intentionally, because it is something the pre-trained model would not have seen during pre-training. That makes fine-tuning essential to get it working for our use case 👍. We can still give the formulation some logic: think of these words as probable outliers or noisy words. The longer a word is, the higher the chance that it is noise or out-of-vocabulary (OOV). However, you will have to choose the exact letter count after looking at the word-length distribution. Please note that the code is generic enough for fine-tuning on any number of entities.
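Two data-preparation ideas from the paragraphs above can be sketched in plain Python: picking the letter count from the word-length distribution, and tagging more than one entity type. This is a hedged illustration, not the article's code. The sample messages, the `rules` dictionary, and the second entity name `LONG_WORD_12_PLUS` are hypothetical. The `len(word) == 10` rule mirrors the article's check on lowercased, whitespace-split words, so punctuation attached to a word counts toward its length.

```python
from collections import Counter


def word_length_distribution(texts):
    """Count how many whitespace-separated words have each length."""
    return Counter(len(word) for text in texts for word in text.lower().split())


def tag_entities(text, rules):
    """Map each word to the first entity label whose rule matches it.

    `rules` maps label -> predicate(word) -> bool and is checked in insertion order.
    """
    tags = {}
    for word in text.lower().split():
        for label, matches in rules.items():
            if matches(word):
                tags[word] = label
                break
    return tags


# Hypothetical sample messages and rules, for illustration only.
messages = [
    "call me tomorrow about the appointment",
    "congratulations you are guaranteed a prize",
]
rules = {
    "10_WORDS_LONG": lambda w: len(w) == 10,      # the article's entity
    "LONG_WORD_12_PLUS": lambda w: len(w) >= 12,  # a hypothetical second entity
}

print(word_length_distribution(messages))
print(tag_entities(messages[1], rules))
# {'congratulations': 'LONG_WORD_12_PLUS', 'guaranteed': '10_WORDS_LONG'}
```

Supporting a different set of entities therefore only changes the `rules` that are passed in.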
It's just a change in the data preparation step, where we slice out only the relevant entities.

Code for Fine-tuning Llama2-7b

```python
# Importing libraries
import json

import pandas as pd
import torch
import transformers
from datasets import Dataset
from peft import LoraConfig, PeftModel, TaskType, get_peft_model, prepare_model_for_kbit_training
from sklearn.utils import shuffle
from transformers import LlamaForCausalLM, LlamaTokenizer
```

Data Preparation Phase

```python
df = pd.read_csv('SMSSpamCollection', sep='\t', header=None)
all_text = df[1].str.lower().tolist()

inputs, outputs = [], []
for text in all_text:
    inputs.append(text)
    # Tag every 10-letter word; store the dict as a JSON string so that the
    # column has one consistent type when it is converted to a Dataset
    outputs.append(json.dumps({word: '10_WORDS_LONG' for word in text.split() if len(word) == 10}))

df = pd.DataFrame({'input_text': inputs, 'output_text': outputs})
print(df.head(5))

total_ds = shuffle(df, random_state=42)
total_train_ds = total_ds.head(4000)
total_test_ds = total_ds.tail(1500)

total_train_ds_hf = Dataset.from_pandas(total_train_ds, preserve_index=False)
total_test_ds_hf = Dataset.from_pandas(total_test_ds, preserve_index=False)
# Tokenization happens after the tokenizer and prompt helpers are defined below
```

Fine-tuning Phase

```python
# Loading the model in 8-bit (requires bitsandbytes) so that it fits on a single Colab GPU
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(
    model_name, load_in_8bit=True, torch_dtype=torch.float16, device_map="auto"
)

def create_peft_config(m):
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=['q_proj', 'v_proj'],
    )
    # Older peft releases call this prepare_model_for_int8_training
    m = prepare_model_for_kbit_training(m)
    m.enable_input_require_grads()
    m = get_peft_model(m, peft_config)
    m.print_trainable_parameters()
    return m, peft_config

model, lora_config = create_peft_config(model)

def generate_prompt(data_point):
    return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Extract entity from the given input:

### Input:
{data_point["input_text"]}

### Response:
{data_point["output_text"]}"""

tokenizer.pad_token_id = 0

def tokenize(prompt, add_eos_token=True):
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=128,
        padding=False,
        return_tensors=None,
    )
    # Append the EOS token unless the prompt was truncated or already ends with it
    if (
        result["input_ids"][-1] != tokenizer.eos_token_id
        and len(result["input_ids"]) < 128
        and add_eos_token
    ):
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)
    # Causal language modeling: the labels are the input IDs themselves
    result["labels"] = result["input_ids"].copy()
    return result

def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    return tokenize(full_prompt)

# Drop the raw text columns so that only tensors reach the data collator
tokenized_tr_ds = total_train_ds_hf.map(generate_and_tokenize_prompt, remove_columns=["input_text", "output_text"])
tokenized_te_ds = total_test_ds_hf.map(generate_and_tokenize_prompt, remove_columns=["input_text", "output_text"])

training_arguments = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=4e-05,
    logging_steps=100,
    optim="adamw_torch",
    evaluation_strategy="steps",
    save_strategy="steps",
    eval_steps=100,
    save_steps=100,
    output_dir="saved_models/",
)

data_collator = transformers.DataCollatorForSeq2Seq(tokenizer)
trainer = transformers.Trainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_tr_ds,
    eval_dataset=tokenized_te_ds,
    args=training_arguments,
    data_collator=data_collator,
)
with torch.autocast("cuda"):
    trainer.train()

# Save the trained LoRA adapter so that it can be reloaded for inference
model.save_pretrained("saved_models/lora_adapter")
```

Inference

```python
loaded_tokenizer = LlamaTokenizer.from_pretrained(model_name)
loaded_model = LlamaForCausalLM.from_pretrained(
    model_name, load_in_8bit=True, torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA adapter saved after training
inference_model = PeftModel.from_pretrained(loaded_model, "saved_models/lora_adapter", torch_dtype=torch.float16)
inference_model.config.pad_token_id = loaded_tokenizer.pad_token_id = 0
inference_model.eval()

def extract_entity(text):
    # Build the same prompt that was used in training, with an empty response
    prompt = generate_prompt({"input_text": text, "output_text": ""})
    inp = loaded_tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        output_ids = inference_model.generate(**inp, max_new_tokens=128)[0]
    p_ent = loaded_tokenizer.decode(output_ids, skip_special_tokens=True)
    # Keep only the text generated after the "Response:" marker
    int_idx = p_ent.find("Response:")
    p_ent = p_ent[int_idx + len("Response:"):]
    return p_ent.strip()

text = "hey, are you coming to the conference tomorrow?"  # hypothetical example message
extracted_entity = extract_entity(text)
print(extracted_entity)
```

Conclusion
In this blog post, we walked through fine-tuning the llama2-7b model for Named Entity Recognition. The same approach works for any task you are interested in. The core concept to take away is PEFT-based training of large language models. Pre-trained LLMs will not always perform well on your task, and in that case fine-tuning them is often the best option.

Author Bio
Prakhar Mishra has a Master's in Data Science and over 4 years of industry experience across sectors such as Retail, Healthcare, and Consumer Analytics. His research interests include natural language understanding and generation, and he has published multiple research papers in reputed international publications in the field. Feel free to reach out to him on LinkedIn.

Merlyn Shelley

03 Nov 2023

13 min read

AI_Distilled #24: Google Invests $2 Billion in Anthropic, Perplexity's AI Search Engine, Biden's AI Executive Order, Data Mining with GPT-4, RL and AWS Deepracer

👋 Hello,
Welcome to another captivating edition of AI_Distilled, featuring recent advancements in training and fine-tuning LLMs, GPT and AI models for enhanced business outcomes.
Let’s begin our news and analysis with an industry expert’s opinion.
“Artificial intelligence is the science of making machines do things that would require intelligence if done by humans” – John McCarthy, Computer Scientist and AI Visionary
AI does indeed make machines intelligent, so much so that industry titans are now waging a proxy AI war with billions in startup funding. Without a doubt, AI is onto something big!
This week, we’ll talk about Biden's AI Executive Order, which has been praised for its scope but deemed insufficient without legislation; Perplexity's AI search engine; OpenAI launching a new team and challenge to prepare for catastrophic risks of advanced AI; and Google investing $2 billion in Anthropic and updating its Bug Bounty program to address AI security concerns. Look out for your fresh dose of AI resources, secret knowledge, and tutorials on how to use custom AI models to enhance complex technical workflows, improving LLM understanding with user feedback, and essential text preprocessing for effective machine learning with Python.
📥 Feedback on the Weekly Edition
What do you think of this issue and our newsletter? Please consider taking the short survey to share your thoughts, and you will get a free PDF of “The Applied Artificial Intelligence Workshop” eBook upon completion.
Writer’s Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week’s newsletter content!
Cheers,
Merlyn Shelley
Editor-in-Chief, Packt
⚡ TechWave: AI/GPT News & Analysis
🔹 OpenAI Launches New Team and Challenge to Prepare for Catastrophic Risks of Advanced AI: The ChatGPT creator announced new efforts to prepare for potential catastrophic risks associated with highly advanced AI systems.
The company is forming a new internal team called "Preparedness" to assess risks ranging from cybersecurity threats to chemical and biological threats and autonomous replication. It is also launching an "AI Preparedness Challenge" with prize money to crowdsource ideas for preventing misuse of advanced AI. OpenAI says it aims to benefit humanity with cutting-edge AI while taking seriously the full spectrum of safety risks.
🔹 Biden's AI Executive Order Praised for Scope but Deemed Insufficient Without Legislation: President Biden recently issued an executive order on AI that experts say covers important ground but lacks teeth without accompanying legislation from Congress. The order establishes guidelines and oversight for AI development and use, including in healthcare. However, many provisions simply codify voluntary industry practices. Stakeholders say Congress must pass more comprehensive AI regulations, but partisan disputes make near-term action unlikely.
🔹 Google Updates Bug Bounty Program to Address AI Security Concerns: Google has expanded its vulnerability rewards program to include incentives for discovering potential abuses of artificial intelligence systems. The update comes as worries grow over generative AI being exploited maliciously. Under the revised guidelines, security researchers can earn financial rewards for uncovering training data extraction attacks that leak private information. The move aligns with AI companies' recent White House pledge to better identify AI vulnerabilities.
🔹 Perplexity's AI Search Engine Garners $500M Valuation After New Funding: The AI startup Perplexity recently secured additional funding led by venture capital firm IVP, garnering a $500 million valuation. Perplexity is developing a conversational search engine to challenge Google's dominance using artificial intelligence. The company's iOS app and website traffic have been growing steadily amid rising interest in AI tools like ChatGPT. With deep ties to Google researchers, Perplexity leverages LLMs and has attracted investments from major industry figures.
🔹 Tech Giants Wage Proxy AI War with Billions in Startup Funding As Google Invests $2 Billion in Anthropic: Major technology companies like Google, Microsoft, and Amazon are investing billions in AI startups like OpenAI and Anthropic as surrogates in the race to lead the AI space. Unable to quickly build their own capabilities in large language models, the tech giants are funneling massive sums into the AI leaders to gain ownership stakes and technology access. Anthropic's $2 billion funding from Google follows similar multibillion investments from Microsoft and Amazon, fueling an expensive AI innovation war by proxy.
🔹 Poe Unveils Monetization for Third-Party Conversational AI Developers: The AI chatbot platform Poe has introduced a new revenue-sharing model to let creators profit from building specialized bots. Poe will split subscription fees and pay per-message charges to offset infrastructure costs. An open API also allows adding custom natural language models beyond Poe's defaults. The moves aim to spur innovation by empowering niche developers. Poe believes reducing barriers will increase diversity, not just competition.
🔮 Expert Insights from Packt Community
Generative AI with Python and TensorFlow 2 - By Joseph Babcock, Raghav Bali
Kubeflow: an end-to-end machine learning lab
As was described at the beginning of this chapter, there are many components of an end-to-end lab for machine learning research and development (Table 2.1), such as:
A way to manage and version library dependencies, such as TensorFlow, and package them for a reproducible computing environment
Interactive research environments where we can visualize data and experiment with different settings
A systematic way to specify the steps of a pipeline: data processing, model tuning, evaluation, and deployment
Provisioning of resources to run the modeling process in a distributed manner
Robust mechanisms for snapshotting historical versions of the research process
As we described earlier in this chapter, TensorFlow was designed to utilize distributed resources for training. To leverage this capability, we will use the Kubeflow projects. Built on top of Kubernetes, Kubeflow has several components that are useful in the end-to-end process of managing machine learning applications.
Using Kubeflow Katib to optimize model hyperparameters
Katib is a framework for running multiple instances of the same job with differing inputs, such as in neural architecture search (for determining the right number and size of layers in a neural network) and hyperparameter search (finding the right learning rate, for example, for an algorithm).
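To make the idea concrete, here is a minimal Python sketch of what a random hyperparameter search does conceptually, using the same two parameters, ranges, trial budget (12) and goal (0.99) as the Katib experiment spec in this excerpt. It is not Katib's implementation: Katib launches each trial as a Kubernetes job (up to parallelTrialCount at once) and reads the metric through its collector, whereas this sketch runs trials one after another against a hypothetical `toy_objective` function that stands in for a real training run. All function names here are illustrative.

```python
import random

# Feasible space copied from the Katib Experiment spec in this excerpt:
# (parameterType, min, max) for each command-line flag.
SEARCH_SPACE = {
    "--learning_rate": ("double", 0.01, 0.05),
    "--batch_size": ("int", 100, 200),
}


def sample_trial(space, rng):
    """Draw one parameter assignment uniformly at random from the feasible space."""
    params = {}
    for name, (kind, low, high) in space.items():
        if kind == "double":
            params[name] = rng.uniform(low, high)
        else:
            params[name] = rng.randint(low, high)  # randint bounds are inclusive
    return params


def to_cli_args(params):
    """Render parameters the way the trial template does: "{{.Name}}={{.Value}}"."""
    return [f"{name}={value}" for name, value in params.items()]


def random_search(objective, space, max_trials=12, goal=0.99, seed=0):
    """Run trials sequentially, keep the best score, stop early once goal is met."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(max_trials):
        params = sample_trial(space, rng)
        score = objective(params)  # in Katib, the metric the trial reports
        if score > best_score:
            best_params, best_score = params, score
        if best_score >= goal:
            break
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in for the accuracy a real training trial would report."""
    return 1.0 - 10 * abs(params["--learning_rate"] - 0.03)


best_params, best_score = random_search(toy_objective, SEARCH_SPACE)
print(to_cli_args(best_params), round(best_score, 3))
```

Random search is the simplest of Katib's algorithms; because each trial is independent, it parallelizes trivially, which is why the real experiment can run several trials at the same time.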
Like the other Kustomize templates we have seen, the TensorFlow job specifies a generic TensorFlow job, with placeholders for the parameters:
apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  namespace: kubeflow
  name: tfjob-example
spec:
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy_1
  algorithm:
    algorithmName: random
  metricsCollectorSpec:
    source:
      fileSystemPath:
        path: /train
        kind: Directory
    collector:
      kind: TensorFlowEvent
  parameters:
    - name: --learning_rate
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.05"
    - name: --batch_size
      parameterType: int
      feasibleSpace:
        min: "100"
        max: "200"
  trialTemplate:
    goTemplate:
      rawTemplate: |-
        apiVersion: "kubeflow.org/v1"
        kind: TFJob
        metadata:
          name: {{.Trial}}
          namespace: {{.NameSpace}}
        spec:
          tfReplicaSpecs:
            Worker:
              replicas: 1
              restartPolicy: OnFailure
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
                      imagePullPolicy: Always
                      command:
                        - "python"
                        - "/var/tf_mnist/mnist_with_summaries.py"
                        - "--log_dir=/train/metrics"
                        {{- with .HyperParameters}}
                        {{- range .}}
                        - "{{.Name}}={{.Value}}"
                        {{- end}}
                        {{- end}}
which we can run using the familiar kubectl syntax:
kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/tfjob-example.yaml
This content is from the book “Generative AI with Python and TensorFlow 2” by Joseph Babcock, Raghav Bali (April 2021). Start reading a free chapter or access the entire Packt digital library free for 7 days by signing up now.
🌟 Secret Knowledge: AI/LLM Resources
🔹 How to Use Custom AI Models to Enhance Complex Technical Workflows: In this post, you'll learn how Nvidia’s researchers leveraged customized LLMs to streamline intricate semiconductor chip design.
The research demonstrates how to refine foundation models into customized assistants that understand industry-specific patterns. You'll see how careful data cleaning and selection enable high performance even with fewer parameters. The post explores step-by-step instructions on how researchers built a specialized AI that helps with writing code, improving documentation, and optimizing complex technical workflows.
🔹 How to Build Impactful LLM Applications: In this post, you'll explore lessons learned from creating Microsoft's Copilot products, such as Viva and PowerPoint. It discusses how combining LLMs with app context and other ML models can be a game-changer and demonstrates how parsing user queries and responses enables precise skill activation. By following their approach of using multiple models to summarize insights without losing nuance, you can gain practical tips for your own LLM application development.
🔹 Understanding Convolutional Neural Networks and Vision Transformers: A Mathematical Perspective: You'll learn about convolutional neural networks and vision transformers in this post. Both work well for image classification, but they differ mathematically, especially for generative tasks. You'll see how their training budgets work and understand the math behind each. The post also discusses their differences in complexity and memory usage, and explains why convolutional nets handle spatial coherence naturally while vision transformers might need some help. By the end, you'll know why transformers are better suited to generating sequential data.
🔹 Improving Large Language Model Understanding with User Feedback: The post focuses on improving user intent detection for LLMs by using disambiguation, context, and MemPrompt. These techniques improve LLM responses, enabling better understanding of user intent, offering real-time feedback, and enhancing LLM performance and utility.
🔹 The Power of High-Quality Data in Language Models: The article emphasizes the significance of high-quality data for Large Language Models (LLMs). It introduces the concept of alignment, discussing how it influences LLM behavior, and stresses the vital role of data quality and diversity in optimizing LLM performance and capabilities.
💡 Masterclass: AI/LLM Tutorials
🔹 Enhance Language Model Performance with Step-Back Prompting: This guide explores the use of Step-Back Prompting to enhance LLMs' performance in complex tasks, like knowledge-intensive QA and multi-hop reasoning. It offers a step-by-step tutorial, including package setup and data collection, to implement this approach, potentially improving AI model behavior and responses.
🔹 Boosting AI at Scale with Vectorized Databases: This guide explores how vectorized databases are transforming LLMs like GPT-3 by enhancing their capabilities and scalability. It explains the principles of LLMs and the role of vectorized databases in empowering them. It discusses efficient data retrieval, optimization of vector operations, and scaling for real-time responses. The guide highlights use cases, including content generation and recommendation systems, where vectorized databases excel, and addresses the challenges of adopting them for LLMs.
🔹 Mastering Data Mining with GPT-4: A Practical Guide Using Seattle Weather Data: This guide explores the use of GPT-4 for data mining using Seattle's weather dataset. It covers AI's potential in data mining, detailing the process from exploratory data analysis to clustering and anomaly detection. GPT-4 assists in data loading, EDA, data cleaning, and feature engineering, and suggests clustering methods. The post highlights the collaborative aspect of AI-human interaction and how GPT-4 can improve data mining and data analysis in the field of data science.
🔹 Introduction to Reinforcement Learning and AWS Deepracer: This post introduces reinforcement learning, a machine learning approach focused on maximizing rewards through agent-environment interactions, and compares it to motivating students based on their performance. It explores practical applications via AWS Deepracer for self-driving cars, explaining key components and mentioning the Deepracer Student League as a learning opportunity.
🔹 Essential Text Preprocessing for Effective Machine Learning with Python: This post highlights crucial text preprocessing techniques for machine learning. It emphasizes the need to clean text data to avoid interference and unintended word distinctions. The methods, including removing numbers and handling extra spaces, enhance text data quality for effective machine learning applications.
🚀 HackHub: Trending AI Tools
🔹 Pythagora-io/gpt-pilot: Boosts app development speed 20x via requirement specification, oversight, and coding assistance through clarifications and reviews.
🔹 hkuds/rlmrec: PyTorch implementation for the RLMRec model, enhancing recommenders with LLMs for advanced representation learning in recommendation systems.
🔹 THUDM/AgentTuning: Empowers LLMs by instruction-tuning them with interaction trajectories from various agent tasks, enhancing their generalization and language abilities.
🔹 cpacker/MemGPT: Enhances LLMs by intelligently managing memory tiers, enabling extended context and perpetual conversations.

Mostafa Ibrahim

31 Oct 2023

6 min read

Debugging and Monitoring LLMs With Weights & Biases

Introduction
Large Language Models, or LLMs for short, are becoming a big deal in the world of technology. They're powerful and can do a lot, but they're not always easy to handle. Just like when building a big tower, you want to make sure everything goes right from the start to the finish. That's where Weights & Biases, often called W&B, comes in. It's a tool that helps people keep an eye on how their models are doing. In this article, we'll talk about why it's so important to watch over LLMs, how W&B helps with that, and how to use it. Let's dive in!
Large Language Models (LLMs)
Large Language Models (LLMs) are machine learning models trained on vast amounts of text data to understand and generate human-like text. They excel at processing and producing language, enabling applications like translation, summarization, and conversation.
LLMs, such as GPT-3 by OpenAI, use deep learning architectures to learn patterns and relationships in the data, making them capable of sophisticated language tasks. Through training on diverse datasets, they aim to comprehend context, semantics, and nuances akin to human communication.
When discussing the forefront of natural language processing, several Large Language Models (LLMs) consistently emerge.
The Need for Debugging & Monitoring LLMs
Understanding and overseeing Large Language Models (LLMs) is much like supervising an intricate machine: they're powerful and versatile, but require keen oversight.
Firstly, think about the intricacy of LLMs. They far surpass the complexity of your typical day-to-day machine learning models. While they hold immense potential to revolutionize language tasks such as customer support, content creation, and translation, their intricate designs can sometimes misfire.
If we're not careful, instead of a smooth conversation with a chatbot, users might encounter bewildering responses, leading to frustration and diminished trust.
Then there's the matter of resources. Training LLMs isn't just time-consuming; it's also financially demanding. Each hiccup, if not caught early, can translate into unnecessary expenditure. It's much like constructing a skyscraper: mid-way errors are costlier to rectify than those identified in the blueprint phase.
Introduction to Weights & Biases
Weights & Biases (W&B) is a cutting-edge platform tailored for machine learning practitioners. It offers a suite of tools designed to help streamline the model development process, from tracking experiments to visualizing results.
With W&B, researchers and developers can efficiently monitor their LLM training progress, compare different model versions, and collaborate with team members. It's an invaluable asset for anyone looking to optimize and scale their machine learning workflows.
How to Use W&B for Debugging & Monitoring LLMs
In the hands-on section of this article, we will follow the structured approach illustrated in the diagram below. We will fine-tune our model and use Weights & Biases to save critical metrics, tables, and visualizations, giving us deeper insights for debugging and monitoring our Large Language Models.
1. Setting up Weights and Biases
a. Importing Necessary Libraries
import torch
import wandb
import matplotlib.pyplot as plt
from transformers import BertTokenizer, BertForSequenceClassification
from torch.utils.data import DataLoader, random_split
from datasets import load_dataset
Initializing W&B
# Initialize W&B; hyperparameters such as the epoch count go into config (3 is illustrative)
wandb.init(project='llm_monitoring', name='bert_example', config={'epochs': 3})
config = wandb.config
b. Loading the BERT Model
# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
2. Fine-tuning your Model
a. Loading your dataset
dataset = load_dataset('Load your dataset')  # replace with the name of your dataset
b. Fine-tuning the model
# train_dataloader is built from your dataset with DataLoader (not shown)
for epoch in range(config.epochs):
    model.train()
    for batch in train_dataloader:
        # ……….
        # Continue training process here
        # ………..
3. Tracking Metrics
# At the end of each epoch, once avg_val_loss and val_accuracy are computed,
# log the validation metrics to W&B
wandb.log({
    "Epoch": epoch,
    "Validation Loss": avg_val_loss,
    "Validation Accuracy": val_accuracy
})
4. Graph Visualizations
The lists train_losses, val_losses, val_accuracies, and train_accuracies hold one value per epoch, collected during training.
a. Plotting and logging Training Loss Graph
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(train_losses, label="Training Loss", color='blue')
ax.set(title="Training Losses", xlabel="Epoch", ylabel="Loss")
wandb.log({"Training Loss Curve": wandb.Image(fig)})
b. Plotting and logging Validation Loss Graph
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(val_losses, label="Validation Loss", color='orange')
ax.set(title="Validation Losses", xlabel="Epoch", ylabel="Loss")
wandb.log({"Validation Loss Curve": wandb.Image(fig)})
c. Plotting and logging Validation Accuracy Graph
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(val_accuracies, label="Validation Accuracy", color='green')
ax.set(title="Validation Accuracies", xlabel="Epoch", ylabel="Accuracy")
wandb.log({"Validation Accuracy Curve": wandb.Image(fig)})
d. Plotting and logging Training Accuracy Graph
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(train_accuracies, label="Training Accuracy", color='blue')
ax.set(title="Training Accuracies", xlabel="Epoch", ylabel="Accuracy")
wandb.log({"Training Accuracy Curve": wandb.Image(fig)})
5. Manual Checkups
questions = ["What's the weather like?", "Who won the world cup?", "How do you make an omelette?", "Why is the sky blue?", "When is the next holiday?"]
old_model_responses = ["It's sunny.", "France won the last one.", "Mix eggs and fry them.", "Because of the atmosphere.", "It's on December 25th."]
new_model_responses = ["The weather is clear and sunny.", "Brazil was the champion in the previous world cup.", "Whisk the eggs, add fillings, and cook in a pan.", "Due to Rayleigh scattering.", "The upcoming holiday is on New Year's Eve."]
# Create a W&B Table
table = wandb.Table(columns=["question", "old_model_response", "new_model_response"])
for q, old, new in zip(questions, old_model_responses, new_model_responses):
    table.add_data(q, old, new)
# Log the table to W&B
wandb.log({"NLP Responses Comparison": table})
6. Closing the W&B run after all logs are uploaded
wandb.finish()
Conclusion
Large Language Models have truly transformed the landscape of technology. Their vast capabilities are nothing short of amazing, but like all powerful tools, they require understanding and attention. Fortunately, with platforms like Weights & Biases, we have a handy toolkit to guide us. It reminds us that while LLMs are game-changers, they still need a bit of oversight.
Author Bio
Mostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.

Vivekanandan Srinivasan

27 Oct 2023

8 min read

Evaluating Large Language Models

Ryan Goodman

25 Oct 2023

10 min read

Detecting and Mitigating Hallucinations in LLMs

Chaitanya Yadav

23 Oct 2023

8 min read

Large Language Models (LLMs) in Education

20 Oct 2023

7 min read

Testing Large Language Models (LLMs)

Machine learning has become ubiquitous, with models powering everything from search engines and recommendation systems to chatbots and autonomous vehicles. As these models grow more complex, testing them thoroughly is crucial to ensure they behave as expected. This is especially true for large language models like GPT-4 that generate human-like text and engage in natural conversations.
In this article, we will explore strategies for testing machine learning models, with a focus on evaluating the performance of LLMs.
Introduction
Machine learning models are notoriously challenging to test due to their black-box nature. Unlike traditional code, we cannot simply verify the logic line by line. ML models learn from data and make probabilistic predictions, so their decision-making process is opaque.
While testing methods like unit testing and integration testing are common for traditional software, they do not directly apply to ML models. We need more specialized techniques to validate model performance and uncover unexpected or undesirable behavior.
Testing is particularly crucial for large language models. Since LLMs can generate free-form text, it's hard to anticipate their exact responses. Flaws in the training data or model architecture can lead to hallucinations, biases, and errors that only surface during real-world usage. Rigorous testing provides confidence that the model works as intended.
In this article, we will cover testing strategies to evaluate LLMs.
The key techniques we will explore are:
Similarity testing
Column coverage testing
Exact match testing
Visual output testing
LLM-based evaluation
By combining these methods, we can thoroughly test LLMs along multiple dimensions and ensure they provide coherent, accurate, and appropriate responses.
Testing Text Output with Similarity Search
A common output from LLMs is text. This could be anything from chatbot responses to summaries generated from documents. A robust way to test the quality of text output is similarity testing.
The idea is simple: we define an expected response and compare the model's actual response to determine how similar they are. The higher the similarity score, the better.
Let's walk through an example using our favorite LLM. Suppose we give it the prompt:
Prompt: What is the capital of Italy?
The expected response would be:
Expected: The capital of Italy is Rome.
Now we can pass this prompt to the LLM and get the actual response:
prompt = "What is the capital of Italy?"
actual = llm.ask(prompt)  # llm stands for whichever LLM client you are testing
Let's say actual contains:
Actual: Rome is the capital of Italy.
While the wording is different, the meaning is the same. To quantify this similarity, we can use semantic search libraries like SentenceTransformers, which represent sentences as numeric vectors; we then compare those vectors using cosine similarity.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
expected = "The capital of Italy is Rome."
model = SentenceTransformer('all-MiniLM-L6-v2')
expected_embedding = model.encode(expected)
actual_embedding = model.encode(actual)
similarity = cosine_similarity([expected_embedding], [actual_embedding])[0][0]
This yields a similarity score of 0.85, indicating the responses are highly similar in meaning.
We can establish a threshold for the minimum acceptable similarity, like 0.8. Responses below this threshold fail the test.
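As a rough, dependency-free sketch of running that threshold check over many prompt-response pairs, the code below swaps the SentenceTransformers embedding for a toy bag-of-words vector. That substitution is an assumption made only so the example runs anywhere; in practice you would plug in `model.encode` from the snippet above. The helper names `bag_of_words`, `cosine` and `run_similarity_tests` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Toy embedding: lowercase word counts (a real test would use a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def run_similarity_tests(cases, embed=bag_of_words, threshold=0.8):
    """Score each (expected, actual) pair; it passes if score >= threshold."""
    results = []
    for expected, actual in cases:
        score = cosine(embed(expected), embed(actual))
        results.append({"expected": expected, "actual": actual,
                        "score": score, "passed": score >= threshold})
    return results


cases = [
    ("The capital of Italy is Rome.", "Rome is the capital of Italy."),
    ("The capital of Italy is Rome.", "Paris is lovely in spring."),
]
for result in run_similarity_tests(cases):
    status = "PASS" if result["passed"] else "FAIL"
    print(f'{result["score"]:.2f} {status}: {result["actual"]}')
```

Here the reordered sentence scores 1.00 and passes, while the unrelated one scores about 0.18 and fails. A word-overlap vector cannot recognize paraphrases that share few words, which is exactly why a semantic encoder such as SentenceTransformers is the better choice for real tests.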
By running similarity testing over many prompt-response pairs, we can holistically assess the textual coherence of an LLM.
Testing Tabular Outputs with Column Coverage
In addition to text, LLMs can output tables or data frames. To test these, we need different techniques that account for structure.
A good validation is column coverage: checking what percentage of the columns in the expected output are present in the actual output.
Consider the LLM answering questions about movies:
Prompt: What are the top 3 highest grossing movies of all time?
Expected:
Movie | Worldwide Gross | Release Year
Avatar | $2,789,679,794 | 2009
Titanic | $2,187,463,944 | 1997
Star Wars Ep. VII | $2,068,223,624 | 2015
Now we can test the LLM’s actual output:
prompt = "What are the top 3 highest grossing movies of all time?"
actual = llm.ask(prompt)
Actual:
Movie | Global Revenue | Year
Avatar | $2.789 billion | 2009
Titanic | $2.187 billion | 1997
Star Wars: The Force Awakens | $2.068 billion | 2015
Here, actual covers the same 3 columns as expected: Movie, Gross, and Release Year. Even though the headers and cell values differ slightly, we can pair the headers using cosine similarity, which gives 100% column coverage.
We can formalize this in code:
# Exact header names overlap only on "Movie" here, so first map each actual header
# to its matching expected header (for example, via embedding similarity as above)
column_mapping = {"Global Revenue": "Worldwide Gross", "Year": "Release Year"}  # illustrative
expected_cols = set(expected.columns)
actual_cols = {column_mapping.get(col, col) for col in actual.columns}
column_coverage = len(expected_cols & actual_cols) / len(expected_cols)
# column_coverage = 1.0
For tables with many columns, we may only need, say, 90% coverage to pass the test. This validation ensures the critical output columns are present while allowing variability in column names or ancillary data.
Exact Match for Numeric Outputs
When LLMs output a single number or statistic, we can use simple exact match testing.
Consider this prompt:
Prompt: What was Apple's total revenue in 2021?
Expected: $365.82 billion
We get the LLM’s response:
prompt = "What was Apple's total revenue in 2021?"
actual = llm.ask(prompt)
Actual: $365.82 billion
In this case, we expect an exact string match:
is_match = (actual == expected)
# is_match = True
For numerical outputs, precision is important. Exact match testing provides a straightforward way to validate this.
Screenshot Testing for Visual Outputs
When building PandasAI, we sometimes need to test generated charts. Testing these outputs requires verifying that the visualized data is correct.
One method is screenshot testing: comparing screenshots of the expected and actual visuals. For example:
Prompt: Generate a bar chart comparing the revenue of FAANG companies.
Expected: [Expected_Chart.png]
Actual: [Actual_Chart.png]
We can then test if the images match:
from PIL import Image, ImageChops
# Convert both to RGB so the pixel-wise difference compares like with like
expected_img = Image.open("./Expected_Chart.png").convert("RGB")
actual_img = Image.open("./Actual_Chart.png").convert("RGB")
diff = ImageChops.difference(expected_img, actual_img)
is_match = diff.getbbox() is None  # True if the images are pixel-identical
For more robust validation, we could use computer vision techniques like template matching to identify and compare key elements: axes, bars, labels, etc.
Screenshot testing provides quick validation of visual output without needing to interpret the raw chart data.
LLM-Based Evaluation
An intriguing idea for testing LLMs is to use another LLM!
The concept is to pass the expected and actual outputs to a separate "evaluator" LLM and ask if they match.
For example:
Expected: Rome is the capital of Italy.
Actual: The capital of Italy is Rome.
We can feed this to the evaluator model:
Prompt: Do these two sentences convey the same information? Answer YES or NO
Sentence 1: Rome is the capital of Italy.
Sentence 2: The capital of Italy is Rome.
Evaluator: YES
The evaluator LLM acts like a semantic similarity scorer, taking advantage of the natural language capabilities of LLMs.
The downside is that it evaluates one black-box model using another black-box model. Errors or biases in the evaluator could lead to incorrect assessments.
So LLM-based evaluation should complement other testing approaches, not act as the sole method.
Conclusion
Testing machine learning models thoroughly is critical as they grow more ubiquitous and impactful. Large language models pose unique testing challenges due to their free-form textual outputs.
Using a combination of similarity testing, column coverage validation, exact match, visual output screening, and even LLM-based evaluation, we can rigorously assess LLMs along multiple dimensions.
A comprehensive test suite combining these techniques will catch more flaws than any single method alone. This builds essential confidence that LLMs behave as expected in the real world.
Testing takes time but prevents much larger problems down the road. The strategies covered in this article will add rigor to the development and deployment of LLMs, helping ensure these powerful models benefit humanity as intended.
Author Bio
Gabriele Venturi is a software engineer and entrepreneur who started coding at the young age of 12. Since then, he has launched several projects across gaming, travel, finance, and other spaces, contributing his technical skills to various startups across Europe over the past decade.
Gabriele's true passion lies in leveraging AI advancements to simplify data analysis. This mission led him to create PandasAI, released open source in April 2023. PandasAI integrates large language models into the popular Python data analysis library Pandas, enabling an intuitive conversational interface for exploring data through natural language queries.
By open-sourcing PandasAI, Gabriele aims to share the power of AI with the community and push boundaries in conversational data analytics. He actively contributes as an open-source developer dedicated to advancing what's possible with generative AI.

Gabriele Venturi

13 Oct 2023

10 min read

Reducing Hallucinations with Intent Classification
