LLM | 81 articles | Tech News, Tutorials & Expert Insights

article-image-large-language-model-operations-llmops-in-action

11 Oct 2023

6 min read

Large Language Model Operations (LLMOps) in Action

11 Oct 2023

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionIn an era dominated by the rise of artificial intelligence, the power and promise of Large Language Models (LLMs) stand distinct. These colossal architectures, designed to understand and generate human-like text, have revolutionized the realm of natural language processing. However, with great power comes great responsibility – the onus of managing, deploying, and refining these models in real-world scenarios. This article delves into the world of Large Language Model Operations (LLMOps), an emerging field that bridges the gap between the potential of LLMs and their practical application.BackgroundThe last decade has seen a significant evolution in language models, with models growing in size and capability. Starting with smaller models like Word2Vec and LSTM, we've advanced to behemoths like GPT-3, BERT, and T5. With that said, as these models grew in size and complexity, so did their operational challenges. Deploying, maintaining, and updating these models requires substantial computational resources, expertise, and effective management strategies.MLOps vs LLMOpsIf you've ventured into the realm of machine learning, you've undoubtedly come across the term MLOps. MLOps, or Machine Learning Operations, encapsulates best practices and methodologies for deploying and maintaining machine learning models throughout their lifecycle. It caters to the wide spectrum of models that fall under the machine learning umbrella.On the other hand, with the growth of vast and intricate language models, a more specialized operational domain has emerged: LLMOps. While both MLOps and LLMOps share foundational principles, the latter specifically zeros in on the challenges and nuances of deploying and managing large-scale language models. Given the colossal size, data-intensive nature, and unique architecture of these models, LLMOps brings to the fore bespoke strategies and solutions that are fine-tuned to ensure the efficiency, efficacy, and sustainability of such linguistic powerhouses in real-world scenarios.Core Concepts of LLMOpsLarge Language Models Operations (LLMOps) focuses on the management, deployment, and optimization of large language models (LLMs). One of its foundational concepts is model deployment, emphasizing scalability to handle varied loads, reducing latency for real-time responses, and maintaining version control. As these LLMs demand significant computational resources, efficient resource management becomes pivotal. This includes the use of optimized hardware like GPUs and TPUs, effective memory optimization strategies, and techniques to manage computational costs.Continuous learning and updating, another core concept, revolve around fine-tuning models with new data, avoiding the pitfall of 'catastrophic forgetting', and effectively managing data streams for updates. Parallelly, LLMOps emphasizes the importance of continuous monitoring for performance, bias, fairness, and iterative feedback loops for model improvement. To cater to the vastness of LLMs, model compression techniques like pruning, quantization, and knowledge distillation become crucial.How do LLMOps workPre-training Model DevelopmentLarge Language Models typically start their journey through a process known as pre-training. This involves training the model on vast amounts of text data. The objective during this phase is to capture a broad understanding of language, learning from billions of sentences and paragraphs. This foundational knowledge helps the model grasp grammar, vocabulary, factual information, and even some level of reasoning.This massive-scale training is what makes them "large" and gives them a broad understanding of language. Optimization & CompressionModels trained to this extent are often so large that they become impractical for daily tasks.To make these models more manageable without compromising much on performance, techniques like model pruning, quantization, and knowledge distillation are employed.Model Pruning: After training, pruning is typically the first optimization step. This begins with trimming model weights and may advance to more intensive methods like neuron or channel pruning.Quantization: Following pruning, the model's weights, and potentially its activations, are streamlined. Though weight quantization is generally a post-training process, for deeper reductions, such as very low-bit quantization, one might adopt quantization-aware training from the beginning.Additional recommendations are:Optimizing the model specifically for the intended hardware can elevate its performance. Before initiating training, selecting inherently efficient architectures with fewer parameters is beneficial. Approaches that adopt parameter sharing or tensor factorization prove advantageous. For those planning to train a new model or fine-tune an existing one with an emphasis on sparsity, starting with sparse training is a prudent approach.Deployment Infrastructure After training and compressing our LLM, we will be using technologies like Docker and Kubernetes to deploy models scalably and consistently. This approach allows us to flexibly scale using as many pods as needed. Concluding the deployment process, we'll implement edge deployment strategies. This positions our models nearer to the end devices, proving crucial for applications that demand real-time responses.Continuous Monitoring & FeedbackThe process starts with the Active model in production. As it interacts with users and as language evolves, it can become less accurate, leading to the phase where the Model becomes stale as time passes.To address this, feedback and interactions from users are captured, forming a vast range of new data. Using this data, adjustments are made, resulting in a New fine-tuned model.As user interactions continue and the language landscape shifts, the current model is replaced with the new model. This iterative cycle of deployment, feedback, refinement, and replacement ensures the model always stays relevant and effective.Importance and Benefits of LLMOpsMuch like the operational paradigms of AIOps and MLOps, LLMOps brings a wealth of benefits to the table when managing Large Language Models.MaintenanceAs LLMs are computationally intensive. LLMOps streamlines their deployment, ensuring they run smoothly and responsively in real-time applications. This involves optimizing infrastructure, managing resources effectively, and ensuring that models can handle a wide variety of queries without hiccups.Consider the significant investment of effort, time, and resources required to maintain Large Language Models like Chat GPT, especially given its vast user base.Continuous ImprovementLLMOps emphasizes continuous learning, allowing LLMs to be updated with fresh data. This ensures that models remain relevant, accurate, and effective, adapting to the evolving nature of language and user needs.Building on the foundation of GPT-3, the newer GPT-4 model brings enhanced capabilities. Furthermore, while ChatGPT was previously trained on data up to 2021, it has now been updated to encompass information through 2022.It's important to recognize that constructing and sustaining large language models is an intricate endeavor, necessitating meticulous attention and planning.ConclusionThe ascent of Large Language Models marks a transformative phase in the evolution of machine learning. But it's not just about building them; it's about harnessing their power efficiently, ethically, and sustainably. LLMOps emerge as the linchpin, ensuring that these models not only serve their purpose but also evolve with the ever-changing dynamics of language and user needs. As we continue to innovate, the principles of LLMOps will undoubtedly play a pivotal role in shaping the future of language models and their place in our digital world.Author BioMostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.Medium

0
0
100

article-image-supercharge-your-business-applications-with-azure-openai

Aroh Shukla

10 Oct 2023

8 min read

Supercharge Your Business Applications with Azure OpenAI

Aroh Shukla

10 Oct 2023

8 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionThe rapid advancement of technology, particularly in the domain of extensive language models like ChatGPT, is making waves across industries. These models leverage vast data resources and cloud computing power, gaining popularity not just among tech enthusiasts but also mainstream consumers. As a result, there's a growing demand for such experiences in daily tools, both from employees and customers who increasingly expect AI integration. Moreover, this technology promises transformative impacts. This article explores how organizations can harness Azure OpenAI with low-code platforms to leverage these advancements, opening doors to innovative applications and solutions.Introduction to the Power PlatformPower Platform is a Microsoft Low Code platform that spans Microsoft 365, Azure, Dynamics 365, and standalone apps. a) Power Apps: Rapid low-code app development for businesses with a simple interface, scalable data platform, and cross-device compatibility.b) Power Automate: Automates workflows between apps, from simple tasks to enterprise-grade processes, accessible to users of all technical levels.c) Power BI: Delivers data insights through visualizations, scaling across organizations with built-in governance and security.d) Power Virtual Agents: Creates chatbots with a no-code interface, streamlining integration with other systems through Power Automate.e) Power Pages: Enterprise-grade, low-code SaaS for creating and hosting external-facing websites, offering customization and user-friendly design.Introduction to Azure OpenAIThe collaboration between Microsoft's Azure cloud platform and OpenAI, known as Azure OpenAI Open, presents an exciting opportunity for developers seeking to harness the capabilities of cutting-edge AI models and services. This collaboration facilitates the creation of innovative applications across a spectrum of domains, ranging from natural language processing to AI-powered solutions, all seamlessly integrated within the Azure ecosystem.The rapid pace of technological advancement, characterized by the proliferation of extensive language models like ChatGPT, has significantly altered the landscape of AI. These models, by tapping into vast data resources and leveraging the substantial computing power available in today's cloud infrastructure, have gained widespread popularity. Notably, this technological revolution has transcended the boundaries of tech enthusiasts and has become an integral part of mainstream consumer experiences.As a result, organizations can anticipate a growing demand for AI-driven functionalities within their daily tools and services. Employees seek to enhance their productivity with these advanced capabilities, while customers increasingly expect seamless integration of AI to improve the quality and efficiency of the services they receive.Beyond meeting these rising expectations, the collaboration between Azure and OpenAI offers the potential for transformative impacts. This partnership enables developers to create applications that can deliver tangible and meaningful outcomes, revolutionizing the way businesses operate and interact with their audiences. Azure OpenAI Prerequisites These prerequisites enable you to leverage Azure OpenAI's capabilities for your projects.1. Azure Account: Sign up for an Azure account free or paid: https://azure.microsoft.com/2. Azure Subscription: Acquire an active Azure subscription. 3. Azure OpenAI Request Form: Follow this form https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUOFA5Qk1UWDRBMjg0WFhPMkIzTzhKQ1dWNyQlQCN0PWcu . 4. API Endpoint: Know your API endpoint for integration. 5. Azure SDKs: Familiarize with Azure SDKs for seamless development: https://docs.microsoft.com/azure/developer/python/azure-sdk-overview Step to Supercharge your business applications with Azure OpenAIStep 1: Azure OpenAI Instance and keys 1. Create a new Azure OpenAI Instance. Select region that is available for Azure OpenAI and name the instance accordingly. 2. Once Azure OpenAI instanced is provisioned, select he Explore button 3. Under Management, select Deployment and select Create new deployment 4. Select gpt-35-turbo and give a deployment name a meaningful name. 5. Under playground select deployment and select View code. 6. In this sample code, copy the endpoint and key in a notepad file that you will use in the next steps. Step 2: Create a Power Apps Canvas app1. Create a Power Apps Canvas App with Tablet format.2. Add textboxes for questions, a button that will trigger Power Automate, and gallery for output from Power Automate via Azure OpenAI service: 3. Connect Flow by Selecting the Power Automate icon, selecting Add flow and selecting Create new flow 4. Select a blank flow Step 3: Power Automate1. In Power Automate, name the flow, next step search for HTTP action. Take note HTTP action is a Premium connector. 2. Next, configure the HTTP action with this step.a. Method: POSTb. URL: You copied this at Step 1.6c. Endpoint: You copied this at Step 1.6d. Body: Follow the Microsoft Azure OpenAI documentation https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?pivots=programming-language-chat-completionse. Select Add dynamic content, double click on Ask in PowerApps and new parameter HTTP_Body has been createdf. Use Power Apps Parameter. 3. In the Next Step, search for Compose, use Body as a parameter, and save the flow. 4. Run the flow 5. Copy the outputs of the Compose action to Notepad. You will use it in the next steps later. 6. In the Next Step, search for Parse JSON, in Add dynamic content locate Body, and drag to Content 7. Select Generate from sample 8. Paste the content that you did in previous Step 3.5. 9. In the next step, search for Response action, in Add dynamic content select Choices 5. Save the flow. Step 4: Write up the Canvas app with Power Automate1. Set variables at button control 2. Use Gallery Control to display output from Power Automate via Azure OpenAI. Step 5: Test the App1. Ask a question in the Power Apps user interface as shown below 2. After a few seconds, we get a response that comes from Azure OpenAI. ConclusionYou've successfully configured a comprehensive integration involving three key services to enhance your application's capabilities:1. Power Apps: This service serves as the user-facing visual interface, providing an intuitive and user-friendly experience for end-users. With Power Apps, you can design and create interactive applications tailored to your specific needs, empowering users to interact with your system effortlessly.2. Power Automate: For establishing seamless connectivity between your application and Azure OpenAI, Power Automate comes into play. It acts as the bridge that enables data and process flow between different services. With Power Automate, you can automate workflows, manage data, and trigger actions, ensuring a smooth interaction between your application and Azure OpenAI's services.3. Azure OpenAI: Leveraging the capabilities of Azure OpenAI, particularly the ChatGPT 3.5 Turbo service, opens up a world of advanced natural language processing and AI capabilities. This service enables your application to understand and respond to user inputs, making it more intelligent and interactive. Whether it's for chatbots, text generation, or language understanding, Azure OpenAI's powerful tools can significantly enhance your application's functionality.By integrating these three services seamlessly, you've created a robust ecosystem for your application. Users can interact with a visually appealing and user-friendly interface powered by Power Apps, while Power Automate ensures that data and processes flow smoothly behind the scenes. Azure OpenAI, with its advanced language capabilities, adds a layer of intelligence to your application, making it more responsive and capable of understanding and generating human-like text.This integration not only improves user experiences but also opens up possibilities for more advanced and dynamic applications that can understand, process, and respond to natural language inputs effectively.Source CodeYou can rebuild the entire solution at my GitHub Repo at https://github.com/aarohbits/AzureOpenAIWithPowerPlatform/blob/main/01%20AzureOpenAIPowerPlatform/Readme.md Author BioAroh Shukla is a Microsoft Most Valuable Professional (MVP) Alumni and a Microsoft Certified Trainer (MCT) with expertise in Power Platform and Azure. He assists customers from various industries in their digital transformation endeavors, crafting tailored solutions using advanced Microsoft technologies. He is not only dedicated to serving students and professionals on their Microsoft technology journeys but also excels as a community leader with strong interpersonal skills, active listening, and a genuine commitment to community progress.He possesses a deep understanding of the Microsoft cloud platform and remains up-to-date with the latest innovations. His exceptional communication abilities enable him to effectively convey complex technical concepts with clarity and conciseness, making him a valuable resource in the tech community.

0
0
134

article-image-question-answering-in-langchain

Mostafa Ibrahim

10 Oct 2023

8 min read

Question Answering in LangChain

Mostafa Ibrahim

10 Oct 2023

8 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionImagine seamlessly processing vast amounts of data, posing any question, and receiving eloquently crafted answers in return. While Large Language Models like ChatGPT excel with general data, they falter when it comes to your private information—data you'd rather not broadcast to the world. Enter LangChain: it empowers us to harness any NLP model, refining it with our exclusive data.In this article, we'll explore LangChain, a framework designed for building applications with language models. We'll guide you through training a model, specifically the OpenAI Chat GPT, using your selected private data. While we provide a structured tutorial, feel free to adapt the steps based on your dataset and model preferences, as variations are expected and encouraged. Additionally, we'll offer various feature alternatives that you can incorporate throughout the tutorial.The Need for Privacy in Question AnsweringUndoubtedly, the confidentiality of personalized data is of absolute importance. While companies amass vast amounts of data daily, which offers them invaluable insights, it's crucial to safeguard this information. Disclosing such proprietary information to external entities could jeopardize the company's competitive edge and overall business integrity.How Does the Fine-Tuning LangChain Process WorkStep 1: Identifying The Appropriate Data SourceBefore selecting the right dataset, it's essential to ask a few preliminary questions. What specific topic are you aiming to inquire about? Is the data set sufficient? And such.Step 2: Integrating The Data With LangChainBased on the dataset's file format you have, you'll need to adopt different methods to effectively import the data into LangChain.Step 3: Splitting The Data Into ChunksTo ensure efficient data processing, it's crucial to divide the dataset into smaller segments, often referred to as chunks.Step 4: Transforming The Data Into EmbeddingsEmbedding is a technique where words or phrases from the vocabulary are mapped to vectors of real numbers. The idea behind embeddings is to capture the semantic meaning and relationships of words in a lower-dimensional space than the original representation.Step 5: Asking Queries To Our ModelFinally, after training our model on the updated documentation, we can directly query it for any information we require.Full LangChain ProcessDataSet UsedLangChain's versatility stems from its ability to process varied datasets. For our demonstration, we utilize the "Giskard Documentation", a comprehensive guide on the Giskard framework.Giskard is an open-source testing framework for Machine Learning models, spanning various Python model types. It automatically detects vulnerabilities in ML models, generates domain-specific tests, and integrates open-source QA best practices.Having said that, LangChain can seamlessly integrate with a myriad of other data sources, be they textual, tabular, or even multimedia, expanding its use-case horizons.Setting Up and Using LangChain for Private Question AnsweringStep 1: Installing The Necessary LibrariesAllows the first step of building any machine learning model, we will have to set up our environment, making sure!pip install langchain !pip install openai !pip install pypdf !pip install tiktoken !pip install faiss-gpuStep 2: Importing Necessary Librariesfrom langchain.llms import OpenAI from langchain.chat_models import ChatOpenAI from langchain.document_loaders import PyPDFLoader from langchain.vectorstores import FAISS from langchain.embeddings.openai import OpenAIEmbeddings from langchain.chains import LLMChain from langchain.prompts import PromptTemplate from langchain.chains.qa_with_sources import load_qa_with_sources_chain import openai import osStep 3: Importing OpenAI API Keyos.environ['OPENAI_API_KEY'] = "Insert your OpenAI key here"Step 4: Loading Our Data SetLandChain offers the capability to load data in various formats. In this article, we'll focus on loading data in PDF format but will also touch upon other popular formats such as CSV and File Directory. For details on other file formats, please refer to the LangChain Documentation.Loading PDF DataWe've compiled the Giskard AI tool's documentation into a PDF and subsequently partitioned the data.loader = PyPDFLoader("/kaggle/input/giskard-documentation/Giskard Documentation.pdf")pages = loader.load_and_split()Below are the code snippets if you prefer to work with either CSV or File Directory file formats.Loading CSV Datafrom langchain.document_loaders.csv_loader import CSVLoader loader = CSVLoader(“Insert the path to your CSV dataset here”) data = loader.load()Loading File Directory Datafrom langchain.document_loaders import DirectoryLoader loader = DirectoryLoader('../', glob="**/*.md") docs = loader.load()Step 5: Indexing The DatasetWe will be creating an index using FAISS (Facebook AI Similarity Search), which is a library developed by Facebook AI for efficiently searching similarities in large datasets, especially used with vectors from machine learning models.We will be converting those documents into vector embeddings using OpenAIEmbeddings(). This indexed data can then be used for efficient similarity searches later on.faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())Here are some alternative indexing options you might consider.Indexing using Pineconeimport osimport pineconefrom langchain.schema import Documentfrom langchain.embeddings.openai import OpenAIEmbeddingsfrom langchain.vectorstores import Pinecone pinecone.init( api_key=os.environ["PINECONE_API_KEY"], environment=os.environ["PINECONE_ENV"]) embeddings = OpenAIEmbeddings() pinecone.create_index("langchain-self-retriever-demo", dimension=1536)Indexing using Chromaimport os import getpass from langchain.schema import Document from langchain.embeddings.openai import OpenAIEmbeddings from langchain.vectorstores import Chroma os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:") embeddings = OpenAIEmbeddings() vectorstore = Chroma.from_documents(docs, embeddings)Step 6: Asking The Model Some QuestionsThere are multiple methods by which we can retrieve our data from our model.Similarity SearchIn the context of large language models (LLMs) and natural language processing, similarity search is often about finding sentences, paragraphs, or documents that are semantically similar to a given sentence or piece of text.query = "What is Giskard?" docs = faiss_index.similarity_search(query) print(docs[0].page_content)Similarity Search Output: Why Giskard?Giskard is an open-source testing framework dedicated to ML models, covering any Python model, from tabular to LLMs.Testing Machine Learning applications can be tedious. Since ML models depend on data, testing scenarios depend on the domain speciﬁcities and are often inﬁnite. Where to start testing? Which tests to implement? What issues to cover? How to implement the tests?At Giskard, we believe that Machine Learning needs its own testing framework. Created by ML engineers for ML engineers, Giskard enables you to:Scan your model to ﬁnd dozens of hidden vulnerabilities: The Giskard scan automatically detects vulnerability issues such as performance bias, data leakage, unrobustness, spurious correlation, overconﬁdence, underconﬁdence, unethical issue, etc. Instantaneously generate domain-speciﬁc tests: Giskard automatically generates relevant tests based on the vulnerabilities detected by the scan. You can easily customize the tests depending on your use case by deﬁning domain-speciﬁc data slicers and transformers as ﬁxtures of your test suites.Leverage the Quality Assurance best practices of the open-source community: The Giskard catalog enables you to easily contribute and load data slicing & transformation functions such as AI-based detectors (toxicity, hate, etc.), generators (typos, paraphraser, etc.), or evaluators. Inspired by the Hugging Face philosophy, the aim of Giskard is to become.LLM Chainsmodel = OpenAI(model_name="gpt-3.5-turbo") my_chain = load_qa_with_sources_chain(model, chain_type="refine") query = "What is Giskard?" documents = faiss_index.similarity_search(query) result = my_chain({"input_documents": pages, "question": query})LLM Chain Output:Based on the additional context provided, Giskard is a Python package or library that provides tools for wrapping machine learning models, testing, debugging, and inspection. It supports models from various machine learning libraries such as HuggingFace, PyTorch, TensorFlow, or Scikit-learn. Giskard can handle classification, regression, and text generation tasks using tabular or text data.One notable feature of Giskard is the ability to upload models to the Giskard server. Uploading models to the server allows users to compare their models with others using a test suite, gather feedback from colleagues, debug models effectively in case of test failures, and develop new tests incorporating additional domain knowledge. This feature enables collaborative model evaluation and improvement. It is worth highlighting that the provided context mentions additional ML libraries, including Langchain, API REST, and LightGBM, but their specific integration with Giskard is not clearly defined.Sources:Giskard Documentation.pdfAPI Reference (for Dataset methods)Kaggle: /kaggle/input/giskard-documentation/Giskard Documentation.pdfConclusionLangChain effectively bridges the gap between advanced language models and the need for data privacy. Throughout this article, we have highlighted its capability to train models on private data, ensuring both insightful results and data security. One thing is for sure though, as AI continues to grow, tools like LangChain will be essential for balancing innovation with user trust.Author BioMostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.Medium

0
0
180

article-image-build-an-ai-based-personal-financial-advisor-with-langchain

Louis Owen

09 Oct 2023

11 min read

Build an AI-based Personal Financial Advisor with LangChain

Louis Owen

09 Oct 2023

11 min read

0
0
2303

article-image-kickstarting-your-journey-with-azure-openai

Shankar Narayanan

06 Oct 2023

8 min read

Kickstarting your journey with Azure OpenAI

Shankar Narayanan

06 Oct 2023

8 min read

0
0
124

article-image-build-a-language-converter-using-google-bard

Aryan Irani

06 Oct 2023

7 min read

Build a Language Converter using Google Bard

Aryan Irani

06 Oct 2023

7 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionIn this blog, we will be taking a look at building a language converter inside of a Google Sheet using Google Bard. We are going to achieve this using the PaLM API and Google Apps Script.We are going to use a custom function inside which we pass the origin language of the sentence, followed by the target language and the sentence you want to convert. In return, you will get the converted sentence using Google Bard.Sample Google SheetFor this blog, I will be using a very simple Google Sheet that contains the following columns:Sentence that has to be convertedOrigin language of the sentenceTarget Language of the sentenceConverted SentenceIf you want to work with the Google Sheet, click here. Once you make a copy of the Google Sheet you have to go ahead and change the API key in the Google Apps Script code.Step 1: Get the API keyCurrently, PaLM API hasn’t been released for public use but to access it before everybody does, you can apply for the waitlist by clicking here. If you want to know more about the process of applying for MakerSuite and PaLM API, you can check the YouTube tutorial below.Once you have access, to get the API key, we have to go to MakerSuite and go to the Get API key section. To get the API key, follow these steps:Go to MakerSuite or click here.On opening the MakerSuite you will see something like this3. To get the API key go ahead and click on Get API key on the left side of the page.4. On clicking the Get API key, you will see something like this where you can create your API key.5. To create the API key go ahead and click on Create API key in the new project.On clicking Create API Key, in a few seconds, you will be able to copy the API key.Step 2: Write the Automation ScriptWhile you are in the Google Sheet, let’s open up the Script Editor to write some Google Apps Script. To open the Script Editor, follow these steps:1. Click on Extensions and open the Script Editor.2. This brings up the Script Editor as shown below.We have reached the script editor lets code.Now that we have the Google Sheet and the API key ready, lets go ahead and write the Google Apps Script to integrate the custom function inside the Google Sheet.function BARD(sentence,origin_language,target_lanugage) { var apiKey = "your_api_key"; var apiUrl = "https://generativelanguage.googleapis.com/v1beta2/models/text-bison-001:generateText"; We start out by opening a new function BARD() inside which we will declare the API key that we just copied. After declaring the API key we go ahead and declare the API endpoint that is provided in the PaLM API documentation. You can check out the documentation by checking out the link given below.We are going to be receiving the prompt from the Google Sheet from the BARD function that we just created.Generative Language API | PaLM API | Generative AI for DevelopersThe PaLM API allows developers to build generative AI applications using the PaLM model. Large Language Models (LLMs)…developers.generativeai.googlevar url = apiUrl + "?key=" + apiKey; var headers = { "Content-Type": "application/json" };Here we create a new variable called url inside which we combine the API URL and the API key, resulting in a complete URL that includes the API key as a parameter. The headers specify the type of data that will be sent in the request which in this case is “application/json”.var prompt = { 'text': "Convert this sentence"+ sentence + "from"+origin_language + "to"+target_lanugage } var requestBody = { "prompt": prompt }Now we come to the most important part of the code which is declaring the prompt. For this blog, we will be designing the prompt in such a way that we get back only the converted sentence. This prompt will accept the variables from the Google Sheet and in return will give the converted sentence.Now that we have the prompt ready, we go ahead and create an object that will contain this prompt that will be sent in the request to the API. var options = { "method": "POST", "headers": headers, "payload": JSON.stringify(requestBody) };Now that we have everything ready, it's time to define the parameters for the HTTP request that will be sent to the PaLM API endpoint. We start out by declaring the method parameter which is set to POST which indicates that the request will be sending data to the API.The headers parameter contains the header object that we declared a while back. Finally, the payload parameter is used to specify the data that will be sent in the request.These options are now passed as an argument to the UrlFetchApp.fetch function which sends the request to the PaLM API endpoint, and returns the response that contains the AI generated text. var response = UrlFetchApp.fetch(url, options); var data = JSON.parse(response.getContentText()); var output = data.candidates[0].output; Logger.log(output); return output;In this case, we just have to pass the url and options variable inside the UrlFetchApp.fetch function. Now that we have sent a request to the PaLM API endpoint we get a response back. In order to get an exact response we are going to be parsing the data.The getContentText() function is used to extract the text content from the response object. Since the response is in JSON format, we use the JSON.parse function to convert the JSON string into an object.The parsed data is then passed to the final variable output, inside which we get the first response out of multiple other drafts that Bard generates for us. On getting the first response, we return the output back to the Google Sheet.Our code is complete and good to go.Step 3: Check the outputIt's time to check the output and see if the code is working according to what we expected. To do that go ahead and save your code and run the BARD() function.On running the code, let's go back to the Google Sheet, use the custom function, and pass the prompt inside it.Here I have passed the original sentence, followed by the Origin Language and the Target Language.On successful execution, we can see that Google Bard has successfully converted the sentences using the PaLM API and Google Apps Script.ConclusionThis is just another interesting example of how we can Integrate Google Bard into Google Workspace using Google Apps Script and the PaLM API. I hope you have understood how to use the PaLM API and Google Apps Script to create a custom function that acts as a Language converter. You can get the code from the GitHub link below.Google-Apps-Script/Bard_Lang.js at master · aryanirani123/Google-Apps-ScriptCollection of Google Apps Script Automation scripts written and compiled by Aryan Irani. …github.comFeel free to reach out if you have any issues/feedback at aryanirani123@gmail.com.Author BioAryan Irani is a Google Developer Expert for Google Workspace. He is a writer and content creator who has been working in the Google Workspace domain for three years. He has extensive experience in the area, having published 100 technical articles on Google Apps Script, Google Workspace Tools, and Google APIs.

0
0
180

article-image-google-bard-everything-you-need-to-know-to-get-started

Sangita Mahala

05 Oct 2023

6 min read

Google Bard: Everything You Need to Know to Get Started

Sangita Mahala

05 Oct 2023

6 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionGoogle Bard, a generative AI conversational chatbot. Initially from the LaMDA family of large language models and later the PaLM LLMs. It will follow your instructions and complete your requests in a better way. Bard uses its knowledge to answer your questions in an informative manner. It can generate different creative text formats, like poems, code, scripts, emails, letters, etc. Currently, it is available in 238 countries and 46 languages.Bard is a powerful tool which can be used for many different things, including:Writing and editing contentResearching and learning new thingsTranslating languagesGenerating new creative ideasAnswering questionsWhat is Google Bard and How does it work?Bard is a large language model, commonly referred to as a chatbot or conversational AI, that has been programmed to be extensive and informative. It is able to communicate and generate human-like text in response to a wide range of prompts and questions.Bard operates by using a method known as deep learning. Artificial neural networks are used in deep learning to learn from data. Deep learning is a subset of machine learning. The structure and operation of the human brain served as the inspiration for neural networks, which are capable of learning intricate patterns from data.The following illustrates how Bard works:You enter a command or query into Bard.The input or query is processed by Bard's neural network, which then produces a response.The reply from Bard is then shown to you.How to get started with Bard?It’s pretty simple and easy to get started using Google Bard. The following steps will let you know how you will be able to start your journey in Google Bard.Step-1Go to the Google Bard website by clicking this link: https://bard.google.com/Step-2Go to the top right corner of your screen and then click on the “Sign in” button.Step-3Once you are signed in, you can start typing your prompts or queries into the text box at the bottom of the screen.Step-4Your prompt or query will trigger Bard to produce an answer, which you may then read and evaluate.For Example:You can provide the prompt to Google Bard such as “Write a 'Hello, World!' program in the Rust programming language.”Prompt:Common Bard commands and how to use themGoogle Bard does not have any specific commands, but you can use certain keywords and phrases to get Bard to perform certain tasks. For example, you can use the following keywords and phrases to get Bard to:Create many creative text forms, such as "Write a script for...", "Write a poem about...", and "Write a code for...".Bard will perfectly answer your questions in a comprehensive and informative way: "What is the capital of India?", "How do I build a website?", "What are the benefits of using Bard?" Examples of what Bard can do and how to use it for specific tasksHere are some examples of what Bard can do and how to use it for specific tasks:Generate creative text formats: Bard allows you to generate a variety of unique text formats, including code, poems, emails, and letters. You have to simply input the required format, followed by a question or query, to get the things done. For example, to generate an email to your manager requesting a raise in salary, you would type “Write an email to your manager asking for a raise."Prompt:Answer your questions in a comprehensive and informative way: No matter how difficult or complex your query might be, Bard can help you to find the solution. So you have to simply enter your query into the text box, and Bard will generate a response likewise. For example, to ask Bard what is the National Flower of India is, you would type "What is the National Flower of India?".Prompt: Translate languages: Bard will allow to convert text between different languages. To do this, simply type the text that you want to translate into the text box, followed by the provided language from your side. For example, to translate the sentence: I am going to the store to buy some groceries into Hindi, you would type "I am going to the store to buy some groceries to Hindi".Prompt:How Bard Can Be Used for Different PurposesA writer can use Bard to generate new concepts for fresh stories or articles or to summarize research results.Students can utilize Bard to create essays and reports, get assistance with their assignments, and master new topics.A business owner can use Bard to produce marketing materials, develop chatbots for customer support, or perform data analysis.Software developers can use Bard to produce code, debug applications, or find solutions for technical issues.The future of Google BardGoogle Bard is probably going to get increasingly more capable and adaptable as it keeps developing and acquiring knowledge. It might be utilized to produce new works of art and entertainment, advance scientific research, and provide solutions to some of the most critical global problems.It's also crucial to remember that Google Bard is not the only significant language model being developed. Similar technologies are being created as well by various other companies, such as Microsoft and OpenAI. This competition is likely to drive innovation and lead to even more powerful and sophisticated language models in the future.Overall, the future of Google Bard and other substantial language models seems quite bright overall. These innovations have the power to completely transform the way we study, work, and produce. There is no doubt that these technologies have the potential to improve the world, but it is necessary to use them effectively.ConclusionGoogle Bard is a powerful AI tool that has the potential to be very beneficial, including writing and editing content, researching and learning new things, translating languages, generating new creative ideas, and answering questions. Being more productive and saving time are two of the greatest advantages of utilizing Google Bard. This can free up your time so that you can concentrate on other things, including developing new ideas or enhancing your skills. Bard can assist you in finding and understanding the data quickly and easily because it has access to a wide amount of information. It has the potential to be a game-changer for many people. If you are looking for a way to be more productive, I encourage you to try using Bard.Author BioSangita Mahala is a passionate IT professional with an outstanding track record, having an impressive array of certifications, including 12x Microsoft, 11x GCP, 2x Oracle, and LinkedIn Marketing Insider Certified. She is a Google Crowdsource Influencer and IBM champion learner gold. She also possesses extensive experience as a technical content writer and accomplished book blogger. She is always Committed to staying with emerging trends and technologies in the IT sector.

0
0
145

article-image-build-a-clone-of-yourself-with-large-language-models-llms

Louis Owen

05 Oct 2023

13 min read

Build a Clone of Yourself with Large Language Models (LLMs)

Louis Owen

05 Oct 2023

13 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!Introduction"White Christmas," a standout sci-fi episode from the Black Mirror series, serves as a major source of inspiration for this article. In this episode, we witness a captivating glimpse into a potential future application of Artificial Intelligence (AI), particularly considering the timeframe when the show was released. The episode introduces us to "Cookies," digital replicas of individuals that piqued the author's interest.A "Cookie" is a device surgically implanted beneath a person's skull, meticulously replicating their consciousness over the span of a week. Subsequently, this replicated consciousness is extracted and transferred into a larger, egg-shaped device, which can be connected to a computer or tablet for various purposes.Back when this episode was made available to the public in approximately 2014, the concept seemed far-fetched, squarely in the realm of science fiction. However, what if I were to tell you that we now have the potential to create our own clones akin to the "Cookies" using Large Language Models (LLMs)? You might wonder how this is possible, given that LLMs primarily operate with text. Fortunately, we can bridge this gap by extending the capabilities of LLMs through the integration of a Text-to-Speech module.There are two primary approaches to harnessing LLMs for this endeavor: fine-tuning your own LLM and utilizing a general-purpose LLM (whether open-source or closed-source). Fine-tuning, though effective, demands a considerable investment of time and resources. It involves tasks such as gathering and preparing training data, fine-tuning the model through multiple iterations until it meets our criteria, and ultimately deploying the final model into production. Conversely, general LLMs have limitations on the length of input prompts (unless you are using an exceptionally long-context model like Antropic's Claude). Moreover, to fully leverage the capabilities of general LLMs, effective prompt engineering is essential. However, when we compare these two approaches, utilizing general LLMs emerges as the easier path for creating a Proof of Concept (POC). If the aim is to develop a highly refined model capable of replicating ourselves convincingly, then fine-tuning becomes the preferred route.In the course of this article, we will explore how to harness one of the general LLMs provided by AI21Labs and delve into the art of creating a digital clone of oneself through prompt engineering. While we will touch upon the basics of fine-tuning, we will not delve deeply into this process, as it warrants a separate article of its own.Without wasting any more time, let’s take a deep breath, make yourselves comfortable, and be ready to learn how to build a clone of yourself with LLM!Glimpse of Fine-tuning your LLMAs mentioned earlier, we won't delve into the intricate details of fine-tuning a Large Language Model (LLM) to achieve our objective of building a digital clone of ourselves. Nevertheless, in this section, we'll provide a high-level overview of the steps involved in creating such a clone through fine-tuning an LLM.1. Data CollectionThe journey begins with gathering all the relevant data needed for fine-tuning the LLM. This dataset should ideally comprise our historical conversational data, which can be sourced from various platforms like WhatsApp, Telegram, LINE, email, and more. It's essential to cast a wide net and collect as much pertinent data as possible to ensure the model's accuracy in replicating our conversational style and nuances.2. Data PreparationOnce the dataset is amassed, the next crucial step is data preparation. This phase involves several tasks:● Data Formatting: Converting the collected data into the required format compatible with the fine-tuning process.● Noise Removal: Cleaning the dataset by eliminating any irrelevant or noisy information that could negatively impact the model's training.● Resampling: In some cases, it may be necessary to resample the data to ensure a balanced and representative dataset for training.3. Model TrainingWith the data prepared and in order, it's time to proceed to the model training phase. Modern advances in deep learning have made it possible to train LLMs on consumer-grade GPUs, offering accessibility and affordability, such as via QLoRA. During this stage, the LLM learns from the provided dataset, adapting its language generation capabilities to mimic our conversational style and patterns.4. Iterative RefinementFine-tuning an LLM is an iterative process. After training the initial model, we need to evaluate its performance. This evaluation may reveal areas for improvement. It's common to iterate between model training and evaluation, making incremental adjustments to enhance the model's accuracy and fluency.5. Model EvaluationThe evaluation phase is critical in assessing the model's ability to replicate our conversational style and content accurately. Evaluations may include measuring the model's response coherence, relevance, and similarity to our past conversations.6. DeploymentOnce we've achieved a satisfactory level of performance through multiple iterations, the next step is deploying the fine-tuned model. Deploying an LLM is a complex task that involves setting up infrastructure to host the model and handle user requests. An example of a robust inference server suitable for this purpose is Text Generation Inference. You can refer to my other article for this. Deploying the model effectively ensures that it can be accessed and used in various applications.Building the Clone of Yourself with General LLMLet’s start learning how to build the clone of yourself with general LLM through prompt engineering! In this article, we’ll use j2-ultra, the biggest and most powerful model provided by AI21Labs. Note that AI21Labs gives us a free trial for 3 months with $90 credits. These free credits is very useful for us to build a POC for this project. The first thing we need to do is to create the prompt and test it in the playground. To do this, you can log in with your AI21Labs account and go to the AI21Studio. If you don’t have an account yet, you can create one by just following the steps provided on the web. It’s very straightforward. Once you’re on the Studio page, go to the Foundation Models page and choose the j2-ultra model. Note that there are three foundation models provided by AI21Labs. However, in this article, we’ll use j2-ultra which is the best one.Once we’re in the playground, we can experiment with the prompt that we want to try. Here, I provided an example prompt that you can start with. What you need to do is to adjust the prompt with your own information. Louis is an AI Research Engineer/Data Scientist from Indonesia. He is a continuous learner, friendly, and always eager to share his knowledge with his friends. Important information to follow: - His hobbies are writing articles and watching movies - He has 3 main strengths: strong-willed, fast-learner, and effective. - He is currently based in Bandung, Indonesia. - He prefers to Work From Home (WFH) compared to Work From Office - He is currently working as an NLP Engineer at Yellow.ai. - He pursued a Mathematics major in Bandung Institute of Technology. - The reason why he loves NLP is that he found it interesting where one can extract insights from the very unstructured text data. - He learns Data Science through online courses, competitions, internship, and side-projects. - For technical skills, he is familiar with Python, Tableau, SQL, R, Google Big Query, Git, Docker, Design Thinking, cloud service (AWS EC2), Google Data Studio, Matlab, SPSS - He is a Vegan since 2007! He loves all vegan foods except tomatoes. User: Hi, what's up? Louis: Hey, doing good here! How are u? User: All's good. Just wondering, I knew that you're into NLP, can you please give me some recommendation on how to learn? Louis: Sure thing man! I personally learned Data Science through online courses, competitions, internship, and side-projects. However, my top recommendation is to create your own personal projects and joining competitions. You can learn a lot from those! User: Nice. What personal projects to start? Louis: You can start with the topic that you're really interested at. For example, if you're interested at soccer, you can maybe create a data analysis on how one soccer team strategy can gives a better chance for them to win their matches. User: Awesome! thanks man, will ping you again if I have any doubts. Is it okay? Louis: Absolutely! Feel free, good day! ## Louis is an AI Research Engineer/Data Scientist from Indonesia. He is a continuous learner, friendly, and always eager to share his knowledge with his friends. Important information to follow: - His hobbies are writing articles and watching movies - He has 3 main strengths: strong-willed, fast-learner, and effective. - He is currently based in Bandung, Indonesia. - He prefers to Work From Home (WFH) compared to Work From Office - He is currently working as an NLP Engineer at Yellow.ai. - He pursued a Mathematics major in Bandung Institute of Technology. - The reason why he loves NLP is that he found it interesting where one can extract insights from the very unstructured text data. - He learns Data Science through online courses, competitions, internship, and side-projects. - For technical skills, he is familiar with Python, Tableau, SQL, R, Google Big Query, Git, Docker, Design Thinking, cloud service (AWS EC2), Google Data Studio, Matlab, SPSS - He is a Vegan since 2007! He loves all vegan foods except tomatoes. User: Hey, I stumbled upon your IG and realized that you're a Vegan?! Louis: Oh yeah man. I'm a Vegan since 2007! User: Awesome! Out of curiosity, what made you to decide become a Vegan? Louis: Oh mostly it's because of my family beliefs and also to help save the planet. User: Got it. Cool! Anyway, what are you up to lately? Louis: Lately I spend my time to work on my full-time job and also writes articles in my spare time. User: Cool man, keep up the good work! ## Louis is an AI Research Engineer/Data Scientist from Indonesia. He is a continuous learner, friendly, and always eager to share his knowledge with his friends. Important information to follow: - His hobbies are writing articles and watching movies - He has 3 main strengths: strong-willed, fast-learner, and effective. - He is currently based in Bandung, Indonesia. - He prefers to Work From Home (WFH) compared to Work From Office - He is currently working as an NLP Engineer at Yellow.ai. - He pursued a Mathematics major in Bandung Institute of Technology. - The reason why he loves NLP is that he found it interesting where one can extract insights from the very unstructured text data. - He learns Data Science through online courses, competitions, internship, and side-projects. - For technical skills, he is familiar with Python, Tableau, SQL, R, Google Big Query, Git, Docker, Design Thinking, cloud service (AWS EC2), Google Data Studio, Matlab, SPSS - He is a Vegan since 2007! He loves all vegan foods except tomatoes. User: Hey! Louis:The way this prompt works is by giving several few examples commonly called few-shot prompting. Using this prompt is very straightforward, we just need to append the User message at the end of the prompt and the model will generate the answer replicating yourself. Once the answer is generated, we need to put it back to the prompt and wait for the user’s reply. Once the user has replied to the generated response, we also need to put it back to the prompt. Since this is a looping procedure, it’s better to create a function to do all of this. The following is an example of the function that can handle the conversation along with the code to call the AI21Labs model from Python.import ai21 ai21.api_key = 'YOUR_API_KEY' def talk_to_your_clone(prompt): while True: user_message = input() prompt += "User: " + user_message + "\n" response = ai21.Completion.execute( model="j2-ultra", prompt=prompt, numResults=1, maxTokens=100, temperature=0.5, topKReturn=0, topP=0.9, stopSequences=["##","User:"], ) prompt += "Louis: " + response + "\n" print(response)ConclusionCongratulations on keeping up to this point! Throughout this article, you have learned ways to create a clone of yourself, detailed steps on how to create it with general LLM provided by AI21Labs, also working code that you can utilize to customize it for your own needs. Hope the best for your experiment in creating a clone of yourself and see you in the next article!Author BioLouis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects. Currently, Louis is an NLP Research Engineer at Yellow.ai, the world’s leading CX automation platform. Check out Louis’ website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.

0
0
190

article-image-getting-started-with-microsofts-guidance

Prakhar Mishra

26 Sep 2023

8 min read

Getting Started with Microsoft’s Guidance

Prakhar Mishra

26 Sep 2023

8 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionThe emergence of a massive language model is a watershed moment in the field of artificial intelligence (AI) and natural language processing (NLP). Because of their extraordinary capacity to write human-like text and perform a range of language-related tasks, these models, which are based on deep learning techniques, have earned considerable interest and acceptance. This field has undergone significant scientific developments in recent years. Researchers all over the world have been developing better and more domain-specific LLMs to meet the needs of various use cases.Large Language Models (LLMs) such as GPT-3 and its descendants, like any technology or strategy, have downsides and limits. And, in order to use LLMs properly, ethically, and to their maximum capacity, it is critical to grasp their downsides and limitations. Unlike large language models such as GPT-4, which can follow the majority of commands. Language models that are not equivalently large enough (such as GPT-2, LLaMa, and its derivatives) frequently suffer from the difficulty of not following instructions adequately, particularly the part of instruction that asks for generating output in a specific structure. This causes a bottleneck when constructing a pipeline in which the output of LLMs is fed to other downstream functions.Introducing Guidance - an effective and efficient means of controlling modern language models compared to conventional prompting methods. It supports both open (LLaMa, GPT-2, Alpaca, and so on) and closed LLMs (ChatGPT, GPT-4, and so on). It can be considered as a part of a larger ecosystem of tools for expanding the capabilities of language models.Guidance uses Handlebars - a templating language. Handlebars allow us to build semantic templates effectively by compiling templates into JavaScript functions. Making it’s execution faster than other templating engines. Guidance also integrates well with Jsonformer - a bulletproof way to generate structured JSON from language models. Here’s a detailed notebook on the same. Also, in case you were to use OpenAI from Azure AI then Guidance has you covered - notebook.Moving on to some of the outstanding features that Guidance offers. Feel free to check out the entire list of features.Features1. Guidance Acceleration - This addition significantly improves inference performance by efficiently utilizing the Key/Value caches as we proceed through the prompt by keeping a session state with the LLM inference. Benchmarking revealed a 50% reduction in runtime when compared to standard prompting approaches. Here’s the link to one of the benchmarking exercises. The below image shows an example of generating a character profile of an RPG game in JSON format. The green highlights are the generations done by the model, whereas the blue and no highlights are the ones that are copied as it is from the input prompt, unlike the traditional method that tries to generate every bit of it.SourceNote: As of now, the Guidance Acceleration feature is implemented for open LLMs. We can soon expect to see if working with closed LLMs as well.2. Token Healing - This feature attempts to correct tokenization artifacts that commonly occur at the border between the end of a prompt and the start of a group of generated tokens.For example - If we ask LLM to auto-complete a URL with the below-mentioned Input, it’s likely to produce the shown output. Apart from the obvious limitation that the URL might not be valid. I'd like to draw your attention to the extra space it creates (highlighted in red). Such considerations make it difficult to construct a dependable parsing function and robustly absorb its result into subsequent phases.Input: “The link is <a href=http:”Actual Output: “The link is <a href=http: //www.google.com/search?q”Expected Output: “The link is <a href=http://www.google.com/search?q” This is the exact bucket of problems that Token Healing tries to solve using the backtracking method. Feel free to check out this jupyter notebook for more examples.3. Guaranteed Output Structure - Large language models are fantastic at producing useful outputs, but not so much at producing outputs in a specified format (especially open-source ones like LLaMa, GPT-2, and so on). When we want to use the output of a language model as input to another system, this is frequently an issue. With Handlebars, guidance guarantees the output format to be the same as what was being asked for.Let’s now see Guidance in action -InstallationInstalling guidance is a breeze, just do a pip :$ pip install guidanceAssume we are now creating a product description for an e-commerce website. Here's how the traditional generation compares to the guidance generation. Feel free to play with this colab notebook with both the below examples.Traditional GenerationInput:Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently. { prod_id: <numeric value of 5 digits>, prod_name: <name starts with the prefix 'p_'>, prod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } The product description isOutput:Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently. { prod_id: <numeric value of 5 digits>, prod_name: <name starts with the prefix 'p_'>, prod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } The product description is { resentprod_id: <numeric value of 5 digits>, resentprod_name: <name begins with the prefix 'p_'>, resentprod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } In the above example, the product description has 5 constraint fields and 5 attribute fields. The constraints are as follows: resentprod_id: - value of 5 digits, resentprod_name: - name of the product, resentprod_price: - price of the product, resentprod_price_suffix: - suffix of the product price, resentprod_id: - the product id, resentpro diabetic_id: value of 4 digits, resentprod_ astronomer_id: - value of 4 digits, resentprod_ star_id: - value of 4 digits, resentprod_is_generic: - if the product is generic and not the generic type, resentprod_type: - the type of the product, resentprod_is_generic_typeHere’s the code for the above example with GPT-2 language model -``` from transformers import AutoModelForCausalLM, AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("gpt2-large") model = AutoModelForCausalLM.from_pretrained("gpt2-large") inputs = tokenizer(Input, return_tensors="pt") tokens = model.generate( **inputs, max_new_tokens=256, temperature=0.7, do_sample=True, )Output:tokenizer.decode(tokens[0], skip_special_tokens=True)) ```Guidance GenerationInput w/ code:guidance.llm = guidance.llms.Transformers("gpt-large") # define the prompt program = guidance("""Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The following is the format ```json { "prod_id": "{{gen 'id' pattern='[0-9]{5}' stop=','}}", "prod_name": "{{gen 'name' pattern='p_[A-Za-z]+' stop=','}}", "prod_price": "{{gen 'price' pattern='\b([1-9]|1[0-6])\b\$' stop=','}}" }```""") # execute the prompt Output = program()Output:Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of a fixed set of fields to be filled in the JSON. The following is the format```json { "prod_id": "11231", "prod_name": "p_pizzas", "prod_price": "11$" }```As seen in the preceding instances, with guidance, we can be certain that the output format will be followed within the given restrictions no matter how many times we execute the identical prompt. This capability makes it an excellent choice for constructing any dependable and strong multi-step LLM pipeline.I hope this overview of Guidance has helped you realize the value it may provide to your daily prompt development cycle. Also, here’s a consolidated notebook showcasing all the features of Guidance, feel free to check it out.Author BioPrakhar has a Master’s in Data Science with over 4 years of experience in industry across various sectors like Retail, Healthcare, Consumer Analytics, etc. His research interests include Natural Language Understanding and generation, and has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn

0
0
95

article-image-optimized-llm-serving-with-tgi

Louis Owen

25 Sep 2023

7 min read

Optimized LLM Serving with TGI

Louis Owen

25 Sep 2023

7 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionThe release of LLaMA-2 LLM to the public has been a tremendous contribution by Meta to the open-source community. It’s very powerful that many developers are fine-tuning it for so many different use cases. Closed-source LLMs such as GPT3.5, GPT4, or Claude are really convenient to utilize since we just need to send API requests to power up our application. On the other hand, utilizing open-source LLMs such as LLaMA-2 in production is not an easy task. Although there are several algorithms that support deploying an LLM with CPU, most of the time we still need to utilize GPU for better throughput.At a high level, there are two main things that need to be taken care of when deploying your LLM in production: memory and throughput. LLM is very big in size, the smallest version of LLaMA-2 has 7 billion parameters, which requires around 28GB GPU RAM for loading the model. Google Colab offers free NVIDIA T4 GPU to be used with 16GB memory, meaning that we can’t load even the smallest version of LLaMA-2 freely using the GPU provided by Google Colab.There are two popular ways to solve this memory issue: half-precision and 4/8-bit quantization. Half-precision is a very simple optimization method that we can adopt. In PyTorch, we just need to add the following line and we can load the 7B model by using only around 13GB GPU RAM. Basically, half-precision means that we’re representing the weights of the model with 16-bit floating points (fp16) instead of 32-bit floating points (fp32).torch.set_default_tensor_type(torch.cuda.HalfTensor)Another way to solve the memory issue is by performing 4/8-bits quantization. There are two quantization algorithms that are widely adopted by the developers: GPT-Q and BitsandBytes. These two algorithms are also supported within the `transformers` package. Quantization is a way to reduce the model size by representing the model weights in lower precision data types, such as 8-bit integers (int8) or 4-bit integers (int4) instead of 32-bit floating point (fp32) or 16-bit (fp16).As for comparison, after applying the 4-bit quantization with the GPT-Q algorithm, we can load the 7B model by using only around 4GB GPU RAM! This is indeed a huge drop - from 28GB to 4GB!Once we’re able to load the model, the simplest way to serve the model is by using the native HuggingFace workflow along with Flask or FastAPI. However, serving our model with this simple solution is not scalable and reliable enough to be used in production since it can’t handle parallel requests decently. This is related to the throughput problem mentioned earlier. There are many ways to solve this throughput problem, starting from continuous batching, tensor parallelism, prompt caching, paged attention, prefill parallel computing, KV cache, and many more. Discussing each of the algorithms will result in a very long article. However, luckily there’s an inference serving library for LLM that applies all of these techniques to ensure the reliability of serving our LLM in production. This library is called Text Generation Inference (TGI).Throughout this article, we’ll discuss in more detail what TGI is and how to utilize it to serve your LLM. We’ll see what are the top features of TGI and the step-by-step tutorial on how to spin up TGI as the inference server.Without wasting any more time, let’s take a deep breath, make yourselves comfortable, and be ready to learn how to utilize TGI as an inference server!What’s TGI?Text Generation Inference (TGI) is like the `transformers` package for inference servers. In fact, TGI is used in production at HuggingFace to power many applications. It provides a lot of useful features:Provide a straightforward launcher for the most popular Large Language Models.Implement Tensor Parallelism to enhance inference speed across multiple GPUs.Enable token streaming through Server-Sent Events (SSE).Implement continuous batching of incoming requests to boost overall throughput.Enhance transformer code for inference using flash-attention and Paged Attention on leading architectures.Apply quantization techniques with bitsandbytes and GPT-Q.Ensure fast and safe persistence format of weight loading with Safetensors.Incorporate watermarking using A Watermark for Large Language Models.Integrate a logit warper for various operations like temperature scaling, top-p, top-k, and repetition penalty.Implement stop sequences and log probabilities.Ensure production readiness with distributed tracing through Open Telemetry and Prometheus metrics.Offer Custom Prompt Generation, allowing users to easily generate text by providing custom prompts for guiding the model's output.Provide support for fine-tuning, enabling the use of fine-tuned models for specific tasks to achieve higher accuracy and performance.Implement RoPE scaling to extend the context length during inferenceTGI is surely a one-stop solution for serving LLMs. However, one important note about TGI that you need to know is regarding the project license. Starting from v1.0, HuggingFace has updated the license for TGI using the HFOIL 1.0 license. For more details, please refer to this GitHub issue. If you’re using TGI only for research purposes, then there’s nothing to worry about. If you’re using TGI for commercial purposes, please make sure to read the license carefully since there are cases where it can and can’t be used for commercial purposes.How to Use TGI?Using TGI is relatively easy and straightforward. We can use Docker to spin up the server. Note that you need to install the NVIDIA Container Toolkit to use GPU.docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id $modelOnce the server is up, we can easily send API requests to the server using the following command.curl 127.0.0.1:8080/generate \ -X POST \ -d '{"inputs":"What is LLM?","parameters":{"max_new_tokens":120}}' \ -H 'Content-Type: application/json'For more details regarding the API documentation, please refer to this page. Another way to send API requests to the TGI server is by sending it from Python. To do that, we need to install the Python library firstpip install text-generationOnce the library is installed, we can send API requests like the following.from text_generation import Client client = Client("http://127.0.0.1:8080") print(client.generate("What is LLM?", max_new_tokens=120).generated_text) text = "" for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20): if not response.token.special: text += response.token.text print(text)ConclusionCongratulations on keeping up to this point! Throughout this article, you have learned what are the important things to be considered when you’re trying to serve your own LLM. You have also learned about TGI, starting from what is it, what’s the features, and also step-by-step examples on how to get the TGI server up. Hope the best for your LLM deployment journey and see you in the next article!Author BioLouis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects.Currently, Louis is an NLP Research Engineer at Yellow.ai, the world’s leading CX automation platform. Check out Louis’ website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.

0
0
256

article-image-preventing-prompt-attacks-on-llms

Alan Bernardo Palacio

25 Sep 2023

16 min read

Preventing Prompt Attacks on LLMs

Alan Bernardo Palacio

25 Sep 2023

16 min read

3
0
439

article-image-unleashing-the-potential-of-gpus-for-training-llms

Shankar Narayanan

22 Sep 2023

8 min read

Unleashing the Potential of GPUs for Training LLMs

Shankar Narayanan

22 Sep 2023

8 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionThere is no doubt about Language Models being the true marvels in the arena of artificial intelligence. These sophisticated systems have the power to manipulate human language, understand, and even generate with astonishing accuracy.However, one can often complain about the immense computational challenges beyond these medical abilities. For instance, LLM training requires the incorporation of complex mathematical operations along with the processing of vast data. This is where the Graphics Processing Units (GPU) come into play. It serves as the engine that helps to power the language magic.Let me take you through the GPU advancement and innovations to support the Language Model. Parallely, we will explore how Nvidia helps revolutionize the enterprise LLM use cases.Role of GPUs in LLMs To understand the significance of GPU, let us first understand the concept of LLM.What is LLM?LLM or Large Language Models are AI systems that help generate human language. They have various applications, including translation services, sentiment analysis, chatbots, and content generation. Generative Pre-trained Transformer or GPT models, including BERT and GPT3, are popular among every LLM.These models require training, including vast data sets with billions of phrases and words. The model learns to predict while mastering the nuances and structure of language. It is like an intricate puzzle that requires enormous computational power.The need for GPUsThe Graphics Processing Units are specifically designed to undergo parallel processing. This characteristic makes them applicable to train the LLMs. The GPU can tackle thousands of tasks simultaneously, unlike the Central Processing Unit or CPU, which excels at handling sequential tasks.The training of a Large Language Model is like a massive jigsaw puzzle. Each puzzle piece represents a smaller portion of the model's language understanding. Using a CPU could only help one to work on one of these pieces at a simple time. But with GPU, one could work on various pieces parallelly while speeding up the whole process.Besides, GPU offers high computational throughput that one requires for complex mathematical operations. Their competency lies in metric multiplication, one of the fundamentals of neural network training. All these attributes make GPU indispensable for deep learning tasks like LLMs.Here is one of the practical example of how GPU works in LLM training: (Python)import time import torch # Create a large random dataset data = torch.randn(100000, 1000) # Training with CPU start_time = time.time() for _ in range(100): model_output = data.matmul(data) cpu_training_time = time.time() - start_time print(f"CPU Training Time: {cpu_training_time:.2f} seconds") # Training with GPU if torch.cuda.is_available(): data = data.cuda() start_time = time.time() for _ in range(100): model_output = data.matmul(data) gpu_training_time = time.time() - start_time print(f"GPU Training Time: {gpu_training_time:.2f} seconds") else: print("GPU not available.")GPU Advancements and LLMDue to the rising demands of LLMs and AI, GPU technology is evolving rapidly. These advancements, however, play a significant role in constituting the development of sophisticated language models.One such advancement is the increase in GPU memory capacity. Technically, the larger model requires more excellent memory to process massive data sets. Hence, modern GPUs offer substantial memory capacity, allowing researchers to build and train more substantial large language models.One of the critical aspects of training a Large Language Model is its speed. Sometimes, it can take months to prepare and train a large language model. But with the advent of faster GPU, things have changed dramatically. The quicker GPU reduces the training time and accelerates research and development. Apart from that, it also reduces the energy consumption that is often associated with training these large models.Let us explore the memory capacity of the GPU using a code snippet.(Python)import torch # Check GPU memory capacity if torch.cuda.is_available(): gpu_memory = torch.cuda.get_device_properties(0).total_memory print(f"GPU Memory Capacity: {gpu_memory / (1024**3):.2f} GB") else: print("GPU not available.")For the record, Nvidia's Tensor Core technology has been one of the game changers in this aspect. It accelerates one of the core operations in deep learning, i.e., the matrix computation process, allowing the LLMs to train faster and more efficiently.Using matrix Python and PYTorh, you can showcase the speedup with GPU processing.import time import torch # Create large random matrices matrix_size = 1000 cpu_matrix = torch.randn(matrix_size, matrix_size) gpu_matrix = torch.randn(matrix_size, matrix_size).cuda() # Move to GPU # Perform matrix multiplication with CPU start_time = time.time() result_cpu = torch.matmul(cpu_matrix, cpu_matrix) cpu_time = time.time() - start_time # Perform matrix multiplication with GPU start_time = time.time() result_gpu = torch.matmul(gpu_matrix, gpu_matrix) gpu_time = time.time() - start_time print(f"CPU Matrix Multiplication Time: {cpu_time:.4f} seconds") print(f"GPU Matrix Multiplication Time: {gpu_time:.4f} seconds")Nvidia's Contribution to GPU InnovationRegarding GPU innovation, the presence of Nvidia cannot be denied. It has a long-standing commitment to Machine Learning and advancing AI. Hence, it is a natural ally for the large language model community.Here is how Tensor Cores can be utilized with PYTorch.import torch # Enable Tensor Cores (requires a compatible GPU) if torch.cuda.is_available(): torch.backends.cuda.matmul.allow_tf32 = True # Create a tensor x = torch.randn(4096, 4096, device="cuda") # Perform matrix multiplication using Tensor Cores result = torch.matmul(x, x)It is interesting to know that Nvidia's graphics processing unit has powered several breakthroughs in LLM and AI models. BERT and GPT3 are known to harness the computational might of Nvidia's Graphics Processing Unit to achieve remarkable capabilities. Nvidia's dedication to the Artificial Intelligence world encompasses power and efficiency. The design of the graphics processing unit handles every AI workload with optimal performance per watt. It makes Nvidia one of the eco-friendly options for Large Language Model training procedures.As part of AI-focused hardware and architecture, the Tensor Core technology enables efficient and faster deep learning. This technology is instrumental in pushing the boundaries of LLM research.Supporting Enterprise LLM Use-caseThe application of LLM has a far-fetched reach, extending beyond research, labs, and academia. Indeed, they have entered the enterprise world with a bang. From analyzing massive datasets for insights to automating customer support through chatbots, large language models are transforming how businesses operate.Here, the Nvidia Graphics Processing Unit supports the enterprise LLM use cases. Enterprises often require LLM to handle vast amounts of data in real-time. With optimized AI performance and parallel processing power, Nvidia's GPU can provide the needed acceleration for these applications.Various companies across industries are harnessing the Nvidia GPU for developing LLM-based solutions to automate tasks, provide better customer experiences, and enhance productivity. From healthcare organizations analyzing medical records to financial institutions and predicting market trends, Nvidia drives enterprise LLM innovations.ConclusionNvidia continues to be the trailblazer in the captivating journey of training large language models. They are not only the hardware muscle for LLM but constantly innovate to make GPU capable and efficient with each generation.LLM is on the run to become integral to our daily lives. From business solutions to personal assistants, Nvidia's commitment to its GPU innovation ensures more power to the growth of language models. The synergy between AI and Nvidia GPU is constantly shaping the future of enterprise LLM use cases, helping organizations to achieve new heights in innovation and efficiency.Frequently Asked Questions1. How does the GPU accelerate the training process of large language models?The Graphics Processing Unit has parallel processing capabilities to allow the work of multiple tasks simultaneously. Such parallelism helps train Large Language Models by efficiently processing many components in understanding and generating human language.2. How does Nvidia contribute to GPU innovation for significant language and AI models?Nvidia has developed specialized hardware, including Tensor Core, optimized for AI workloads. The graphic processing unit of Nvidia powered numerous AI breakthroughs while providing efficient AI hardware to advance the development of Large Language Models.3. What are the expectations for the future of GPU innovation and launch language model?The future of GPU innovation promises efficient, specialized, and robust hardware tailored to the needs of AI applications and Large Language Models. It will continuously drive the development of sophisticated language models while opening up new possibilities for AI-power solutions.Author BioShankar Narayanan (aka Shanky) has worked on numerous different cloud and emerging technologies like Azure, AWS, Google Cloud, IoT, Industry 4.0, and DevOps to name a few. He has led the architecture design and implementation for many Enterprise customers and helped enable them to break the barrier and take the first step towards a long and successful cloud journey. He was one of the early adopters of Microsoft Azure and Snowflake Data Cloud. Shanky likes to contribute back to the community. He contributes to open source is a frequently sought-after speaker and has delivered numerous talks on Microsoft Technologies and Snowflake. He is recognized as a Data Superhero by Snowflake and SAP Community Topic leader by SAP.

0
0
402

article-image-preparing-high-quality-training-data-for-llm-fine-tuning

Louis Owen

22 Sep 2023

9 min read

Preparing High-Quality Training Data for LLM Fine-Tuning

Louis Owen

22 Sep 2023

9 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionLarge Language Models (LLM) such as GPT3.5, GPT4, or Claude have shown a very good general capability that can be utilized across different tasks, starting from question and answering, coding assistant, marketing campaign, and many more. However, utilizing those general LLMs in production, especially for enterprises, is not an easy task:Those models are very large in terms of the number of parameters - resulting in lower latency compared to the smaller modelWe need to give a very long prompt to achieve good results - again, resulting in lower latencyReliability is not ensured - sometimes they can return the response with an additional prefix, which is really annoying when we expect only a JSON format response for exampleOne of the solutions to solve those problems is fine-tuning a smaller LLM that is specific to the task that we want to handle. For example, we need a QnA model that is able to answer user queries based only on the provided passage. Instead of utilizing those general LLMs, we can just fine-tune a smaller LLM, let’s say a 7 billion parameters LLM to do this specific task. Why to utilize such a giant LLM when our use case is only for QnA?The quality of training data plays a pivotal role in the success of fine-tuning. Garbage in, garbage out holds true in the world of LLMs. When you fine-tune low-quality data, you risk transferring noise, biases, and inaccuracies to your model. Let’s take the newly released paper, Textbooks Are All You Need II: phi-1.5 Technical Report, as an example. Despite the relatively low number of parameters (1.5B), this model is able to perform as well as models five times its size. Additionally, it excels in complex reasoning tasks, surpassing most non-frontier LLMs. What’s their secret sauce? High-quality of training data! The next question is how to prepare the training data for LLM fine-tuning. Moreover, how to prepare high-quality training data? Since fine-tuning needs labeled training data, we need to annotate the unlabeled data that we have. Annotating unlabeled data for classification tasks is much easier compared to more complex tasks like summarization. We just need to give labels based on the available classes in the classification task. If you have deployed an application with those general LLMs before and you have the data coming from real production data, then you can use those data as the training data. Actually, you can also use the response coming from the general LLM as the label directly, no need to do any data annotation anymore. However, what if you don’t have real production data? Then, you can use open-source data or even synthetic data generated by the general LLM as your unlabeled data.Throughout this article, we’ll discuss ways to give high-quality labels to the unlabeled training data, whether it’s annotated by humans or by general LLM. We’ll discuss what are the pros and cons of each of the annotation options. Furthermore, we’ll discuss in more detail how to utilize general LLM to do the annotation task - along with the step-by-step example.Without wasting any more time, let’s take a deep breath, make yourselves comfortable, and be ready to learn how to prepare high-quality training data for LLM fine-tuning!Human Annotated DataThe first option to create high-quality training data is by using the help of human annotators. In the ideal scenario, well-trained human annotators not only are able to produce high-quality training data but also produce labels that are fully steerable according to the criteria (SOP). However, using humans as the annotators will surely be both time and money-consuming. It is also not scalable since we need to wait for a not short time until we can get the labeled data. Finally, the ideal scenario is also hard to achieve since each of the annotators has their own bias towards a specific domain or even the label quality is most often based on their mood.LLM Annotated DataAnother better option is to utilize general LLM as the annotator. LLMs will always give not only high-quality training data but also full steerability according to the criteria if we do the prompt engineering correctly. It is also cheaper both in terms of time and money. Finally, it’s absolutely scalable and no bias included - except for hallucination.Let’s see how general LLM is usually utilized as an annotator. We’ll use conversation summarization as the task example. The goal of the task is to summarize the given conversation between two users (User A and User B) and return all important information discussed in the conversation in the form of a summarized paragraph.1. Write the initial promptWe need to start from an initial prompt that we will use to generate the summary of the given conversation, or in general, that will be used to generate the label of the given unlabeled sample.You are an expert in summarizing the given conversation between two users. Return all important information discussed in the conversation in the form of summarized paragraph. Conversation: {}2. Evaluate the generated output with a few samples - qualitativelyUsing the initial prompt, we need to evaluate the generated label with few number of samples - let’s say <20 random samples. We need to do this manually by eyeballing through each of the labeled samples and judging qualitatively if they are good enough or not. If the output quality on these few samples is good enough, then we can move into the next step. If not, then revise the prompt and re-evaluate using another <20 random samples. Repeat this process until you are satisfied with the label quality.3. Evaluate the generated output with large samples - quantitativelyOnce we’re confident enough with the generated labels, we can further assess the quality using a more quantitative approach and with a larger number of samples - let’s say >500 samples. For classification tasks, such as sentiment analysis, evaluating the quality of the labels is easy, we just need to compare the generated label with the ground truth that we have, and then we can calculate the precision, recall, or any other classification metrics that we’re interested in. However, for more complex tasks, such as the task in this example, we need a more sophisticated metric. There are a couple of widely used metrics for summarization task - BLEU, ROUGE, and many more. However, those metrics are based on a string-matching algorithm only, which means if the generated summary doesn’t contain the exact word used in the conversation, then this score will suggest that the summary quality is not good. To overcome this, many engineers nowadays are utilizing GPT-4 to assess the label quality. For example, we can write a prompt as follows to assess the quality of the generated labels. Read the given conversation and summary pair. Give the rating quality for the summary with 5 different options: “very bad”, “bad”, “moderate”, “good”, “excellent”. Make sure the summary captures all of the important information in the conversation and does not contain any misinformation. Conversation: {} Summary: {} Rating: Once you get the rating, you can map them into integers - for example, “very bad”:0, “bad”: 1, “moderate”: 2, … Please make sure that the LLM that you’re using as the evaluator is not in the same LLMs family with the LLM that you’re using as the annotator. For example, GPT3.5 and GPT4 are both in the same family since they’re both coming from OpenAI.If the quantitative metric looks decent and meets the criteria, then we can move into the next step. If it’s not, then we can do a subset analysis to see in what kind of cases the label quality is not good. From there, we can revise the prompt and re-evaluate on the same test data. Repeat this step until you’re satisfied enough with the quantitative metric.4. Apply the final prompt to generate labels in the full dataFinally, we can apply the best prompt that we get from all of those iterations and apply it to generate labels in the full unlabeled data that we have.ConclusionCongratulations on keeping up to this point! Throughout this article, you have learned why LLM fine-tuning is important and when to do fine-tuning. You have also learned how to prepare high-quality training data for LLM fine-tuning. Hope the best for your LLM fine-tuning experiments and see you in the next article!Author BioLouis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects.Currently, Louis is an NLP Research Engineer at Yellow.ai, the world’s leading CX automation platform. Check out Louis’ website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.

0
0
99

article-image-building-an-investment-strategy-in-the-era-of-llms

Anshul Saxena

21 Sep 2023

16 min read

Building an Investment Strategy in the Era of LLMs

Anshul Saxena

21 Sep 2023

16 min read

0
0
123

article-image-how-large-language-models-reshape-trading-stats

Anshul Saxena

21 Sep 2023

15 min read

How Large Language Models Reshape Trading Stats

Anshul Saxena

21 Sep 2023

15 min read

0
0
123

How-To Tutorials - LLM

Large Language Model Operations (LLMOps) in Action

Supercharge Your Business Applications with Azure OpenAI

Question Answering in LangChain

Build an AI-based Personal Financial Advisor with LangChain

Kickstarting your journey with Azure OpenAI

Build a Language Converter using Google Bard

Google Bard: Everything You Need to Know to Get Started

Build a Clone of Yourself with Large Language Models (LLMs)

Getting Started with Microsoft’s Guidance

Optimized LLM Serving with TGI

Trending Topics

Preventing Prompt Attacks on LLMs

Unleashing the Potential of GPUs for Training LLMs

Preparing High-Quality Training Data for LLM Fine-Tuning

Building an Investment Strategy in the Era of LLMs

How Large Language Models Reshape Trading Stats