
How-To Tutorials

7019 Articles

GPT for Wealth Management: Enhancing Customer Experience

Bhavishya Pandit
18 Sep 2023
10 min read
Introduction

In the dynamic world of finance, technology continually pushes boundaries. Financial institutions now seek to enhance customer experiences with a powerful tool: generative artificial intelligence (AI). This technology is reshaping customer interactions and raising the bar for satisfaction and personalization.

Generative AI, known for creative output and data generation, offers unique opportunities to transform the customer experience. By harnessing its capabilities, financial institutions can gain valuable insights, provide hyper-personalized solutions, and align offerings with individual needs.

This article explores generative AI's impact on wealth management: innovative applications ranging from personalized financial product recommendations to intuitive virtual assistants, along with the benefits, challenges, and ethical considerations of using generative AI to enhance customer satisfaction.

Customer Pain Points in Wealth Management

In the ever-evolving realm of finance, where wealth management and customer service intersect, customers grapple with challenges that significantly affect their satisfaction and loyalty. Common pain points include:

1. Lack of Personalization: Many clients seek financial advice and solutions tailored to their distinct goals and circumstances. Conventional wealth management approaches often fall short of this level of customization, leaving customers feeling disconnected and dissatisfied.

2. Limited Accessibility: Clients can struggle to access their financial data or to reach their wealth managers and customer service representatives. Difficulty initiating contact, slow responses, and complex procedures breed frustration and hinder the customer journey.

3. Complex and Confusing Information: Financial matters are inherently intricate, and heavy jargon can overwhelm customers. When information is not conveyed clearly, clients struggle to make well-informed decisions.

4. Slow and Inefficient Processes: Lengthy processing times, excessive paperwork, and cumbersome procedures are significant roadblocks. Clients expect streamlined processes that let them manage their wealth with minimal time and effort.

5. Inadequate Communication and Transparency: Effective communication is the bedrock of trust. Clients value transparent, proactive communication from their wealth managers; poor communication or opacity around fees, performance updates, or policy changes erodes trust.
6. Limited Innovation and Technology Adoption: Expectations are rising, with clients anticipating that financial institutions will embrace technology and provide innovative solutions to enrich their financial management experience. A dearth of technological advancements, such as user-friendly digital platforms and interactive tools, can leave clients feeling underserved.

Mitigating these recurring pain points requires a customer-centric approach: personalized services, streamlined processes, transparent communication, and a wholehearted embrace of innovative technologies. By actively engaging with these pain points, financial institutions can craft superior customer experiences, foster lasting relationships, and set themselves apart in an increasingly competitive landscape.

How Can Generative AI Be Used for Wealth Management?

Customers look to financial institutions not just for financial guidance but for personalized advice that aligns with their unique wealth aspirations. Traditional wealth management has excelled at fostering strong client relationships, with each customer paired with a dedicated relationship manager who understands their individual objectives.

However, traditional methods sometimes fall short of the high expectations for personalization. The limitations stem largely from the scarcity of relationship managers, which creates scalability challenges and slow communication, and occasionally leads to misunderstandings due to varying levels of subject comprehension. These roadblocks can leave customers feeling adrift and dissatisfied.

Generative AI sidesteps the scalability problem. Picture every customer equipped with an LLM-powered chatbot capable of addressing their queries, understanding their goals, and producing personalized financial plans tailored to their specific requirements. It is a potential paradigm shift toward seamless, individualized wealth management.

Let's walk through a use case: an LLM-powered chatbot that answers user queries.

Demonstrating a Use Case: A Context-Aware, LLM-Powered Chatbot for Financial Advice

```python
# Import dependencies
import os

import openai
import streamlit as st
from streamlit_chat import message

# Set the API key
openai.api_key = 'PASTE-YOUR-KEY'
os.environ['OPENAI_API_KEY'] = 'PASTE-YOUR-KEY'

# Return a response from the chat model for a given prompt
def fun(prompt):
    response = openai.ChatCompletion.create(
        # With the standard OpenAI API, pass model="gpt-3.5-turbo";
        # with Azure OpenAI, pass your deployment name via engine=
        engine="engine_name",
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0,
        max_tokens=800,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
    )
    return response['choices'][0]['message']['content']
```
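Before wiring up the UI, you can smoke-test this helper on its own. The snippet below is not part of the app script; it is just an illustrative call with an arbitrary prompt, and in practice the API key is better read from an environment variable than pasted into the source.

```python
# Example call (run separately, e.g. in a REPL, after defining fun above)
print(fun("Suggest three ways a new investor could diversify a retirement portfolio."))
```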
The rest of the script defines the context check, the response filter, and the Streamlit app:

```python
# Check whether the question asked is in context; returns "True" or "False"
def context_check(prompt):
    testing_query = f'''
    Instructions:
    Answer the questions only related to "{context_topics}".

    Query:
    Study the prompt "{prompt}" and tell whether the user is directly or indirectly asking questions related to "{context_topics}".
    Give a response only in "True" or "False".

    Remember:
    1. Do not generate any other output, example, code, etc.
    2. The answer should be one word only: True or False.
    '''
    return fun(testing_query)

# Return a filtered response after the context check
def generate_response(prompt):
    instructions = f'''
    Instructions:
    0. Assume yourself to be an expert in answering financial queries.
    1. Answer questions only on the topics mentioned in: "{context_topics}" at all costs!
    2. Be precise and crisp.
    3. Answer in short.
    '''
    is_contextual = context_check(prompt)
    if is_contextual == 'True':
        prompt += instructions
        return fun(prompt)
    return "Sorry, the question asked doesn't follow the guidelines."

# Get the input text from Streamlit
def get_text():
    return st.text_input("How may I help?", key='input')

# Load the list of allowed topics (one per line)
with open('only_reply.txt', 'r') as f:
    context_topics = f.read()
context_topics = context_topics.split('\n')[:-1]
# context_topics = ['Finance', 'Wealth Management', 'Investment', 'Wealth']

st.set_page_config(
    page_title="FinBot",
    page_icon="💰",
)
st.write("# Welcome to FinBot💰!")

# Custom background styling
changes = '''
<style>
[data-testid="stAppViewContainer"] {
    background-image: url('https://i.ibb.co/qrrD42j/Screenshot-2023-09-15-at-5-41-25-PM.png');
    background-size: cover;
}
div.esravye2 > iframe {
    background-color: transparent;
}
</style>
'''
st.markdown(changes, unsafe_allow_html=True)

if 'generated' not in st.session_state:
    st.session_state['generated'] = []
if 'past' not in st.session_state:
    st.session_state['past'] = []

user_input = get_text()
if user_input:
    output = generate_response(user_input)
    st.session_state.past.append(user_input)
    st.session_state.generated.append(output)

# Render the conversation, newest exchange first
if st.session_state['generated']:
    for i in range(len(st.session_state['generated']) - 1, -1, -1):
        message(st.session_state['generated'][i], key=str(i))
        message(st.session_state['past'][i], key="user_" + str(i), is_user=True)
```
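To try the app locally, the topic list the script reads must exist first. A minimal setup sketch is shown below; the file name finbot.py used to launch the app is an assumption for illustration, while only_reply.txt is the name the script above expects.

```python
# Illustrative setup: create the topic list read by the app above
topics = ["Finance", "Wealth Management", "Investment", "Wealth"]
with open("only_reply.txt", "w") as f:
    f.write("\n".join(topics) + "\n")  # trailing newline, so the split/[:-1] logic works
```

With the script saved as, say, finbot.py, the app can then be started with the Streamlit CLI: streamlit run finbot.py.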
Screenshots

The original post includes two screenshots of the running app: one showing an out-of-context question being blocked, and one showing contextual questions being answered.

Conclusion

Generative AI stands as a game-changing force in wealth management. Its ability to provide personalized financial advice and solutions at a scale previously unattainable is reshaping financial services, allowing institutions to navigate the complexities of modern finance with far greater precision.

The anticipated impact is profound: clients receive tailored recommendations aligned with their financial goals, risk profiles, and evolving market dynamics, which in turn improves investment outcomes, raises client satisfaction, and deepens trust in financial institutions.

Going forward, the synergy between technology and human expertise will continue to define wealth management. Generative AI empowers advisors and clients alike to make informed decisions, optimize portfolios, and nurture enduring financial success.

Author Bio

Bhavishya Pandit is a Data Scientist at Rakuten. He has been extensively exploring GPT to find use cases and build products that solve real-world problems.


Revolutionizing Data Analysis with PandasAI

Rohan Chikorde
18 Sep 2023
7 min read
Introduction

Data analysis plays a crucial role in extracting meaningful insights from raw data and driving informed decision-making across many fields. Python's Pandas library has long been a go-to tool for data manipulation and analysis. Now imagine enhancing Pandas with generative AI, making data analysis conversational and intuitive. Enter PandasAI, a Python library that integrates generative AI capabilities into Pandas workflows.

PandasAI bridges the gap between traditional data analysis workflows and artificial intelligence. It lets users engage in natural language conversations with their data, bringing a new level of interactivity and flexibility to the analysis process.

With PandasAI, you can pose questions to your dataset in human-like language, turning complex queries into simple conversational statements. The library uses language models to interpret these queries and extract the desired insights from the data. This conversational approach removes the need for complex syntax and lets users, regardless of technical background, interact with data in a more intuitive way.

Under the hood, PandasAI combines natural language processing and machine learning techniques. It leverages pre-trained models to infer user intent, identify relevant data patterns, and generate insightful responses, and it supports a wide range of data analysis operations, including data cleaning, aggregation, and visualization. It integrates with existing Pandas workflows, making it a valuable addition to any data scientist's or analyst's toolkit.

In this post, we first cover how to install and configure PandasAI, then walk through usage examples that demonstrate its capabilities.

Installing and Configuring PandasAI

PandasAI can be installed using pip, Python's package manager:

```
pip install pandasai
```

This downloads and installs the latest version of the PandasAI package along with its dependencies.

Next, configure credentials for the LLM that powers PandasAI's natural language capabilities:

```python
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

openai_api_key = "sk-..."
llm = OpenAI(api_token=openai_api_key)
ai = PandasAI(llm)
```

The PandasAI documentation explains how to obtain API keys for services such as OpenAI and Anthropic.

Once configured, PandasAI is ready to supercharge your data tasks through the power of language. Let's see it in action through some examples.
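One note before the examples: the snippets that follow use the author's ai.init(...) / ai.run("...") style. For comparison, here is a minimal end-to-end sketch using the call pattern documented in early PandasAI releases, where the DataFrame is passed to run() together with the prompt. Exact names and behavior depend on the library version you have installed, and the API key shown is a placeholder.

```python
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

# Placeholder key; read it from an environment variable in practice
llm = OpenAI(api_token="sk-...")
pandas_ai = PandasAI(llm)

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Sales": [100, 200, 50],
})

# Early-0.x style: pass the DataFrame and the question to run()
print(pandas_ai.run(df, prompt="Which product had the highest sales?"))
```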
Intuitive Data Exploration Using Natural Language

A key strength of PandasAI is enabling intuitive data exploration in plain English. Consider this sample data:

```python
import pandas as pd

data = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Sales': [100, 200, 50],
    'Region': ['East', 'West', 'West']
})

ai.init(data)
```

You can now ask questions about this data conversationally:

```python
ai.run("Which region had the highest sales?")
ai.run("Plot sales by product as a bar chart ordered by sales")
```

PandasAI automatically generates the relevant summaries, plots, and insights based on the natural language prompts.

Automating Complex Multi-Step Data Pipelines

PandasAI also excels at automating relatively complex multi-step analytical workflows:

```python
ai.run("""
    Load sales and inventory data
    Join tables on product_id
    Impute missing values
    Remove outliers
    Calculate inventory turnover ratio
    Segment products into ABC categories
""")
```

This eliminates tedious manual coding in Pandas.

Unified Analysis Across Multiple Datasets

For real-world analysis, PandasAI can work seamlessly across multiple datasets:

```python
sales = pd.read_csv("sales.csv")
product = pd.read_csv("product.csv")
customer = pd.read_csv("customer.csv")

ai.add_frames(sales, product, customer)
ai.run("Join the datasets. Show average order size by customer city.")
```

This enables unified insights across disconnected data sources.

Building Data-Driven Analytics Applications

Beyond exploration, PandasAI can power analytics apps via Python integration. For instance:

```python
region = input("Enter region: ")
ai.run(f"Compare {region} sales to national average")
```

This allows building customizable analytics tools tailored to business users' needs. PandasAI can also back production apps that use Streamlit for the UI:

```python
import streamlit as st
from pandasai import PandasAI

region = st.text_input("Enter region:")
# … (LLM and PandasAI setup elided in the original post)

if region:
    insight = ai.run(f"Analyze {region} sales")
    st.write(insight)
```

Democratizing Data-Driven Decisions

A key promise of PandasAI is democratizing data analysis by removing coding complexity. Non-technical users can independently extract insights through natural language, data-driven decisions can become decentralized rather than relying on centralized analytics teams, and domain experts can get tailored insights on demand without coding expertise.

Real-World Applications

Let's explore some real-world applications of PandasAI to see how it can benefit various industries.

Finance

Financial analysts can use PandasAI to quickly analyze stock market data, generate investment insights, and create financial reports. They can ask questions like "What are the top-performing stocks in the last quarter?" and receive instant answers. For example:

```python
import pandas as pd
from pandasai import PandasAI

stocks = pd.read_csv("stocks.csv")

ai = PandasAI(model="codex")
ai.init(stocks)

ai.run("What were the top 5 performing stocks last quarter?")
ai.run("Compare revenue growth across technology and healthcare stocks")
ai.run("Which sectors saw the most upside surprises in earnings last quarter?")
```

Healthcare

Healthcare professionals can leverage PandasAI to analyze patient data, track disease trends, and make informed decisions about patient care. They can ask questions like "What are the common risk factors for a particular disease?" and gain valuable insights.
Marketing

Marketers can use PandasAI to analyze customer data, segment audiences, and optimize marketing strategies. They can ask questions like "Which marketing channels have the highest conversion rates?" and fine-tune their campaigns accordingly.

E-commerce

E-commerce businesses can benefit from PandasAI by analyzing sales data, predicting customer behavior, and optimizing inventory management. They can ask questions like "What products are likely to be popular next month?" and plan their stock accordingly.

Conclusion

PandasAI offers a glimpse into a future of data analysis driven by AI. By automating the tedious parts of data preparation and manipulation, it lets data professionals focus on high-value tasks: framing the right questions, interpreting insights, and telling impactful data stories.

Its natural language interface also opens up data exploration and analysis to non-technical domain experts. Rather than writing code, anyone can derive tailored insights from data by simply asking questions in plain English.

As AI continues to progress, PandasAI can be expected to become even more powerful and nuanced in its analytical abilities, moving data science from simple pattern recognition toward deeper knowledge generation. While early in its development, it offers a taste of what is possible when the foundations of data analysis are reimagined with AI, and it will be fascinating to see how the library helps shape the analytics landscape in the coming years.

In summary, by combining the strengths of Pandas and large language models, PandasAI pushes the boundaries of what is possible in data analysis today and marks an important milestone in the AI-driven evolution of the field.

Author Bio

Rohan Chikorde is an accomplished AI architect with a postgraduate degree in machine learning and artificial intelligence. With almost a decade of experience, he has developed deep learning and machine learning models for a range of business applications. His expertise spans multiple domains; he works in programming languages such as R and Python and with analytics techniques such as regression analysis and data mining. In addition to his technical skills, he is an effective communicator, mentor, and team leader. Rohan's passion lies in machine learning, deep learning, and computer vision.

LinkedIn


Generative AI: Building a Strong Data Foundation

Shankar Narayanan
15 Sep 2023
7 min read
Introduction

Generative AI has become increasingly popular among businesses and researchers, which has led to growing interest in how data supports generative models. Generative AI relies heavily on the quality and diversity of its foundational data to generate new samples from existing ones. In this post, I explain why a strong data foundation is essential for generative AI and explore the methods used to build and prepare data systems.

Why Data Is Vital for Generative AI

Generative AI models can produce a wide range of outputs, from images to text to music, but their accuracy and performance depend primarily on the quality of the data they are trained on. If the foundation data is inadequate, the models will produce incorrect, biased, or unimpressive results. The adage "garbage in, garbage out" is quite relevant here: the quality, diversity, and volume of the data determine how well the model learns patterns and nuances.

Methods of Building a Data Foundation for Generative AI

To harness the potential of generative AI, enterprises need to establish a strong data foundation, and that requires a systematic approach to collection, preparation, and management. Building a robust data foundation involves the following phases.

Data Collection

Collecting data from diverse sources ensures variety. For example, a generative model trained on human faces should include faces across ethnicities, ages, and expressions. As a starting point, you can load data from a CSV file in Python:

```python
import pandas as pd

data = pd.read_csv('path_to_file.csv')
print(data.head())  # print the first 5 rows
```

To read from a database, you can use code like this:

```python
import sqlite3

DATABASE_PATH = 'path_to_database.db'

conn = sqlite3.connect(DATABASE_PATH)
cursor = conn.cursor()
cursor.execute("SELECT * FROM table_name")
rows = cursor.fetchall()
for row in rows:
    print(row)
conn.close()
```

Time-Series Data

Time-series data is invaluable for generative models that focus on sequences or temporal patterns (such as stock prices). Common preprocessing operations include differencing, log transformation, and smoothing:

```python
import numpy as np
import pandas as pd

# Load data (assuming a CSV file with 'date' and 'value' columns)
df = pd.read_csv('time_series_data.csv', parse_dates=['date'], index_col='date')

# Making the time series stationary
# 1. Differencing
df['first_difference'] = df['value'] - df['value'].shift(1)

# 2. Log transformation (if the data is non-stationary after differencing)
df['log_value'] = np.log(df['value'])
df['log_first_difference'] = df['log_value'] - df['log_value'].shift(1)

# 3. Smoothing with a moving average
window_size = 5  # e.g., a window size of 5
df['moving_avg'] = df['first_difference'].rolling(window=window_size).mean()
```
Data Cleaning

Detecting and handling duplicates and outliers appropriately is crucial, as they can drastically skew what a model learns. Let's see an example of data cleaning in Python:

```python
import numpy as np
import pandas as pd

# Sample data for demonstration
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Alice'],
    'Age': [25, 30, np.nan, 29, 25],
    'Salary': [50000, 55000, 52000, 60000, 50000],
    'Department': ['HR', 'Finance', 'Finance', 'IT', None]
}
df = pd.DataFrame(data)

# Remove duplicate rows
df.drop_duplicates(inplace=True)
```

Handling Missing Values

Accuracy requires complete data sets, and techniques such as imputation can be used to address gaps. For example, missing values in a time series can be interpolated:

```python
import pandas as pd

# Load data (assuming a CSV file with 'date' and 'value' columns)
df = pd.read_csv('time_series_data.csv', parse_dates=['date'], index_col='date')

# Handle missing values: linear interpolation is one option
df['value'].interpolate(method='linear', inplace=True)
```

Data Augmentation

Transformations such as rotating, scaling, or flipping images can increase the volume and diversity of visual data, and a little noise (random variation) is sometimes added for robustness. Before augmenting, it helps to finish tidying the earlier sample data, for example by correcting data types and removing outliers:

```python
from scipy import stats

# Drop rows with a missing Age before casting to integer
df = df.dropna(subset=['Age'])

# Correct data types
df['Age'] = df['Age'].astype(int)  # convert float Age to integer

# Remove outliers (using the Z-score of Age as an example)
z_scores = np.abs(stats.zscore(df['Age']))
df = df[z_scores < 3]
```

Data Annotation

Adding descriptions or tags helps the model understand context. In image datasets, for example, metadata can describe the scene, objects, or emotions present. Having domain experts review and annotate data ensures high fidelity.

Data Partitioning

Segregating data ensures that models are not evaluated on the same data they are trained on. Using separate training and test (and ideally validation) sets helps produce generalized, balanced models.
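As a minimal illustration of the partitioning step, the sketch below splits a dataset into training and test sets with scikit-learn. The file name and split ratio are hypothetical choices for the example, not part of the original article.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset file for illustration
df = pd.read_csv('training_data.csv')

# Hold out 20% of the rows for evaluation; fix the seed for reproducibility
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

print(len(train_df), "training rows,", len(test_df), "test rows")
```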
Data Storage and Accessibility

Storing data in structured or semi-structured databases makes it easily retrievable. For scalability and accessibility, many organizations opt for cloud-based storage solutions.

Generative AI's Need for Data

Different generative AI models require different types of data:

- Images: GANs, used to create synthetic images, rely heavily on large, diverse image datasets. They can generate artwork, fashion designs, or even medical images.
- Text: Models like OpenAI's GPT series require vast text corpora to generate human-like text, producing news articles, stories, or technical manuals.
- Audio: Generative models can produce music or speech and need extensive audio samples to capture nuances.
- Mixed modalities: Some models integrate text, image, and audio data to generate multimedia content.

Conclusion

The capabilities of generative AI models are well established across industries and roles such as content creation, design, and problem-solving. But for these models to keep evolving, improving, and generating better results, it is essential to recognize and use the right data.

Enterprises that recognize the importance of data and invest in building a solid data foundation will be well positioned to harness the creative power of generative AI in the years ahead. As generative AI advances, the role of data becomes even more critical. Just as a building needs a strong foundation to stand the test of time, generative AI needs a solid data foundation to produce meaningful, accurate, and valuable outputs. Building and preparing this foundation is essential, and investing time and resources in it will pave the way for breakthroughs and innovations in generative AI.

Author Bio

Shankar Narayanan (aka Shanky) has worked on numerous cloud and emerging technologies, including Azure, AWS, Google Cloud, IoT, Industry 4.0, and DevOps. He has led architecture design and implementation for many enterprise customers, helping them take the first step toward a long and successful cloud journey. He was an early adopter of Microsoft Azure and Snowflake Data Cloud. Shanky likes to give back to the community: he contributes to open source, is a frequently sought-after speaker, and has delivered numerous talks on Microsoft technologies and Snowflake. He is recognized as a Data Superhero by Snowflake and as an SAP Community Topic Leader by SAP.


ChatGPT for Marketing

Valentina Alto
15 Sep 2023
8 min read
This article is an excerpt from the book Modern Generative AI with ChatGPT and OpenAI Models, by Valentina Alto.

Introduction

In this article, we focus on how marketers can leverage ChatGPT, looking at its main use cases in this domain and how marketers can use it as a valuable assistant. We will learn how ChatGPT can assist in the following activities:

- Marketers' need for ChatGPT
- New product development and the go-to-market strategy

By the end of this article, you will be able to leverage ChatGPT for marketing-related activities and to boost your productivity.

Technical requirements

You will need an OpenAI account to access ChatGPT and DALL-E. All the code and prompts within this chapter are available in the GitHub repository of the book: https://github.com/PacktPublishing/The-Ultimate-Guide-to-ChatGPTand-OpenAI.

Marketers' need for ChatGPT

Marketing is probably the domain where the creative power of ChatGPT and the OpenAI models can be leveraged in its purest form. They are practical tools for supporting creative development of new products, marketing campaigns, search engine optimization, and so on. Overall, they help marketers automate and streamline many aspects of their work while improving the quality and effectiveness of their marketing efforts.

Here is an example. One of the most prominent and promising use cases of ChatGPT in marketing is personalized marketing. ChatGPT can be used to analyze customer data and generate personalized marketing messages that resonate with individual customers. For example, a marketing team can use ChatGPT to analyze customer data and develop targeted email campaigns tailored to specific customer preferences and behavior. This can increase the likelihood of conversion and lead to greater customer satisfaction. By providing insights into customer sentiment and behavior, generating personalized marketing messages, supporting personalized customer service, and generating content, ChatGPT can help marketers deliver exceptional customer experiences and drive business growth.

This is one of many examples of ChatGPT applications in marketing. In the following sections, we will look at concrete examples of end-to-end marketing projects supported by ChatGPT.

New product development and the go-to-market strategy

The first way to introduce ChatGPT into your marketing activity might be as an assistant for new product development and the go-to-market (GTM) strategy. In this section, we walk through a step-by-step guide to developing and promoting a new product. Imagine you own a running clothing brand called RunFast that has so far produced only shoes, and you want to expand the business with a new product line. We will start by brainstorming ideas and end with a GTM strategy, with everything supported by ChatGPT.

Brainstorming ideas: The first thing ChatGPT can support you with is brainstorming and drafting options for your new product line, along with the reasoning behind each suggestion. So, let's ask what kind of new product line to focus on:

Figure 7.1 – Example of new ideas generated by ChatGPT
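The chapter works in the ChatGPT interface, and the figures referenced here are screenshots of those conversations. Purely as an illustration, the same brainstorming prompt could also be issued programmatically; the sketch below uses the pre-1.0 OpenAI Python SDK, and the prompt wording and model choice are assumptions for this sketch, not taken from the book.

```python
import openai

openai.api_key = "PASTE-YOUR-KEY"  # placeholder; use your own key

# Illustrative prompt mirroring the brainstorming step described above
prompt = (
    "I own a running clothing brand called RunFast that currently sells only shoes. "
    "Suggest three ideas for a new product line and explain the reasoning behind each."
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # model choice is an assumption for this sketch
    messages=[{"role": "user", "content": prompt}],
)
print(response["choices"][0]["message"]["content"])
```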
Out of the three suggestions, we will pick the first one, because of the reasoning ChatGPT gave: it is indeed a complementary product to our running shoes, so we will proceed with that one.

Product name: Now that the idea is fixed, we need a catchy name for it. Again, I ask ChatGPT for several options so I can pick my favorite:

Figure 7.2 – A list of potential product names

SprintSoles sounds good enough to me, so I'll go ahead with that one.

Generating catchy slogans: On top of the product name, I also want to communicate the intent behind the name and the mission of the product line, so that the target audience is captured by it. I want to inspire trust and loyalty in my customers and let them see themselves reflected in the mission behind the new product line.

Figure 7.3 – A list of slogans for my new product name

Great: I'm satisfied with the product name and slogan, which I will use later on to create a unique social media announcement. Before doing that, I want to spend more time on market research for the target audience.

Figure 7.4 – List of groups of target people to reach with my new product line

It's important to identify different clusters within your audience so you can differentiate your messaging. In my case, I want the product line to address different groups of people, such as competitive runners, casual runners, and fitness enthusiasts.

Product variants and sales channels: Based on the preceding clusters of potential customers, I can generate product variants tailored to specific audiences:

Figure 7.5 – Example of variants of the product line

Similarly, I can also ask ChatGPT to suggest different sales channels for each of the preceding groups:

Figure 7.6 – Suggestions for different sales channels by ChatGPT

Standing out from the competition: I want my product line to stand out and emerge in a very saturated market. With that in mind, I ask ChatGPT to include social considerations such as sustainability and inclusivity:

Figure 7.7 – Example of outstanding features generated by ChatGPT

As you can see, it generated interesting features that could make the product line unique.

Product description: Now it's time to start building the GTM plan. First of all, I want to generate a product description for my website, including all the earlier unique differentiators.

Figure 7.8 – Example of description and SEO keywords generated by ChatGPT

Fair price: Another key element is determining a fair price for the product. Since I differentiated product variants for different audiences (competitive runners, casual runners, and fitness enthusiasts), I also want a price range that takes this clustering into account.

Figure 7.9 – Price ranges for product variants

We are almost there. We have gone through many steps of new product development and go-to-market, and in each of them ChatGPT acted as a great support tool. As one last thing, we can ask ChatGPT to generate an Instagram post about our new product, including relevant hashtags and SEO keywords.
We can then generate the image with DALL-E!

Figure 7.10 – Social media post generated by ChatGPT

And, with the special contribution of DALL-E, here is the final result:

Figure 7.11 – Instagram post entirely generated by ChatGPT and DALL-E

Of course, many elements are still missing for complete product development and go-to-market. Yet with the support of ChatGPT (and the special contribution of DALL-E, which you can try on your own at https://openai.com/product/dall-e-2), we managed to brainstorm a new product line and its variants, identify potential customers, draft catchy slogans, and finally generate a nice Instagram post to announce the launch of SprintSoles.

Conclusion

In this article, we explored ways in which marketers can use ChatGPT to enhance their marketing strategies. We learned that ChatGPT can help in developing new products as well as defining their go-to-market strategy.

The importance of ChatGPT for marketers lies in its potential to change the way companies engage with their customers. By leveraging the power of NLP, machine learning, and big data, ChatGPT allows companies to create more personalized and relevant marketing messages, improve customer support and satisfaction, and ultimately drive sales and revenue.

Author Bio

Valentina Alto graduated in data science in 2021. Since 2020, she has been working at Microsoft as an Azure solution specialist, and since 2022 she has focused on data and AI workloads within the manufacturing and pharmaceutical industries. She works closely with system integrators on customer projects to deploy cloud architectures with a focus on modern data platforms, data mesh frameworks, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. Since starting her academic journey, she has written technical articles on statistics, machine learning, deep learning, and AI in various publications and has authored a book on the fundamentals of machine learning with Python.


Demystifying Azure OpenAI Service

Olivier Mertens, Breght Van Baelen
15 Sep 2023
16 min read
This article is an excerpt from the book Azure Data and AI Architect Handbook, by Olivier Mertens and Breght Van Baelen. Master core data architecture design concepts and Azure Data & AI services to gain a cloud data and AI architect's perspective on developing end-to-end solutions.

Introduction

OpenAI has risen immensely in popularity with the arrival of ChatGPT. The company, which started as a nonprofit organization, has been the driving force behind the GPT and DALL-E model families, backed by intense research at massive scale. The speed at which new models are released and become available on Azure has become impressive lately.

Microsoft has a close partnership with OpenAI, following heavy investment in the company by Microsoft. The models created by OpenAI use Azure infrastructure for development and deployment. Within this partnership, OpenAI carries the responsibility for research and innovation, coming up with new models and new versions of its existing models, while Microsoft manages the enterprise-scale go-to-market. Microsoft provides infrastructure and technical guidance, along with reliable SLAs, to get large organizations started with integrating these models, fine-tuning them on their own data, and hosting private deployments.

Like the face recognition model in Azure Cognitive Services, powerful LLMs such as the ones in Azure OpenAI Service could be used to cause harm at scale. Therefore, this service is also gated according to Microsoft's guidelines on responsible AI.

At the time of writing, Azure OpenAI Service offers access to the following models:

- The GPT model family:
  - GPT-3.5
  - GPT-3.5-Turbo (the model behind ChatGPT)
  - GPT-4
- Codex
- DALL-E 2

Let's dive deeper into these models.

The GPT model family

GPT models, where GPT stands for generative pre-trained transformer, made their first appearance in 2018 with GPT-1, trained on a dataset of roughly 7,000 books. This was a good advance in performance at the time, but the model was vastly outdated only a couple of years later. GPT-2 followed in 2019, trained on the WebText dataset (a collection of 8 million web pages). In 2020, GPT-3 was released, trained on the WebText dataset, two book corpora, and English Wikipedia.

In these years, the major breakthroughs came not so much from more efficient algorithms as from the scale of the architectures and datasets. This becomes clear when we look at the growing number of parameters in every new generation of the model, as shown in the following figure.

Figure 9.3 – A visual comparison between the sizes of the different generations of GPT models, based on their trainable parameters

The question is often raised of how to interpret this concept of parameters. An easy analogy is the number of neurons in a brain. Although parameters in a neural network are not equivalent to its artificial neurons, the two are heavily correlated: more parameters means more neurons, and the more neurons there are, the more knowledge the model can grasp.

Since the arrival of GPT-3, two major adaptations of the third-generation model have appeared. The first is GPT-3.5. This model has a similar architecture to GPT-3 but was trained on text and code, whereas the original GPT-3 saw only text data during training.
Therefore, GPT-3.5 is capable of generating and understanding code. GPT-3.5, in turn, became the basis for the next adaptation, the vastly popular ChatGPT model, which has been fine-tuned for conversational usage and further trained with reinforcement learning to get a sense of ethical behavior.

GPT model sizes

The OpenAI models are available in different sizes, all named after remarkable scientists. The GPT-3.5 model, specifically, is available in four versions:

- Ada
- Babbage
- Curie
- Davinci

The Ada model is the smallest, most lightweight model, while Davinci is the most complex and most performant. The larger the model, the more expensive it is to use, host, and fine-tune, as shown in Figure 9.4. As a side note, when you hear about the enormous parameter counts of new GPT models, this usually refers to the Davinci model.

Figure 9.4 – A trade-off exists between lightweight, cheap models and highly performant, complex models

With a trade-off between cost and performance available, an architect can start thinking about which model size best fits a solution. In practice, this often comes down to empirical testing: if the cheaper model can do the job at an acceptable level, it is the more cost-effective choice. Note that performance here means predictive power, not the speed at which the model produces predictions; the larger models are slower to output a prediction than the lightweight ones.

Understanding the difference between GPT-3.5 and GPT-3.5-Turbo (ChatGPT)

GPT-3.5 and GPT-3.5-Turbo are both used to generate natural language text, but they are used in different ways. GPT-3.5 is classified as a text completion model, whereas GPT-3.5-Turbo is referred to as conversational AI.

To understand the contrast between the two models, we first need to introduce the concept of contextual learning. These models are trained to understand the structure of the input prompt in order to provide a meaningful answer. Contextual learning is often split into few-shot, one-shot, and zero-shot learning. A "shot", in this context, is an example given in the input prompt: few-shot learning provides multiple examples in the prompt, one-shot learning provides a single example, and zero-shot means no examples are given. In the latter case, the model has to figure out in a different way what is being asked of it (such as interpreting the goal of a question).

Consider the following example:

Figure 9.5 – Few-shot learning takes up the most tokens and requires more effort, but often results in higher-quality model outputs

While few-shot learning takes more prompt engineering effort, it usually yields better results. A text completion model such as GPT-3.5 performs vastly better with few-shot prompts than with one-shot or zero-shot prompts: as the name suggests, the model figures out the structure of the input prompt (i.e., the examples) and completes the text accordingly.

Conversational AI, such as ChatGPT, is more capable at zero-shot learning. In the preceding example, both models are able to output the correct answer, but as questions become more complex, a noticeable difference in predictive performance emerges.
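To make the distinction concrete, here is a minimal, made-up illustration of the two prompt styles; the task and the example labels are assumptions for this sketch, not taken from the book's figure.

```python
# Zero-shot: the model must infer the task from the question alone
zero_shot_prompt = (
    "Classify the sentiment of this review as Positive or Negative: "
    "'The battery died after a day.'"
)

# Few-shot: a couple of labelled examples show the model the expected structure,
# and the completion model simply continues the pattern
few_shot_prompt = (
    "Review: 'Great screen, fast delivery.' -> Positive\n"
    "Review: 'Stopped working within a week.' -> Negative\n"
    "Review: 'The battery died after a day.' ->"
)
```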
Additionally, GPT-3.5-Turbo remembers information from previous input prompts within a chat session, whereas GPT-3.5 prompts are handled independently.

Innovating with GPT-4

With the arrival of GPT-4, the focus has shifted toward multimodality. Multimodality in AI refers to the ability of an AI system to process and interpret information from multiple modalities, such as text, speech, images, and videos, and to combine data from these different sources and formats.

GPT-4 can additionally take images as input and interpret them, and it has stronger reasoning and overall performance than its predecessors. In one famous example, GPT-4 deduced that balloons would fly upward when asked what would happen if someone cut the balloons' strings, as shown in the following photo.

Figure 9.6 – The image in question that was used in the experiment. When asked what would happen if the strings were cut, GPT-4 replied that the balloons would start flying away

Some adaptations of GPT-4, such as the one used in Bing Chat, have the extra feature of citing sources in generated answers. This is a welcome addition, as hallucination was a significant flaw in earlier GPT models.

Hallucination

Hallucination, in the context of AI, refers to generating wrong predictions with high confidence. This can cause far more harm than a model indicating that it is not sure how to respond or does not know the answer.

Next, we will look at the Codex model.

Codex

Codex is a model that is architecturally similar to GPT-3 but fully focused on code generation and understanding. An adaptation of Codex forms the underlying model for GitHub Copilot, a tool that provides suggestions and auto-completion for code based on context and natural language inputs, available for various integrated development environments (IDEs) such as Visual Studio Code. Rather than a ready-to-use solution, Codex is (like the other models in Azure OpenAI) available as a model endpoint intended for integration into custom apps.

The Codex model was initially trained on a collection of 54 million code repositories, amounting to billions of lines of code, with the majority of the training data written in Python. Codex can generate code in different programming languages from an input prompt in natural language (text-to-code), explain what blocks of code do (code-to-text), add comments to code, and debug existing code.

Codex is available as a C (Cushman) and D (Davinci) model; lightweight Codex models (A or B series) currently do not exist.

Models such as Codex and GitHub Copilot are a great way to boost the productivity of software engineers, data analysts, data engineers, and data scientists. They do not replace these roles, as their accuracy is not perfect; rather, they give engineers the opportunity to start editing from a fairly well-written block of code instead of coding from scratch.

DALL-E 2

The DALL-E model family is used to generate visuals. Given a description in natural language as the input prompt, it generates a series of matching images. While the other models are often used at scale in large enterprises, DALL-E 2 tends to be more popular with smaller businesses. Organizations that lack an in-house graphic designer can make great use of DALL-E to generate visuals for banners, brochures, emails, web pages, and so on. DALL-E 2 has only a single model size to choose from, although open-source alternatives exist if a lightweight version is preferred.

Fine-tuning and private deployments

As a data architect, it is important to understand the cost structure of these models. The first option is to use the base model in a serverless manner. Similar to how we work with Azure Cognitive Services, users get a key for the model's endpoint and simply pay per prediction. For DALL-E 2, costs are incurred per 100 images, while the GPT and Codex models are priced per 1,000 tokens. For every request made to a GPT or Codex model, all tokens of the input prompt and the output are added up to determine the cost of the prediction.

Tokens

In natural language processing, a token is a sequence of characters that represents a distinct unit of meaning in a text. These units do not necessarily correspond to words, although for short words this is mostly the case. Tokens are the basic building blocks used to process and analyze text data. A good rule of thumb for English is that one token is, on average, four characters, so dividing the total character count by four gives a reasonable estimate of the number of tokens.
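As a quick illustration of that rule of thumb, the sketch below estimates a token count from the character count. The sample sentence is arbitrary, and for exact counts a real tokenizer (such as OpenAI's tiktoken library) should be used instead.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters-per-token rule of thumb."""
    return round(len(text) / chars_per_token)

sample = "Azure OpenAI Service prices GPT and Codex models per 1,000 tokens."
print(estimate_tokens(sample))  # roughly len(sample) / 4 tokens
```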
Azure OpenAI Service also grants extensive fine-tuning functionality. Up to 1 GB of data can be uploaded per Azure OpenAI instance for fine-tuning. This may not sound like much, but note that we are not training a new model from scratch: the goal of fine-tuning is to retrain the last few layers of the model to increase performance on specific tasks or company-specific knowledge, and for that purpose 1 GB of data is more than sufficient.

When a fine-tuned model is added to a solution, two additional costs are incurred. On top of the token-based inference cost, we need to take into account training and hosting costs. The hourly training cost can be quite high due to the hardware required, but compared to the inference and hosting costs over a model's life cycle, it remains a small percentage. Also, since we are no longer using the base model but our own "version" of it, we need to host the model ourselves, which results in an hourly hosting cost.

Now that we have covered both pre-trained model collections, Azure Cognitive Services and Azure OpenAI Service, let's move on to custom development using Azure Machine Learning.

Grounding LLMs

One of the most popular use cases for LLMs involves providing our own data as context to the model, often referred to as grounding. Part of the reason for its popularity is that many business cases can be solved with a consistent technological architecture: we can reuse the same solution and, by plugging in different knowledge bases, serve different end users.

For example, by placing an LLM on top of public data such as product manuals or product specifications, it is easy to develop a customer support chatbot. If we swap this knowledge base of product information for something like HR documents, we can reuse the same tech stack to create an internal HR virtual assistant.

A common misconception about grounding is that the model needs to be trained on our own data. This is not the case. Instead, after a user asks a question, the relevant document (or paragraphs) is injected into the prompt behind the scenes and lives in the memory of the model for the duration of the chat session (when working with conversational AI) or for a single prompt. The context, as we call it, is then wiped clean and all the information is forgotten.
If we wanted to cache this information, we could use a framework such as LangChain or Semantic Kernel, but that is out of the scope of this book.

The fact that the model does not get retrained on our own data plays a crucial role in data privacy and cost optimization. As shown in the section on fine-tuning, as soon as a base model is altered, an hourly operating cost is added to run a private deployment of the model. In addition, information from the documents cannot leak to other users working with the same model.

Figure 9.7 visualizes the architectural concepts used to ground an LLM.

Figure 9.7 – Architecture to ground an LLM

The first step is to turn the documents that should be accessible to the model into embeddings. Simply put, embeddings are mathematical representations of natural language text. By turning text into embeddings, it is possible to accurately calculate the similarity (from a semantics perspective) between two pieces of text.

To do this, we can leverage Azure Functions, a service that runs pieces of code in serverless functions and often forms the glue between components by handling their interactions. In this case, an Azure function (at the bottom left of Figure 9.7) grabs the relevant documents from the knowledge base, breaks them into chunks (to accommodate the model's maximum token limits), and generates an embedding for each one. Each embedding is then stored, alongside the natural language text, in a vector database. This function should be run for all historic data that will be accessible to the model, as well as triggered for every new, relevant document added to the knowledge base.

Once the vector database is in place, users can start asking questions. However, user questions are not sent directly to the model endpoint. Instead, another Azure function (shown at the top of Figure 9.7) turns the user question into an embedding and checks its similarity against the embeddings of the documents or paragraphs in the vector database. The top X most relevant text chunks are then injected into the prompt as context, and the prompt is sent to the LLM. Finally, the response is returned to the user.
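As a minimal, framework-free sketch of that retrieval flow (not the book's implementation), the core logic looks like the following. The embed() helper is a deterministic placeholder standing in for whichever embedding model and vector store you actually use, and the document chunks are invented for illustration.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic placeholder embedding.
    In a real system, replace this with a call to an embedding model."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    return np.random.default_rng(seed).normal(size=dim)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Offline step: embed each document chunk and keep (text, vector) pairs
chunks = [
    "Product manual, section 1: setting up the device.",
    "Product manual, section 2: troubleshooting common errors.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Online step: embed the question, rank chunks by similarity,
# and inject the top-k chunks into the prompt sent to the LLM
def build_grounded_prompt(question: str, k: int = 2) -> str:
    q_vec = embed(question)
    ranked = sorted(index, key=lambda pair: cosine_similarity(q_vec, pair[1]), reverse=True)
    context = "\n\n".join(chunk for chunk, _ in ranked[:k])
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_grounded_prompt("How do I fix a startup error?"))
```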
Conclusion

Azure OpenAI Service, a collaboration between OpenAI and Microsoft, delivers powerful AI models. The GPT model family has evolved impressively from GPT-1 to GPT-4, with GPT-3.5-Turbo (ChatGPT) excelling in conversational AI and GPT-4 introducing multimodal capabilities across text, speech, images, and video. Codex specializes in code generation, while DALL-E 2 creates visuals from text descriptions; together these models empower developers and designers. Customization via fine-tuning offers cost-effective solutions for specific tasks, and grounding language models with your own data preserves data privacy and keeps costs in check. This collaboration holds promise for innovative AI applications across many domains.

Author Bio

Olivier Mertens is a cloud solution architect for Azure data and AI at Microsoft, based in Dublin, Ireland. In this role, he assists organizations in designing their enterprise-scale data platforms and analytical workloads. Alongside his role as an architect, Olivier leads the technical AI expertise for Microsoft EMEA in the corporate market, which includes leading knowledge sharing and internal upskilling as well as solving highly complex or strategic customer AI cases. Before his time at Microsoft, he worked as a data scientist at a Microsoft partner in Belgium. Olivier is a lecturer on generative AI and AI solution architectures and a keynote speaker on AI, and he holds a master's degree in information management, a postgraduate degree as an AI business architect, and a bachelor's degree in business management.

Breght Van Baelen is a Microsoft employee based in Dublin, Ireland, and works as a cloud solution architect for the data and AI pillar in Azure. He provides guidance to organizations building large-scale analytical platforms and data solutions. In addition, Breght was chosen as an advanced cloud expert for Power BI and is responsible for providing technical expertise in Europe, the Middle East, and Africa. Before his time at Microsoft, he worked as a data consultant at Microsoft Gold partners in Belgium, where he led a team of eight data and AI consultants as a data science lead. Breght holds a master's degree in computer science from KU Leuven, specializing in AI, and a bachelor's degree in computer science from the University of Hasselt.


AI_Distilled #17: Numenta’s NuPIC, Adept’s Persimmon-8B, Hugging Face Rust ML Framework, NVIDIA’s TensorRT-LLM, Azure ML PromptFlow, Siri's Gen AI Enhancements

Merlyn Shelley
15 Sep 2023
11 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

👋 Hello,

"If we don't embrace AI, it will move forward without us. Now is the time to harness AI's potential for the betterment of society."
– Fei-Fei Li, Computer Scientist and AI Expert

AI is proving to be a real game-changer worldwide, bringing new perspectives to everyday affairs in every field. No wonder Apple is heavily investing in Siri's generative AI enhancement and Microsoft has pledged legal protection for AI-generated copyright breaches; at the same time, AI's massive cooling requirements have driven a 34% increase in water consumption in Microsoft's data centers. Say hello to the latest edition of our AI_Distilled #17, where we talk about all things LLM, NLP, GPT, and Generative AI!

In this edition, we present the latest AI developments from across the world, including NVIDIA's TensorRT-LLM enhancing large language model inference on H100 GPUs, Meta developing a powerful AI system to compete with OpenAI, Google launching the Digital Futures Project to support responsible AI, Adept open-sourcing a powerful language model with under 10 billion parameters, and Numenta introducing NuPIC, which promises up to 100 times more efficient AI.

We know how much you love our curated AI secret knowledge resources. This week, we're here with some amazing tutorials on building an AWS conversational AI app with AWS Amplify, how to evaluate legal language models with Azure ML PromptFlow, deploying generative AI models on Amazon EKS with a step-by-step guide, Automate It with Zapier and Generative AI, and generating realistic textual synthetic data using LLMs.

What do you think of this issue and our newsletter? Please consider taking the short survey below to share your thoughts and you will get a free PDF of the "The Applied Artificial Intelligence Workshop" eBook upon completion. Complete the Survey. Get a Packt eBook for Free!

Writer's Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week's newsletter content!

Cheers,
Merlyn Shelley
Editor-in-Chief, Packt

⚡ TechWave: AI/GPT News & Analysis

Google Launches Digital Futures Project to Support Responsible AI: Google has initiated the Digital Futures Project, accompanied by a $20 million fund from Google.org to provide grants to global think tanks and academic institutions. This project aims to unite various voices to understand and address the opportunities and challenges presented by AI. It seeks to support researchers, organize discussions, and stimulate debates on public policy solutions for responsible AI development. The fund will encourage independent research on topics like AI's impact on global security, labor, and governance structures. Inaugural grantees include renowned institutions like the Aspen Institute and MIT Work of the Future.

Microsoft to Provide Legal Protection for AI-Generated Copyright Breaches: Microsoft has committed to assuming legal responsibility for copyright infringement related to material generated by its AI software used in Word, PowerPoint, and coding tools. The company will cover legal costs for commercial customers who face lawsuits over tools or content produced by AI. This includes services like GitHub Copilot and Microsoft 365 Copilot. The move aims to ease concerns about potential clashes with content owners and make the software more user-friendly.
Other tech companies, such as Adobe, have made similar pledges to indemnify users of AI tools. Microsoft's goal is to provide reassurance to paying users amid the growing use of generative AI, which may reproduce copyrighted content.

NVIDIA TensorRT-LLM Enhances Large Language Model Inference on H100 GPUs: NVIDIA introduces TensorRT-LLM, a software solution that accelerates and optimizes LLM inference. This open-source software incorporates advancements achieved through collaboration with leading companies. TensorRT-LLM is compatible with Ampere, Lovelace, and Hopper GPUs, aiming to streamline LLM deployment. It offers an accessible Python API for defining and customizing LLM architectures without requiring deep programming knowledge. Performance improvements are demonstrated with real-world datasets, including a 4.6x acceleration for Meta's Llama 2. Additionally, TensorRT-LLM helps reduce total cost of ownership and energy consumption in data centers, making it a valuable tool for the AI community.

Meta Developing Powerful AI System to Compete with OpenAI: The Facebook parent company is reportedly working on a new AI system that aims to rival the capabilities of OpenAI's advanced models. The company intends to launch this AI model next year, and it is expected to be significantly more powerful than Meta's current offering, Llama 2, an open-source AI language model. Llama 2 was introduced in July and is distributed through Microsoft's Azure services to compete with OpenAI's ChatGPT and Google's Bard. This upcoming AI system could assist other companies in developing sophisticated text generation and analysis services. Meta plans to commence training on this new AI system in early 2024.

Adept Open-Sources a Powerful Language Model with <10 Billion Parameters: Adept announces the open-source release of Persimmon-8B, a highly capable language model with fewer than 10 billion parameters. This model, made available under an Apache license, is designed to empower the AI community for various use cases. Persimmon-8B stands out for its substantial context size, being 4 times larger than LLaMA2 and 8 times more than GPT-3. Despite using only 0.37x the training data of LLaMA2, it competes with its performance. It includes 70k unused embeddings for multimodal extensions and offers unique inference code combining speed and flexibility. Adept expects this release to inspire innovation in the AI community.

Apple Invests Heavily in Siri's Generative AI Enhancement: Apple has significantly increased its investment in AI, particularly in developing conversational chatbot features for Siri. The company is reportedly spending millions of dollars daily on AI research and development. CEO Tim Cook expressed a strong interest in generative AI. Apple's AI journey began four years ago when John Giannandrea, head of AI, formed a team to work on LLMs. The Foundational Models team, led by Ruoming Pang, is at the forefront of these efforts, rivaling OpenAI's investments. Apple plans to integrate LLMs into Siri to enhance its capabilities, but the challenge lies in fitting these large models onto devices while maintaining privacy and performance standards.

Numenta Introduces NuPIC: Revolutionizing AI Efficiency by 100 Times: Numenta, a company bridging neuroscience and AI, has unveiled NuPIC (Numenta Platform for Intelligent Computing), a groundbreaking solution rooted in 17 years of brain research.
Developed by computing pioneers Jeff Hawkins and Donna Dubinsky, NuPIC aims to make AI processing up to 100 times more efficient. Partnering with game startup Gallium Studios, NuPIC enables high-performance LLMs on CPUs, prioritizing user trust and privacy. Unlike GPU-reliant models, NuPIC's CPU focus offers cost savings, flexibility, and control while maintaining high throughput and low latency.

AI Development Increases Water Consumption in Microsoft Data Centers by 34%: The development of AI tools like ChatGPT has led to a 34% increase in Microsoft's water consumption, raising concerns in the city of West Des Moines, Iowa, where its data centers are located. Microsoft, along with tech giants like OpenAI and Google, has seen rising demand for AI tools, which comes with significant costs, including increased water usage. Microsoft disclosed a 34% spike in global water consumption from 2021 to 2022, largely attributed to AI research. A study estimates that ChatGPT consumes 500 milliliters of water every time it's prompted. Google also reported a 20% growth in water use, partly due to AI work. Microsoft and OpenAI stated they are working to make AI systems more efficient and environmentally friendly.

🔮 Looking for a New Book from Packt's Expert Community?

Automate It with Zapier and Generative AI - By Kelly Goss, Philip Lakin

Are you excited to supercharge your work with Gen AI's automation skills? Check out this new guide that shows you how to become a Zapier automation pro, making your work more efficient and productive in no time! It covers planning, configuring workflows, troubleshooting, and advanced automation creation. It emphasizes optimizing workflows to prevent errors and task overload. The book explores new built-in apps, AI integration, and complex multi-step Zaps. Additionally, it provides insights into account management and Zap issue resolution for improved automation skills. Read through Chapter 1, unlocked here...

🌟 Secret Knowledge: AI/LLM Resources

Understanding Liquid Neural Networks: A Primer on AI Advancements: In this post, you'll learn how liquid neural networks are transforming the AI landscape. These networks, inspired by the human brain, offer a unique and creative approach to problem-solving. They excel in complex tasks such as weather prediction, stock market analysis, and speech recognition. Unlike traditional neural networks, liquid neural networks require significantly fewer neurons, making them ideal for resource-constrained environments like autonomous vehicles. These networks excel in handling continuous data streams but may not be suitable for static data. They also provide better causality handling and interpretability.

Navigating Generative AI with FMOps and LLMOps: A Practical Guide: In this informative post, you'll gain valuable insights into the world of generative AI and its operationalization using FMOps and LLMOps principles. The authors delve into the challenges businesses face when integrating generative AI into their operations. You'll explore the fundamental differences between traditional MLOps and these emerging concepts. The post outlines the roles various teams play in this process, from data engineers to data scientists, ML engineers, and product owners. The guide provides a roadmap for businesses looking to embrace generative AI.

AI Compiler Quartet: A Breakdown of Cutting-Edge Technologies: Explore Microsoft's groundbreaking "heavy-metal quartet" of AI compilers: Rammer, Roller, Welder, and Grinder.
These compilers address the evolving challenges posed by AI models and hardware. Rammer focuses on optimizing deep neural network (DNN) computations, improving hardware parallel utilization. Roller tackles the challenge of memory partitioning and optimization, enabling faster compilation with good computation efficiency. Welder optimizes memory access, particularly vital as AI models become more memory-intensive. Grinder addresses complex control flow execution in AI computation. These AI compilers collectively offer innovative solutions for parallelism, compilation efficiency, memory, and control flow, shaping the future of AI model optimization and compilation.

💡 MasterClass: AI/LLM Tutorials

Exploring IoT Data Simulation with ChatGPT and MQTTX: In this comprehensive guide, you'll learn how to harness the power of AI, specifically ChatGPT, and the MQTT client tool, MQTTX, to simulate and generate authentic IoT data streams. Discover why simulating IoT data is crucial for system verification, customer experience enhancement, performance assessment, and rapid prototype design. The article dives into the integration of ChatGPT and MQTTX, introducing the "Candidate Memory Bus" to streamline data testing. Follow the step-by-step guide to create simulation scripts with ChatGPT and efficiently simulate data transmission with MQTTX.

Revolutionizing Real-time Inference: SageMaker Unveils Streaming Support for Generative AI: Amazon SageMaker now offers real-time response streaming, transforming generative AI applications. This new feature enables continuous response streaming to clients, reducing time-to-first-byte and enhancing interactive experiences for chatbots, virtual assistants, and music generators. The post guides you through building a streaming web application using SageMaker real-time endpoints for interactive chat use cases. It showcases deployment options with AWS Large Model Inference (LMI) and Hugging Face Text Generation Inference (TGI) containers, providing a seamless, engaging conversation experience for users.

Implementing Effective Guardrails for Large Language Models: Guardrails are crucial for maintaining trust in LLM applications as they ensure compliance with defined principles. This guide presents two open-source tools for implementing LLM guardrails: Guardrails AI and NVIDIA NeMo-Guardrails. Guardrails AI offers Python-based validation of LLM responses, using the RAIL specification. It enables developers to define output criteria and corrective actions, with step-by-step instructions for implementation. NVIDIA NeMo-Guardrails introduces Colang, a modeling language for flexible conversational workflows. The guide explains its syntax elements and event-driven design. Comparing the two, Guardrails AI suits simple tasks, while NeMo-Guardrails excels in defining advanced conversational guidelines.

🚀 HackHub: Trending AI Tools

cabralpinto/modular-diffusion: Python library for crafting and training personalized Diffusion Models with PyTorch.

cofactoryai/textbase: Simplified Python chatbot development using NLP and ML with Textbase's on_message function in main.py.

microsoft/BatteryML: Open-source ML tool for battery analysis, aiding researchers in understanding electrochemical processes and predicting battery degradation.

facebookresearch/co-tracker: Swift transformer-based video tracker with Optical Flow, pixel-level tracking, grid sampling, and manual point selection.
explodinggradients/ragas: Framework evaluates Retrieval Augmented Generation pipelines, enhancing LLM context with external data using research-based tools. 

OpenAI and ChatGPT for Enterprises

Valentina Alto
14 Sep 2023
9 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

This article is an excerpt from the book, Modern Generative AI with ChatGPT and OpenAI Models, by Valentina Alto. Harness the power of AI with innovative, real-world applications, and unprecedented productivity boosts, powered by the latest advancements in AI technology like ChatGPT and OpenAI.

Introduction

In this article, we'll focus on the enterprise-level applications of OpenAI models and introduce the partnership between OpenAI and Microsoft and Azure OpenAI (AOAI) Service. We will go through the milestones and developments of Microsoft in the field of artificial intelligence (AI), highlighting the journey that brought the Azure cloud into the game of OpenAI, and why this is a game-changer for large organizations. Finally, we will consider the topic of responsible AI and how to make sure your AI system complies with ethical standards.

In this article, we will discuss the following topics:

The history of the partnership between Microsoft and OpenAI and the introduction of AOAI Service
The role of the public cloud in the context of OpenAI models
Responsible AI

Technical requirements

The following are the technical requirements for this article:

An Azure subscription, which you can create for free here: https://azure.microsoft.com/free/cognitive-services.
Access granted to Azure OpenAI in the desired Azure subscription. At the time of writing, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access.

Azure OpenAI Service

AOAI Service is a product of Microsoft that provides REST API access to OpenAI's powerful language models such as GPT-3.5, Codex, and DALL-E. You can use these models for the very same tasks as OpenAI models, such as content generation, summarization, semantic search, natural language, and code translation.

In the context of the Microsoft Azure AI portfolio, AOAI Service is collocated among the following Cognitive Services offerings:

Figure – AOAI Service General Availability (GA)

As with any other Cognitive Services offering, AOAI offers models that have already been trained and are ready to be consumed.

To create your AOAI resource, follow these instructions:

1. Navigate to the Azure portal at https://ms.portal.azure.com.
2. Click on Create a resource.
3. Type azure openai and click on Create.
4. Fill in the required information and click on Review + create.

This is shown in the following screenshot:

Figure – Steps to create an AOAI resource

This process might take a few minutes. Once it is ready, you can directly jump to its user-friendly interface, AOAI Playground, to test your models before deploying them:

Figure – AOAI UI and Playground

Note that AOAI Playground looks almost identical to the OpenAI Playground version we saw in Chapter 2. The difference here is that, to use AOAI models, you have to initiate a deployment, which is a serverless compute instance you can attach to a model.
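Once a deployment exists, it can be consumed from your own applications much like the public OpenAI API. The following is a minimal, hedged sketch using the pre-1.0 openai Python package that was current when this book was written; the resource name, API key, and deployment name are placeholders, not values from the book:

import openai

# Azure OpenAI uses the same SDK as OpenAI, but with Azure-specific settings
openai.api_type = "azure"
openai.api_base = "https://<your-aoai-resource>.openai.azure.com/"  # placeholder resource name
openai.api_version = "2023-05-15"
openai.api_key = "<your-aoai-key>"  # retrieve from the Azure portal or a key vault

response = openai.Completion.create(
    engine="text-davinci-003",  # the *deployment* name, not the underlying model name
    prompt="Summarize the benefits of Azure OpenAI Service in one sentence.",
    max_tokens=60,
    temperature=0.7,
)
print(response.choices[0].text.strip())

The key difference from calling the public OpenAI API is the engine parameter, which refers to your deployment rather than to the model itself.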
The deployment itself can be created either in Playground or on the resource backend page in the Azure portal:

Figure – Creating a new AOAI deployment via Playground (A) or in the Azure portal (B)

For example, I created a deployment called text-davinci-003 with an associated text-davinci-003 model:

Figure 9.7 – An active deployment of AOAI

In AOAI Playground, we can test those models either directly via the user interface or by embedding their APIs into our applications. In the next section, we are going to explore how to interact with Playground and try different models' configurations. In Chapter 10, we will learn how to integrate AOAI's Models API into enterprise applications.

Exploring Playground

AOAI Playground is the easiest way to get familiar with the underlying models and start planning which model's version is the most suitable for your projects. The user interface presents different tabs and workspaces, as shown in the following screenshot:

Figure – Overview of AOAI Playground

Let's explore each of them:

Playground | Chat: The Chat workspace is designed to be used only with conversational models such as GPT-3.5-turbo (the model behind ChatGPT):

Figure – AOAI Chat workspace

It offers a similar experience to ChatGPT itself, with the possibility to configure your model with additional parameters (as we saw in Chapter 2 with OpenAI Playground). Furthermore, there is an additional feature that makes the Chat workspace very interesting, known as System message:

Figure – Example of System message

System message is the set of instructions we give the model to tell it how to behave and interact with us. As with the prompt, the System message represents a key component of a model's configuration, since it massively affects model performance.

For example, let's instruct our model to behave as a JSON formatter assistant:

Figure – Example of a model acting as a JSON formatter assistant

As you can see from the previous screenshot, the model was able to suggest a JSON file through some simple data, such as name and age, without the need to specify any labels.

Playground | Completions: Different from the previous workspace, the Completions workspace offers a sort of white paper where you can interact with your models. While GPT-3.5-turbo is designed for conversational tasks (which means it can be consumed via a chatbot-like interface), the GPT-3 series contains more general-purpose models and can be used for a wide range of language tasks, such as content generation, summarization, and so on.

For example, we could ask our model to generate a quiz by giving it a description of the topic and a one-shot example, as shown here:

Figure – Example of a GPT model generating a quiz

Finally, as in the Chat workspace, with Completions you can configure parameters such as the maximum number of tokens or the temperature (refer to Chapter 2 for a comprehensive list of those parameters and their meanings).

Management | Deployments: Within the Deployments tab, you can create and manage new deployments to be associated with AOAI models. They are depicted here:

Figure – List of AOAI deployments

Each deployment can host only one model. You can edit or delete your deployments at any time. As we mentioned previously, a model deployment is the enabler step for using either the Completions or Chat workspace within AOAI Service.

Management | Models: Within this tab, you can quickly assess the models that are available within AOAI Service and, among them, those that can be deployed (that is, a model that hasn't been deployed yet).
For example, let's consider the following screenshot:

Figure – List of AOAI models

Here, we have text-similarity-curie-001. It doesn't have an associated deployment, so it can be deployed (as the Deployable column shows). On the other hand, text-similarity-ada-002 already has a deployment, so it is not available anymore.

Within this tab, you can also create a custom model by following a procedure called fine-tuning. We explored this in Chapter 2:

Figure – Example of model fine-tuning

Starting from this guided widget, you can upload your training and validation data to produce a customized model, starting from a base model (namely, text-davinci-002), which will be hosted on a dedicated deployment.

Note

In Chapter 2, we saw that the training dataset should align with a specific format of the following type (called JSONL):

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...

To facilitate this formatting, OpenAI has developed a tool that can format your data into this specific format ready for fine-tuning. It can also provide suggestions on how to modify data so that the tool can be used for fine-tuning. Plus, it accepts various data formats as inputs, including CSV, TXT, and JSON.

To use this tool, you can initialize the OpenAI command-line interface (CLI) by running the following command:

pip install --upgrade openai

Once initialized, you can run the tool, as follows:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

Management | File Management: Finally, within the File Management tab, you can govern and upload your training and test data directly from the user interface, as shown here:

Figure – Example of uploading a file within AOAI Service

You can decide to upload files by selecting Local file or Azure blob or other shared web locations. Once you've uploaded your files, you will be able to select them while creating customized models, via the Models tab.

Finally, as mentioned in the previous section, each model comes with a REST API that can be consumed in your applications.

Conclusion

In this article, we saw how the partnership between OpenAI and Microsoft has brought about a powerful and innovative AI solution for enterprise-level organizations: AOAI. This service combines OpenAI's cutting-edge technology with Microsoft's extensive cloud infrastructure to provide businesses with a scalable and customizable platform for building and deploying advanced AI applications.

We also dwelled on Microsoft's strong focus on responsible AI practices and ethics, and how AOAI Service reflects this commitment to responsible AI, with features such as a content filter built into the platform.

As AI continues to transform industries and shape our future, the collaboration between OpenAI and Microsoft marks an important milestone in the development of enterprise-level AI solutions. AOAI empowers businesses to harness the power of AI to drive growth and innovation while ensuring ethical and responsible practices.

Author Bio

Valentina Alto graduated in data science in 2021. Since 2020, she has been working at Microsoft as an Azure solution specialist, and since 2022, she has been focusing on data and AI workloads within the manufacturing and pharmaceutical industry.
She has been working closely with system integrators on customer projects to deploy cloud architecture with a focus on modern data platforms, data mesh frameworks, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. Since commencing her academic journey, she has been writing tech articles on statistics, machine learning, deep learning, and AI in various publications and has authored a book on the fundamentals of machine learning with Python.

Future Trends in Pretraining Foundation Models

Emily Webber
14 Sep 2023
17 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

This article is an excerpt from the book, Pretrain Vision and Large Language Models in Python, by Emily Webber. Master the art of training vision and large language models with conceptual fundaments and industry-expert guidance. Learn about AWS services and design patterns, with relevant coding examples.

Introduction

In this article, we'll explore trends in foundation model application development, like using LangChain to build interactive dialogue applications, along with techniques like retrieval augmented generation to reduce LLM hallucination. We'll explore ways to use generative models to solve classification tasks, human-centered design, and other generative modalities like code, music, product documentation, powerpoints, and more! We'll talk through AWS offerings like SageMaker JumpStart Foundation Models, Amazon Bedrock, Amazon Titan, and Amazon Code Whisperer.

In particular, we'll dive into the following topics:

Techniques for building applications for LLMs
Generative modalities outside of vision and language
AWS offerings in foundation models

Techniques for building applications for LLMs

Now that you've learned about foundation models, and especially large language models, let's talk through a few key ways you can use them to build applications. One of the most significant takeaways of the ChatGPT moment in December 2022 is that customers clearly love for their chat to be knowledgeable about every moment in the conversation, remember topics mentioned earlier, and encompass all the twists and turns of dialogue. Said another way, beyond generic question answering, there's a clear consumer preference for a chat to be chained. Let's take a look at an example in the following screenshot:

Figure 15.1 – Chaining questions for chat applications

The key difference between the left- and right-hand sides of Figure 15.1 is that on the left-hand side, the answers are discontinuous. That means the model simply sees each question as a single entity before providing its response. On the right-hand side, however, the answers are continuous. That means the entire dialogue is provided to the model, with the newest question at the bottom. This helps to ensure the continuity of responses, with the model more capable of maintaining the context.

How can you set this up yourself? Well, on the one hand, what I've just described isn't terribly difficult. Imagine just reading from your HTML page, packing in all of that call and response data into the prompt, and siphoning out the response to return it to your end user. If you don't want to build it yourself, however, you can just use a few great open-source options!

Building interactive dialogue apps with open-source stacks

If you haven't seen it before, let me quickly introduce you to LangChain. Available for free on GitHub here: https://github.com/hwchase17/langchain, LangChain is an open-source toolkit built by Harrison Chase and more than 600 other contributors.
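As a quick, hedged sketch of what the "chained" dialogue from Figure 15.1 looks like in practice, here is a minimal example using LangChain's conversation memory with the pre-1.0 LangChain API that was current when this article was written; it assumes an OPENAI_API_KEY environment variable is set, and the questions are purely illustrative:

# A minimal sketch of chained dialogue: the memory object keeps the full
# call-and-response history and prepends it to every new prompt.
from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

chain = ConversationChain(
    llm=OpenAI(temperature=0),          # any supported LLM endpoint could be swapped in
    memory=ConversationBufferMemory(),  # stores prior turns of the conversation
)

print(chain.predict(input="Who wrote Pride and Prejudice?"))
print(chain.predict(input="What else did she write?"))  # "she" resolves via the stored history

Because the memory object is just another component, you could swap the LLM or the frontend without touching the conversation logic, which is exactly the decoupling argument made below.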
LangChain provides functionality similar to the famous ChatGPT by pointing to OpenAI's API, or any other foundation model, but letting you as the developer and data scientist create your own frontend and customer experience.

Decoupling the application from the model is a smart move; in the last few months alone the world has seen nothing short of hundreds of new large language models come online, with teams around the world actively developing more. When your application interacts with the model via a single API call, then you can more easily move from one model to the next as the licensing, pricing, and capabilities upgrade over time. This is a big plus for you!

Another interesting open-source technology here is Haystack (26). Developed by the German start-up Deepset, Haystack is a useful tool for, well, finding a needle in a haystack. Specifically, they operate like an interface for you to bring your own LLMs into expansive question/answering scenarios. This was their original area of expertise, and since then they have expanded quite a bit!

At AWS, we have an open-source template for building applications with LangChain on AWS. It's available on GitHub here: https://github.com/3coins/langchain-aws-template.

In the following diagram, you can see a quick representation of the architecture:

While this can point to any front end, we provide an example template you can use to get off the ground for your app. You can also easily point to any custom model, whether it's on a SageMaker endpoint or in the new AWS service, Bedrock! More on that a bit later in this chapter. As you can see in the previous image, in this template you can easily run a UI anywhere that interacts with the cloud. Let's take a look at all of the steps:

1. First, the UI hits the API gateway.
2. Second, credentials are retrieved via IAM.
3. Third, the service is invoked via Lambda.
4. Fourth, the model credentials are retrieved via Secrets Manager.
5. Fifth, your model is invoked, either through an API call to a serverless model SDK or by calling a custom model you've trained that is hosted on a SageMaker endpoint.
6. Sixth, the relevant conversation history is looked up in DynamoDB to ensure your answer is accurate.

How does this chat interface ensure it's not hallucinating answers? How does it point to a set of data stored in a database? Through retrieval augmented generation (RAG), which we will cover next.

Using RAG to ensure high accuracy in LLM applications

As explained in the original 2020 (1) paper, RAG is a way to retrieve documents relevant to a given query. Imagine your chat application takes in a question about a specific item in your database, such as one of your products. Rather than having the model make up the answer, you'd be better off retrieving the right document from your database and simply using the LLM to stylize the response. That's where RAG is so powerful; you can use it to ensure the accuracy of your generated answers stays high, while keeping the customer experience consistent in both style and tone. Let's take a closer look:

Figure 15.3 – RAG

First, a question comes in from the left-hand side. In the top left, you can see a simple question, Define "middle ear". This is processed by a query encoder, which is simply a language model producing an embedding of the query. This embedding is then applied to the index of a database, with many candidate algorithms in use here: K Nearest Neighbors, Maximum Inner Product Search (MIPS), and others.
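To make the retrieval step concrete, here is a minimal sketch of query encoding and similarity search. It uses the pre-1.0 openai package as the encoder and plain cosine similarity in place of a real vector index; the two example documents, the embedding model choice, and the top-1 retrieval are illustrative assumptions rather than anything prescribed by the paper:

import numpy as np
import openai  # assumes OPENAI_API_KEY is set in the environment

documents = [
    "The middle ear includes the tympanic cavity and the three ossicles.",
    "The inner ear contains the cochlea and the vestibular system.",
]

def embed(text: str) -> np.ndarray:
    # Query/document encoder: turn text into a dense vector
    result = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
    return np.array(result["data"][0]["embedding"])

doc_vectors = [embed(d) for d in documents]  # in practice, precomputed and stored in a vector index

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vector = embed('Define "middle ear"')
scores = [cosine(query_vector, v) for v in doc_vectors]
best_doc = documents[int(np.argmax(scores))]  # top-1 here; a real system would take the top k
print(best_doc)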
Once you've retrieved a set of similar documents, you can feed the best ones into the generator, the final model on the right-hand side. This takes the input documents and returns a simple answer to the question. Here, the answer is The middle ear includes the tympanic cavity and the three ossicles.

Interestingly, however, the LLM here doesn't really define what the middle ear is. It's actually answering the question, "what objects are contained within the middle ear?" Arguably, any definition of the middle ear would include its purpose, notably serving as a buffer between your ear canal and your inner ear, which helps you keep your balance and lets you hear. So, this would be a good candidate for expert reinforcement learning with human feedback, or RLHF, optimization.

As shown in Figure 15.3, this entire RAG system is tunable. That means you can and should fine-tune the encoder and decoder aspects of the architecture to dial in model performance based on your datasets and query types. Another way to classify documents, as we'll see, is generation!

Is generation the new classification?

As we learned in Chapter 13, Prompt Engineering, there are many ways you can push your language model to output the type of response you are looking for. One of these ways is actually to have it classify what it sees in the text! Here is a simple diagram to illustrate this concept:

Figure 15.4 – Using generation in place of classification

As you can see in the diagram, with traditional classification you train the model ahead of time to perform one task: classification. This model may do well on classification, but it won't be able to handle new tasks at all. This key drawback is one of the main reasons why foundation models, and especially large language models, are now so popular: they are extremely flexible and can handle many different tasks without needing to be retrained.

On the right-hand side of Figure 15.4, you can see we're using the same text as the starting point, but instead of passing it to an encoder-based text model, we're passing it to a decoder-based model and simply adding the instruction to classify this sentence into positive or negative sentiment. You could just as easily say, "tell me more about how this customer really feels," or "how optimistic is this home buyer?" or "help this homebuyer find a different house that meets their needs." Arguably each of those three instructions is slightly different, veering away from pure classification and into more general application development or customer experience. Expect to see more of this over time! Let's look at one more key technique for building applications with LLMs: keeping humans in the loop.

Human-centered design for building applications with LLMs

We touched on this topic previously, in Chapter 2, Dataset Preparation: Part One, Chapter 10, Fine-Tuning and Evaluating, Chapter 11, Detecting, Mitigating, and Monitoring Bias, and Chapter 14, MLOps for Vision and Language. Let me say this yet again; I believe that human labeling will become even more of a competitive advantage that companies can provide. Why? Building LLMs is now incredibly competitive; you have both the open source and proprietary sides actively competing for your business. Open source options are from the likes of Hugging Face and Stability, while proprietary offerings are from AI21, Anthropic, and OpenAI.
The differences between these options are questionable; you can look up the latest models at the top of the leaderboard from Stanford's HELM (2), which incidentally falls under their human-centered AI initiative. With enough fine-tuning and customization, you should generally be able to meet your performance needs.

What then determines the best LLM applications, if it's not the foundation model? Obviously, the end-to-end customer experience is critical, and will always remain so. Consumer preferences wax and wane over time, but a few tenets remain for general technology: speed, simplicity, flexibility, and low cost. With foundation models we can clearly see that customers prefer explainability and models they can trust. This means that application designers and developers should grapple with these long-term consumer preferences, picking solutions and systems that maximize them. As you may have guessed, that alone is no small task.

Beyond the core skill of designing and building successful applications, what else can we do to stay competitive in this brave new world of LLMs? I would argue that it comes down to customizing your data. Focus on making your data and your datasets unique: singular in purpose, breadth, depth, and completeness. Lean into labeling your data with the best resources you can, and keep that a core part of your entire application workflow. This brings you to continuous learning, or the ability of the model to constantly get better and better based on signals from your end users.

Next, let's take a look at upcoming generative modalities.

Other generative modalities

Since the 2022 ChatGPT moment, most of the technical world has been fascinated by the proposition of generating novel content. While this was always somewhat interesting, the meeting of high-performance foundation models with an abundance of media euphoria over the capabilities, combined with a post-pandemic community with an extremely intense fear of missing out, has led us to the perfect storm of a global fixation on generative AI.

Is this a good thing? Honestly, I'm happy to finally see the shift; I've been working on generating content with AI/ML models in some fashion since at least 2019, and as a writer and creative person myself, I've always thought this was the most interesting part of machine learning. I was very impressed by David Foster's book (3) on the topic. He's just published an updated version of it to include the latest foundation models and methods! Let's quickly recap some other types of modalities that are common in generative AI applications today.

Generating code should be no surprise to most of you; its core similarities to language generation make it a perfect candidate! Fine-tuning an LLM to spit out code in your language of choice is pretty easy; here's my 2019 project (4) doing exactly that with the SageMaker example notebooks! Is the code great? Absolutely not, but fortunately, LLMs have come a long way since then. Many modern code-generating models are excellent, and thanks to a collaboration between Hugging Face and ServiceNow we have an open-source model to use! This is called StarCoder and is available for free on Hugging Face right here: https://huggingface.co/bigcode/starcoder.

What I love about using an open-source LLM for code generation is that you can customize it! This means you can point to your own private code repositories, tokenize the data, update the model, and immediately train this LLM to generate code in the style of your organization!
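As a quick illustration, here is a hedged sketch of loading StarCoder with the Hugging Face transformers library and asking it to complete a function. The checkpoint is gated, so this assumes you have accepted the model license, are authenticated with a Hugging Face token, have the accelerate package installed for device placement, and have enough GPU memory; the prompt is purely illustrative:

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"  # gated checkpoint; requires an accepted license and HF auth

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")  # needs accelerate

prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps the sketch deterministic; sampling usually gives more varied code
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Fine-tuning or continued pretraining on your own repositories would start from this same checkpoint, which is the customization path described above.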
At the organizational level, you might even do some continued pretraining on an open-source LLM for code generation on your own repositories to speed up all of your developers. We'll take a look at more ways you can use LLMs to write your own code faster in the next section, when we focus on AWS offerings, especially Amazon Code Whisperer (27).

The rest of the preceding content can all be great candidates for your own generative AI projects. Truly, just as we saw general machine learning moving from the science lab into the foundation of most businesses and projects, it's likely that generative capabilities in some fashion will do the same.

Does that mean engineering roles will be eliminated? Honestly, I doubt it. Just as the rise of great search engines didn't eliminate software engineering roles but made them more fun and doable for a lot of people, I'm expecting generative capabilities to do the same. They are great at searching many possibilities and quickly finding great options, but it's still up to you to know the ins and outs of your consumers, your product, and your design. Models aren't great at critical thinking, but they are good at coming up with ideas and finding shortcomings, at least in words.

Now that we've looked at other generative modalities at a very high level, let's learn about AWS offerings for foundation models!

AWS offerings in foundation models

On AWS, as you've seen throughout the book, you have literally hundreds of ways to optimize your foundation model development and operationalization. Let's now look at a few ways AWS is explicitly investing to improve the customer experience in this domain:

SageMaker JumpStart Foundation Model Hub: Announced in preview at re:Invent 2022, this is an option for pointing to foundation models nicely packaged in the SageMaker environment. This includes both open-source models such as BLOOM and Flan-T5 from Hugging Face, and proprietary models such as AI21 Jurassic. A list of all the foundation models is available here (5). To date, we have nearly 20 foundation models, all available for hosting in your own secure environments. Any data you use to interact with or fine-tune models on the Foundation Model Hub is not shared with providers. You can also optimize costs by selecting the instances yourself. We have tens of example notebooks pointing to these models for training and hosting across a wide variety of use cases available here (6) and elsewhere. For more information about the data the models were trained on, you can read about that in the playground directly.

Amazon Bedrock: If you have been watching AWS news closely in early 2023, you may have noticed a new service we announced for foundation models: Amazon Bedrock! As discussed in this blog post (7) by Swami Sivasubramanian, Bedrock is a service that lets you interact with a variety of foundation models through a serverless interface that stays secure. Said another way, Bedrock provides a point of entry for multiple foundation models, letting you get the best of all possible providers. This includes AI start-ups such as AI21, Anthropic, and Stability. Interacting with Bedrock means invoking a serverless experience, saving you from dealing with the lower-level infrastructure. You can also fine-tune your models with Bedrock!

Amazon Titan: Another model that will be available through Bedrock is Titan, a new large language model that's fully trained and managed by Amazon!
This means we handle the training data, optimizations, tuning, debiasing, and all enhancements for getting you results with large language models. Titan will also be available for fine-tuning.

Amazon Code Whisperer: As you may have seen, Code Whisperer is an AWS service announced in 2022 and made generally available in 2023. Interestingly, it tightly couples with a given development environment, taking the entire context of the script you are writing and generating recommendations based on this. You can write pseudo-code, markdown, or other function starts, and using keyboard shortcuts invoke the model. This will send you a variety of options based on the context of your script, letting you ultimately select the script that makes the most sense for you! Happily, this is now supported for both Jupyter notebooks and SageMaker Studio; you can read more about these initiatives from AWS Sr Principal Technologist Brian Granger, co-founder of Project Jupyter. Here's Brian's blog post on the topic: https://aws.amazon.com/blogs/machine-learning/announcing-new-jupyter-contributions-by-aws-to-democratize-generative-ai-and-scale-ml-workloads/ Pro tip: Code Whisperer is free to individuals!

Close readers of Swami's blog post will also notice updates to our latest ML infrastructure, such as the second edition of the Inferentia chip, inf2, and a Trainium instance with more bandwidth, trn1n.

Conclusion

In summary, the field of pretraining foundation models is filled with innovation. We have exciting advancements like LangChain and AWS's state-of-the-art solutions such as Amazon Bedrock and Titan, opening up vast possibilities in AI development. Open-source tools empower developers, and the focus on human-centered design remains crucial. As we embrace continuous learning and explore new generative methods, we anticipate significant progress in content creation and software development. By emphasizing customization, innovation, and responsiveness to user preferences, we stand on the cusp of fully unleashing the potential of foundation models, reshaping the landscape of AI applications. Keep an eye out for the thrilling journey ahead in the realm of AI.

Author Bio

Emily Webber is a Principal Machine Learning Specialist Solutions Architect at Amazon Web Services. She has assisted hundreds of customers on their journey to ML in the cloud, specializing in distributed training for large language and vision models. She mentors Machine Learning Solution Architects, authors countless feature designs for SageMaker and AWS, and guides the Amazon SageMaker product and engineering teams on best practices around machine learning and customers. Emily is widely known in the AWS community for a 16-video YouTube series featuring SageMaker with 160,000 views, plus a Keynote at O'Reilly AI London 2019 on a novel reinforcement learning approach she developed for public policy.

ChatGPT for Interview Preparation

Jakov Semenski
14 Sep 2023
6 min read
Introduction

Despite reading numerous forums and blogs, you feel unprepared for the technical questions that interviewers might throw at you. Isn't there a more efficient way?

The countdown during a timed coding test.
The watching eyes of an interviewer make you blank out.
Leading to regrets about not practicing in a more realistic setting.

What if there was a tool available 24/7, that can not only answer your questions but also engage with you, much like an interviewer would?

Enter ChatGPT, your personalized interview coach.

Using ChatGPT for interview preparation is like having a personal trainer at the gym. While you can definitely exercise on your own, the trainer (or in this case, ChatGPT) guides you:

It corrects your mistakes
Provides targeted exercises (solutions)
Improves your weak areas

Let's put ChatGPT to the test with a hands-on exercise:

Setting up a mock interview for a senior Java role
Doing theoretical and coding questions
Getting feedback for our answers
Bonus: We will also check if it can make my learning entertaining.

Here are Little-Known Ways to Ace Your Dev Interview with ChatGPT.

Step 1: Set your level

ChatGPT doesn't know your level, so we need to set proper expectations.

Step 2: Set up a rehearsal mock interview

Now, we need to let ChatGPT know that we are not interested in answers, but rather feedback. Here is the full prompt:

I want to rehearse for the senior java interview Currently, I am junior and have experience only with basics and helloworld examples don't show me the answer yet once I answer, check if I was correct

Here is the result:

ChatGPT:
Of course! Let's simulate a senior Java interview scenario. I'll provide questions that are often asked for a senior Java position. Try to answer them to the best of your ability, and then I'll provide feedback.

Question 1: Explain the difference between ArrayList and LinkedList in Java.

Go ahead and provide your answer when you're ready.

User:
I think arraylist uses dynamic array while linkedlists stores data in different objects and acts as a queue

ChatGPT:
You've touched on the basics, but let's dive a bit deeper.

ArrayList:
Internal Structure: As you correctly mentioned, ArrayList uses a dynamic array to store its elements. This means its capacity grows automatically when we add more elements than its initial capacity.
Access Time: Because of this structure, accessing an element by its index is fast, i.e., O(1).
Insertion/Deletion Time: However, inserting or deleting an element from anywhere except the end of the list takes O(n) time on average, as it may require shifting elements.
Memory Overhead: It's relatively low, as you're mostly dealing with the raw array and a few integer counters.

LinkedList:
Internal Structure: LinkedList uses a doubly-linked list. Each element (node) has a data part and two pointers, one pointing to the next element and the other to the previous element.
Access Time: Getting to an element by index isn't as fast, since it requires walking through the list. It's O(n) in the worst case.
Insertion/Deletion Time: However, once you have a reference to a node (say via an iterator), adding or removing is O(1), provided you don't need to search for the node.
Memory Overhead: Higher than ArrayList because of the two pointers with each data item.

Now, one more nuanced point: ArrayList performs better in scenarios where you are frequently accessing elements via their indices and have fewer insertions/deletions.
LinkedList is better suited for operations that involve frequent insertions or deletions from the beginning or middle of the list.

You had a good start! Keep refining your understanding, and you'll be on your way to senior-level knowledge. Would you like to proceed with the next question?

Step 3: Now, the coding test

We can now instruct ChatGPT to give us a coding question:

give me one more question, this time coding question

ChatGPT:
Of course! Here's a simpler coding problem:

Question: Implement a method that takes an array of integers and returns the sum of all even numbers in the array.

Method Signature:
public int sumOfEvens(int[] numbers) { // your code here }

Please go ahead and implement this, and I'll review your solution afterward!

User:
int sum = 0; for (int number : numbers) { if (number % 2 == 0) { sum += number; } } return sum;

ChatGPT:
That's correct! Your solution is concise and to the point. Here's the complete function for clarity: ….

Step 4: Make it fun

To spice up the interview a bit, let's make ChatGPT talk in a funny way so we can get some laughs along the way.

Prompt:
write every sentence in a speaking style of a fast-talking wise guy from 1940's gangster movies

ChatGPT:
Alright, alright! I'll take it down a notch for you. No need to get all flustered.

Question 3: Look here, champ, you ever heard of this thing called "Encapsulation"? It's one of those big fancy terms they throw around. Can you give me the lowdown on what it means in Java? Just the basics, ya know?

Alright, give it your best shot! And remember, keep your chin up, kid!

User:
It's basically means binding variables and methods together into a single unit and preventing them from being accessed by other classes

ChatGPT:
Hey, not too shabby, kid! You've got the right idea. Encapsulation, in Java and many other object-oriented languages, is about keeping the internal states or data of an object private and only allowing modification through public methods. It's a way to protect the integrity of the data and control how it's accessed or modified. It's like having a vault, see? You don't just let anyone waltz in and take whatever they want. You got guards, and specific ways to access what's inside.

Good job, kiddo! Anything else you wanna test your chops on, or are we square?

You can find the full chat at this link:
https://chat.openai.com/share/b9f8af45-1187-4fa9-a884-4565f37d2884

Limitations

Keep in mind that ChatGPT is not always correct, even though it can sound convincing. If you are unsure, always double-check your solution with a compiler for coding answers.

Conclusion

ChatGPT is your ticket to mastering technical interviews. Using ChatGPT ensures you get direct, concise, and often up-to-date explanations on specific problems, making your interview preparation more efficient. Engage, learn, and get instant feedback to always stay a step ahead.

As the next step, try it yourself and practice. Remember:

Inform ChatGPT about your current skill level.
If you are not happy with the questions or topic, give ChatGPT more context.
Lastly, make it more fun.

Author Bio

Jakov Semenski is an IT Architect working at IBMiX with almost 20 years of experience. He is also a ChatGPT Speaker at the WeAreDevelopers conference and shares valuable tech stories on LinkedIn.

Exploring the Roles in Building Azure AI Solutions

Olivier Mertens, Breght Van Baelen
13 Sep 2023
19 min read
This article is an excerpt from the book, Azure Data and AI Architect Handbook, by Olivier Mertens and Breght Van Baelen. Master core data architecture design concepts and Azure Data & AI services to gain a cloud data and AI architect's perspective to developing end-to-end solutions.

Introduction

Artificial Intelligence (AI) is transforming businesses across various industries rapidly. Especially with the surge in popularity of large language models such as ChatGPT, AI adoption is increasing exponentially. Microsoft Azure provides a wide range of AI services to help organizations build powerful AI solutions. In this chapter, we will explore the different AI services available on Azure, as well as the roles involved in building AI solutions, and the steps required to design, develop, and deploy AI models on Azure.

Specifically, we will cover the following:

The different roles involved in building AI solutions
The questions a data architect should ask when designing an AI solution

By the end of this article, you will have a good understanding of the role of the data architect in the world of data science. Additionally, you will have a high-level overview of what the data scientists and machine learning engineers are responsible for.

Knowing the roles in data science

The Azure cloud offers an extensive range of services for use in advanced analytics and data science. Before we dive into these, it is crucial to understand the different roles in the data science ecosystem. In previous chapters, while always looking through the lens of a data architect, we saw workloads that are typically operationalized by data engineers, database administrators, and data analysts.

Up until now, the chapters followed the journey of data through a data platform, from ingestion to raw storage to transformation, data warehousing, and eventually, visualization and dashboarding. The advanced analytics component is more separated from the entire solution, in the sense that most data architectures can perform perfectly without it. This does not take away from the fact that adding advanced analytics such as machine learning predictions can be a valuable enhancement to a solution.

The environment for advanced analytics introduces some new roles. The most prominent are the data scientist and the machine learning engineer, which we will look at in a bit more detail, starting with the following figure. Other profiles include roles such as data labelers and citizen data scientists.

Figure 9.1 – An overview of the core components that each data role works with

Figure 9.1 shows a very simplified data solution with a machine learning component attached to it. This consists of a workspace to build and train machine learning models and virtual machine clusters to deploy them in production.

The data scientist is responsible for building and training the machine learning model. This is done through experimenting with data, most of the time stemming from the data lake. The data scientist will often use data from the bronze or silver tier in the data lake (i.e., the raw or semi-processed data). Data in the gold tier or the data warehouse is often transformed and aggregated in ways that make it convenient for business users to build reports with. However, the data scientist might want to perform different kinds of transformations, which focus more on the statistical relevance of certain features within the data to optimize the training performance of a machine learning model.
Regardless, in some cases, data scientists will still interact with the gold layer and the data warehouse to pull clean data for experimentation.

Using this data, data scientists will perform exploratory data analysis (EDA) to get initial insights into the dataset. This is followed by data cleaning and feature engineering, where features are transformed or new features are derived to serve as input for the machine learning model. Next up, a model is trained and evaluated, resulting in a first prototype. The experimentation does not stop here, however, as machine learning models have hyperparameters that can be adjusted, which might lead to increased performance, while still using the same dataset. This last process is called hyperparameter tuning. Once this is completed, we will arrive at the cutoff point between the responsibilities of a data scientist and a machine learning engineer.

The machine learning engineer is responsible for the machine learning operations, often referred to as MLOps. Depending on the exact definition, this usually encompasses the later stages of the machine learning model life cycle. The machine learning engineer receives the finished model from the data scientist and creates a deployment for it. This will make the model available through an API so that it can be consumed by applications and users. In later stages, the model will need to be monitored and periodically retrained, until the end of its life cycle. This is a brief summary, but the MLOps process will be explained in more detail further in this chapter.

Next, Figure 9.2 provides an overview of the processes that take place in the MLOps cycle and who the primary contributor to each step is.

Figure 9.2 – The steps of the data science workflow and their executors

Finally, what we are most interested in is the role of the cloud data architect in this environment. First, the architect has to think about the overall AI approach, part of which is deciding whether to go for custom development or not. We will dive deeper into strategy soon.

If custom machine learning model development is involved, the architect will have to decide on a data science environment, or workspace, where the data scientists can experiment.

However, the architect will have more involvement in the work of a machine learning engineer. The optimal working of MLOps is considerably more dependent on good architectural design than the typical prototyping done by data scientists. Here, the architect is responsible for deciding on deployment infrastructure, choosing the right monitoring solutions, version control for models, datasets, and code, retraining strategies, and so on.

A lot of the value that an architect brings to machine learning projects comes from design choices outside of the data science suite. The data architect can greatly facilitate the work of data scientists by envisioning efficient data storing structures at the data lake level, with a strong focus on silver (and bronze) tiers with good data quality. Often, extra pipelines are required to get labeled data ready to be picked up by the data scientists.

Designing AI solutions

In this part, we will talk about the design of AI solutions, including qualification, strategy, and the responsible use of AI. Infusing AI into architecture has to be the result of some strategic consideration. The data architect should ask themselves a series of questions, and find a substantiated answer, to end up with an optimal architecture.

The first set of questions is regarding the qualification of a use case.
Is AI the right solution?

This can be further related to the necessity of an inductive solution, compared to a deductive one. Business rulesets are deductive; machine learning is inductive. Business rules will provide you with a solid answer if the condition for that rule is met. Machine learning models will provide you with answers that have a high probability but are not certain. The big advantage of machine learning is its ability to cover cases in a much more granular manner, whereas business rules must group various cases within a single condition so as to not end up with an absurd or even impossible number of rules. Look at image recognition, for example. Trying to make a rule set for every possible combination of pixels that might represent a human is simply impossible. Knowing this, evaluate the proposed use case and confirm that the usage (and corresponding costs) of AI is justified for this solution.

Do we opt for pre-trained models or a custom model?

Although this question is more focused on implementation than qualification, it is crucial to answer it first, as this will directly impact the following two questions. As with most things in the broader field of IT, it comes down to not reinventing the wheel. Does your use case sound like something generic or industry-agnostic? Then there are probably existing machine learning models, often with far superior performance (general knowledge-wise) than your own data could train a model to have. Companies such as Microsoft and partners such as OpenAI invest heavily in getting these pre-trained models to cutting-edge standards. It may be that the solution you want to create is fairly generic, but there are certain aspects that make it a bit more niche. An example could be a text analytics model in the medical industry. Text analytics models are great at the general skill of language understanding, but they might have some issues with grasping the essence of industry-specific language out of the box. In this case, an organization can provide some of its own data to fine-tune the model to increase its performance on niche tasks, while maintaining most of the general knowledge from its initial training dataset. Most of the pre-trained AI models on Azure, which reside in Azure Cognitive Services and Azure OpenAI Service, are fine-tunable. When out-of-the-box models are not an option, then we need to look at custom development.

Is data available?

If we opt for custom development, we will need to bring our own data. The same goes for wanting to fine-tune an existing model, yet to a lesser extent. Is the data that we need available? Does the organization already have a significant volume of historical data stored in a central location? If this data is still spread across multiple platforms or sources, then this might indicate it is not the right time to implement AI. It would be more valuable to focus on increased data engineering efforts in this situation. In the case of machine learning on Azure, data is ideally stored in tiers in Azure Data Lake Storage. Keep in mind that machine learning model training does not stop after putting it into production. The performance of the production model will be constantly monitored, and if it starts to drift over time, retraining will take place. Do the sources of our current historical data still generate an adequate volume of data to carry out retraining? In terms of data volume, there is still a common misunderstanding that large volumes of data are a necessity for any high-performant model.
It’s key to know here that even though the performance of a model still scales with the amount of training data, more and more techniques have been developed that allow valuable performance levels to be reached with a limited data volume.

Is the data of acceptable quality?

Just like the last question, this only counts for custom development or fine-tuning. Data quality between sources can differ immensely. There are different ways in which data can be of bad quality. Some issues can be solved easily; others can be astonishingly hard. Some examples of poor data quality are as follows (a quick pandas audit sketch for these checks appears a little further below):

Inaccurate data: This occurs when data is incorrect or contains errors, such as typos or missing values. This is not easy to solve and will often result in fixes required at the source.
Incomplete data: This occurs when data is missing important information or lacks the necessary details to be useful. In some cases, data scientists can use statistics to impute missing data. In other cases, it might depend on the specific model that is being developed. Certain algorithms can perform well with sparse data, while others are heavily affected by it. Knowing which exact algorithms to use should not be in the scope of the architect but, rather, of the data scientists.
Outdated data: This occurs when data is no longer relevant or useful due to changes in circumstances or the passage of time. If this data is statistically dissimilar to data generated in the present, it is better to remove it from the training dataset.
Duplicated data: This occurs when the same data is entered multiple times in different places, leading to inconsistencies and confusion. Luckily, this is one of the easiest data quality issues to solve.
Biased data: This occurs when data is influenced by personal biases or prejudices, leading to inaccurate or unfair conclusions. This can be notoriously hard to solve and is a well-known issue in the data science world. We will come back to this later when discussing responsible AI.

This concludes the qualifying questions on whether to implement AI or not. There is one more important topic, namely the return on investment (ROI) of the addition, but to calculate the investment, we need to have more knowledge of the exact implementation. This will be the focus of the next set of questions.

Low code or code first?

The answer to which approach should be chosen depends on people, their skill sets, and the complexity of the use case. In the vast majority of cases, code-first solutions are preferred, as they come with considerably more flexibility and versatility. Low code simplifies development a lot, often by providing drag-and-drop interfaces to create workflows (or, in this case, machine learning pipelines). While low-code solutions often benefit from rapid development, this advantage in speed is slowly shrinking. Due to advancements in libraries and packages, generic code-first models are also being developed in a shorter amount of time than before. While code-first solutions cover a much broader set of use cases, they are simply not possible for every organization. Data scientists tend to be an expensive resource and are often fought over, given the shortage of them in the labor market. Luckily, low-code platforms are advancing fast to address this issue.
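As promised above, here is the quick pandas audit sketch for the data-quality checks. It is an illustration rather than code from the book, and the table and column names are assumptions.

import pandas as pd

# Assumed customer table with a "last_updated" timestamp column
df = pd.read_parquet("silver/customers.parquet")

missing_ratio = df.isna().mean().sort_values(ascending=False)    # incomplete data
duplicate_rows = df.duplicated().sum()                           # duplicated data
stale_rows = (df["last_updated"] < pd.Timestamp.now() - pd.Timedelta(days=730)).sum()  # outdated data

print(missing_ratio.head(10))
print(f"duplicate rows: {duplicate_rows}")
print(f"records older than two years: {stale_rows}")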
These low-code platforms allow citizen data scientists (non-professionals) to create and train machine learning models easily, although this will still yield inferior performance compared to professional code-first development. As a rule of thumb, if a professional data science team is present and it has already been decided that custom development is the way forward, choose a code-first solution.

What are the requirements for the AI model?

Now, we will dive deeper into the technicalities of machine learning models. Note that not all answers here must come from the data architect. It is certainly a plus if the architect can think about things such as model selection with the data scientists, but it is not expected of the role. Leave it to the data science and machine learning team to have a clear understanding of the technical requirements for the AI model and allow them to leverage their expertise.

The minimum accepted performance is probably the most straightforward. This is a defined threshold on the primary metric of a model, based on what is justifiable for the use case to progress. For instance, a model might need to have a minimum accuracy of 95% to be economically viable and continue toward production. Next, latency is an important requirement when the model is used to make real-time predictions. The larger the model and the more calculations that need to happen (not counting parallelism), the longer it will take to make a prediction. Some use cases will require a prediction latency within milliseconds, which can be addressed with lightweight model selection and specialized infrastructure. Another requirement is the size of the model, which directly relates to the hosting costs when deployed into production, as the model will have to be loaded into RAM while the deployment runs. This is mostly a very binding requirement for IoT Edge use cases, where AI models are deployed on a small IoT device and make predictions locally before sending their results to the cloud. These devices often have very limited memory, and the data science team will have to figure out the most efficient model that fits on the device. With the recently growing adoption of large language models (LLMs), such as the GPT model family, power consumption has started to become an increasingly important topic as well. Years ago, this was a negligible topic in most use cases, but with the massive size of today’s cutting-edge models, it is unavoidable. Whether these models are hosted privately or in the cloud, power consumption will be an incurred cost, directly or indirectly. For natural language use cases specifically, consider whether the traditional (and significantly cheaper) text analytics models in Azure Cognitive Services can do the job at an acceptable level before heading straight for LLMs.

Batch or real-time inferencing?

When a model is finished and ready for deployment, the architect will have to decide on the type of deployment. At a high level, we should decide whether the model will be used for batch scoring or for predicting in real time. Typically, when machine learning predictions are used to enrich data that is already being batch processed in an OLAP scenario, the machine learning model can do periodical inferencing on large batches. The model will then be incorporated as an extra transformation step in the ETL pipeline.
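As a rough illustration (again, not from the book), such a batch-scoring step might look like the following; the model artifact, storage paths, and feature columns are assumptions for the example.

import joblib
import pandas as pd

# Load a previously trained model artifact (assumed to exist)
model = joblib.load("models/churn_model.pkl")

# Read today's batch, score it, and write the enriched output for the next pipeline step
batch = pd.read_parquet("gold/customers_daily.parquet")
feature_cols = ["tenure_months", "monthly_spend", "support_tickets"]
batch["churn_score"] = model.predict_proba(batch[feature_cols])[:, 1]
batch.to_parquet("gold/customers_daily_scored.parquet", index=False)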
When using machine learning models in applications, for example, where users expect an instant prediction, real-time endpoints are required. When deploying our model to an endpoint, the architecture might differ based on the type of inferencing, which we will look into in more depth later in this chapter.

Is explainability required?

Explainable AI, often referred to as XAI, has been on the rise for quite a while now. For traditional machine learning models, it was straightforward to figure out why a model came to which conclusion, through statistical methods such as feature importance. With the rise of deep learning models, which are essentially black-box models, we come across more and more predictions that cannot be explained. Techniques have been developed to make an approximation of the decision-making process of a black-box model. For instance, in the case of the mimic explainer, a traditional (and by nature interpretable) machine learning model is trained to mimic the black-box model and extract things, such as feature importance, from the mimic model. However, this is still an approximation and no guarantee. Therefore, it is key to figure out how crucial explainability is for the use case. In cases that (heavily) affect humans, such as predicting credit scores using AI, interpretability is a must. In cases with minimal or no impact on human lives, interpretability is more of a nice-to-have, and we can opt for a black-box model if this provides increased predictive performance.

What is the expected ROI?

When the qualifying questions have been answered and decisions have been made to fulfill technical requirements, we should have sufficient information to calculate an estimated ROI. This will be the final exercise before giving the green light to start implementation, or at least the development of a proof of concept. If we know what approach to use, what kind of models to train, and which type of deployment to leverage, we can start mapping it to the right Azure services and perform a cost calculation. This is compared to the expected added value of a machine learning model.

Optimal performance of a machine learning model

As a side note to calculating the ROI, we need to have an idea of what the optimal performance level of a machine learning model is. This is where the academic and corporate worlds tend to differ. Academics focus on reaching the highest performance levels possible, whereas businesses will focus on the most efficient ratio between costs and performance. It might not make sense for a business to invest heavily in a few percent of extra performance if this marginal increase does not bring adequate value to compensate.

Conclusion

This article is focused on data science and AI on Azure. We started by outlining the different roles involved in a data science team, including the responsibilities of data architects, engineers, scientists, and machine learning engineers, and how the collaboration between these roles is key to building successful AI solutions. We then focused on the role of the data architect when designing an AI solution, outlining the questions they should ask themselves for a well-architected design.

Author Bio

Olivier Mertens is a cloud solution architect for Azure data and AI at Microsoft, based in Dublin, Ireland. In this role, he assists organizations in designing their enterprise-scale data platforms and analytical workloads.
Next to his role as an architect, Olivier leads the technical AI expertise for Microsoft EMEA in the corporate market. This includes leading knowledge sharing and internal upskilling, as well as solving highly complex or strategic customer AI cases. Before his time at Microsoft, he worked as a data scientist at a Microsoft partner in Belgium.Olivier is a lecturer for generative AI and AI solution architectures, a keynote speaker for AI, and holds a master’s degree in information management, a postgraduate degree as an AI business architect, and a bachelor’s degree in business management.Breght Van Baelen is a Microsoft employee based in Dublin, Ireland, and works as a cloud solution architect for the data and AI pillar in Azure. He provides guidance to organizations building large-scale analytical platforms and data solutions. In addition, Breght was chosen as an advanced cloud expert for Power BI and is responsible for providing technical expertise in Europe, the Middle East, and Africa. Before his time at Microsoft, he worked as a data consultant at Microsoft Gold Partners in Belgium.Breght led a team of eight data and AI consultants as a data science lead. Breght holds a master’s degree in computer science from KU Leuven, specializing in AI. He also holds a bachelor’s degree in computer science from the University of Hasselt.
Generate Google Doc summaries using PaLM API and Google Apps Script

Aryan Irani
13 Sep 2023
8 min read
IntroductionIn this article, we'll delve into the powerful synergy of the PaLM API and Google Apps Script, unveiling a seamless way to generate concise summaries for your Google Docs. Say goodbye to manual summarization and embrace efficiency as we guide you through the process of simplifying your document management tasks. Let's embark on this journey to streamline your Google Doc summaries and enhance your productivity.Sample Google DocFor this blog, we will be using a very simple Google Doc that contains a paragraph for which we want to generate a summary for. If you want to work with the Google Docs, click here. Once you make a copy of the Google Doc you have to go ahead and change the API key in the Google Apps Script code. Step1: Get the API keyCurrently, PaLM API hasn’t been released for public use but to access it before everybody does, you can apply for the waitlist by clicking here. If you want to know more about the process of applying for MakerSuite and PaLM API, you can check the YouTube tutorial here.Once you have access, to get the API key, we have to go to MakerSuite and go to the Get API key section. To get the API key, follow these steps:1. Go to MakerSuite or click here.2. On opening the MakerSuite you will see something like this3. To get the API key go ahead and click on Get API key on the left side of the page.4. On clicking Get API key, you will see something like this where you can create your API key.5. To create the API key go ahead and click on Create API key in new project.On clicking Create API Key, in a few seconds, you will be able to copy the API key.Step 2: Write the Automation ScriptWhile you are in the Google Docs, let’s open up the Script Editor to write some Google Apps Script. To open the Script Editor, follow these steps:1. Click on Extensions and open the Script Editor.2. This brings up the Script Editor as shown below.We have reached the script editor lets code.Now that we have the Google Doc setup and the API key ready, let’s go ahead and write our Google Apps Script code to get the summary for the paragraph in the Google Doc. function onOpen(){ var ui = DocumentApp.getUi(); ui.createMenu('Custom Menu')     .addItem('Summarize Selected Paragraph', 'summarizeSelectedParagraph')     .addToUi();   }We are going to start out by creating our own custom menu using which we can select the paragraph we want to summarize and run the code. To do that we are going to start out by opening a new function called onOpen(). On opening the function we are going to create a menu using the create.Menu() function, inside which we will be passing the name of the menu. After that, we assign some text to the name followed by the function you want to run when the menu is clicked. function DocSummary(paragraph){ var apiKey = "your_api_key"; var apiUrl = "https://generativelanguage.googleapis.com/v1beta2/models/text-bison-001:generateText";We start out by opening a new function BARD() inside which we will declare the API key that we just copied. After declaring the API key we go ahead and declare the API endpoint that is provided in the PaLM API documentation. You can check out the documentation by checking out the link given below.We are going to be receiving the prompt from the Google Doc from the BARD function that we just created.Generative Language API | PaLM API | Generative AI for DevelopersThe PaLM API allows developers to build generative AI applications using the PaLM model. 
Large Language Models (LLMs)…developers.generativeai.googl var url = apiUrl + "?key=" + apiKey var headers = {   "Content-Type": "application/json" } var prompt = {   'text': "Please generate a short summary for :\n" + paragraph } var requestBody = {   "prompt": prompt }Here we create a new variable called url inside which we combine the API URL and the API key, resulting in a complete URL that includes the API key as a parameter. The headers specify the type of data that will be sent in the request which in this case is “application/json”.Now we come to the most important part of the code which is declaring the prompt. For this blog, we will be asking Bard to summarize a paragraph followed by the paragraph present in the Google Doc. All of this will be stored in the prompt variable. Now that we have the prompt ready, we create an object that will contain this prompt that will be sent in the request to the API. var options = {   "method": "POST",   "headers": headers,   "payload": JSON.stringify(requestBody) }Now that we have everything ready, its time to define the parameters for the HTTP request that will be sent to the PaLM API endpoint. We start out by declaring the method parameter which is set to POST which indicates that the request will be sending data to the API.The headers parameter contains the header object that we declared a while back. Finally, the payload parameter is used to specify the data that will be sent in the request.These options are now passed as an argument to the UrlFetchApp.fetch function which sends the request to the PaLM API endpoint, and returns the response that contains the AI generated text.var response = UrlFetchApp.fetch(url,options); var data = JSON.parse(response.getContentText()); return data.candidates[0].output; }In this case, we just have to pass the url and options variable inside the UrlFetchApp.fetch function. Now that we have sent a request to the PaLM API endpoint we get a response back. In order to get an exact response we are going to be parsing the data.The getContentText() function is used to extract the text content from the response object. Since the response is in JSON format, we use the JSON.parse function to convert the JSON string into an object.The parsed data is then passed to the final variable output, inside which we get the first response out of multiple other drafts that Bard generates for us. On getting the first response we just return the output. function summarizeSelectedParagraph(){ var selection = DocumentApp.getActiveDocument().getSelection(); var text = selection.getRangeElements()[0].getElement().getText(); var summary = DocSummary(text); Now that we have the summary function ready and good to go, we will now go ahead and open the function that will be interacting with the Google Doc. We want the summary to be generated for the paragraph that the user selects. To do that we are going to get the selected text from the Google Doc using the getSelection() function. Once we get the selected text we go ahead and get the text using the .getText() function. To generate the summary using Google Bard we pass the text in the DocSummary() function. DocumentApp.getActiveDocument().getBody().appendParagraph("Summary"); DocumentApp.getActiveDocument().getBody().appendParagraph(summary) }Now that we have the summary for the selected text, it's time to append the paragraph back into the Google Doc. To do that we are going to be using the appendParagraph() function inside which we will pass the summary variable. 
Just to divide the summary from the original paragraph we append an additional line that says “Summary”. Our code is complete and good to go.Step 3: Check the outputIt's time to check the output and see if the code is working according to what we expected. To do that go ahead and save your code and run the OnOpen() function. This will create the menu that we can select and generate the summary for the paragraph.On running the code you should get an output like this in the Execution Log.On running the onOpen() function the custom menu has been created in the Google Doc successfully.To generate the summary in the Google Doc, follow the steps.1. Select the paragraph you want to generate the summary for.2. Once you select the paragraph go ahead and click on the custom menu and click on Summarise Selected paragraph.3. On clicking the option, you will see that the code will generate a summary for the paragraph we selected.Here you can see the summary for the paragraph has been generated in the Google Doc successfully.ConclusionIn this blog, we walked through the process of how we can access the PaLM API to integrate Google Bard inside of a Google Doc using Google Apps Script. The integration of Google Bard and Google Apps Script empowers users to generate summaries of paragraphs in Google Docs effortlessly.You can get the code from the GitHub link given below. Google-Apps-Script/BlogSummaryPaLM.js at master · aryanirani123/Google-Apps-ScriptCollection of Google Apps Script Automation scripts written and compiled by Aryan Irani. …github.comAuthor BioAryan Irani is a Google Developer Expert for Google Workspace. He is a writer and content creator who has been working in the Google Workspace domain for three years. He has extensive experience in the area, having published 100 technical articles on Google Apps Script, Google Workspace Tools, and Google APIs.Website
Using LLM Chains in Rust

Alan Bernardo Palacio
12 Sep 2023
9 min read
IntroductionThe llm-chain is a Rust library designed to make your experience with large language models (LLMs) smoother and more powerful. In this tutorial, we'll walk you through the steps of installing Rust, setting up a new project, and getting started with the versatile capabilities of LLM-Chain.This guide will break down the process step by step, using simple language, so you can confidently explore the potential of LLM-Chain in your projects.InstallationBefore we dive into the exciting world of LLM-Chain, let's start with the basics. To begin, you'll need to install Rust on your computer. By using the official Rust toolchain manager called rustup you can ensure you have the latest version and easily manage your installations. We recommend having Rust version 1.65.0 or higher. If you encounter errors related to unstable features or dependencies requiring a newer Rust version, simply update your Rust version. Just follow the instructions provided on the rustup website to get Rust up and running.With Rust now installed on your machine, let's set up a new project. This step is essential to create an organized space for your work with LLM-Chain. To do this, you'll use a simple command-line instruction. Open up your terminal and run the following command:cargo new --bin my-llm-projectBy executing this command, a new directory named "my-llm-project" will be created. This directory contains all the necessary files and folders for a Rust project.Embracing the Power of LLM-ChainNow that you have your Rust project folder ready, it's time to integrate the capabilities of LLM-Chain. This library simplifies your interaction with LLMs and empowers you to create remarkable applications. Adding LLM-Chain to your project is a breeze. Navigate to your project directory by using the terminal and run the following command:cd my-llm-project cargo add llm-chainBy running this command, LLM-Chain will become a part of your project, and the configuration will be recorded in the "Cargo.toml" file.LLM-Chain offers flexibility by supporting multiple drivers for different LLMs. For the purpose of simplicity and a quick start, we'll be using the OpenAI driver in this tutorial. You'll have the choice between the LLAMA driver, which runs a LLaMA LLM on your machine, and the OpenAI driver, which connects to the OpenAI API.To choose the OpenAI driver, execute this command:cargo add llm-chain-openaiIn the next section, we'll explore generating your very first LLM output using the OpenAI driver. So, let's move on to exploring sequential chains with Rust and uncovering the possibilities they hold with LLM-Chain.Exploring Sequential Chains with RustIn the realm of LLM-Chain, sequential chains empower you to orchestrate a sequence of steps where the output of each step seamlessly flows into the next. This hands-on section serves as your guide to crafting a sequential chain, expanding its capabilities with additional steps, and gaining insights into best practices and tips that ensure your success.Let's kick things off by preparing our project environment:As we delve into creating sequential chains, one crucial prerequisite is the installation of tokio in your project. While this tutorial uses the full tokio package crate, remember that in production scenarios, it's recommended to be more selective about which features you install. 
To set the stage, run the following command in your terminal:cargo add tokio --features fullThis step ensures that your project is equipped with the necessary tools to handle the intricate tasks of sequential chains. Before we continue, ensure that you've set your OpenAI API key in the OPENAI_API_KEY environment variable. Here's how:export OPENAI_API_KEY="YOUR_OPEN_AI_KEY"With your environment ready, let’s look at the full implementation code. In this case, we will be implementing the use of Chains to generate recommendations of cities to travel to, formatting them, and organizing the results throughout a series of steps:use llm_chain::parameters; use llm_chain::step::Step; use llm_chain::traits::Executor as ExecutorTrait; use llm_chain::{chains::sequential::Chain, prompt}; use llm_chain_openai::chatgpt::Executor; #[tokio::main(flavor = "current_thread")] async fn main() -> Result<(), Box<dyn std::error::Error>> {    // Create a new ChatGPT executor with default settings    let exec = Executor::new()?;    // Create a chain of steps with two prompts    let chain: Chain = Chain::new(vec![        // First step: Craft a personalized birthday email        Step::for_prompt_template(            prompt!("You are a bot for travel assistance research",                "Find good places to visit in this city {{city}} in this country {{country}}. Include their name")        ),        // Second step: Condense the email into a tweet. Notably, the text parameter takes the output of the previous prompt.        Step::for_prompt_template(            prompt!(                "You are an assistant for managing social media accounts for a travel company",                "Format the information into 5 bullet points for the most relevant places. \\\\n--\\\\n{{text}}")        ),        // Third step: Summarize the email into a LinkedIn post for the company page, and sprinkle in some emojis for flair.        Step::for_prompt_template(            prompt!(                "You are an assistant for managing social media accounts for a travel company",                "Summarize this email into a LinkedIn post for the company page, and feel free to use emojis! \\\\n--\\\\n{{text}}")        )    ]);    // Execute the chain with provided parameters    let result = chain        .run(            // Create a Parameters object with key-value pairs for the placeholders            parameters!("city" => "Rome", "country" => "Italy"),            &exec,        )        .await        .unwrap();    // Display the result on the console    println!("{}", result.to_immediate().await?.as_content());    Ok(()) }The provided code initiates a multi-step process using the llm_chain and llm_chain_openai libraries. First, it sets up a ChatGPT executor with default configurations. Next, it creates a chain of sequential steps, each designed to produce specific text outputs. The first step involves crafting a personalized travel recommendation, which includes information about places to visit in a particular city and country, with a Parameters object containing key-value pairs for placeholders like {{city}} and {{country}}. The second step condenses this email into a tweet, formatting the information into five bullet points and utilizing the text output from the previous step. 
Lastly, the third step summarizes the email into a LinkedIn post for a travel company's page, adding emojis for extra appeal.The chain is executed with specified parameters, creating a Parameters object with key-value pairs for placeholders like "city" (set to "Rome") and "country" (set to "Italy"). The generated content is then displayed on the console. This code represents a structured workflow for generating travel-related content using ChatGPT.Running the CodeNow, it's time to compile the code and run the code. Execute the following command in your terminal:cargo runAs the code executes, the sequential chain orchestrates the different prompts, generating content that flows through each step.We can see the results of the model as a bulleted list of travel recommendations.ConclusionThe llm-chain Rust library serves as your gateway to accessing large language models (LLMs) within the Rust programming language. This tutorial has been your guide to uncovering the fundamental steps necessary to harness the versatile capabilities of LLM-Chain.We began with the foundational elements, guiding you through the process of installing Rust and integrating llm-chain into your project using Cargo. We then delved into the practical application of LLM-Chain by configuring it with the OpenAI driver, emphasizing the use of sequential chains. This approach empowers you to construct sequences of steps, where each step's output seamlessly feeds into the next. As a practical example, we demonstrated how to create a travel recommendation engine capable of generating concise posts for various destinations, suitable for sharing on LinkedIn.It's important to note that LLM-Chain offers even more possibilities for exploration. You can extend its capabilities by incorporating CPP models like Llama, or you can venture into the realm of map-reduce chains. With this powerful tool at your disposal, the potential for creative and practical applications is virtually limitless. Feel free to continue your exploration and unlock the full potential of LLM-Chain in your projects. See you in the next article.Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn
Weaviate and PySpark for LLM Similarity Search

Alan Bernardo Palacio
12 Sep 2023
12 min read
IntroductionWeaviate is gaining popularity as a semantic graph database, while PySpark is a well-established data processing framework used for handling large datasets efficiently.The integration of Weaviate and Spark enables the processing of large volumes of data, which can be stored in unstructured blob storages like S3. This integration allows for batch processing to structure the data to suit specific requirements. Subsequently, it empowers users to perform similarity searches and build contexts for applications based on Large Language Models (LLMs).In this article, we will explore how to integrate Weaviate and PySpark, with a particular emphasis on leveraging their capabilities for similarity searches using Large Language Models (LLMs).Before we delve into the integration of Weaviate and PySpark, let's start with a brief overview. We will begin by seamlessly importing a subset of the Sphere dataset, which contains a substantial 100k lines of data, into our newly initiated Spark Session. This dataset will provide valuable insights and nuances, enhancing our understanding of the collaboration between Weaviate and PySpark. Let's get started.Preparing the Docker Compose EnvironmentBefore we delve into the integration of Weaviate and PySpark, let's take a closer look at the components we'll be working with. In this scenario, we will utilize Docker Compose to deploy Spark, Jupyter, Weaviate, and the Transformers container in a local environment. The Transformers container will be instrumental in creating embeddings.To get started, we'll walk you through the process of setting up the Docker Compose environment, making it conducive for seamlessly integrating Weaviate and PySpark.version: '3' services: spark-master:    image: bitnami/spark:latest    hostname: spark-master    environment:      - INIT_DAEMON_STEP=setup_spark jupyter:    build: .    ports:      - "8888:8888"    volumes:      - ./local_lake:/home/jovyan/work      - ./notebooks:/home/jovyan/    depends_on:      - spark-master    command: "start-notebook.sh --NotebookApp.token='' --NotebookApp.password=''" weaviate:    image: semitechnologies/weaviate:latest    restart: on-failure:0    environment:      QUERY_DEFAULTS_LIMIT: 20      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'      PERSISTENCE_DATA_PATH: "./data"      DEFAULT_VECTORIZER_MODULE: text2vec-transformers      ENABLE_MODULES: text2vec-transformers      TRANSFORMERS_INFERENCE_API: <http://t2v-transformers:8080>      CLUSTER_HOSTNAME: 'node1' t2v-transformers:    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1    environment:      ENABLE_CUDA: 0 # set to 1 to enable      # NVIDIA_VISIBLE_DEVICES: all # enable if running with CUDA volumes: myvol:This Docker Compose configuration sets up a few different services:spark-master: This service uses the latest Bitnami Spark image. It sets the hostname to "spark-master" and defines an environment variable for initialization.jupyter: This service is built from the current directory and exposes port 8888. It also sets up volumes to link the local "local_lake" directory to the working directory inside the container and the "notebooks" directory to the home directory of the container. It depends on the "spark-master" service and runs a command to start Jupyter Notebook with certain configurations.weaviate: This service uses the latest Weaviate image. 
It specifies some environment variables for configuration, like setting query defaults, enabling anonymous access, defining data persistence paths, and configuring vectorizers.t2v-transformers: This service uses a specific image for transformers vector embedding creation. It also sets environment variables, including one for enabling CUDA if needed.Additionally, there is a volume defined named "myvol" for potential data storage. This Docker Compose configuration essentially sets up an environment where Spark, Jupyter, Weaviate, and Transformers can work together, each with its specific configuration and dependencies.Enabling Seamless Integration with the Spark ConnectorThe way in which Spark and Weaviate work together is through the Spark Connector. This connector serves as a bridge, allowing data to flow from Spark to Weaviate. It's especially important for tasks like Extract, Transform, Load (ETL) processes, where it allows to processing of the data with Spark and then populating Weaviate vector databases. One of its key features is its ability to automatically figure out the correct data type in Spark based on your Weaviate schema, making data transfer more straightforward. Another feature is that you can choose to vectorize data as you send it to Weaviate, or you can provide existing vectors. By default, Weaviate generates document IDs for new documents, but you can also supply your own IDs within the data frame. These capabilities can all be configured as options within the Spark Connector.To start integrating Spark and Weaviate, you'll need to install two important components: the weaviate-client Python package and the essential PySpark framework. You can easily get these dependencies by running the following command with pip3:pip3 install pyspark weaviate-clientTo get the Weaviate Spark Connector, we can execute the following command in your terminal, which will download the JAR file that is used by the Spark Session:curl <https://github.com/weaviate/spark-connector/releases/download/v1.2.8/spark-connector-assembly-1.2.8.jar> --output spark-connector-assembly-1.2.8.jarKeep in mind that Java 8+ and Scala 2.12 are prerequisites for a seamless integration experience so please make sure that these components are installed on your system before proceeding. While here we demonstrate Spark's local operation using Docker, consider referring to the Apache Spark documentation or your cloud platform's resources for guidance on installation and deploying a Spark cluster in different environments, like EMR on AWS or Dataproc in GCP. Additionally, make sure to verify the compatibility of your chosen language runtime with your selected environment.The way in which Spark and Weaviate work together is through the Spark Connector. This connector serves as a bridge, allowing data to flow from Spark to Weaviate. It's especially important for tasks like Extract, Transform, Load (ETL) processes, where it allows to processing of the data with Spark and then populate Weaviate vector databases. One of its key features is its ability to automatically figure out the correct data type in Spark based on your Weaviate schema, making data transfer more straightforward. Another feature is that you can choose to vectorize data as you send it to Weaviate, or you can provide existing vectors. By default, Weaviate generates document IDs for new documents, but you can also supply your own IDs within the data frame. 
These capabilities can all be configured as options within the Spark Connector.In the next sections, we will dive into the practical implementation of the integration, showing the PySpark notebook that we can run in Jupyter with code snippets to guide us through each step of the implementation. In this case, we will be using the Sphere dataset – housing a robust 100k lines of data – in our Spark Session, and we will insert it into the running Weaviate dataset which will create embeddings by using the Transformers container.Initializing the Spark Session and Loading DataTo begin, we initialize the Spark Session using the SparkSession.builder module. This code snippet configures the session with the necessary settings, including the specification of the spark-connector-assembly-1.2.8.jar – the Weaviate Spark Connector JAR file. We set the session's master to local[*] and define the application name as weaviate. The .getOrCreate() function ensures the session is created or retrieved as needed. To maintain clarity, we suppress log messages with a level of "WARN."from pyspark.sql import SparkSession spark = (    SparkSession.builder.config(        "spark.jars",        "spark-connector-assembly-1.2.8.jar",  # specify the spark connector JAR    )    .master("local[*]")    .appName("weaviate")    .getOrCreate() ) spark.sparkContext.setLogLevel("WARN") Remember that in this case, the connector needs to be in the proper location to be utilized by the Spark Session. Now we can proceed to load the dataset using the .load() function, specifying the format as JSON. This command fetches the data into a DataFrame named df, which is then displayed using .show(). df = spark.read.load("sphere.100k.jsonl", format="json") df.show()The next steps involve preparing the data for the integration with Weaviate. We first drop the vector column from the DataFrame, as it's not needed for our integration purpose.df = df.drop(*["vector"]) df.show()To interact with Weaviate, we use the weaviate Python package. The code initializes the Weaviate client, specifying the base URL and setting timeout configurations. We then delete any existing schema and proceed to create a new class named Sphere with specific properties, including raw, sha, title, and url. The vectorizer is set to text2vec-transformers.import weaviate import json # initiate the Weaviate client client = weaviate.Client("<http://weaviate:8080>") client.timeout_config = (3, 200) # empty schema and create new schema client.schema.delete_all() client.schema.create_class(    {        "class": "Sphere",        "properties": [            {                "name": "raw",                "dataType": ["string"]            },            {                "name": "sha",                "dataType": ["string"]            },            {                "name": "title",                "dataType": ["string"]            },            {                "name": "url",                "dataType": ["string"]            },        ],     "vectorizer":"text2vec-transformers"    } )Now we can start the process of writing data from Spark to Weaviate. The code renames the id column to uuid and uses the .write.format() function to specify the Weaviate format for writing. Various options, such as batchSize, scheme, host, id, and className, can be set to configure the write process. The .mode("append") ensures that only the append write mode is currently supported. 
Additionally, the code highlights that both batch operations and streaming writes are supported.df.limit(1500).withColumnRenamed("id", "uuid").write.format("io.weaviate.spark.Weaviate") \\\\    .option("batchSize", 200) \\\\    .option("scheme", "http") \\\\    .option("host", "weaviate:8080") \\\\    .option("id", "uuid") \\\\    .option("className", "Sphere") \\\\    .mode("append").save()Querying Weaviate for Data InsightsNow we can conclude this hands-on section by showcasing how to query Weaviate for data insights. The code snippet demonstrates querying the Sphere class for title and raw properties, using the .get() and .with_near_text() functions. The concept parameter includes animals, and additional information like distance is requested. A limit of 5 results is set using .with_limit(5), and the query is executed with .do().client.query\\\\    .get("Sphere", ["title","raw"])\\\\    .with_near_text({        "concepts": ["animals"]    })\\\\    .with_additional(["distance"])\\\\    .with_limit(5)\\\\    .do()These guided steps provide a comprehensive view of the integration process, showcasing the seamless data transfer from Spark to Weaviate and enabling data analysis with enhanced insights.ConclusionIn conclusion, the integration of Weaviate and PySpark represents the convergence of technologies to offer innovative solutions for data analysis and exploration. By integrating the capabilities of Weaviate, a semantic graph database, and PySpark, a versatile data processing framework, we enable new exciting possible applications to query and extract insights from our data.Throughout this article, we started by explaining the Docker Compose environment, orchestrated the components, and introduced the Spark Connector, we set the stage for efficient data flow and analysis. The Spark Connector enables to transfer of data from Spark to Weaviate. Its flexibility in adapting to various data types and schema configurations showcased its significance in ETL processes and data interaction. Next, we continued with a hands-on exploration that guided us through the integration process, offering practical insights into initializing the Spark Session, loading and preparing data, configuring the Weaviate client, and orchestrating seamless data transfer.In essence, the integration of Weaviate and PySpark not only simplifies data transfer but also unlocks enhanced data insights and analysis. This collaboration underscores the transformative potential of harnessing advanced technologies to extract meaningful insights from large datasets. As the realm of data analysis continues to evolve, the integration of Weaviate and PySpark emerges as a promising avenue for innovation and exploration.Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn
LLM-powered Chatbots for Financial Queries

Alan Bernardo Palacio
12 Sep 2023
27 min read
IntroductionIn the ever-evolving realm of digital finance, the convergence of user-centric design and cutting-edge AI technologies is pushing the boundaries of innovation. But with the influx of data and queries, how can we better serve users and provide instantaneous, accurate insights? Within this context, Large Language Models (LLMs) have emerged as a revolutionary tool, providing businesses and developers with powerful capabilities. This hands-on article will walk you through the process of leveraging LLMs to create a chatbot that can query real-time financial data extracted from the NYSE to address users' queries in real time about the current market state. We will dive into the world of LLMs, explore their potential, and understand how they seamlessly integrate with databases using LangChain. Furthermore, we'll fetch real-time data using the finance package offering the chatbot the ability to answer questions using current data.In this comprehensive tutorial, you'll gain proficiency in diverse aspects of modern software development. You'll first delve into the realm of database interactions, mastering the setup and manipulation of a MySQL database to store essential financial ticker data. Unveil the intricate synergy between Large Language Models (LLMs) and SQL through the innovative LangChain tool, which empowers you to bridge natural language understanding and database operations seamlessly. Moving forward, you'll explore the dynamic fusion of Streamlit and LLMs as you demystify the mechanics behind crafting a user-friendly front. Witness the transformation of your interface using OpenAI's Davinci model, enhancing user engagement with its profound knowledge. As your journey progresses, you'll embrace the realm of containerization, ensuring your application's agility and scalability by harnessing the power of Docker. Grasp the nuances of constructing a potent Dockerfile and orchestrating dependencies, solidifying your grasp on holistic software development practices.By the end of this guide, readers will be equipped with the knowledge to design, implement, and deploy an intelligent, finance-focused chatbot. This isn't just about blending frontend and backend technologies; it's about crafting a digital assistant ready to revolutionize the way users interact with financial data. Let's dive in!The Power of Containerization with Docker ComposeNavigating the intricacies of modern software deployment is simplified through the strategic implementation of Docker Compose. This orchestration tool plays a pivotal role in harmonizing multiple components within a local environment, ensuring they collaborate seamlessly.Docker allows to deploy multiple components seamlessly in a local environment. In our journey, we will use docker-compose to harmonize various components, including MySQL for data storage, a Python script as a data fetcher for financial insights, and a Streamlit-based web application that bridges the gap between the user and the chatbot.Our deployment landscape consists of several interconnected components, each contributing to the finesse of our intelligent chatbot. The cornerstone of this orchestration is the docker-compose.yml file, a blueprint that encapsulates the deployment architecture, coordinating the services to deliver a holistic user experience.With Docker Compose, we can efficiently and consistently deploy multiple interconnected services. 
Let's dive into the structure of our docker-compose.yml:version: '3' services: db:    image: mysql:8.0    environment:      - MYSQL_ROOT_PASSWORD=root_password      - MYSQL_DATABASE=tickers_db      - MYSQL_USER=my_user   # Replace with your desired username      - MYSQL_PASSWORD=my_pass  # Replace with your desired password    volumes:      - ./db/setup.sql:/docker-entrypoint-initdb.d/setup.sql    ports:      - "3306:3306"  # Maps port 3306 in the container to port 3306 on the host ticker-fetcher:    image: ticker/python    build:      context: ./ticker_fetcher    depends_on:      - db    environment:      - DB_USER=my_user   # Must match the MYSQL_USER from above      - DB_PASSWORD=my_pass   # Must match the MYSQL_PASSWORD from above      - DB_NAME=tickers_db app:    build:      context: ./app    ports:      - 8501:8501    environment:      - OPENAI_API_KEY=${OPENAI_API_KEY}    depends_on:      - ticker-fetcherContained within the composition are three distinctive services:db: A MySQL database container, configured with environmental variables for establishing a secure and efficient connection. This container is the bedrock upon which our financial data repository, named tickers_db, is built. A volume is attached to import a setup SQL script, enabling rapid setup.ticker-fetcher: This service houses the heart of our real-time data acquisition system. Crafted around a custom Python image, it plays the crucial role of fetching the latest stock information from Yahoo Finance. It relies on the db service to persistently store the fetched data, ensuring that our chatbot's insights are real-time data.app: The crown jewel of our user interface is the Streamlit application, which bridges the gap between users and the chatbot. This container grants users access to OpenAI's LLM model. It harmonizes with the ticker-fetcher service to ensure that the data presented to users is not only insightful but also dynamic.Docker Compose's brilliance lies in its capacity to encapsulate these services within isolated, reproducible containers. While Docker inherently fosters isolation, Docker Compose takes it a step further by ensuring that each service plays its designated role in perfect sync.The docker-compose.yml configuration file serves as the conductor's baton, ensuring each service plays its part with precision and finesse. As you journey deeper into the deployment architecture, you'll uncover the intricate mechanisms powering the ticker-fetcher container, ensuring a continuous flow of fresh financial data. Through the lens of Docker Compose, the union of user-centric design, cutting-edge AI, and streamlined deployment becomes not just a vision, but a tangible reality poised to transform the way we interact with financial data.Enabling Real-Time Financial Data AcquisitionAt the core of our innovative architecture lies the pivotal component dedicated to real-time financial data acquisition. This essential module operates as the engine that drives our chatbot's ability to deliver up-to-the-minute insights from the ever-fluctuating financial landscape.Crafted as a dedicated Docker container, this module is powered by a Python script that through the yfinance package, retrieves the latest stock information directly from Yahoo Finance. 
The result is a continuous stream of the freshest financial intelligence, ensuring that our chatbot remains armed with the most current and accurate market data.Our Python script, fetcher.py, looks as follows:import os import time import yfinance as yf import mysql.connector import pandas_market_calendars as mcal import pandas as pd import traceback DB_USER = os.environ.get('DB_USER') DB_PASSWORD = os.environ.get('DB_PASSWORD') DB_NAME = 'tickers_db' DB_HOST = 'db' DB_PORT = 3306 def connect_to_db():    return mysql.connector.connect(        host=os.getenv("DB_HOST", "db"),        port=os.getenv("DB_PORT", 3306),        user=os.getenv("DB_USER"),        password=os.getenv("DB_PASSWORD"),        database=os.getenv("DB_NAME"),    ) def wait_for_db():    while True:        try:            conn = connect_to_db()            conn.close()            return        except mysql.connector.Error:            print("Unable to connect to the database. Retrying in 5 seconds...")            time.sleep(5) def is_market_open():    # Get the NYSE calendar    nyse = mcal.get_calendar('NYSE')    # Get the current timestamp and make it timezone-naive    now = pd.Timestamp.now(tz='UTC').tz_localize(None)    print("Now its:",now)    # Get the market open and close times for today    market_schedule = nyse.schedule(start_date=now, end_date=now)    # If the market isn't open at all today (e.g., a weekend or holiday)    if market_schedule.empty:        print('market is empty')        return False    # Today's schedule    print("Today's schedule")    # Check if the current time is within the trading hours    market_open = market_schedule.iloc[0]['market_open'].tz_localize(None)    market_close = market_schedule.iloc[0]['market_close'].tz_localize(None)    print("market_open",market_open)    print("market_close",market_close)    market_open_now = market_open <= now <= market_close    print("Is market open now:",market_open_now)    return market_open_now def chunks(lst, n):    """Yield successive n-sized chunks from lst."""    for i in range(0, len(lst), n):        yield lst[i:i + n] if __name__ == "__main__":    wait_for_db()    print("-"*50)    tickers = ["AAPL", "GOOGL"]  # Add or modify the tickers you want      print("Perform backfill once")    # historical_backfill(tickers)    data = yf.download(tickers, period="5d", interval="1m", group_by="ticker", timeout=10) # added timeout    print("Data fetched from yfinance.")    print("Head")    print(data.head().to_string())    print("Tail")    print(data.head().to_string())    print("-"*50)    print("Inserting data")    ticker_data = []    for ticker in tickers:        for idx, row in data[ticker].iterrows():            ticker_data.append({                'ticker': ticker,                'open': row['Open'],                'high': row['High'],                'low': row['Low'],                'close': row['Close'],                'volume': row['Volume'],                'datetime': idx.strftime('%Y-%m-%d %H:%M:%S')            })    # Insert data in bulk    batch_size=200    conn = connect_to_db()    cursor = conn.cursor()    # Create a placeholder SQL query    query = """INSERT INTO ticker_history (ticker, open, high, low, close, volume, datetime)               VALUES (%s, %s, %s, %s, %s, %s, %s)"""    # Convert the data into a list of tuples    data_tuples = []    for record in ticker_data:        for key, value in record.items():            if pd.isna(value):                record[key] = None        data_tuples.append((record['ticker'], record['open'], record['high'], 
record['low'],                            record['close'], record['volume'], record['datetime']))    # Insert records in chunks/batches    for chunk in chunks(data_tuples, batch_size):        cursor.executemany(query, chunk)        print(f"Inserted batch of {len(chunk)} records")    conn.commit()    cursor.close()    conn.close()    print("-"*50)    # Wait until starting to insert live values    time.sleep(60)    while True:        if is_market_open():            print("Market is open. Fetching data.")            print("Fetching data from yfinance...")            data = yf.download(tickers, period="1d", interval="1m", group_by="ticker", timeout=10) # added timeout            print("Data fetched from yfinance.")            print(data.head().to_string())                      ticker_data = []            for ticker in tickers:                latest_data = data[ticker].iloc[-1]                ticker_data.append({                    'ticker': ticker,                    'open': latest_data['Open'],                    'high': latest_data['High'],                    'low': latest_data['Low'],                    'close': latest_data['Close'],                    'volume': latest_data['Volume'],                    'datetime': latest_data.name.strftime('%Y-%m-%d %H:%M:%S')                })                # Insert the data                conn = connect_to_db()                cursor = conn.cursor()                print("Inserting data")                total_tickers = len(ticker_data)                for record in ticker_data:                    for key, value in record.items():                        if pd.isna(value):                            record[key] = "NULL"                    query = f"""INSERT INTO ticker_history (ticker, open, high, low, close, volume, datetime)                                VALUES (                                    '{record['ticker']}',{record['open']},{record['high']},{record['low']},{record['close']},{record['volume']},'{record['datetime']}')"""                    print(query)                    cursor.execute(query)                print("Data inserted")                conn.commit()                cursor.close()                conn.close()            print("Inserted data, waiting for the next batch in one minute.")            print("-"*50)            time.sleep(60)        else:            print("Market is closed. Waiting...")            print("-"*50)            time.sleep(60)  # Wait for 60 seconds before checking againWithin its code, the script seamlessly navigates through a series of well-defined stages:Database Connectivity: The script initiates by establishing a secure connection to our MySQL database. With the aid of the connect_to_db() function, a connection is created while the wait_for_db() mechanism guarantees the script's execution occurs only once the database service is fully primed.Market Schedule Evaluation: Vital to the script's operation is the is_market_open() function, which determines the market's operational status. By leveraging the pandas_market_calendars package, this function ascertains whether the New York Stock Exchange (NYSE) is currently active.Data Retrieval and Integration: During its maiden voyage, fetcher.py fetches historical stock data from the past five days for a specified list of tickers—typically major entities such as AAPL and GOOGL. This data is meticulously processed and subsequently integrated into the tickers_db database. 
During subsequent cycles, while the market is live, the script periodically procures real-time data at one-minute intervals.Batched Data Injection: Handling substantial volumes of stock data necessitates an efficient approach. To address this, the script ingeniously partitions the data into manageable chunks and employs batched SQL INSERT statements to populate our database. This technique ensures optimal performance and streamlined data insertion.Now let’s discuss the Dockerfile that defines this container. The Dockerfile is the blueprint for building the ticker-fetcher container. It dictates how the Python environment will be set up inside the container.FROM python:3 WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . CMD ["python", "-u", "fetcher.py"] Base Image: We start with a basic Python 3 image.Working Directory: The working directory inside the container is set to /app.Dependencies Installation: After copying the requirements.txt file into our container, the RUN command installs all necessary Python packages.Starting Point: The script's entry point, fetcher.py, is set as the command to run when the container starts.A list of Python packages needed to run our fetcher script:mysql-connector-python yfinance pandas-market-calendarsmysql-connector-python: Enables the script to connect to our MySQL database.yfinance: Fetches stock data from Yahoo Finance.pandas-market-calendars: Determines the NYSE's operational schedule.As a symphony of technology, the ticker-fetcher container epitomizes precision and reliability, acting as the conduit that channels real-time financial data into our comprehensive architecture.Through this foundational component, the chatbot's prowess in delivering instantaneous, accurate insights comes to life, representing a significant stride toward revolutionizing the interaction between users and financial data.With a continuously updating financial database at our disposal, the next logical step is to harness the potential of Large Language Models. The subsequent section will explore how we integrate LLMs using LangChain, allowing our chatbot to transform raw stock data into insightful conversations.Leveraging Large Language Models with SQL using LangChainThe beauty of modern chatbot systems lies in the synergy between the vast knowledge reservoirs of Large Language Models (LLMs) and real-time, structured data from databases. LangChain is a bridge that efficiently connects these two worlds, enabling seamless interactions between LLMs and databases such as SQL.The marriage between LLMs and SQL databases opens a world of possibilities. With LangChain as the bridge, LLMs can effectively query databases, offering dynamic responses based on stored data. This section delves into the core of our setup, the utils.py file. Here, we knit together the MySQL database with our Streamlit application, defining the agent that stands at the forefront of database interactions.LangChain is a library designed to facilitate the union of LLMs with structured databases. It provides utilities and agents that can direct LLMs to perform database operations via natural language prompts. 
Instead of a user having to craft SQL queries manually, they can simply ask a question in plain English, which the LLM interprets and translates into the appropriate SQL query.Below, we present the code if utils.py, that brings our LLM-database interaction to life:from langchain import PromptTemplate, FewShotPromptTemplate from langchain.prompts.example_selector import LengthBasedExampleSelector from langchain.llms import OpenAI from langchain.sql_database import SQLDatabase from langchain.agents import create_sql_agent from langchain.agents.agent_toolkits import SQLDatabaseToolkit from langchain.agents.agent_types import AgentType # Database credentials DB_USER = 'my_user' DB_PASSWORD = 'my_pass' DB_NAME = 'tickers_db' DB_HOST = 'db' DB_PORT = 3306 mysql_uri = f"mysql+mysqlconnector://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}" # Initialize the SQLDatabase and SQLDatabaseToolkit db = SQLDatabase.from_uri(mysql_uri) toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0)) # Create SQL agent agent_executor = create_sql_agent(    llm=OpenAI(temperature=0),    toolkit=toolkit,    verbose=True,    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION, ) # Modified the generate_response function to now use the SQL agent def query_db(prompt):    return agent_executor.run(prompt)Here's a breakdown of the key components:Database Credentials: These credentials are required to connect our application to the tickers_db database. The MySQL URI string represents this connection.SQLDatabase Initialization: Using the SQLDatabase.from_uri method, LangChain initializes a connection to our database. The SQLDatabaseToolkit provides a set of tools that help our LLM interact with the SQL database.Creating the SQL Agent: The SQL agent, in our case a ZERO_SHOT_REACT_DESCRIPTION type, is the main executor that takes a natural language prompt and translates it into SQL. It then fetches the data and returns it in a comprehensible manner. The agent uses the OpenAI model (an instance of LLM) and the aforementioned toolkit to accomplish this.The query_db function: This is the interface to our SQL agent. Upon receiving a prompt, it triggers the SQL agent to run and then returns the response.The architecture is now in place: on one end, we have a constant stream of financial data being fetched and stored in tickers_db. On the other, we have an LLM ready to interpret and answer user queries. The user might ask, "What was the closing price of AAPL yesterday?" and our system will seamlessly fetch this from the database and provide a well-crafted response, all thanks to LangChain.In the forthcoming section, we'll discuss how we present this powerful capability through an intuitive interface using Streamlit. This will enable end-users, irrespective of their technical proficiency, to harness the combined might of LLMs and structured databases, all with a simple chat interface.Building an Interactive Chatbot with Streamlit, OpenAI, and LangChainIn the age of user-centric design, chatbots represent an ideal solution for user engagement. But while typical chatbots can handle queries, imagine a chatbot powered by both Streamlit for its front end and a Large Language Model (LLM) for its backend intelligence. This powerful union allows us to provide dynamic, intelligent responses, leveraging our stored financial data to answer user queries.The grand finale of our setup is the Streamlit application. 
This intuitive web interface allows users to converse with our chatbot in natural language, making database queries feel like casual chats. Behind the scenes, it leverages the power of the SQL agent, tapping into the real-time financial data stored in our database, and presenting users with instant, accurate insights.Let's break down our chatbot's core functionality, designed using Streamlit:import streamlit as st from streamlit_chat import message from streamlit_extras.colored_header import colored_header from streamlit_extras.add_vertical_space import add_vertical_space from utils import * # Now the Streamlit app # Sidebar contents with st.sidebar:    st.title('Financial QnA Engine')    st.markdown('''    ## About    This app is an LLM-powered chatbot built using:    - Streamlit    - Open AI Davinci LLM Model    - LangChain    - Finance    ''')    add_vertical_space(5)    st.write('Running in Docker!') # Generate empty lists for generated and past. ## generated stores AI generated responses if 'generated' not in st.session_state:    st.session_state['generated'] = ["Hi, how can I help today?"] ## past stores User's questions if 'past' not in st.session_state:    st.session_state['past'] = ['Hi!'] # Layout of input/response containers input_container = st.container() colored_header(label='', description='', color_name='blue-30') response_container = st.container() # User input ## Function for taking user provided prompt as input def get_text():    input_text = st.text_input("You: ", "", key="input")    return input_text ## Applying the user input box with input_container:    user_input = get_text() # Response output ## Function for taking user prompt as input followed by producing AI generated responses def generate_response(prompt):    response = query_db(prompt)    return response ## Conditional display of AI generated responses as a function of user provided prompts with response_container:    if user_input:        response = generate_response(user_input)        st.session_state.past.append(user_input)        st.session_state.generated.append(response)          if st.session_state['generated']:        for i in range(len(st.session_state['generated'])):            message(st.session_state['past'][i], is_user=True, key=str(i) + '_user',avatar_style='identicon',seed=123)            message(st.session_state["generated"][i], key=str(i),avatar_style='icons',seed=123)The key features of the application are in general:Sidebar Contents: The sidebar provides information about the chatbot, highlighting the technologies used to power it. With the aid of streamlit-extras, we've added vertical spacing for visual appeal.User Interaction Management: Our chatbot uses Streamlit's session_state to remember previous user interactions. The 'past' list stores user queries, and 'generated' stores the LLM-generated responses.Layout: With Streamlit's container feature, the layout is neatly divided into areas for user input and the AI response.User Input: Users interact with our chatbot using a simple text input box, where they type in their query.AI Response: Using the generate_response function, the chatbot processes user input, fetching data from the database using LangChain and LLM to generate an appropriate response. These responses, along with past interactions, are then dynamically displayed in the chat interface using the message function from streamlit_chat.Now in order to ensure portability and ease of deployment, our application is containerized using Docker. 
Below is the Dockerfile that aids in this process:

FROM python:3
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["streamlit", "run", "streamlit_app.py"]

This Dockerfile:

Starts with a base Python 3 image.
Sets the working directory in the container to /app.
Copies requirements.txt into the container and installs the necessary dependencies.
Finally, copies the rest of the application and sets the default command to run the Streamlit application.

Our application depends on several Python libraries, which are specified in requirements.txt:

streamlit
streamlit-chat
streamlit-extras
mysql-connector-python
openai==0.27.8
langchain==0.0.225

These include:

streamlit: For the main frontend application.
streamlit-chat & streamlit-extras: For enhancing the chat interface and adding utility functions like colored headers and vertical spacing.
mysql-connector-python: To interact with our MySQL database.
openai: For accessing OpenAI's Davinci model.
langchain: To bridge the gap between LLMs and our SQL database.

Our chatbot application now combines a user-friendly frontend interface with the immense knowledge and adaptability of LLMs. We can verify that the values provided by the chatbot actually reflect the data in the database by running the same queries directly against tickers_db. With the interface in place and the data verified, let's step back and review what we have built.

Conclusion

As we wrap up this comprehensive guide, having gone from raw financial data to a fully-fledged, AI-powered chatbot, we've traversed a myriad of technologies, frameworks, and paradigms to create something truly exceptional.

We initiated our expedition by setting up a MySQL database brimming with financial data. But the real magic began when we introduced LangChain to the mix, establishing a bridge between human-like language understanding and the structured world of databases. This amalgamation ensured that our application could pull relevant financial insights on the fly, using natural language queries. Streamlit stood as the centerpiece of our user interaction. With its dynamic and intuitive design capabilities, we crafted an interface where users could communicate effortlessly. Marrying this with the vast knowledge of OpenAI's Davinci LLM, our chatbot could comprehend, reason, and respond, making financial inquiries a breeze. To ensure that our application was both robust and portable, we harnessed the power of Docker. This ensured that, irrespective of the environment, our chatbot was always ready to assist, without the hassles of dependency management.

We've showcased just the tip of the iceberg. The bigger picture is captivating, revealing countless potential applications. Deploying such tools in firms could help clients get real-time insights into their portfolios. Integrating such systems with personal finance apps can help individuals make informed decisions about investments, savings, or even daily expenses. The true potential unfolds when you, the reader, take these foundational blocks and experiment, innovate, and iterate. The amalgamation of LLMs, databases, and interactive interfaces like Streamlit opens a realm of possibilities limited only by imagination.

As we conclude, remember that in the world of technology, learning is a continuous journey. What we've built today is a stepping stone. Challenge yourself, experiment with new data sets, refine the user experience, or even integrate more advanced features. The horizon is vast, and the opportunities, endless.
Embrace this newfound knowledge and craft the next big thing in financial technology. After all, every revolution starts with a single step, and today, you've taken many. Happy coding!Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn

Efficient LLM Querying with LMQL

Alan Bernardo Palacio
12 Sep 2023
14 min read
IntroductionIn the world of natural language processing, Large Language Models (LLMs) have proven to be highly successful at a variety of language-based tasks, such as machine translation, text summarization, question answering, reasoning, and code generation. LLMs like ChatGPT, GPT-4, and others have demonstrated outstanding performance by predicting the next token in a sequence based on input prompts. Users interact with these models by providing language instructions or examples to perform various downstream tasks. However, to achieve optimal results or adapt LLMs for specific tasks, complex and task-specific programs must be implemented, often requiring ad-hoc interactions and deep knowledge of the model's internals.In this article, we discuss LMQL, a framework for Language Model Programming (LMP), that allows users to specify complex interactions, control flow, and constraints without needing deep knowledge of the LLM's internals using a declarative programming language similar to SQL. LMQL supports high-level, logical constraints and users can express a wide range of prompting techniques concisely, reducing the need for ad-hoc interactions and manual work to steer model generation, avoiding costly re-querying, and guiding the text generation process according to their specific criteria. Let’s start.Overview of Large Language ModelsLanguage models (LMs) operate on sequences of tokens, where tokens are discrete elements that represent words or sub-words in a text. The process involves using a tokenizer to map input words to tokens, and then a language model predicts the probabilities of possible next tokens based on the input sequence. Various decoding methods are used in the LMs to output the right sequence of tokens from the language model's predictions out of which we can name:Decoding Methods:Greedy decoding: Select the token with the highest probability at each step.Sampling: Randomly sampling tokens based on the predicted probabilities.Full decoding: Enumerating all possible sequences and selecting the one with the highest probability (computationally expensive).Beam search: Maintaining a set of candidate sequences and refining them by predicting the next token.Masked Decoding: In some cases, certain tokens can be ruled out based on a mask that indicates which tokens are viable. Decoding is then performed on the remaining set of tokens.Few-Shot Prompting: LMs can be trained on broad text-sequence prediction datasets and then provided with context in the form of examples for specific tasks. This approach allows LMs to perform downstream tasks without task-specific training.Multi-Part Prompting: LMs are used not only for simple prompt completion but also as reasoning engines integrated into larger programs. Various LM programming schemes explore compositional reasoning, such as iterated decompositions, meta prompting, tool use, and composition of multiple prompts.It is also important to name that for beam searching and sampling there is a parameter named temperature which we can use to control the diversity of the output.These techniques enable LMs to be versatile and perform a wide range of tasks without requiring task-specific training, making them powerful multi-task reasoners.Asking the Right QuestionsWhile LLMs can be prompted with examples or instructions, using them effectively and adapting to new models often demands a deep understanding of their internal workings, along with the use of vendor-specific libraries and implementations. 
Constrained decoding to limit text generation to legal words or phrases can be challenging. Many advanced prompting methods require complex interactions and control flows between the LLM and the user, leading to manual work and restricting the generality of implementations. Additionally, generating complete sequences from LLMs may require multiple calls and become computationally expensive, resulting in high usage costs per query in pay-to-use APIs. Generally, the challenges associated with creating proper prompts for LLMs are:

Interaction Challenge: One challenge in LM interaction is the need for multiple manual interactions during the decoding process. For example, in meta prompting, where the language model is asked to expand the prompt and then provide an answer, the current approach requires inputting the prompt partially, invoking the LM, extracting information, and manually completing the sequence. This manual process may involve human intervention or several API calls, making joint optimization of template parameters difficult and limiting automated optimization possibilities.

Constraints & Token Representation: Another issue arises when considering completions generated by LMs. Sometimes, LMs may produce long, ongoing sequences of text that do not adhere to desired constraints or output formats. Users often have specific constraints for the generated text, which may be violated by the LM. Expressing these constraints in terms of human-understandable concepts and logic is challenging, and existing methods require considerable manual implementation effort and a model-level understanding of decoding procedures, tokenization, and vocabulary.

Efficiency and Cost Challenge: Efficiency and performance remain significant challenges in LM usage. While efforts have been made to improve the inference step in modern LMs, they still demand high-end GPUs for reasonable performance. This makes practical usage costly, particularly when relying on hosted models running in the cloud behind paid APIs. The computational and financial expenses associated with frequent LM querying can become prohibitive.

Addressing these challenges, Language Model Programming and constraints offer new optimization opportunities. By defining behavior and limiting the search space, the number of LM invocations can be reduced. In this context, the cost of validation, parsing, and mask generation becomes negligible compared to the significant cost of a single LM call.

So the question arises: how can we overcome the challenges of implementing complex interactions and constraints with LLMs while reducing computational costs and retaining or improving accuracy on downstream tasks?

Introducing LMQL

To address these challenges and enhance language model programming, a team of researchers has introduced LMQL (Language Model Query Language). LMQL is an open-source programming language and platform for LLM interaction that combines prompts, constraints, and scripting. It is designed to elevate the capabilities of LLMs like ChatGPT, GPT-4, and any future models, offering a declarative, SQL-like approach based on Python.

LMQL enables Language Model Programming (LMP), a novel paradigm that extends traditional natural language prompting by allowing lightweight scripting and output constraining. This separation of front-end and back-end interaction allows users to specify complex interactions, control flow, and constraints without needing deep knowledge of the LLM's internals. The approach abstracts away tokenization, implementation, and architecture details, making it more portable and easier to use across different LLMs.

With LMQL, users can express a wide range of prompting techniques concisely, reducing the need for ad-hoc interactions and manual work. The language supports high-level, logical constraints, enabling users to steer model generation and avoid costly re-querying and validation. By guiding the text generation process according to specific criteria, users can achieve the desired output with fewer iterations and improved efficiency.

Moreover, LMQL leverages evaluation semantics to automatically generate token masks for LM decoding based on user-specified constraints. This optimization reduces inference cost by up to 80%, resulting in significant latency reduction and lower computational expenses, which is particularly beneficial for pay-to-use APIs.

LMQL addresses several of these challenges in LM interaction and usage, namely:

Overcoming Manual Interaction: LMQL simplifies the prompt and eliminates the need for manual interaction during the decoding process. It achieves this by allowing the use of variables, represented within square brackets, which store the answers obtained from the language model. These variables can be referenced later in the query, avoiding the need for manual extraction and input. By employing LMQL syntax, the interaction process becomes more automated and efficient.

Constraints on Variable Parts: To address issues related to long and irrelevant outputs, LMQL introduces constraints on the variable parts of LM interaction. These constraints allow users to specify word and phrase limitations for the generated text. LMQL ensures that the decoded tokens for variables meet these constraints during the decoding process. This provides more control over the generated output and ensures that it adheres to user-defined restrictions.

Generalization of Multi-Part Prompting: Language Model Programming through LMQL generalizes the various multi-part prompting approaches discussed earlier. It streamlines the process of trying different values for variables by automating the selection process. Users can set constraints on variables, which are then applied to multiple inputs without any human intervention. Once developed and tested, an LMQL query can easily be applied to different inputs in an unsupervised manner, eliminating the need for manual trial and error.

Efficient Execution: LMQL offers efficiency benefits over manual interaction. The constraints and scripting capabilities in LMQL are applied eagerly during decoding, reducing the number of times the LM needs to be invoked. This optimized approach results in notable time and cost savings, especially when using hosted models in cloud environments.

The LMQL syntax involves components such as the decoder, the actual query, the model to query, and the constraints. The decoder specifies the decoding procedure, which can include argmax, sample, or beam search. Constraints on the generated text are expressed in Python syntax, making them user-friendly and easy to understand. Additionally, the distribution instruction allows users to augment the returned result with probability distributions, which is useful for tasks like sentiment analysis.

Using LMQL with Python

LMQL can be utilized in various ways - as a standalone language, in the Playground, or as a Python library, the latter being what we will demonstrate now.
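To see how these components map onto an actual program, here is a minimal annotated sketch written with the Python decorator API demonstrated in the next section (it assumes lmql is installed and OPENAI_API_KEY is set, as described below, and is meant to be run in a notebook or other async context, like the article's own examples). In this query, argmax is the decoder, the quoted prompt with the [LABEL] placeholder is the actual query, the from clause names the model, and the where clause carries the constraints; the prompt wording and the token bound are our own illustrative choices, and the distribution instruction is not shown here.

import lmql

# A tiny sentiment-style query; the clause-by-clause mapping is explained above.
@lmql.query
async def classify_sentiment(review: str):
    '''lmql
    argmax
        """Review: {review}
        The sentiment of this review is[LABEL]"""
    from
        "openai/text-davinci-003"
    where
        STOPS_AT(LABEL, "\\n") and len(TOKENS(LABEL)) < 5
    '''

print((await classify_sentiment("Great value for the price, would buy again."))[0].variables["LABEL"])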
Integrating LMQL into Python projects allows users to streamline their code and incorporate LMQL queries seamlessly. Let's explore how to use LMQL as a Python library and understand some examples.To begin, make sure you have LMQL and LangChain installed by running the following command:!pip install lmql==0.0.6.6 langchain==0.0.225You can then define and execute LMQL queries within Python using a simple approach. Decorate a Python function with the lmql.query decorator, providing the query code as a multi-line string. The decorated function will automatically be compiled into an LMQL query. The return value of the decorated function will be the result of the LMQL query.Here's an example code snippet demonstrating this:import lmql import aiohttp import os os.environ['OPENAI_API_KEY'] = '<your-openai-key>' @lmql.query async def hello():    '''lmql    argmax        "Hello[WHO]"    from        "openai/text-ada-001"    where        len(TOKENS(WHO)) < 10    ''' print(await hello())LMQL provides a fully asynchronous API that enables running multiple LMQL queries in parallel. By declaring functions as async with @lmql.query, you can use await to execute the queries concurrently.The code below demonstrates how to look up information from Wikipedia and incorporate it into an LMQL prompt dynamically:async def look_up(term):    # Looks up term on Wikipedia    url = f"<https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles={term}&origin=*>"    async with aiohttp.ClientSession() as session:        async with session.get(url) as response:            # Get the first sentence on the first page            page = (await response.json())["query"]["pages"]            return list(page.values())[0]["extract"].split(".")[0] @lmql.query async def greet(term):    '''    argmax        """Greet {term} ({await look_up(term)}):        Hello[WHO]        """    from        "openai/text-davinci-003"    where        STOPS_AT(WHO, "\\n")    ''' print((await greet("Earth"))[0].prompt)As an alternative to @lmql.query you can use lmql.query(...) as a function that compiles a provided string of LMQL code into a Python function.q = lmql.query('argmax "Hello[WHO]" from "openai/text-ada-001" where len(TOKENS(WHO)) < 10') await q()LMQL queries can also be easily integrated into langchain's Chain components. This allows for sequential prompting using multiple queries.pythonCopy code from langchain import LLMChain, PromptTemplate from langchain.chat_models import ChatOpenAI from langchain.prompts.chat import (ChatPromptTemplate, HumanMessagePromptTemplate) from langchain.llms import OpenAI # Setup the LM to be used by langchain llm = OpenAI(temperature=0.9) human_message_prompt = HumanMessagePromptTemplate(    prompt=PromptTemplate(        template="What is a good name for a company that makes {product}?",        input_variables=["product"],    ) ) chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt]) chat = ChatOpenAI(temperature=0.9) chain = LLMChain(llm=chat, prompt=chat_prompt_template) # Run the chain chain.run("colorful socks")Lastly, by treating LMQL queries as Python functions, you can easily build pipelines by chaining functions together. 
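To make that concrete, the sketch below chains two decorated queries into a small pipeline: the first extracts a topic from a question, and the second writes a sentence about it. The query wording, variable names, and the pipeline helper are illustrative assumptions rather than code from the LMQL documentation; the result-access pattern (result[0].variables[...]) follows the examples above, and like them this is meant to run in an async context.

import lmql

# First stage: pull a short topic phrase out of a free-form question.
@lmql.query
async def extract_topic(question: str):
    '''lmql
    argmax
        """Question: {question}
        The main topic of this question is[TOPIC]"""
    from
        "openai/text-davinci-003"
    where
        STOPS_AT(TOPIC, "\\n") and len(TOKENS(TOPIC)) < 10
    '''

# Second stage: use the extracted topic in a follow-up prompt.
@lmql.query
async def one_liner(topic: str):
    '''lmql
    argmax
        "Write one short sentence about {topic}:[SENTENCE]"
    from
        "openai/text-davinci-003"
    where
        STOPS_AT(SENTENCE, "\\n")
    '''

# Chain the two: the variable captured by the first query feeds the second.
async def pipeline(question: str):
    topic = (await extract_topic(question))[0].variables["TOPIC"]
    return (await one_liner(topic))[0].variables["SENTENCE"]

print(await pipeline("Why did AAPL trade on such heavy volume this week?"))

Keeping each stage as an ordinary async Python function is what makes this composition cheap: every query can be tested in isolation and then wired into larger flows, LangChain chains included.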
Furthermore, the guaranteed output format of LMQL queries ensures ease of processing the returned values using data processing libraries like Pandas.Here's an example of processing the output of an LMQL query with Pandas:pythonCopy code import pandas as pd @lmql.query async def generate_dogs(n: int):    '''lmql    sample(n=n)        """Generate a dog with the following characteristics:        Name:[NAME]        Age: [AGE]        Breed:[BREED]        Quirky Move:[MOVE]        """    from        "openai/text-davinci-003"    where        STOPS_BEFORE(NAME, "\\n") and STOPS_BEFORE(BREED, "\\n") and        STOPS_BEFORE(MOVE, "\\n") and INT(AGE) and len(AGE) < 3    ''' result = await generate_dogs(8) df = pd.DataFrame([r.variables for r in result]) dfBy employing LMQL as a Python library, users can make their code more efficient and structured, allowing for easier integration with other Python libraries and tools.LMQL can be used in various ways - as a standalone language, in the Playground, or even as a Python library. When integrated into Python projects, LMQL queries can be executed seamlessly. Below, we provide a brief overview of using LMQL as a Python library.ConclusionLMQL introduces an efficient and powerful approach to interact with language models, revolutionizing language model programming. By combining prompts, constraints, and scripting, LMQL offers a user-friendly interface for working with large language models, significantly improving efficiency and accuracy across diverse tasks. Its capabilities allow developers to leverage the full potential of language models without the burden of complex implementations, making language model interaction more accessible and cost-effective.With LMQL, users can overcome challenges in LM interaction, including manual interactions, constraints on variable parts, and generalization of multi-part prompting. By automating the selection process and eager application of constraints during decoding, LMQL reduces the number of LM invocations, resulting in substantial time and cost savings. Moreover, LMQL's declarative, SQL-like approach simplifies the development process and abstracts away tokenization and implementation details, making it more portable and user-friendly.In conclusion, LMQL represents a promising advancement in the realm of large language models and language model programming. Its efficiency, flexibility, and ease of use open up new possibilities for creating complex interactions and steering model generation without deep knowledge of the model's internals. By embracing LMQL, developers can make the most of language models, unleashing their potential across a wide range of language-based tasks with heightened efficiency and reduced computational costs.Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn