
How-To Tutorials - ChatGPT

114 Articles

Building Trust in AI: The Role of RAG in Data Security and Transparency

Keith Bourne
13 Dec 2024
15 min read
This article is an excerpt from the book "Unlocking Data with Generative AI and RAG" by Keith Bourne. Master Retrieval-Augmented Generation (RAG), the most popular generative AI tool, to unlock the full potential of your data. This book enables you to develop highly sought-after skills as corporate investment in generative AI soars.

Introduction

As the adoption of Retrieval-Augmented Generation (RAG) continues to grow, its potential to address key security challenges in AI-driven applications is becoming evident. Far from merely introducing risks, RAG offers a robust framework to enhance data protection, ensure accuracy, and maintain transparency in content generation. This article delves into the multifaceted security benefits of RAG, while also addressing the unique challenges it poses and strategies to mitigate them.

How RAG can be leveraged as a security solution

Let's start with the most positive security aspect of RAG. RAG can actually be considered a solution that mitigates security concerns rather than causing them. If done right, you can limit data access by user, ensure more reliable responses, and provide more transparency about sources.

Limiting data

RAG applications may be a relatively new concept, but you can still apply the same authentication and database-based access approaches used with web and similar types of applications, which provides the same level of security. By implementing user-based access controls, you can restrict the data that each user or user group can retrieve through the RAG system. This ensures that sensitive information is only accessible to authorized individuals. Additionally, by leveraging secure database connections and encryption techniques, you can safeguard data at rest and in transit, preventing unauthorized access or data breaches.
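To make the idea concrete, here is a minimal sketch of user-based retrieval filtering. It is not code from the book; the document store, group names, and helper function are hypothetical, and a production system would push the same check down into the database or vector store query itself.

```python
# Hypothetical illustration: filter retrievable documents by the user's groups
# before they ever reach the LLM context.
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    allowed_groups: set = field(default_factory=set)

DOCS = [
    Document("Q3 revenue forecast...", allowed_groups={"finance"}),
    Document("Public product FAQ...", allowed_groups={"finance", "support", "public"}),
]

def retrieve_for_user(query: str, user_groups: set, docs=DOCS):
    """Return only documents the user is authorized to see."""
    visible = [d for d in docs if d.allowed_groups & user_groups]
    # A real RAG system would now rank `visible` by similarity to `query`.
    return visible

# A support agent never retrieves the finance-only forecast:
print([d.text for d in retrieve_for_user("revenue", {"support"})])
```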
Ensuring the reliability of generated content

One of the key benefits of RAG is its ability to mitigate inaccuracies in generated content. By allowing applications to retrieve proprietary data at the point of generation, the risk of producing misleading or incorrect responses is substantially reduced. Feeding the most current data available through your RAG system helps to mitigate inaccuracies that might otherwise occur. With RAG, you have control over the data sources used for retrieval. By carefully curating and maintaining high-quality, up-to-date datasets, you can ensure that the information used to generate responses is accurate and reliable. This is particularly important in domains where precision and correctness are critical, such as healthcare, finance, or legal applications.

Maintaining transparency

RAG makes it easier to provide transparency in the generated content. By incorporating data such as citations and references to the retrieved data sources, you can increase the credibility and trustworthiness of the generated responses. When a RAG system generates a response, it can include links or references to the specific data points or documents used in the generation process. This allows users to verify the information and trace it back to its original sources. By providing this level of transparency, you can build trust with your users and demonstrate the reliability of the generated content. Transparency in RAG also helps with accountability and auditing: if there are any concerns or disputes regarding the generated content, clear citations and references make it easier to investigate and resolve issues. This transparency likewise facilitates compliance with regulatory requirements or industry standards that may require traceability of information.
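A minimal sketch of what a citation-carrying response can look like in practice is shown below. This is illustrative only, not code from the book; the response structure and field names are assumptions.

```python
# Hypothetical illustration: attach the retrieved sources to the generated
# answer so users can trace every claim back to its origin.
retrieved_chunks = [
    {"id": "doc-042", "source": "2024_policy_manual.pdf", "page": 12,
     "text": "Refunds are processed within 14 days."},
]

def build_cited_answer(answer: str, chunks: list) -> dict:
    """Package the model's answer together with its supporting citations."""
    return {
        "answer": answer,
        "citations": [
            {"source": c["source"], "page": c["page"], "chunk_id": c["id"]}
            for c in chunks
        ],
    }

response = build_cited_answer(
    "Refunds are processed within 14 days [doc-042].", retrieved_chunks
)
print(response["citations"])  # -> [{'source': '2024_policy_manual.pdf', ...}]
```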
That covers many of the security-related benefits you can achieve with RAG. However, there are some security challenges associated with RAG as well. Let's discuss these challenges next.

RAG security challenges

RAG applications face unique security challenges due to their reliance on large language models (LLMs) and external data sources. Let's start with the black box challenge, highlighting the relative difficulty in understanding how an LLM determines its response.

LLMs as black boxes

When something is in a dark, black box with the lid closed, you cannot see what is going on in there! That is the idea behind the black box when discussing LLMs: there is a lack of transparency and interpretability in how these complex AI models process input and generate output. The most popular LLMs are also some of the largest, meaning they can have more than 100 billion parameters. The intricate interconnections and weights of these parameters make it difficult to understand how the model arrives at a particular output. While the black box aspect of LLMs does not directly create a security problem, it does make it more difficult to identify solutions to problems when they occur. This makes it difficult to trust LLM outputs, which is a critical factor in most applications for LLMs, including RAG applications. This lack of transparency also makes it harder to debug issues when building a RAG application, which increases the risk of security issues.

There is a lot of research and effort in the academic field to build models that are more transparent and interpretable, called explainable AI. Explainable AI aims at making the operations of AI systems transparent and understandable. It can involve tools, frameworks, and anything else that, when applied to RAG, helps us understand how the language models we use produce the content they are generating. This is a big movement in the field, but the technology may not be immediately available as you read this. It will hopefully play a larger role in the future to help mitigate black box risk, but right now, none of the most popular LLMs are using explainable models. So, in the meantime, we will talk about other ways to address this issue.

You can use human-in-the-loop, where you involve humans at different stages of the process to provide an added line of defense against unexpected outputs. This can often help to reduce the impact of the black box aspect of LLMs. If your response time is not as critical, you can also use an additional LLM to review the response before it is returned to the user, looking for issues. We will review how to add a second LLM call in code lab 5.3, but with a focus on preventing prompt attacks. The concept is similar, in that you can add additional LLMs to perform a number of extra tasks and improve the security of your application.
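A minimal sketch of that second-LLM review step is shown below, assuming the openai Python package and an OPENAI_API_KEY environment variable; the reviewer prompt wording and model name are illustrative, not the book's code lab.

```python
# Hypothetical illustration: ask a second model to vet a draft response
# before it reaches the user. Adds latency, so use where response time allows.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def review_response(draft: str) -> bool:
    """Return True if the reviewer model approves the draft response."""
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",  # model name is an assumption
        messages=[
            {"role": "system",
             "content": "You are a safety reviewer. Answer APPROVE or REJECT "
                        "depending on whether the text leaks sensitive data "
                        "or makes unsupported claims."},
            {"role": "user", "content": draft},
        ],
        temperature=0.0,
    )
    return verdict.choices[0].message.content.strip().startswith("APPROVE")
```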
Black box isn't the only security issue you face when using RAG applications, though; another very important topic is privacy protection.

Privacy concerns and protecting user data

Personally identifiable information (PII) is a key topic in the generative AI space, with governments around the world trying to determine the best path to balance user privacy with the data-hungry needs of these LLMs. As this gets worked out, it is important to pay attention to the laws and regulations taking shape where your company does business and to make sure all of the technologies you integrate into your RAG applications adhere to them. Many companies, such as Google and Microsoft, are taking these efforts into their own hands, establishing their own standards of protection for user data and emphasizing them in training literature for their platforms.

At the corporate level, there is another challenge related to PII and sensitive information. As we have said many times, the nature of a RAG application is to give it access to company data and combine that with the power of the LLM. For financial institutions, for example, RAG represents a way to give customers unprecedented access to their own data, letting them speak naturally with technologies such as chatbots and get near-instant answers previously buried deep in their customer data. In many ways, this can be a huge benefit if implemented properly. But given that this is a security discussion, you may already see where I am going with this. We are giving unprecedented access to customer data using a technology that has artificial intelligence, and, as we said in the black box discussion, we don't completely understand how it works! If not implemented properly, this could be a recipe for disaster, with massive negative repercussions for companies that get it wrong. Of course, it could be argued that the databases that contain the data are also a potential security risk; having the data anywhere is a risk! But without taking on this risk, we also cannot provide the significant benefits it represents.

As with other IT applications that contain sensitive data, you can forge forward, but you need to have a healthy fear of what can happen to data and proactively take measures to protect it. The more you understand how RAG works, the better job you can do in preventing a potentially disastrous data leak. These steps can help you protect your company as well as the people who trusted your company with their data. This section was about protecting data that exists. However, a new risk that has arisen with LLMs is the generation of data that isn't real, called hallucinations. Let's discuss how this presents a new risk not common in the IT world.

Hallucinations

We have discussed this in previous chapters, but LLMs can, at times, generate responses that sound coherent and factual yet are very wrong. These are called hallucinations, and there have been many shocking examples in the news, especially in late 2022 and 2023, when LLMs became everyday tools for many users. Some are just funny, with little consequence other than a good laugh, such as when a writer for The Economist asked ChatGPT, "When was the Golden Gate Bridge transported for the second time across Egypt?" and it responded, "The Golden Gate Bridge was transported for the second time across Egypt in October of 2016" (https://www.economist.com/by-invitation/2022/09/02/artificial-neural-networks-today-are-not-conscious-according-to-douglas-hofstadter). Other hallucinations are more nefarious, such as when a New York lawyer used ChatGPT for legal research in a client's personal injury case against Avianca Airlines and submitted six cases that had been completely made up by the chatbot, leading to court sanctions (https://www.courthousenews.com/sanctions-ordered-for-lawyers-who-relied-on-chatgpt-artificial-intelligence-to-prepare-court-brief/). Even worse, generative AI has been known to give biased, racist, and bigoted perspectives, particularly when prompted in a manipulative way. When combined with the black box nature of these LLMs, where we are not always certain how and why a response is generated, this can be a genuine issue for companies wanting to use LLMs in their RAG applications.

From what we know, though, hallucinations are primarily a result of the probabilistic nature of LLMs. For each token an LLM generates, it typically uses a probability distribution to determine which token comes next. In areas where the model has a strong knowledge base of a certain subject, the probability for the next word or token can be 99% or higher. But where the knowledge base is not as strong, the highest probability might be 20% or even lower. It is still the highest probability, however, and therefore that token is the most likely to be selected. The LLM has been trained to string tokens together in a very natural way while using this probabilistic approach to select them. As it strings together words with low probability, it forms sentences and then paragraphs that sound natural and factual but are not grounded in high-probability data. Ultimately, this results in a response that sounds very plausible but is, in fact, based on very loose facts that are incorrect.
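The token-by-token selection described above can be illustrated with a toy example. This is a didactic sketch with a made-up four-word vocabulary and made-up logits, not how any production model is implemented.

```python
# Toy illustration of probabilistic next-token selection. When the model is
# confident, one token dominates; when it is not, a weakly supported token
# can still be the most likely choice.
import math, random

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["Paris", "London", "Egypt", "2016"]

confident = softmax([9.0, 2.0, 0.5, 0.1])   # ~99% of the mass on "Paris"
uncertain = softmax([1.0, 0.9, 0.8, 0.7])   # top token only ~29%

for name, dist in [("confident", confident), ("uncertain", uncertain)]:
    top = max(zip(vocab, dist), key=lambda p: p[1])
    print(f"{name}: top token {top[0]!r} with p={top[1]:.2f}")

# Sampling from the uncertain distribution often picks a weakly supported
# token, which is how fluent-but-wrong continuations get assembled.
print(random.choices(vocab, weights=uncertain, k=5))
```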
For a company, this poses a risk that goes beyond the embarrassment of your chatbot saying something wrong. What is said wrong could ruin your relationships with your customers, or it could lead to the LLM offering a customer something that you did not intend to offer or, worse, cannot afford to offer. For example, when Microsoft released a chatbot named Tay on Twitter in 2016 with the intention of learning from interactions with Twitter users, users manipulated this spongy personality trait to get it to make numerous racist and bigoted remarks. This reflected poorly on Microsoft, which had been promoting its AI expertise with Tay, and caused significant damage to its reputation at the time (https://www.theguardian.com/technology/2016/mar/26/microsoft-deeply-sorry-for-offensive-tweets-by-ai-chatbot). Hallucinations, threats related to black box aspects, and protecting user data can all be addressed through red teaming.

Conclusion

RAG represents a promising avenue for enhancing security in AI applications, offering tools to limit data access, ensure reliable outputs, and promote transparency. However, challenges such as the black box nature of LLMs, privacy concerns, and the risk of hallucinations demand proactive measures. By employing strategies like user-based access controls, explainable AI, and red teaming, organizations can harness the advantages of RAG while mitigating risks. As the technology evolves, a thoughtful approach to its implementation will be crucial for maintaining trust, compliance, and the integrity of data-driven solutions.

Author Bio

Keith Bourne is a senior Generative AI data scientist at Johnson & Johnson. He has over a decade of experience in machine learning and AI, working across diverse projects in companies ranging in size from start-ups to Fortune 500 companies. With an MBA from Babson College and a master's in applied data science from the University of Michigan, he has developed several sophisticated modular Generative AI platforms from the ground up, using numerous advanced techniques, including RAG, AI agents, and foundational model fine-tuning. Keith seeks to share his knowledge with a broader audience, aiming to demystify the complexities of RAG for organizations looking to leverage this promising technology.


Mastering Transfer Learning: Fine-Tuning BERT and Vision Transformers

Sinan Ozdemir
27 Nov 2024
15 min read
This article is an excerpt from the book "Principles of Data Science" by Sinan Ozdemir. This book provides an end-to-end framework for cultivating critical thinking about data, performing practical data science, building performant machine learning models, and mitigating bias in AI pipelines. Learn the fundamentals of computational math and stats while exploring modern machine learning and large pre-trained models.

Introduction

Transfer learning (TL) has revolutionized the field of deep learning by enabling pre-trained models to adapt their broad, generalized knowledge to specific tasks with minimal labeled data. This article delves into TL with BERT and GPT, demonstrating how to fine-tune these advanced models for text classification and image classification tasks. Through hands-on examples, we illustrate how TL leverages pre-trained architectures to simplify complex problems and achieve high accuracy with limited data.

TL with BERT and GPT

In this article, we will take some models that have already learned a lot from their pre-training and fine-tune them to perform a new, related task. This process involves adjusting the model's parameters to better suit the new task, much like fine-tuning a musical instrument:

Figure 12.8 – ITL

ITL takes a pre-trained model that was generally trained on a semi-supervised (or unsupervised) task and is then given labeled data to learn a specific task.

Examples of TL

Let's take a look at some examples of TL with specific pre-trained models.

Example – Fine-tuning a pre-trained model for text classification

Consider a simple text classification problem. Suppose we need to analyze customer reviews and determine whether they're positive or negative. We have a dataset of reviews, but it's not nearly large enough to train a deep learning (DL) model from scratch. We will fine-tune BERT on a text classification task, allowing the model to adapt its existing knowledge to our specific problem. We will have to move away from the popular scikit-learn library to another popular library called transformers, which was created by HuggingFace (the pre-trained model repository I mentioned earlier), as scikit-learn does not (yet) support Transformer models.

Figure 12.9 shows how we will take the original BERT model and make some minor modifications to it to perform text classification. Luckily, the transformers package has a built-in class to do this for us, called BertForSequenceClassification:

Figure 12.9 – Simplest text classification case

In many TL cases, we need to architect additional layers. In the simplest text classification case, we add a classification layer on top of a pre-trained BERT model so that it can perform the kind of classification we want. The following code block shows an end-to-end example of fine-tuning BERT on a text classification task. Note that we are also using a package called datasets, also made by HuggingFace, to load a sentiment classification task from IMDb reviews.
Let's begin by loading up the dataset:

```python
# Import necessary libraries
from datasets import load_dataset
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments

# Load the dataset
imdb_data = load_dataset('imdb', split='train[:1000]')  # Loading only 1000 samples for a toy example

# Define the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Preprocess the data
def encode(examples):
    return tokenizer(examples['text'], truncation=True, padding='max_length', max_length=512)

imdb_data = imdb_data.map(encode, batched=True)

# Format the dataset to PyTorch tensors
imdb_data.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
```

With our dataset loaded up, we can run some training code to update our BERT model on our labeled data:

```python
# Define the model
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=2)

# Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=1,
    per_device_train_batch_size=4
)

# Define the trainer
trainer = Trainer(model=model, args=training_args, train_dataset=imdb_data)

# Train the model
trainer.train()

# Save the model
model.save_pretrained('./my_bert_model')
```

Once we have our saved model, we can use the following code to run the model against unseen data:

```python
from transformers import pipeline

# Define the sentiment analysis pipeline
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

# Use the pipeline to predict the sentiment of a new review
review = "The movie was fantastic! I enjoyed every moment of it."
result = nlp(review)

# Print the result
print(f"label: {result[0]['label']}, with score: {round(result[0]['score'], 4)}")
# "The movie was fantastic! I enjoyed every moment of it."
# POSITIVE: 99%
```

Example – TL for image classification

We could take a pre-trained model such as ResNet or the Vision Transformer (shown in Figure 12.10), initially trained on a large-scale image dataset such as ImageNet. This model has already learned to detect various features from images, from simple shapes to complex objects. We can take advantage of this knowledge, fine-tuning the model on a custom image classification task:

Figure 12.10 – The Vision Transformer

The Vision Transformer is like a BERT model for images. It relies on many of the same principles, except instead of text tokens, it uses segments of images as "tokens" instead.

The following code block shows an end-to-end example of fine-tuning the Vision Transformer on an image classification task. The code should look very similar to the BERT code from the previous section, because the aim of the transformers library is to standardize training and usage of modern pre-trained models so that, no matter what task you are performing, they can offer a relatively unified training and inference experience. Let's begin by loading up our data and taking a look at the kinds of images we have (seen in Figure 12.11).
Note that we are only going to use 1% of the dataset to show that you really don't need that much data to get a lot out of pre-trained models!

```python
# Import necessary libraries
from datasets import load_dataset
from transformers import ViTImageProcessor, ViTForImageClassification
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import torch
from torchvision.transforms.functional import to_pil_image

# Load the CIFAR10 dataset using Hugging Face datasets
# Load only the first 1% of the train and test sets
train_dataset = load_dataset("cifar10", split="train[:1%]")
test_dataset = load_dataset("cifar10", split="test[:1%]")

# Define the feature extractor
feature_extractor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')

# Preprocess the data
def transform(examples):
    # print(examples)
    # Convert to list of PIL Images
    examples['pixel_values'] = feature_extractor(images=examples["img"], return_tensors="pt")["pixel_values"]
    return examples

# Apply the transformations
train_dataset = train_dataset.map(transform, batched=True, batch_size=32).with_format('pt')
test_dataset = test_dataset.map(transform, batched=True, batch_size=32).with_format('pt')
```

Figure 12.11 – A single example from CIFAR10 showing an airplane

Now, we can train our pre-trained Vision Transformer:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from transformers import Trainer, TrainingArguments

# Define the model
model = ViTForImageClassification.from_pretrained(
    'google/vit-base-patch16-224',
    num_labels=10,
    ignore_mismatched_sizes=True
)
LABELS = ['airplane', 'automobile', 'bird', 'cat', 'deer',
          'dog', 'frog', 'horse', 'ship', 'truck']
model.config.id2label = LABELS

# Define a function for computing metrics
def compute_metrics(p):
    predictions, labels = p
    preds = np.argmax(predictions, axis=1)
    return {"accuracy": accuracy_score(labels, preds)}

# Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=5,
    per_device_train_batch_size=4,
    load_best_model_at_end=True,
    # Save and evaluate at the end of each epoch
    evaluation_strategy='epoch',
    save_strategy='epoch'
)

# Define the trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics
)

# Train the model
trainer.train()
```

Our final model has about 95% accuracy on 1% of the test set. We can now use our new classifier on unseen images, as in this next code block:

```python
from PIL import Image
from transformers import pipeline

# Define an image classification pipeline
classification_pipeline = pipeline(
    'image-classification',
    model=model,
    feature_extractor=feature_extractor
)

# Load an image
image = Image.open('stock_image_plane.jpg')

# Use the pipeline to classify the image
result = classification_pipeline(image)
```

Figure 12.12 shows the result of this single classification, and it looks like it did pretty well:

Figure 12.12 – Our classifier predicting a stock image of a plane correctly

With minimal labeled data, we can leverage TL to turn off-the-shelf models into powerhouse predictive models.

Conclusion

Transfer learning is a transformative technique in deep learning, empowering developers to harness the power of pre-trained models like BERT and the Vision Transformer for specialized tasks. From sentiment analysis to image classification, these models can be fine-tuned with minimal labeled data, offering impressive performance and adaptability. By using libraries like HuggingFace's transformers, TL streamlines model training, making state-of-the-art AI accessible and versatile across domains.
As demonstrated in this article, TL is not only efficient but also a practical way to achieve powerful predictive capabilities with limited resources.

Author Bio

Sinan is an active lecturer focusing on large language models and a former lecturer of data science at Johns Hopkins University. He is the author of multiple textbooks on data science and machine learning, including "Quick Start Guide to LLMs". Sinan is currently the founder of LoopGenius, which uses AI to help people and businesses boost their sales, and was previously the founder of the acquired Kylie.ai, an enterprise-grade conversational AI platform with RPA capabilities. He holds a master's degree in pure mathematics from Johns Hopkins University and is based in San Francisco.


ChatGPT for Coding

Jakov Semenski
25 Apr 2024
6 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

ChatGPT's coding style is terrible: verbose, complex, and outdated. Let's change that. ChatGPT promised to be our coding savior, but sometimes it feels more like a blast from the past. Remember those early 2000s coding books? Yep, it's giving those vibes. It's like having a sports car with a tractor engine: great potential, but the performance? Not quite there. Imagine harnessing the power of ChatGPT but with the finesse of a master coder. Ready for the upgrade? Here are 12 pro prompts that will get you the right results.

Tip #1: Specificity is king

As soon as you ask for a coding snippet from ChatGPT, by default you will get the most basic HelloWorld example. The more vague your prompt is, the more mediocre your results will be. Instead, specify exactly the language, version, and framework:

Write backend code for a Library app that uses REST to communicate. Cover endpoints for adding, removing, and filtering books by category and date published. Use the latest Java version. Use lambda streams instead of for loops. Use the Spring framework.

Tip #2: Avoid code vomit

ChatGPT loves to write a lot of code, the way I like to call it "code vomit". We are no longer rewarded by the amount of code we produce, but by the clarity and principles we follow. Give ChatGPT instructions to write clean code, use the latest principles, and cover logging and exception handling:

Write clean code. Code needs to be covered with logging and proper exception handling. Use principles: KISS & DRY, SOLID. Keep in mind to use design patterns where applicable. Using the coding instructions I gave you, give me code for each class.

Tip #3: Make it easy to use with your IDE

Every time ChatGPT writes code, you get explanations, import statements, and comments. This can be good for a beginner but is not something we need for our IDE. Our IDE is already good at importing all the right packages, so let ChatGPT know:

When writing code, avoid detailed explanations, just simple bullet points. Don't add import statements, as the IDE will do this instead.

Tip #4: Write tests

Your code is not complete if you are not done with tests. But not just any tests: we want unit and integration tests in a readable format (given-when-then), covering the happy and unhappy paths, using the latest testing libraries such as AssertJ and BDDMockito:

For each class, write unit and integration tests. Use the given-when-then format. For libraries, use BDDMockito and AssertJ. Cover happy and unhappy paths.

Tip #5: Give REST call request examples

What is the app if we cannot test it with some examples? Instead of creating them manually, ask ChatGPT to create curl examples you can easily copy to Postman:

For each request, generate curl examples.

Now go ahead and use your terminal, or copy/paste them to Postman.

Tip #6: Create documentation

We don't want just plain text; instead, we need a quick start guide for developers:

Write a quick start guide for developers using markdown. Imagine this app has been published to a GitHub repository. Cover: introduction, how to install the app, how to run it, how to use it.

These prompts work just as well through the API as in the chat UI, as the short script below shows.
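As a minimal sketch (not from the original article), here is how one of these pro prompts could be reused programmatically, assuming the openai Python package and an OPENAI_API_KEY environment variable:

```python
# Hypothetical illustration: bake the "avoid code vomit" instructions into a
# system prompt so every coding request follows the same standards.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

CODING_STANDARDS = (
    "Write clean code covered with logging and proper exception handling. "
    "Use principles: KISS & DRY, SOLID. Use design patterns where applicable. "
    "Avoid detailed explanations and import statements."
)

def ask_for_code(task: str) -> str:
    """Send a coding task with the reusable standards baked in."""
    response = client.chat.completions.create(
        model="gpt-4o",  # model name is an assumption; use any chat model
        messages=[
            {"role": "system", "content": CODING_STANDARDS},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

print(ask_for_code("Write a Spring REST endpoint for adding books."))
```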
Tip #7: Prepare a deployment script for the cloud

This app cannot live just in your local environment; instead, we need a deployment script. Depending on where you want to deploy your changes, it might be a Kubernetes cluster script, Google-specific Terraform scripts, an AWS CloudFormation script, or an Azure-specific deployment script. Or ask ChatGPT to suggest one:

Provide me a deployment script for one of the most popular cloud providers.

Tip #8: Version control

Our code for now lives only locally. Let's ask ChatGPT for instructions on how to set up version control:

Provide GitHub version control setup instructions.

Tip #9: Define a CI/CD pipeline

CI/CD, or continuous integration and continuous deployment, is a must-have for any serious development. There are plenty of options to choose from, such as Jenkins, GitHub Actions, and Bamboo. With CI we guarantee we can safely merge our changes by running the build and tests and checking that our code changes comply with Sonar policies. With CD we guarantee that we can safely deploy our changes:

Provide GitHub Actions so that for each open pull request we run the build and run all the tests. Also automatically include SonarQube scans. Also create a GitHub Action to run the deployment on every code merge.

Tip #10: Performance optimization

Our backend REST service is now running, but we need to ask ourselves: how fast is it, how many requests can it handle, and what is the maximum limit of requests? For that, we need to execute performance tests, e.g. using JMeter or Gatling:

We need to test what the limit of our app is. Write a load test script for Gatling that tests how many book searches we can execute.

Tip #11: Run a security audit

How can we ensure our app is secure and not open to any threats? The best way is to run security scans:

Our application might be open to security threats. Which security scan tools can we use for free, and how can we use them? Give me step-by-step instructions on how to use them.

Tip #12: Optimize for observability

You have your app running somewhere in the cloud. But did you optimize it for observability? How can you easily troubleshoot issues? How can you trace requests between different services? Did you set up monitoring?

We want to make sure our application is optimized for observability. Create guidelines and configuration for the cloud environment for traceability (tracing requests from start to finish), monitoring (key performance metrics), and logging (a centralized logging system).

Conclusion

You can find the full prompt here: https://chat.openai.com/share/f0bef1ca-062d-4a22-96aa-9711615329a5

ChatGPT is a tool, and like any tool, it shines when used the right way. With these prompts, you get a coding assistant that keeps up with the latest trends, ensuring your code is not just functional but also follows modern standards.

Author Bio

Jakov Semenski is an IT Architect working at IBM iX with almost 20 years of experience. He is also a ChatGPT speaker at the WeAreDevelopers conference and shares valuable tech stories on LinkedIn.


Writing unit tests for R functions with ChatGPT

Dan MacLean
23 Apr 2024
5 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

This article is an excerpt from the book R Bioinformatics Cookbook - Second Edition, by Dan MacLean. Discover over 80 recipes for modeling and handling real-life biological data using modern libraries from the R ecosystem.

Introduction

ChatGPT is beneficial for generating R unit tests because of its knowledge of programming patterns and best practices. It can understand code snippets, including functions and their expected behaviors, and generate relevant unit tests. By leveraging its training on diverse programming contexts, ChatGPT can assist in creating comprehensive test cases that cover different scenarios and edge cases. It helps ensure code correctness, identify potential bugs, and improve code reliability. ChatGPT's ability to generate unit tests saves developers time and effort, promotes software quality, and contributes to building robust and well-tested R code. In this recipe, we'll learn how to get ChatGPT to do the tedious work of generating test cases for code that we are working on.

Getting ready

We just need a ChatGPT account: https://chat.openai.com/.

How to do it…

To write unit tests for an R function with ChatGPT, do the following:

1. Paste the following request into ChatGPT:

write unit tests for the following R code

```r
factorise_cols <- function(df, col_list) {
  not_all_na <- function(x) any(!is.na(x))
  for (i in col_list) {
    df <- df %>%
      dplyr::mutate(!!(i) := as.factor(!!(i)))
  }
  df %>% dplyr::select_if(not_all_na)
}
```

2. Read and check the output:

```r
library(dplyr)
library(testthat)

# Create a test data frame
test_df <- data.frame(
  col1 = c("A", "B", "C", NA),
  col2 = c(1, 2, NA, 4),
  col3 = c("X", "Y", "Z", NA)
)

# Define the expected output for the test data frame
expected_df <- data.frame(
  col1 = factor(c("A", "B", "C", NA)),
  col3 = factor(c("X", "Y", "Z", NA))
)

# Start defining the unit tests
test_that("factorise_cols correctly factorises selected columns", {
  # Call the factorise_cols function
  result_df <- factorise_cols(test_df, c("col1", "col3"))

  # Check if the selected columns are factorised
  expect_equal(result_df$col1, expected_df$col1)
  expect_equal(result_df$col3, expected_df$col3)

  # Check if other columns are not modified
  expect_equal(result_df$col2, test_df$col2)
})

test_that("factorise_cols returns a data frame with non-NA columns", {
  # Call the factorise_cols function
  result_df <- factorise_cols(test_df, c("col1", "col3"))

  # Check if the returned data frame only contains non-NA columns
  expect_true(all(!is.na(result_df$col1)))
  expect_true(all(!is.na(result_df$col3)))
  expect_true(all(is.na(result_df$col2)))
})
```

And with that, we should have an output that can test at least some facets of our code and a good start toward a more dependable and workable code base.

How it works…

The recipe here takes advantage of ChatGPT's representations of the structure of the R programming language in order to write code that will effectively test some example code. In step 1, we simply define the function we wish to test and ask for tests. In step 2, we see the output that ChatGPT generated in this instance. It has given us a pretty good set of unit tests. As with everything to do with ChatGPT, there isn't a guarantee that they are correct, but we can read and verify them very easily, certainly in much less time than it would take to write them.
One thing to note is that, in this case at least, ChatGPT hasn't generated tests for the case with only NA in a column, which we may decide we need. It is true that this isn't clear in the initial code, so generating the test has given us a new thought on the safe running of this function.

Conclusion

In conclusion, leveraging ChatGPT for unit testing R functions offers a transformative approach. Its adept understanding of programming nuances simplifies the arduous task of generating comprehensive tests, fostering code reliability and quality assurance. By effortlessly crafting diverse test cases, ChatGPT significantly reduces developers' workload, ensuring code correctness, identifying potential bugs, and fortifying the codebase against edge cases. While it doesn't guarantee absolute correctness, its output provides a solid foundation for enhancing code robustness. Embracing ChatGPT's capabilities not only saves time and effort but also contributes profoundly to building more dependable and well-tested R code, elevating the development process to new levels of efficiency and reliability.

Author Bio

Professor Dan MacLean has a Ph.D. in molecular biology from the University of Cambridge and gained postdoctoral experience in genomics and bioinformatics at Stanford University in California. Dan is now Head of Bioinformatics at the world-leading Sainsbury Laboratory in Norwich, UK, where he works on bioinformatics, genomics, and machine learning. He teaches undergraduates, post-graduates, and post-doctoral students in data science and computational biology. His research group has developed numerous new methods and software in R, Python, and other languages, with over 100,000 downloads combined.


Gemini 1.0 Pro Vision in BigQuery, Python UI Library, Feature Engineering with Fabric and PySpark, Power analytics with Redshift, Amazon RDS for MySQL

Merlyn Shelley
19 Apr 2024
14 min read
Subscribe to our BI Pro newsletter for the latest insights. Don't miss out – sign up today!

Get the first look at Sigma's new features and functionality at our virtual product launch on May 2nd at 12pm ET/9am PT. The virtual event will showcase talks and demos from Sigma's CEO, co-founders, and product managers about what's next in the future of analytics. Don't miss out. See how Sigma is reinventing BI.

👋 Hello, welcome to BI-Pro #52: Your Premier Destination for Data and BI Insights! 🌟

In This Edition:

🔮 Data Viz with Python Libraries: Exploring causality with Python. Meet NiceGUI: Your Soon-to-be Favorite Python UI Library. Feature Engineering with Microsoft Fabric and PySpark. 10 GitHub Repositories to Master Python.

🔌 Power BI: On-premises data gateway April 2024 release. Copilot in Power BI expansion.

🛠️ Microsoft Fabric: Introducing Optimistic Job Admission for Fabric Spark. Introducing Job Queueing for Notebook in Microsoft Fabric.

☁️ AWS BI: Meet Amazon QuickSight expert Sanjeeb Mohapatra. Handle tables without primary keys for Amazon Aurora MySQL and Amazon RDS for MySQL. Power analytics with Amazon Redshift.

🌐 Google Cloud Data: Gemini 1.0 Pro Vision in BigQuery. BigQuery data canvas. Gemini in Looker AI-powered BI. Memorystore for Redis Cluster updates. Firestore launch updates.

📊 Tableau: Tableau vs Power BI: A Comparison of AI-Powered Analytics Tools. Salesforce-Informatica Deal Could Transform Enterprise GenAI Forever.

✨ Expert Insights from Packt Community: ChatGPT for Cybersecurity Cookbook by Clint Bodungen.

💡 What's the Latest Scoop from the BI Community? Geospatial Data Analysis with Geemap. Microsoft Fabric Table Maintenance - Checkpoint and Statistics. Identifying Customer Buying Pattern in Power BI - Part 1. Full vs. Incremental Loads – Data Engineering with Fabric. Joining Queries in Azure Data Factory on Cosmos DB Sources. Feature Engineering with Microsoft Fabric and Dataflow Gen2.

Stay ahead in the ever-evolving landscape of business intelligence with BI-Pro. Unleash the full potential of your data today!

📥 Feedback on the Weekly Edition

Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition." 📣 And here's the twist: we're tuning into YOUR frequency! Inspired by a reader's request, we're launching a column just for you. Got a burning question or a topic you're itching to dive into? Drop your suggestions in our content box, because your journey of discovery is our blueprint. We appreciate your input and hope you enjoy the book! Share your thoughts and opinions here!

Cheers,
Merlyn Shelley
Editor-in-Chief, Packt

🚀 GitHub's Most Sought-After Repos

🐾 altair - Vega-Altair is a Python library for statistical visualization, offering simplicity, friendliness, and consistency for creating beautiful and effective visualizations.

🐾 bokeh - Bokeh is a Python library for creating interactive plots and data applications in web browsers, offering elegant and versatile graphics.

🐾 bqplot - bqplot is a 2-D visualization system for Jupyter, based on the Grammar of Graphics, enabling interactive plots with other Jupyter widgets.

🐾 cartopy - Cartopy simplifies map drawing in Python, offering easy projection definitions, point transformations, and integration with Matplotlib for advanced mapping.

🐾 diagrams - Diagrams simplifies cloud system architecture design in Python, supporting major providers and frameworks, allowing prototyping and visualization of existing architectures.
🔮 Data Viz with Python Libraries

🐍 Exploring causality with Python. Difference-in-differences: The series dives into causal inference, crucial in modern analytics, explaining tools like difference-in-differences. It explores how events impact outcomes, using examples such as minimum wage effects on employment. The setup involves treatment and control groups to establish cause-and-effect relationships in diverse real-world scenarios.

🐍 Meet NiceGUI: Your Soon-to-be Favorite Python UI Library. NiceGUI is a Python UI framework for web and desktop apps, offering a simple interface for small projects, dashboards, and robotics. It simplifies state management and interaction, boasting features like easy layout, visualization tools, and integration with popular libraries.

🐍 Feature Engineering with Microsoft Fabric and PySpark: The post delves into feature engineering in Microsoft Fabric, emphasizing its importance in ML development. It explores PySpark's role in handling large datasets and provides a basic overview and example of using PySpark for feature engineering.

🐍 10 GitHub Repositories to Master Python: The blog explores 10 essential GitHub repositories for mastering Python, emphasizing hands-on experience and real-world projects to enhance skills. It covers a range of topics, from beginner to advanced, including machine learning, web development, and data analysis: Asabeneh/30-Days-Of-Python, trekhleb/learn-python, Avik-Jain/100-Days-Of-ML-Code, realpython/python-guide, zhiwehu/Python-programming-exercises, geekcomputers/Python, practical-tutorials/project-based-learning, avinashkranjan/Amazing-Python-Scripts, TheAlgorithms/Python, vinta/awesome-python.

⚡ Stay Informed with Industry Highlights

Power BI

📊 On-premises data gateway April 2024 release: This update to the on-premises data gateway aligns it with the April 2024 release of Power BI Desktop, ensuring consistency in query execution. Additionally, the gateway now supports refreshes longer than one hour, allowing tokens to be refreshed mid-stream for continuous operation.

📊 Copilot in Power BI: Soon available to more users in your organization. The update introduces changes to Copilot in Power BI, including enabling Copilot by default for all tenants starting May 20th, 2024. It also addresses features reported by customers and community, updates abuse monitoring to not store prompts, and improves geo mapping for EU data boundary customers.

Microsoft Fabric

📊 Introducing Optimistic Job Admission for Fabric Spark: The post introduces Optimistic Job Admission for Spark in Microsoft Fabric, a new feature aimed at improving concurrency and the job admission experience. It explains how this feature optimizes resource allocation and increases the number of concurrent jobs that can be admitted to the cluster.

📊 Introducing Job Queueing for Notebook in Microsoft Fabric: Microsoft Fabric introduces Job Queueing for Notebook Jobs to streamline data engineering and data science processes. This feature automatically queues notebook jobs when Fabric capacity is maxed out, eliminating manual retries and improving user experience. Jobs are retried when resources become available, enhancing efficiency for enterprise users.

AWS BI

📊 Meet one of Amazon QuickSight's Top Community Experts: Sanjeeb Mohapatra. The Amazon QuickSight Community, launched in 2022, is a hub for BI authors and developers to collaborate, ask and answer questions, and learn about QuickSight.
Sanjeeb Mohapatra, the top Community Expert for 2023, exemplifies the community's spirit by providing over 1,700 replies and 235 solutions in one year.

📊 Handle tables without primary keys while creating Amazon Aurora MySQL or Amazon RDS for MySQL zero-ETL integrations with Amazon Redshift: AWS is advancing its zero-ETL vision with Amazon Aurora zero-ETL integration to Amazon Redshift, combining transactional data with analytics capabilities. This integration, along with four new ones announced at re:Invent 2023, empowers customers to implement near real-time analytics for various use cases.

📊 Power analytics as a service capabilities using Amazon Redshift: Analytics as a service (AaaS) leverages cloud-based analytic capabilities to enable cost-effective, scalable solutions for organizations. Amazon Redshift, a cloud data warehouse service, facilitates real-time insights and predictive analytics, empowering AaaS providers to embed rich data analytics capabilities. Delivery models include managed, bring-your-own-Redshift (BYOR), and hybrid options, offering flexibility to meet customer needs.

Google Cloud Data

📊 How to use Gemini 1.0 Pro Vision in BigQuery? BigQuery integrates with Vertex AI to leverage Gemini 1.0 Pro, PaLM, Vision AI, Speech AI, Doc AI, and Natural Language AI, enabling analysis of unstructured data like images, audio, and documents. New integrations support multimodal generative AI, enhancing capabilities for object recognition, information seeking, captioning, digital content understanding, and structured content generation, allowing structured data output for deeper analysis.

📊 Get to know BigQuery data canvas: BigQuery Data Canvas simplifies the data-to-insights journey by offering a natural language-driven experience. It centralizes data tasks, accelerates analysis, and fosters collaboration, all within a unified workspace, enabling faster and more efficient data analytics.

📊 Gemini in Looker to bring intelligent AI-powered BI to everyone: Gemini in Looker introduces Conversational Analytics, transforming how businesses engage with data. It offers a natural language-driven experience, simplifying data analytics and fostering collaboration, all within a unified workspace.

📊 Memorystore for Redis Cluster updates at Next '24: The article elaborates on the rapid adoption and recent enhancements of Google Cloud's Memorystore for Redis Cluster. It features customer testimonials from companies like Statsig, Character.AI, and AXON Networks, showcasing the service's performance, scalability, and cost-effectiveness. It also highlights new features such as data persistence, new node types, and ultra-fast vector search.

📊 Firestore launches at Next '24: Firestore is beloved by developers for its speed in app development. Updates include improved developer productivity, AI-enabled app building, richer queries, and enterprise-level scalability. Gemini Code Assist now supports Firestore, allowing natural language queries and data model definitions, enhancing the development experience. Firestore also supports AI applications and integrations with LangChain and LlamaIndex for generative AI.

Tableau

📊 Tableau vs Power BI: A Comparison of AI-Powered Analytics Tools. The comparison delves into the unique strengths of Tableau and Power BI, showcasing how each excels in different areas of data visualization and analytics.
It outlines Tableau's robust visualization and analytics capabilities, especially for large datasets, contrasting with Power BI's integration with Microsoft services and affordability for small to medium-sized businesses.

📊 Salesforce-Informatica Deal Could Transform Enterprise GenAI Forever: Salesforce is reportedly in advanced talks to acquire Informatica, a data-management software provider, for $11 billion. This aligns with Salesforce's strategy to expand beyond CRM and become a comprehensive data journey platform, bolstered by recent AI advancements like Einstein Copilot, complementing Informatica's data integration expertise and potential synergy with Tableau and MuleSoft.

✨ Expert Insights from Packt Community

ChatGPT for Cybersecurity Cookbook - By Clint Bodungen

Sending API Requests and Handling Responses with Python

In this recipe, we will explore how to send requests to the OpenAI GPT API and handle the responses using Python. We'll walk through the process of constructing API requests, sending them, and processing the responses using the openai module.

Getting ready

Ensure you have Python installed on your system. Install the OpenAI Python module by running the following command in your Terminal or command prompt:

```
pip install openai
```

How to do it…

The importance of using the API lies in its ability to communicate with and get valuable insights from ChatGPT in real time. By sending API requests and handling responses, you can harness the power of GPT to answer questions, generate content, or solve problems in a dynamic and customizable way. In the following steps, we'll demonstrate how to construct API requests, send them, and process the responses, enabling you to effectively integrate ChatGPT into your projects or applications.

Start by importing the required modules:

```python
import openai
from openai import OpenAI
import os
```

Set up your API key by retrieving it from an environment variable, as we did in the Setting the OpenAI API key as an Environment Variable recipe:

```python
openai.api_key = os.getenv("OPENAI_API_KEY")
```

Define a function to send a prompt to the OpenAI API and receive a response:

```python
client = OpenAI()

def get_chat_gpt_response(prompt):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=2048,
        temperature=0.7
    )
    return response.choices[0].message.content.strip()
```

Call the function with a prompt to send a request and receive a response:

```python
prompt = "Explain the difference between symmetric and asymmetric encryption."
response_text = get_chat_gpt_response(prompt)
print(response_text)
```

How it works…

First, we import the required modules. The openai module is the OpenAI API library, and the os module helps us retrieve the API key from an environment variable. We set up the API key by retrieving it from an environment variable using the os module. Next, we define a function called get_chat_gpt_response() that takes a single argument: the prompt. This function sends a request to the OpenAI API using the client.chat.completions.create() method. This method has several parameters:

model: Here, we specify the model (in this case, gpt-3.5-turbo).

messages: The input conversation for the model to respond to, given as a list of role/content messages.

max_tokens: The maximum number of tokens in the generated response. A token can be as short as one character or as long as one word.

n: The number of generated responses you want to receive from the model.
In this case, we leave it at the default of 1 to receive a single response.

stop: A sequence of tokens that, if encountered by the model, will stop the generation process. This can be useful for limiting the response's length or stopping at specific points, such as the end of a sentence or paragraph.

temperature: A value that controls the randomness of the generated response. A higher temperature (for example, 1.0) will result in more random responses, while a lower temperature (for example, 0.1) will make the responses more focused and deterministic.

Discover more insights from ChatGPT for Cybersecurity Cookbook - By Clint Bodungen. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!

💡 What's the Latest Scoop from the BI Community?

🧠 Geospatial Data Analysis with Geemap: This article introduces geospatial data analysis, focusing on raster data from Google Earth Engine, accessed and analyzed using the Geemap Python library. Earth Engine offers a vast catalog of geospatial datasets, and Geemap simplifies access and analysis, making it easier to work with such data in Python.

🧠 Microsoft Fabric Table Maintenance - Checkpoint and Statistics: This article discusses the maintenance requirements for warehouse tables in Microsoft Fabric, particularly focusing on tasks like updating statistics, removing fragmentation, and managing log files. While some maintenance tasks, such as data compaction and log file checkpointing, are automated, others, like managing statistics, may require manual intervention.

🧠 Identifying Customer Buying Pattern in Power BI - Part 1: This article is part 1 of a retail analytics analysis in Power BI, focusing on customer purchasing frequency for various products over the years. It includes identifying data elements, creating calculated columns, and analyzing trends to aid in business decision-making.

🧠 Full vs. Incremental Loads – Data Engineering with Fabric: This article discusses using Apache Spark in Microsoft Fabric to achieve data quality zones (bronze and silver) in a data lake. It explores loading weather data, transforming it with Spark SQL and DataFrames, and implementing full and incremental load patterns.

🧠 Joining Queries in Azure Data Factory on Cosmos DB Sources: This article provides a detailed guide on joining two queries in Azure Data Factory (ADF). It covers prerequisites, creation of data sources, defining queries for each dataset, and using the "Join" transformation in ADF to merge data. Different join types, such as inner, left outer, right outer, and full outer joins, are explained.

🧠 Feature Engineering with Microsoft Fabric and Dataflow Gen2: This article introduces Dataflow Gen2 as a low-code data transformation and integration engine for creating data pipelines in Microsoft Fabric. It focuses on using Dataflow Gen2 to create features needed for training a machine learning model with college basketball game data, offering different approaches from no code to all code.

See you next time!


AI for Investment

Louis Owen
12 Apr 2024
12 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!

Introduction

One of the most important activities for an investor is to always keep up to date with the latest relevant news. Usually, this is done by reading at least a dozen news articles covering macroeconomic issues, political issues, news related to the sector of the corresponding stock, analyst reports, and more. This, of course, takes a lot of time and can be overwhelming for new investors, since the amount of information to be processed is so large.

Many ML developers have tried to solve this issue by building a traditional ML workflow usually called a sentiment analyzer: a system that takes text from the news as input and returns a sentiment score as output. This is no doubt helpful for the investor, but it doesn't solve the bigger problem, which is the need to curate relevant articles and to know the impact of each news item on an investment decision. In other words, it lacks broader insight.

What if there were an AI assistant that could act as our personal investment news analyst? What if there were an AI assistant able to analyze dozens of news articles and generate an insights summary along with an investment recommendation? And what if I told you that this AI assistant could be personalized to your risk appetite and investment portfolio allocation? In this article, I'll guide you on how to build an AI assistant that can do all of the above with only a few lines of code, thanks to GPT-4! We'll discuss several ways to get news data in bulk and in real time, the important search keywords we need to use to get relevant news data, and how to construct the prompt to fulfill all of the above criteria while also getting a great generated output. Finally, we'll see how to put all of this together to build our AI assistant! Without wasting any more time, let's take a deep breath, make ourselves comfortable, and be ready to learn how to build your personal AI investment news analyst!

News Data Sources

Getting as much news data as possible is important, since we don't want to miss any important information out there. Once we get all the information, we just need to filter it with the help of our AI assistant. SerpAPI is one of the best all-in-one scraping tools we can utilize to get news data from Google, Yahoo, Bing, DuckDuckGo, and many other search engines. It also provides a free plan with a limit of 100 searches per month. However, this limit is surely not enough for our use case. If you don't mind spending some money and want multiple search results from different search engines, then this tool is suitable for you.

Another solution that is more budget-friendly is to utilize the DuckDuckGo search engine API directly. DuckDuckGo is a search engine that offers data privacy as its main unique selling point: no search history is stored. Moreover, its search API is open for free use. We will use DuckDuckGo in this article and learn how to utilize it via Python!

The more effective way to widen our search results is actually not by using different search engines but by having a diverse yet mutually exclusive set of search keywords.
The goal of our AI investment assistant is to summarize the important insights relevant to a particular stock we're interested in, so we need to provide relevant news data to achieve that goal.

The following are some of the search keywords we can use. Note that this list is not exhaustive; you can expand it based on your own needs. We'll use AAPL as the example ticker - change it to any ticker you want.

$AAPL stock
$AAPL industry and competitors
$AAPL business model and strategy
$AAPL management and leadership

Besides ticker-specific search keywords, we can also search for more general information that is not ticker-specific. The following is an example list of such keywords.

economic growth this year
monetary and fiscal policies today
politics today
economy today
inflation rate today
interest rate today
real estate today

DuckDuckGo API

Once we have the keyword list, we can easily get the news data from DuckDuckGo via Python. First, install the duckduckgo-search package by running the following command.

pip install duckduckgo-search

Once it is installed, we can create a general Python function that takes a search keyword as input and returns the search results.

from duckduckgo_search import DDGS

ddgs = DDGS()

def web_search(query: str, num_results: int = 4, debug: bool = True) -> list:
    """Return up to `num_results` text snippets for a general internet search query."""
    if debug:
        print("Searching with query {0}...".format(query))
    search_results = []
    if not query:
        return search_results
    results = ddgs.text(query)
    if not results:
        return search_results
    for j in results:
        # Each result is a dict; 'body' holds the text snippet.
        search_results.append(j.get('body', ''))
        if len(search_results) >= num_results:
            break
    return search_results

Using this function is very simple. We just need to pass a search keyword along with the desired number of results and get back a list of snippets.

apple_competitors_news = web_search("$AAPL industry and competitors", num_results=10)
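If you opt for the SerpAPI route mentioned earlier instead, a minimal sketch might look like the following. It assumes the google-search-results package (pip install google-search-results), a SERPAPI_API_KEY environment variable, and that Google News tab results arrive under a news_results key; treat the helper as illustrative rather than drop-in.

import os
from serpapi import GoogleSearch

def serpapi_news_search(query: str, num_results: int = 4) -> list:
    """Hypothetical helper: fetch news snippets for `query` via SerpAPI's Google News tab."""
    params = {
        "engine": "google",
        "q": query,
        "tbm": "nws",  # the Google News tab
        "api_key": os.environ["SERPAPI_API_KEY"],
    }
    results = GoogleSearch(params).get_dict()
    snippets = [item.get("snippet", "") for item in results.get("news_results", [])]
    return snippets[:num_results]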
Prompt Engineering

The next important thing to do is build the AI assistant itself. Here, we'll use GPT-4. Since it's an LLM, we just need to provide a prompt; there is no need to train it from scratch. However, creating the prompt itself is not an easy task. I have published another article on prompt engineering if you're interested in learning more about it.

Remember that the goal of our assistant is to analyze the provided news data dump and return summary insights along with a recommendation. To give a recommendation, our assistant needs to know our risk appetite and our portfolio condition. The following is an example of the system prompt we can give to GPT-4.

system_prompt = """You are an expert in giving a recommendation to BUY / SELL / HOLD for {} ({}). You can only return JSON with four fields:
"Investment Thesis" (dictionary of string. Consists of elaborated decision reasoning (in bullet points) based on the risk profile of the investor, unrealized profit, and all of the factors used as the basis of your recommendation. Provide numbers to justify your assertions, a lot ideally. The deeper the analysis the better.),
"Investor Profiling" (dictionary of string. Connects the investment thesis with each of the investor profiles, including risk profile and unrealized profit.),
"Summary Thesis" (string. Summary of all your investment theses as the basis of the given recommendation. You have to take into account all factors in the investment thesis as well as the investor profiles.),
"recommendation" ("BUY"/"SELL"/"HOLD")

In the investment thesis, please cover the following factors. If a particular factor needed to write the investment thesis does not exist, don't try to make up an answer, just write "The information needed is unavailable".
(1) Industry and Competitive Analysis: Assess the company's position within its industry and analyze industry trends, competition, barriers to entry, and market dynamics.
(2) News and Events: Stay updated on relevant news, earnings announcements, product launches, regulatory changes, and other events that can impact the company or the overall market.
(3) Market and Economic Conditions: Assess broader macroeconomic factors from news, including economic growth, interest rates, inflation, monetary and fiscal policies, geopolitical events, gold price, bond price, index price, real estate."""

And here's an example of the user prompt, which carries all the necessary data points. The risk profile can be "Moderate", "Aggressive", or "Conservative".

user_prompt = """<INVESTOR PROFILE>
Risk Profile: {}
Unrealized Profit: {}%

{}"""

Putting It All Together

Now we just need to create the main function that will act as our personal AI investment assistant. Here we also wrap the keyword lists from earlier in a small helper so the assistant works for any ticker.

import requests
import json
import os

def build_search_keywords(ticker: str) -> list:
    """Assemble ticker-specific and general search keywords (see the lists above)."""
    ticker_kwrds = [
        f"${ticker} stock",
        f"${ticker} industry and competitors",
        f"${ticker} business model and strategy",
        f"${ticker} management and leadership",
    ]
    general_kwrds = [
        "economic growth this year",
        "monetary and fiscal policies today",
        "politics today",
        "economy today",
        "inflation rate today",
        "interest rate today",
        "real estate today",
    ]
    return ticker_kwrds + general_kwrds

def get_gpt_response(model: str, temperature: float, messages: list):
    headers = {
        "content-type": "application/json",
        "Authorization": "Bearer " + os.environ["OPENAI_API_KEY"],
    }
    endpoint = "https://api.openai.com/v1/chat/completions"
    data = json.dumps({
        "model": model,
        "messages": messages,
        "temperature": temperature,
    })
    try:
        resp = requests.post(endpoint, data=data, headers=headers)
        return json.loads(resp.text)
    except Exception as e:
        print(e)
        return ""

def personal_investment_assistant(company_name: str, ticker: str, risk_profile: str, unrealized_profit_perc: float):
    # Gather news snippets for every search keyword
    news_data = []
    for search_keyword in build_search_keywords(ticker):
        news_data.extend(web_search(search_keyword))
    news_data = "\n".join(news_data)
    messages = [
        {"role": "system", "content": system_prompt.format(company_name, ticker)},
        {"role": "user", "content": user_prompt.format(risk_profile, unrealized_profit_perc, news_data)},
    ]
    response = get_gpt_response("gpt-4", temperature=0.0, messages=messages)
    return response["choices"][0]["message"]["content"].strip()

Conclusion

Congratulations on making it this far! Throughout this article, you have learned how to build your own personal AI investment analyst based on news data: how to get the news data, a list of useful search keywords, and the code implementation to build the AI assistant.
All the best on your investment journey, and see you in the next article!

Author Bio

Louis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career, he has worked in various industries, including NGOs, e-commerce, conversational AI, OTA, smart cities, and FinTech. Outside of work, he loves spending his time helping data science enthusiasts become data scientists, either through his articles or through mentoring sessions. He also loves spending his spare time on his hobbies: watching movies and building side projects. Currently, Louis is an NLP Research Engineer at Yellow.ai, the world's leading CX automation platform. Check out Louis' website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.
article-image-apples-realm-google-deepminds-gecko-xais-grok-15-salesforce-ais-moira-stability-ais-stable-audio-20-twin-gpt-chatgpt-instant-usage

Apple’s ReALM, Google DeepMind’s Gecko, X.ai's Grok 1.5, Salesforce AI’s Moira, Stability AI’s Stable Audio 2.0, TWIN-GPT, ChatGPT Instant usage

Merlyn Shelley
08 Apr 2024
12 min read
👋 Hello,

Welcome to DataPro#88 – Your portal to the innovations in Data Science & Machine Learning! 🚀 In this edition, you'll find:

⚙️ LLMs & GPTs Unleashed
TWIN-GPT: digital twins for clinical trials.
Apple's ReALM: AI with contextual understanding.
Stability AI's Stable Audio 2.0: an audio synthesis revolution.
Salesforce AI's Moirai: universal time series forecasting.
Google DeepMind's Gecko: versatile text embeddings.
X.ai's Grok 1.5: enhanced reasoning and context.

✨ What's Fresh & Exciting
Distribute LLMs with llamafile: 5 simple steps.
Dockerized Python environment: the elegant way.
Knowledge distillation: clone powerful LLMs.
Sora's Diffusion Transformer (DiT): a deep dive.
Generative AI: the coming copyright reckoning.
OpenAI agent: function calling capabilities.

⚡ Industry Pulse
AWS & Mistral AI: democratizing generative AI.
Amazon SageMaker: no-code to code-first ML.
Google Cloud Next: database success stories.
Google's SEEDS in weather forecasting: AI quantifies uncertainty.
Microsoft's LLMs in the Imaginarium: tool learning.
OpenAI: fine-tuning API and custom models.
ChatGPT: instant usage.
Synthetic voices: challenges and opportunities.

📚 Packt's Latest Gem
MATLAB for Machine Learning - Second Edition, by Giuseppe Ciaburro.

The DataPro Newsletter is not just a publication; it's a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today!

📥 Feedback on the Weekly Edition
Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition." We appreciate your input and hope you enjoy the book!

Share your Feedback!

Cheers,
Merlyn Shelley
Editor-in-Chief, Packt

Sign Up | Advertise | Archives

🔰 GitHub Finds: Any of These Repos in Your Toolbox?

🛠️ UpstageAI/dataverse: Dataverse simplifies ETL pipelines in Python, providing a user-friendly solution for data processing and management that is accessible to all.

🛠️ GAP-LAB-CUHK-SZ/gaustudio: GauStudio is a modular framework for 3D Gaussian Splatting, providing streamlined pipelines and tools for easier implementation and deployment.

🛠️ TencentARC/BrushNet: BrushNet is a text-guided image inpainting model that enhances pre-trained diffusion models, focusing on divided features and dense control.

🛠️ agiresearch/AIOS: AIOS embeds LLMs into the OS, enhancing resource allocation, context switching, concurrent execution, tool services, access control, and toolkit availability for developers.

🛠️ jasonppy/VoiceCraft: VoiceCraft excels in speech editing and zero-shot text-to-speech, requiring only a few seconds of reference audio to clone or edit voices.

📚 Expert Insights from Packt Community
MATLAB for Machine Learning - Second Edition, by Giuseppe Ciaburro

Anomaly Detection in MATLAB

Throughout the life cycle of a physical system, the occurrence of failures or malfunctions poses a potential threat to its normal functioning. To safeguard against critical interruptions, it becomes imperative to implement an anomaly detection system within the facility. Termed a fault diagnosis system, this mechanism is designed to identify potential malfunctions within the monitored system. Fault detection stands as a pivotal and defining phase of maintenance interventions, demanding a systematic and deterministic approach to comprehensively analyze all conceivable causes that might have led to the malfunction.
Anomaly Detection Overview

Anomaly detection is a technique used in data analysis and ML to identify data points or patterns that deviate significantly from the expected or normal behavior within a dataset. Anomalies, also known as outliers, are data points that do not conform to most of the data and may indicate errors, fraud, unusual events, or other important information. Anomaly detection has applications across many domains, such as cybersecurity, industrial quality control (QC), finance, and healthcare.

To understand what is meant by the term, we can list some types of anomalies:

Point anomalies: Individual data points that are considered anomalous, such as a single fraudulent transaction in a credit card dataset.
Contextual anomalies: Anomalies that are context-dependent. A data point might not be an anomaly on its own but is unusual in a particular context or time, such as a sudden spike in web traffic during a holiday sale.
Collective anomalies: Anomalies identified by examining a group of data points collectively. These involve patterns or relationships between data points.

There are several methods for addressing anomaly detection problems, ranging from simple statistical techniques to complex ML algorithms. The choice of method depends on the nature of the data and the specific problem you are trying to solve. The most used ones are:

Statistical methods: Techniques such as z-scores, percentiles, and boxplots can identify anomalies based on deviations from the mean or median of the data distribution.
ML: Supervised, unsupervised, and semi-supervised algorithms can be used for anomaly detection. Popular methods include Isolation Forest, One-Class Support Vector Machine (One-Class SVM), autoencoders (AEs), and k-means clustering.
Time series analysis: Specialized techniques for detecting anomalies in time series data, such as autoregressive (AR) models, exponential smoothing, and moving averages (MAs).
Density estimation: Methods such as kernel density estimation (KDE) and Gaussian Mixture Models (GMMs) estimate the probability density function of the data and identify anomalies as low-density regions.
Deep learning (DL): Neural networks (NNs), especially deep autoencoders (DAEs) and recurrent NNs (RNNs), are used for anomaly detection in high-dimensional data or sequences.
Ensemble methods: Combining multiple anomaly detection models can improve overall performance and robustness.

Anomaly detection comes with challenges. Determining an appropriate threshold for defining anomalies can be difficult; imbalanced datasets, where anomalies are rare, can make model training and evaluation tricky; and handling high-dimensional or noisy data adds further complexity. Still, anomaly detection is a valuable tool for identifying rare but potentially important events or patterns in large datasets. The choice of method depends on the specific domain, the data characteristics, and the nature of the anomalies to be detected.

Discover more insights from "MATLAB for Machine Learning - Second Edition" by Giuseppe Ciaburro. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today! Read Here!
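The excerpt above lists statistical methods first; as a quick illustration outside the book's MATLAB setting, here is a minimal z-score detector in Python. The data and threshold are invented for the example, and with such a tiny sample the conventional threshold of 3 is too strict, so 2 is used.

import numpy as np

def zscore_anomalies(values, threshold=3.0):
    """Flag indices whose z-score magnitude exceeds the threshold."""
    x = np.asarray(values, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.where(np.abs(z) > threshold)[0]

readings = [10.1, 9.8, 10.3, 10.0, 42.0, 9.9]  # toy sensor data; 42.0 is the outlier
print(zscore_anomalies(readings, threshold=2.0))  # -> [4]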
⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz!

AWS ML Made Easy

🌀 AWS and Mistral AI commit to democratizing generative AI with a strengthened collaboration: The article discusses the growing use of generative AI applications across industries, facilitated by Amazon Bedrock. It highlights Mistral AI's Mistral Large model, now available on Amazon Bedrock, offering advanced language capabilities. This collaboration aims to provide customers with diverse model options to suit their specific business needs, promoting innovation in AI technology.

🌀 Seamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio: This post discusses Amazon SageMaker Studio, an integrated ML development environment, and SageMaker Canvas, a no-code ML tool, highlighting their features and integration for seamless collaboration between non-ML and ML experts.

Google Research

🌀 Get inspired: Database success stories at Google Cloud Next. This blog post previews Google Cloud Next '24, focusing on customers using Google Cloud databases for transformative purposes. It highlights sessions featuring Nuro, Lightricks, Bayer, Yahoo!, and Statsig, showcasing their innovative use cases.

🌀 Generative AI to quantify uncertainty in weather forecasting: Google is advancing weather forecasting with innovations like MetNet-3 and SEEDS, a generative AI model. SEEDS efficiently generates probabilistic ensembles, addressing the butterfly effect's uncertainty, and offers cost-effective solutions for extreme weather events.

Microsoft Research

🌀 LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error. This research enhances large language models' (LLMs) tool usage accuracy through simulated trial and error (STE), inspired by biological systems. STE improves learning by simulating tool-use scenarios, interacting with tools, and leveraging short- and long-term memory. Results show significant performance boosts over existing methods.

OpenAI Updates

🌀 Introducing improvements to the fine-tuning API and expanding our custom models program: This update discusses techniques to improve model performance, such as retrieval-augmented generation (RAG) and fine-tuning, and introduces new API features for developers to control their fine-tuning jobs, enhancing model quality while reducing costs and latency.

🌀 Start using ChatGPT instantly: This new initiative aims to make AI more accessible by allowing instant access to ChatGPT without the need to sign up. It targets those curious about AI's potential but hesitant to set up an account, offering a seamless experience for learning, creative inspiration, and answering questions.

🌀 Navigating the Challenges and Opportunities of Synthetic Voices: Voice Engine is a model by OpenAI that generates natural-sounding speech from text input and a short audio sample, closely resembling the original speaker. OpenAI is sharing insights from a small-scale preview, highlighting its potential for applications like reading assistance and personalized responses in education.

Email Forwarded? Join DataPro Here!

🔍 From Bits to BERT: Keeping Up with LLMs & GPTs

🌀 TWIN-GPT: Digital Twins for Clinical Trials via LLM. The research explores the benefits of virtual clinical trials in healthcare, emphasizing patient safety and cost reduction. Existing methods struggle with prediction accuracy due to limited data. TWIN-GPT, a proposed approach, uses large language models to create personalized digital twins, improving predictions and showcasing digital twins' potential in healthcare.
🌀 Apple's ReALM: AI that can "see" to understand context. ReALM (Reference Resolution As Language Modeling) addresses the challenge of context understanding, including non-conversational entities like on-screen elements. By leveraging language models, it demonstrates significant improvements in reference resolution, even outperforming GPT-4, with gains of over 5% for on-screen references.

🌀 Stability AI's Stable Audio 2.0: Stable Audio 2.0 introduces a groundbreaking AI-generated audio standard, offering high-quality full tracks up to three minutes long at 44.1 kHz stereo. It features audio-to-audio generation, honors creator rights, and expands creative possibilities, available for free on the Stable Audio website.

🌀 Salesforce AI's Moirai: Moirai is a universal time series forecasting model designed to address diverse forecasting tasks across various domains, frequencies, and variables in a zero-shot manner. It tackles key challenges in forecasting and offers robust performance, making it valuable for IT operations, sales forecasting, and more.

🌀 Google DeepMind's Gecko: Versatile Text Embeddings Distilled from LLMs. Gecko is a compact text embedding model that achieves strong retrieval performance by distilling knowledge from large language models (LLMs). Its two-step distillation process, generating synthetic paired data and then refining data quality, outperforms larger models on the Massive Text Embedding Benchmark: Gecko with 256 dimensions outperforms all entries with 768 dimensions, and Gecko with 768 dimensions competes with models 7x larger that use 5x higher-dimensional embeddings.

🌀 X.ai Unveils Grok 1.5: Enhanced Reasoning and Long Context Features. Grok-1.5, the latest version of x.ai's Grok model, offers improved reasoning and long-context capabilities. It excels in coding and math tasks, scoring 50.6% on the MATH benchmark and 90% on GSM8K. Grok-1.5 can process contexts of up to 128K tokens and boasts robust infrastructure for large-scale training. Early testers and existing Grok users on the x.ai platform will soon have access to Grok-1.5, with further features expected to roll out gradually.

✨ On the Radar: Catch Up on What's Fresh

🌀 Distribute and Run LLMs with llamafile in 5 Simple Steps: This blog introduces llamafile, a framework that simplifies using large language models (LLMs) by providing a single-file executable that runs locally without installation. It explains how to use llamafile with the LLaVA model, a 7-billion-parameter model quantized to 4 bits, for tasks like chat, image uploading, and question answering.

🌀 Setting Up a Dockerized Python Environment - The Elegant Way. This blog post demonstrates an elegant method for setting up a dockerized Python development environment using VS Code and the Dev Containers extension. It provides step-by-step instructions and prerequisites, including Docker Desktop, a Docker Hub account, and VS Code with the Dev Containers extension installed. The tutorial focuses on using the official Python image (python:3.10) and explains the Dev Containers extension's role in creating an isolated VS Code session inside a Docker container.

🌀 Clone the Abilities of Powerful LLMs into Small Local Models Using Knowledge Distillation: This post explores the use of specialized, smaller-scale language models for specific NLP tasks, such as grammatical error correction. It discusses the process of constructing tailored models through data annotation and fine-tuning, and the use of knowledge distillation to automate labeling.
The post provides a workflow for distilling knowledge from a large language model to a smaller one, using prompts and APIs, and demonstrates this process in the context of building a grammatical error correction model.

🌀 Deep Dive into Sora's Diffusion Transformer (DiT) by Hand: This blog introduces Sora, OpenAI's text-to-video model, explaining its unique approach combining diffusion with transformer strengths for video prediction. It explores key concepts like diffusion, dimension reduction, and noise addition, offering insights into how Sora converts text prompts into realistic videos. Ideal for AI enthusiasts and those interested in video generation technologies.

🌀 The Coming Copyright Reckoning for Generative AI: This blog explores the complexities of copyright law in America, particularly in the context of generative AI. It discusses key concepts like original works, fair use, and the implications of generative AI for copyright. It also delves into legal cases and future considerations, offering insights for data scientists and AI enthusiasts.

🌀 Create an Agent with OpenAI Function Calling Capabilities: This article explores the advancements and challenges in developing AI-powered applications in 2024. It discusses how AI streamlines app features for a better user experience and introduces OpenAI's function calling to simplify structured data extraction. The article also highlights ongoing innovations and the future of AI applications.

See you next time!

article-image-revolutionising-work-and-everyday-life-with-chatgpt

Revolutionising Work and Everyday Life with ChatGPT

M.T. White
01 Apr 2024
14 min read
Introduction

ChatGPT is a revolutionary technology that is making a large impact on society. Its full impact cannot be known at the time of writing because the technology is so novel. What can be said is that since its introduction, many industries have been trying to leverage it to increase productivity, and everyday people are learning to leverage it as well. Overall, ChatGPT and similar systems are very new, and how best to leverage them will take more time to become clear. This article explores how ChatGPT can be used in everyday life through a few use cases.

What is ChatGPT?

Before we begin, it is important to understand what ChatGPT is and what it isn't. In lay terms, ChatGPT is a super-advanced chatbot. More specifically, ChatGPT is a generative AI that uses Natural Language Processing (NLP) to hold a dialog between a user and itself. ChatGPT and similar systems are known as Large Language Models (LLMs). In short, AI models have to be trained on data to work. To train LLMs, engineers use vast amounts of text such as books, articles, journals, and so on. The result is a system like ChatGPT that has a vast knowledge base on many different subjects. Before we can explore how to use ChatGPT for everyday life, we need to explore how NOT to use it.

How not to use ChatGPT

ChatGPT is very powerful and can be used for many different things; however, it is important to understand that ChatGPT is neither a sage nor infallible. Remember, ChatGPT only knows what it was trained on. This means that if the information it was taught was wrong or outdated, so too will be the response it provides. As of writing this article, ChatGPT cannot and should not be used as a replacement for humans. Even with the answers ChatGPT gives, a decent level of domain knowledge is still required to properly format and use its responses. As such, it is important to take its suggestions with a certain amount of skepticism. The best way to think of ChatGPT is as an extremely smart friend with a wide range of knowledge: though that friend may be very knowledgeable, they can and will be wrong from time to time. With that, how does one use ChatGPT?

How to use ChatGPT for everyday life

Using ChatGPT starts with what is called a prompt. A prompt can be thought of as a query that the user feeds into ChatGPT, a bit like a Google search. It is a statement that asks ChatGPT a question or instructs it to perform a certain task. Since ChatGPT uses NLP, the prompt does not have to be formatted in a certain way, nor does it require a specific syntax. In other words, a prompt can be phrased the same way a user would pose the instruction to another person. However, it is good to provide ChatGPT with enough information to adequately describe the problem. An example of a prompt is as follows:

How can ChatGPT be used for everyday life?

Once the prompt is run, ChatGPT will produce text with suggestions, answers, software, or whatever was asked for. The output it produces is called a response, and it contains the answer to the request or question.
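Everything in this article uses the chat interface directly, but for the curious, the same prompt-and-response loop can be driven from code. Here is a minimal sketch assuming the openai Python package and an OPENAI_API_KEY environment variable; the model name is an assumption for illustration.

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# A prompt is just the user message; the model's reply is the response.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "How can ChatGPT be used for everyday life?"}],
)
print(response["choices"][0]["message"]["content"])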
There are a lot of ways ChatGPT can be used in everyday life. In terms of professional life, ChatGPT can make an excellent assistant. For most "text-based" jobs, that is, jobs that don't require design work such as drawing diagrams, ChatGPT can be an excellent helper. For example, suppose a programmer needs a small piece of software for a project: a SQL query that can pull all the rows between two dates. They could use a prompt like the following:

Write a SQL query that will pull all rows between January 3 and March 31.

This prompt will produce the following response:

To pull all rows between January 3 and March 31 from a table, you'll need to use the WHERE clause to filter the results based on the date column. Assuming you have a date column named "date_column" in your table, here's the SQL query:

SELECT *
FROM your_table
WHERE date_column >= '2023-01-03'
  AND date_column <= '2023-03-31';

Replace "your_table" with the actual name of your table, and "date_column" with the name of the column that contains the dates you want to filter. The query will select all rows where the "date_column" is on or after January 3, 2023, and on or before March 31, 2023.

For those who are not familiar with SQL, the prompt produced a simple query with generic placeholders that will pull the correct data out of a database.

Not everyone is a programmer, so how could a non-programmer use ChatGPT as an assistant? One way is by composing emails. Suppose a salesperson needs to inform Tom, their boss, via email that a $1 million deal was closed today. They could use a prompt like the following:

Write Tom an email saying that the deal with Company X went through, and we made $1 million.

The response would be something along the lines of:

Subject: Successful Deal with Company X - $1 Million Revenue Achieved

Hi Tom,

I hope this email finds you well. I am thrilled to share some exciting news with you! The deal with Company X has successfully gone through, and I am delighted to inform you that we have made an astounding $1 million in revenue from this collaboration.

This achievement wouldn't have been possible without the hard work and dedication of our entire team. It reflects the trust and confidence our partners have in our capabilities, and I believe it marks a significant milestone for our company's growth and success.

I want to express my gratitude to you for your valuable contributions and expertise that played a crucial role in making this deal a reality. Your efforts have been commendable, and I am truly grateful for your commitment to the success of this venture.

Let's take a moment to celebrate this triumph together. Your exceptional skills and commitment to excellence have made this accomplishment possible, and I am genuinely proud to have you as part of our team.

Looking forward to even more prosperous ventures in the future. Once again, congratulations to you and the entire team!

Best regards,
[Your Name]
[Your Title/Position]
[Company Name]
[Email]
[Phone Number]

In this case, all the user has to do is proofread the email, copy it, and send it. For applications such as this, proofreading is very important. As stated before, ChatGPT is neither a mind reader nor a sage; it may produce emails with wrong information, illogical wording, or other flaws. So it pays to read the message before sending it.

A Learning Tool

A very powerful use of ChatGPT is as a learning tool. For example, students can use it to help master a new subject, generate practice problems, and even check answers to an extent.
Assume the user needs a few extra practice problems for their English homework. They could enter a prompt like the following:

Generate some grammar practice problems for a 1st year English class

which would yield the following output:

Of course! Here are some grammar practice problems suitable for a 1st-year English class:

1. Subject-Verb Agreement: Choose the correct verb form that agrees with the subject in each sentence:
a) The cat (purr/purrs) when it is happy.
b) My friend and I (is/are) going to the movies tonight.
c) Neither the students nor the teacher (understands/understand) the assignment.

2. Pronoun Usage: Select the appropriate pronoun to complete each sentence:
a) She invited Sarah and (I/me) to the party.
b) My sister and (her/she) are going shopping.
c) The book is for (he/him) and his brother.

3. Articles (a/an/the): Insert the correct article (a, an, the, or no article) in each sentence:
a) Can you pass me (a/an) apple, please?
b) I need to buy (the) new shoes for the dance.
c) My friend has (no article) car.

4. Prepositions: Choose the correct preposition to complete each sentence:
a) We arrived (in/on/at) the airport fifteen minutes early.
b) The cat is sleeping (under/below/above) the table.
c) She's going to the store (by/with/for) her mom.

5. Verb Tenses: Identify the correct verb tense used in each sentence:
a) They (will be/was) studying for the exam all night.
b) By the time we arrived, the party (had started/has started).
c) I (am/was) watching a movie right now.

6. Adjective vs. Adverb: Decide whether the word in parentheses should be an adjective or adverb in the sentence:
a) She sings (beautiful/beautifully).
b) The (happy/happily) children played in the park.
c) The (quick/quickly) fox jumped over the fence.

7. Negation: Add the necessary negative word to each sentence:
a) I have (no/not) seen him today.
b) She does (not/none) like coffee.
c) We will (not/none) be attending the party.

8. Passive Voice: Rewrite the following sentences in the passive voice:
a) The chef cooked the meal.
b) The teacher is explaining the lesson.
c) The storm destroyed the old building.

9. Comparative and Superlative Forms: Complete each sentence with the correct comparative or superlative form of the adjective:
a) This book is (interesting/more interesting/most interesting) than the last one.
b) She is (tall/taller/tallest) than her sister.
c) It was the (big/biggest/bigger) cake at the party.

10. Reported Speech: Rewrite the following sentences using reported speech:
a) Mary said, "I am going to the store."
b) "I will call you later," said John.
c) "They have arrived," she said.

These grammar practice problems aim to reinforce essential grammar rules and concepts, helping 1st-year English students develop their language skills and proficiency.

As can be seen, ChatGPT generated a number of questions of the kind one would usually find in an English 101 class. If the user wanted something more specific, they could reword the prompt to accommodate that.

ChatGPT is a great teaching tool as well. Teachers can use it to generate lesson plans, write intros for learning modules, generate problems, and more. Overall, ChatGPT is an excellent tool for academia in general.

Now, it is important to remember that ChatGPT is not all-knowing. It can only generate responses based on the data it was trained on.
This means that if the user asks questions about a subject with little or no publicly available data, the response probably isn't going to be of the highest quality. For example, if someone asked

How do I build a working time machine?

they're going to get the following:

Building a working time machine, as depicted in science fiction, is currently beyond our technological capabilities and goes against the known laws of physics. Time travel, as popularly imagined, involves traveling backward or forward in time, similar to how we move through space. As of my last update in September 2021, time travel remains a concept explored mainly in theoretical physics, and no practical time machine has been built or proven to be possible. Several theories, such as the theory of relativity, suggest that time travel could be theoretically feasible under certain conditions, but these ideas are still purely theoretical and have not been put into practice.

It's important to remember that building a time machine would present enormous scientific, technological, and ethical challenges. It could raise paradoxes, such as the "grandfather paradox," where a time traveler could potentially alter past events and create inconsistencies in history.

As of now, time travel remains a fascinating topic in science fiction and theoretical physics, but it is not something that can be practically accomplished with our current understanding and technology. The scientific community continues to explore the nature of time and the universe, but building a working time machine as portrayed in movies and books is not yet a reality.

ChatGPT gave some general information on time travel, but unfortunately it didn't generate any schematics. Again, this goes back to ChatGPT not being a sage: it only knows what it was "taught". Since there is no data on how to build a working time machine, ChatGPT could not fully answer the prompt.

If one looks at the English question prompt above, one will notice that the practice questions were quite broad. It is common to have to "dig" with ChatGPT; in other words, a person may have to refine their queries to get what they need. To get some practice, try rewording the prompt to generate practice questions for only passive writing.

Summary

ChatGPT is a tool, and like any other tool, what it's used for is up to the user. As was seen in this article, ChatGPT is an excellent tool for helping a person through their day by generating software, emails, and so on. ChatGPT can also be used as a great learning or teaching device, helping students and teachers generate practice problems, create lesson plans, and much more. However, as stated numerous times, unless ChatGPT has been trained on something, it does not know about it. This means that asking it things like how to build a time machine, or about obscure domain-specific concepts, isn't going to return quality responses. Also, even if ChatGPT has been trained on the subject of the prompt, it may not always generate a quality response. No matter the use case, the response should be vetted for accuracy. This may mean doing a little extra research, testing the output, or whatever else needs to be done to verify the response.

Overall, ChatGPT at the time of writing this article is less than a year old, which means the full implications of using it are not yet understood, nor is how to fully leverage it. What can be said is that ChatGPT and similar LLM systems will probably be the next Google.
In terms of everyday use, the only true inhibitors are the user's imagination and the data that was used to train ChatGPT.

Author Bio

M.T. White has been programming since the age of 12. His fascination with robotics flourished when he was a child programming microcontrollers such as Arduino. M.T. holds an undergraduate degree in mathematics and a master's degree in software engineering, and is currently working on an MBA in IT project management. He works as a software developer for a major US defense contractor and is an adjunct CIS instructor at ECPI University. His background mostly stems from the automation industry, where he programmed PLCs and HMIs for many different types of applications. M.T. has programmed many different brands of PLCs over the years and has developed HMIs using many different tools.

Author of the book: Mastering PLC Programming

article-image-chatgpt-for-data-governance

ChatGPT for Data Governance

Jyoti Pathak
22 Mar 2024
11 min read
Introduction

The digital landscape is ever-accelerating, and here, data reigns supreme. The synergy between advanced technologies and effective governance practices is pivotal. ChatGPT, a revolutionary artificial intelligence marvel, is poised to transform the realm of data governance. Let us examine ChatGPT's impact on data governance: explore its capabilities, unravel its applications, and understand how it stands as a beacon of innovation in AI-powered data management. We will delve into the heart of this transformative technology and discover the future of data governance, redefined by ChatGPT's prowess.

Understanding Data Governance

Data governance refers to managing, protecting, and ensuring high data quality within an organization. It involves defining policies, procedures, and roles to ensure data accuracy, privacy, and security.

Best Practices for Data Governance

The best practices for data governance include:

Define clear data governance policies
Data quality assurance
Data classification and sensitivity
Metadata management
Data security and encryption
Compliance with regulations
Data access controls
Data lifecycle management
Data governance training
Data monitoring and auditing
Ethical considerations
Collaboration and communication

The 3 Key Roles of Data Governance

1. Data Stewards:
Data stewards act as custodians, overseeing the quality, integrity, and compliance of data within an organization. They define and enforce data policies, ensuring data is accurate, consistent, and compliant with regulatory requirements. Think of them as vigilant gatekeepers, ensuring that data remains trustworthy and reliable.

Practical Example:
Imagine a data steward in a financial institution verifying customer information. By meticulously cross-referencing data from various sources, they ensure the customer's details are accurate and consistent, preventing errors in financial transactions.

2. Data Custodians:
Data custodians handle the technical implementation of data governance policies. They manage data storage, access controls, and encryption, safeguarding data against unauthorized access or tampering. Custodians are the architects behind the secure data infrastructure.

Practical Example:
A data custodian in a healthcare organization implements encryption protocols for patient records. This encryption ensures that sensitive patient data is protected even if unauthorized access is attempted, maintaining confidentiality and compliance with data protection laws.

3. Data Users:
Data users are individuals or departments that utilize data for decision-making. They must adhere to data governance policies while extracting insights from data. Data users rely on accurate and reliable data to make informed choices, making them integral to the governance framework.

Practical Example:
Marketing professionals analyzing customer behavior data to tailor marketing campaigns are data users. By adhering to data governance policies, they ensure that the insights derived are based on trustworthy data, leading to effective and targeted marketing strategies.
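The three roles differ mainly in what each may do with data, which the "Data access controls" best practice makes operational. As a minimal, purely illustrative sketch of that idea (the permission names and mapping below are invented, not a standard):

# Illustrative permission map for the three governance roles
PERMISSIONS = {
    "data_steward":   {"read", "define_policy", "audit"},
    "data_custodian": {"read", "configure_storage", "encrypt"},
    "data_user":      {"read"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the given governance role may perform the action."""
    return action in PERMISSIONS.get(role, set())

print(is_allowed("data_user", "define_policy"))  # False: users only consume data
print(is_allowed("data_steward", "audit"))       # True: stewards enforce policy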
Data Governance Tools

Data governance tools facilitate the implementation of governance policies. Let's explore some powerful data governance tools, including code snippets and practical insights illuminating their transformative impact.

1. Collibra: Unifying Data Governance Efforts

Practical Insight: Collibra acts as a centralized hub, unifying data governance efforts across an organization. It enables collaboration among data stakeholders, streamlining policy management and ensuring consistent data definitions.

Code Snippet: Automating Data Quality Checks

import collibra

# Connect to the Collibra API
collibra.connect(api_key="your_api_key", base_url="https://collibra_instance/api")

# Define data quality checks
data_quality_checks = {
    "Check for Missing Values": "SELECT COUNT(*) FROM table_name WHERE column_name IS NULL;",
    # Add more checks as needed
}

# Execute data quality checks
for check_name, sql_query in data_quality_checks.items():
    result = collibra.execute_sql_query(sql_query)
    print(f"{check_name}: {result}")

2. IBM InfoSphere: Ensuring Data Accuracy

Practical Insight: IBM InfoSphere offers advanced data profiling and data quality capabilities. It analyzes data sources, identifies anomalies, and ensures data accuracy, laying the foundation for trustworthy decision-making.

Code Snippet: Data Profiling with IBM InfoSphere

from ibm_infosphere import InfoSphereClient

# Connect to InfoSphere
client = InfoSphereClient(username="your_username", password="your_password")

# Profile data from a CSV file
data_profile = client.profile_data(file_path="data.csv")

# Analyze profile results
print("Data Profile Summary:")
print(f"Number of Rows: {data_profile.num_rows}")
print(f"Column Statistics: {data_profile.column_stats}")

3. Apache Atlas: Navigating Data Lineage

Practical Insight: Apache Atlas enables comprehensive data lineage tracking. It visualizes how data flows through the organization, aiding compliance efforts and ensuring a clear understanding of data origins and transformations.

Code Snippet: Retrieve Data Lineage Information

from apache_atlas import AtlasClient

# Connect to the Apache Atlas server
atlas_client = AtlasClient(base_url="https://atlas_instance/api")

# Get data lineage for a specific dataset
dataset_name = "your_dataset"
data_lineage = atlas_client.get_data_lineage(dataset_name)

# Visualize the data lineage graph (using a visualization library)
visualize_data_lineage(data_lineage)

How Can AI Be Used in Governance?

Artificial Intelligence (AI) holds immense potential to make governance processes more efficient, transparent, and data-driven.
Here are several ways AI can be used in governance, along with relevant examples and code snippets:

● Automated Data Analysis

Application: AI algorithms can analyze vast datasets, extracting meaningful insights and patterns to aid decision-making in governance.

Example: Code Snippet for Automated Data Analysis

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Load governance data
governance_data = pd.read_csv("governance_data.csv")

# Extract features and target variable
X = governance_data.drop(columns=["outcome"])
y = governance_data["outcome"]

# Train AI model (Random Forest classifier)
model = RandomForestClassifier()
model.fit(X, y)

# Make predictions for governance decisions
# (new_data: feature rows prepared the same way as X)
predictions = model.predict(new_data)

● Natural Language Processing (NLP) for Policy Analysis

Application: NLP algorithms can analyze legal documents, policies, and public opinion, providing insights to policymakers.

Example: Code Snippet for Policy Text Analysis

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # One-time download of the VADER lexicon

# Sample policy text
policy_text = "The new governance policy aims to enhance transparency and accountability."

# Sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner)
analyzer = SentimentIntensityAnalyzer()
sentiment_score = analyzer.polarity_scores(policy_text)
print("Sentiment Score:", sentiment_score)

● Predictive Analytics for Resource Allocation

Application: AI models can predict trends and demands, enabling governments to allocate resources efficiently in healthcare, transportation, or disaster management.

Example: Code Snippet for Predictive Resource Allocation

import pandas as pd
from sklearn.linear_model import LinearRegression

# Load historical data (e.g., healthcare admissions)
historical_data = pd.read_csv("historical_data.csv")

# Extract features and target variable
X = historical_data.drop(columns=["resource_allocation"])
y = historical_data["resource_allocation"]

# Train AI model (linear regression for prediction)
model = LinearRegression()
model.fit(X, y)

# Predict resource allocation for future scenarios
# (new_data: feature rows prepared the same way as X)
predicted_allocation = model.predict(new_data)

● Chatbots for Citizen Engagement

Application: AI-powered chatbots can handle citizen queries, provide information, and offer assistance, improving public services.

Example: Code Snippet for Chatbot Implementation

from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

# Initialize the chatbot
chatbot = ChatBot("GovernanceBot")

# Train the chatbot with corpus data
trainer = ChatterBotCorpusTrainer(chatbot)
trainer.train("chatterbot.corpus.english")

# Get a response for a citizen query
citizen_query = "How to pay property taxes online?"
response = chatbot.get_response(citizen_query)
print("Chatbot Response:", response)

● Fraud Detection and Security

Application: AI algorithms can detect patterns indicative of fraud or security breaches, enhancing the integrity of governance systems.

Example: Code Snippet for Fraud Detection

import pandas as pd
from sklearn.ensemble import IsolationForest

# Load transaction data
transaction_data = pd.read_csv("transaction_data.csv")

# Extract features
X = transaction_data.drop(columns=["transaction_id"])

# Detect anomalies using the Isolation Forest algorithm
model = IsolationForest(contamination=0.05)
anomalies = model.fit_predict(X)

# Identify and handle potential fraud cases
fraud_cases = transaction_data[anomalies == -1]

Example Code Snippet: AI-Powered Anomaly Detection

from sklearn.ensemble import IsolationForest

# Assume 'X' is the feature matrix
model = IsolationForest(contamination=0.1)
anomalies = model.fit_predict(X)
print("Anomalies Detected:\n", anomalies)

How Does AI Affect Data Governance?

AI affects data governance by automating tasks related to data management, analysis, and compliance. Machine learning algorithms can process large datasets, identify trends, and predict potential governance issues. AI-driven tools enable real-time data monitoring, allowing organizations to proactively address governance challenges and ensure that data remains accurate, secure, and compliant with regulations.

Example Code Snippet: AI-Driven Predictive Analytics

from sklearn.linear_model import LinearRegression

# Assume 'X' is the feature matrix and 'y' is the target variable
model = LinearRegression()
model.fit(X, y)

# Predict future values using the trained AI model
future_data = prepare_future_data()  # Function to prepare future data
predicted_values = model.predict(future_data)
print("Predicted Values:\n", predicted_values)

The Critical Role of Data Governance in AI

Data governance plays a pivotal role in shaping the trajectory of AI applications, influencing their accuracy, reliability, and ethical implications. Let's explore why data governance is indispensable for AI, illustrated through practical examples and code snippets.

1. Ensuring Data Quality and Accuracy

Importance: Inaccurate or inconsistent data leads to flawed AI models, hindering their effectiveness.

Example: Code Snippet for Data Cleaning

import pandas as pd

# Load dataset
data = pd.read_csv("raw_data.csv")

# Handle missing values
data_cleaned = data.dropna()

# Handle duplicates
data_cleaned = data_cleaned.drop_duplicates()

# Ensure consistent data formats
data_cleaned['date_column'] = pd.to_datetime(data_cleaned['date_column'])

2. Addressing Bias and Ensuring Fairness

Importance: Biased data can perpetuate discrimination in AI outcomes, leading to unfair decisions.

Example: Code Snippet for Bias Detection

from aif360.datasets import CompasDataset
from aif360.algorithms.preprocessing import Reweighing

# Load dataset
dataset = CompasDataset()

# Detect and mitigate bias
privileged_groups = [{'race': 1}]
unprivileged_groups = [{'race': 0}]
rw = Reweighing(unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
dataset_transformed = rw.fit_transform(dataset)
3. Ensuring Data Security and Privacy

Importance: AI often deals with sensitive data; governance ensures its protection.

Example: Code Snippet for Data Encryption

from cryptography.fernet import Fernet

# Generate an encryption key
key = Fernet.generate_key()
cipher_suite = Fernet(key)

# Encrypt sensitive data
encrypted_data = cipher_suite.encrypt(b"Sensitive information")

4. Promoting Ethical Decision-Making

Importance: Ethical considerations shape AI's impact on society; governance ensures ethical use.

Example: Code Snippet for Ethical AI Policy Implementation

def check_ethical_guidelines(decision):
    ethical_guidelines = ["fairness", "transparency", "accountability"]
    if any(keyword in decision for keyword in ethical_guidelines):
        return True
    else:
        return False

decision = "Implement AI system with transparency."
is_ethical = check_ethical_guidelines(decision)

5. Adhering to Regulatory Compliance

Importance: Compliance with regulations builds trust and avoids legal repercussions.

Example: Code Snippet for GDPR Compliance

from gdpr_utils import GDPRUtils

# Check GDPR compliance
user_data = {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30,
    # ... other user data fields
}
is_gdpr_compliant = GDPRUtils.check_compliance(user_data)

Data governance is the cornerstone, ensuring that AI technologies are not only innovative but also ethical, secure, and reliable. By implementing robust data governance frameworks and integrating ethical considerations, organizations can unleash the full potential of AI, fostering a future where technological advancements are not just groundbreaking but also responsible and beneficial for all.

Conclusion

As organizations grapple with the complexities of data management, ChatGPT stands tall, offering a sophisticated solution that transcends boundaries. Its ability to automate, analyze, and assist in real time reshapes the landscape of data governance, propelling businesses into a future where informed decisions, ethical practices, and compliance are seamlessly intertwined. With ChatGPT at the helm, data governance is not merely a task; it becomes a strategic advantage, empowering enterprises to harness the full potential of their data securely and intelligently. Embrace the future of data governance with ChatGPT, where precision meets innovation and data is not just managed but masterfully orchestrated for unparalleled success.

Author Bio

Jyoti Pathak is a distinguished data analytics leader with a 15-year track record of driving digital innovation and substantial business growth. Her expertise lies in modernizing data systems, launching data platforms, and enhancing digital commerce through analytics. Celebrated with the "Data and Analytics Professional of the Year" award and named a Snowflake Data Superhero, she excels in creating data-driven organizational cultures.

Her leadership extends to developing strong, diverse teams and strategically managing vendor relationships to boost profitability and expansion. Jyoti's work is characterized by a commitment to inclusivity and the strategic use of data to inform business decisions and drive progress.

article-image-using-chatgpt-for-customer-service

Using ChatGPT for Customer Service

Amita Kapoor
07 Mar 2024
10 min read
Introduction

Customer service bots of old can often feel robotic, rigid, and painfully predictable. But enter ChatGPT: the fresher, more dynamic contender in the bot arena.

ChatGPT isn't just another bot. It's been meticulously trained on a vast sea of text and code, equipping it to grapple with questions that would stump its predecessors. And it's not limited to just customer queries; this versatile bot can craft a range of text formats, from poems to programming snippets.

But the standout feature? ChatGPT's touch of humour. It's not just about answering questions; it's about engaging in a way that's both informative and entertaining. So if you're in search of a customer service experience that's more captivating than the norm, it might be time to chat with ChatGPT.

Onboarding ChatGPT: A Quick and Easy Guide

Ready to set sail with ChatGPT? Here's your easy guide to make sure you're all set and ready to roll:

1. Obtain the API Key: First, you'll need to get an API key from OpenAI. This is like your secret password to the world of ChatGPT. To get an API key, head to the OpenAI platform and sign up. Once you're signed in, go to the API section and click on "Create New Key."

2. Integrate ChatGPT with Your System: Once you have your API key, you can integrate ChatGPT with your system. This is like introducing ChatGPT to your system and making sure they're friends, ready to work together smoothly. To integrate ChatGPT, you'll need to add your API key into your system's code. The specific steps involved will vary depending on your system, but there are many resources available online to help you. Here is an example of how you can do it in Python:

import openai
import os

# Initialize the OpenAI API client
api_key = os.environ.get("OPENAI_API_KEY")  # Retrieve the API key from environment variables
openai.api_key = api_key  # Set the API key

# API parameters
model = "gpt-3.5-turbo"  # Choose the appropriate model
max_tokens = 150  # Limit the response length

3. Fine-Tune ChatGPT (Optional): ChatGPT is super smart, but sometimes you might need it to learn some specifics about your company. That's where fine-tuning comes in. To fine-tune ChatGPT, you provide it with training data specific to your company. This could include product information, customer service FAQs, or examples of the types of conversations you want ChatGPT to be able to handle. Fine-tuning is not required, but it can help improve ChatGPT's performance on your specific tasks [https://www.packtpub.com/article-hub/fine-tuning-gpt-35-and-4]. A minimal sketch of the training-data format appears after this list.

And that's it! With these three steps, ChatGPT will be all set to jump in and take your customer service to the next level. Ready, set, ChatGPT!
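For step 3, OpenAI's chat fine-tuning expects training examples as JSON lines, each containing a list of messages. Here is a minimal sketch of one such example; the company name and content are invented for illustration, and in the real file each example sits on a single line:

{"messages": [
  {"role": "system", "content": "You are a support assistant for Acme Corp."},
  {"role": "user", "content": "What is your return policy?"},
  {"role": "assistant", "content": "You can return any item within 30 days of delivery for a full refund."}
]}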
1. Pre-defined Context: Initiate the conversation with a system message that sets the context for the bot's role.

```
role_gpt = "You are a customer service assistant for an EdTech site that offers online K-12 courses. Provide information and assistance regarding the courses, enrollment, and related queries."
```

This directive helps guide the model's responses, ensuring they align with the expected topics.

2. Keyword Filtering: Implement keyword filtering to review the user's queries for relevance to topics the bot handles. If the query includes keywords related to courses, enrollment, and so on, the bot answers; otherwise, it informs the user about the limitation. Here's a basic example of a keyword filtering function in Python. The function is_relevant_query checks whether the query contains certain keywords related to the services offered by the EdTech site.

```
def is_relevant_query(query, keywords):
    """
    Check if the query contains any of the specified keywords.

    :param query: str, the user's query
    :param keywords: list of str, keywords to check for
    :return: bool, True if query contains any keyword, False otherwise
    """
    query = query.lower()
    return any(keyword in query for keyword in keywords)

# Usage example:
keywords = ['enrollment', 'courses', 'k-12', 'online learning']
query = "Tell me about the enrollment process."
is_relevant = is_relevant_query(query, keywords)
```

Next, we combine the bot role and the user query to build the complete message:

```
messages = [
    {"role": "system", "content": f"{role_gpt}"},
    {"role": "user", "content": f"{query}"}
]
```

We now call the OpenAI API only when the question is relevant:

```
is_relevant = is_relevant_query(query, keywords)
if is_relevant:
    # Process the query with ChatGPT
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages
    )
    # Extract and print the chatbot's reply
    chatbot_reply = response['choices'][0]['message']['content']
    print("ChatGPT: ", chatbot_reply)
else:
    print("I'm sorry, I can only answer questions related to enrollment, courses, and online learning for K-12.")
```

To elevate the user experience, prompt your customers to use specific questions. This subtle guidance helps funnel their queries, ensuring they stay on-topic and receive the most relevant information quickly. Continuous observation of user interactions and consistent collection of their feedback is paramount. This valuable insight allows you to refine your bot, making it more intuitive and adept at handling various questions. To further enhance the bot's efficiency, enable a feature where it can politely ask for clarification on vague or ambiguous inquiries, as in the sketch below. This ensures your bot continues to provide precise and relevant answers, solidifying its role as an invaluable resource for your customers.
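Here is a minimal sketch of that clarification feature, reusing the openai client, model, role_gpt, and is_relevant_query names defined in the snippets above; the word-count threshold and the clarification wording are illustrative assumptions, not part of the original example.

```
def handle_query(query, keywords):
    # Reuses openai, model, role_gpt, and is_relevant_query from the code above.
    if is_relevant_query(query, keywords):
        response = openai.ChatCompletion.create(
            model=model,
            messages=[
                {"role": "system", "content": role_gpt},
                {"role": "user", "content": query},
            ]
        )
        return response['choices'][0]['message']['content']
    if len(query.split()) < 4:
        # Very short queries are often ambiguous - ask for more detail instead of refusing.
        return "Could you tell me a bit more? For example, which course or grade level are you asking about?"
    return "I'm sorry, I can only answer questions related to enrollment, courses, and online learning for K-12."
```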
Utilise ChatGPT to tackle Frequently Asked Questions

Amidst the myriad of queries in customer service, frequently asked questions (FAQs) create a pattern. With ChatGPT, transform the typical, monotonous FAQ experience into an engaging and efficient one.

Example: A Hospital Chatbot

Consider the scenario of a hospital chatbot. Patients might have numerous questions before and after appointments. They might be inquiring about the hospital's visitor policies, appointment scheduling, post-consultation care, or the availability of specialists. A well-implemented ChatGPT can swiftly and accurately tackle these questions, giving relief to both the hospital staff and the patients.

Here is a tentative role setting for such a bot:

```
role_gpt = "You are a friendly assistant for a hospital, guiding users with appointment scheduling, hospital policies, and post-consultation care."
```

This orientation anchors the bot within the healthcare context, offering relevant and timely patient information. For optimal results, a finely tuned ChatGPT model for this use case is ideal. This enhancement allows for precise, context-aware processing of healthcare-related queries, ensuring your chatbot stands as a trustworthy, efficient resource for patient inquiries.

The approach outlined above can be seamlessly adapted to various other sectors. Imagine a travel agency, where customers frequently inquire about trip details, booking procedures, and cancellation policies. Or consider a retail setting, where questions about product availability, return policies, and shipping details abound. Universities can employ ChatGPT to assist students and parents with admission queries, course details, and campus information. Even local government offices can utilize ChatGPT to provide citizens with instant information about public services, documentation procedures, and local regulations. In each scenario, a tailored ChatGPT, possibly fine-tuned for the specific industry, can provide swift, clear, and accurate responses, elevating the customer experience and allowing human staff to focus on more complex tasks. The possibilities are boundless, underscoring the transformative potential of integrating ChatGPT in customer service across diverse sectors.
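To make this concrete, here is a minimal sketch wiring the hospital role above into an API call, following the same pattern as the EdTech example earlier in this article; the model name, placeholder key, and sample question are illustrative assumptions.

```
import openai

openai.api_key = "YOUR_API_KEY"  # Placeholder - use your own key

role_gpt = "You are a friendly assistant for a hospital, guiding users with appointment scheduling, hospital policies, and post-consultation care."

# Send one patient question with the hospital role as the system message.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": role_gpt},
        {"role": "user", "content": "What are the visiting hours for the maternity ward?"},
    ]
)
print(response['choices'][0]['message']['content'])
```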
Adventures in AI Land

🐙 Octopus Energy: Hailing from the UK's bustling lanes, Octopus Energy unleashed ChatGPT into the wild world of customer inquiries. Lo and behold, handling nearly half of all questions, ChatGPT isn't just holding the fort – it's conquering, earning accolades and outshining its human allies in ratings!

📘 Chegg: Fear not, night-owl students! The world of academia isn't left behind in the AI revolution. Chegg, armed with the mighty ChatGPT (aka Cheggmate), stands as the valiant knight ready to battle those brain-teasing queries when the world sleeps at 2 AM. Say goodbye to the midnight oil blues!

🥤 PepsiCo: Oh, the fizz and dazzle! The giants aren't just watching from the sidelines. PepsiCo, joining forces with Bain & Company, bestowed upon ChatGPT the quill to script their advertisements. Now every pop and fizz of their beverages echoes with the whispers of AI, making each gulp a symphony of allure and refreshment.

Ethical Considerations for Customer Service ChatGPT

In the journey of enhancing customer service with ChatGPT, companies should hold the compass of ethical considerations steadfast. Navigate the AI world with a responsible map that ensures not just efficiency and innovation but also the upholding of ethical standards. Below are the vital checkpoints to ensure the ethical use of ChatGPT in customer service:

Transparency: Uphold honesty by ensuring customers know they are interacting with a machine learning model. This clarity builds a foundation of trust and sets the right expectations.

Data Privacy: Safeguard customer data with robust security measures, ensuring protection against unauthorized access and adherence to relevant data protection regulations. For further analysis or training, use anonymized data, safeguarding customer identity and sensitive information.

Accountability: Keep a watchful eye on AI interactions, ensuring the responses are accurate, relevant, and appropriate. Establish a system for accountability and continuous improvement.

Legal Compliance: Keep the use of AI in customer service within the bounds of relevant laws and regulations, ensuring compliance with AI, data protection, and customer rights laws.

User Autonomy: Ensure customers have the choice to switch to a human representative, maintaining their comfort and ensuring their queries are comprehensively addressed.

Conclusion

To wrap it up (with a bow): if you're all about leveling up your customer service game, ChatGPT's your partner-in-crime. But like any good tool, it's all about how you wield it. So, gear up, fine-tune, and dive into this AI adventure!

Author Bio

Amita Kapoor is an accomplished AI consultant and educator with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita retired early and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. After her retirement, Amita founded NePeur, a company providing data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

Getting Started with Microsoft Guidance

Prakhar Mishra
28 Feb 2024
8 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!

Introduction

The emergence of massive language models is a watershed moment in the field of artificial intelligence (AI) and natural language processing (NLP). Because of their extraordinary capacity to write human-like text and perform a range of language-related tasks, these models, which are based on deep learning techniques, have earned considerable interest and acceptance. This field has undergone significant scientific developments in recent years. Researchers all over the world have been developing better and more domain-specific LLMs to meet the needs of various use cases.

Large Language Models (LLMs) such as GPT-3 and its descendants, like any technology or strategy, have downsides and limits. And, in order to use LLMs properly, ethically, and to their maximum capacity, it is critical to grasp their downsides and limitations. Unlike large language models such as GPT-4, which can follow most instructions, language models that are not equivalently large (such as GPT-2, LLaMa, and their derivatives) frequently struggle to follow instructions adequately, particularly the part of an instruction that asks for output in a specific structure. This causes a bottleneck when constructing a pipeline in which the output of LLMs is fed to other downstream functions.

Introducing Guidance - an effective and efficient means of controlling modern language models compared to conventional prompting methods. It supports both open LLMs (LLaMa, GPT-2, Alpaca, and so on) and closed LLMs (ChatGPT, GPT-4, and so on). It can be considered part of a larger ecosystem of tools for expanding the capabilities of language models.

Guidance uses Handlebars, a templating language. Handlebars allows us to build semantic templates effectively by compiling templates into JavaScript functions, making its execution faster than other templating engines. Guidance also integrates well with Jsonformer, a bulletproof way to generate structured JSON from language models. Here's a detailed notebook on the same. Also, in case you use OpenAI from Azure AI, Guidance has you covered - notebook.

Moving on to some of the outstanding features that Guidance offers. Feel free to check out the entire list of features.

Features

1. Guidance Acceleration - This addition significantly improves inference performance by efficiently utilizing the key/value caches as we proceed through the prompt, keeping a session state with the LLM inference. Benchmarking revealed a 50% reduction in runtime when compared to standard prompting approaches. Here's the link to one of the benchmarking exercises. The below image shows an example of generating a character profile for an RPG game in JSON format. The green highlights are the generations done by the model, whereas the blue and unhighlighted parts are copied as-is from the input prompt, unlike the traditional method that tries to generate every bit of it.

Source

Note: As of now, the Guidance Acceleration feature is implemented for open LLMs. We can soon expect to see it working with closed LLMs as well.
2. Token Healing - This feature attempts to correct tokenization artifacts that commonly occur at the border between the end of a prompt and the start of a group of generated tokens.

For example, if we ask an LLM to auto-complete a URL with the below-mentioned input, it's likely to produce the shown output. Apart from the obvious limitation that the URL might not be valid, I'd like to draw your attention to the extra space it creates (highlighted in red). Such considerations make it difficult to construct a dependable parsing function and robustly absorb its result into subsequent phases.

Input: "The link is <a href=http:"
Actual Output: "The link is <a href=http: //www.google.com/search?q"
Expected Output: "The link is <a href=http://www.google.com/search?q"

This is the exact bucket of problems that Token Healing tries to solve using the backtracking method. Feel free to check out this Jupyter notebook for more examples.

3. Guaranteed Output Structure - Large language models are fantastic at producing useful outputs, but not so much at producing outputs in a specified format (especially open-source ones like LLaMa, GPT-2, and so on). When we want to use the output of a language model as input to another system, this is frequently an issue. With Handlebars, Guidance guarantees the output format to be the same as what was asked for.

Let's now see Guidance in action.

Installation

Installing Guidance is a breeze; just do a pip install:

```
$ pip install guidance
```

Assume we are now creating a product description for an e-commerce website. Here's how traditional generation compares to Guidance generation. Feel free to play with this Colab notebook with both the below examples.

Traditional Generation

Input:

Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently.
{
prod_id: <numeric value of 5 digits>,
prod_name: <name starts with the prefix 'p_'>,
prod_price: <should be an integer between 1 and 16. Should end with suffix '$'>
}
The product description is

Output:

Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently.
{
prod_id: <numeric value of 5 digits>,
prod_name: <name starts with the prefix 'p_'>,
prod_price: <should be an integer between 1 and 16. Should end with suffix '$'>
}
The product description is { resentprod_id: <numeric value of 5 digits>, resentprod_name: <name begins with the prefix 'p_'>, resentprod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } In the above example, the product description has 5 constraint fields and 5 attribute fields.
The constraints are as follows: resentprod_id: - value of 5 digits, resentprod_name: - name of the product, resentprod_price: - price of the product, resentprod_price_suffix: - suffix of the product price, resentprod_id: - the product id, resentpro diabetic_id: value of 4 digits, resentprod_ astronomer_id: - value of 4 digits, resentprod_ star_id: - value of 4 digits, resentprod_is_generic: - if the product is generic and not the generic type, resentprod_type: - the type of the product, resentprod_is_generic_type

(The rambling, malformed completion above is the model's actual output; it illustrates exactly the structure problem that Guidance solves.)

Here's the code for the above example with the GPT-2 language model:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
model = AutoModelForCausalLM.from_pretrained("gpt2-large")

inputs = tokenizer(Input, return_tensors="pt")
tokens = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
)

# Output:
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

Guidance Generation

Input w/ code:

import guidance

guidance.llm = guidance.llms.Transformers("gpt2-large")

# define the prompt
program = guidance("""Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The following is the format
```json
{
"prod_id": "{{gen 'id' pattern='[0-9]{5}' stop=','}}",
"prod_name": "{{gen 'name' pattern='p_[A-Za-z]+' stop=','}}",
"prod_price": "{{gen 'price' pattern='\b([1-9]|1[0-6])\b\$' stop=','}}"
}```""")

# execute the prompt
Output = program()

Output:

Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of a fixed set of fields to be filled in the JSON. The following is the format
```json
{
"prod_id": "11231",
"prod_name": "p_pizzas",
"prod_price": "11$"
}```

As seen in the preceding instances, with Guidance we can be certain that the output format will be followed within the given restrictions, no matter how many times we execute the identical prompt. This capability makes it an excellent choice for constructing any dependable and robust multi-step LLM pipeline.
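Building on the example above, here is a minimal sketch of another common pattern: constraining a field to a fixed set of choices with select. It assumes the same pre-1.0, Handlebars-style Guidance API and gpt2-large model used throughout this article.

```
import guidance

# Assumes the pre-1.0, Handlebars-style Guidance API used above.
guidance.llm = guidance.llms.Transformers("gpt2-large")

# `select` constrains the model to choose one of the given options,
# which is handy when a downstream system expects a closed vocabulary.
program = guidance("""Classify the sentiment of the following product review.
Review: {{review}}
Sentiment: {{select 'sentiment' options=sentiments}}""")

result = program(
    review="The headphones arrived quickly and sound fantastic.",
    sentiments=["positive", "negative", "neutral"],
)
print(result["sentiment"])
```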
I hope this overview of Guidance has helped you realize the value it may provide to your daily prompt development cycle. Also, here's a consolidated notebook showcasing all the features of Guidance; feel free to check it out.

Author Bio

Prakhar has a Master's in Data Science with over 4 years of experience in industry across various sectors like Retail, Healthcare, Consumer Analytics, etc. His research interests include Natural Language Understanding and generation, and he has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn.

Leveraging ChatGPT and GPT-3

Deborah A. Dahl
12 Feb 2024
8 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

This article is an excerpt from the book, Natural Language Understanding with Python, by Deborah A. Dahl. Combine natural language technology, deep learning, and large language models to create human-like language comprehension in computer systems.

Introduction

In the world of artificial intelligence, ChatGPT stands as a versatile conversational agent, adept at handling generic information interactions. While customization can be a challenge at present, ChatGPT offers a unique avenue for developers and AI enthusiasts alike. Beyond chat-based dialogue, it holds the potential to streamline the often time-consuming process of generating training data for conventional applications. In this article, we delve into the capabilities of ChatGPT and explore the journey of fine-tuning GPT-3 for specific use cases. By the end, you'll be equipped to harness the power of these language models, from data generation to AI customization, in your projects. Let's embark on this exciting AI journey together.

ChatGPT

ChatGPT (https://openai.com/blog/chatgpt/) is a system that can interact with users about generic information in a very capable way. Although at the time of writing, it is hard to customize ChatGPT for specific applications, it can be useful for other purposes than customized natural language applications. For example, it can very easily be used to generate training data for a conventional application. If we wanted to develop a banking application using some of the techniques discussed earlier in this book, we would need training data to provide the system with examples of how users might ask the system questions. Typically, this involves a process of collecting actual user input, which could be very time-consuming. ChatGPT could be used to generate training data instead, by simply asking it for examples. For example, for the prompt give me 10 examples of how someone might ask for their checking balance, ChatGPT responded with the sentences in Figure 11.3:

Figure 11.3 – GPT-3 generated training data for a banking application

Most of these seem like pretty reasonable queries about a checking account, but some of them don't seem very natural. For that reason, data generated in this way always needs to be reviewed. For example, a developer might decide not to include the second-to-last example in a training set because it sounds stilted, but overall, this technique has the potential to save developers quite a bit of time.
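For readers who want to script this rather than use the chat interface, here is a minimal sketch using the same openai Python library that the rest of this article installs; the model name and chat-style call are assumptions for illustration and are not from the book.

```
import openai

openai.api_key = "<your API key>"

# Ask ChatGPT for candidate training utterances for the banking application.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Give me 10 examples of how someone might ask for their checking balance."
    }],
)

# Review the generated examples by hand before adding them to a training set.
print(response["choices"][0]["message"]["content"])
```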
Applying GPT-3

Another well-known LLM, GPT-3, can also be fine-tuned with application-specific data, which should result in better performance. To do this, you need an OpenAI key because using GPT-3 is a paid service. Both fine-tuning to prepare the model and using the fine-tuned model to process new data at inference time will incur a cost, so it is important to verify that the training process is performing as expected before training with a large dataset and incurring the associated expense.

OpenAI recommends the following steps to fine-tune a GPT-3 model:

1. Sign up for an account at https://openai.com/ and obtain an API key. The API key will be used to track your usage and charge your account accordingly.

2. Install the OpenAI command-line interface (CLI) with the following command:

```
pip install --upgrade openai
```

This command can be used at a terminal prompt in Unix-like systems (some developers have reported problems with Windows or macOS). Alternatively, you can install GPT-3 to be used in a Jupyter notebook with the following code:

```
!pip install --upgrade openai
```

All of the following examples assume that the code is running in a Jupyter notebook.

3. Set your API key:

```
import openai

api_key = "<your API key>"
openai.api_key = api_key
```

4. The next step is to specify the training data that you will use for fine-tuning GPT-3 for your application. This is very similar to the process of training any NLP system; however, GPT-3 has a specific format that must be used for training data. This format uses a syntax called JSONL, where every line is an independent JSON expression. For example, if we want to fine-tune GPT-3 to classify movie reviews, a couple of data items would look like the following (omitting some of the text for clarity):

```
{"prompt":"this film is extraordinarily horrendous and i'm not going to waste any more words on it . ","completion":" negative"}
{"prompt":"9 : its pathetic attempt at \" improving \" on a shakespeare classic . 8 : its just another piece of teen fluff . 7 : kids in high school are not that witty . … ","completion":" negative"}
{"prompt":"claire danes , giovanni ribisi , and omar epps make a likable trio of protagonists , …","completion":" negative"}
```

Each item consists of a JSON dict with two keys, prompt and completion. prompt is the text to be classified, and completion is the correct classification. All three of these items are negative reviews, so the completions are all marked as negative.

It might not always be convenient to get your data into this format if it is already in another format, but OpenAI provides a useful tool for converting other formats into JSONL. It accepts a wide range of input formats, such as CSV, TSV, XLSX, and JSON, with the only requirement for the input being that it contains two columns with prompt and completion headers. Table 11.2 shows a few cells from an Excel spreadsheet with some movie reviews as an example:

prompt | completion
kolya is one of the richest films i've seen in some time . zdenek sverak plays a confirmed old bachelor ( who's likely to remain so ) , who finds his life as a czech cellist increasingly impacted by the five-year old boy that he's taking care of … | positive
this three hour movie opens up with a view of singer/guitar player/musician/composer frank zappa rehearsing with his fellow band members . all the rest displays a compilation of footage , mostly from the concert at the palladium in new york city , halloween 1979 … | positive
`strange days' chronicles the last two days of 1999 in los angeles . as the locals gear up for the new millenium , lenny nero ( ralph fiennes ) goes about his business … | positive

Table 11.2 – Movie review data for fine-tuning GPT-3

To convert one of these alternative formats into JSONL, you can use the fine_tunes.prepare_data tool, as shown here, assuming that your data is contained in the movies.csv file:

```
!openai tools fine_tunes.prepare_data -f ./movies.csv -q
```

The fine_tunes.prepare_data utility will create a JSONL file of the data and will also provide some diagnostic information that can help improve the data. The most important diagnostic that it provides is whether or not the amount of data is sufficient. OpenAI recommends several hundred examples for good performance. Other diagnostics include various types of formatting information, such as separators between the prompts and the completions.
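If your data is already in Python, you can also write the JSONL file directly rather than going through the conversion tool. Here is a minimal sketch; the example pairs are placeholders standing in for real review data:

```
import json

# (prompt, completion) pairs - placeholders standing in for real review data.
examples = [
    ("this film is extraordinarily horrendous ...", " negative"),
    ("kolya is one of the richest films i've seen in some time ...", " positive"),
]

with open("movies_prepared.jsonl", "w") as f:
    for prompt, completion in examples:
        # One JSON object per line, with exactly the two keys GPT-3 expects.
        f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
```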
After the data is correctly formatted, you can upload it to your OpenAI account and save the filename:

```
file_name = "./movies_prepared.jsonl"
upload_response = openai.File.create(
    file=open(file_name, "rb"),
    purpose='fine-tune'
)
file_id = upload_response.id
```

The next step is to create and save a fine-tuned model. There are several different OpenAI models that can be used. The one we're using here, ada, is the fastest and least expensive, and does a good job on many classification tasks:

```
fine_tune_response = openai.FineTune.create(training_file=file_id, model="ada")
fine_tuned_model = fine_tune_response.fine_tuned_model
```

Finally, we can test the model with a new prompt:

```
answer = openai.Completion.create(
    model = fine_tuned_model,
    engine = "ada",
    prompt = " I don't like this movie ",
    max_tokens = 10,  # Change amount of tokens for longer completion
    temperature = 0
)
answer['choices'][0]['text']
```

In this example, since we are only using a few fine-tuning utterances, the results will not be very good. You are encouraged to experiment with larger amounts of training data.

Conclusion

In conclusion, ChatGPT and GPT-3 offer invaluable tools for AI enthusiasts and developers alike. From data generation to fine-tuning for specific applications, these models present a world of possibilities. As we've seen, ChatGPT can expedite the process of creating training data, while GPT-3's customization can elevate the performance of your AI applications. As the field of artificial intelligence continues to evolve, these models hold immense promise. So, whether you're looking to streamline your development process or take your AI solutions to the next level, the journey with ChatGPT and GPT-3 is an exciting one filled with untapped potential. Embrace the future of AI with confidence and innovation.

Author Bio

Deborah A. Dahl is the principal at Conversational Technologies, with over 30 years of experience in natural language understanding technology. She has developed numerous natural language processing systems for research, commercial, and government applications, including a system for NASA, and speech and natural language components on Android. She has taught over 20 workshops on natural language processing, consulted on many natural language processing applications for her customers, and written over 75 technical papers. This is Deborah's fourth book on natural language understanding topics. Deborah has a PhD in linguistics from the University of Minnesota and postdoctoral studies in cognitive science from the University of Pennsylvania.

ChatGPT for Business Intelligence

Chaitanya Yadav
08 Feb 2024
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

A large language model (LLM) chatbot called ChatGPT was created by OpenAI. It is a strong tool that may be applied to many different activities, including business intelligence (BI).

Business intelligence is the act of gathering, analyzing, and interpreting data to derive insights that may be applied to improve business choices. Many BI-related processes can be automated with ChatGPT, freeing up BI analysts to work on more strategic projects.

What is Business Intelligence?

Business intelligence is the process of converting unprocessed data into insights that can be used to make decisions. BI gives businesses a competitive edge by assisting them in making well-informed decisions and streamlining operations. It involves data gathering, processing, reporting, and visualization. ChatGPT can be applied to BI in a number of ways, including the following:

Data preparation and cleaning: Data preparation and cleaning, which is frequently a tedious and time-consuming activity, can be automated by using ChatGPT. Data problems can be found and fixed by ChatGPT, and data can also be transformed into a format that BI tools can use.

Data analysis: Data analysis and pattern recognition can be done with ChatGPT. Additionally, ChatGPT can be used to create predictive models that can be applied to forecasting future outcomes.

Data visualization: Data visualizations, such as graphs and charts, can be created using ChatGPT and utilized to share findings with others.

Improved Efficiency: By streamlining operations and assisting firms in identifying areas for development, BI helps them become more productive and cost-effective.

Real-Time Monitoring: Businesses may use BI to track performance in real time, which makes it simpler to respond to changes and trends as they emerge.

Examples:

Step 1: Import Libraries and Set API Key

In the first step, we begin by installing the openai package:

```
pip install openai
```

Now you will need to enter your OpenAI API key before running the code, as shown below. With this in place, you will be able to use the OpenAI API to ask queries about business intelligence and receive answers from ChatGPT.

```
import openai

# Enter your API key here
api_key = "YOUR_API_KEY"
openai.api_key = api_key
```

Step 2: Define the Interaction Function

Here, we define a Python function called ask_chatgpt, which takes a question as input and interacts with ChatGPT using the OpenAI API. Inside this function, we make a request to the OpenAI API with the question provided as the prompt.

```
def ask_chatgpt(question):
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=f"ChatGPT for Business Intelligence: {question}",
        max_tokens=150  # Adjust according to your needs
    )
    return response.choices[0].text
```

Step 3: Define Example Questions

We create a collection of sample questions in this step. You should use them as your questions or prompts when using ChatGPT.

```
questions = [
    "What is Business Intelligence?",
    "How can BI benefit businesses?",
    "Can you provide an example of data analysis in BI?",
]
```

Step 4: Interact with ChatGPT and Print Responses

In the final step, we loop over the set of sample questions.
The ask_chatgpt function is used to communicate with ChatGPT and receive responses for each query. After that, the responses are printed to the console.

```
# Now we interact with ChatGPT and print the responses
for question in questions:
    response = ask_chatgpt(question)
    print(f"Q: {question}")
    print(f"A: {response}\n")
```

Output:

Q: What is Business Intelligence?
A: BI transforms data into actionable insights for informed decision-making.
Q: How can BI benefit businesses?
A: BI enhances decision-making, efficiency, and customer experiences.
Q: Can you provide an example of data analysis in BI?
A: A business might use BI to analyze sales data to identify trends, target specific customer segments, optimize marketing campaigns.

In this output, ChatGPT answers each query well, showcasing its capacity to understand and explain business-intelligence-related ideas. The responses highlight key aspects of BI, emphasizing its function in turning data into insightful knowledge that informs strategic business choices.

Example of ChatGPT Playground

We can utilize the ChatGPT Playground to obtain hands-on experience with ChatGPT for BI. The ChatGPT Playground is a web-based interface through which you can communicate with ChatGPT and create text, translate languages, produce other kinds of creative output, and receive insightful responses to your queries.

To start using the ChatGPT Playground, simply go to the ChatGPT website and click on the "Playground" button. Once you're in the Playground, you can start entering your prompts and queries. ChatGPT will then generate text in response to them.

For example, we can use ChatGPT to analyze sales data. We can simply type in the following prompt:

Analyze the sales data for the past quarter and identify any trends or patterns.

ChatGPT will then generate a response that analyzes the sales data and identifies any trends or patterns. For example, ChatGPT might respond with the following:

The sales data for the past quarter shows that sales of product A have increased by 15%, while sales of product B have decreased by 10%. This suggests that there is a growing demand for product A, and a declining demand for product B.
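Of course, ChatGPT can only analyze data it can actually see. In a script, that means including the numbers in the prompt. Here is a minimal sketch of that idea, reusing the Completion pattern from the steps above; the sales.csv file and its columns are hypothetical placeholders.

```
import openai
import pandas as pd

openai.api_key = "YOUR_API_KEY"  # Placeholder

# Hypothetical sales data with columns: product, quarter, revenue
df = pd.read_csv("sales.csv")
summary = df.groupby("product")["revenue"].sum().to_string()

# Include the summarized figures in the prompt so the analysis is grounded
# in your actual numbers rather than a generic description.
response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=f"Analyze this quarterly sales summary and identify any trends or patterns:\n{summary}",
    max_tokens=150
)
print(response.choices[0].text)
```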
Best Practices for Harnessing ChatGPT in Business Intelligence

Understand the limitations: ChatGPT is a large language model, not a human analyst. It can produce creative and engaging text, but it isn't perfect: it can make mistakes, and it can be biased. To minimize the risk of errors, it is important to be aware of these limitations and use ChatGPT accordingly.

Protect confidential information: If you are using ChatGPT as a cloud-based service, be mindful of the data you share with it. Do not divulge any classified or sensitive information to ChatGPT.

Verify information: ChatGPT is trained on a massive database of text and code. To make sure its output is accurate and up to date, cross-check it against other sources of information.

Provide feedback: ChatGPT is still being developed, and there is always room for improvement. If you are not satisfied with the output generated by ChatGPT, give feedback to help the developers make the model better.

Conclusion

ChatGPT can automate lengthy BI operations, including data cleaning, preparation, and analysis. In addition, it can provide data visualizations, predictive models, and real-time monitoring, all essential elements of efficient business intelligence. ChatGPT's automation of these tasks allows BI analysts to work on more complex and significant initiatives, which in turn improves company productivity and efficiency.

The article describes how to integrate ChatGPT into the BI process in a step-by-step manner. These steps include importing libraries, defining interaction functions, and offering sample queries. It also highlights the interactive aspect of the ChatGPT Playground, where users may engage with ChatGPT directly to analyze data, ask questions, and get intelligent answers. Overall, ChatGPT's capacity to automate operations and offer useful information makes it a valuable tool in the constantly changing field of business analytics and decision-making.

Author Bio

Chaitanya Yadav is a data analyst, machine learning, and cloud computing expert with a passion for technology and education. He has a proven track record of success in using technology to solve real-world problems and help others to learn and grow. He is skilled in a wide range of technologies, including SQL, Python, data visualization tools like Power BI, and cloud computing platforms like Google Cloud Platform. He is also 22x Multicloud Certified.

In addition to his technical skills, he is also a brilliant content creator, blog writer, and book reviewer. He is the co-founder of a tech community called "CS Infostics", which is dedicated to sharing opportunities to learn and grow in the field of IT.

Decoding ChatGPT's Biases

Sangita Mahala
05 Feb 2024
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

Large language models (LLMs) like ChatGPT have captivated the world with their ability to generate human-quality text, translate languages, write different kinds of creative content, and answer your questions in an informative way. However, these models can absorb and amplify biases present in their training data, which can lead to discriminatory or inappropriate outcomes. This article addresses the complex relationship between ChatGPT's training data and algorithmic fairness, discusses possible biases, and lays out steps to be taken to develop and apply LLMs responsibly.

Understanding ChatGPT's Training Data

ChatGPT, developed by OpenAI, is trained on a massive dataset of text and code, including books, articles, code repositories, and web text. While the exact composition of this dataset is not publicly known, it's likely to reflect the inherent biases present in the real world:

Social and cultural biases: Language itself can encode biases around gender, race, ethnicity, religion, and other social categories. Such biases can manifest as stereotypes, negative associations, and abusive language.

Historical biases: Textual data often reflects historical biases that may no longer be considered acceptable. For example, datasets containing historical documents might perpetuate outdated views on gender roles or racial stereotypes.

Algorithmic bias: By prioritizing some types of information over others, the algorithms used for processing and selecting training data can introduce biases of their own. This can result in models that are more likely to produce outputs reflecting those biases.

Addressing Algorithmic Fairness

Recognizing the potential for bias, researchers and developers are actively working to mitigate its impact on LLMs like ChatGPT:

1. Data debiasing: Techniques such as data augmentation and filtering can be used to remove or reduce biases in training data.

Example: Mitigating Gender Bias in Job Descriptions

Text Box Interaction:

User Prompt:

Create a job description for a data scientist.

Biased Output (Without Mitigation):

We are seeking a detail-oriented data scientist with exceptional analytical and problem-solving skills. The ideal candidate must have a strong background in statistics and programming, demonstrating logical thinking in their approach to data analysis.

Mitigation through Data Augmentation:

User Prompt (Mitigated):

Create a job description for a data scientist, ensuring the language is unbiased and inclusive.

Data-augmented Output:

We are looking for a dedicated data scientist with excellent skills in statistics and programming. The ideal candidate will excel in problem-solving and demonstrate a collaborative spirit. Whether you're an experienced professional or a rising talent, we encourage applicants of all genders and backgrounds to apply.

Analysis and Interpretation:

In the biased output, the language subtly reinforces gender stereotypes by emphasizing traits such as "analytical" and "logical thinking". To address this, the user changes the prompt so that it specifically asks for neutral and impartial language. The data-augmented output focuses on skills and qualities relevant to the role, without gender-specific associations.
In support of inclusiveness, it also calls for applications from candidates of all genders and backgrounds.

By refining prompts and incorporating counterfactual data, this hands-on example shows how users can actively engage with ChatGPT to mitigate biases. It underlines the potential impact of data augmentation on producing a more diverse and inclusive representation in language models' responses.

2. Algorithmic fairness metrics: Metrics such as fairness, accuracy, and calibration can be used to evaluate how fair an LLM is and identify areas for improvement.

Example: Evaluating Algorithmic Fairness in ChatGPT

Text Box Interaction:

User Prompt:

Evaluate the fairness, accuracy, and calibration of ChatGPT in responding to gender-related queries.

Algorithmic Fairness Metrics Analysis:

Fairness Metric:

User Prompt (Bias Evaluation): Ask ChatGPT about the qualities of a good leader, focusing on both male and female leaders.

Output: A good leader possesses strong communication skills, strategic thinking, and the ability to inspire and motivate their team. Whether male or female, these qualities are crucial for effective leadership.

Analysis: The response demonstrates fairness by providing a balanced perspective on leadership qualities without gender-specific biases.

Accuracy Metric:

User Prompt (Accuracy Evaluation): Ask ChatGPT to provide a definition of a specific technical term, ensuring accuracy in the response.

Output: [Accurate definition of the technical term]

Analysis: The accuracy metric is satisfied, as the model provides an accurate definition of the technical term in line with the user's request.

Calibration Metric:

User Prompt (Calibration Evaluation): Inquire about the probability of a specific event happening in the future and request a confidence level in the response.

Output: There is a 70% probability of the event occurring in the future.

Analysis: The calibration metric assesses how well the model's confidence levels align with the actual likelihood of events. The response includes a confidence level (70%), demonstrating a calibrated prediction.
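Fairness checks like these can also be partially automated. Below is a minimal sketch of a word-level probe that compares responses to paired prompts differing only in a demographic term; the word lists, inline sample responses, and simple counting approach are illustrative assumptions, not a validated fairness metric.

```
def count_terms(text, terms):
    # Count how often any of the given terms appear in the text.
    text = text.lower()
    return sum(text.count(term) for term in terms)

# In practice these would come from the model; shown inline for clarity.
response_male = "A good male leader is decisive, logical, and assertive."
response_female = "A good female leader is supportive, nurturing, and warm."

# Stereotype-associated word lists (illustrative only).
agentic_terms = ["decisive", "logical", "assertive", "ambitious"]
communal_terms = ["supportive", "nurturing", "warm", "helpful"]

for label, resp in [("male", response_male), ("female", response_female)]:
    print(label,
          "agentic:", count_terms(resp, agentic_terms),
          "communal:", count_terms(resp, communal_terms))

# Large, systematic gaps between paired responses suggest the model is
# reproducing gender stereotypes and may need prompt or data mitigation.
```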
3. Human oversight and feedback: Humans can play a crucial role in identifying and mitigating biases in LLMs. This can involve providing feedback on the model's outputs, flagging biased examples, and helping to develop guidelines for responsible use.

Example: Mitigating Cultural Bias in Restaurant Reviews

Text Box Interaction:

User Prompt:

Generate a review for a Mexican restaurant.

Biased Output (Initial Response):

This Mexican restaurant is bursting with vibrant colors and lively music, providing an authentic experience. The spicy dishes and bold flavors truly capture the essence of Mexican cuisine.

User Identifies Bias:

User Feedback: The initial response assumes that all Mexican restaurants are characterized by vibrant colors, lively music, spicy dishes, and bold flavors. This generalization perpetuates cultural stereotypes and does not account for the diversity within Mexican cuisine.

Model Refinement Prompt:

Refined Prompt Incorporating Feedback: Generate a review for a Mexican restaurant that avoids stereotypical assumptions and provides a more nuanced representation of the dining experience.

Improved Output (After Feedback and Refinement):

This Mexican restaurant offers a diverse culinary experience with thoughtful…

Analysis and Interpretation:

In this example, the user identifies bias in the initial response, which stereotypically characterizes all Mexican restaurants as having vibrant colors, lively music, spicy dishes, and bold flavors. The user provides feedback highlighting the importance of avoiding cultural stereotypes and encouraging a more nuanced representation.

To address this, the user refines the prompt to instruct the model to generate a review that is free of stereotypical assumptions. The improved output provides a more diverse and nuanced representation of the Mexican restaurant, taking into account the varied elements of Mexican cuisine and dining experiences.

Conclusion

ChatGPT, with its remarkable language generation capabilities, offers a fascinating way to explore the biases in AI. By combining theoretical understanding with hands-on experience, users can unpick the complexities of biases arising from training data and algorithms. The iterative process of experimenting with prompts, evaluating biases, and fine-tuning for fairness empowers users to actively contribute to the pursuit of ethical AI practices.

Addressing the biases in AI models will become more and more important as the technology develops. Collaboration between developers, researchers, and users is a key part of the journey toward algorithmic fairness. By breaking down biases within ChatGPT and actively contributing to its improvement, users play an essential role in shaping the future landscape of responsible and impartial artificial intelligence.

Author Bio

Sangita Mahala is a passionate IT professional with an outstanding track record, having an impressive array of certifications, including 12x Microsoft, 11x GCP, 2x Oracle, and 6x LinkedIn Top Voice badges. She is a Google Product Expert and IBM Champion Learner Gold. She also possesses extensive experience as a technical content writer and an accomplished book blogger. She is always committed to staying current with emerging trends and technologies in the IT sector.

ChatGPT Prompting

Clint Bodungen
01 Feb 2024
6 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

This article is an excerpt from the book, ChatGPT for Cybersecurity Cookbook, by Clint Bodungen. Master ChatGPT and the OpenAI API, and harness the power of cutting-edge generative AI and large language models to revolutionize the way you perform penetration testing, threat detection, and risk assessment.

Introduction

In this article, we will explore the basics of ChatGPT prompting using the ChatGPT interface, which is different from the OpenAI Playground we used in the previous recipe. The advantage of using the ChatGPT interface is that it does not consume account credits and is better suited for generating formatted output, such as writing code or creating tables.

Getting ready

To use the ChatGPT interface, you will need to have an active OpenAI account. If you haven't already, please set up your ChatGPT account.

How to do it…

In this recipe, we'll guide you through using the ChatGPT interface to generate a Python script that retrieves a user's public IP address. By following these steps, you'll learn how to interact with ChatGPT in a conversation-like manner and receive context-aware responses, including code snippets. Now, let's proceed with the steps in this recipe:

1. In your browser, go to https://chat.openai.com and click "Log in".

2. Log in using your OpenAI credentials.

3. Once you are logged in, you will be taken to the ChatGPT interface. The interface is similar to a chat application, with a text box at the bottom where you can enter your prompts.

Figure – The ChatGPT interface

4. ChatGPT uses a conversation-based approach, so you can simply type your prompt as a message and press "Enter" or click the send button to receive a response from the model. For example, you can ask ChatGPT to generate a piece of Python code to find the public IP address of a user:

Figure – Entering a prompt

ChatGPT will generate a response containing the requested Python code, along with a thorough explanation.

Figure – ChatGPT response with code

5. Continue the conversation by asking follow-up questions or providing additional information, and ChatGPT will respond accordingly.

Figure – ChatGPT contextual follow-up response

6. Run the ChatGPT-generated code by clicking on "Copy code", pasting it into your code editor of choice (I personally use Visual Studio Code), saving it as a ".py" Python script, and running it from a terminal (a sketch of what such a script might look like follows at the end of this section):

```
PS D:\GPT\ChatGPT for Cybersecurity Cookbook> python .\my_ip.py
Your public IP address is:
Your local network IP address is: 192.168.1.105
```

Figure – Running the ChatGPT generated script

How it works…

By using the ChatGPT interface to enter prompts, you can generate context-aware responses and content that continues over the course of an entire conversation, like a chatbot. The conversation-based approach allows for more natural interactions and the ability to ask follow-up questions or provide additional context. The responses can even include complex formatting, such as code snippets or tables (more on tables later).

There's more…

As you become more familiar with ChatGPT, you can experiment with different prompt styles, instructions, and contexts to obtain the desired output for your cybersecurity tasks. You can also compare the results generated through the ChatGPT interface and the OpenAI Playground to determine which approach best fits your needs.
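For reference, the script ChatGPT generates in step 6 might look something like the minimal sketch below. It assumes the third-party requests library and the public ipify service; ChatGPT's actual output will vary from run to run.

```
import socket
import requests

# Public IP: ask an external service, since it is only visible from outside your network.
public_ip = requests.get("https://api.ipify.org", timeout=5).text
print(f"Your public IP address is: {public_ip}")

# Local IP: open a UDP socket toward a public address (no traffic is actually sent)
# and read back the interface address the OS selected.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("8.8.8.8", 80))
local_ip = s.getsockname()[0]
s.close()
print(f"Your local network IP address is: {local_ip}")
```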
Tip: You can further refine the generated output by providing very clear and specific instructions or using roles. It also helps to divide complex prompts into several smaller prompts, giving ChatGPT one instruction per prompt and building on the previous prompts as you go. In the upcoming recipes, we will delve into more advanced prompting techniques that utilize these approaches to help you get the most accurate and detailed responses from ChatGPT.

As you interact with ChatGPT, your conversation history is automatically saved in the left panel of the ChatGPT interface. This feature allows you to easily access and review your previous prompts and responses. By leveraging the conversation history feature, you can keep track of your interactions with ChatGPT and quickly reference previous responses for your cybersecurity tasks or other projects.

Figure – Conversation history in the ChatGPT interface

To view a saved conversation, simply click on the desired conversation in the left panel. You can also create new conversations by clicking on the "+ New chat" button located at the top of the conversation list. This enables you to separate and organize your prompts and responses based on specific tasks or topics.

Caution: Keep in mind that when you start a new conversation, the model loses the context of the previous conversation. If you want to reference any information from a previous conversation, you will need to include that context in your new prompt.

Conclusion

In conclusion, this article has unveiled the power of ChatGPT and its conversation-driven approach, making complex tasks like retrieving your public IP address a breeze. With step-by-step guidance, you've learned to harness ChatGPT's capabilities and enjoy context-aware responses, all while keeping your account credits intact. As you dive deeper into the world of ChatGPT, you'll discover its versatility in various applications and the potential to optimize your cybersecurity endeavors. By mastering ChatGPT's conversational prowess, you're on the path to seamless, productive interactions and a future filled with AI-driven possibilities.

Author Bio

Clint Bodungen is a cybersecurity professional with 25+ years of experience and the author of Hacking Exposed: Industrial Control Systems. He began his career in the United States Air Force and has since worked with many of the world's largest energy companies and organizations, including notable cybersecurity companies such as Symantec, Kaspersky Lab, and Booz Allen Hamilton. He has published multiple articles, technical papers, and training courses on cybersecurity, and aims to revolutionize cybersecurity education using computer gaming ("gamification") and AI technology. His flagship product, ThreatGEN® Red vs. Blue, is the world's first online multiplayer cybersecurity simulation game, designed to teach real-world cybersecurity.