How-To Tutorials

02 Jun 2023

4 min read

Summarizing Data with OpenAI ChatGPT

This article is an excerpt from the book, Machine Learning with Microsoft Power BI, by Greg Beaumont. This book is designed for data scientists and BI professionals seeking to improve their existing solutions and workloads using AI.

In the ever-expanding landscape of data analysis, the ability to summarize vast amounts of information concisely and accurately is invaluable. Enter ChatGPT, an advanced AI language model developed by OpenAI. In this article, we delve into the realm of data summarization with ChatGPT, exploring how this powerful tool can revolutionize the process of distilling complex datasets into concise and informative summaries.

Numerous databases feature free text fields that comprise entries from a diverse array of sources, including survey results, physician notes, feedback forms, and comments regarding incident reports for the FAA Wildlife Strike database that we have used in this book. These text entry fields represent a wide range of content, from structured data to unstructured data, making it challenging to extract meaning from them without the assistance of sophisticated natural language processing tools.

The Remarks field of the FAA Wildlife Strike database contains text that was presumably entered by people involved in filling out the incident form about an aircraft striking wildlife. A few examples of the remarks for recent entries are shown in Power BI in the following screenshot:

Figure 1 – Examples of Remarks from the FAA Wildlife Strike Database

You will notice that the remarks have a great deal of variability in the format of the content, the length of the content, and the acronyms that were used.
Testing one of the entries by simply adding a statement at the beginning to “Summarize the following:” yields the following result:

Figure 2 – Summarizing the remarks for a single incident using ChatGPT

Summarizing data for a less detailed Remarks field yields the following results:

Figure 3 – Summarization of a sparsely populated results field

In order to obtain uniform summaries from the FAA Wildlife Strike data's Remarks field, one must consider entries that vary in robustness, sparsity, completeness of sentences, and the presence of acronyms and quick notes. The workshop accompanying this technical book is your chance to experiment with various data fields and explore diverse outcomes. Both the book and the Packt GitHub site will utilize a standardized format as input to a GPT model that can incorporate event data and produce a consistent summary for each row. An example of the format is as follows:

Summarize the following in three sentences: A [Operator] [Aircraft] struck a [Species]. Remarks on the FAA report were: [Remarks].

Using data from an FAA Wildlife Strike Database event to test this approach in OpenAI ChatGPT is shown in the following screenshot:

Figure 4 – OpenAI ChatGPT testing a summarization of the remarks field

Next, you test another scenario that had more robust text in the Remarks field:

Figure 5 – Another scenario with robust remarks tested using OpenAI ChatGPT

Summary

This article explores how ChatGPT can revolutionize the process of condensing complex datasets into concise and informative summaries. By leveraging its powerful language generation capabilities, ChatGPT enables researchers, analysts, and decision-makers to quickly extract key insights and make informed decisions. Dive into the world of data summarization with ChatGPT and unlock new possibilities for efficient data analysis and knowledge extraction.
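The standardized prompt format described above can be filled in per row with a few lines of Python. The following is only a minimal sketch, not the code from the book or the Packt GitHub site: the function name `build_summary_prompt`, the "unknown" fallback for blank fields, and the example row are all assumptions made for illustration.

```python
# Hypothetical helper: the placeholders mirror the template in the article.
PROMPT_TEMPLATE = (
    "Summarize the following in three sentences: "
    "A {operator} {aircraft} struck a {species}. "
    "Remarks on the FAA report were: {remarks}."
)


def build_summary_prompt(operator, aircraft, species, remarks):
    """Fill the template for one row; blank fields fall back to 'unknown'."""

    def clean(value):
        # Collapse runs of whitespace so free-text remarks stay on one line.
        text = " ".join(str(value or "").split())
        return text or "unknown"

    return PROMPT_TEMPLATE.format(
        operator=clean(operator),
        aircraft=clean(aircraft),
        species=clean(species),
        # The template already ends with a period, so drop a trailing one.
        remarks=clean(remarks).rstrip("."),
    )


# Made-up example row, not taken from the FAA data.
example_row = {
    "operator": "Example Airlines",
    "aircraft": "B-737-800",
    "species": "Canada goose",
    "remarks": "  Bird strike on climb out.   No damage found. ",
}
print(build_summary_prompt(**example_row))
```

Building the prompt from named columns keeps every row in the same shape, which is the point of the standardized format: sparse and verbose remarks alike reach the model wrapped in identical context.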
Author Bio: Greg Beaumont is a Data Architect at Microsoft; Greg is an expert in solving complex problems and creating value for customers. With a focus on the healthcare industry, Greg works closely with customers to plan enterprise analytics strategies, evaluate new tools and products, conduct training sessions and hackathons, and architect solutions that improve the quality of care and reduce costs. With years of experience in data architecture and a passion for innovation, Greg has a unique ability to identify and solve complex challenges. He is a trusted advisor to his customers and is always seeking new ways to drive progress and help organizations thrive. For more than 15 years, Greg has worked with healthcare customers who strive to improve patient outcomes and find opportunities for efficiencies. He is a veteran of the Microsoft data speaker network and has worked with hundreds of customers on their data management and analytics strategies.

You can follow Greg on LinkedIn


Sagar Lad

02 Jun 2023

5 min read

Data Cleaning Made Easy with ChatGPT

Identifying inconsistencies and inaccuracies in the data is a vital part of the data analysis process. ChatGPT is a natural language processing tool powered by AI that enables users to have human-like conversations and helps them complete tasks quickly. In this article, we'll focus on how ChatGPT can make the process of data cleansing and cleaning more efficient.

Data Cleansing/Cleaning with ChatGPT

Given the volume, velocity, and variety of data we deal with nowadays, manually carrying out the data cleansing task is a very time-consuming process. Removing duplicate data and checking validity, uniqueness, consistency, and correctness are all steps taken to increase the quality of the data. Cleansed data provides better business insights and helps business users make sound decisions. Data cleansing activities go through a series of steps, starting with gathering the data and ending with integrating, producing, and normalizing the data, as shown in the image below:

Image 1: Data cleansing cycle

Most corporate organizations carry out the following tasks as part of the data cleansing step of exploratory data analysis:

Identifying and cleaning up duplicate values
Filling null values with a default value
Rectifying and correcting inconsistent data
Standardising date formats
Standardising names or addresses
Area codes out of phone numbers
Flattening nested data structures
Erasing incomplete data
Detecting conflicts in the database

ChatGPT lets us handle time-consuming and extremely boring tasks like data cleaning with ease. To make this concrete, let's use an example of employee details from the banking industry, with the columns Employee ID, Employee Name, Department Name, and Joining Date. While reviewing the data, we discovered a number of data quality concerns that must be resolved before we can truly use this data for analytics.
Example: Employee Name is inconsistent - some instances use lowercase while others use uppercase letters. The date format is not uniform for the Joining Date column.

Traditional Way of Working

To clean up this data in Excel, we must manually construct the formulas and apply functions like TRIM, UPPER, or LOWER before using it for analytics. This calls for development work and for maintaining Excel logic without version control, history, and so on. Sounds extremely tedious, doesn't it?

Working with ChatGPT

We can use ChatGPT to automate the data cleaning operation described above by having it generate some Python code. In this example, we'll use Python code generated by ChatGPT to demonstrate how to standardize the employee's name and the date format of the joining date.

ChatGPT prompt:

Here is the prompt that we can provide in text format, in case you plan to copy and paste:

Employee ID | Employee Name | Department Name | Joining Date
214 john Root HR 1-06-2003
435 STEVE Smith Retail 21-Feb-05
654 Sachin WALA OPSI 25-July-1999
Above is the employee data source which should be cleaned. Employee names are not consistent, and the joining date is not in a uniform date format. Generate a Python code to create accurate data.

Image 2: Input to the ChatGPT

We pass a dataset and a description of how and for which columns we want to clean the data, as seen in the image above.

Output from ChatGPT

ChatGPT automatically creates Python code with a variety of generic functions to clean the specified columns in accordance with our specifications. The Python code output by ChatGPT is shown below.

Image 3: Output Python code from ChatGPT

Alongside the generated Python code, ChatGPT also displays a sample result of running it on the stated data. It is clear that employee names are now uniform, and the joining date is likewise shown using a common date format.
Image 4: Sample output from ChatGPT

This Python code can be reused and adapted whenever we need to clean other data sources in the future, not just the employee dataset. Therefore, using ChatGPT's capabilities, we can develop an automated data cleaning process that is precise and effective.

There are also tools on the market like RATH, which has an integration with ChatGPT, to simplify the data analysis workflow. If you are struggling with a large volume of data and spend a lot of time on data cleaning, such tools can increase your productivity without a lot of manual work.

Conclusion

This article gave you a fundamental grasp of the data cleaning/cleansing procedure, which will enable you to use the data to make more trustworthy decisions. ChatGPT offers an effective way to clean your data simply, whatever the data volume.

Author Bio: Sagar Lad is a Cloud Data Solution Architect with a leading organisation and has deep expertise in designing and building Enterprise-grade Intelligent Azure Data and Analytics Solutions. He is a published author, content writer, Microsoft Certified Trainer, and C# Corner MVP.

You can follow Sagar on Medium, Amazon, LinkedIn
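The code that ChatGPT generated for this article survives only as a screenshot, so the following is a hand-written sketch of the same kind of cleaning, not a reproduction of ChatGPT's output. It uses only the Python standard library. The function names, the list of accepted date formats, the day-first reading of "1-06-2003", and the ISO output format are all assumptions; month names are parsed according to the current locale, assumed here to be English.

```python
from datetime import datetime

# Assumed input formats, tried in order; all are read as day-first.
DATE_FORMATS = ("%d-%m-%Y", "%d-%b-%y", "%d-%B-%Y")


def standardize_name(name):
    """Capitalize each word: 'STEVE Smith' becomes 'Steve Smith'."""
    return " ".join(part.capitalize() for part in name.split())


def standardize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    return None  # leave unparseable dates for manual review


# The three rows from the prompt in the article.
employees = [
    (214, "john Root", "HR", "1-06-2003"),
    (435, "STEVE Smith", "Retail", "21-Feb-05"),
    (654, "Sachin WALA", "OPSI", "25-July-1999"),
]

cleaned = [
    (emp_id, standardize_name(name), dept, standardize_date(joined))
    for emp_id, name, dept, joined in employees
]
for row in cleaned:
    print(row)
```

Returning None for a date that matches none of the listed formats, rather than guessing, keeps bad values visible for review. Any code produced by ChatGPT deserves the same check against a few known rows before it is trusted on a full dataset.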


Dr. Alex Antic

02 Jun 2023

7 min read

Responding to Generative AI from an Ethical Standpoint

This article is an excerpt from the book Creators of Intelligence, by Dr. Alex Antic. This book will provide you with insights from 18 AI leaders on how to build a rewarding data science career.

As Generative Artificial Intelligence (AI) continues to advance, the need for ethical considerations becomes increasingly vital. In this article, we engage in a conversation between a Generative AI expert, Edward Santow, and an author to uncover practical ways to incorporate ethics into the rapidly evolving landscape of generative AI, ensuring its responsible and beneficial implementation.

Importance of Ethics in Generative AI

Generative AI is a rapidly developing field with the potential to revolutionize many aspects of our lives. However, it also raises a number of ethical concerns. Some of the most pressing ethical issues in generative AI include:

Bias: Generative AI models are trained on large datasets, which can introduce bias into the models. This bias can then be reflected in the outputs of the models, such as the images, text, or music that they generate.
Transparency: Generative AI models are often complex and difficult to understand. This can make it difficult to assess how the models work and to identify any potential biases.
Accountability: If a generative AI model is used to generate harmful content, such as deepfakes or hate speech, it is important to be able to hold the developers of the model accountable.
Privacy: Generative AI models can be used to generate content that is based on personal data. This raises concerns about the privacy of individuals whose data is used to train the models.
Fairness: Generative AI models should be used in a way that is fair and does not discriminate against any particular group of people.

It is important to address these ethical concerns in order to ensure that generative AI is used in a responsible and ethical manner.
Some of the steps that can be taken to address these concerns include:

Using unbiased data: When training generative AI models, it is important to use data that is as unbiased as possible. This can help to reduce the risk of bias in the models.
Making models transparent: It is important to make generative AI models as transparent as possible. This can help to identify any potential biases and to make it easier to understand how the models work.
Holding developers accountable: If a generative AI model is used to generate harmful content, it is important to be able to hold the developers of the model accountable. This can be done by developing clear guidelines and regulations for the development and use of generative AI.
Protecting privacy: It is important to protect the privacy of individuals whose data is used to train generative AI models. This can be done by using anonymized data or by obtaining consent from individuals before using their data.
Ensuring fairness: Generative AI models should be used in a way that is fair and does not discriminate against any group of people. This can be done by developing ethical guidelines for the use of generative AI.

By addressing these ethical concerns, we can help to ensure that generative AI is used in a responsible and ethical manner.

Ed Santow’s Opinion on Implementing Ethics

Given the popularity and advances in generative AI tools, such as ChatGPT, I’d like to get your thoughts on how generative AI has impacted ethics frameworks. What complications has it added?

Ed Santow: In one sense, it hasn’t, as the frameworks are broad enough and apply to AI generally, and their application depends on adapting to the specific context in which they’re being applied. One of the great advantages of this is that generative AI is included within its scope. It may be a newer form of AI, as compared with analytical AI, but existing AI ethics frameworks already cover a range of privacy and human rights issues, so they are applicable.
The previous work to create those frameworks has made it easier and faster to adapt to the specific aspects of generative AI from an ethical perspective. One of the main complexities is the relatively low community understanding of how generative AI actually works and, particularly, the science behind it. Very few people can distinguish between analytical and generative AI. Most people in senior roles haven’t made the distinction yet or identified the true impact. The issue is, if you don’t understand the underlying technology well enough, then it’s difficult to make the frameworks work in practice. Analytical and generative AI share similar core science. However, generative AI can pose greater risks than simple classification AI. But the nature and scale of those risks generally haven’t been worked through in most organizations. Simply setting black-and-white rules – such as you can or can’t use generative AI – isn’t usually the best answer. You need to understand how to safely use it.

How will organizations need to adapt their ethical frameworks in response to generative AI?

Ed Santow: First and foremost, they need to understand that skills and knowledge are vital. They need to upskill their staff and develop a better understanding of the technology and its implications – and this applies at all levels of the organization. Second, they need to set a nuanced policy framework, outline how to use such technology safely and develop appropriate risk mitigation procedures that can flag when it’s not safe to rely on the outputs of generative AI applications. Most AI ethics frameworks don’t go into this level of detail. Finally, consideration needs to be given to how generative AI can be used lawfully. For example, entering confidential client data – or proprietary company data – into ChatGPT is likely to be unlawful, yet we also know this is happening.

What advice can you offer CDOs and senior leaders in relation to navigating some of these challenges?
Edward Santow: There are simply no shortcuts. People can’t assume that even though others in their industry are using generative AI, their organization can use it without considering the legal and ethical ramifications. They also need to be able to experiment safely with such technology. For example, a new chatbot based on generative AI shouldn’t be simply unleashed on customers. They need to first test and validate it in a controlled environment to understand all the risks – including the ethical and legal ramifications. Leaders need to ensure that an appropriately safe test environment is established to mitigate any risk of harm to staff or customers.

Summary

In this article, we went through various ethical issues that can arise while implementing Generative AI and some ways to tackle these challenges effectively. We also learned some practical best practices from an expert in the field of Generative AI.

Author Bio: Dr. Alex Antic is an award-winning Data Science and Analytics Leader, Consultant, and Advisor, and a highly sought Speaker and Trainer, with over 20 years of experience. Alex is the CDO and co-founder of Healices Health - which focuses on advancing cancer care using Data Science - and is co-founder of Two Twigs - a Data Science consulting, advisory, and training company. Alex has been described as "one of Australia’s iconic data leaders" and "one of the most premium thought leaders in data analytics globally". He was recognized in 2021 as one of the Top 5 Analytics Leaders by the Institute of Analytics Professionals of Australia (IAPA). Alex is an Adjunct Professor at RMIT University, and his qualifications include a Ph.D. in Applied Mathematics.

LinkedIn


Dario Radečić

02 Jun 2023

7 min read

Introduction to LLaMA

It seems like everyone, and their grandmothers, are discussing Large Language Models (LLMs) these days. These models have been getting all the hype since ChatGPT's release in late 2022. The average user might get lost in acronyms such as GPT, PaLM, or LLaMA, and that’s understandable. This article will shed some light on why you should generally care about LLMs and exactly what they bring to the table. By the end of this article, you’ll have a fundamental understanding of the LLaMA model, how it compares to other large language models, and will have the 7B flavor of LLaMA running locally on your machine. There’s no time to waste, so let’s dive straight in!

The Purpose of LLaMA and Other Large Language Models

The main idea behind LLMs is to understand and generate human-like text based on the input you feed into them. Ask a human-like question and you’ll get a human-like response back. You know what we’re talking about if you’ve ever tried ChatGPT. These models are typically trained on huge volumes of data, sometimes even as large as everything that has been written on the Internet over some time span. This data is fed into algorithms that use unsupervised learning to learn words and the relationships between them. Large Language Models can be generic or domain-specific. You can use a generic LLM and fine-tune it for a certain task, similar to what OpenAI did with Codex (LLM for programming).

As the end-user, you can benefit from LLMs in several ways:

Content generation – You can use LLMs to generate content for personal or professional purposes, such as articles, emails, social media posts, and so on.
Information retrieval – LLMs help you find relevant information quickly and often do a better job when compared to a traditional web search.
Just be aware of the training date cap the model has – it might not do as well on recent events.
Language assistance and translation – These models can detect spelling errors and grammar mistakes, suggest writing improvements, provide synonyms and idioms, and even provide a meaningful translation from one language to another.

At the end of the day, probably everyone can find a helpful use case in a large language model.

But which one should you choose? There are many publicly available models, but the one that stands out recently is LLaMA. Let’s see why and how it works next.

What is LLaMA and How it Works?

LLaMA stands for “Large Language Model Meta AI” and is a large language model published by – you’ve guessed it – Meta AI. It was released in February 2023 in a variety of flavors – from 7 billion to 65 billion parameters.

A LLaMA model uses the Transformer architecture and works by generating probability distributions over sequences of words (or tokens). In plain English, this means the LLaMA model predicts the next most reasonable word given the sequence of input words.

It’s interesting to point out that LLaMA-13B (13 billion parameters) outperforms GPT-3 on most benchmarks, even though GPT-3 has more than 13 times as many parameters (175 billion). The more parameter-rich LLaMA (65B parameters) is on par with the best large language models we have available today, according to the official paper by Meta AI.

In fact, let’s take a look at these performance differences ourselves. The following table from the official paper summarizes it well:

Figure 1 - LLaMA performance comparison with other LLMs

Generally speaking, the more parameters the LLaMA model contains, the better it performs. Interestingly, even the 7B version is comparable to, or even outperforms, some models with significantly more parameters. The 7B model performs reasonably well, so how can you try it out?
In the next section, you’ll have LLaMA running locally with only two shell commands.

How to Run LLaMA Locally?

You’ll need a couple of things to run LLaMA locally – decent hardware (doesn’t have to be the newest), a lot of hard drive space, and a couple of software dependencies installed. It doesn’t matter which operating system you’re using, as the implementation we’re about to show you is cross-platform.

For reference, we ran the 7B parameter model on an M1 Pro MacBook with 16 GB of RAM. The model occupied 31 GB of storage, and you can expect this amount to grow if you choose a LLaMA flavor with more parameters.

Regarding software dependencies, you’ll need a recent version of Node. We used version 18.16.0 with npm version 9.5.1.

Once you have Node installed, open up a new Terminal/CMD window and run the following command. It will install the 7B LLaMA model:

npx dalai llama install 7B

You might get a prompt to install dalai first, so just type y into the console. Once Dalai is installed, it will proceed to download the model weights. You should see something similar during this process:

Figure 2 - Downloading LLaMA 7B model weights

It will take some time, depending on your Internet speed. Once done, you’ll have the 7B model available in the Dalai web UI. Launch it with the following shell command:

npx dalai serve

This is the output you should see:

Figure 3 - Running dalai web UI locally

The web UI is now running locally on port 3000. As soon as you open http://localhost:3000, you’ll be presented with the interface that allows you to choose the model, tweak the parameters, and select a prompting template.

For reference, we’ve selected the chatbot template and left every setting as default. The prompt we’ve entered is “What is machine learning?” Here’s what the LLaMA model with 7B parameters output:

Figure 4 - Dalai user interface

The answer is mostly correct, but the LLaMA response started looking like a blog post toward the end (“In this article…”).
As with all large language models, you can use it to draw insights, but only after some human intervention.

And that’s how you can run a large language model locally! Let’s make a brief recap next.

Conclusion

It’s getting easier and cheaper to train large language models, which means the number of options you’ll have is only going to grow over time.

LLaMA was only recently released to the public, and today you’ve learned what it is, got a high-level overview of how it works, and seen how to get it running locally. You might want to tweak the settings of the 7B version if you’re not getting the desired response or opt for a version with more parameters (if your hardware allows it). Either way, have fun!

Author Bio: Dario Radečić is a Senior Data Scientist at Neos, Croatia. Book author: "Machine Learning Automation with TPOT". Owner of betterdatascience.com. You can follow him on Medium: https://medium.com/@radecicdario
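To make the next-word prediction idea from the "What is LLaMA and How it Works?" section more concrete, here is a toy sketch of how raw scores become a probability distribution over tokens. It is not LLaMA code: the four-word vocabulary and the scores are made up, and always picking the most probable token (greedy decoding) is just one simple strategy, assumed here for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Toy vocabulary and made-up scores for the context "What is machine ...".
vocab = ["learning", "banana", "translation", "the"]
logits = [3.2, -1.0, 1.1, 0.3]

probs = softmax(logits)

# Greedy decoding: choose the single most probable next token.
next_token = vocab[probs.index(max(probs))]

for token, prob in zip(vocab, probs):
    print(f"{token:12s} {prob:.3f}")
print("next token:", next_token)
```

A real model repeats this step token by token over a vocabulary of tens of thousands of entries, and usually samples from the distribution rather than always taking the top choice, which is what settings such as temperature control in interfaces like the one shown earlier.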


Valentina Alto

02 Jun 2023

2 min read

ChatGPT for Information Retrieval and Competitive Intelligence

This article is an excerpt from the book Modern Generative AI with ChatGPT and OpenAI Models, by Valentina Alto. This book will provide you with insights into the inner workings of the LLMs and guide you through creating your own language models.

Information retrieval and competitive intelligence are fields where ChatGPT is a game-changer. It can retrieve information from its knowledge base and reframe it in an original way.

One example is using ChatGPT as a search engine to provide summaries, reviews, and recommendations for books:

Alternatively, we could ask for some suggestions for a new book we wish to read based on our preferences:

If we design the prompt with specific information, ChatGPT can serve as a tool for pointing us towards the right references for research or studies. For example, asking ChatGPT to list relevant references for feedforward neural networks:

ChatGPT can also be useful for competitive intelligence. For example, generating a list of existing books with similar content:

Or providing advice on how to be competitive in the market:

ChatGPT can also suggest improvements regarding book content to make it stand out:

Overall, ChatGPT can be a valuable assistant for information retrieval and competitive intelligence. However, it's important to remember that the knowledge base cutoff is 2021, so real-time information may not be available.

About the Author

Valentina Alto graduated in 2021 in Data Science. Since 2020 she has been working at Microsoft as an Azure Solution Specialist and, since 2022, she has focused on Data & AI workloads within the Manufacturing and Pharmaceutical industry. She has been working on customers’ projects closely with system integrators to deploy cloud architecture with a focus on data lakehouse and DWH, data integration and engineering, IoT and real-time analytics, Azure Machine Learning, Azure cognitive services (including Azure OpenAI Service), and Power BI for dashboarding. She holds a BSc in Finance and an MSc degree in Data Science from Bocconi University, Milan, Italy. Since her academic journey she has been writing tech articles about Statistics, Machine Learning, Deep Learning and AI in various publications. She has also written a book about the fundamentals of Machine Learning with Python.

You can connect with Valentina on: LinkedIn, Medium


Valentina Alto

02 Jun 2023

5 min read

Customize ChatGPT for Specific Tasks Using Effective Prompts – Shot Learning

This article is an excerpt from the book Modern Generative AI with ChatGPT and OpenAI Models, by Valentina Alto. This book will provide you with insights into the inner workings of the LLMs and guide you through creating your own language models.

We know for a fact that OpenAI models, and hence also ChatGPT, come in a pre-trained format. They have been trained on a huge amount of data and have had their (billions of) parameters configured accordingly. However, this doesn’t mean that those models can’t learn anymore. One way to customize an OpenAI model and make it more capable of addressing specific tasks is by fine-tuning.

Fine-tuning is a proper training process that requires a training dataset, compute power, and some training time (depending on the amount of data and compute instances). That is why it is worth testing another method for our model to become more skilled in specific tasks: shot learning.

The idea is to let the model learn from simple examples rather than the entire dataset. Those examples are samples of the way we would like the model to respond so that the model not only learns the content but also the format, style, and taxonomy to use in its response. Furthermore, shot learning occurs directly via the prompt (as we will see in the following scenarios), so the whole experience is less time-consuming and easier to perform.

The number of examples provided determines the level of shot learning we are referring to. In other words, we refer to zero-shot if no example is provided, one-shot if one example is provided, and few-shot if more than two or three examples are provided.

Let’s focus on each of those scenarios:

Zero-shot learning

In this type of learning, the model is asked to perform a task for which it has not seen any training examples. The model must rely on prior knowledge or general information about the task to complete the task.
For example, a zero-shot learning approach could be that of asking the model to generate a description, as defined in my prompt:

One-shot learning

In this type of learning, the model is given a single example of each new task it is asked to perform. The model must use its prior knowledge to generalize from this single example to perform the task. If we consider the preceding example, I could provide my model with a prompt-completion example before asking it to generate a new one:

Note that the way I provided an example was similar to the structure used for fine-tuning:

Few-shot learning

In this type of learning, the model is given a small number of examples (typically between 3 and 5) of each new task it is asked to perform. The model must use its prior knowledge to generalize from these examples to perform the task. Let’s continue with our example and provide the model with further examples:

The nice thing about few-shot learning is that you can also control model output in terms of how it is presented. You can also provide your model with a template of the way you would like your output to look. For example, consider the following tweet classifier:

Let’s examine the preceding figure. First, I provided ChatGPT with some examples of labeled tweets. Then, I provided the same tweets but in a different data format (list format), as well as the labels in the same format. Finally, in list format, I provided unlabeled tweets so that the model returns a list of labels.

Understanding Prompt Design

The output format is not the only thing you can teach your model, though.
You can also teach it to act and speak with a particular jargon and taxonomy, which could help you obtain the desired result with the desired wording:

Or, imagine you want to generate a chatbot called Simpy that is very funny and sarcastic while responding:

We have to say, with this last one, ChatGPT nailed it.

Summary

Shot-learning possibilities are limitless (and often more useful than Simpy) – it’s only a matter of testing and a little bit of patience in finding the proper prompt design.

As mentioned previously, it is important to remember that these forms of learning are different from traditional supervised learning, as well as fine-tuning. In few-shot learning, the goal is to enable the model to learn from very few examples, and to generalize from those examples to new tasks.

About the Author

Valentina Alto graduated in 2021 in Data Science. Since 2020 she has been working at Microsoft as an Azure Solution Specialist and, since 2022, she has focused on Data & AI workloads within the Manufacturing and Pharmaceutical industry. She has been working on customers’ projects closely with system integrators to deploy cloud architecture with a focus on data lakehouse and DWH, data integration and engineering, IoT and real-time analytics, Azure Machine Learning, Azure cognitive services (including Azure OpenAI Service), and Power BI for dashboarding. She holds a BSc in Finance and an MSc degree in Data Science from Bocconi University, Milan, Italy. Since her academic journey she has been writing tech articles about Statistics, Machine Learning, Deep Learning and AI in various publications. She has also written a book about the fundamentals of Machine Learning with Python.

You can connect with Valentina on: LinkedIn, Medium
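As a rough illustration of the zero-, one-, and few-shot patterns described in this article, the sketch below assembles a classification prompt from a list of labeled examples. It is not taken from the book: the function name `build_shot_prompt`, the "Tweet:" / "Sentiment:" layout, and the example tweets are all made up for illustration, and the resulting string would still need to be sent to a model.

```python
def build_shot_prompt(task, examples, query):
    """Build a prompt; the number of examples sets the 'shot' level."""
    lines = [task, ""]
    for text, label in examples:
        # Each example shows the model both the content and the format.
        lines.append(f"Tweet: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The unlabeled item ends with an open label for the model to fill in.
    lines.append(f"Tweet: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


task = "Classify the sentiment of each tweet as Positive, Negative, or Neutral."

# Made-up labeled examples.
examples = [
    ("I love the new update!", "Positive"),
    ("The app keeps crashing on startup.", "Negative"),
    ("Version 2.1 was released today.", "Neutral"),
]
query = "Support answered my ticket within minutes."

zero_shot = build_shot_prompt(task, [], query)
one_shot = build_shot_prompt(task, examples[:1], query)
few_shot = build_shot_prompt(task, examples, query)
print(few_shot)
```

Because every example follows the same two-line layout and the prompt ends on an open "Sentiment:" line, the model is nudged to answer with just a label in the same format, which is the output-control effect the tweet classifier above relies on.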

Andrei Gheorghiu

02 Jun 2023

9 min read

4 Ways to Treat a Hallucinating AI with Prompt Engineering

Hey there, fellow AI enthusiast! Are you tired of your LLM (Large Language Model) creating random, nonsensical outputs? Fear not, because today I'm opening the box of prompt engineering pills, looking for something to help you reduce those pesky hallucinations.

First, let's break down what we're dealing with. Prompt engineering is the art of creating input prompts for AI models in a way that guides them towards generating more accurate, relevant, and useful responses. Think of it as gently nudging your AI model in the right direction, so it doesn't end up lost in a sea of information. Many people feel that "engineering" was not the wisest choice of word, but that's history now, as everybody has gotten used to it. In my opinion, it's more of a mix of logical thinking, creativity, language, and problem-solving skills. It feels a lot like writing code, but using natural language instead of structured syntax and vocabulary. While users get the freedom of using their own language and depth, with great freedom comes great responsibility. An average prompt will probably result in an average answer. The issue I'm addressing in this article is just one of the many pitfalls that can be avoided with some basic prompt hygiene when interacting with AI.

Now, onto the bizarre world of hallucinations. In the AI realm, hallucinations refer to instances when an AI model (particularly an LLM) generates output that is unrelated, implausible, or just plain weird. Some of you may have been there already: asking an AI model like GPT-3 to write a paragraph about cats, only to get a response about aliens invading Earth! And while the issue has been greatly mitigated in GPT-4 and similar newer AI models, it's still something to be concerned about, especially if you're looking for precise, fact-based responses. To make matters worse, sometimes the hallucinated answer sounds very convincing and seems plausible in the given context.

For example, when asked the name of the Voodoo Lady in the Monkey Island series of games, ChatGPT provides a series of convincing answers, all of which are wrong. It's a bit of a trick question, as she is simply known as the Voodoo Lady in the original series of games, but you can see how convinced ChatGPT is of the answers that it provides (and continued to provide). If I hadn't already known the answer, I never would have known that ChatGPT was hallucinating.

What Are the Technical Reasons Why AI Models Hallucinate?

Training Data: Machine learning models are trained on vast amounts of text data from diverse sources. This data may contain inconsistencies, noise, and biases. As a result, when generating text, the model might output content that is influenced by these inconsistencies or noise, leading to hallucinations.

Probabilistic Nature: Generative models like GPTs are based on probabilistic techniques that predict the next token (e.g., a word or character) in a sequence, given the context. They estimate the likelihood of each token appearing and sample tokens based on these probabilities. If you've ever watched "Family Feud" on TV, you have a pretty good idea of what token prediction means. This sampling process can sometimes result in unpredictable and implausible outputs, as the model might choose less likely tokens, generating hallucinations. To make matters worse, GPTs are usually not built to say "I don't know" when they lack information. Instead, they produce the most likely answer.

Lack of Ground Truth: Unlike supervised learning tasks, where there is a clear ground truth for the model to learn from, generative tasks do not have a single correct output. Most LLMs that we use cannot check the facts in their output against a real-time validated source, as they do not have Internet access. The absence of a ground truth can make it difficult for the model to learn constraints and discern what is plausible or correct, leading to the generation of hallucinated content.

Optimization Challenges: During training, the models are optimized using a loss function that measures the discrepancy between the generated output and the expected outcome. In generative tasks, this loss function may not always capture the nuances of human language, making it difficult for the model to learn the correct patterns and avoid hallucinations.

Model Complexity: State-of-the-art generative models like GPT-3 have billions of parameters that make them highly expressive and capable of capturing complex patterns in the data. However, this complexity can also result in overfitting and memorization of irrelevant or spurious patterns, causing hallucinations in generated outputs.

So, clearly, we have a problem to solve. Here are four tips for how to improve your prompts and get better responses from ChatGPT.

Four Tips for Improving Your Prompts

Each tip below is framed as a common mistake to avoid.

Not being clear and specific in your prompts

To get the best results, you must clearly understand the problem yourself first. Make sure you know what you want to achieve and keep your prompts focused on that objective. The more explicit your prompt, the better the AI model can understand what you're looking for. So instead of asking, "Tell me about the Internet," try something like, "Explain how the Internet works and its importance in modern society." By doing this, you're giving your AI model a clearer picture of what you want. Sometimes you'll have to work through multiple prompt iterations to get the result you're after. Sometimes the results you get may stray from the initial topic. Make sure to stay on track and bring the conversation back into focus; otherwise, the hallucination effect may amplify.

Ignoring the power of an example

Everyone loves examples, they say, even AI models! Providing examples in your prompt helps your model understand the context and generate more accurate responses. For instance: "Write a brief history of Python, similar to how the history of Java is described in this article {example}". This not only gives the AI a clear topic but also a reference point to follow. Providing a well-structured example can also save you a lot of time in explaining the output you're expecting to receive. Without an example, your prompt might be too generic, allowing too much freedom in interpretation. Think about it like a conversation. Sometimes, the best way to make yourself understood by the other party is to provide an example. Do you want to make sure there's no misunderstanding from the start? Include an example in your initial prompt.

Not following "Divide et Impera"

Have you ever tried to build IKEA furniture without instructions? It's a bit like that for AI models dealing with complex prompts. Too many nuts and bolts to keep track of. Too many variables to consider. "Divide et impera" (divide and conquer) is the remedy: instead of asking the model to "Explain the process of creating a neural network," break it down into smaller, more manageable tasks like, "Step 1: Define the problem, Step 2: Collect and prepare data," and so on. This way, the AI can tackle each step individually and generate more coherent outputs. It's also very useful when you are trying to generate a more verbose and comprehensive response and not just a simple factual answer. You can, of course, combine both approaches by asking the AI to provide the steps first, and then asking for more information on each step.

Relying on the first response you receive

As most LLMs in use today do not provide enough transparency in their reasoning process, working with them sometimes feels like interacting with a magic box. The non-deterministic nature of generative AI can further amplify this problem, so when you need precision, it's best to experiment with various prompt formats and compare the results.
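One way to make that comparison systematic is to send several phrasings of the same question through one function and group the prompts by the answer they produced. The sketch below is only an illustration: `askModel` is a hypothetical parameter standing in for whatever client you use, `fakeModel` is a made-up stub so the code runs offline, and the exact-match grouping in `normalize` is an assumption that only suits short factual answers.

```javascript
// Normalize an answer so trivial differences (case, punctuation) don't count.
function normalize(answer) {
  return answer
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Ask every phrasing, then group the prompts by the normalized answer received.
// `askModel` is a hypothetical async function: (prompt) => answer string.
async function compareAnswers(askModel, prompts) {
  const answers = await Promise.all(prompts.map((p) => askModel(p)));
  const groups = new Map();
  answers.forEach((answer, i) => {
    const key = normalize(answer);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(prompts[i]);
  });
  // More than one group means the phrasings disagree: a cue to fact-check.
  return { answers, groups, consistent: groups.size === 1 };
}

// Made-up stub used only for this demo: it answers one phrasing differently.
async function fakeModel(prompt) {
  return prompt.startsWith("Explain") ? "Answer B." : "answer a";
}

const demo = compareAnswers(fakeModel, [
  "What is X and how does it work?",
  "Explain X and its benefits.",
  "What is X, and how does it work, briefly?",
]);
demo.then((r) => console.log(r.consistent, [...r.groups.keys()]));
// prints: false [ 'answer a', 'answer b' ]
```

Disagreement between phrasings does not tell you which answer is right; it only flags that at least one of them deserves checking against an independent source.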
Pro tip: some open-source models can already be queried in parallel through a dedicated website. Or, when interacting with a single AI model, try multiple approaches for your query, like rephrasing the prompt, asking a question, or presenting it as a statement.

For example, if you're looking for information about cloud computing, you could try:

"What is cloud computing and how does it work?"
"Explain cloud computing and its benefits."
"Cloud computing has transformed the IT industry; discuss its impact and future potential."

Some LLMs, such as Google's Bard, provide multiple responses by default so you can pick the most suitable among them. Compare the outputs. Validate any important facts with other independent sources. Look for implausible or weird responses. Although a hallucination is still possible, using different prompts greatly reduces the probability of generating the same hallucination every time, which makes it easier to detect.

Returning to our earlier Voodoo Lady example, by rephrasing the question we can get the right answer from ChatGPT.

And there you have it! By trying to avoid these common mistakes, you'll be well on your way to minimizing AI hallucinations and getting the output you're looking for. We all know how fast and unpredictable this domain can be, so the best approach is to learn together and share best practices among the community. The best prompt engineering books have not yet been written, and there's a ton of new things to learn about this emergent technology, so let's stay in touch and share our findings!

Happy prompting!

About the Author

Andrei Gheorghiu is an experienced trainer with a passion for helping learners achieve their maximum potential. He always strives to bring a high level of expertise and empathy to his teaching. With a background in IT audit, information security, and IT service management, Andrei has delivered training to over 10,000 students across different industries and countries. He is also a Certified Information Systems Security Professional and Certified Information Systems Auditor, with a keen interest in digital domains like Security Management and Artificial Intelligence. In his free time, Andrei enjoys trail running, photography, video editing, and exploring the latest developments in technology. You can connect with Andrei on LinkedIn and Twitter.

Expert Network

16 Aug 2021

9 min read

Learning Essential Linux Commands for Navigating the Shell Effectively

Expert Network

08 Jul 2021

3 min read

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Software architecture is one of the most discussed topics in the software industry today, and its importance will certainly grow in the future. But the speed at which new features are added to software solutions keeps increasing, and new architectural opportunities keep emerging. To help you keep up, Packt brings you the second edition of Software Architecture with C# 9 and .NET 5 by Gabriel Baptista and Francesco Abbruzzese, a fully revised and expanded guide featuring the latest features of .NET 5 and C# 9.

This book covers the most common design patterns and frameworks involved in modern cloud-based and distributed software architectures. It discusses when and how to use each pattern by providing you with practical real-world scenarios. It also presents techniques and processes such as DevOps, microservices, Kubernetes, continuous integration, and cloud computing, so that you can develop and deliver a best-in-class software solution for your customers.

This book will help you to understand the product that your customer wants from you. It will guide you in solving the biggest problems you can face during development. It also covers the do's and don'ts that you need to follow when you manage your application in a cloud-based environment. You will learn about different architectural approaches, such as layered architectures, service-oriented architecture, microservices, Single Page Applications, and cloud architecture, and understand how to apply them to specific business requirements. Finally, you will deploy code in remote environments or on the cloud using Azure. All the concepts in this book are explained with the help of real-world practical use cases where design principles make the difference when creating safe and robust applications. By the end of the book, you will be able to develop and deliver highly scalable and secure enterprise-ready applications that meet the end customers' business needs.

It is worth mentioning that Software Architecture with C# 9 and .NET 5, Second Edition not only covers the best practices that a software architect should follow for developing C# and .NET Core solutions, but also discusses all the environments that we need to master in order to develop a software product according to the latest trends. This second edition has improved code and is adapted to the new opportunities offered by C# 9 and .NET 5. We added new frameworks and technologies such as gRPC and Blazor, and described Kubernetes in more detail in a dedicated chapter.

To get the most out of this book, treat it as a guide that you may want to revisit many times in different circumstances. Do not forget to have Visual Studio Community 2019 or higher installed, and be sure that you understand C# and .NET principles.

Expert Network

30 Jun 2021

7 min read

Understanding the Foundation of Protocol-oriented Design

When Apple announced Swift 2 at the Worldwide Developers Conference (WWDC) in 2015, they also declared that Swift was the world's first protocol-oriented programming (POP) language. From its name, we might assume that POP is all about protocols; however, that would be a wrong assumption. POP is about so much more than just protocols; it is actually a new way of not only writing applications but also thinking about programming.

This article is an excerpt from the book Mastering Swift 5.3, 6th Edition by Jon Hoffman. In this article, we will discuss protocol-oriented design and how we can use protocols and protocol extensions to replace superclasses. We will look at how to define animal types for a video game in a protocol-oriented way.

Requirements

When we develop applications, we usually have a set of requirements that we need to develop against. With that in mind, let's define the requirements for the animal types that we will be creating in this article:

- We will have three categories of animals: land, sea, and air.
- Animals may be members of multiple categories. For example, an alligator can be a member of both the land and sea categories.
- Animals may attack and/or move when they are on a tile that matches the categories they are in.
- Animals will start off with a certain number of hit points, and if those hit points reach 0 or less, then they will be considered dead.

POP Design

We will start off by looking at how we would design the animal types needed and the relationships between them. Figure 1 shows our protocol-oriented design:

Figure 1: Protocol-oriented design

In this design, we use three techniques: protocol inheritance, protocol composition, and protocol extensions.

Protocol inheritance

Protocol inheritance is where one protocol inherits the requirements of one or more other protocols. A protocol can inherit requirements from multiple protocols, whereas a class in Swift can have only one superclass. Protocol inheritance is extremely powerful because we can define several smaller protocols and mix and match them to create larger protocols. You will want to be careful not to create protocols that are too granular, because they will become hard to maintain and manage.

Protocol composition

Protocol composition allows types to conform to more than one protocol. With protocol-oriented design, we are encouraged to create multiple smaller protocols with very specific requirements. We will see protocol composition at work when we define the Alligator type later in this article.

Protocol inheritance and composition can cause problems if used wrongly, and they may not seem that powerful on their own; however, when we combine them with protocol extensions, we have a very powerful programming paradigm. Let's look at how powerful this paradigm is.

Protocol-oriented design: putting it all together

We will begin by writing the Animal superclass as a protocol:

    protocol Animal {
        var hitPoints: Int { get set }
    }

In the Animal protocol, the only item that we are defining is the hitPoints property. If we were putting in all the requirements for an animal in a video game, this protocol would contain every requirement that is common to all animals. For our example, we only need the hitPoints property.

Next, we need to add an Animal protocol extension, which will contain the functionality that is common to all types that conform to the protocol. Our Animal protocol extension would contain the following code:

    extension Animal {
        // Reduce the animal's hit points by the damage taken.
        mutating func takeHit(amount: Int) {
            hitPoints -= amount
        }
        func hitPointsRemaining() -> Int {
            return hitPoints
        }
        // An animal is alive while it has more than 0 hit points.
        func isAlive() -> Bool {
            return hitPoints > 0
        }
    }

The Animal protocol extension contains the takeHit(), hitPointsRemaining(), and isAlive() methods. Any type that conforms to the Animal protocol will automatically inherit these three methods.

Now let's define our LandAnimal, SeaAnimal, and AirAnimal protocols. These protocols will define the requirements for the land, sea, and air animals respectively:

    protocol LandAnimal: Animal {
        var landAttack: Bool { get }
        var landMovement: Bool { get }
        func doLandAttack()
        func doLandMovement()
    }

    protocol SeaAnimal: Animal {
        var seaAttack: Bool { get }
        var seaMovement: Bool { get }
        func doSeaAttack()
        func doSeaMovement()
    }

    protocol AirAnimal: Animal {
        var airAttack: Bool { get }
        var airMovement: Bool { get }
        func doAirAttack()
        func doAirMovement()
    }

These three protocols only contain the functionality needed for their particular type of animal. Each of these protocols contains only four lines of code. This makes our protocol design much easier to read and manage. The protocol design is also much safer because the functionality for each animal type is isolated in its own protocol rather than being embedded in a giant superclass. We are also able to avoid the use of flags to define the animal category and, instead, define the category of the animal by the protocols it conforms to. In a full design, we would probably need to add some protocol extensions for each of the animal types, but we do not need them for our example here.

Now, let's look at how we would create our Lion and Alligator types using protocol-oriented design:

    struct Lion: LandAnimal {
        var hitPoints = 20
        let landAttack = true
        let landMovement = true
        func doLandAttack() { print("Lion Attack") }
        func doLandMovement() { print("Lion Move") }
    }

    struct Alligator: LandAnimal, SeaAnimal {
        var hitPoints = 35
        let landAttack = true
        let landMovement = true
        let seaAttack = true
        let seaMovement = true
        func doLandAttack() { print("Alligator Land Attack") }
        func doLandMovement() { print("Alligator Land Move") }
        func doSeaAttack() { print("Alligator Sea Attack") }
        func doSeaMovement() { print("Alligator Sea Move") }
    }

Notice that we specify that the Lion type conforms to the LandAnimal protocol, while the Alligator type conforms to both the LandAnimal and SeaAnimal protocols. As we saw previously, having a single type that conforms to multiple protocols is called protocol composition, and it is what allows us to use smaller protocols rather than one giant monolithic superclass.

Both the Lion and Alligator types originate from the Animal protocol; therefore, they will inherit the functionality added with the Animal protocol extension. If our animal type protocols also had extensions, then the types would also inherit the functionality added by those extensions. With protocol inheritance, composition, and extensions, our concrete types contain only the functionality needed by the particular animal types that they conform to.

Since the Lion and Alligator types originate from the Animal protocol, we can use polymorphism. Let's look at how this works:

    var animals = [Animal]()
    animals.append(Alligator())
    animals.append(Alligator())
    animals.append(Lion())

    for (index, animal) in animals.enumerated() {
        // An animal may match more than one category.
        if animal is AirAnimal {
            print("Animal at \(index) is Air")
        }
        if animal is LandAnimal {
            print("Animal at \(index) is Land")
        }
        if animal is SeaAnimal {
            print("Animal at \(index) is Sea")
        }
    }

In this example, we create an array named animals that will contain Animal types. We then create two instances of the Alligator type and one instance of the Lion type and add them to the animals array. Finally, we use a for-in loop to loop through the array and print out the animal category based on the protocols that each instance conforms to.

Upgrade your knowledge and become an expert in the latest version of the Swift programming language with Mastering Swift 5.3, 6th Edition by Jon Hoffman.

About the Author

Jon Hoffman has over 25 years of experience in the field of information technology. He has worked in the areas of system administration, network administration, network security, application development, and architecture. Currently, Jon works as an Enterprise Software Manager for Syn-Tech Systems.

Expert Network

28 Jun 2021

7 min read

Top 6 Cybersecurity Books from Packt to Accelerate Your Career

With new technology threats, rising international tensions, and state-sponsored cyber-attacks, cybersecurity is more important than ever. In organizations worldwide, there is not only a dire need for cybersecurity analysts, engineers, and consultants; senior executives and leaders are also expected to be cognizant of possible threats and risk management. The era of cyberwarfare is now upon us. What we do now, and how we determine what we will do in the future, is the difference between whether our businesses live or die and whether our digital selves survive the digital battlefield. In this article, we'll discuss 6 titles from Packt's bank of cybersecurity resources for everyone from an aspiring cybersecurity professional to an expert.

Adversarial Tradecraft in Cybersecurity

A comprehensive guide that helps you master cutting-edge techniques and countermeasures to protect your organization from live hackers. It enables you to leverage cyber deception in your operations to gain an edge over the competition. Little has been written about how to act when live hackers attack your system and run amok. Even experienced hackers sometimes struggle when they realize the network defender has caught them and is zeroing in on their implants in real time. This book provides tips and tricks all along the kill chain of an attack, showing where hackers can have the upper hand in a live conflict and how defenders can outsmart them in this adversarial game of computer cat and mouse. Each chapter contains two subsections, focusing specifically on the offensive and defensive teams.

Pentesters, red teamers, SOC analysts, incident responders, attackers, defenders, general hackers, advanced computer users, and security engineers should gain a lot from this book. It will also be beneficial to those getting into purple teaming or adversarial simulations, as it includes processes for gaining an advantage over the other team. The author, Dan Borges, is a passionate programmer and security researcher who has worked in security positions for companies such as Uber, Mandiant, and CrowdStrike. Dan has been programming various devices for more than 20 years, with over 14 years in the security industry.

Cybersecurity – Attack and Defense Strategies, Second Edition

A book that enables you to counter modern threats and employ state-of-the-art tools and techniques to protect your organization against cybercriminals. It is a completely revised new edition of the bestselling book, covering the very latest security threats and defense mechanisms, including a detailed overview of Cloud Security Posture Management (CSPM) and an assessment of the current threat landscape, with additional focus on new IoT threats and cryptomining.

This book is for IT professionals venturing into the IT security domain, IT pentesters, security consultants, or those looking to perform ethical hacking. Prior knowledge of penetration testing is beneficial. The book is authored by Yuri Diogenes and Dr. Erdal Ozkaya. Yuri Diogenes is a professor at EC-Council University for their master's degree in cybersecurity and a Senior Program Manager at Microsoft for Azure Security Center. Dr. Erdal Ozkaya is a leading cybersecurity professional with business development, management, and academic skills who focuses on securing cyberspace and sharing his real-life skills as a security advisor, speaker, lecturer, and author.

Cyber Minds

This book comprises insights on cybersecurity across the cloud, data, artificial intelligence, blockchain, and IoT to keep you cyber safe. Shira Rubinoff's Cyber Minds brings together the top authorities in cybersecurity to discuss the emergent threats that face industries, societies, militaries, and governments today. Cyber Minds serves as a strategic briefing on cybersecurity and data safety, collecting expert insights from sector security leaders. It will help you arm and inform yourself with what you need to know to keep your business, or your country, safe.

This book is essential reading for business leaders, the C-Suite, board members, IT decision-makers within an organization, and anyone with a responsibility for cybersecurity. The author, Shira Rubinoff, is a recognized cybersecurity executive, cybersecurity and blockchain advisor, global keynote speaker, and influencer who has built two cybersecurity product companies and led multiple women-in-technology efforts.

Cyber Warfare – Truth, Tactics, and Strategies

Cyber Warfare – Truth, Tactics, and Strategies is as real-life and up-to-date as cyber can possibly be, with examples of actual attacks and defense techniques, tools, and strategies presented for you to learn how to think about defending your own systems and data. This book introduces you to strategic concepts and truths to help you and your organization survive on the battleground of cyber warfare. It not only covers cyber warfare, but also looks at the political, cultural, and geographical influences that pertain to these attack methods and helps you understand the motivation and impacts that are likely in each scenario.

This book is for any engineer, leader, or professional with either responsibility for cybersecurity within their organization or an interest in working in this ever-growing field. The author, Dr. Chase Cunningham, holds a Ph.D. and M.S. in computer science from Colorado Technical University and a B.S. from American Military University focused on counter-terrorism operations in cyberspace.

Incident Response in the Age of Cloud

This book is a comprehensive guide for organizations on how to prepare for cyber-attacks and control cyber threats and network security breaches in a way that decreases damage, recovery time, and costs, facilitating the adaptation of existing strategies to cloud-based environments.

It is aimed at first-time incident responders, cybersecurity enthusiasts who want to get into IR, and anyone who is responsible for maintaining business security. It will also interest CIOs, CISOs, and members of IR, SOC, and CSIRT teams. However, IR is not just about information technology or security teams, and anyone with legal, HR, media, or other active business roles would benefit from this book. The book assumes you have some admin experience. No prior DFIR experience is required. Some infosec knowledge will be a plus but isn't mandatory.

The author, Dr. Erdal Ozkaya, is a technically sophisticated executive leader with a solid education and strong business acumen. Over the course of his progressive career, he has developed a keen aptitude for facilitating the integration of standard operating procedures that ensure the optimal functionality of all technical functions and systems.

Cybersecurity Threats, Malware Trends, and Strategies

This book trains you to mitigate exploits, malware, phishing, and other social engineering attacks. After scrutinizing numerous cybersecurity strategies, Microsoft's former Global Chief Security Advisor provides unique insights on the evolution of the threat landscape and how enterprises can address modern cybersecurity challenges. The book will provide you with an evaluation of the various cybersecurity strategies that have ultimately failed over the past twenty years, along with one or two that have actually worked. It will help executives and security and compliance professionals understand how cloud computing is a game-changer for them.

This book is designed to benefit senior management at commercial sector and public sector organizations, including Chief Information Security Officers (CISOs) and other senior managers of cybersecurity groups, Chief Information Officers (CIOs), Chief Technology Officers (CTOs), and senior IT managers who want to explore the entire spectrum of cybersecurity, from threat hunting and security risk management to malware analysis. The author, Tim Rains, worked at Microsoft for the better part of two decades, where he held a number of roles including Global Chief Security Advisor, Director of Security, Identity and Enterprise Mobility, and Director of Trustworthy Computing, and was a founding technical leader of Microsoft's customer-facing Security Incident Response team.

Summary

If you aspire to become a cybersecurity expert, good study and reference material is as important as hands-on training and practical understanding. By choosing a suitable guide, you can drastically accelerate your learning curve and carve out your own successful career trajectory.

Expert Network

02 Jun 2021

10 min read

Exploring the Strategy Behavioral Design Pattern in Node.js

A design pattern is a reusable solution to a recurring problem. The term is really broad in its definition and can span multiple domains of an application. However, the term is often associated with a well-known set of object-oriented patterns that were popularized in the 90s by the book, Design Patterns: Elements of Reusable Object- Oriented Software, Pearson Education, by the almost legendary Gang of Four (GoF): Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. This article is an excerpt from the book Node.js Design Patterns, Third Edition by Mario Casciaro and Luciano Mammino – a comprehensive guide for learning proven patterns, techniques, and tricks to take full advantage of the Node.js platform. In this article, we’ll look at the behavior of components in software design. We’ll learn how to combine objects and how to define the way they communicate so that the behavior of the resulting structure becomes extensible, modular, reusable, and adaptable. After introducing all the behavioral design patterns, we will dive deep into the details of the strategy pattern. Now, it's time to roll up your sleeves and get your hands dirty with some behavioral design patterns. Types of Behavioral Design Patterns The Strategy pattern allows us to extract the common parts of a family of closely related components into a component called the context and allows us to define strategy objects that the context can use to implement specific behaviors. The State pattern is a variation of the Strategy pattern where the strategies are used to model the behavior of a component when under different states. The Template pattern, instead, can be considered the "static" version of the Strategy pattern, where the different specific behaviors are implemented as subclasses of the template class, which models the common parts of the algorithm. The Iterator pattern provides us with a common interface to iterate over a collection. It has now become a core pattern in Node.js. 
JavaScript offers native support for the pattern (with the iterator and iterable protocols). Iterators can be used as an alternative to complex async iteration patterns and even to Node.js streams.

The Middleware pattern allows us to define a modular chain of processing steps. This is a very distinctive pattern born from within the Node.js ecosystem. It can be used to preprocess and postprocess data and requests.

The Command pattern materializes the information required to execute a routine, allowing such information to be easily transferred, stored, and processed.

The Strategy Pattern

The Strategy pattern enables an object, called the context, to support variations in its logic by extracting the variable parts into separate, interchangeable objects called strategies. The context implements the common logic of a family of algorithms, while a strategy implements the mutable parts, allowing the context to adapt its behavior depending on different factors, such as an input value, a system configuration, or user preferences.

Strategies are usually part of a family of solutions and all of them implement the same interface expected by the context. The following figure shows the situation we just described:

Figure 1: General structure of the Strategy pattern

Figure 1 shows you how the context object can plug different strategies into its structure as if they were replaceable parts of a piece of machinery. Imagine a car; its tires can be considered its strategy for adapting to different road conditions. We can fit winter tires to go on snowy roads thanks to their studs, while we can decide to fit high-performance tires for traveling mainly on motorways for a long trip. On the one hand, we don't want to change the entire car for this to be possible, and on the other, we don't want a car with eight wheels so that it can go on every possible road.
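To make the car analogy concrete, here is a minimal sketch of a context with two interchangeable strategies. The names Car, winterTires, and performanceTires, the gripOn() method, and the grip values are all hypothetical, chosen only to mirror the analogy; they do not come from the book.

```javascript
// Illustrative sketch only: all names and grip values are hypothetical.

// Each strategy exposes the same interface: a name and a gripOn(road) method.
const winterTires = {
  name: 'winter tires',
  gripOn: road => (road === 'snow' ? 'high' : 'medium')
}

const performanceTires = {
  name: 'high-performance tires',
  gripOn: road => (road === 'motorway' ? 'high' : 'low')
}

// The context holds the common logic and delegates the variable part
// (how well it grips a given road) to whichever strategy is fitted.
class Car {
  constructor (tires) {
    this.tires = tires
  }

  // Swap the strategy without changing the rest of the car.
  fit (tires) {
    this.tires = tires
  }

  drive (road) {
    return `Driving on ${road} with ${this.tires.name}: ${this.tires.gripOn(road)} grip`
  }
}

const car = new Car(winterTires)
console.log(car.drive('snow'))
// → Driving on snow with winter tires: high grip

car.fit(performanceTires)
console.log(car.drive('motorway'))
// → Driving on motorway with high-performance tires: high grip
```

The Car class never inspects which kind of tires it has. It only relies on the shared interface, which is what lets new strategies be added without touching the context.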
The Strategy pattern is particularly useful in all those situations where supporting variations in the behavior of a component requires complex conditional logic (lots of if...else or switch statements) or mixing different components of the same family.

Imagine an object called Order that represents an online order on an e-commerce website. The object has a method called pay() that, as its name suggests, finalizes the order and transfers the funds from the user to the online store. To support different payment systems, we have a couple of options:

- Use an if...else statement in the pay() method to complete the operation based on the chosen payment option
- Delegate the logic of the payment to a strategy object that implements the logic for the specific payment gateway selected by the user

In the first solution, our Order object cannot support other payment methods unless its code is modified. Also, this can become quite complex when the number of payment options grows. Instead, using the Strategy pattern enables the Order object to support a virtually unlimited number of payment methods and keeps its scope limited to only managing the details of the user, the purchased items, and the relative price while delegating the job of completing the payment to another object.

Let's now demonstrate this pattern with a simple, realistic example.

Multi-format configuration objects

Let's consider an object called Config that holds a set of configuration parameters used by an application, such as the database URL, the listening port of the server, and so on. The Config object should be able to provide a simple interface to access these parameters, but also a way to import and export the configuration using persistent storage, such as a file. We want to be able to support different formats to store the configuration, for example, JSON, INI, or YAML.
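Before moving on to Config, the second option for the Order scenario described above can be sketched in a few lines. This is only an illustration under stated assumptions: the strategy names (creditCard, bankTransfer), the pay(amount) interface, and the receipt strings are invented for the example and do not represent a real payment gateway API.

```javascript
// Illustrative sketch only: strategy names, the pay(amount) interface, and
// the receipt strings are assumptions, not a real payment API.

// Each strategy implements the same interface: pay(amount) returns a receipt.
const paymentStrategies = {
  creditCard: {
    pay: amount => `Charged ${amount.toFixed(2)} to credit card`
  },
  bankTransfer: {
    pay: amount => `Requested bank transfer of ${amount.toFixed(2)}`
  }
}

// The context: it manages the user, the items, and the price,
// and delegates the payment itself to the strategy it was given.
class Order {
  constructor (user, items, paymentStrategy) {
    this.user = user
    this.items = items
    this.paymentStrategy = paymentStrategy
  }

  total () {
    return this.items.reduce((sum, item) => sum + item.price, 0)
  }

  // No if...else on the payment type: the strategy decides how to pay.
  pay () {
    return this.paymentStrategy.pay(this.total())
  }
}

const items = [{ name: 'book', price: 30 }, { name: 'pen', price: 2.5 }]
const order = new Order('alice', items, paymentStrategies.creditCard)
console.log(order.pay())
// → Charged 32.50 to credit card
```

Supporting a new payment method means adding one more object with a pay() method; the Order class itself stays unchanged.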
By applying what we learned about the Strategy pattern, we can immediately identify the variable part of the Config object, which is the functionality that allows us to serialize and deserialize the configuration. This is going to be our strategy.

Creating a new module

Let's create a new module called config.js, and let's define the generic part of our configuration manager:

    import { promises as fs } from 'fs'
    import objectPath from 'object-path'

    export class Config {
      constructor (formatStrategy) {                           // (1)
        this.data = {}
        this.formatStrategy = formatStrategy
      }

      get (configPath) {                                       // (2)
        return objectPath.get(this.data, configPath)
      }

      set (configPath, value) {                                // (2)
        return objectPath.set(this.data, configPath, value)
      }

      async load (filePath) {                                  // (3)
        console.log(`Deserializing from ${filePath}`)
        this.data = this.formatStrategy.deserialize(
          await fs.readFile(filePath, 'utf-8')
        )
      }

      async save (filePath) {                                  // (3)
        console.log(`Serializing to ${filePath}`)
        await fs.writeFile(filePath,
          this.formatStrategy.serialize(this.data))
      }
    }

This is what's happening in the preceding code:

(1) In the constructor, we create an instance variable called data to hold the configuration data. Then we also store formatStrategy, which represents the component that we will use to parse and serialize the data.
(2) We provide two methods, set() and get(), to access the configuration properties using a dotted path notation (for example, property.subProperty) by leveraging a library called object-path (nodejsdp.link/object-path).
(3) The load() and save() methods are where we delegate, respectively, the deserialization and serialization of the data to our strategy. This is where the logic of the Config class is altered based on the formatStrategy passed as an input in the constructor.

As we can see, this very simple and neat design allows the Config object to seamlessly support different file formats when loading and saving its data.
The best part is that the logic to support those various formats is not hardcoded anywhere, so the Config class can adapt without any modification to virtually any file format, given the right strategy.

Creating format strategies

To demonstrate this characteristic, let's now create a couple of format strategies in a file called strategies.js. Let's start with a strategy for parsing and serializing data using the INI file format, which is a widely used configuration format (more info about it here: nodejsdp.link/ini-format). For the task, we will use an npm package called ini (nodejsdp.link/ini):

    import ini from 'ini'

    export const iniStrategy = {
      deserialize: data => ini.parse(data),
      serialize: data => ini.stringify(data)
    }

Nothing really complicated! Our strategy simply implements the agreed interface, so that it can be used by the Config object.

Similarly, the next strategy that we are going to create allows us to support the JSON file format, widely used in JavaScript and in the web development ecosystem in general:

    export const jsonStrategy = {
      deserialize: data => JSON.parse(data),
      serialize: data => JSON.stringify(data, null, ' ')
    }

Now, to show you how everything comes together, let's create a file named index.js, and let's try to load and save a sample configuration using different formats:

    import { Config } from './config.js'
    import { jsonStrategy, iniStrategy } from './strategies.js'

    async function main () {
      const iniConfig = new Config(iniStrategy)
      await iniConfig.load('samples/conf.ini')
      iniConfig.set('book.nodejs', 'design patterns')
      await iniConfig.save('samples/conf_mod.ini')

      const jsonConfig = new Config(jsonStrategy)
      await jsonConfig.load('samples/conf.json')
      jsonConfig.set('book.nodejs', 'design patterns')
      await jsonConfig.save('samples/conf_mod.json')
    }

    main()

Our test module reveals the core properties of the Strategy pattern.
We defined only one Config class, which implements the common parts of our configuration manager. Then, by using different strategies for serializing and deserializing data, we created different Config class instances supporting different file formats.

The example we've just seen shows us only one of the possible alternatives that we had for selecting a strategy. Other valid approaches might have been the following:

- Creating two different strategy families: One for the deserialization and the other for the serialization. This would have allowed reading from one format and saving to another.
- Dynamically selecting the strategy: Depending on the extension of the file provided, the Config object could have maintained a map extension → strategy and used it to select the right algorithm for the given extension.

As we can see, we have several options for selecting the strategy to use, and the right one depends only on your requirements and on the tradeoff between features and simplicity that you want to obtain.

Furthermore, the implementation of the pattern itself can vary a lot as well. For example, in its simplest form, the context and the strategy can both be simple functions:

    function context(strategy) {...}

Even though this may seem insignificant, it should not be underestimated in a programming language such as JavaScript, where functions are first-class citizens and used as much as fully-fledged objects.

Across all these variations, though, what does not change is the idea behind the pattern; as always, the implementation can slightly change but the core concepts that drive the pattern are always the same.

Summary

In this article, we dove deep into the details of the Strategy pattern, one of the behavioral design patterns in Node.js. Learn more in the book, Node.js Design Patterns, Third Edition by Mario Casciaro and Luciano Mammino.

About the Authors

Mario Casciaro is a software engineer and entrepreneur.
Mario worked at IBM for a number of years, first in Rome, then at the Dublin Software Lab. He currently splits his time between Var7 Technologies (his own software company) and his role as lead engineer at D4H Technologies, where he creates software for emergency response teams.

Luciano Mammino wrote his first line of code at the age of 12 on his father's old i386. Since then he has never stopped coding. He is currently working at FabFitFun as principal software engineer, where he builds microservices to serve millions of users every day.


Expert Network

19 May 2021

7 min read

Scientific Analysis of Donald Trump’s Tweets on COVID-19 with Transformers


It takes time and effort to figure out what is fake news and what isn't. Like children, we have to work our way through something we perceive as fake news. This article is an excerpt from the book Transformers for Natural Language Processing by Denis Rothman – A comprehensive guide for deep learning & NLP practitioners, data analysts and data scientists who want an introduction to AI language understanding to process the increasing amounts of language-driven functions. In this article, we will focus on the logic of fake news. We will run the BERT model on SRL and visualize the results on AllenNLP.org. Now, let's go through some presidential tweets on COVID-19. Our goal is certainly not to judge anybody or anything. Fake news involves both opinion and facts. News often depends on the perception of facts by local culture. We will provide ideas and tools to help others gather more information on a topic and find their way in the jungle of information we receive every day. Semantic Role Labeling (SRL) SRL is an excellent educational tool for all of us. We tend just to read Tweets passively and listen to what others say about them. Breaking messages down with SRL is a good way to develop social media analytical skills to distinguish fake from accurate information. I recommend using SRL transformers for educational purposes in class. A young student can enter a Tweet and analyze each verb and its arguments. It could help younger generations become active readers on social media. We will first analyze a relatively undivided Tweet and then a conflictual Tweet. Analyzing the undivided Tweet Let's analyze the latest Tweet found on July 4 while writing the book, Transformers for Natural Language Processing. I took the name of the person who is referred to as a "Black American" out and paraphrased some of the former President's text: "X is a great American, is hospitalized with coronavirus, and has requested prayer. 
Would you join me in praying for him today, as well as all those who are suffering from COVID-19?" Let's go to AllenNLP.org, visualize our SRL using https://demo.allennlp.org/semantic-role-labeling, run the sentence, and look at the result. The verb "hospitalized" shows the member is staying close to the facts: Figure: SRL arguments of the verb "hospitalized" The message is simple: "X" + "hospitalized" + "coronavirus." The verb "requested" shows that the message is becoming political: Figure: SRL arguments of the verb "requested" We don't know if the person requested the former President to pray or he decided he would be the center of the request. A good exercise would be to display an HTML page and ask the users what they think. For example, the users could be asked to look at the results of the SRL task and answer the two following questions: "Was former President Trump asked to pray, or did he deviate a request made to others for political reasons?" "Is the fact that former President Trump states that he was indirectly asked to pray for X fake news or not?" You can think about it and decide for yourself! Analyzing the Banned Tweet Let's have a look at one that was banned from Twitter. I took the names out and paraphrased it and toned it down. Still, when we run it on AllenNLP.org and visualize the results, we get some surprising SRL outputs. Here is the toned-down and paraphrased Tweet: These thugs are dishonoring the memory of X. When the looting starts, actions must be taken. Although I suppressed the main part of the original Tweet, we can see that the SRL task shows the bad associations made in the Tweet: Figure: SRL arguments of the verb "dishonoring" An educational approach to this would be to explain that we should not associate the arguments "thugs" and "memory" and "looting." They do not fit together at all. An important exercise would be to ask a user why the SRL arguments do not fit together. 
I recommend many such exercises so that the transformer model users develop SRL skills to have a critical view of any topic presented to them. Critical thinking is the best way to stop the propagation of the fake news pandemic!

We have gone through rational approaches to fake news with transformers, heuristics, and instructive websites. However, in the end, a lot of the heat in fake news debates boils down to emotional and irrational reactions. In a world of opinion, you will never find an entirely objective transformer model that detects fake news since opposing sides never agree on what the truth is in the first place! One side will agree with the transformer model's output. Another will say that the model is biased and built by enemies of their opinion! The best approach is to listen to others and try to keep the heat down!

Looking for the silver bullet

Looking for a silver bullet transformer model can be time-consuming or rewarding, depending on how much time and money you want to spend on continually changing models. For example, a new approach to transformers can be found through disentanglement. Disentanglement in AI allows you to separate the features of a representation to make the training process more flexible. Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen designed DeBERTa, a disentangled version of a transformer, and described the model in an interesting article: DeBERTa: Decoding-enhanced BERT with Disentangled Attention, https://arxiv.org/abs/2006.03654

The two main ideas implemented in DeBERTa are:

- Disentangle the content and position in the transformer model to train the two vectors separately.
- Use an absolute position in the decoder to predict masked tokens in the pretraining process.

The authors provide the code on GitHub: https://github.com/microsoft/DeBERTa

DeBERTa exceeded the human baseline on the SuperGLUE leaderboard in December 2020 using 1.5B parameters.
Should you stop everything you are doing on transformers and rush to this model, integrate your data, train the model, test it, and implement it? It is very probable that by the end of 2021, another model will beat this one and so on. Should you change models all of the time in production? That will be your decision. You can also choose to design better training methods.

Looking for reliable training methods

Looking for reliable training methods with smaller models such as the PET designed by Timo Schick can also be a solution. Why? Being in a good position on the SuperGLUE leaderboard does not mean that the model will provide a high quality of decision-making for medical, legal, and other critical areas for sequence predictions. Looking for customized training solutions for a specific topic could be more productive than trying all the best transformers on the SuperGLUE leaderboard. Take your time to think about implementing transformers to find the best approach for your project. We will now conclude the article.

Summary

Fake news begins deep inside our emotional history as humans. When an event occurs, emotions take over to help us react quickly to a situation. We are hardwired to react strongly when we are threatened. We went through raging conflicts over COVID-19, former President Trump, and climate change. In each case, we saw that emotional reactions are the fastest ones to build up into conflicts. We then designed a roadmap to take the emotional perception of fake news to a rational level. We showed that it is possible to find key information in Tweets, Facebook messages, and other media. The news used in this article is perceived by some as real news and others as fake news to create a rationale for teachers, parents, friends, co-workers, or just people talking.

About the Author

Denis Rothman graduated from Sorbonne University and Paris-Diderot University, patenting one of the very first word2matrix embedding solutions.
Denis Rothman is the author of three cutting-edge AI solutions: one of the first AI cognitive chatbots more than 30 years ago; a profit-orientated AI resource optimizing system; and an AI APS (Advanced Planning and Scheduling) solution based on cognitive patterns used worldwide in aerospace, rail, energy, apparel, and many other fields. Designed initially as a cognitive AI bot for IBM, it then went on to become a robust APS solution used to this day.


Expert Network

30 Apr 2021

7 min read

Distributed training in TensorFlow 2.x


TensorFlow 2 is a rich development ecosystem composed of two main parts: Training and Serving. Training consists of a set of libraries for dealing with datasets (tf.data), a set of libraries for building models, including high-level libraries (tf.keras and Estimators), low-level libraries (tf.*), and a collection of pretrained models (tf.Hub). Training can happen on CPUs, GPUs, and TPUs via distribution strategies and the result can be saved using the appropriate libraries.

This article is an excerpt from the book, Deep Learning with TensorFlow 2 and Keras, Second Edition by Antonio Gulli, Amita Kapoor, and Sujit Pal. This book teaches deep learning techniques alongside TensorFlow (TF) and Keras. In this article, we'll review the addition of the powerful new feature, distributed training, in TensorFlow 2.x.

One very useful addition to TensorFlow 2.x is the possibility to train models using distributed GPUs, multiple machines, and TPUs in a very simple way with very few additional lines of code. tf.distribute.Strategy is the TensorFlow API used in this case and it supports both tf.keras and tf.estimator APIs and eager execution. You can switch between GPUs, TPUs, and multiple machines by just changing the strategy instance. Strategies can be synchronous, where all workers train over different slices of input data in a form of sync data parallel computation, or asynchronous, where updates from the optimizers are not happening in sync. All strategies require that data is loaded in batches via the tf.data.Dataset API.

Note that the distributed training support is still experimental. A roadmap is given in Figure 1:

Figure 1: Distributed training support for different strategies and APIs

Let's discuss in detail all the different strategies reported in Figure 1.

Multiple GPUs

TensorFlow 2.x can utilize multiple GPUs.
If we want to have synchronous distributed training on multiple GPUs on one machine, there are two things that we need to do: (1) we need to load the data in a way that will be distributed into the GPUs, and (2) we need to distribute some computations into the GPUs too.

(1) In order to load our data in a way that can be distributed into the GPUs, we simply need tf.data.Dataset (which has already been discussed in the previous paragraphs). If we do not have a tf.data.Dataset but we have a normal tensor, then we can easily convert the latter into the former using tf.data.Dataset.from_tensor_slices(). This will take a tensor in memory and return a source dataset, the elements of which are slices of the given tensor. In our toy example, we use NumPy to generate training data x and labels y, and we transform it into tf.data.Dataset with tf.data.Dataset.from_tensor_slices(). Then we apply a shuffle to avoid bias in training across GPUs and then generate batches of size SIZE_BATCHES:

    import tensorflow as tf
    import numpy as np
    from tensorflow import keras

    N_TRAIN_EXAMPLES = 1024*1024
    N_FEATURES = 10
    SIZE_BATCHES = 256

    # 10 random floats in the half-open interval [0.0, 1.0).
    x = np.random.random((N_TRAIN_EXAMPLES, N_FEATURES))
    y = np.random.randint(2, size=(N_TRAIN_EXAMPLES, 1))
    x = tf.dtypes.cast(x, tf.float32)
    print(x)
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    dataset = dataset.shuffle(buffer_size=N_TRAIN_EXAMPLES).batch(SIZE_BATCHES)

(2) In order to distribute some computations to GPUs, we instantiate a distribution = tf.distribute.MirroredStrategy() object, which supports synchronous distributed training on multiple GPUs on one machine. Then, we move the creation and compilation of the Keras model inside distribution.scope(). Note that each variable in the model is mirrored across all the replicas.
Let's see it in our toy example:

    # this is the distribution strategy
    distribution = tf.distribute.MirroredStrategy()

    # this piece of code is distributed to multiple GPUs
    with distribution.scope():
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Dense(16, activation='relu', input_shape=(N_FEATURES,)))
        model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
        optimizer = tf.keras.optimizers.SGD(0.2)
        model.compile(loss='binary_crossentropy', optimizer=optimizer)

    model.summary()

    # Optimize in the usual way but in reality you are using GPUs.
    model.fit(dataset, epochs=5, steps_per_epoch=10)

Note that each batch of the given input is divided equally among the multiple GPUs. For instance, if using MirroredStrategy() with two GPUs, each batch of size 256 will be divided among the two GPUs, with each of them receiving 128 input examples for each step. In addition, note that each GPU will optimize on the received batches and the TensorFlow backend will combine all these independent optimizations on our behalf. In short, using multiple GPUs is very easy and requires minimal changes to the tf.keras code used for a single server.

MultiWorkerMirroredStrategy

This strategy implements synchronous distributed training across multiple workers, each one with potentially multiple GPUs. As of September 2019 the strategy works only with Estimators and it has experimental support for tf.keras. This strategy should be used if you are aiming at scaling beyond a single machine with high performance. Data must be loaded with tf.data.Dataset and shared across workers so that each worker can read a unique subset.

TPUStrategy

This strategy implements synchronous distributed training on TPUs. TPUs are Google's specialized ASIC chips designed to significantly accelerate machine learning workloads in a way often more efficient than GPUs.
According to this public information (https://github.com/tensorflow/tensorflow/issues/24412): “the gist is that we intend to announce support for TPUStrategy alongside Tensorflow 2.1. Tensorflow 2.0 will work under limited use-cases but has many improvements (bug fixes, performance improvements) that we’re including in Tensorflow 2.1, so we don’t consider it ready yet.”

ParameterServerStrategy

This strategy implements either multi-GPU synchronous local training or asynchronous multi-machine training. For local training on one machine, the variables of the models are placed on the CPU and operations are replicated across all local GPUs. For multi-machine training, some machines are designated as workers and some as parameter servers with the variables of the model placed on parameter servers. Computation is replicated across all GPUs of all workers. Multiple workers can be set up with the environment variable TF_CONFIG as in the following example:

    import json
    import os

    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {
            "worker": ["host1:port", "host2:port", "host3:port"],
            "ps": ["host4:port", "host5:port"]
        },
        "task": {"type": "worker", "index": 1}
    })

In this article, we have seen how it is possible to train models using distributed GPUs, multiple machines, and TPUs in a very simple way with very few additional lines of code. Learn how to build machine and deep learning systems with the newly released TensorFlow 2 and Keras for the lab, production, and mobile devices with Deep Learning with TensorFlow 2 and Keras, Second Edition by Antonio Gulli, Amita Kapoor and Sujit Pal.

About the Authors

Antonio Gulli is a software executive and business leader with a passion for establishing and managing global technological talent, innovation, and execution. He is an expert in search engines, online services, machine learning, information retrieval, analytics, and cloud computing.
Amita Kapoor is an Associate Professor in the Department of Electronics, SRCASW, University of Delhi and has been actively teaching neural networks and artificial intelligence for the last 20 years. She is an active member of ACM, AAAI, IEEE, and INNS. She has co-authored two books. Sujit Pal is a technology research director at Elsevier Labs, working on building intelligent systems around research content and metadata. His primary interests are information retrieval, ontologies, natural language processing, machine learning, and distributed processing. He is currently working on image classification and similarity using deep learning models. He writes about technology on his blog at Salmon Run.


Expert Network

20 Apr 2021

6 min read

How to Create Tensors in PyTorch


A tensor is the fundamental building block of all DL toolkits. The name sounds rather mystical, but the underlying idea is that a tensor is a multi-dimensional array. Building analogy with school math, one single number is like a point, which is zero-dimensional, while a vector is one-dimensional like a line segment, and a matrix is a two-dimensional object. Three-dimensional number collections can be represented by a parallelepiped of numbers, but they don't have a separate name in the same way as a matrix. We can keep this term for collections of higher dimensions, which are named multi-dimensional arrays. Another thing to note about tensors used in DL is that they are only partially related to tensors used in tensor calculus or tensor algebra. In DL, tensor is any multi-dimensional array, but in mathematics, tensor is a mapping between vector spaces, which might be represented as a multi-dimensional array in some cases but has much more semantical payload behind it. Mathematicians usually frown at everybody who uses well-established mathematical terms to name different things, so, be warned. Figure 1: Going from a single number to an n-dimension tensor This article is an excerpt from the book Deep Reinforcement Learning Hands-On - Second Edition by Maxim Lapan. This book is an updated and expanded version of the bestselling guide to the very latest RL tools and techniques. In this article, we’ll discuss the fundamental building block of all DL toolkits, tensor. Creation of tensors If you're familiar with the NumPy library, then you already know that its central purpose is the handling of multi-dimensional arrays in a generic way. In NumPy, such arrays aren't called tensors, but they are in fact tensors. Tensors are used very widely in scientific computations as generic storage for data. For example, a color image could be encoded as a 3D tensor with dimensions of width, height, and color plane. 
Apart from dimensions, a tensor is characterized by the type of its elements. There are eight types supported by PyTorch: three float types (16-bit, 32-bit, and 64-bit) and five integer types (8-bit signed, 8-bit unsigned, 16-bit, 32-bit, and 64-bit). Tensors of different types are represented by different classes, with the most commonly used being torch.FloatTensor (corresponding to a 32-bit float), torch.ByteTensor (an 8-bit unsigned integer), and torch.LongTensor (a 64-bit signed integer). The rest can be found in the PyTorch documentation.

There are three ways to create a tensor in PyTorch:

- By calling a constructor of the required type.
- By converting a NumPy array or a Python list into a tensor. In this case, the type will be taken from the array's type.
- By asking PyTorch to create a tensor with specific data for you. For example, you can use the torch.zeros() function to create a tensor filled with zero values.

To give you examples of these methods, let's look at a simple session:

    >>> import torch
    >>> import numpy as np
    >>> a = torch.FloatTensor(3, 2)
    >>> a
    tensor([[4.1521e+09, 4.5796e-41],
            [1.9949e-20, 3.0774e-41],
            [4.4842e-44, 0.0000e+00]])

Here, we imported both PyTorch and NumPy and created an uninitialized tensor of size 3×2. By default, PyTorch allocates memory for the tensor, but doesn't initialize it with anything. To clear the tensor's content, we need to use its zero_() operation:

    >>> a.zero_()
    tensor([[0., 0.],
            [0., 0.],
            [0., 0.]])

There are two types of operation for tensors: inplace and functional. Inplace operations have an underscore appended to their name and operate on the tensor's content. After this, the object itself is returned. The functional equivalent creates a copy of the tensor with the performed modification, leaving the original tensor untouched. Inplace operations are usually more efficient from a performance and memory point of view.
Another way to create a tensor by its constructor is to provide a Python iterable (for example, a list or tuple), which will be used as the contents of the newly created tensor:

    >>> torch.FloatTensor([[1,2,3],[3,2,1]])
    tensor([[1., 2., 3.],
            [3., 2., 1.]])

Here we are creating the same tensor with zeroes using NumPy:

    >>> n = np.zeros(shape=(3, 2))
    >>> n
    array([[0., 0.],
           [0., 0.],
           [0., 0.]])
    >>> b = torch.tensor(n)
    >>> b
    tensor([[0., 0.],
            [0., 0.],
            [0., 0.]], dtype=torch.float64)

The torch.tensor method accepts the NumPy array as an argument and creates a tensor of appropriate shape from it. In the preceding example, we created a NumPy array initialized by zeros, which created a double (64-bit float) array by default. So, the resulting tensor has the DoubleTensor type (which is shown in the preceding example with the dtype value). Usually, in DL, double precision is not required and it adds an extra memory and performance overhead. The common practice is to use the 32-bit float type, or even the 16-bit float type, which is more than enough. To create such a tensor, you need to specify explicitly the type of NumPy array:

    >>> n = np.zeros(shape=(3, 2), dtype=np.float32)
    >>> torch.tensor(n)
    tensor([[0., 0.],
            [0., 0.],
            [0., 0.]])

As an option, the type of the desired tensor could be provided to the torch.tensor function in the dtype argument. However, be careful, since this argument expects to get a PyTorch type specification, not the NumPy one. PyTorch types are kept in the torch package, for example, torch.float32 and torch.uint8.

    >>> n = np.zeros(shape=(3,2))
    >>> torch.tensor(n, dtype=torch.float32)
    tensor([[0., 0.],
            [0., 0.],
            [0., 0.]])

In this article, we saw a quick overview of the tensor, the fundamental building block of all DL toolkits. We talked about tensors and how to create them in the PyTorch library.
Discover ways to increase efficiency of RL methods both from theoretical and engineering perspective with the book Deep Reinforcement Learning Hands-on, Second Edition by Maxim Lapan.  About the Author Maxim Lapan is a deep learning enthusiast and independent researcher. He has spent 15 years working as a software developer and systems architect. His projects have ranged from low-level Linux kernel driver development to performance optimization and the design of distributed applications working on thousands of servers.  With his areas of expertise including big data, machine learning, and large parallel distributed HPC and non-HPC systems, Maxim is able to explain complicated concepts using simple words and vivid examples. His current areas of interest are practical applications of deep learning, such as deep natural language processing and deep reinforcement learning. Maxim lives in Moscow, Russian Federation, with his family. 


Summarizing Data with OpenAI ChatGPT

Data Cleaning Made Easy with ChatGPT

Responding to Generative AI from an Ethical Standpoint

Introduction to LLaMA

ChatGPT for Information Retrieval and Competitive Intelligence

Customize ChatGPT for Specific Tasks Using Effective Prompts – Shot Learning

4 Ways to Treat a Hallucinating AI with Prompt Engineering

Learning Essential Linux Commands for Navigating the Shell Effectively

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Understanding the Foundation of Protocol-oriented Design

Trending Topics

Top 6 Cybersecurity Books from Packt to Accelerate Your Career

Exploring the Strategy Behavioral Design Pattern in Node.js

Scientific Analysis of Donald Trump’s Tweets on COVID-19 with Transformers

Distributed training in TensorFlow 2.x

How to Create Tensors in PyTorch