This article is an excerpt from the book, Machine Learning with Microsoft Power BI, by Greg Beaumont. This book is designed for data scientists and BI professionals seeking to improve their existing solutions and workloads using AI.
Data description generation plays a vital role in understanding complex datasets, but it can be a time-consuming task. Enter ChatGPT, an advanced AI language model developed by OpenAI. Trained on extensive text data, ChatGPT demonstrates impressive capabilities in understanding and generating human-like responses. In this article, we explore how ChatGPT can revolutionize data analysis by expediting the creation of accurate and coherent data descriptions. We delve into its training process, architecture, and potential applications in fields like research, journalism, and business analytics. While acknowledging limitations, we unveil the transformative potential of ChatGPT for data interpretation and knowledge dissemination.
Our first step will be to identify a suitable use case for leveraging the power of GPT models to generate descriptions of elements of FAA Wildlife Strike data. Our objective is to unlock the potential of external data by creating prompts for GPT models that can provide detailed information and insights about the data we are working with. Through this use case, we will explore the value that GPT models can bring to the table when it comes to data analysis and interpretation.
For example, a description of the FAA Wildlife Strike Database by ChatGPT might look like this:
Figure 1 – OpenAI ChatGPT description of FAA Wildlife Strike Database
Within your solution using the FAA Wildlife Strike database, you have data that could be tied to external data using the GPT models. A few examples include additional information about:
When the scoring process for a large number of separate rows in a dataset is automated, we can use a GPT model to generate descriptive text for each individual row. It is worth noting that ChatGPT's approach varies from this, as it operates as a chatbot that calls upon different GPT models and integrates past conversations into future answers. Despite the differences in how GPT models will be used in the solution, ChatGPT can still serve as a valuable tool for testing various use cases.
When using GPT models, the natural language prompts that are used to ask questions and give instructions will impact the context of the generated text. Prompt engineering is a topic that has surged in popularity for OpenAI and LLMs. The following prompts will provide different answers when using “dogs” as a topic for a GPT query:
When planning for your use of OpenAI on large volumes of data, you should test and evaluate your prompt engineering strategy. For this book, the use cases will be kept simple since the goal is to teach tool integration with Power BI. Prompt engineering expertise will probably be the topic of many books and blogs this year. You can test different requests for a description of an FAA Region in the data:
Figure 2 – Testing the utility of describing an FAA Region using OpenAI ChatGPT
You can also combine different data elements for a more detailed description. The following example combines data fields with a question to ask “Tell me about the Species in State in Month”:
Figure 3 – Using ChatGPT to test a combination of data about Species, State, and Month
There are many different options to consider. To combine a few fields of data and provide useful context about the data, you decide to plan a use case for describing the aircraft and operator. An example can be tested with the following formula in OpenAI ChatGPT such as “Tell me about the airplane model Aircraft operated by the Operator in three sentences." Here is an example using data from a single row of the FAA Wildlife Strike database:
Figure 4 – Information about an airplane in the fleet of an operator as described by OpenAI ChatGPT
From a prompt engineering perspective, asking this question for multiple reports in the FAA Wildlife Strike database would require running the following natural language query on each row of data (column names are depicted with brackets):
Tell me about the airplane model [Aircraft] operated by [Operator] in three sentences:
This article explores how ChatGPT expedites the generation of accurate and coherent data descriptions. Unveiling its training process, architecture, and applications in research, journalism, and business analytics, we showcase how ChatGPT revolutionizes data interpretation and knowledge dissemination. Acknowledging limitations, we highlight the transformative power of this AI technology in enhancing data analysis and decision-making.
Greg Beaumont is a Data Architect at Microsoft; Greg is an expert in solving complex problems and creating value for customers. With a focus on the healthcare industry, Greg works closely with customers to plan enterprise analytics strategies, evaluate new tools and products, conduct training sessions and hackathons, and architect solutions that improve the quality of care and reduce costs. With years of experience in data architecture and a passion for innovation, Greg has a unique ability to identify and solve complex challenges. He is a trusted advisor to his customers and is always seeking new ways to drive progress and help organizations thrive. For more than 15 years, Greg has worked with healthcare customers who strive to improve patient outcomes and find opportunities for efficiencies. He is a veteran of the Microsoft data speaker network and has worked with hundreds of customers on their data management and analytics strategies.
You can follow Greg on his LinkedIn