Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

Summarizing Data with OpenAI ChatGPT

Save for later
  • 4 min read
  • 02 Jun 2023

article-image

This article is an excerpt from the book, Machine Learning with Microsoft Power BI, by Greg Beaumont. This book is designed for data scientists and BI professionals seeking to improve their existing solutions and workloads using AI.

 

In the ever-expanding landscape of data analysis, the ability to summarize vast amounts of information concisely and accurately is invaluable. Enter ChatGPT, an advanced AI language model developed by OpenAI. In this article, we delve into the realm of data summarization with ChatGPT, exploring how this powerful tool can revolutionize the process of distilling complex datasets into concise and informative summaries.

Numerous databases feature free text fields that comprise entries from a diverse array of sources, including survey results, physician notes, feedback forms, and comments regarding incident reports for the FAA Wildlife Strike database that we have used in this book. These text entry fields represent a wide range of content, from structured data to unstructured data, making it challenging to extract meaning from them without the assistance of sophisticated natural language processing tools.

 

The Remarks field of the FAA Wildlife Strike database contains text that was presumably entered by people involved in filling out the incident form about an aircraft striking wildlife. A few examples of the remarks for recent entries are shown in Power BI in the following screenshot:

 

summarizing-data-with-openai-chatgpt-img-0

Figure 1 – Examples of Remarks from the FAA Wildlife Strike Database

 

You will notice that the remarks have a great deal of variability in the format of the content, the length of the content, and the acronyms that were used. Testing one of the entries by simply adding a statement at the beginning to “Summarize the following:” yields the following result:

 

summarizing-data-with-openai-chatgpt-img-1

Figure 2 – Summarizing the remarks for a single incident using ChatGPT

 

Summarizing data for a less detailed Remarks field yields the following results:

 

summarizing-data-with-openai-chatgpt-img-2

Figure 3 – Summarization of a sparsely populated results field

Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime

 

In order to obtain uniform summaries from the FAA Wildlife Strike data's Remarks field, one must consider entries that vary in robustness, sparsity, completeness of sentences, and the presence of acronyms and quick notes. The workshop accompanying this technical book is your chance to experiment with various data fields and explore diverse outcomes. Both the book and the Packt GitHub site will utilize a standardized format as input to a GPT model that can incorporate event data and produce a consistent summary for each row. An example of the format is as follows: 

 

Summarize the following in three sentences: A [Operator] [Aircraft] struck a [Species]. Remarks on the FAA report were: [Remarks].

 

Using data from an FAA Wildlife Strike Database event to test this approach in OpenAI ChatGPT is shown in the following screenshot:

 

summarizing-data-with-openai-chatgpt-img-3

Figure 4 – OpenAI ChatGPT testing a summarization of the remarks field

 

Next, you test another scenario that had more robust text in the Remarks field:

 

summarizing-data-with-openai-chatgpt-img-4

Figure 5 – Another scenario with robust remarks tested using OpenAI ChatGPT

 

Summary

This article explores how ChatGPT can revolutionize the process of condensing complex datasets into concise and informative summaries. By leveraging its powerful language generation capabilities, ChatGPT enables researchers, analysts, and decision-makers to quickly extract key insights and make informed decisions. Dive into the world of data summarization with ChatGPT and unlock new possibilities for efficient data analysis and knowledge extraction.

 

Author Bio:

Greg Beaumont is a Data Architect at Microsoft; Greg is an expert in solving complex problems and creating value for customers. With a focus on the healthcare industry, Greg works closely with customers to plan enterprise analytics strategies, evaluate new tools and products, conduct training sessions and hackathons, and architect solutions that improve the quality of care and reduce costs. With years of experience in data architecture and a passion for innovation, Greg has a unique ability to identify and solve complex challenges. He is a trusted advisor to his customers and is always seeking new ways to drive progress and help organizations thrive. For more than 15 years, Greg has worked with healthcare customers who strive to improve patient outcomes and find opportunities for efficiencies. He is a veteran of the Microsoft data speaker network and has worked with hundreds of customers on their data management and analytics strategies.