
Getting Started with ChatGPT Advanced Data Analysis - Part 2

  • 08 Nov 2023


Introduction

ChatGPT Advanced Data Analysis is a tool that can significantly speed up the analysis and processing of data and files. In the first part of this post, we showed how to use this feature to generate a CSV file containing randomly generated values, and how to use different prompts to process data stored in files and generate visualizations. In this second part, we’ll build on what we have learned and work through more complex examples and scenarios.

If you’re looking for the link to the first part, here it is: Getting Started with ChatGPT Advanced Data Analysis - Part 1

In this post, we will cover the Example 03, Example 04, and Current Limitations sections:

  • Example 01: Generating a CSV file (discussed in Part 1)
  • Example 02: Analyzing an uploaded CSV file, transforming the data, and generating charts (discussed in Part 1)
  • Example 03: Processing and analyzing an IPython notebook file
  • Example 04: Processing and analyzing the contents of a ZIP file
  • Current Limitations

While working on the hands-on examples in this post, you’ll see that ChatGPT Advanced Data Analysis can process and analyze various types of files, such as Jupyter Notebook (.ipynb) files and even ZIP files containing multiple files. As you explore its features, you’ll find that it is a valuable addition to your data analysis toolkit, one that can help you optimize your workflow and accelerate data-driven decision-making.

Without further ado, let’s begin!

Example 03: Processing and analyzing an IPython notebook file

In this third example, we will process and analyze an IPython notebook file containing blocks of code for deploying machine learning models. Here, we’ll use an existing notebook file I prepared while writing my first book, “Machine Learning with Amazon SageMaker Cookbook”. If you’re wondering what an IPython notebook file is, it’s essentially a document produced by the Jupyter Notebook app that contains both computer code (such as Python) and rich text elements (paragraphs, equations, figures, links, and so on). A notebook is both a human-readable document containing the analysis description and its results (such as visualizations) and an executable document whose code can be run to perform data analysis. The format is popular among data scientists and researchers for creating and sharing documents that combine live code, equations, visualizations, and narrative text.
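To make the description above more concrete: on disk, an .ipynb file is a JSON document whose top-level "cells" list holds the markdown and code cells. The following is a minimal sketch of reading that structure with the Python standard library. The sample notebook and the summarize_notebook helper are made up for illustration and are not taken from the cookbook.

```python
import json

# A tiny hand-written notebook in the nbformat 4 JSON layout (illustrative only).
SAMPLE_NOTEBOOK = """
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Demo notebook"]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["x = 1\\n", "print(x)"]}
  ]
}
"""


def summarize_notebook(raw_json):
    """Return (cell_type, source_text) pairs for every cell in the notebook."""
    notebook = json.loads(raw_json)
    summary = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # "source" may be stored as a list of lines or as a single string.
        if isinstance(source, list):
            source = "".join(source)
        summary.append((cell["cell_type"], source))
    return summary


for cell_type, text in summarize_notebook(SAMPLE_NOTEBOOK):
    print(f"[{cell_type}] {text!r}")
```

Because the file is plain JSON, a tool such as ChatGPT Advanced Data Analysis can load it and inspect each cell without running any of the code it contains.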

Now that we have a better idea of what IPython notebook (.ipynb) files are, let’s proceed with our third example:

STEP # 01: Open a new browser tab. Navigate to the following link and download the .ipynb file to your local machine by clicking Download raw file:

https://github.com/PacktPublishing/Machine-Learning-with-Amazon-SageMaker-Cookbook/blob/master/Chapter09/03%20-%20Hosting%20multiple%20models%20with%20multi-model%20endpoints.ipynb


Image 17 — Downloading the IPython Notebook .ipynb file

To download the file, simply click the download button highlighted in Image 17. This should download the .ipynb file to your local machine.

STEP # 02: Navigate back to the browser tab where you have your ChatGPT session open and create a new chat session by clicking + New Chat. Make sure to select Advanced Data Analysis under the list of options available under GPT-4:


Image 18 — Using Advanced Data Analysis

STEP # 03: Upload the .ipynb file from your local machine to the new chat session and then run the following prompt:

What's the ML instance type used in the example?

This should yield the following response:


Image 19 — Analyzing the uploaded file and identifying what ML instance type is used in the example

If you’re wondering what a machine learning (ML) instance is, you can think of it as a server or computer running specific machine learning workloads (such as training and serving machine learning models). Because running ML instances can be expensive, it’s a good idea to estimate what they will cost!

STEP # 04: Next, run the following prompt to locate the block of code where the ML instance type is mentioned or used:

Print the code block(s) where this ML instance type is used

Image 20 — Locating the block of code where the ML instance type is used

Cool, right? Here, we can see that ChatGPT can help us identify blocks of code using the right set of prompts. Make sure to verify the results and files produced by ChatGPT as you might find discrepancies or errors.
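One way to cross-check ChatGPT's answer is to search the notebook's code cells yourself. The sketch below assumes the notebook has already been loaded into a Python dict (for example with json.load) and uses a heuristic regular expression for names of the form ml.<family>.<size>. The pattern, the helper names, and the demo notebook are assumptions made for illustration; the demo content is not the cookbook's actual code.

```python
import re

# Heuristic pattern for SageMaker-style names such as "ml.t2.medium" or
# "ml.g5.2xlarge": the prefix "ml.", a family like "t2" or "g5", then a size.
INSTANCE_TYPE_PATTERN = re.compile(r"\bml\.[a-z]+\d[a-z0-9-]*\.[a-z0-9]+\b")


def cell_source(cell):
    """Return a cell's source as one string (nbformat stores a list or a string)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def find_instance_type_cells(notebook):
    """Return (cell_index, instance_types, source) for matching code cells."""
    matches = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown cells may mention instance types but do not use them
        source = cell_source(cell)
        instance_types = sorted(set(INSTANCE_TYPE_PATTERN.findall(source)))
        if instance_types:
            matches.append((index, instance_types, source))
    return matches


# Made-up notebook content for illustration; not the cookbook's actual code.
demo_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["Deploy on ml.t2.medium"]},
        {"cell_type": "code", "source": ["import json\n"]},
        {"cell_type": "code",
         "source": ['instance_type = "ml.t2.medium"\n', "count = 1\n"]},
    ]
}

for index, instance_types, source in find_instance_type_cells(demo_notebook):
    print(f"Cell {index} uses {instance_types}:\n{source}")
```

If the cells found this way differ from the ones ChatGPT printed, that is a signal to look more closely at the response.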

STEP # 05: Run the following prompt to update the ML instance used in the previous block of code:

Update the code block and use an ml.g5.2xlarge instead.

Image 21 — Using ChatGPT to perform code modification instructions

Here, we can see that ChatGPT can carry out code modification instructions as well. Note that ChatGPT is not limited to replacing portions of existing code blocks: it can also read and explain code, and generate code from scratch.
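For comparison, the same substitution can be done deterministically on the notebook itself. This is a minimal sketch, assuming the notebook is already loaded as a dict; replace_instance_type and the demo notebook are hypothetical names and content introduced here, and the plain string replacement is a simplification rather than what ChatGPT does internally.

```python
import copy


def replace_instance_type(notebook, old, new):
    """Return (updated_copy, changed_cell_count) with `old` replaced by `new` in code cells."""
    updated = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    changed = 0
    for cell in updated.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            new_source = [line.replace(old, new) for line in source]
        else:
            new_source = source.replace(old, new)
        if new_source != source:
            cell["source"] = new_source
            changed += 1
    return updated, changed


# Made-up notebook content for illustration; not the cookbook's actual code.
demo_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["Deploy on ml.t2.medium"]},
        {"cell_type": "code", "source": ['instance_type = "ml.t2.medium"\n']},
    ]
}

updated, changed = replace_instance_type(demo_notebook, "ml.t2.medium", "ml.g5.2xlarge")
print(f"{changed} code cell(s) updated")
print("".join(updated["cells"][1]["source"]))
```

Only code cells are modified here, so markdown text that mentions the old instance type stays as it was and may need a separate review.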

STEP # 06: Run the following prompt to generate a chart comparing the estimated cost per month when running ml.t2.medium and ml.g5.2xlarge inference endpoint instances:

Generate a chart comparing the estimated cost per month when running an ml.t2.medium vs an ml.g5.2xlarge SageMaker inference endpoint instance

Image 22 — Estimated monthly cost comparison of running an ml.t2.medium instance vs ml.g5.2xlarge instance

Make sure to always verify the results and files produced by ChatGPT as you might find discrepancies or errors.
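One way to sanity-check a chart like this is to redo the arithmetic: the estimated monthly cost is the hourly price multiplied by the number of hours in a month and the number of instances. The sketch below assumes a 30-day month of continuous use and uses PLACEHOLDER hourly prices that are not real AWS pricing; substitute the current values for your region. It prints a plain-text bar chart instead of a graphical one to avoid third-party plotting libraries.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: the endpoint runs non-stop for 30 days

# PLACEHOLDER hourly prices in USD, for illustration only. Look up the real
# values on the Amazon SageMaker pricing page for your region.
PLACEHOLDER_HOURLY_PRICE_USD = {
    "ml.t2.medium": 0.05,
    "ml.g5.2xlarge": 1.50,
}


def estimated_monthly_cost(hourly_price_usd, instance_count=1):
    """Estimated monthly cost = hourly price x hours per month x instances."""
    return hourly_price_usd * HOURS_PER_MONTH * instance_count


def text_bar_chart(costs, width=40):
    """Render {label: cost} as rows of '#' scaled to the largest (positive) value."""
    largest = max(costs.values())
    lines = []
    for label, cost in costs.items():
        bar = "#" * max(1, round(cost / largest * width))
        lines.append(f"{label:<15} {bar} ${cost:,.2f}")
    return "\n".join(lines)


monthly_costs = {
    name: estimated_monthly_cost(price)
    for name, price in PLACEHOLDER_HOURLY_PRICE_USD.items()
}
print(text_bar_chart(monthly_costs))
```

If the ratio between the two bars in ChatGPT's chart is very different from the ratio of the published hourly prices, the model has probably assumed outdated or incorrect prices.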

Now, let’s proceed with our final example.

Example 04: Processing and analyzing the contents of a ZIP file

In this final example, we will compare the estimated cost per month of running the ML instances in each of the chapters of my book “Machine Learning with Amazon SageMaker Cookbook”. Of course, the assumption in this example is that we’ll be running the ML instances for an entire month. In reality, we’ll only be running these examples for a few seconds (to at most a few minutes).

STEP # 01: Navigate to the following link:

https://github.com/PacktPublishing/Machine-Learning-with-Amazon-SageMaker-Cookbook

STEP # 02: Click Code and then click Download ZIP.


Image 23 — Downloading the ZIP file containing the files of the repository

This will download a ZIP file containing all the files inside the repository to your local machine.

STEP # 03: Create a new chat session by clicking + New Chat. Make sure to select Advanced Data Analysis under the list of options available under GPT-4:


Image 24 — Choosing Advanced Data Analysis

STEP # 04: Upload the ZIP file downloaded in STEP # 02 to the new chat session (using the + button). Enter the following prompt to compare the estimated cost per month associated with running each of the examples per chapter:

Analyze the contents of the ZIP file and perform the following:

- for each of the directories, identify the ML instance types used
- compare the estimated cost per month associated to running the ML instance types in the examples stored in each directory
- group the estimated cost per month per chapter directory

This should process the contents of the ZIP file we uploaded and yield the following response:


Image 25 — Comparing the estimated cost per month per chapter

Wow, that seems expensive! Of course, we will NOT be running these resources for an entire month! In my first book “Machine Learning with Amazon SageMaker Cookbook”, the resources in each of the examples and recipes are only run for a few seconds to at most a few minutes and deleted almost right away. Since we only pay for what we use in Amazon Web Services (AWS), it should only cost a few dollars to complete all the examples in the book.

Note that this example can be further improved by uploading a spreadsheet with the actual price per hour of each of these instances. It is also important to note that other cost factors are not taken into account here, as only the cost of running the instances is included. A fuller estimate would also cover the storage volumes attached to the instances, as well as the estimated charges for other cloud services and resources used in the account.
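The improvement described above can be sketched as follows: load a price sheet, then compute a per-chapter estimate that also includes a simple storage term. The CSV column names (instance_type, price_per_hour_usd), the prices, the storage figures, and the chapter-to-instance mapping are all hypothetical placeholders, and the calculation assumes one instance per listed type running for a 30-day month.

```python
import csv
import io

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month of continuous use


def load_hourly_prices(csv_text):
    """Parse a price sheet with hypothetical columns instance_type, price_per_hour_usd."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["instance_type"]: float(row["price_per_hour_usd"]) for row in reader}


def monthly_cost_per_chapter(chapter_instances, hourly_prices,
                             storage_gb=0.0, storage_price_per_gb_month=0.0):
    """Return ({chapter: monthly_usd}, unpriced_types), one instance per listed type."""
    costs, unpriced = {}, set()
    storage_cost = storage_gb * storage_price_per_gb_month  # per instance
    for chapter, instance_types in chapter_instances.items():
        total = 0.0
        for instance_type in instance_types:
            if instance_type not in hourly_prices:
                unpriced.add(instance_type)  # report rather than guess a price
                continue
            total += hourly_prices[instance_type] * HOURS_PER_MONTH + storage_cost
        costs[chapter] = round(total, 2)
    return costs, sorted(unpriced)


# PLACEHOLDER data for illustration only; not real AWS pricing or book contents.
PLACEHOLDER_PRICE_SHEET = """instance_type,price_per_hour_usd
ml.t2.medium,0.05
ml.g5.2xlarge,1.50
"""
demo_chapters = {
    "Chapter01": ["ml.t2.medium"],
    "Chapter09": ["ml.g5.2xlarge", "ml.m5.xlarge"],
}

prices = load_hourly_prices(PLACEHOLDER_PRICE_SHEET)
costs, unpriced = monthly_cost_per_chapter(
    demo_chapters, prices, storage_gb=10, storage_price_per_gb_month=0.10
)
print(costs)
print("No price found for:", unpriced)
```

Instance types missing from the sheet are reported instead of being silently priced at zero, which makes gaps in the spreadsheet easy to spot.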

STEP # 05: Finally, run the following prompt to compare the estimated monthly cost per chapter when running the examples of the book:

Generate a bar chart comparing the estimated monthly cost per chapter

This should yield the following response:


Image 26 — Bar chart comparing the estimated monthly cost for running the ML instance types per chapter

Cool, right? Here, we can see a bar chart that helps us compare the estimated monthly cost of running the examples in each chapter. Again, this is just for demonstration purposes, as we will only be running the ML instances for a few seconds to at most a few minutes, so the actual cost would be a tiny fraction of the monthly estimate. Keep in mind that only the cost of running the instances is included, and other cost factors are not taken into account. Given that we’re just demonstrating the power of ChatGPT Advanced Data Analysis in this post, this simplified example should do the trick! Finally, make sure to always verify the results and files produced by ChatGPT as you might find discrepancies or errors.

Current Limitations

Before we end this tutorial, it is essential that we mention some of the current limitations (as of writing) of ChatGPT Advanced Data Analysis. First, there is a file size limit: users can only upload files of up to 500 MB each. This could affect those trying to analyze large datasets, since they’ll be forced to divide larger files into smaller portions. Second, ChatGPT retains uploaded files only during the active conversation and for an additional three hours after the conversation has been paused. After that, the files are automatically deleted, and users need to re-upload them to continue the analysis. Finally, we need to be aware that the instructions are executed inside a sandboxed environment, which means that we are currently unable to use external integrations or perform real-time searches. Given that ChatGPT Advanced Data Analysis is still an experimental feature (that is, in Beta mode), there may still be a few limitations and issues being resolved behind the scenes. Of course, by the time you read this post, it may no longer be in Beta!
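For the file size limit, dividing a large CSV file into smaller portions can be scripted. The sketch below splits a CSV into parts that each repeat the header row and stay under a byte limit. It assumes that no record spans multiple lines (that is, there are no quoted fields containing newlines) and interprets the 500 MB limit quoted above as 500,000,000 bytes; split_csv and the demo file are hypothetical.

```python
import tempfile
from pathlib import Path

UPLOAD_LIMIT_BYTES = 500 * 1000 * 1000  # the 500 MB per-file limit quoted above


def split_csv(source_path, output_dir, max_bytes=UPLOAD_LIMIT_BYTES):
    """Split a CSV into numbered parts, each with the header, each at most max_bytes.

    A part can exceed max_bytes only if a single row plus the header is larger.
    """
    source_path = Path(source_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    written = 0
    with source_path.open("rb") as source:
        header = source.readline()
        for line in source:
            # Start a new part when this row would push the current one over the limit.
            if out is None or written + len(line) > max_bytes:
                if out is not None:
                    out.close()
                part_path = output_dir / f"{source_path.stem}_part{len(parts) + 1:03d}.csv"
                out = part_path.open("wb")
                out.write(header)
                written = len(header)
                parts.append(part_path)
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return parts


# Small demonstration with a tiny limit so that the split is visible.
with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "sample.csv"
    sample.write_bytes(b"id,value\n" + b"".join(b"%d,x\n" % i for i in range(1, 7)))
    for part in split_csv(sample, Path(workdir) / "parts", max_bytes=20):
        print(part.name, part.stat().st_size, "bytes")
```

Each part can then be uploaded and analyzed separately, keeping in mind that aggregate statistics need to be combined across the parts.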

That’s pretty much it. At this point, you should have a good idea of what you can accomplish using ChatGPT Advanced Data Analysis. Feel free to try different prompts and experiment with various scenarios to help you discover innovative ways to visualize and interpret your data for better decision-making.

Author Bio

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of three Australian-owned companies and as the Director for Software Development and Engineering for multiple e-commerce startups. Years ago, he and his team won 1st place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies on machine learning, engineering, security, and management. He is the author of the books "Machine Learning with Amazon SageMaker Cookbook", "Machine Learning Engineering on AWS", and "Building and Automating Penetration Testing Labs in the Cloud". For his track record in leading digital transformation within organizations, he was recognized as one of the Orange Boomerang: Digital Leader of the Year 2023 award winners.