Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Power BI Machine Learning and OpenAI

You're reading from   Power BI Machine Learning and OpenAI Explore data through business intelligence, predictive analytics, and text generation

Arrow left icon
Product type Paperback
Published in May 2023
Publisher Packt
ISBN-13 9781837636150
Length 308 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Greg Beaumont Greg Beaumont
Author Profile Icon Greg Beaumont
Greg Beaumont
Arrow right icon
View More author details
Toc

Table of Contents (21) Chapters Close

Preface 1. Part 1: Data Exploration and Preparation
2. Chapter 1: Requirements, Data Modeling, and Planning FREE CHAPTER 3. Chapter 2: Preparing and Ingesting Data with Power Query 4. Chapter 3: Exploring Data Using Power BI and Creating a Semantic Model 5. Chapter 4: Model Data for Machine Learning in Power BI 6. Part 2: Artificial Intelligence and Machine Learning Visuals and Publishing to the Power BI Service
7. Chapter 5: Discovering Features Using Analytics and AI Visuals 8. Chapter 6: Discovering New Features Using R and Python Visuals 9. Chapter 7: Deploying Data Ingestion and Transformation Components to the Power BI Cloud Service 10. Part 3: Machine Learning in Power BI
11. Chapter 8: Building Machine Learning Models with Power BI 12. Chapter 9: Evaluating Trained and Tested ML Models 13. Chapter 10: Iterating Power BI ML models 14. Chapter 11: Applying Power BI ML Models 15. Part 4: Integrating OpenAI with Power BI
16. Chapter 12: Use Cases for OpenAI 17. Chapter 13: Using OpenAI and Azure OpenAI in Power BI Dataflows 18. Chapter 14: Project Review and Looking Forward 19. Index 20. Other Books You May Enjoy

Reviewing the source data

You begin your journey by digging into the source data that you will be using for your project. Let’s get started!

Accessing the data

The source data that you will be using for this book is real data from the United States FAA. The data contains reports filed when aircraft struck wildlife. There is a website providing details, documentation, updates, and access instructions at this URL: https://wildlife.faa.gov/home. The URL (and all URLs) will also be linked from the affiliated GitHub site at https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI in case changes are made after this book has been published.

If you’d prefer to follow along using the finished version of the content from this chapter rather than building it all step by step, you can download the PBIT version of the file at the Packt GitHub site folder for Chapter 1: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.

Within the FAA Wildlife Strike Database website, you can navigate to this link and run basic queries against the data and familiarize yourself with the content: https://wildlife.faa.gov/search. There are also two files linked from this page that you can reference while reviewing the source data.

At the time of this book’s writing, the second heading on the web page is titled Download the FAA Wildlife Strike Database, and it has a link titled Download that allows you to download the entire historical database along with a reference file. You can download the files from the FAA site for the purposes of this book. There will also be a Power BI PBIT file containing the results of the efforts of this chapter at the GitHub repository. A PBIT file is a Power BI template that can be populated with the files that you download from the Packt GitHub site. If the FAA data ever becomes unavailable, you can still proceed with the contents from the GitHub site to recreate the contents of every chapter.

The files you will be using from the FAA are as follows:

Exploring the FAA Wildlife Strike report data

The wildlife.accdb file is in an Access file format that can be opened with many different tools including Microsoft Access, Microsoft Excel, Power BI, and many more. For the purpose of this book, you will open it using Power BI Desktop. Power BI Desktop is available as a free download at this link: https://powerbi.microsoft.com/en-us/downloads/.

  1. First, open up Power BI Desktop. Once it is open on your desktop, select the Get data drop-down menu from the ribbon and click on More… as shown in the following screenshot:
Figure 1.1 – Connecting to data with Power BI Desktop

Figure 1.1 – Connecting to data with Power BI Desktop

  1. Next, within the Get data window, select Access database and click Connect:
Figure 1.2 – Access database connector in Power BI

Figure 1.2 – Access database connector in Power BI

  1. Select the Access database file that was downloaded and unzipped from the FAA Wildlife Strike Database, named wildlife:
Figure 1.3 – The wildlife file shows up in Power BI

Figure 1.3 – The wildlife file shows up in Power BI

  1. Select the STRIKE_REPORTS table and click Transform Data:
Figure 1.4 – Preview of the data before making transformations

Figure 1.4 – Preview of the data before making transformations

  1. The Power Query window will open in Power BI Desktop with a preview of the FAA Wildlife Strike data. On the ribbon, select the View header for Data Preview, and then check the boxes for Column quality, Column distribution, and Column profile. These features will provide some insights for the data preview, that helps you explore and understand the data:
Figure 1.5 – Data Preview features in Power Query

Figure 1.5 – Data Preview features in Power Query

In Figure 1.5, notice that the first column, INDEX_NR, is highlighted. You can see that none of the values are empty, none have errors, and in Column statistics at the bottom of Figure 1.6, every value is a unique integer. The name INDEX_NR gives it away, but this column is the unique identifier for each row of data.

Let’s review another column in Power Query. Go ahead and highlight TIME_OF_DAY. As you can see in Figure 1.6, there are four distinct values and about 12% are blank. Blank values are an important consideration for this solution. Non-empty values include terms such as Day, Dawn, Dusk, and Night. What does an empty value mean? Was the field left blank by the person filing the report? Was it not entered properly into the system? You’ll revisit this topic later in the book.

Figure 1.6 – Column statistics help with understanding data

Figure 1.6 – Column statistics help with understanding data

Since there are over 100 columns in the FAA Wildlife Strike reports’ data, we won’t discuss all of them in this chapter. That being said, reviewing each and every column would be a great way to review the data for errors, empty fields, distribution of values, and more. For the purposes of this chapter, go ahead and open up the read_me.xls file that was included with the ZIP file from the FAA. The first sheet is Column Name and contains the names and descriptive data about the columns in the wildlife.accdb file. Most of the columns fall into one of the following categories:

  • Date and time fields detailing the dates, times, and years for different events related to each report
  • Descriptive information about the event such as height of contact, latitudes and longitudes, originating airports, and flight numbers
  • Descriptive information about the aircraft such as ownership, aircraft type and manufacturer, number of engines, location of engines, and so on
  • Estimates of the damage due to the strike such as costs, costs adjusted for inflation, damage location on the aircraft, and more
  • Information about the wildlife struck by the aircraft including species, size, quantities hit, and so on

Once you’ve finished browsing the report data, close the read_me.xlsx document on your desktop, and then connect to it from Power BI per the following steps. The document version used in this book can be downloaded from the Packt GitHub site here: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.

  1. Click on Excel Workbook in the left-hand panel:
Figure 1.7 – Excel Workbook is a new source of data

Figure 1.7 – Excel Workbook is a new source of data

  1. Select the read_me file from the browser and click Open:
Figure 1.8 – Excel file ready to open in Power Query

Figure 1.8 – Excel file ready to open in Power Query

  1. Tick the Aircraft Type, Engine Codes, and Engine Position boxes. Then, click OK.
Figure 1.9 – Sheets in Power Query can be individually selected

Figure 1.9 – Sheets in Power Query can be individually selected

After clicking OK and importing the three sheets, notice that Aircraft Type, Engine Codes, and Engine Position are now available in Power Query as three separate tables of data:

Figure 1.10 – Three new tables are previewed in Power Query

Figure 1.10 – Three new tables are previewed in Power Query

The three tables contain descriptive information about values that exist in the FAA Wildlife Strike reports’ data:

  • Aircraft Type: A table that maps the Aircraft Code to a description such as Airplane, Helicopter, or Glider
  • Engine Codes: Information about engine manufacturer and model numbers
  • Engine Position: Details about the location of an engine on the aircraft

For all three of these tables, you’ll notice that there are some unnecessary rows and blank values. You will address these later in the book, so there is no need to make any modifications in Power Query at this time.

Once you’ve browsed the different columns from all the different tables in Power Query, click Close & Apply to import the data into Power BI and save it, per the following diagram:

Figure 1.11 – The Close & Apply button will import data into Power BI

Figure 1.11 – The Close & Apply button will import data into Power BI

Once the data is imported, you can save your Power BI Desktop file as a .pbix file. A copy of the PBIT file named Chapter 1 Template.pbit, which can be populated with the data and then saved as a PBIX, can be found at this GitHub link: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.

You have been reading a chapter from
Power BI Machine Learning and OpenAI
Published in: May 2023
Publisher: Packt
ISBN-13: 9781837636150
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image