Reviewing the source data
You begin your journey by digging into the source data that you will be using for your project. Let’s get started!
Accessing the data
The source data that you will be using for this book is real data from the United States FAA. The data contains reports filed when aircraft struck wildlife. There is a website providing details, documentation, updates, and access instructions at this URL: https://wildlife.faa.gov/home. The URL (and all URLs) will also be linked from the affiliated GitHub site at https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI in case changes are made after this book has been published.
If you’d prefer to follow along using the finished version of the content from this chapter rather than building it all step by step, you can download the PBIT version of the file at the Packt GitHub site folder for Chapter 1: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.
Within the FAA Wildlife Strike Database website, you can navigate to this link and run basic queries against the data and familiarize yourself with the content: https://wildlife.faa.gov/search. There are also two files linked from this page that you can reference while reviewing the source data.
At the time of this book’s writing, the second heading on the web page is titled Download the FAA Wildlife Strike Database, and it has a link titled Download that allows you to download the entire historical database along with a reference file. You can download the files from the FAA site for the purposes of this book. There will also be a Power BI PBIT file containing the results of the efforts of this chapter at the GitHub repository. A PBIT file is a Power BI template that can be populated with the files that you download from the Packt GitHub site. If the FAA data ever becomes unavailable, you can still proceed with the contents from the GitHub site to recreate the contents of every chapter.
The files you will be using from the FAA are as follows:
wildlife.accdb
: This contains all of the historical FAA Wildlife Strike reports. You can also download a copy of the file that is identical to the book from the Packt GitHub site: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.read_me.xls
: This contains descriptive information about the data in thewildlife.accdb
database file. An.xlsx
version of the file is available on the Packt GitHub site, too.
Exploring the FAA Wildlife Strike report data
The wildlife.accdb
file is in an Access file format that can be opened with many different tools including Microsoft Access, Microsoft Excel, Power BI, and many more. For the purpose of this book, you will open it using Power BI Desktop. Power BI Desktop is available as a free download at this link: https://powerbi.microsoft.com/en-us/downloads/.
- First, open up Power BI Desktop. Once it is open on your desktop, select the Get data drop-down menu from the ribbon and click on More… as shown in the following screenshot:
Figure 1.1 – Connecting to data with Power BI Desktop
- Next, within the Get data window, select Access database and click Connect:
Figure 1.2 – Access database connector in Power BI
- Select the Access database file that was downloaded and unzipped from the FAA Wildlife Strike Database, named wildlife:
Figure 1.3 – The wildlife file shows up in Power BI
- Select the STRIKE_REPORTS table and click Transform Data:
Figure 1.4 – Preview of the data before making transformations
- The Power Query window will open in Power BI Desktop with a preview of the FAA Wildlife Strike data. On the ribbon, select the View header for Data Preview, and then check the boxes for Column quality, Column distribution, and Column profile. These features will provide some insights for the data preview, that helps you explore and understand the data:
Figure 1.5 – Data Preview features in Power Query
In Figure 1.5, notice that the first column, INDEX_NR, is highlighted. You can see that none of the values are empty, none have errors, and in Column statistics at the bottom of Figure 1.6, every value is a unique integer. The name INDEX_NR gives it away, but this column is the unique identifier for each row of data.
Let’s review another column in Power Query. Go ahead and highlight TIME_OF_DAY. As you can see in Figure 1.6, there are four distinct values and about 12% are blank. Blank values are an important consideration for this solution. Non-empty values include terms such as Day, Dawn, Dusk, and Night. What does an empty value mean? Was the field left blank by the person filing the report? Was it not entered properly into the system? You’ll revisit this topic later in the book.
Figure 1.6 – Column statistics help with understanding data
Since there are over 100 columns in the FAA Wildlife Strike reports’ data, we won’t discuss all of them in this chapter. That being said, reviewing each and every column would be a great way to review the data for errors, empty fields, distribution of values, and more. For the purposes of this chapter, go ahead and open up the read_me.xls
file that was included with the ZIP file from the FAA. The first sheet is Column Name and contains the names and descriptive data about the columns in the wildlife.accdb
file. Most of the columns fall into one of the following categories:
- Date and time fields detailing the dates, times, and years for different events related to each report
- Descriptive information about the event such as height of contact, latitudes and longitudes, originating airports, and flight numbers
- Descriptive information about the aircraft such as ownership, aircraft type and manufacturer, number of engines, location of engines, and so on
- Estimates of the damage due to the strike such as costs, costs adjusted for inflation, damage location on the aircraft, and more
- Information about the wildlife struck by the aircraft including species, size, quantities hit, and so on
Once you’ve finished browsing the report data, close the read_me.xlsx
document on your desktop, and then connect to it from Power BI per the following steps. The document version used in this book can be downloaded from the Packt GitHub site here: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.
- Click on Excel Workbook in the left-hand panel:
Figure 1.7 – Excel Workbook is a new source of data
- Select the read_me file from the browser and click Open:
Figure 1.8 – Excel file ready to open in Power Query
- Tick the Aircraft Type, Engine Codes, and Engine Position boxes. Then, click OK.
Figure 1.9 – Sheets in Power Query can be individually selected
After clicking OK and importing the three sheets, notice that Aircraft Type, Engine Codes, and Engine Position are now available in Power Query as three separate tables of data:
Figure 1.10 – Three new tables are previewed in Power Query
The three tables contain descriptive information about values that exist in the FAA Wildlife Strike reports’ data:
- Aircraft Type: A table that maps the Aircraft Code to a description such as Airplane, Helicopter, or Glider
- Engine Codes: Information about engine manufacturer and model numbers
- Engine Position: Details about the location of an engine on the aircraft
For all three of these tables, you’ll notice that there are some unnecessary rows and blank values. You will address these later in the book, so there is no need to make any modifications in Power Query at this time.
Once you’ve browsed the different columns from all the different tables in Power Query, click Close & Apply to import the data into Power BI and save it, per the following diagram:
Figure 1.11 – The Close & Apply button will import data into Power BI
Once the data is imported, you can save your Power BI Desktop file as a .pbix
file. A copy of the PBIT file named Chapter 1 Template.pbit
, which can be populated with the data and then saved as a PBIX, can be found at this GitHub link: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.