A mini project on AWS Data Lake
In this section, we will build a new data lake from scratch using the AWS Data Lake solution. We will first review the business use case; then we will build the cluster and ingest and process the data; finally, we will analyze the results using QuickSight.
Mini use case business context
For this mini use case, we are going to analyze air quality data from various states in the USA and see whether there is any relationship between population trends and air quality over time. Let's review the source datasets for this project.
Air quality index
The Environmental Protection Agency (EPA) calculates the air quality index based on the concentration of pollutants. The following are the key measures tracked to determine air quality:
Ozone (O3)
Carbon monoxide (CO)
Sulfur dioxide (SO2)
Nitrogen dioxide (NO2)
Inhalable particulates (PM10)
Fine particulates (PM2.5)
Lead (Pb)
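As a rough illustration of how an index can be derived from pollutant concentrations, the following sketch maps each concentration onto an index scale by linear interpolation between two breakpoints and then reports the highest per-pollutant value as the overall index. The function names (sub_index, overall_aqi) and every number below are hypothetical placeholders chosen for easy arithmetic; they are not the EPA's published breakpoints, which should be taken from the EPA's own documentation.

```python
def sub_index(conc, bp_lo, bp_hi, i_lo, i_hi):
    """Linearly interpolate a concentration onto the index scale.

    bp_lo/bp_hi are the concentration breakpoints that bracket `conc`;
    i_lo/i_hi are the index values assigned to those breakpoints.
    """
    return (i_hi - i_lo) / (bp_hi - bp_lo) * (conc - bp_lo) + i_lo


def overall_aqi(sub_indices):
    """Return (pollutant, index) for the pollutant with the highest sub-index."""
    pollutant = max(sub_indices, key=sub_indices.get)
    return pollutant, round(sub_indices[pollutant])


# Placeholder concentrations and breakpoints (NOT official EPA values).
readings = {
    "pollutant_a": sub_index(conc=28, bp_lo=20, bp_hi=40, i_lo=50, i_hi=100),
    "pollutant_b": sub_index(conc=5, bp_lo=0, bp_hi=10, i_lo=0, i_hi=50),
}

dominant, aqi = overall_aqi(readings)
print(dominant, aqi)  # pollutant_a 70
```

With these placeholder inputs, pollutant_a yields a sub-index of 70 and pollutant_b yields 25, so the overall index is driven by pollutant_a.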
To make analysis easier, the EPA tracks the number of days in a year with good air quality for every major city in the USA...