Performing exploratory data analysis in Python
Before you can clean your data, you need to know what your data looks like. As a data engineer, you are not the domain expert and are not the end user of the data, but you should know what the data will be used for and what valid data would look like. For example, you do not need to be a demographer to know that an age field should not be negative and that values over 100 should be rare.
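The age example can be turned into a quick sanity check. The following is a minimal sketch only: the ages list is made-up sample data, and the 1% threshold for what counts as a low frequency is an assumption you would tune for your own dataset.

```python
# Hypothetical sample of an "age" field; not taken from any real dataset.
ages = [34, 27, -3, 45, 102, 61, 19, 130, 52, 38]

# Rule 1: an age can never be negative.
negative = [a for a in ages if a < 0]

# Rule 2: ages over 100 are possible but should be rare.
over_100 = [a for a in ages if a > 100]
share_over_100 = len(over_100) / len(ages)

# Assumed threshold for "rare"; adjust it to the data at hand.
MAX_SHARE_OVER_100 = 0.01

print("negative ages:", negative)
print("ages over 100:", over_100)
print(f"share over 100: {share_over_100:.0%}")

if negative or share_over_100 > MAX_SHARE_OVER_100:
    print("age field needs a closer look")
```

Neither rule requires domain expertise. Both only encode what a valid value could plausibly be, which is the level of knowledge a data engineer is expected to bring.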
Downloading the data
In this chapter, you will use real e-scooter data from the City of Albuquerque. The data contains trips taken using e-scooters from May to July 22, 2019. You will need to download the e-scooter data from https://github.com/PaulCrickard/escooter/blob/master/scooter.csv. The repository also contains the original Excel file as well as some other summary files provided by the City of Albuquerque.
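If you prefer to fetch the file from a script rather than the browser, something like the following may work. It is a sketch under assumptions: the URL above points at GitHub's HTML page for the file, so the code rewrites it to the raw-file form using GitHub's usual raw.githubusercontent.com pattern. The helper names to_raw_url and download_csv are hypothetical and are not part of any library.

```python
import urllib.request

# URL given in the text; it points at GitHub's HTML page for the file.
BLOB_URL = "https://github.com/PaulCrickard/escooter/blob/master/scooter.csv"


def to_raw_url(blob_url: str) -> str:
    """Turn a github.com 'blob' URL into the matching raw-file URL."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    # Split "owner/repo/blob/branch/path" into "owner/repo" and "branch/path".
    owner_repo, path = blob_url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{path}"


def download_csv(url: str, dest: str = "scooter.csv") -> str:
    """Save the file at `url` to `dest` and return the local path."""
    urllib.request.urlretrieve(url, dest)
    return dest


raw_url = to_raw_url(BLOB_URL)
print(raw_url)
# download_csv(raw_url)  # uncomment to fetch the file (needs network access)
```

The download call is left commented out so the snippet can run without a network connection. Saving the blob URL directly would give you an HTML page rather than the CSV, which is why the rewrite step is included.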
Basic data exploration
Before you can clean your data, you have to know what your data looks like. The process of understanding...