Cleaning data
Let's move on to the 3-cleaning_data.ipynb notebook for our discussion of data cleaning. As usual, we will begin by importing pandas and reading in our data. For this section, we will be using the nyc_temperatures.csv file, which contains the maximum daily temperature (TMAX), minimum daily temperature (TMIN), and the average daily temperature (TAVG) from the LaGuardia Airport station in New York City for October 2018:
>>> import pandas as pd
>>> df = pd.read_csv('data/nyc_temperatures.csv')
>>> df.head()
We retrieved long-format data from the API; for our analysis, we want wide-format data, but we will address that in the Pivoting DataFrames section, later in this chapter.
For now, we will focus on making little tweaks to the data that will make it easier for us to use: renaming columns, converting each column into the most appropriate data type, sorting...
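As a rough preview of these tweaks, here is a minimal sketch on a small stand-in DataFrame. The column names (date, datatype, value), the new name temp_C, and the temperature readings are assumptions made up for illustration; they are not taken from nyc_temperatures.csv, so adjust them to match the actual file:

```python
import pandas as pd

# Hypothetical long-format sample: column names and values are illustrative
# assumptions, not the contents of nyc_temperatures.csv.
df = pd.DataFrame({
    'date': ['2018-10-02T00:00:00', '2018-10-01T00:00:00', '2018-10-01T00:00:00'],
    'datatype': ['TMAX', 'TMIN', 'TMAX'],
    'value': ['24.4', '17.2', '25.6'],  # stored as strings on purpose
})

# Rename a column to something more descriptive (temp_C is a made-up name).
df = df.rename(columns={'value': 'temp_C'})

# Convert each column to a more appropriate data type.
df = df.assign(
    date=pd.to_datetime(df['date']),    # strings -> datetime64
    temp_C=df['temp_C'].astype(float),  # strings -> float
)

# Sort by date, then by measurement type, and tidy up the index.
df = df.sort_values(['date', 'datatype']).reset_index(drop=True)

print(df.dtypes)
print(df)
```

Each step returns a new DataFrame rather than modifying the original in place, which makes it easy to rerun a single step while experimenting in the notebook.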