Let's move on to the 3-cleaning_data.ipynb notebook for our discussion of data cleaning. We will begin by importing pandas and reading in the data/nyc_temperatures.csv file, which contains the daily maximum (TMAX), minimum (TMIN), and average (TAVG) temperatures from the LaGuardia Airport station in New York City for October 2018:
>>> import pandas as pd
>>> df = pd.read_csv('data/nyc_temperatures.csv')
>>> df.head()
The data we retrieved from the API is in long format; for our analysis, we want it in wide format, which we will address in the Pivoting DataFrames section later in this chapter:
| | attributes | datatype | date | station | value |
|---|---|---|---|---|---|
| 0 | H,,S, | TAVG | 2018-10-01T00:00:00 | GHCND:USW00014732 | 21.2 |
| 1 | ,,W,2400 | TMAX | 2018-10-01T00:00:00 | GHCND:USW00014732 | 25.6 |
| 2 | ,,W,2400 | TMIN... |
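
To make "long format" concrete, here is a minimal sketch that rebuilds only the two fully visible rows above from an in-memory string and checks how they are laid out. The `csv_text`, `sample`, and `rows_per_date` names are hypothetical, and the CSV text is a stand-in for illustration, not the actual contents of data/nyc_temperatures.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for data/nyc_temperatures.csv: only the two rows
# that are fully visible in the output above are reproduced here.
csv_text = '''attributes,datatype,date,station,value
"H,,S,",TAVG,2018-10-01T00:00:00,GHCND:USW00014732,21.2
",,W,2400",TMAX,2018-10-01T00:00:00,GHCND:USW00014732,25.6
'''

sample = pd.read_csv(io.StringIO(csv_text))

# In long format, the variable name lives in one column (datatype) and its
# measurement in another (value), so a single date spans several rows.
rows_per_date = sample.groupby('date')['datatype'].nunique()

print(sample.shape)
print(rows_per_date)
```

Each date appears once per measurement type here, which is the signature of long format; in wide format, TAVG and TMAX would instead be separate columns on a single row for that date.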