Chapter 02: Statistical Visualizations Using Matplotlib and Seaborn
Activity 4: Line Graphs with the Object-Oriented API and Pandas DataFrames
Import the required libraries in the Jupyter notebook and read the dataset from the Auto-MPG dataset repository:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
df = pd.read_csv(url)
Provide the column names to simplify the dataset, as illustrated here:
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin', 'name']
Now read the new dataset with column names and display it:
df = pd.read_csv(url, names=column_names, delim_whitespace=True)
df.head()
The output is as follows:
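The parsing step can be tried without a network connection. The following is a minimal sketch, assuming two made-up rows laid out like the lines of auto-mpg.data (they are not real records). It uses sep=r'\s+', which the pandas documentation describes as equivalent to delim_whitespace=True, because newer pandas releases may warn about the latter:

```python
import io
import pandas as pd

column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
                'acceleration', 'year', 'origin', 'name']

# Hypothetical rows in the same whitespace-delimited layout (not real records).
sample = io.StringIO(
    '18.0 8 307.0 130.0 3504. 12.0 70 1 "example car a"\n'
    '25.0 4 98.00 ? 2046. 19.0 71 1 "example car b"\n'
)

# sep=r'\s+' splits on any run of whitespace; quoted names stay in one field.
sample_df = pd.read_csv(sample, names=column_names, sep=r'\s+')
print(sample_df.head())
```

Because one horsepower value is the placeholder '?', pandas keeps that column as text rather than numbers, which is why a later step converts it.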
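Convert the horsepower column to float and the year column to a four-digit integer year using the following commands:
# Missing horsepower values are marked with '?'; replace them with NaN
df.loc[df.horsepower == '?', 'horsepower'] = np.nan
df['horsepower'] = pd.to_numeric(df['horsepower'])
# Parse the two-digit year, then keep only the four-digit year number
df['full_date'] = pd.to_datetime(df.year, format='%y')
df['year'] = df['full_date'].dt.year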
Let's display the data types:
df.dtypes
The output is as follows:
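The effect of the conversion is easier to see on a small table. This sketch assumes a made-up three-row DataFrame (the name toy and its values are illustrative only) and uses errors='coerce' as a one-step alternative to replacing '?' by hand:

```python
import pandas as pd

# Made-up values that mimic the raw columns: text horsepower, two-digit year.
toy = pd.DataFrame({'horsepower': ['130.0', '?', '95.0'],
                    'year': [70, 76, 82]})

# errors='coerce' turns anything that is not a number (here '?') into NaN.
toy['horsepower'] = pd.to_numeric(toy['horsepower'], errors='coerce')

# Parse the two-digit year as a date, then keep only the year number.
toy['full_date'] = pd.to_datetime(toy['year'].astype(str), format='%y')
toy['year'] = toy['full_date'].dt.year

print(toy.dtypes)
```

After this, horsepower is a float column with one missing value, and year holds four-digit years.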
Now plot the average horsepower per year using the following command:
df.groupby('year')['horsepower'].mean().plot()
The output is as follows:
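Since this activity is about the object-oriented API, the same plot can also be drawn on an explicitly created Axes object. A minimal sketch, assuming a made-up four-row DataFrame; the axis label texts are illustrative choices, not part of the original solution:

```python
import matplotlib
matplotlib.use("Agg")  # optional: lets the sketch run outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the cleaned dataset.
toy = pd.DataFrame({'year': [1970, 1970, 1971, 1971],
                    'horsepower': [130.0, 150.0, 100.0, 90.0]})
yearly_mean = toy.groupby('year')['horsepower'].mean()

# Create the figure and axes first, then tell pandas which axes to draw on.
fig, ax = plt.subplots()
yearly_mean.plot(ax=ax)
ax.set_xlabel('Year')
ax.set_ylabel('Average horsepower')
```

Keeping a reference to ax makes it possible to adjust labels, limits, and legends after pandas has drawn the line.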
Activity 5: Understanding Relationships of Variables Using Scatter Plots
Import the required libraries into the Jupyter notebook and read the dataset from the Auto-MPG dataset repository:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
df = pd.read_csv(url)
Provide the column names to simplify the dataset as illustrated here:
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin', 'name']
Now read the new dataset with column names and display it:
df = pd.read_csv(url, names=column_names, delim_whitespace=True)
df.head()
The output is as follows:
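Convert horsepower to a numeric type (missing values are marked with '?' in the raw column), then create the scatter plot using the scatter method:
# Coerce horsepower to numbers; the '?' placeholders become NaN
df['horsepower'] = pd.to_numeric(df['horsepower'], errors='coerce')
fig, ax = plt.subplots()
ax.scatter(x=df['horsepower'], y=df['weight'])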
The output will be as follows:
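A scatter plot is easier to read when both axes are labeled and rows with missing values are dropped explicitly. The following sketch assumes a made-up four-row DataFrame; the alpha value and label texts are illustrative choices:

```python
import matplotlib
matplotlib.use("Agg")  # optional: lets the sketch run outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values; one horsepower entry uses the '?' placeholder.
toy = pd.DataFrame({'horsepower': ['130.0', '?', '95.0', '150.0'],
                    'weight': [3504.0, 2046.0, 2372.0, 3761.0]})

# Text values would be treated as categories, so convert to numbers first.
toy['horsepower'] = pd.to_numeric(toy['horsepower'], errors='coerce')
toy = toy.dropna(subset=['horsepower'])

fig, ax = plt.subplots()
points = ax.scatter(x=toy['horsepower'], y=toy['weight'], alpha=0.6)
ax.set(xlabel='Horsepower', ylabel='Weight')
```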
Activity 6: Exporting a Graph to a File on Disk
Import the required libraries in the Jupyter notebook and read the dataset from the Auto-MPG dataset repository:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
df = pd.read_csv(url)
Provide the column names to simplify the dataset, as illustrated here:
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin', 'name']
Now read the dataset with column names:
df = pd.read_csv(url, names=column_names, delim_whitespace=True)
Create a histogram of the weight column using the following commands:
fig, ax = plt.subplots()
df.weight.plot(kind='hist', ax=ax)
The output is as follows:
Export it to a PNG file using the savefig function:
fig.savefig('weight_hist.png')
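The export step can be checked by writing the file and confirming that it exists. This is a minimal sketch, assuming made-up weight values and a temporary output directory; the dpi and bbox_inches arguments are optional savefig parameters chosen here for illustration:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # optional: lets the sketch run outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up weights standing in for df.weight.
weights = pd.Series([3504.0, 3693.0, 2372.0, 2130.0, 4341.0], name='weight')

fig, ax = plt.subplots()
weights.plot(kind='hist', ax=ax)

# Write into a temporary directory so the working directory stays clean.
out_path = os.path.join(tempfile.mkdtemp(), 'weight_hist.png')
fig.savefig(out_path, dpi=150, bbox_inches='tight')
print(os.path.exists(out_path))
```

The file format is inferred from the extension, so changing the name to end in .pdf or .svg would export in that format instead.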
Activity 7: Complete Plot Design
Import the required libraries in the Jupyter notebook and read the dataset from the Auto-MPG dataset repository:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
df = pd.read_csv(url)
Provide the column names to simplify the dataset, as illustrated here:
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin', 'name']
Now read the dataset with column names:
df = pd.read_csv(url, names=column_names, delim_whitespace=True)
Group by year and cylinders, passing as_index=False so that both stay as ordinary columns instead of becoming the index:
df_g = df.groupby(['year', 'cylinders'], as_index=False)
Calculate the average miles per gallon over the grouping:
df_g = df_g.mpg.mean()
Set year as the DataFrame index:
df_g = df_g.set_index(df_g.year)
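The three grouping steps above can be traced on a small table. A minimal sketch, assuming a made-up four-row DataFrame (the names toy, grouped, and means are illustrative):

```python
import pandas as pd

# Made-up rows: two 4-cylinder cars and one 8-cylinder car in year 70,
# plus one 4-cylinder car in year 71.
toy = pd.DataFrame({
    'year': [70, 70, 70, 71],
    'cylinders': [4, 4, 8, 4],
    'mpg': [24.0, 26.0, 15.0, 30.0],
})

# as_index=False keeps year and cylinders as columns in the result.
grouped = toy.groupby(['year', 'cylinders'], as_index=False)
means = grouped['mpg'].mean()

# Passing the column itself as the index keeps 'year' as a column as well.
means = means.set_index(means.year)
print(means)
```

The result has one row per (year, cylinders) pair, which is the shape needed to draw one line per cylinder count over the years.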
Create the figure and axes using the object-oriented API:
import matplotlib.pyplot as plt
fig, axes = plt.subplots()
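Convert horsepower to a numeric type, then repeat the grouping on year and cylinders for average horsepower and set year as the index. This replaces the miles per gallon averages computed in the previous steps:
# Coerce horsepower to numbers; the '?' placeholders become NaN
df['horsepower'] = pd.to_numeric(df['horsepower'], errors='coerce')
df_g = df.groupby(['year', 'cylinders'], as_index=False).horsepower.mean()
df_g = df_g.set_index(df_g.year)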
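Group the df_g dataset by cylinders, plot the average horsepower on axes created with size (12,10), and set the title, x label, and y label on the axes:
fig, axes = plt.subplots(figsize=(12, 10))
# ax=axes draws every cylinder group on the same axes
df_g.groupby('cylinders').horsepower.plot(ax=axes)
_ = axes.set(
    title="Average car power per year",
    xlabel="Year",
    ylabel="Power (horsepower)"
)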
The output is as follows:
Include legends, as follows:
axes.legend(title='Cylinders', fancybox=True)
Save the figure to disk as a PNG file:
fig.savefig('mpg_cylinder_year.png')
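The complete plot design can also be written with an explicit loop over the groups, which makes the legend labels visible in the code. This is a sketch under stated assumptions: toy_g is a made-up stand-in for the grouped averages, and the loop is an alternative to calling plot on the grouped object, not the activity's own method:

```python
import matplotlib
matplotlib.use("Agg")  # optional: lets the sketch run outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up grouped averages, indexed by year.
toy_g = pd.DataFrame({
    'year': [70, 71, 70, 71],
    'cylinders': [4, 4, 8, 8],
    'horsepower': [85.0, 80.0, 180.0, 170.0],
}).set_index('year')

fig, ax = plt.subplots(figsize=(12, 10))

# One line per cylinder count, labeled so the legend can pick it up.
for cylinders, group in toy_g.groupby('cylinders'):
    ax.plot(group.index, group['horsepower'], label=str(cylinders))

ax.set(title='Average car power per year',
       xlabel='Year', ylabel='Power (horsepower)')
legend = ax.legend(title='Cylinders', fancybox=True)
```

Calling fig.savefig on this figure afterwards exports it in the same way as the final step above.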