Tweaking Plot Parameters
Looking at the last figure in our previous section, we find that the legend is not appropriately placed. We can tweak the plot parameters to adjust the placements of the legends and the axis labels, as well as change the font-size and rotation of the tick labels.
Exercise 11: Tweaking the Plot Parameters of a Grouped Bar Plot
In this exercise, we'll tweak the plot parameters, for example, hue, of a grouped bar plot. We'll see how to place legends and axis labels in the right places and also explore the rotation feature:
- Import the necessary modules, in this case, only seaborn:
# Import seaborn
import seaborn as sns
- Load the dataset:
diamonds_df = sns.load_dataset('diamonds')
- Use the hue parameter to plot nested groups:
ax = sns.barplot(x="cut", y="price", hue='color', data=diamonds_df)
The output is as follows:
Figure 1.26: Nested bar plot with the hue parameter
- Place the legend appropriately on the bar plot:
ax = sns.barplot(x='cut', y='price', hue='color', data=diamonds_df)
ax.legend(loc='upper right', ncol=4)
The output is as follows:
Figure 1.27: Grouped bar plot with legends placed appropriately
In the preceding ax.legend() call, the ncol parameter denotes the number of columns into which the legend entries are organized, and the loc parameter specifies the location of the legend and can take one of several predefined values (upper left, lower center, and so on).
- To modify the axis labels on the x axis and y axis, input the following code:
ax = sns.barplot(x='cut', y='price', hue='color', data=diamonds_df)
ax.legend(loc='upper right', ncol=4)
ax.set_xlabel('Cut', fontdict={'fontsize': 15})
ax.set_ylabel('Price', fontdict={'fontsize': 15})
The output is as follows:
Figure 1.28: Grouped bar plot with modified labels
- Similarly, use the following code to modify the font size and rotation of the x-axis tick labels:
ax = sns.barplot(x='cut', y='price', hue='color', data=diamonds_df)
ax.legend(loc='upper right', ncol=4)
# set fontsize and rotation of x-axis tick labels
ax.set_xticklabels(ax.get_xticklabels(), fontsize=13, rotation=30)
The output is as follows:
Figure 1.29: Grouped bar plot with the rotation feature of the labels
The rotation feature is particularly useful when the tick labels are long and crowd together on the x axis.
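The tweaks from this exercise can also be tried without loading the diamonds dataset. The following is a minimal sketch, assuming a small made-up table of prices (the numbers are hypothetical) and using the plot(kind='bar') function of pandas instead of sns.barplot to draw the grouped bars. The legend and axis-label calls are the same Axes methods used above, while tick_params is swapped in here as an alternative to set_xticklabels for rotating and resizing the tick labels:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical mean prices per cut (rows) and color (columns); not the real diamonds data
prices = pd.DataFrame(
    {"D": [3200, 3600, 3500], "E": [3000, 3500, 3200], "F": [3400, 4300, 3800]},
    index=["Ideal", "Premium", "Very Good"],
)

# One group of bars per cut, one bar per color
ax = prices.plot(kind="bar")

# Place the legend and spread its entries over three columns
ax.legend(loc="upper right", ncol=3)

# Set axis labels with a larger font
ax.set_xlabel("Cut", fontsize=15)
ax.set_ylabel("Price", fontsize=15)

# Rotate and resize the x-axis tick labels without re-passing the label text
ax.tick_params(axis="x", labelrotation=30, labelsize=13)

plt.tight_layout()
```

Using tick_params avoids passing the existing labels back into set_xticklabels, which may be convenient when only the appearance of the labels needs to change.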
Annotations
Another useful feature to have in plots is annotation. Suppose we want to add more information to a plot about ideally cut diamonds. In the following exercise, we'll make a simple bar plot more informative by adding an annotation.
Exercise 12: Annotating a Bar Plot
In this exercise, we will annotate a bar plot, generated using the catplot function of seaborn, using a note right above the plot. Let's see how:
- Import the necessary modules:
import matplotlib.pyplot as plt
import seaborn as sns
- Load the diamonds dataset:
diamonds_df = sns.load_dataset('diamonds')
- Generate a bar plot using the catplot function of the seaborn library:
ax = sns.catplot(x="cut", data=diamonds_df, aspect=1.5, kind="count", color="b")
The output is as follows:
Figure 1.30: Bar plot with seaborn's catplot function
- Annotate the column belonging to the Ideal category:
# get records in the DataFrame corresponding to ideal cut
ideal_group = diamonds_df.loc[diamonds_df['cut']=='Ideal']
- Find the location of the x coordinate where the annotation has to be placed:
# get the location of x coordinate where the annotation has to be placed
x = ideal_group.index.tolist()[0]
- Find the location of the y coordinate where the annotation has to be placed:
# get the location of y coordinate where the annotation has to be placed
y = len(ideal_group)
- Print the location of the x and y co-ordinates:
print(x)
print(y)
The output is:
0
21551
- Annotate the plot with a note:
# annotate the plot with any note or extra information
sns.catplot(x="cut", data=diamonds_df, aspect=1.5, kind="count", color="b")
plt.annotate('excellent polish and symmetry ratings;\nreflects almost all the light that enters it', xy=(x,y), xytext=(x+0.3, y+2000), arrowprops=dict(facecolor='red'))
The output is as follows:
Figure 1.31: Annotated bar plot
Now, there seem to be a lot of parameters in the annotate function, but worry not! Matplotlib's official documentation (https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.annotate.html) covers all the details. For instance, the xy parameter denotes the point (x, y) on the figure to annotate. xytext denotes the position (x, y) to place the text at. If None, it defaults to xy. Note that we added an offset of .3 for x and 2000 for y (since y is close to 20,000) for the sake of readability of the text. The color of the arrow is specified using the arrowprops parameter in the annotate function.
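To see how xy, xytext, and arrowprops interact, here is a minimal sketch that uses plain Matplotlib and a small hand-written list of counts, so it runs without loading the dataset. Only the Ideal count (21551) comes from the exercise above; the other counts and the note text are illustrative assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

cuts = ["Ideal", "Premium", "Very Good"]
counts = [21551, 13800, 12100]  # only the first value is taken from the exercise

fig, ax = plt.subplots()
ax.bar(cuts, counts, color="b")

# Categorical bars are drawn at x = 0, 1, 2, ... in the order given,
# so the top of the first bar is the point (0, counts[0])
x, y = 0, counts[0]

note = ax.annotate(
    "most common cut",                 # text of the note
    xy=(x, y),                         # point being annotated
    xytext=(x + 0.3, y + 2000),        # where the text itself is placed
    arrowprops=dict(facecolor="red"),  # arrow drawn from the text to xy
)

# Leave headroom so the note stays inside the axes
ax.set_ylim(0, y + 4000)
```

The offsets added to xytext are in data units, which is why the y offset (2000) is so much larger than the x offset (0.3).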
There are several other bells and whistles associated with visualization libraries in Python, some of which we will see as we progress in the book. At this stage, we will go through a chapter activity to revise the concepts in this chapter.
So far, we have seen how to generate two simple plots, histograms and bar plots, using seaborn and pandas:
- Histograms: Histograms are useful for understanding the statistical distribution of a numerical feature in a given dataset. They can be generated using the hist() function in pandas and distplot() in seaborn.
- Bar plots: Bar plots are useful for gaining insight into the values taken by a categorical feature in a given dataset. They can be generated using the plot(kind='bar') function in pandas and the catplot(kind='count') and barplot() functions in seaborn.
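As a quick recap of the pandas side of these two plot types, the following minimal sketch draws a histogram of a numerical column and a bar plot of category counts. The tiny DataFrame is a made-up stand-in for the diamonds data, and the column values are assumptions chosen only for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical records: one numerical feature and one categorical feature
df = pd.DataFrame({
    "price": [326, 334, 423, 2757, 2759, 5000, 12000, 18000],
    "cut": ["Ideal", "Premium", "Ideal", "Good", "Ideal", "Premium", "Fair", "Ideal"],
})

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: distribution of a numerical feature
df["price"].hist(bins=4, ax=ax_hist)
ax_hist.set_xlabel("price")

# Bar plot: number of records per category
cut_counts = df["cut"].value_counts()
cut_counts.plot(kind="bar", ax=ax_bar)
ax_bar.set_ylabel("count")

plt.tight_layout()
```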
With the help of various considerations arising in the process of plotting these two types of visualizations, we presented some basic concepts in data visualization:
- Formatting legends to present labels for different elements in the plot with loc and other parameters in the legend function
- Changing the properties of tick labels, such as font size and rotation, with parameters in the set_xticklabels() and set_yticklabels() functions
- Adding annotations for additional information with the annotate() function
Activity 1: Analyzing Different Scenarios and Generating the Appropriate Visualization
We'll be working with the 120 years of Olympic History dataset acquired by Randi Griffin from https://www.sports-reference.com/ and made available on the GitHub repository of this book. Your assignment is to identify the top five sports based on the largest number of medals awarded in the year 2016, and then perform the following analysis:
- Generate a plot indicating the number of medals awarded in each of the top five sports in 2016.
- Plot a graph depicting the distribution of the age of medal winners in the top five sports in 2016.
- Find out which national teams won the largest number of medals in the top five sports in 2016.
- Observe the trend in the average weight of male and female athletes winning in the top five sports in 2016.
High-Level Steps
- Download the dataset and format it as a pandas DataFrame.
- Filter the DataFrame to only include the rows corresponding to medal winners from 2016.
- Find the number of medals awarded in 2016 for each sport.
- List the top five sports based on the largest number of medals awarded. Filter the DataFrame one more time to only include the records for the top five sports in 2016.
- Generate a bar plot of record counts corresponding to each of the top five sports.
- Generate a histogram for the Age feature of all medal winners in the top five sports (2016).
- Generate a bar plot indicating how many medals were won by each country's team in the top five sports in 2016.
- Generate a bar plot indicating the average weight of players, categorized based on gender, winning in the top five sports in 2016.
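The filtering and counting steps above might be approached along the following lines. This is only a rough sketch on a tiny hand-made DataFrame, not the book's solution: the column names (Year, Sport, Medal, Sex, Weight) and all values are assumptions for illustration, and the top two sports are taken instead of the top five because the toy data is so small:

```python
import pandas as pd

# Hypothetical stand-in for the Olympics data; column names are assumed
olympics = pd.DataFrame({
    "Year":   [2016, 2016, 2016, 2016, 2012, 2016, 2016, 2016],
    "Sport":  ["Rowing", "Swimming", "Rowing", "Judo",
               "Rowing", "Swimming", "Swimming", "Rowing"],
    "Medal":  ["Gold", "Silver", None, "Bronze",
               "Gold", "Gold", "Bronze", "Silver"],
    "Sex":    ["M", "F", "M", "F", "M", "M", "F", "F"],
    "Weight": [95.0, 64.0, 90.0, 57.0, 92.0, 88.0, 62.0, 75.0],
})

# Keep only medal winners from 2016 (rows without a medal are dropped)
winners_2016 = olympics[(olympics["Year"] == 2016) & olympics["Medal"].notna()]

# Count medal records per sport, largest first
medals_per_sport = winners_2016["Sport"].value_counts()

# Take the top sports (head(5) for the activity; 2 here for the toy data)
top_sports = medals_per_sport.head(2).index
top_df = winners_2016[winners_2016["Sport"].isin(top_sports)]

# Average weight per sport and gender, ready to be drawn as a grouped bar plot
mean_weight = top_df.groupby(["Sport", "Sex"])["Weight"].mean()
print(medals_per_sport)
print(mean_weight)
```

From here, the plotting steps can reuse the bar plot and histogram functions covered earlier in the chapter.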
The expected output should be:
After Step 1:
Figure 1.32: Olympics dataset
After Step 2:
Figure 1.33: Filtered Olympics DataFrame
After Step 3:
Figure 1.34: The number of medals awarded
After Step 4:
Figure 1.35: Olympics DataFrame
After Step 5:
Figure 1.36: Generated bar plot
After Step 6:
Figure 1.37: Histogram plot with the Age feature
After Step 7:
Figure 1.38: Bar plot with the number of medals won
After Step 8:
Figure 1.39: Bar plot with the average weight of players
The bar plot indicates the highest athlete weight in rowing, followed by swimming, and then the other remaining sports. The trend is similar across both male and female players.
Note
The solution steps can be found on page 254.