Interactive Data Visualization with Python - Second Edition

Product type Book
Published in Apr 2020
Publisher
ISBN-13 9781800200944
Pages 362 pages
Edition 2nd Edition
Languages
Authors (4): Abha Belorkar, Sharath Chandra Guntuku, Shubhangi Hora, Anshu Kumar

Table of Contents (9 chapters)

Preface
1. Introduction to Visualization with Python – Basic and Customized Plotting
2. Static Visualization – Global Patterns and Summary Statistics
3. From Static to Interactive Visualization
4. Interactive Visualization of Data across Strata
5. Interactive Visualization of Data across Time
6. Interactive Visualization of Geographical Data
7. Avoiding Common Pitfalls to Create Interactive Visualizations
Appendix

Tweaking Plot Parameters

Looking at the last figure in the previous section, we find that the legend is not appropriately placed. We can tweak the plot parameters to adjust the placement of the legend and the axis labels, as well as change the font size and rotation of the tick labels.

Exercise 11: Tweaking the Plot Parameters of a Grouped Bar Plot

In this exercise, we'll tweak the parameters of a grouped bar plot, starting with hue. We'll see how to place legends and axis labels in the right places and also explore the rotation feature:

  1. Import the necessary modules—in this case, only seaborn:
    #Import seaborn
    import seaborn as sns
  2. Load the dataset:
    diamonds_df = sns.load_dataset('diamonds')
  3. Use the hue parameter to plot nested groups:
    ax = sns.barplot(x="cut", y="price", hue='color', data=diamonds_df)

    The output is as follows:

    Figure 1.26: Nested bar plot with the hue parameter
  4. Place the legend appropriately on the bar plot:
    ax = sns.barplot(x='cut', y='price', hue='color', data=diamonds_df)
    ax.legend(loc='upper right',ncol=4)

    The output is as follows:

    Figure 1.27: Grouped bar plot with legends placed appropriately

    In the preceding ax.legend() call, the ncol parameter denotes the number of columns into which the legend entries are organized, and the loc parameter specifies the location of the legend. It accepts a location string such as 'upper left' or 'lower center', or 'best' to let Matplotlib choose the position.

  5. To modify the axis labels on the x axis and y axis, input the following code:
    ax = sns.barplot(x='cut', y='price', hue='color', data=diamonds_df)
    ax.legend(loc='upper right', ncol=4)
    ax.set_xlabel('Cut', fontdict={'fontsize' : 15})
    ax.set_ylabel('Price', fontdict={'fontsize' : 15})

    The output is as follows:

    Figure 1.28: Grouped bar plot with modified labels
  6. Similarly, use the following code to modify the font size and rotation of the tick labels on the x axis:
    ax = sns.barplot(x='cut', y='price', hue='color', data=diamonds_df)
    ax.legend(loc='upper right',ncol=4)
    # set fontsize and rotation of x-axis tick labels
    ax.set_xticklabels(ax.get_xticklabels(), fontsize=13, rotation=30)

    The output is as follows:

    Figure 1.29: Grouped bar plot with the rotation feature of the labels

The rotation feature is particularly useful when the tick labels are long and crowd together on the x axis.
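As a compact recap of the exercise, the following minimal sketch applies the same three tweaks (legend placement, axis label font size, and tick label rotation) to one Axes object. To keep it self-contained, it draws a small made-up table with pandas instead of loading the diamonds data with seaborn; toy_df and its numbers are hypothetical and only stand in for the real averages.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the mean price per cut (rows) and color (columns)
toy_df = pd.DataFrame(
    {"D": [3200, 3600, 3400], "E": [3100, 3500, 3300], "F": [3700, 4300, 3900]},
    index=["Ideal", "Premium", "Very Good"],
)

# One group of bars per row, one bar per column
ax = toy_df.plot(kind="bar")

# Legend: location string and number of columns
ax.legend(loc="upper right", ncol=3, title="color")

# Axis labels with a larger font
ax.set_xlabel("Cut", fontdict={"fontsize": 15})
ax.set_ylabel("Price", fontdict={"fontsize": 15})

# Tick labels on the x axis: font size and rotation
ax.set_xticklabels(ax.get_xticklabels(), fontsize=13, rotation=30)

# Collect what was set, for inspection
tick_texts = [label.get_text() for label in ax.get_xticklabels()]
tick_rotations = [label.get_rotation() for label in ax.get_xticklabels()]
legend_texts = [text.get_text() for text in ax.get_legend().get_texts()]
```

Because every call acts on the same ax, the tweaks can be combined in any order after the plot is drawn.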

Annotations

Another useful feature to have in plots is annotation. Suppose we want to add more information to the plot about ideally cut diamonds. In the following exercise, we'll make a simple bar plot more informative by adding an annotation.

Exercise 12: Annotating a Bar Plot

In this exercise, we will annotate a bar plot, generated using the catplot function of seaborn, using a note right above the plot. Let's see how:

  1. Import the necessary modules:
    import matplotlib.pyplot as plt
    import seaborn as sns
  2. Load the diamonds dataset:
    diamonds_df = sns.load_dataset('diamonds')
  3. Generate a bar plot using the catplot function of the seaborn library:
    ax = sns.catplot(x="cut", data=diamonds_df, aspect=1.5, kind="count", color="b")

    The output is as follows:

    Figure 1.30: Bar plot with seaborn's catplot function
  4. Annotate the column belonging to the Ideal category:
    # get records in the DataFrame corresponding to ideal cut
    ideal_group = diamonds_df.loc[diamonds_df['cut']=='Ideal']
  5. Find the location of the x coordinate where the annotation has to be placed:
    # get the location of x coordinate where the annotation has to be placed
    x = ideal_group.index.tolist()[0]
  6. Find the location of the y coordinate where the annotation has to be placed:
    # get the location of y coordinate where the annotation has to be placed
    y = len(ideal_group)
  7. Print the x and y coordinates:
    print(x)
    print(y)

    The output is:

    0
    21551
  8. Annotate the plot with a note:
    # annotate the plot with any note or extra information
    sns.catplot(x="cut", data=diamonds_df, aspect=1.5, kind="count", color="b")
    plt.annotate('excellent polish and symmetry ratings;\nreflects almost all the light that enters it',
                 xy=(x, y), xytext=(x+0.3, y+2000),
                 arrowprops=dict(facecolor='red'))

    The output is as follows:

    Figure 1.31: Annotated bar plot

Now, there seem to be a lot of parameters in the annotate function, but worry not! Matplotlib's official documentation (https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.annotate.html) covers all the details. For instance, the xy parameter denotes the point (x, y) to annotate, in data coordinates by default, and xytext denotes the position (x, y) at which to place the text; if xytext is None, it defaults to xy. Note that we added an offset of 0.3 for x and 2000 for y (since y is a little over 20,000) for the sake of readability of the text. The color of the arrow is specified through the arrowprops parameter of the annotate function.
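In the exercise, x is taken from the row index of the first Ideal record, which works when the Ideal bar is the first one drawn (position 0). The x coordinate of a bar is really its position in the category order on the axis, so a slightly more general approach is to look the position up from that order. Here is a minimal sketch of that idea using plain Matplotlib; the categories, counts, and note text are made up for illustration and are not the diamonds data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical category counts (made-up numbers)
categories = ["Ideal", "Premium", "Good"]
counts = [120, 90, 40]

fig, ax = plt.subplots()
ax.bar(categories, counts, color="b")

# Bars for string categories are drawn at x = 0, 1, 2, ... in the order given
x = categories.index("Ideal")
y = counts[x]

note = ax.annotate(
    "tallest bar",
    xy=(x, y),                 # the point being annotated (top of the bar)
    xytext=(x + 0.3, y + 10),  # where the text is placed, offset for readability
    arrowprops=dict(facecolor="red"),
)

# Leave headroom so the note stays inside the axes
ax.set_ylim(0, 150)
```

The same lookup would give the right x for any bar, not only the first one, as long as the list matches the order in which the bars are drawn.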

There are several other bells and whistles associated with visualization libraries in Python, some of which we will see as we progress in the book. At this stage, we will go through a chapter activity to revise the concepts in this chapter.

So far, we have seen how to generate two simple plots using seaborn and pandas—histograms and bar plots:

  • Histograms: Histograms are useful for understanding the statistical distribution of a numerical feature in a given dataset. They can be generated using the hist() function in pandas and distplot() in seaborn (deprecated since seaborn 0.11 in favor of histplot() and displot()).
  • Bar plots: Bar plots are useful for gaining insight into the values taken by a categorical feature in a given dataset. They can be generated using the plot(kind='bar') function in pandas and the catplot(kind='count') and barplot() functions in seaborn.
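As a quick reminder of the pandas side of these two plot types, here is a minimal sketch on a tiny made-up DataFrame. The column names price and cut mirror the diamonds data, but toy_df and its values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one numerical and one categorical feature
toy_df = pd.DataFrame({
    "price": [300, 320, 450, 500, 520, 900, 950, 1200],
    "cut": ["Ideal", "Ideal", "Good", "Ideal", "Premium", "Premium", "Good", "Ideal"],
})

fig, (hist_ax, bar_ax) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: distribution of a numerical feature
toy_df["price"].hist(bins=4, ax=hist_ax)

# Bar plot: number of records per value of a categorical feature
cut_counts = toy_df["cut"].value_counts()
cut_counts.plot(kind="bar", ax=bar_ax)
```

The histogram bins the numerical values itself, whereas the bar plot needs the counts to be computed first, here with value_counts().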

With the help of various considerations arising in the process of plotting these two types of visualizations, we presented some basic concepts in data visualization:

  • Formatting legends to present labels for different elements in the plot with loc and other parameters in the legend function
  • Changing the properties of tick labels, such as font size and rotation, with parameters in the set_xticklabels() and set_yticklabels() functions
  • Adding annotations for additional information with the annotate() function

Activity 1: Analyzing Different Scenarios and Generating the Appropriate Visualization

We'll be working with the 120 years of Olympic History dataset acquired by Randi Griffin from https://www.sports-reference.com/ and made available on the GitHub repository of this book. Your assignment is to identify the top five sports based on the largest number of medals awarded in the year 2016, and then perform the following analysis:

  1. Generate a plot indicating the number of medals awarded in each of the top five sports in 2016.
  2. Plot a graph depicting the distribution of the age of medal winners in the top five sports in 2016.
  3. Find out which national teams won the largest number of medals in the top five sports in 2016.
  4. Observe the trend in the average weight of male and female athletes winning in the top five sports in 2016.

High-Level Steps

  1. Download the dataset and format it as a pandas DataFrame.
  2. Filter the DataFrame to only include the rows corresponding to medal winners from 2016.
  3. Find out the medals awarded in 2016 for each sport.
  4. List the top five sports based on the largest number of medals awarded. Filter the DataFrame one more time to only include the records for the top five sports in 2016.
  5. Generate a bar plot of record counts corresponding to each of the top five sports.
  6. Generate a histogram for the Age feature of all medal winners in the top five sports (2016).
  7. Generate a bar plot indicating how many medals were won by each country's team in the top five sports in 2016.
  8. Generate a bar plot indicating the average weight of players, categorized based on gender, winning in the top five sports in 2016.
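The filtering and grouping steps above could be sketched in pandas roughly as follows. This is not the book's solution: it runs on a tiny hypothetical DataFrame, and the column names (Year, Sport, Sex, Weight, Medal) and the use of missing values for athletes without a medal are assumptions that should be checked against the actual file.

```python
import pandas as pd

# Hypothetical miniature of the dataset; column names and values are assumptions
toy_df = pd.DataFrame({
    "Year":   [2016, 2016, 2016, 2016, 2016, 2012, 2016],
    "Sport":  ["Rowing", "Rowing", "Swimming", "Judo", "Swimming", "Rowing", "Rowing"],
    "Sex":    ["M", "F", "M", "F", "F", "M", "M"],
    "Weight": [90.0, 72.0, 80.0, 60.0, 64.0, 88.0, 94.0],
    "Medal":  ["Gold", "Silver", "Bronze", None, "Gold", "Gold", "Bronze"],
})

# Step 2: keep only medal winners from 2016
winners_2016 = toy_df[(toy_df["Year"] == 2016) & toy_df["Medal"].notna()]

# Step 3: number of medals awarded per sport
medal_counts = winners_2016["Sport"].value_counts()

# Step 4: top sports by medal count (2 here because the toy data is small; 5 in the activity)
top_sports = medal_counts.head(2).index.tolist()
top_df = winners_2016[winners_2016["Sport"].isin(top_sports)]

# Step 8: average weight per sport, split by gender (one column per gender)
mean_weight = top_df.groupby(["Sport", "Sex"])["Weight"].mean().unstack()
```

Calling mean_weight.plot(kind="bar") on the result would then draw one group of bars per sport with one bar per gender.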

The expected output should be:

After Step 1:

Figure 1.32: Olympics dataset

After Step 2:

Figure 1.33: Filtered Olympics DataFrame

After Step 3:

Figure 1.34: The number of medals awarded

After Step 4:

Figure 1.35: Olympics DataFrame

After Step 5:

Figure 1.36: Generated bar plot

After Step 6:

Figure 1.37: Histogram plot with the Age feature

After Step 7:

Figure 1.38: Bar plot with the number of medals won

After Step 8:

Figure 1.39: Bar plot with the average weight of players

The bar plot indicates the highest athlete weight in rowing, followed by swimming, and then the other remaining sports. The trend is similar across both male and female players.

Note

The solution steps can be found on page 254.
