Data Science with Python

Combine Python with machine learning principles to discover hidden patterns in raw data

Product type Paperback
Published in Jul 2019
Publisher Packt
ISBN-13 9781838552862
Length 426 pages
Edition 1st Edition
Authors (3):
Rohan Chopra
Mohamed Noordeen Alaudeen
Aaron England

Table of Contents (10 chapters)

About the Book
1. Introduction to Data Science and Data Pre-Processing
2. Data Visualization
3. Introduction to Machine Learning via Scikit-Learn
4. Dimensionality Reduction and Unsupervised Learning
5. Mastering Structured Data
6. Decoding Images
7. Processing Human Language
8. Tips and Tricks of the Trade
Appendix

Chapter 2: Data Visualization

Activity 2: Line Plot

Solution:

  1. Create a list of six strings, one for each month from January through June, and save it as x using:

    x = ['January','February','March','April','May','June']

  2. Create a list of six values for 'Items Sold' that starts at 1000 and increases by 200, so that the final value is 2000, and save it as y as follows:

    y = [1000, 1200, 1400, 1600, 1800, 2000]

  3. Plot y ('Items Sold') by x ('Month') with a dotted blue line and star markers using the following:

    import matplotlib.pyplot as plt

    plt.plot(x, y, '*:b')

  4. Set the x-axis to 'Month' using the following code:

    plt.xlabel('Month')

  5. Set the y-axis to 'Items Sold' as follows:

    plt.ylabel('Items Sold')

  6. To set the title to read 'Items Sold has been Increasing Linearly', refer to the following code:

    plt.title('Items Sold has been Increasing Linearly')

    Check out the following screenshot for the resultant output:

Figure 2.33: Line plot of items sold by month
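
The six steps above can be collected into one short script. This is a minimal sketch rather than the book's own listing: it assumes the non-interactive Agg backend so that it runs without a display, builds y with range() instead of typing the values, and uses the object-oriented ax interface in place of the plt calls.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt

x = ['January', 'February', 'March', 'April', 'May', 'June']
# Same values as in step 2, generated programmatically: 1000 to 2000 in steps of 200
y = list(range(1000, 2001, 200))

fig, ax = plt.subplots()
# Format string '*:b' means star markers, dotted line, blue
(line,) = ax.plot(x, y, '*:b')
ax.set_xlabel('Month')
ax.set_ylabel('Items Sold')
ax.set_title('Items Sold has been Increasing Linearly')
```

In an interactive session, dropping the matplotlib.use('Agg') line and calling plt.show() at the end displays the figure.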

Activity 3: Bar Plot

Solution:

  1. Create a list of five strings for x containing the names of NBA franchises with the most titles using the following code:

    x = ['Boston Celtics','Los Angeles Lakers', 'Chicago Bulls', 'Golden State Warriors', 'San Antonio Spurs']

  2. Create a list of five values for y containing values for 'Titles Won' that correspond with the strings in x using the following code:

    y = [17, 16, 6, 6, 5]

  3. Place x and y into a data frame with the column names 'Team' and 'Titles', respectively, as follows:

    import pandas as pd

    df = pd.DataFrame({'Team': x,
                       'Titles': y})

  4. To sort the data frame in descending order by 'Titles' and save it as df_sorted, refer to the following code:

    df_sorted = df.sort_values(by='Titles', ascending=False)

    Note

    If we sort with ascending=True, the plot will have larger values to the right. Since we want the larger values on the left, we will be using ascending=False.

  5. Make a programmatic title and save it as title. First, find the team with the most titles and save it as the team_with_most_titles object. Use .iloc[0] to select the first row by position, because a plain [0] looks up the index label 0, which is the first row only when the input was already in sorted order:

    team_with_most_titles = df_sorted['Team'].iloc[0]

  6. Then, retrieve the number of titles for the team with the most titles using the following code:

    most_titles = df_sorted['Titles'].iloc[0]

  7. Lastly, create a string that reads 'The Boston Celtics have the most titles with 17' using the following code:

    title = 'The {} have the most titles with {}'.format(team_with_most_titles, most_titles)

  8. Use a bar graph to plot the number of titles by team using the following code:

    import matplotlib.pyplot as plt

    plt.bar(df_sorted['Team'], df_sorted['Titles'], color='red')

  9. Set the x-axis label to 'Team' using the following:

    plt.xlabel('Team')

  10. Set the y-axis label to 'Number of Championships' using the following:

    plt.ylabel('Number of Championships')

  11. To prevent the x tick labels from overlapping, rotate them 45 degrees using the following code:

    plt.xticks(rotation=45)

  12. Set the title of the plot to the programmatic title object we created as follows:

    plt.title(title)

  13. Save the plot to our current working directory as 'Titles_by_Team.png' using the following code:

    plt.savefig('Titles_by_Team')

  14. Display the plot using plt.show(). To understand this better, check out the following output screenshot:
    Figure 2.34: The bar plot of the number of titles held by an NBA team

    Note

    When we display the plot using plt.show(), it appears as intended; however, when we open the file we created, 'Titles_by_Team.png', we see that the x tick labels are cropped.

    The following figure displays the bar plot with the cropped x tick labels.

    Figure 2.35: 'Titles_by_Team.png' with x tick labels cropped
  15. To fix the cropping issue, add bbox_inches='tight' as an argument inside of plt.savefig() as follows:

    plt.savefig('Titles_by_Team', bbox_inches='tight')

  16. Now, when we open the saved 'Titles_by_Team.png' file from our working directory, we see that the x tick labels are not cropped.

    Check out the following output for the final result:

Figure 2.36: 'Titles_by_Team.png' without cropped x tick labels
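
The whole activity can be sketched as a single script. Two details are assumptions made for illustration and are not in the original solution: the teams are deliberately listed out of order, to show that .iloc[0] returns the top row by position while the label 0 still points at the first row of the unsorted input, and the image is written to a temporary directory (out_path is a hypothetical name) rather than the working directory. The sketch also uses the Agg backend so that it runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Same teams and title counts as the activity, but listed out of order
# (an assumption for illustration) so that label 0 is not the top team.
df = pd.DataFrame({
    'Team': ['Chicago Bulls', 'Boston Celtics', 'San Antonio Spurs',
             'Los Angeles Lakers', 'Golden State Warriors'],
    'Titles': [6, 17, 5, 16, 6],
})
df_sorted = df.sort_values(by='Titles', ascending=False)

# .iloc[0] selects the first row by position, whatever its index label is
team_with_most_titles = df_sorted['Team'].iloc[0]
most_titles = df_sorted['Titles'].iloc[0]
# .loc[0] selects the row labeled 0, i.e. the first row of the unsorted input
team_at_label_0 = df_sorted['Team'].loc[0]

title = 'The {} have the most titles with {}'.format(team_with_most_titles, most_titles)

fig, ax = plt.subplots()
ax.bar(df_sorted['Team'], df_sorted['Titles'], color='red')
ax.set_xlabel('Team')
ax.set_ylabel('Number of Championships')
ax.tick_params(axis='x', labelrotation=45)  # rotate x tick labels 45 degrees
ax.set_title(title)

# bbox_inches='tight' expands the saved area to include the rotated labels
out_path = os.path.join(tempfile.mkdtemp(), 'Titles_by_Team.png')
fig.savefig(out_path, bbox_inches='tight')
```

With the activity's original, already-sorted input, the label lookup and the positional lookup return the same row, which is why the difference is easy to miss.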

Activity 4: Multiple Plot Types Using Subplots

Solution:

  1. Import the 'Items_Sold_by_Week.csv' file and save it as the Items_by_Week data frame object using the following code:

    import pandas as pd

    Items_by_Week = pd.read_csv('Items_Sold_by_Week.csv')

  2. Import the 'Weight_by_Height.csv' file and save it as the Weight_by_Height data frame object as follows:

    Weight_by_Height = pd.read_csv('Weight_by_Height.csv')

  3. Generate an array of 100 normally distributed numbers to use as data for the histogram and box-and-whisker plots and save it as y using the following code:

    import numpy as np

    y = np.random.normal(loc=0, scale=0.1, size=100)

  4. To generate a figure with six subplots, organized in three rows and two columns, that do not overlap, refer to the following code:

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(nrows=3, ncols=2)
    plt.tight_layout()

  5. Set the respective axes' titles to match those in Figure 2.32 using the following code:

    axes[0,0].set_title('Line')
    axes[0,1].set_title('Bar')
    axes[1,0].set_title('Horizontal Bar')
    axes[1,1].set_title('Histogram')
    axes[2,0].set_title('Scatter')
    axes[2,1].set_title('Box-and-Whisker')

    Figure 2.37: Titled, non-overlapping empty subplots
  6. On the 'Line', 'Bar', and 'Horizontal Bar' axes, plot 'Items_Sold' by 'Week' from 'Items_by_Week' using:

    axes[0,0].plot(Items_by_Week['Week'], Items_by_Week['Items_Sold'])
    axes[0,1].bar(Items_by_Week['Week'], Items_by_Week['Items_Sold'])
    axes[1,0].barh(Items_by_Week['Week'], Items_by_Week['Items_Sold'])

    See the resultant output in the following figure:

    Figure 2.38: Line, bar, and horizontal bar plots added
  7. On the 'Histogram' and 'Box-and-Whisker' axes, plot the array of 100 normally distributed numbers using the following code:

    axes[1,1].hist(y, bins=20)
    axes[2,1].boxplot(y)

    The resultant output is displayed here:

    Figure 2.39: The histogram and box-and-whisker added
  8. Plot 'Weight' by 'Height' from the 'Weight_by_Height' data frame on the 'Scatter' axes using the following code:

    axes[2,0].scatter(Weight_by_Height['Height'], Weight_by_Height['Weight'])

    See the figure here for the resultant output:

    Figure 2.40: Scatterplot added
  9. Label the x- and y-axis for each subplot using axes[row, column].set_xlabel('X-Axis Label') and axes[row, column].set_ylabel('Y-Axis Label'), respectively.

    See the figure here for the resultant output:

    Figure 2.41: X and y axes have been labeled
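  10. Increase the size of the figure with the figsize argument in the subplots function. Because the size is set when the figure is created, make this change to the plt.subplots() call from step 4 and rerun the plotting steps that follow it:

    fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(8,8))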

  11. Save the figure to the current working directory as 'Six_Subplots' using the following code:

    fig.savefig('Six_Subplots')

    The following figure displays the 'Six_Subplots.png' file:

Figure 2.42: The Six_Subplots.png file
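
The steps above can be sketched as one script that sets figsize when the figure is created, so nothing needs to be redrawn. Several details are assumptions and not from the original solution: the two small data frames hold invented placeholder values in place of the CSV files (only the column names come from the activity), the axis label strings are illustrative, the random sample uses a seeded NumPy generator, and the Agg backend is chosen so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data standing in for the two CSV files; the values are invented
items_by_week = pd.DataFrame({'Week': [1, 2, 3, 4],
                              'Items_Sold': [100, 120, 90, 150]})
weight_by_height = pd.DataFrame({'Height': [60, 64, 68, 72],
                                 'Weight': [120, 140, 160, 185]})

rng = np.random.default_rng(0)  # seeded so reruns draw the same sample
y = rng.normal(loc=0, scale=0.1, size=100)

# figsize is fixed when the figure is created, so it is set up front
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(8, 8))

axes[0, 0].plot(items_by_week['Week'], items_by_week['Items_Sold'])
axes[0, 1].bar(items_by_week['Week'], items_by_week['Items_Sold'])
axes[1, 0].barh(items_by_week['Week'], items_by_week['Items_Sold'])
axes[1, 1].hist(y, bins=20)
axes[2, 0].scatter(weight_by_height['Height'], weight_by_height['Weight'])
axes[2, 1].boxplot(y)

# (title, x label, y label) for each subplot in row-major order;
# the label strings are illustrative choices
labels = [
    ('Line', 'Week', 'Items Sold'),
    ('Bar', 'Week', 'Items Sold'),
    ('Horizontal Bar', 'Items Sold', 'Week'),
    ('Histogram', 'Value', 'Frequency'),
    ('Scatter', 'Height', 'Weight'),
    ('Box-and-Whisker', '', 'Value'),
]
for ax, (title, xlabel, ylabel) in zip(axes.flat, labels):
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)

# Called after titles and labels exist so the spacing accounts for them
fig.tight_layout()
```

Calling fig.savefig('Six_Subplots') afterwards writes the figure to the working directory as in step 11.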