Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Pandas 1.x Cookbook

You're reading from   Pandas 1.x Cookbook Practical recipes for scientific computing, time series analysis, and exploratory data analysis using Python

Arrow left icon
Product type Paperback
Published in Feb 2020
Publisher Packt
ISBN-13 9781839213106
Length 626 pages
Edition 2nd Edition
Languages
Tools
Arrow right icon
Authors (2):
Arrow left icon
Theodore Petrou Theodore Petrou
Author Profile Icon Theodore Petrou
Theodore Petrou
Matthew Harrison Matthew Harrison
Author Profile Icon Matthew Harrison
Matthew Harrison
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Preface 1. Pandas Foundations 2. Essential DataFrame Operations FREE CHAPTER 3. Creating and Persisting DataFrames 4. Beginning Data Analysis 5. Exploratory Data Analysis 6. Selecting Subsets of Data 7. Filtering Rows 8. Index Alignment 9. Grouping for Aggregation, Filtration, and Transformation 10. Restructuring Data into a Tidy Form 11. Combining Pandas Objects 12. Time Series Analysis 13. Visualization with Matplotlib, Pandas, and Seaborn 14. Debugging and Testing Pandas 15. Other Books You May Enjoy
16. Index

Transposing the direction of a DataFrame operation

Many DataFrame methods have an axis parameter. This parameter controls the direction in which the operation takes place. Axis parameters can be 'index' (or 0) or 'columns' (or 1). I prefer the string versions are they are more explicit and tend to make the code easier to read.

Nearly all DataFrame methods default the axis parameter to 0, which applies to operations along the index. This recipe shows you how to invoke the same method along both axes.

How to do it...

  1. Read in the college dataset; the columns that begin with UGDS represent the percentage of the undergraduate students of a particular race. Use the filter method to select these columns:
    >>> college = pd.read_csv(
    ...     "data/college.csv", index_col="INSTNM"
    ... )
    >>> college_ugds = college.filter(like="UGDS_")
    >>> college_ugds.head()
                  UGDS_WHITE  UGDS_BLACK  ...  UGDS_NRA  UGDS_UNKN
    INSTNM                                ...                     
    Alabama A...      0.0333      0.9353  ...    0.0059     0.0138
    Universit...      0.5922      0.2600  ...    0.0179     0.0100
    Amridge U...      0.2990      0.4192  ...    0.0000     0.2715
    Universit...      0.6988      0.1255  ...    0.0332     0.0350
    Alabama S...      0.0158      0.9208  ...    0.0243     0.0137
    
  2. Now that the DataFrame contains homogenous column data, operations can be sensibly done both vertically and horizontally. The .count method returns the number of non-missing values. By default, its axis parameter is set to 0:
    >>> college_ugds.count()
    UGDS_WHITE    6874
    UGDS_BLACK    6874
    UGDS_HISP     6874
    UGDS_ASIAN    6874
    UGDS_AIAN     6874
    UGDS_NHPI     6874
    UGDS_2MOR     6874
    UGDS_NRA      6874
    UGDS_UNKN     6874
    dtype: int64
    

    The axis parameter is almost always set to 0. So, step 2 is equivalent to both college_ugds.count(axis=0) and college_ugds.count(axis='index').

  3. Changing the axis parameter to 'columns' changes the direction of the operation so that we get back a count of non-missing items in each row:
    >>> college_ugds.count(axis="columns").head()
    INSTNM
    Alabama A & M University               9
    University of Alabama at Birmingham    9
    Amridge University                     9
    University of Alabama in Huntsville    9
    Alabama State University               9
    dtype: int64
    
  4. Instead of counting non-missing values, we can sum all the values in each row. Each row of percentages should add up to 1. The .sum method may be used to verify this:
    >>> college_ugds.sum(axis="columns").head()
    INSTNM
    Alabama A & M University               1.0000
    University of Alabama at Birmingham    0.9999
    Amridge University                     1.0000
    University of Alabama in Huntsville    1.0000
    Alabama State University               1.0000
    dtype: float64
    
  5. To get an idea of the distribution of each column, the .median method can be used:
    >>> college_ugds.median(axis="index")
    UGDS_WHITE    0.55570
    UGDS_BLACK    0.10005
    UGDS_HISP     0.07140
    UGDS_ASIAN    0.01290
    UGDS_AIAN     0.00260
    UGDS_NHPI     0.00000
    UGDS_2MOR     0.01750
    UGDS_NRA      0.00000
    UGDS_UNKN     0.01430
    dtype: float64
    

How it works...

The direction of operation on the axis is one of the more confusing aspects of pandas. Many pandas users have difficulty remembering the meaning of the axis parameter. I remember them by reminding myself that a Series only has one axis, the index (or 0). A DataFrame also has an index (axis 0) and columns (axis 1).

There's more...

The .cumsum method with axis=1 accumulates the race percentages across each row. It gives a slightly different view of the data. For example, it is very easy to see the exact percentage of white and black students for each school:

>>> college_ugds_cumsum = college_ugds.cumsum(axis=1)
>>> college_ugds_cumsum.head()
              UGDS_WHITE  UGDS_BLACK  ...  UGDS_NRA  UGDS_UNKN
INSTNM                                ...                     
Alabama A...      0.0333      0.9686  ...    0.9862     1.0000
Universit...      0.5922      0.8522  ...    0.9899     0.9999
Amridge U...      0.2990      0.7182  ...    0.7285     1.0000
Universit...      0.6988      0.8243  ...    0.9650     1.0000
Alabama S...      0.0158      0.9366  ...    0.9863     1.0000
You have been reading a chapter from
Pandas 1.x Cookbook - Second Edition
Published in: Feb 2020
Publisher: Packt
ISBN-13: 9781839213106
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime