Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Pandas Cookbook

You're reading from   Pandas Cookbook Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python

Arrow left icon
Product type Paperback
Published in Oct 2017
Publisher Packt
ISBN-13 9781784393878
Length 532 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Theodore Petrou Theodore Petrou
Author Profile Icon Theodore Petrou
Theodore Petrou
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Pandas Foundations FREE CHAPTER 2. Essential DataFrame Operations 3. Beginning Data Analysis 4. Selecting Subsets of Data 5. Boolean Indexing 6. Index Alignment 7. Grouping for Aggregation, Filtration, and Transformation 8. Restructuring Data into a Tidy Form 9. Combining Pandas Objects 10. Time Series Analysis 11. Visualization with Matplotlib, Pandas, and Seaborn

Chaining Series methods together

In Python, every variable is an object, and all objects have attributes and methods that refer to or return more objects. The sequential invocation of methods using the dot notation is referred to as method chaining. Pandas is a library that lends itself well to method chaining, as many Series and DataFrame methods return more Series and DataFrames, upon which more methods can be called.

Getting ready

To motivate method chaining, let's take a simple English sentence and translate the chain of events into a chain of methods. Consider the sentence, A person drives to the store to buy food, then drives home and prepares, cooks, serves, and eats the food before cleaning the dishes.

A Python version of this sentence might take the following form:

>>> person.drive('store')\
.buy('food')\
.drive('home')\
.prepare('food')\
.cook('food')\
.serve('food')\
.eat('food')\
.cleanup('dishes')

In the preceding code, the person is the object calling each of the methods, just as the person is performing all of the actions in the original sentence. The parameter passed to each of the methods specifies how the method operates.

Although it is possible to write the entire method chain in a single unbroken line, it is far more palatable to write a single method per line. Since Python does not normally allow a single expression to be written on multiple lines, you need to use the backslash line continuation character. Alternatively, you may wrap the whole expression in parentheses. To improve readability even more, place each method directly under the dot above it. This recipe shows similar method chaining with pandas Series.

How to do it...

  1. Load in the movie dataset, and select two columns as a distinct Series:
>>> movie = pd.read_csv('data/movie.csv')
>>> actor_1_fb_likes = movie['actor_1_facebook_likes']
>>> director = movie['director_name']
  1. One of the most common methods to append to the chain is the head method. This suppresses long output. For shorter chains, there isn't as great a need to place each method on a different line:
>>> director.value_counts().head(3)
Steven Spielberg 26 Woody Allen 22 Clint Eastwood 20 Name: director_name, dtype: int64
  1. A common way to count the number of missing values is to chain the sum method after isnull:
>>> actor_1_fb_likes.isnull().sum()
7
  1. All the non-missing values of actor_1_fb_likes should be integers as it is impossible to have a partial Facebook like. Any numeric columns with missing values must have their data type as float. If we fill missing values from actor_1_fb_likes with zeros, we can then convert it to an integer with the astype method:
>>> actor_1_fb_likes.dtype
dtype('float64')

>>> actor_1_fb_likes.fillna(0)\
.astype(int)\
.head()
0 1000 1 40000 2 11000 3 27000 4 131 Name: actor_1_facebook_likes, dtype: int64

How it works...

Method chaining is possible with all Python objects since each object method must return another object that itself will have more methods. It is not necessary for the method to return the same type of object.

Step 2 first uses value_counts to return a Series and then chains the head method to select the first three elements. The final returned object is a Series, which could also have had more methods chained on it.

In step 3, the isnull method creates a boolean Series. Pandas numerically evaluates False/True as 0/1, so the sum method returns the number of missing values.

Each of the three chained methods in step 4 returns a Series. It may not seem intuitive, but the astype method returns an entirely new Series with a different data type

There's more...

Instead of summing up the booleans in step 3 to find the total number of missing values, we can take the mean of the Series to get the percentage of values that are missing:

>>> actor_1_fb_likes.isnull().mean()
0.0014

As was mentioned at the beginning of the recipe, it is possible to use parentheses instead of the backslash for multi-line code. Step 4 may be rewritten this way:

>>> (actor_1_fb_likes.fillna(0)
.astype(int)
.head())

Not all programmers like the use of method chaining, as there are some downsides. One such downside is that debugging becomes difficult. None of the intermediate objects produced during the chain are stored in a variable, so if there is an unexpected result, it will be difficult to trace the exact location in the chain where it occurred.

The example at the start of the recipe may be rewritten so that the result of each method gets preserved as/in a unique variable. This makes tracking bugs much easier, as you can inspect the object at each step:

>>> person1 = person.drive('store')
>>> person2 = person1.buy('food')
>>> person3 = person2.drive('home')
>>> person4 = person3.prepare('food')
>>> person5 = person4.cook('food')
>>> person6 = person5.serve('food')
>>> person7 = person6.eat('food')
>>> person8 = person7.cleanup('dishes')
You have been reading a chapter from
Pandas Cookbook
Published in: Oct 2017
Publisher: Packt
ISBN-13: 9781784393878
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image