Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Pandas Cookbook

You're reading from   Pandas Cookbook Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python

Arrow left icon
Product type Paperback
Published in Oct 2017
Publisher Packt
ISBN-13 9781784393878
Length 532 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Theodore Petrou Theodore Petrou
Author Profile Icon Theodore Petrou
Theodore Petrou
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Pandas Foundations 2. Essential DataFrame Operations FREE CHAPTER 3. Beginning Data Analysis 4. Selecting Subsets of Data 5. Boolean Indexing 6. Index Alignment 7. Grouping for Aggregation, Filtration, and Transformation 8. Restructuring Data into a Tidy Form 9. Combining Pandas Objects 10. Time Series Analysis 11. Visualization with Matplotlib, Pandas, and Seaborn

Working with operators on a Series

There exist a vast number of operators in Python for manipulating objects. Operators are not objects themselves, but rather syntactical structures and keywords that force an operation to occur on an object. For instance, when the plus operator is placed between two integers, Python will add them together. See more examples of operators in the following code:

>>> 5 + 9   # plus operator example adds 5 and 9
14

>>> 4 ** 2 # exponentiation operator raises 4 to the second power
16

>>> a = 10 # assignment operator assigns 10 to a

>>> 5 <= 9 # less than or equal to operator returns a boolean
True

Operators can work for any type of object, not just numerical data. These examples show different objects being operated on:

>>> 'abcde' + 'fg' 
'abcdefg'

>>> not (5 <= 9)
False

>>> 7 in [1, 2, 6]
False

>>> set([1,2,3]) & set([2,3,4])
set([2,3])

Visit tutorials point (http://bit.ly/2u5g5Io) to see a table of all the basic Python operators. Not all operators are implemented for every object. These examples all produce errors when using an operator:

>>> [1, 2, 3] - 3
TypeError: unsupported operand type(s) for -: 'list' and 'int'

>>> a = set([1,2,3])
>>> a[0]
TypeError: 'set' object does not support indexing

Series and DataFrame objects work with most of the Python operators.

Getting ready

In this recipe, a variety of operators will be applied to different Series objects to produce a new Series with completely different values.

How to do it...

  1. Select the imdb_score column as a Series:
>>> movie = pd.read_csv('data/movie.csv')
>>> imdb_score = movie['imdb_score']
>>> imdb_score
0 7.9 1 7.1 2 6.8 ... 4913 6.3 4914 6.3 4915 6.6 Name: imdb_score, Length: 4916, dtype: float64
  1. Use the plus operator to add one to each Series element:
>>> imdb_score + 1
0 8.9 1 8.1 2 7.8 ... 4913 7.3 4914 7.3 4915 7.6 Name: imdb_score, Length: 4916, dtype: float64
  1. The other basic arithmetic operators minus (-), multiplication (*), division (/), and exponentiation (**) work similarly with scalar values. In this step, we will multiply the series by 2.5:
>>> imdb_score * 2.5
0 19.75 1 17.75 2 17.00 ... 4913 15.75 4914 15.75 4915 16.50 Name: imdb_score, Length: 4916, dtype: float64
  1. Python uses two consecutive division operators (//) for floor division and the percent sign (%) for the modulus operator, which returns the remainder after a division. Series use these the same way:
>>> imdb_score // 7
0 1.0 1 1.0 2 0.0 ... 4913 0.0 4914 0.0 4915 0.0 Name: imdb_score, Length: 4916, dtype: float64
  1. There exist six comparison operators, greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), equal to (==), and not equal to (!=). Each comparison operator turns each value in the Series to True or False based on the outcome of the condition:
>>> imdb_score > 7
0 True 1 True 2 False ... 4913 False 4914 False 4915 False Name: imdb_score, Length: 4916, dtype: bool

>>> director = movie['director_name']
>>> director == 'James Cameron'
0 True 1 False 2 False ... 4913 False 4914 False 4915 False Name: director_name, Length: 4916, dtype: bool

How it works...

All the operators used in this recipe apply the same operation to each element in the Series. In native Python, this would require a for-loop to iterate through each of the items in the sequence before applying the operation. Pandas relies heavily on the NumPy library, which allows for vectorized computations, or the ability to operate on entire sequences of data without the explicit writing of for loops. Each operation returns a Series with the same index, but with values that have been modified by the operator.

There's more...

All of the operators used in this recipe have method equivalents that produce the exact same result. For instance, in step 1, imdb_score + 1 may be reproduced with the add method. Check the following code to see the method version of each step in the recipe:

>>> imdb_score.add(1)              # imdb_score + 1
>>> imdb_score.mul(2.5) # imdb_score * 2.5
>>> imdb_score.floordiv(7) # imdb_score // 7
>>> imdb_score.gt(7) # imdb_score > 7
>>> director.eq('James Cameron') # director == 'James Cameron'

Why does pandas offer a method equivalent to these operators? By its nature, an operator only operates in exactly one manner. Methods, on the other hand, can have parameters that allow you to alter their default functionality:

Operator Group Operator Series method name
Arithmetic +, -, *, /, //, %, ** add, sub, mul, div, floordiv, mod, pow
Comparison <, >, <=, >=, ==, !=

lt, gt, le, ge, eq, ne

You may be curious as to how a Python Series object, or any object for that matter, knows what to do when it encounters an operator. For example, how does the expression imdb_score * 2.5 know to multiply each element in the Series by 2.5? Python has a built-in, standardized way for objects to communicate with operators using special methods.

Special methods are what objects call internally whenever they encounter an operator. Special methods are defined in the Python data model, a very important part of the official documentation, and are the same for every object throughout the language. Special methods always begin and end with two underscores. For instance, the special method __mul__ is called whenever the multiplication operator is used. Python interprets the imdb_score * 2.5 expression as imdb_score.__mul__(2.5).

There is no difference between using the special method and using an operator as they are doing the exact same thing. The operator is just syntactic sugar for the special method.

See also

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime