You're reading from Pandas Cookbook Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python

Product type Paperback

Published in Oct 2017

Publisher Packt

ISBN-13 9781784393878

Length 532 pages

Edition 1st Edition

Languages

Python

Tools

Pandas

Concepts

Data Analysis

Author (1):

Theodore Petrou

View More author details

Working with operators on a Series

There exist a vast number of operators in Python for manipulating objects. Operators are not objects themselves, but rather syntactical structures and keywords that force an operation to occur on an object. For instance, when the plus operator is placed between two integers, Python will add them together. See more examples of operators in the following code:

>>> 5 + 9   # plus operator example adds 5 and 9
14

>>> 4 ** 2  # exponentiation operator raises 4 to the second power
16

>>> a = 10  # assignment operator assigns 10 to a

>>> 5 <= 9  # less than or equal to operator returns a boolean
True

Operators can work for any type of object, not just numerical data. These examples show different objects being operated on:

>>> 'abcde' + 'fg' 
'abcdefg'

>>> not (5 <= 9)
False

>>> 7 in [1, 2, 6]
False

>>> set([1,2,3]) & set([2,3,4])
set([2,3])

Visit tutorials point (http://bit.ly/2u5g5Io) to see a table of all the basic Python operators. Not all operators are implemented for every object. These examples all produce errors when using an operator:

>>> [1, 2, 3] - 3
TypeError: unsupported operand type(s) for -: 'list' and 'int'

>>> a = set([1,2,3])     
>>> a[0]               
TypeError: 'set' object does not support indexing

Series and DataFrame objects work with most of the Python operators.

Getting ready

In this recipe, a variety of operators will be applied to different Series objects to produce a new Series with completely different values.

How to do it...

Select the imdb_score column as a Series:

>>> movie = pd.read_csv('data/movie.csv')
>>> imdb_score = movie['imdb_score']
>>> imdb_score
0       7.9
1       7.1
2       6.8
       ... 
4913    6.3
4914    6.3
4915    6.6
Name: imdb_score, Length: 4916, dtype: float64

Use the plus operator to add one to each Series element:

>>> imdb_score + 1
0       8.9
1       8.1
2       7.8
       ... 
4913    7.3
4914    7.3
4915    7.6
Name: imdb_score, Length: 4916, dtype: float64

The other basic arithmetic operators minus (-), multiplication (*), division (/), and exponentiation (**) work similarly with scalar values. In this step, we will multiply the series by 2.5:

>>> imdb_score * 2.5
0       19.75
1       17.75
2       17.00
        ...  
4913    15.75
4914    15.75
4915    16.50
Name: imdb_score, Length: 4916, dtype: float64

Python uses two consecutive division operators (//) for floor division and the percent sign (%) for the modulus operator, which returns the remainder after a division. Series use these the same way:

>>> imdb_score // 7
0       1.0
1       1.0
2       0.0
       ... 
4913    0.0
4914    0.0
4915    0.0
Name: imdb_score, Length: 4916, dtype: float64

There exist six comparison operators, greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), equal to (==), and not equal to (!=). Each comparison operator turns each value in the Series to True or False based on the outcome of the condition:

>>> imdb_score > 7
0        True
1        True
2       False
        ...  
4913    False
4914    False
4915    False
Name: imdb_score, Length: 4916, dtype: bool

>>> director = movie['director_name']
>>> director == 'James Cameron'
0        True
1       False
2       False
        ...  
4913    False
4914    False
4915    False
Name: director_name, Length: 4916, dtype: bool

How it works...

All the operators used in this recipe apply the same operation to each element in the Series. In native Python, this would require a for-loop to iterate through each of the items in the sequence before applying the operation. Pandas relies heavily on the NumPy library, which allows for vectorized computations, or the ability to operate on entire sequences of data without the explicit writing of for loops. Each operation returns a Series with the same index, but with values that have been modified by the operator.

There's more...

All of the operators used in this recipe have method equivalents that produce the exact same result. For instance, in step 1, imdb_score + 1 may be reproduced with the add method. Check the following code to see the method version of each step in the recipe:

>>> imdb_score.add(1)              # imdb_score + 1
>>> imdb_score.mul(2.5)            # imdb_score * 2.5
>>> imdb_score.floordiv(7)         # imdb_score // 7
>>> imdb_score.gt(7)               # imdb_score > 7
>>> director.eq('James Cameron')   # director == 'James Cameron'

Why does pandas offer a method equivalent to these operators? By its nature, an operator only operates in exactly one manner. Methods, on the other hand, can have parameters that allow you to alter their default functionality:

Operator Group	Operator	Series method name
Arithmetic	`+`, `-`, ``, `/`, `//`, `%`, `*`	`add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `pow`
Comparison	`<`, `>`, `<=`, `>=`, `==`, `!=`	`lt`, `gt`, `le`, `ge`, `eq`, `ne`

You may be curious as to how a Python Series object, or any object for that matter, knows what to do when it encounters an operator. For example, how does the expression imdb_score * 2.5 know to multiply each element in the Series by 2.5? Python has a built-in, standardized way for objects to communicate with operators using special methods.

Special methods are what objects call internally whenever they encounter an operator. Special methods are defined in the Python data model, a very important part of the official documentation, and are the same for every object throughout the language. Special methods always begin and end with two underscores. For instance, the special method __mul__ is called whenever the multiplication operator is used. Python interprets the imdb_score * 2.5 expression as imdb_score.__mul__(2.5).

There is no difference between using the special method and using an operator as they are doing the exact same thing. The operator is just syntactic sugar for the special method.

You're reading from Pandas Cookbook Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python

Table of Contents (12) Chapters

Working with operators on a Series

Getting ready

How to do it...

How it works...

There's more...

See also

Authors (1)

Other recommended products

Personalised recommendations for you

You're reading from Pandas Cookbook Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python

Table of Contents (12) Chapters

Unlock this book and the full library FREE for 7 days

Authors (1)

Other recommended products

Personalised recommendations for you