Masking DataFrame rows
The mask method performs the exact opposite operation of the where method. By default, it creates missing values wherever the boolean condition is True. In essence, it is literally masking, or covering up, values in your dataset.
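To make the relationship with where concrete, here is a minimal sketch on a tiny made-up DataFrame (the columns a and b and their values are purely illustrative). A boolean Series is broadcast across the columns and matched on the row index, so mask(cond) gives the same result as where(~cond).

```python
import pandas as pd

# Tiny, made-up DataFrame used only for illustration
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Boolean Series aligned with the row index
cond = df['a'] >= 2

# mask covers up (sets to NaN) every value in rows where cond is True
masked = df.mask(cond)

# where keeps rows where its condition is True, so negating cond is equivalent
kept = df.where(~cond)

print(masked)
```

Because NaN is a float, the masked integer columns come back as float64.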
Getting ready
In this recipe, we will mask all rows of the movie dataset for movies made in 2010 or later and then filter out the rows that contain only missing values.
How to do it...
- Read the movie dataset, set the movie title as the index, and create the criteria:
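>>> import pandas as pd
>>> movie = pd.read_csv('data/movie.csv', index_col='movie_title')
>>> c1 = movie['title_year'] >= 2010    # made in 2010 or later
>>> c2 = movie['title_year'].isnull()   # year is unknown
>>> criteria = c1 | c2                  # either condition selects the row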
- Use the mask method on the DataFrame to turn every value in the rows for movies made from 2010 onward into a missing value. Any movie that originally had a missing value for title_year is also masked:
>>> movie.mask(criteria).head()
- Notice how all the values in the third, fourth, and fifth rows...
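The steps above can be tried without the dataset. The sketch below assumes a tiny made-up stand-in for data/movie.csv (hypothetical titles and values, with an imdb_score column representing the rest of the data). It builds the same criteria, masks the matching rows, and then filters out the fully masked rows with dropna(how='all'), which is one way to do the filtering described in Getting ready.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data/movie.csv: made-up titles and values
movie = pd.DataFrame(
    {
        'title_year': [2009.0, 2012.0, np.nan, 2015.0, 1999.0],
        'imdb_score': [7.9, 7.1, 6.5, 8.1, 7.2],
    },
    index=pd.Index(
        ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'],
        name='movie_title',
    ),
)

# Same criteria as step 1: made in 2010 or later, or year unknown
c1 = movie['title_year'] >= 2010      # NaN compares as False here
c2 = movie['title_year'].isnull()     # catches the missing year
criteria = c1 | c2

# Step 2: every value in a matching row becomes NaN
masked = movie.mask(criteria)

# Drop only the rows in which every value is missing
movie_mask = masked.dropna(how='all')
print(movie_mask)
```

In this toy data the result matches plain boolean indexing with movie[~criteria]. On real data, dropna(how='all') would also drop any unmasked row whose values were all missing to begin with.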