Selecting the largest of each group by sorting
One of the most basic and common operations in data analysis is selecting the row that contains the largest value of some column within each group. Examples include finding the highest-rated film of each year or the highest-grossing film for each content rating. To accomplish this, we sort the rows by the grouping column and by the column used to rank each member of the group, and then extract the top-ranked member of each group.
In this recipe, we will find the highest-rated film of each year.
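Before turning to the movie dataset, the sort-then-extract idea can be sketched on a tiny, made-up DataFrame. The column names (title, year, score) and the values here are hypothetical and chosen only for illustration, and using drop_duplicates to keep the first row of each group is one possible way to do the extraction, not necessarily the only one:

```python
import pandas as pd

# Hypothetical toy data, not taken from the movie dataset
films = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D", "E"],
        "year": [2000, 2000, 2001, 2001, 2001],
        "score": [7.1, 8.3, 6.5, 9.0, 5.2],
    }
)

# Sort by the grouping column (ascending) and by the ranking column
# (descending), so the highest score is the first row within each year
ranked = films.sort_values(["year", "score"], ascending=[True, False])

# Keep only the first row for each year; drop_duplicates keeps the
# first occurrence by default
top_per_year = ranked.drop_duplicates(subset="year")
print(top_per_year)
```

Because the rows are already ordered within each year, keeping the first occurrence of each year leaves exactly one row per group: the one with the largest score.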
How to do it…
- Read in the movie dataset and slim it down to just the three columns we care about: movie_title, title_year, and imdb_score:

    >>> movie = pd.read_csv("data/movie.csv")
    >>> movie[["movie_title", "title_year", "imdb_score"]]
      movie_title ...
    0 Avatar ...
    1 ...