Slicing lexicographically
The .loc indexer typically selects data based on the exact string label of the index. However, it also allows you to select data based on the lexicographic order of the values in the index. Specifically, .loc allows you to select all rows whose index labels fall lexicographically between two strings by using slice notation. This works only if the index is sorted.
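As a minimal sketch of the idea, the following uses a small, made-up Series (the institution names and values are illustrative and are not taken from the college dataset) to show that the slice endpoints do not need to be actual labels once the index is sorted:

```python
import pandas as pd

# Hypothetical toy data: five institution names used as the index
names = ['Spelman College', 'Stanford University',
         'Sul Ross State University', 'Syracuse University',
         'Reed College']
s = pd.Series(range(len(names)), index=names).sort_index()

# Neither 'Sp' nor 'Su' is an actual label. On a sorted index, .loc
# returns every label that sorts between the two strings.
subset = s.loc['Sp':'Su']
print(subset.index.tolist())
# ['Spelman College', 'Stanford University']
```

Note that 'Sul Ross State University' is excluded: any string that starts with 'Su' and continues with further characters sorts after 'Su' itself, so it falls outside the slice.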
Getting ready
In this recipe, you will first sort the index and then use slice notation inside the .loc indexer to select all rows between two strings.
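Before slicing, it can help to check whether the index is already sorted. The sketch below assumes a hypothetical three-row Series rather than the college dataset, and uses the index's is_monotonic_increasing attribute for the check:

```python
import pandas as pd

# Hypothetical toy data with an unsorted string index
unsorted = pd.Series(
    [1, 2, 3],
    index=['Stanford University', 'Reed College', 'Spelman College'],
)
print(unsorted.index.is_monotonic_increasing)  # False

# Slicing with labels that are not in an unsorted index raises KeyError
try:
    unsorted.loc['Sp':'Su']
    raised = False
except KeyError:
    raised = True

# After sorting, the same slice succeeds
sorted_s = unsorted.sort_index()
result = sorted_s.loc['Sp':'Su']
print(result.index.tolist())
```

Sorting returns a new object, so the result needs to be assigned back (or to a new name, as here) for the slice to operate on the sorted index.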
How to do it...
- Read in the college dataset, and set the institution name as the index:
>>> import pandas as pd
>>> college = pd.read_csv('data/college.csv', index_col='INSTNM')
- Attempt to select all colleges with names lexicographically between 'Sp' and 'Su':
>>> college.loc['Sp':'Su']
KeyError: 'Sp'
- As the index is not sorted, the preceding command fails. Let's go ahead and sort the index:
>>> college = college.sort_index()
- Now, let's rerun the same command from step 2:
>>> college.loc...