Selecting with unique and sorted indexes
Index selection performance drastically improves when the index is unique or sorted. The prior recipe used an unsorted index that contained duplicates, which makes for relatively slow selections.
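The two properties mentioned above can be inspected directly on any index. The following is a minimal sketch on a hypothetical four-row DataFrame (the values and institution names are made up for illustration and are not taken from the college dataset). It uses the `is_unique` and `is_monotonic_increasing` attributes of a pandas Index.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
df = pd.DataFrame(
    {
        "STABBR": ["TX", "AL", "TX", "CA"],
        "INSTNM": ["B College", "A College", "D College", "C College"],
    }
)

unsorted_dup = df.set_index("STABBR")    # duplicates, not sorted
sorted_dup = unsorted_dup.sort_index()   # duplicates, sorted
unique_idx = df.set_index("INSTNM")      # unique, not sorted

# Collect (is_unique, is_monotonic_increasing) for each index.
flags = {
    name: (frame.index.is_unique, frame.index.is_monotonic_increasing)
    for name, frame in [
        ("unsorted_dup", unsorted_dup),
        ("sorted_dup", sorted_dup),
        ("unique_idx", unique_idx),
    ]
}

for name, (unique, monotonic) in flags.items():
    print(f"{name}: unique={unique}, sorted={monotonic}")
```

Note that a sorted index may still contain duplicates, and a unique index need not be sorted. The two properties are independent of each other.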
Getting ready
In this recipe, we use the college dataset to form unique or sorted indexes to increase the performance of index selection. We will continue to compare the performance to boolean indexing as well.
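For readers without the data file at hand, the kind of comparison this recipe makes can be sketched on synthetic data. The example below is only an approximation under stated assumptions: the row count, the four state labels, and the `VALUE` column are invented, and the standard-library `timeit` module stands in for the IPython `%timeit` magic. Absolute timings depend on the machine and the pandas version, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the college data; size and labels are assumptions.
rng = np.random.default_rng(0)
states = rng.choice(["AL", "CA", "NY", "TX"], size=50_000)
frame = pd.DataFrame({"STABBR": states, "VALUE": np.arange(states.size)})

by_unsorted = frame.set_index("STABBR")  # unsorted index with duplicates
by_sorted = by_unsorted.sort_index()     # same data, sorted index


def boolean_select():
    """Select Texas rows with a boolean mask on the column."""
    return frame[frame["STABBR"] == "TX"]


def unsorted_select():
    """Select Texas rows by label from the unsorted index."""
    return by_unsorted.loc["TX"]


def sorted_select():
    """Select Texas rows by label from the sorted index."""
    return by_sorted.loc["TX"]


# All three approaches return the same set of rows (row order may differ).
expected = sorted(boolean_select()["VALUE"])
assert sorted(unsorted_select()["VALUE"]) == expected
assert sorted(sorted_select()["VALUE"]) == expected

# Average wall-clock time per call over a small number of repetitions.
for func in (boolean_select, unsorted_select, sorted_select):
    seconds = timeit.timeit(func, number=20) / 20
    print(f"{func.__name__}: {seconds * 1e3:.3f} ms per call")
```

The equality checks before the timing loop matter: a speed comparison is only meaningful when each approach selects the same rows.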
How to do it...
- Read in the college dataset, create a separate DataFrame with STABBR as the index, and check whether the index is sorted:
>>> import pandas as pd
>>> college = pd.read_csv('data/college.csv')
>>> college2 = college.set_index('STABBR')
>>> college2.index.is_monotonic_increasing
False
- Sort the index from college2 and store it as another object:
>>> college3 = college2.sort_index()
>>> college3.index.is_monotonic_increasing
True
- Time the selection of the state of Texas (TX) from all three DataFrames:
>>> %timeit college[college['STABBR...