Pandas indexes allow efficient lookup of values. Without indexes, finding a value would require a linear search across all of our data. Indexes create optimized shortcuts to specific data items, using a direct lookup instead of a search.
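To make the contrast concrete, here is a minimal plain-Python analogy. It is not how pandas implements its indexes. A list of `(key, value)` pairs stands in for unindexed data that must be scanned, and a dictionary stands in for an index that jumps directly to a key. The data and the function names `linear_lookup` and `indexed_lookup` are hypothetical.

```python
# Illustrative analogy only: plain Python, not pandas internals.
# Hypothetical data: keys 100..10099, each paired with a made-up value.
rows = [(key, key * 0.5) for key in range(100, 10100)]


def linear_lookup(rows, target):
    """Scan every row until the key matches: O(n) comparisons in the worst case."""
    for key, value in rows:
        if key == target:
            return value
    raise KeyError(target)


# Build the "index" once: a hash map from key to value.
index = {key: value for key, value in rows}


def indexed_lookup(index, target):
    """Jump straight to the value via hashing: O(1) on average."""
    return index[target]


print(linear_lookup(rows, 10099))    # 5049.5 (after scanning all 10,000 rows)
print(indexed_lookup(index, 10099))  # 5049.5 (a single hash lookup)
```

Building the dictionary costs one pass over the data, but every later lookup skips the scan. That trade-off is the same reason it pays to build an index before doing many lookups.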
To begin examining the value of indexes, we will use the following DataFrame of 10,000 random numbers:
data:image/s3,"s3://crabby-images/50a99/50a99e1adc74c18650527bfc5ae5ee47528cac3e" alt=""
Suppose we want to look up the value of the random number where key == 10099 (I explicitly picked this value because it is the last row in the DataFrame). We can do this using a Boolean selection.
data:image/s3,"s3://crabby-images/f5630/f56303bb8dcdb2ffe572cdf040bd3aa1fee5436a" alt=""
Conceptually, this is simple. But what if we want to do this repeatedly? We can simulate that in IPython or a Jupyter notebook with the %timeit magic command. The following code performs the lookup repeatedly and reports on its performance.
data:image/s3,"s3://crabby-images/e678b/e678bc5872482a3863534fb6ea5e79592a7b67f1" alt=""
This result states that 1,000 executions were performed three times, and the fastest of those three took lookup...
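To see what an index buys us, one option is to promote `key` to the row index with `set_index` and look the value up by label with `.loc`, then time both approaches the same way. This is a sketch under the same assumed layout. The indexed lookup is expected to avoid scanning the whole column, but the measured speedup depends on the machine and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Same assumed DataFrame as before (hypothetical column names).
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"key": np.arange(100, 10100), "random": rng.random(10_000)})

# Promote 'key' to the row index so rows can be found by label.
df_indexed = df.set_index("key")

boolean_value = df.loc[df["key"] == 10099, "random"].iloc[0]  # scans every key
indexed_value = df_indexed.loc[10099, "random"]               # label lookup via the index
assert boolean_value == indexed_value  # both approaches find the same number

for name, lookup in [
    ("boolean selection", lambda: df.loc[df["key"] == 10099, "random"].iloc[0]),
    ("index lookup", lambda: df_indexed.loc[10099, "random"]),
]:
    # Best of 3 runs of 1,000 calls, reported per call.
    best = min(timeit.repeat(lookup, number=1000, repeat=3)) / 1000
    print(f"{name}: {best * 1e6:.1f} microseconds per lookup")
```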