Data selection in pandas DataFrames
In Chapter 3, Data Structures, we studied the two core pandas data structures, DataFrames and Series. There, we did some very basic data selection without digging into the details of how it works. In this section, we will do a deeper dive and explore the index, which is fundamental to many pandas operations.
As you may recall, when we introduced DataFrames, we drew analogies to spreadsheets. Let's revisit that analogy. Here is the same figure from Chapter 2, Data Structures (which shows the data from Figure 5.1, but in a spreadsheet):
Here, we can see the same three columns of data that were shown in Figure 5.1, but we have annotated the key differences. In pandas, the standard row index starts at 0, while for most spreadsheets, it starts at row 1. This "0 indexing" is standard for Python. An index in pandas is a series of numbers or strings...
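To make the zero-based row index concrete, here is a minimal sketch. The column names and values are hypothetical, chosen only for illustration; they are not the data from Figure 5.1. It shows the default index pandas assigns when none is given, and how a column of strings can serve as the index instead.

```python
import pandas as pd

# Hypothetical three-column data (illustrative values only)
df = pd.DataFrame({
    "a": [10, 20, 30],
    "b": [1.5, 2.5, 3.5],
    "c": ["x", "y", "z"],
})

# With no index supplied, pandas assigns a default integer index starting at 0
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# Position 0 is the first row, unlike row 1 in most spreadsheets
first_row = df.iloc[0]

# An index can also hold strings, here taken from the hypothetical column "c"
labeled = df.set_index("c")
print(labeled.index)
```

With the string index in place, rows can be selected by label, for example `labeled.loc["y"]`, rather than by integer position.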