Pandas
Pandas is a library originally developed by Wes McKinney and designed to analyze datasets in a seamless and performant way. In recent years, this powerful library has seen remarkable growth and wide adoption in the Python community. In this section, we will introduce the main concepts and tools provided by this library, and we will use it to improve the performance of various use cases that can't otherwise be addressed with NumPy's vectorized operations and broadcasting.
Pandas fundamentals
While NumPy deals mostly with arrays, Pandas' main data structures are pandas.Series, pandas.DataFrame, and pandas.Panel (the last of which has been removed from recent Pandas releases). In the rest of this chapter, we will abbreviate pandas with pd.
The main difference between a pd.Series object and an np.array is that a pd.Series object associates a specific key with each element of an array. Let's see how this works in practice with an example.
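Before turning to the fuller example, the key association can be seen in a minimal sketch. The values and the labels "a", "b", and "c" below are hypothetical and chosen only for illustration; they are not part of the example that follows.

```python
import numpy as np
import pandas as pd

# Hypothetical values, used only to illustrate the idea.
values = [0.1, 0.5, 0.9]

# A NumPy array is addressed only by integer position.
arr = np.array(values)
print(arr[1])        # element at position 1

# A Series pairs each value with a key (its index label).
# The labels "a", "b", "c" are assumptions made for this sketch.
ser = pd.Series(values, index=["a", "b", "c"])
print(ser["b"])      # element stored under the key "b"
print(ser.iloc[1])   # positional access remains available through .iloc
```

If no index is passed, a pd.Series falls back to integer keys starting at 0, so it then behaves much like an array addressed by position.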
Let's assume that we are trying to test a new blood pressure drug, and we want to store, for each patient, whether...