Getting started with pandas
In the previous section, we introduced NumPy and its ability to efficiently store and work with large arrays of data. We'll now introduce another widely used library in data science: pandas. It is built on top of NumPy and provides convenient data structures that efficiently store large datasets with labeled rows and columns. This is especially handy when working with the real-world datasets that we typically want to analyze in data science projects.
To get started, we install the library with the usual command:
$ pip install pandas
Once done, we can start to use it in a Python interpreter:
$ python
>>> import pandas as pd
Just like we alias numpy as np, the convention is to alias pandas as pd when importing it.
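As a quick sanity check after installation, the following minimal sketch prints the installed version and wraps a small NumPy array with row and column labels. The values and the label names (`row_a`, `col_x`, and so on) are made up for illustration, and `DataFrame` is used here only as a preview of the labeled structures mentioned above.

```python
import numpy as np
import pandas as pd

# Confirm that the installation worked by printing the installed version.
print(pd.__version__)

# Illustrative data (made up): a 2x2 NumPy array wrapped with row and column labels.
values = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(values, index=["row_a", "row_b"], columns=["col_x", "col_y"])

# Look up a value by its labels rather than by integer position.
print(df.loc["row_b", "col_x"])

# The data can still be retrieved as a plain NumPy array.
print(type(df.to_numpy()))
```

If the import fails with a `ModuleNotFoundError`, the most likely cause is that pandas was installed into a different Python environment than the one running the interpreter.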
Using pandas Series for one-dimensional data
The first pandas data structure we'll introduce is Series. This data structure behaves very similarly to...