Loading data and managing data types
In this recipe, we show how to load a dataset from a CSV file into Python. The same principles apply to other file formats, as long as `pandas` supports them. Some popular formats include Parquet, JSON, XML, Excel, and Feather.
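A minimal sketch of loading CSV data with `pd.read_csv`. To keep it self-contained, it reads from an in-memory string wrapped in `io.StringIO` rather than from a file on disk. The column names and values are hypothetical and chosen only for illustration, including one empty field to show how missing values are handled.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_data = io.StringIO(
    "date,ticker,price,volume\n"
    "2021-01-04,ABC,10.5,1000\n"
    "2021-01-05,ABC,10.7,1200\n"
    "2021-01-06,ABC,,900\n"
)

# read_csv accepts a file path, URL, or file-like object;
# parse_dates converts the listed columns to datetime64
df = pd.read_csv(csv_data, parse_dates=["date"])

print(df.dtypes)        # date is datetime64, price is float64 because of the NaN
print(df.isna().sum())  # the empty price field was read as a missing value
```

With a real file, the `io.StringIO` object is simply replaced by a path such as `"data.csv"`.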
`pandas` has a very consistent API, which makes its functions easy to find. For example, all functions used for loading data from various sources follow the naming pattern `pd.read_xxx`, where `xxx` is replaced by the file format (for example, `pd.read_csv` or `pd.read_excel`).
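Because the readers share this naming pattern, one can look them up by name. The sketch below is a hypothetical helper, `load_file`, that is not part of `pandas`: it picks `pd.read_<extension>` with `getattr`, plus a small alias table for extensions whose reader has a different name. It assumes the file extension reliably reflects the format.

```python
import pathlib

import pandas as pd

# Extensions whose reader name differs from the extension itself
EXTENSION_ALIASES = {"xlsx": "excel", "xls": "excel"}


def load_file(path, **kwargs):
    """Hypothetical helper: dispatch to pd.read_<format> based on the file extension.

    Relies only on the pd.read_xxx naming convention; extra keyword
    arguments are passed through to the selected reader.
    """
    ext = pathlib.Path(path).suffix.lstrip(".").lower()
    fmt = EXTENSION_ALIASES.get(ext, ext)
    reader = getattr(pd, f"read_{fmt}", None)
    if reader is None:
        raise ValueError(f"No pandas reader found for extension {ext!r}")
    return reader(path, **kwargs)
```

Usage would look like `df = load_file("data.csv")` or `df = load_file("data.json")`. Note that some readers need optional dependencies, for example `pyarrow` for Parquet and Feather, or `openpyxl` for `.xlsx` files.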
We also show how certain data type conversions can significantly reduce the amount of memory that DataFrames occupy on our computers. This can be especially important when working with large datasets (GBs or TBs), which may simply not fit into memory unless we optimize their memory usage.
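A minimal sketch of such conversions, using a hypothetical randomly generated DataFrame rather than the recipe's dataset. It converts a low-cardinality string column to the `category` dtype and downcasts the numeric columns with `pd.to_numeric(..., downcast=...)`, then compares the totals reported by `memory_usage(deep=True)`. The exact savings depend on the data and the `pandas` version, so no specific numbers are claimed here.

```python
import numpy as np
import pandas as pd


def memory_mb(frame):
    # deep=True also counts the memory used by string values, not just pointers
    return frame.memory_usage(deep=True).sum() / 1024**2


# Hypothetical data stored with pandas' default 64-bit numeric dtypes
n = 100_000
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "sector": rng.choice(["tech", "energy", "finance"], size=n),
    "quantity": rng.integers(0, 100, size=n),
    "price": rng.random(n) * 100,
})

before = memory_mb(df)

optimized = df.copy()
# Few distinct strings -> category (small integer codes plus a lookup table)
optimized["sector"] = optimized["sector"].astype("category")
# Values 0-99 fit in the smallest integer dtype (int8)
optimized["quantity"] = pd.to_numeric(optimized["quantity"], downcast="integer")
# float64 -> float32 halves the size but keeps only ~7 significant digits
optimized["price"] = pd.to_numeric(optimized["price"], downcast="float")

after = memory_mb(optimized)
print(f"Before: {before:.2f} MB, after: {after:.2f} MB")
print(optimized.dtypes)
```

The `category` conversion pays off only when a column has relatively few distinct values, and downcasting floats trades precision for memory, so both are worth checking against the actual data before applying them.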
In order to present a more realistic scenario (including messy data, missing values, etc.), we applied some transformations to the original dataset. For more information on...