Loading and Wrangling Data with Pandas and NumPy
Data sources come in many formats: plain text files, CSVs, SQL databases, Excel files, and many more. We saw how to deal with some of these data sources in the last chapter, but there is one library in Python that takes the cake when it comes to data preparation: pandas. The pandas library is a core tool for a data scientist, and we will learn how to use it effectively in this chapter. We will learn about:
- Loading data from and saving data to several different data source types
- Some basic exploratory data analysis (EDA) and plotting with pandas
- Preparing and cleaning data for later use, including the imputation of missing data (filling in missing values) and outlier detection
- Essential data wrangling tools such as filtering, groupby, and replace
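As a small preview of the wrangling tools named in the list above, here is a minimal sketch. The DataFrame, its column names (genre, seconds), and its values are hypothetical and invented purely for illustration; filling with the median is just one simple imputation choice, not necessarily the approach this chapter settles on.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame(
    {
        "genre": ["rock", "Rock", "jazz", "jazz", "rock"],
        "seconds": [210.0, np.nan, 305.0, 295.0, 190.0],
    }
)

# replace: normalize an inconsistently capitalized label.
df["genre"] = df["genre"].replace({"Rock": "rock"})

# Imputation: fill the missing duration with the column median
# (an assumed strategy; other choices are possible).
df["seconds"] = df["seconds"].fillna(df["seconds"].median())

# Filtering: keep only rows whose duration exceeds 200 seconds.
long_tracks = df[df["seconds"] > 200]

# groupby: compute the average duration per genre.
mean_by_genre = df.groupby("genre")["seconds"].mean()
print(mean_by_genre)
```

Each step returns a new object or is assigned back to a column explicitly, which keeps the order of operations easy to follow when a cleaning pipeline grows.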
Overall, this chapter will be another foundational chapter in your data science journey, giving you the tools necessary to get started working with data...