Chapter 1. Preparing the Data
In this chapter, we cover the basic tasks of reading, storing, and cleaning data using Python and OpenRefine. The chapter contains the following recipes:
- Reading and writing CSV/TSV files with Python
- Reading and writing JSON files with Python
- Reading and writing Excel files with Python
- Reading and writing XML files with Python
- Retrieving HTML pages with pandas
- Storing and retrieving from a relational database
- Storing and retrieving from MongoDB
- Opening and transforming data with OpenRefine
- Exploring the data with OpenRefine
- Removing duplicates
- Using regular expressions and GREL to clean up the data
- Imputing missing observations
- Normalizing and standardizing features
- Binning the observations
- Encoding categorical variables