In this chapter, we will learn how to understand and prepare our bank dataset for model development. We will answer questions about how many variables we have and what their quality is. Descriptive analysis is crucial both for understanding our data and for detecting possible problems with information quality. We will see how to deal with missing values, how to convert variables into different formats, and how to split our data to train and validate our predictive model.
Specifically, we will cover the following topics:
- Data overview
- Converting formats
- Sampling
- Dealing with missing values and outliers
- Implementing descriptive analysis
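As a rough preview of how these steps fit together, here is a minimal sketch in plain Python using only the standard library. It is not the chapter's implementation. The records, the field names (`bank_id`, `total_assets`, `roa`, `failed`), and the helper functions (`convert`, `count_missing`, `split`, `impute_median`, `describe`) are all hypothetical and chosen only for illustration. Outlier handling is left out to keep the sketch short.

```python
import random
import statistics

# Hypothetical toy records: names and values are illustrative only.
# Values arrive as strings, as they often do when read from a CSV file.
raw_rows = [
    {"bank_id": "1", "total_assets": "1520.5", "roa": "0.8", "failed": "0"},
    {"bank_id": "2", "total_assets": "", "roa": "-1.2", "failed": "1"},
    {"bank_id": "3", "total_assets": "980.0", "roa": "", "failed": "0"},
    {"bank_id": "4", "total_assets": "2210.3", "roa": "0.5", "failed": "0"},
    {"bank_id": "5", "total_assets": "430.7", "roa": "-2.9", "failed": "1"},
    {"bank_id": "6", "total_assets": "1105.2", "roa": "0.9", "failed": "0"},
]

NUMERIC = ["total_assets", "roa"]


def convert(row):
    """Converting formats: strings become numbers; empty strings become None (missing)."""
    out = {"bank_id": int(row["bank_id"]), "failed": int(row["failed"])}
    for col in NUMERIC:
        out[col] = float(row[col]) if row[col].strip() else None
    return out


def count_missing(rows):
    """Data overview: number of missing values per numeric variable."""
    return {col: sum(r[col] is None for r in rows) for col in NUMERIC}


def split(rows, train_share=0.7, seed=42):
    """Sampling: random train/validation split with a fixed seed for reproducibility."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def impute_median(train, valid):
    """Missing values: fill with medians computed on the training set only.

    Raises statistics.StatisticsError if a column has no observed values in train.
    """
    medians = {
        col: statistics.median(r[col] for r in train if r[col] is not None)
        for col in NUMERIC
    }

    def fill(rows):
        # Build new dicts so the input rows are left unchanged.
        return [
            {k: (medians[k] if k in medians and v is None else v) for k, v in r.items()}
            for r in rows
        ]

    return fill(train), fill(valid)


def describe(rows):
    """Descriptive analysis: count, mean, min, and max of the observed values."""
    summary = {}
    for col in NUMERIC:
        values = [r[col] for r in rows if r[col] is not None]
        summary[col] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


if __name__ == "__main__":
    rows = [convert(r) for r in raw_rows]
    print(count_missing(rows))  # {'total_assets': 1, 'roa': 1}
    print(describe(rows))
    train, valid = split(rows)
    train, valid = impute_median(train, valid)
    print(count_missing(train), count_missing(valid))
```

The split happens before imputation on purpose. Because the medians are computed from the training rows only, the validation rows do not influence how missing values are filled, and they remain a fairer check of the model.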