Data Cleaning and Pre-processing
Learning Objectives
By the end of this chapter, you will be able to:
- Perform sort, rank, filter, subset, normalize, scale, and join operations on R data frames.
- Identify and handle outliers, missing values, and duplicates using the MICE and rpart packages.
- Perform undersampling and oversampling on a dataset.
- Apply ROSE and SMOTE to handle imbalanced data.
This chapter covers the key concepts of cleaning data and preparing it for analysis.