Data preprocessing
In the data preprocessing step, we will focus mainly on two things: data type transformations and data normalization. Finally, we will split the data into training and testing datasets for predictive modeling. You can access the code for this section in the data_preparation.R file. We will be using some utility functions, which are shown in the following code snippet. Remember to load them into memory by running them in the R console:
## data type transformations - factoring
to.factors <- function(df, variables){
  for (variable in variables){
    df[[variable]] <- as.factor(df[[variable]])
  }
  return(df)
}

## normalizing - scaling
scale.features <- function(df, variables){
  for (variable in variables){
    df[[variable]] <- scale(df[[variable]], center=TRUE, scale=TRUE)
  }
  return(df)
}
The preceding functions operate on the data frame to transform the data. For data type transformations, we mainly perform factoring of the categorical variables,...
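To see how the two helpers and the train/test split could fit together, here is a minimal sketch on a small made-up data frame. The data frame toy.df, its column names (rating, purpose, amount), and the 60:40 split ratio are assumptions chosen purely for illustration; they are not taken from the dataset or from data_preparation.R. The helper definitions are repeated so that the snippet runs on its own.

```r
## helpers as defined in this section (repeated so the sketch is self-contained)
to.factors <- function(df, variables){
  for (variable in variables){
    df[[variable]] <- as.factor(df[[variable]])
  }
  return(df)
}
scale.features <- function(df, variables){
  for (variable in variables){
    df[[variable]] <- scale(df[[variable]], center=TRUE, scale=TRUE)
  }
  return(df)
}

## hypothetical toy data; column names and values are illustrative only
toy.df <- data.frame(
  rating  = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 1),
  purpose = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1),
  amount  = c(1200, 5400, 800, 2300, 9100, 1500, 4000, 700, 3100, 2600)
)

## convert the categorical columns to factors
toy.df <- to.factors(df = toy.df, variables = c("rating", "purpose"))

## center and scale the numeric column (zero mean, unit standard deviation)
toy.df <- scale.features(df = toy.df, variables = c("amount"))

## split rows into training and testing sets; 60:40 is an assumed ratio
set.seed(42)  # fix the seed so the random split is reproducible
train.idx  <- sample(seq_len(nrow(toy.df)), size = round(0.6 * nrow(toy.df)))
train.data <- toy.df[train.idx, ]
test.data  <- toy.df[-train.idx, ]

## inspect the resulting column types
str(train.data)
```

Note that scale() returns a one-column matrix, so the scaled column is stored as a matrix inside the data frame. If a plain numeric vector is preferred, wrapping the call in as.numeric() is one option.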