To recap, we have now imputed the missing values in our dataset, in both the categorical and the quantitative columns. At this point, you may be wondering how we can use the categorical data with a machine learning algorithm.
Simply put, we need to transform this categorical data into numerical data. So far, we have ensured that the most common category was used to fill the missing values. Now that this is done, we need to take it a step further.
Most machine learning algorithms, whether a linear regression or a KNN using Euclidean distance, require numerical input features to learn from. There are several methods we can rely on to transform our categorical data into numerical data.
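To make the need for numerical input concrete, here is a minimal sketch in plain Python. The rows, the `city` and `age` fields, and the `encode` helper are hypothetical and exist only for illustration. The indicator-column encoding shown is just one possible transformation, not necessarily the one we will settle on.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical rows of (city, age); the values are made up for illustration.
rows = [("tokyo", 30), ("london", 25), ("tokyo", 41)]

# Subtracting two strings is undefined, so the raw rows cannot be compared.
try:
    euclidean(rows[0], rows[1])
    raw_rows_work = True
except TypeError:
    raw_rows_work = False

# One possible transformation: one 0/1 indicator column per category.
categories = sorted({city for city, _ in rows})  # ['london', 'tokyo']

def encode(row):
    """Replace the city string with indicator columns and keep age as is."""
    city, age = row
    return [1 if city == c else 0 for c in categories] + [age]

encoded = [encode(r) for r in rows]
# encoded == [[0, 1, 30], [1, 0, 25], [0, 1, 41]]

# With every feature numeric, the distance is now well defined.
distance = euclidean(encoded[0], encoded[1])
```

Once each category is represented by numbers, a distance-based model such as KNN can compare rows directly; the same holds for a linear regression, which needs numeric columns to multiply by its coefficients.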