In this chapter, we learned how to implement a binary classification task using two approaches: an ML pipeline built on the Random Forest algorithm, and a second pipeline built on logistic regression.
Both pipelines combined several stages of data analysis into one workflow, and in both we calculated metrics to estimate how well our classifier performed. Early in the data analysis task, we introduced a data preprocessing step to remove rows whose missing attribute values had been filled in with a placeholder, ?. After eliminating the 16 rows with unavailable attribute values, we constructed a new DataFrame from the remaining 683 rows.
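As a reminder of what these two steps amount to, here is a minimal, library-free sketch in plain Python. It is not the chapter's pipeline code: the function names `drop_rows_with_missing` and `binary_metrics`, the toy rows, and the choice of accuracy, precision, and recall as the metrics are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

MISSING = "?"  # the placeholder that marks an unavailable attribute value


def drop_rows_with_missing(
    rows: Sequence[Sequence[str]], placeholder: str = MISSING
) -> List[List[str]]:
    """Keep only the rows in which no field equals the placeholder."""
    return [list(row) for row in rows if placeholder not in row]


def binary_metrics(
    labels: Sequence[int], predictions: Sequence[int]
) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for 0/1 labels (1 = positive)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy rows, not taken from the chapter's dataset.
toy_rows = [
    ["5", "1", "1", "2"],
    ["8", "10", "?", "4"],  # contains the placeholder, so it is dropped
    ["1", "1", "1", "2"],
]
clean_rows = drop_rows_with_missing(toy_rows)
print(len(clean_rows), "rows kept out of", len(toy_rows))

# Hypothetical labels and predictions, used only to exercise the function.
toy_metrics = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(toy_metrics)
```

In a pipeline, the same idea is normally expressed as a filter on the DataFrame and an evaluator stage; the sketch only makes the underlying logic explicit.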
In each pipeline, we also created training and validation datasets, followed by a training phase in which we fit the models on the training data. As with every ML task...