Applying the majority vote ensemble technique on predicted data
It is now time to draw up our list by applying the majority vote technique we learned previously to our predictions. As before, we apply a threshold of 0.5 to the values predicted by the logistic and SVM models, mapping the original predictions onto the binary values 0 and 1. Then, with code very similar to what we have seen before, we create an ensemble_prediction attribute that stores the final prediction derived from the results of the three estimated models:
# Threshold the logistic and SVM scores at 0.5, then take the majority vote
me_customer_list %>%
  mutate(logistic_threshold = case_when(as.numeric(logistic) > 0.5 ~ 1, TRUE ~ 0),
         svm_threshold = case_when(as.numeric(svm) > 0.5 ~ 1, TRUE ~ 0)) %>%
  mutate(ensemble_prediction = case_when(
    logistic_threshold + svm_threshold +
      as.numeric(as.character(random_forest)) >= 2 ~ 1,
    TRUE ~ 0)) -> me_customer_list_complete
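To see what the vote does row by row, it can help to run the same logic on a handful of made-up customers. The following is a minimal sketch in base R rather than dplyr, so that it runs without any packages; the toy data frame and its values are hypothetical and are not taken from me_customer_list. It also shows why the random forest output goes through as.character() first: assuming that column is stored as a factor, calling as.numeric() on it directly would return the internal level codes (1 and 2) rather than the labels 0 and 1.

```r
# Toy data: hypothetical values, NOT taken from me_customer_list
toy <- data.frame(
  logistic      = c(0.91, 0.12, 0.63, 0.40),  # predicted scores
  svm           = c(0.77, 0.55, 0.20, 0.10),  # predicted scores
  random_forest = factor(c(1, 0, 1, 0))       # predicted class, assumed to be a factor
)

# Map the two continuous scores onto 0/1 with a 0.5 threshold
toy$logistic_threshold <- ifelse(toy$logistic > 0.5, 1, 0)
toy$svm_threshold      <- ifelse(toy$svm > 0.5, 1, 0)

# Recover the factor labels (0/1), not the internal level codes (1/2)
toy$rf_numeric <- as.numeric(as.character(toy$random_forest))

# Majority vote: predict 1 when at least two of the three models say 1
toy$votes <- toy$logistic_threshold + toy$svm_threshold + toy$rf_numeric
toy$ensemble_prediction <- ifelse(toy$votes >= 2, 1, 0)

print(toy[, c("logistic_threshold", "svm_threshold",
              "rf_numeric", "votes", "ensemble_prediction")])
```

In this toy case the second customer gets a single vote (from the SVM only) and is therefore classified as 0, while the third gets two votes (logistic and random forest) and is classified as 1.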
Is this the list the internal audit team needs from us?
Not quite; there is one more computation required...