Model performance comparison
The effectiveness of the techniques discussed so far can depend heavily on the dataset they are applied to. In this section, we compare these techniques against one another, using a logistic regression model as the baseline. For the complete implementation, please consult the accompanying notebook available on GitHub.
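As a rough illustration of what such a comparison involves, the following minimal sketch fits a plain logistic regression baseline and one alternative on the same train/test split and reports average precision for each. It is not the notebook's code: the helper name `compare_on_dataset`, the dataset parameters, the choice of average precision as the metric, and the use of class weighting as the stand-in technique are all assumptions made for illustration, and scikit-learn is assumed to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split


def compare_on_dataset(X, y, models, seed=42):
    """Fit each model on the same stratified split; return average precision per model."""
    # Hypothetical helper: one shared split so every model sees identical data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Score with the predicted probability of the positive (minority) class.
        proba = model.predict_proba(X_test)[:, 1]
        scores[name] = average_precision_score(y_test, proba)
    return scores


# Illustrative imbalanced data (roughly 5% positives); these values are assumptions.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.95], class_sep=0.9, random_state=42
)

models = {
    # Baseline: logistic regression with default settings.
    "baseline": LogisticRegression(max_iter=1000),
    # Stand-in for a technique under comparison: class-weighted logistic regression.
    "class_weighted": LogisticRegression(max_iter=1000, class_weight="balanced"),
}

scores = compare_on_dataset(X, y, models)
for name, score in scores.items():
    print(f"{name}: average precision = {score:.3f}")
```

Evaluating every model on one shared split keeps the comparison fair; to compare across several datasets, the same helper could be called once per dataset and the resulting dictionaries collected into a table.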
The analysis spans four distinct datasets, each with its own characteristics and challenges:
- Synthetic data with Sep: 0.5: A simulated dataset with moderate separation between classes, serving as a reference point for understanding algorithm performance in simplified conditions.
- Synthetic data with Sep: 0.9: Another synthetic dataset, but with a higher degree of separation, allowing us to examine how algorithms perform as class distinguishability improves.
- Thyroid sick dataset: A real-world dataset (available to import...