Utilizing unsupervised methods of anomaly detection
If the hackers are conspicuous and distinct from our valid users, unsupervised methods may prove quite effective. They are a good place to start before we have labeled data, or when labeled data is difficult to gather or not guaranteed to be representative of the full spectrum of activity we want to flag. Note that, in most cases, we won't have labeled data, so it is crucial that we are familiar with some unsupervised methods.
In our initial EDA, we identified the number of usernames with a failed login attempt in a given minute as a feature for anomaly detection. We will now test out some unsupervised anomaly detection algorithms, using this feature as the jumping-off point. Scikit-learn provides a few such algorithms. In this section, we will look at isolation forest and local outlier factor; a third method, using a one-class support vector machine (SVM), is in the Exercises section.
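As a rough preview of the two scikit-learn estimators named above, here is a minimal sketch on synthetic data. The counts, the `contamination` value, and the `n_neighbors` value are illustrative assumptions, not the login data or the settings used in this chapter.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Hypothetical feature: number of usernames with a failed login per minute.
# Most minutes have very few failures; a handful simulate an attack.
normal_minutes = rng.poisson(lam=1, size=(500, 1))
attack_minutes = rng.integers(30, 60, size=(5, 1))
X = np.vstack([normal_minutes, attack_minutes]).astype(float)

# Isolation forest: points that are easy to isolate with random splits
# receive lower scores. contamination=0.01 is an assumed outlier fraction.
iso = IsolationForest(contamination=0.01, random_state=0)
iso_labels = iso.fit_predict(X)  # 1 = inlier, -1 = outlier

# Local outlier factor: compares each point's local density to that of
# its neighbors. n_neighbors=20 is the scikit-learn default.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
lof_labels = lof.fit_predict(X)  # 1 = inlier, -1 = outlier

print("Isolation forest flagged:", int((iso_labels == -1).sum()))
print("Local outlier factor flagged:", int((lof_labels == -1).sum()))
```

Both estimators return 1 for inliers and -1 for outliers from `fit_predict()`, which makes their output easy to compare side by side. Count data like this contains many tied values, which local outlier factor can be sensitive to, so the two methods will not necessarily flag the same minutes.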
Before we can try out these methods,...