Using Isolation Forest to find anomalies
Isolation Forest is a relatively new machine learning technique for identifying anomalies. It has quickly become popular, partly because the algorithm is designed to find anomalies directly rather than first modeling normal values. It finds outliers by repeatedly partitioning the data at random until a data point has been isolated. Points that need fewer partitions to be isolated receive higher anomaly scores. This process turns out to be fairly light on memory and computation time. In this recipe, we demonstrate how to use it to detect outlier COVID-19 cases and deaths.
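To make the isolation idea concrete, here is a minimal pure-Python sketch, assuming small numeric data held as a list of tuples. It picks a random feature and a random split value, keeps the side that contains the point, and counts how many splits it takes to isolate that point. Averaging the count over many random trees and converting it with the normalization from the original Isolation Forest paper (Liu, Ting, and Zhou) gives a score where fewer splits mean a higher score. The function names (`isolation_depth`, `average_path_length`, `anomaly_score`) and the synthetic data are hypothetical and for illustration only; this is not scikit-learn's implementation, which also subsamples the data and caps tree depth.

```python
# Illustrative sketch of the isolation idea, not scikit-learn's IsolationForest
import math
import random
import statistics


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed to isolate `point` within `data`.

    `data` is a list of equal-length tuples that includes `point` itself.
    """
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Only features that still vary inside this partition can split it
    features = [f for f in range(len(point))
                if min(r[f] for r in data) < max(r[f] for r in data)]
    if not features:
        return depth  # remaining rows are exact duplicates of `point`
    f = rng.choice(features)
    lo = min(r[f] for r in data)
    hi = max(r[f] for r in data)
    split = rng.uniform(lo, hi)
    # Recurse into whichever side of the split holds `point`
    if point[f] < split:
        side = [r for r in data if r[f] < split]
    else:
        side = [r for r in data if r[f] >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)


def average_path_length(n):
    """c(n): average path length used to normalize depths for n points."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2 * harmonic - 2 * (n - 1) / n


def anomaly_score(point, data, n_trees=200, seed=0):
    """Score in (0, 1]; points isolated in fewer splits score higher."""
    rng = random.Random(seed)
    mean_depth = statistics.mean(
        isolation_depth(point, data, rng) for _ in range(n_trees))
    return 2 ** (-mean_depth / average_path_length(len(data)))


if __name__ == "__main__":
    # Hypothetical data: a tight cluster plus one far-away point
    rng = random.Random(42)
    cluster = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
    outlier = (10.0, 10.0)
    data = cluster + [outlier]
    # The cluster point nearest the origin should be the hardest to isolate
    typical = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)
    print("outlier score:", round(anomaly_score(outlier, data), 3))
    print("typical score:", round(anomaly_score(typical, data), 3))
```

Run as a script, the outlier at (10, 10) should score noticeably higher than the typical cluster point, because a random split along either feature usually separates it from the cluster on the first try.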
Getting ready
You will need scikit-learn and Matplotlib to run the code in this recipe. You can install them by entering pip install scikit-learn and pip install matplotlib in the terminal or in PowerShell (on Windows).
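Because the pip package is named scikit-learn but is imported as sklearn, a quick check that everything is importable can save confusion. The sketch below uses `importlib.util.find_spec` from the standard library; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names, and pandas is included only because the recipe loads it in its first step.

```python
import importlib.util

# pip package name -> import name (scikit-learn is imported as `sklearn`)
REQUIRED = {
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "pandas": "pandas",  # also loaded in the first step of the recipe
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```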
How to do it...
We will use Isolation Forest to find the countries whose attributes indicate that they are most anomalous:
- Load pandas, matplotlib, and the StandardScaler and IsolationForest...