In previous approaches, we started by defining what normal looks like, then treated anything that did not conform to this definition as an outlier. The isolation forest algorithm takes a different approach: since outliers are few and different, they are easier to isolate from the rest of the data. Thus, when building a forest of random trees, a sample that reaches a leaf node after only a few splits, meaning it required little branching effort to isolate, is more likely to be an outlier.
As a tree-based ensemble, this algorithm shares many hyperparameters with its counterparts, such as the number of random trees to build (n_estimators), the number or fraction of samples to draw when building each tree (max_samples), the number or fraction of features to consider when building each tree (max_features), and whether to sample with replacement or not (bootstrap). You can also build the trees in parallel using all the available CPUs on your machine by setting...
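The following is a minimal sketch of how these hyperparameters might be set using scikit-learn's IsolationForest; the synthetic dataset (a dense Gaussian cluster plus a handful of far-away points) is purely illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
# Illustrative data: 100 inliers clustered around the origin,
# plus 5 injected points that lie far from the cluster
inliers = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
far_points = rng.uniform(low=6.0, high=8.0, size=(5, 2))
x = np.vstack([inliers, far_points])

clf = IsolationForest(
    n_estimators=100,     # number of random trees in the forest
    max_samples='auto',   # samples drawn per tree; 'auto' = min(256, n)
    max_features=1.0,     # fraction of features considered per tree
    bootstrap=False,      # whether to sample with replacement
    n_jobs=-1,            # build trees in parallel on all CPUs
    random_state=42,
)
# fit_predict returns 1 for inliers and -1 for outliers
preds = clf.fit_predict(x)
```

Samples that are isolated after few splits receive short average path lengths across the forest and end up labeled -1.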