We have seen that the main types of learning algorithms (supervised, unsupervised, and reinforcement learning) can all be used to implement network anomaly detection systems.
But how can we train these algorithms effectively to identify anomalous traffic?
The first step is to identify a training dataset that is representative of the traffic considered normal within a given organization.
To this end, we must carefully choose the features that represent this traffic in our model.
The choice of features is particularly important: features give context to the analyzed data, and therefore determine the reliability and accuracy of our detection system.
In fact, choosing features that are only weakly correlated with possible anomalous behaviors translates into high error rates...
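
As a rough illustration of why feature choice matters, the following minimal sketch ranks candidate features by the absolute value of their Pearson correlation with an anomaly label. Everything in it is an assumption made for the example: the traffic is synthetic, the feature names `bytes_per_sec` and `packet_ttl` are hypothetical, and labeled anomalies are assumed to be available. Pearson correlation also captures only linear relationships, so it is one possible screening measure rather than a definitive selection method.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant sequence has no defined correlation; treat it as uninformative.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, labels):
    """Sort feature names by |correlation| with the labels, strongest first."""
    scores = {name: abs(pearson(values, labels)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic data (assumption): about 10% of samples are labeled anomalous (1.0).
random.seed(0)
n_samples = 1000
labels = [1.0 if random.random() < 0.1 else 0.0 for _ in range(n_samples)]

features = {
    # Hypothetical feature whose values shift upward for anomalous samples.
    "bytes_per_sec": [random.gauss(100, 10) + 80 * y for y in labels],
    # Hypothetical feature generated independently of the label.
    "packet_ttl": [random.gauss(64, 5) for _ in labels],
}

ranking = rank_features(features, labels)
for name, score in ranking:
    print(f"{name}: |r| = {score:.2f}")
```

In this synthetic setting, the feature that shifts with the label ranks well above the independent one. A detector built only on the second feature would have little signal to separate normal from anomalous traffic, which is the situation that leads to high error rates.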