Dealing with categorical data in collective anomalies
As another illustrative example, consider a sequence of actions occurring on a computer, as shown below:
... http-web, buffer-overflow, http-web, http-web, smtp-mail, ftp, http-web, ssh, smtp-mail, http-web, **ssh, buffer-overflow, ftp**, http-web, ftp, smtp-mail, http-web ...
The highlighted sequence of events (`buffer-overflow`, `ssh`, `ftp`) corresponds to a typical web-based attack by a remote machine, followed by the copying of data from the host computer to a remote destination via `ftp`. Note that this collection of events is an anomaly as a whole, but the individual events are not anomalies when they occur in other locations in the sequence.
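The distinction between individual events and the collection can be illustrated with a minimal Python sketch. It is not a detector, only a sliding-window check for one known combination of events. The helper names (`windows`, `find_pattern`) and the `attack` tuple are hypothetical, and matching the events in any order is an assumption made here because the sequence shows them in a different order than the parenthesised list.

```python
from collections import Counter

# Event sequence from the example above.
events = [
    "http-web", "buffer-overflow", "http-web", "http-web", "smtp-mail",
    "ftp", "http-web", "ssh", "smtp-mail", "http-web", "ssh",
    "buffer-overflow", "ftp", "http-web", "ftp", "smtp-mail", "http-web",
]


def windows(seq, size):
    """Return every contiguous window of `size` events as a tuple."""
    return [tuple(seq[i:i + size]) for i in range(len(seq) - size + 1)]


def find_pattern(seq, pattern):
    """Return the start index of each window that holds exactly the
    events in `pattern`, regardless of their order (an assumption)."""
    target = Counter(pattern)
    return [
        i for i, window in enumerate(windows(seq, len(pattern)))
        if Counter(window) == target
    ]


# Hypothetical signature: the three events named in the text.
attack = ("buffer-overflow", "ssh", "ftp")

event_counts = Counter(events)          # how often each event occurs alone
hits = find_pattern(events, attack)     # where the events occur together

for name in attack:
    print(f"{name}: occurs {event_counts[name]} times")
print("windows containing all three events start at:", hits)
```

Each of the three events appears more than once in the sequence, so none of them stands out individually. Only the window in which they occur together is flagged.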
Categorical data of this kind can be transformed into numeric data by assigning a distinct number to each command. If the following mapping is applied to transform categorical data to numeric data:
| Command | Numeric Representation |
|---|---|
| | 1 |
| | 2 |
| | 3 |
| | 4 |
... |