Exercise – UEBA anomaly detection
In this exercise, we consider a dataset of network activity for a single user account and its device. We extract features from this simplified network log and determine whether any of the traffic is anomalous in a way that could point to stolen credentials, policy violations, or similar issues.
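As a rough illustration of the feature-extraction idea, the sketch below counts how many connections a host makes to each remote address and flags destinations whose count deviates strongly from the rest, using a z-score. It is a minimal sketch under stated assumptions, not the method this exercise develops: the column names `HostIP` and `RemoteIP` follow the dataset described below, while the helper `flag_unusual_destinations`, the toy rows, and the threshold of 2.0 are hypothetical.

```python
import pandas as pd


def flag_unusual_destinations(df: pd.DataFrame, threshold: float = 2.0) -> pd.DataFrame:
    """Hypothetical helper: flag remote IPs whose connection count is an outlier."""
    # Feature extraction: number of connections from each host to each remote IP
    counts = (
        df.groupby(["HostIP", "RemoteIP"])
        .size()
        .reset_index(name="connections")
    )
    # Standardize the counts using the population standard deviation
    mean = counts["connections"].mean()
    std = counts["connections"].std(ddof=0)
    if std == 0:
        counts["zscore"] = 0.0  # all counts identical: nothing stands out
    else:
        counts["zscore"] = (counts["connections"] - mean) / std
    # Mark destinations whose count is far from the typical count
    counts["anomalous"] = counts["zscore"].abs() > threshold
    return counts


# Hypothetical toy log: the host contacts six familiar servers three times each,
# then makes a burst of 30 connections to an unfamiliar address
normal = [("10.0.0.5", f"192.168.1.{i}") for i in range(10, 16) for _ in range(3)]
burst = [("10.0.0.5", "203.0.113.7")] * 30
log = pd.DataFrame(normal + burst, columns=["HostIP", "RemoteIP"])

result = flag_unusual_destinations(log)
print(result[result["anomalous"]])
```

Only the burst destination is flagged here. A z-score over one day of counts is a crude baseline that mainly catches unusually frequent destinations; rarely contacted addresses would need a different feature, such as whether the address was seen before.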
The dataset is a simplified version of the network traffic data that might appear in the logs of an endpoint agent or a network device. We load it as follows:
import pandas as pd
df_ueba = pd.read_csv('ueba.csv')  # load the simplified network log into a DataFrame
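Before extracting features, it can help to confirm that the file loaded with the expected columns. The sketch below substitutes an in-memory CSV (via `io.StringIO`) for `ueba.csv` so that it runs on its own; the rows are made up, and only the three columns named in this excerpt (`Username`, `HostIP`, `RemoteIP`) are checked, because the full column list is not reproduced here.

```python
import io

import pandas as pd

# Hypothetical stand-in for ueba.csv so the example runs without the file
csv_text = """Username,HostIP,RemoteIP
John Doe,10.0.0.5,192.168.1.10
John Doe,10.0.0.5,192.168.1.11
John Doe,10.0.0.5,192.168.1.10
"""

df_ueba = pd.read_csv(io.StringIO(csv_text))

# Columns named in this exercise; the real file may contain more
expected = {"Username", "HostIP", "RemoteIP"}
missing = expected - set(df_ueba.columns)
if missing:
    raise ValueError(f"ueba.csv is missing columns: {sorted(missing)}")

print(df_ueba.shape)                 # (rows, columns)
print(df_ueba["Username"].unique())  # the log should cover a single user
```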
The contents of the UEBA dataset are shown in the following figure:
Figure 8.1 – DataFrame with UEBA data
We can observe that the dataset includes a snippet of network traffic for one day from the device of a user named John Doe.
From the figure, the columns represent the following:
Username: The username of the device user
HostIP: The IP address of the host device
RemoteIP: The IP...