Exploring the CIDDS-001 dataset
In this section, we explore the dataset and gain insights into feature importance and scaling.
The CIDDS-001 dataset [1] is designed to train and evaluate anomaly-based network intrusion detection systems. It provides realistic traffic, including up-to-date attacks, to assess these systems. It was created by collecting and labeling 8,451,520 traffic flows in a virtual environment built with OpenStack. Specifically, each row corresponds to a NetFlow connection and describes Internet Protocol (IP) traffic statistics, such as the number of bytes exchanged.
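To make the "one row per NetFlow connection" structure concrete, here is a minimal sketch that parses a few flow records and converts their traffic statistics to numbers. The column names (`src_ip`, `bytes`, `label`, and so on) and all values are hypothetical illustrations, not rows copied from CIDDS-001; the real CSV files use their own headers.

```python
import csv
import io

# Hypothetical NetFlow-style records: column names and values are
# illustrative assumptions, not actual CIDDS-001 data.
SAMPLE = """src_ip,dst_ip,proto,duration,packets,bytes,label
192.168.1.10,10.0.0.5,TCP,0.5,4,320,normal
192.168.1.11,10.0.0.5,TCP,0.1,1,58,attacker
192.168.1.10,10.0.0.7,UDP,0.0,1,77,normal
"""


def load_flows(text):
    """Parse CSV text into a list of dicts, one dict per NetFlow connection."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        # Convert the traffic statistics to numbers so they can be
        # aggregated or scaled later.
        row["duration"] = float(row["duration"])
        row["packets"] = int(row["packets"])
        row["bytes"] = int(row["bytes"])
    return rows


flows = load_flows(SAMPLE)
total_bytes = sum(f["bytes"] for f in flows)
print(len(flows), total_bytes)  # 3 455
```

Keeping each connection as a single record mirrors the dataset layout: every row carries both its statistics and its label.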
The following figure provides an overview of the simulated network environment in CIDDS-001.
Figure 16.1 – Overview of the virtual network simulated by CIDDS-001
We see four different subnets (developer, office, management, and server) with their respective IP address ranges. All these subnets are linked to a single server connected to the internet through a firewall...
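Since each subnet has its own IP address range, a flow's source or destination address can be mapped back to the subnet it belongs to. The sketch below uses Python's standard `ipaddress` module. The four ranges are hypothetical placeholders, not the ranges from Figure 16.1, so substitute the real ones before use. The `"external"` and `"unknown"` labels are also illustrative choices.

```python
import ipaddress

# Hypothetical placeholder ranges: replace them with the actual
# ranges shown in Figure 16.1.
SUBNETS = {
    "developer": ipaddress.ip_network("192.168.10.0/24"),
    "office": ipaddress.ip_network("192.168.20.0/24"),
    "management": ipaddress.ip_network("192.168.30.0/24"),
    "server": ipaddress.ip_network("192.168.40.0/24"),
}


def subnet_of(ip):
    """Return the subnet name containing ip, 'external' if none matches,
    or 'unknown' if the string is not a valid IP address."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # e.g. an anonymized placeholder instead of a real address
        return "unknown"
    for name, net in SUBNETS.items():
        if addr in net:
            return name
    return "external"


print(subnet_of("192.168.20.15"))  # office
print(subnet_of("8.8.8.8"))        # external
```

A lookup like this is one way to turn raw addresses into a coarse categorical feature (which part of the network a host sits in).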