Data preprocessing is an essential step in a DL pipeline. The CPU utilization dataset can be used for training as is, but the KDD Cup 1999 IDS dataset needs multilevel preprocessing that includes the following three steps:
- Splitting the data into three different protocol sets (application, transport, and network)
- Duplicate data removal, categorical data conversion, and normalization
- Feature selection (optional)
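As a rough illustration of the second step, here is a minimal sketch on a small made-up DataFrame. The toy records, the `preprocess` helper, and the choice of integer category codes and min-max scaling are all assumptions made for illustration; the pipeline for the real dataset may use a different encoding or normalization.

```python
import pandas as pd

# Hypothetical toy records standing in for a few KDD-style rows
toy = pd.DataFrame({
    "duration": [0, 0, 5, 10],
    "protocol_type": ["tcp", "tcp", "udp", "icmp"],
    "src_bytes": [100, 100, 300, 500],
})

def preprocess(df, categorical_cols, numeric_cols):
    """Illustrative helper (not from the original text)."""
    # 1. Remove exact duplicate rows
    out = df.drop_duplicates().reset_index(drop=True)
    # 2. Convert categorical columns to integer codes
    #    (assumed encoding; categories are ordered alphabetically)
    for col in categorical_cols:
        out[col] = out[col].astype("category").cat.codes
    # 3. Min-max normalize numeric columns to [0, 1]
    #    (assumed scaling; constant columns are set to 0.0)
    for col in numeric_cols:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        out[col] = (out[col] - lo) / span if span != 0 else 0.0
    return out

clean = preprocess(toy, ["protocol_type"], ["duration", "src_bytes"])
print(clean)
```

The duplicate removal runs first so that repeated records do not influence the later steps; the same helper could be applied to each protocol-level dataset in turn.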
The following lines of code show one possible way of splitting the dataset into three datasets, namely Final_App_Layer, Final_Transport_Layer, and Final_Network_Layer:
# Import the required libraries
import pandas as pd

# Load the 10% subset of the KDD Cup 1999 data (the file has no header row)
IDSdata = pd.read_csv("kddcup.data_10_percent.csv", header=None, engine="python", sep=",")

# Add column headers
IDSdata.columns = ["duration","protocol_type","service...