Exploratory data analysis
In this scenario, we have the benefit of labeled data (logs/attacks.csv), which we will use to investigate how to distinguish between valid users and attackers. However, this is a luxury we often don't have, especially once we leave the research phase and enter the application phase. In Chapter 11, Machine Learning Anomaly Detection, we will revisit this scenario but start without the labeled data, which makes the task more challenging. As usual, we start with our imports and read in the data:
>>> %matplotlib inline
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> import pandas as pd
>>> import seaborn as sns
>>> log = pd.read_csv(
...     'logs/log.csv', index_col='datetime', parse_dates=True
... )
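Passing index_col='datetime' together with parse_dates=True turns the datetime column into a DatetimeIndex, which is what makes date-based selection possible later on. The sketch below shows this on a few in-memory rows. Only the datetime and source_ip column names come from the text; the rows themselves (timestamps and IP addresses) are hypothetical stand-ins for logs/log.csv, not the real data.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for logs/log.csv; only the
# datetime and source_ip column names come from the text.
sample_csv = io.StringIO(
    "datetime,source_ip\n"
    "2018-01-01 00:05:32,192.168.0.10\n"
    "2018-01-01 01:17:00,192.168.0.10\n"
    "2018-01-02 09:42:11,10.0.0.7\n"
)

# Same arguments as the real call: use the datetime column as the index
# and parse its values into timestamps.
log = pd.read_csv(sample_csv, index_col='datetime', parse_dates=True)

# The index is now a DatetimeIndex rather than plain strings.
print(type(log.index).__name__)  # DatetimeIndex

# Partial-string indexing: select every attempt made on 2018-01-01.
jan_first = log.loc['2018-01-01']
print(len(jan_first))  # 2

# Count login attempts per source IP address.
attempts_per_ip = log.source_ip.value_counts()
print(attempts_per_ip['192.168.0.10'])  # 2
```

Without parse_dates=True, the index would hold strings, and a selection like log.loc['2018-01-01'] would look for an exact label match instead of returning the whole day.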
The login attempts dataframe (log) contains the date and time of each attempt in the datetime column, the IP address it came from (source_ip...