We don't have labeled data yet, but we can still examine the data to see whether anything stands out. This data is different from the data in Chapter 8, Rule-Based Anomaly Detection. The hackers are smarter in this simulation: they don't always try as many users or stick with the same IP address every time. Let's see whether we can come up with some features that will help with anomaly detection by performing some EDA in the 1-EDA_unlabeled_data.ipynb notebook.
As usual, we begin with our imports. These will be the same for all notebooks, so they will be reproduced in this section only:
>>> %matplotlib inline
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> import pandas as pd
>>> import seaborn as sns
Next, we read in the 2018 logs from the logs table in the SQLite database...
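As a rough illustration of this kind of read, the following minimal sketch builds a small in-memory SQLite database and pulls one year of rows into a dataframe with pd.read_sql(). The table contents, the column names (datetime, source_ip, username, success), and the date filter are assumptions made for the example; they are not taken from the chapter's actual database.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the real database: an in-memory SQLite
# database with a tiny `logs` table. Column names are assumptions.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE logs '
    '(datetime TEXT, source_ip TEXT, username TEXT, success INTEGER)'
)
conn.executemany(
    'INSERT INTO logs VALUES (?, ?, ?, ?)',
    [
        ('2017-12-31 23:59:10', '10.0.0.1', 'alice', 1),
        ('2018-01-01 00:05:32', '10.0.0.2', 'bob', 0),
        ('2018-06-15 12:30:00', '10.0.0.1', 'alice', 1),
        ('2019-01-01 00:00:00', '10.0.0.3', 'carol', 0),
    ],
)
conn.commit()

# Select only the 2018 rows, parse the timestamp column as datetimes,
# and use it as the index of the resulting dataframe.
logs_2018 = pd.read_sql(
    """
    SELECT *
    FROM logs
    WHERE datetime >= '2018-01-01' AND datetime < '2019-01-01'
    """,
    conn,
    parse_dates=['datetime'],
    index_col='datetime',
)
conn.close()

print(logs_2018)
```

Filtering in the query keeps the rows from other years out of memory altogether, and indexing by the timestamp makes later time-based slicing and resampling more convenient.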