Generating malware detection features
In ML, features are the data that you use to create a model. You analyze features to look for patterns of various sorts. The Checking data validity section of Chapter 6, Detecting and Analyzing Anomalies, shows you one kind of analysis. However, in the Chapter 6 example and all of the other examples in the book so far, you were viewing data that humans can easily understand. This section looks at a new kind of data: the data hidden within malware itself. Consequently, you're moving from the realm of human-recognizable data to that of machine-recognizable data. The interesting thing is that your ML model doesn't care what kind of data you use to build it. The only requirement is that you have enough data of the right kind to build a statistically sound model that can locate malware.
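To make the idea of machine-recognizable data a little more concrete, the following is a minimal sketch of turning raw bytes into numeric features. It is not the approach this chapter goes on to develop. The byte_features() function, the feature names, and the sample bytes are all hypothetical and chosen only for illustration. The sample is a harmless placeholder, not real malware.

```python
import math
from collections import Counter


def byte_features(data: bytes) -> dict:
    """Summarize raw bytes as a few numbers (a hypothetical feature set)."""
    counts = Counter(data)  # how often each byte value (0-255) occurs
    total = len(data)

    # Shannon entropy in bits per byte; 0.0 for empty input.
    entropy = 0.0
    if total:
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )

    # Count bytes in the printable ASCII range (space through tilde).
    printable = sum(1 for b in data if 32 <= b <= 126)

    return {
        "size": total,
        "entropy": entropy,
        "printable_ratio": printable / total if total else 0.0,
        "distinct_bytes": len(counts),
    }


# Harmless placeholder bytes standing in for a file's content (assumption).
sample = b"MZ\x90\x00" + b"harmless placeholder bytes"
features = byte_features(sample)
print(features)
```

None of these numbers means much to a human reader, yet each one is a value that a model can compare across many files. Whether these particular features are useful for finding malware is a separate question that only enough data of the right kind can answer.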
Working with a first step example
To actually work with malware, you need a system that has appropriate safety measures in place, such as a virtual...