Understanding the project
When starting a project, we need a purpose: a goal we want to reach at the end. After all, knowing the problem is part of the solution. In Lewis Carroll's Alice's Adventures in Wonderland, the Cheshire Cat tells Alice, in effect, that if she does not know where she wants to go, any path will take her there.
So, let’s begin by understanding the project, or where we want to go.
The dataset
The input data for this project is the Spambase Data Set (https://tinyurl.com/23xwdcah), which can be found in the UCI datasets repository. See the citation information in the Further reading section at the end of this chapter for more.
It contains 4,601 observations and 57 explanatory variables. Of those, 48 are floating-point numbers, each giving the percentage (from 0 to 100) of the words in a message that match a specific word associated with spam. There are six other variables with special characters such...
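To make the word-percentage features concrete, here is a minimal sketch of how one such value could be computed for a single message. The function name word_freq_percent is hypothetical, and the tokenization rule (lowercased runs of letters and digits) is an assumption made for illustration, not necessarily the exact procedure used to build the dataset.

```python
import re


def word_freq_percent(message: str, word: str) -> float:
    """Return the percentage (0 to 100) of words in `message` equal to `word`.

    Assumption: a "word" is a run of letters or digits, compared
    case-insensitively. This is an illustrative rule only.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", message.lower())
    if not tokens:
        # An empty message has no words, so the percentage is taken as 0.
        return 0.0
    return 100.0 * tokens.count(word.lower()) / len(tokens)


# Two of the four words are "free", so the feature value is 50.0.
print(word_freq_percent("Free offer free gift", "free"))
```

Under these assumptions, each of the 48 word features would be one call to such a function with a different target word, which is why every value falls between 0 and 100.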