After the hand-calculated spam email detection example, as promised, we are going to code it against a genuine dataset, taken from the Enron email dataset, http://www.aueb.gr/users/ion/data/enron-spam/. The specific dataset we are using can be downloaded directly from http://www.aueb.gr/users/ion/data/enron-spam/preprocessed/enron1.tar.gz. You can either extract it using an archive tool or run the command tar -xvzf enron1.tar.gz in the Terminal (the f option is required so that tar reads from the named file). The uncompressed folder includes a folder of ham email text files and a folder of spam email text files, as well as a summary description of the dataset:
enron1/
    ham/
        0001.1999-12-10.farmer.ham.txt
        0002.1999-12-13.farmer.ham.txt
        ......
        ......
        5172.2002-01-11.farmer.ham.txt
    spam/
        0006.2003-12-18.GP.spam.txt
        0008.2003-12-18.GP.spam.txt
        ......
        ......
        5171.2005-09-06.GP.spam.txt
    Summary.txt
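Given the folder layout above, one way to read the emails into memory might look like the following minimal sketch. It assumes the archive has been extracted to a folder named enron1 in the current working directory. The helper name load_emails, the label encoding (0 for ham, 1 for spam), and the ISO-8859-1 file encoding are assumptions made for illustration and are not taken from the dataset's documentation.

```python
import glob
import os


def load_emails(root_dir):
    """Read every .txt file under root_dir/ham and root_dir/spam.

    Hypothetical helper: returns (emails, labels), where each label is
    0 for ham and 1 for spam (an encoding chosen for this sketch).
    """
    emails, labels = [], []
    for label, subdir in ((0, "ham"), (1, "spam")):
        pattern = os.path.join(root_dir, subdir, "*.txt")
        # Sort the paths so the loading order is reproducible
        for path in sorted(glob.glob(pattern)):
            # Assumption: ISO-8859-1 maps every byte to a character,
            # so reading never fails on non-UTF-8 content
            with open(path, "r", encoding="ISO-8859-1") as infile:
                emails.append(infile.read())
            labels.append(label)
    return emails, labels


if __name__ == "__main__":
    # Only runs if the extracted folder is actually present
    if os.path.isdir("enron1"):
        emails, labels = load_emails("enron1")
        print("emails loaded:", len(emails))
        print("spam emails:", sum(labels))
```

Summary.txt sits directly under enron1/ rather than inside ham/ or spam/, so the two glob patterns skip it and only the email files are loaded.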
Given a dataset for...