E-mail subject line tester
Spam is junk e-mail, more formally known as Unsolicited Bulk Email (UBE). E-mails can be blocked before they are delivered to the recipient, based on the reports of e-mail filters. One such filter is the e-mail subject line filter, which scans the subject line of each e-mail and labels it as spam or ham (e-mail that is not spam is often called ham). Over 35 percent of spam mails are detected from the subject line of an e-mail.
An E-mail Subject Line Tester is a simple program that determines whether a given e-mail subject line is spam or not. In this chapter, we will program a Naïve Bayes classifier from scratch. The example classifies a subject line as spam or not with very simple code: it breaks each subject line into a list of relevant words, which serves as the feature vector for the algorithm. To do this, we will use the SpamAssassin public dataset. SpamAssassin includes three categories: spam, easy ham, and hard ham. In this...
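
To make the idea concrete before working with the real dataset, the approach described above can be sketched in a few lines of Python. This is only a minimal illustration, not the chapter's implementation: the names tokenize and NaiveBayes are hypothetical, the rule of keeping words of three or more letters is an assumed stand-in for "relevant words", add-one (Laplace) smoothing is an assumed choice, and the six training subject lines are invented toy data rather than SpamAssassin messages.

```python
import math
import re
from collections import Counter


def tokenize(subject):
    """Lowercase a subject line and keep words of 3+ letters (assumed rule)."""
    return re.findall(r"[a-z]{3,}", subject.lower())


class NaiveBayes:
    """Toy Naive Bayes over subject-line words (hypothetical helper class)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, subject, label):
        # Count one document for the label and add its words to the tally.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(subject))

    def log_score(self, subject, label):
        # log P(label) + sum of log P(word | label), with add-one smoothing.
        # Assumes at least one training example exists for every label.
        vocab = set().union(*self.word_counts.values())
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        for word in tokenize(subject):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, subject):
        # Pick the label with the higher log score.
        return max(self.LABELS, key=lambda label: self.log_score(subject, label))


# Invented toy data for illustration only (not taken from SpamAssassin).
model = NaiveBayes()
for subject in ["Win cash now", "Cheap meds, win big", "Free cash offer"]:
    model.train(subject, "spam")
for subject in ["Meeting agenda for Monday", "Re: project status update",
                "Lunch on Monday?"]:
    model.train(subject, "ham")

print(model.classify("Win free cash"))
print(model.classify("Monday project meeting"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing keeps a word that was never seen under one label from forcing that label's probability to zero.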