Two folders in Google's Gmail service stand out: the Inbox, which receives benign or wanted email messages, and the Spam folder, which receives unsolicited junk email, or simply spam.
The emphasis of this chapter is on identifying spam and classifying it as such. It explores the following topics concerning spam detection:
- What techniques are there for separating spam from ham (legitimate, wanted email)?
- If spam filtering is one suitable technique, how can it be formalized as a supervised learning classification task?
- Why is a certain algorithm better than another for spam filtering, and in what respect?
- Where are the tangible benefits of effective spam filtering most felt?
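The second question above, formalizing spam filtering as a supervised classification task, can be made concrete with a small sketch. It is only an illustration under stated assumptions: every name in it (`SpamSketch`, `Example`, `Model`, `train`, `predict`) is hypothetical, and multinomial Naive Bayes with add-one smoothing is used as a convenient stand-in, not necessarily the algorithm this chapter settles on. The idea is that labeled messages go in, a model is fitted, and the model maps an unseen message to one of two classes.

```scala
// Toy sketch of spam filtering as supervised binary classification.
// All names are hypothetical; this is not the chapter's pipeline.
object SpamSketch {
  sealed trait Label
  case object Spam extends Label
  case object Ham extends Label

  // A labeled training example: raw text plus its known class.
  final case class Example(text: String, label: Label)

  // Feature extraction: lowercase word tokens, empty tokens dropped.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  // Parameters of a multinomial Naive Bayes model.
  final case class Model(
      logPrior: Map[Label, Double],
      wordCounts: Map[Label, Map[String, Int]],
      totalWords: Map[Label, Int],
      vocabSize: Int
  ) {
    // Log prior plus the sum of add-one-smoothed log likelihoods.
    def score(label: Label, tokens: Seq[String]): Double = {
      val counts = wordCounts(label)
      val denom = (totalWords(label) + vocabSize).toDouble
      logPrior(label) +
        tokens.map(t => math.log((counts.getOrElse(t, 0) + 1) / denom)).sum
    }

    // Pick the highest-scoring label among those seen during training.
    def predict(text: String): Label = {
      val tokens = tokenize(text)
      logPrior.keys.maxBy(label => score(label, tokens))
    }
  }

  // Fit the model by counting labels and words in the training data.
  def train(data: Seq[Example]): Model = {
    val byLabel = data.groupBy(_.label)
    val logPrior = byLabel.map { case (label, examples) =>
      label -> math.log(examples.size.toDouble / data.size)
    }
    val wordCounts = byLabel.map { case (label, examples) =>
      label -> examples
        .flatMap(e => tokenize(e.text))
        .groupBy(identity)
        .map { case (word, occurrences) => word -> occurrences.size }
    }
    val totalWords = wordCounts.map { case (label, counts) => label -> counts.values.sum }
    val vocabSize = wordCounts.values.flatMap(_.keys).toSet.size
    Model(logPrior, wordCounts, totalWords, vocabSize)
  }

  def main(args: Array[String]): Unit = {
    // Made-up training messages, for illustration only.
    val model = train(Seq(
      Example("win free money now", Spam),
      Example("free prize claim now", Spam),
      Example("meeting agenda for monday", Ham),
      Example("lunch on monday", Ham)
    ))
    println(model.predict("free money"))
    println(model.predict("monday meeting"))
  }
}
```

With only four made-up training messages this is far too small to be a real filter, but it shows the shape of the task: the training set pairs inputs with known labels, and the fitted model is a function from message text to `Spam` or `Ham`.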
This chapter implements a data analysis pipeline for spam filtering.
Implementing a spam classifier with Scala and machine learning (ML) is the overall learning objective...