The main development objective of this chapter is to perform spam classification using the following algorithms and feature transformers:
- Stop word remover
- Naive Bayes
- Inverse document frequency
- Hashing trick transformer
- Normalizer
The practical goal of our spam classification task is this: given a new incoming document, say, a collection of random emails drawn from either Inbox or Spam, the classifier must be able to identify the spam in that corpus. After all, this ability is the basis of an effective classifier. The real-world benefit of developing this classifier is to give our readers experience in developing their own spam filters. We will first learn how the pieces of the classifier fit together, and then we will develop it.
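To make the list of components above more concrete, here is a minimal sketch of how they might be chained into a single pipeline. It assumes the DataFrame-based `spark.ml` API and a local Spark session; the chapter's own implementation may use a different API or different settings. The object name `SpamPipelineSketch`, the column names, the `Tokenizer` stage (added only to split raw text into words), the feature dimension, and the four toy messages are all hypothetical and chosen purely for illustration.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.feature.{HashingTF, IDF, Normalizer, StopWordsRemover, Tokenizer}
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: names, parameters, and data are illustrative assumptions.
object SpamPipelineSketch {

  // Chains the transformers and the classifier into one Pipeline.
  def buildPipeline(): Pipeline = {
    // Split raw text into lowercase words (assumed extra step, not in the chapter's list).
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    // Stop word remover: drops common words such as "the" and "a".
    val remover = new StopWordsRemover().setInputCol("words").setOutputCol("filtered")
    // Hashing trick: maps each word to an index in a fixed-size term-frequency vector.
    val hashingTF = new HashingTF()
      .setInputCol("filtered").setOutputCol("rawFeatures").setNumFeatures(1024)
    // Inverse document frequency: down-weights terms that appear in many documents.
    val idf = new IDF().setInputCol("rawFeatures").setOutputCol("tfidf")
    // Normalizer: rescales each vector to unit L2 norm.
    val normalizer = new Normalizer().setInputCol("tfidf").setOutputCol("features").setP(2.0)
    // Naive Bayes classifier trained on the normalized TF-IDF features.
    val naiveBayes = new NaiveBayes().setFeaturesCol("features").setLabelCol("label")

    new Pipeline().setStages(
      Array[PipelineStage](tokenizer, remover, hashingTF, idf, normalizer, naiveBayes))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SpamPipelineSketch")
      .getOrCreate()
    import spark.implicits._

    // Toy training data (made up): label 1.0 = spam, 0.0 = not spam.
    val training = Seq(
      ("win a free prize now", 1.0),
      ("cheap loans click here", 1.0),
      ("meeting moved to monday morning", 0.0),
      ("please review the attached report", 0.0)
    ).toDF("text", "label")

    val model = buildPipeline().fit(training)

    // Score a couple of unseen messages and print the predicted labels.
    val incoming = Seq("free prize click now", "monday report attached").toDF("text")
    model.transform(incoming).select("text", "prediction").show(false)

    spark.stop()
  }
}
```

Wrapping the stages in a single `Pipeline` means the same transformations are applied, in the same order, at training time and at prediction time. With only four training messages the predictions are not meaningful; the point is the ordering of the stages, not the accuracy.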
The implementation steps are laid out in the next section, which takes us straight into developing Scala code in a Spark environment. Given that Spark allows us...