In this chapter, we discussed how text mining differs from traditional attribute-based learning, requiring a number of pre-processing steps to transform written natural language into feature vectors. We then discussed how to leverage Mallet, a Java-based library for NLP, by applying it to two real-life problems. First, we used the LDA model to discover topics in a news corpus and built a model that can assign a topic to a new document. We also discussed how to build a Naive Bayes spam-filtering classifier using the bag-of-words (BoW) representation.
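To recap how these pieces fit together, the following minimal sketch strings the main Mallet classes into a single topic-modeling pipeline: pre-processing text into feature vectors, training LDA, and assigning a topic to a document. It is an illustrative outline rather than the chapter's exact code; the input file news.csv, the parsing regular expression, and the choice of 10 topics and 1,000 sampling iterations are assumptions made here for the example.

import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.CharSequenceLowercase;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.pipe.TokenSequenceRemoveStopwords;
import cc.mallet.pipe.iterator.CsvIterator;
import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.topics.TopicInferencer;
import cc.mallet.types.InstanceList;

import java.io.FileReader;
import java.util.ArrayList;
import java.util.regex.Pattern;

public class NewsTopicsSketch {

    public static void main(String[] args) throws Exception {
        // Pre-processing: turn raw text into a feature sequence
        // (lowercase -> tokenize -> drop stop words -> index tokens).
        ArrayList<Pipe> pipes = new ArrayList<>();
        pipes.add(new CharSequenceLowercase());
        pipes.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}+")));
        pipes.add(new TokenSequenceRemoveStopwords());
        pipes.add(new TokenSequence2FeatureSequence());

        // Hypothetical input file: one news item per line as "name,label,text".
        InstanceList corpus = new InstanceList(new SerialPipes(pipes));
        corpus.addThruPipe(new CsvIterator(new FileReader("news.csv"),
                Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"), 3, 2, 1));

        // Train an LDA topic model; 10 topics and 1,000 iterations are
        // illustrative settings, not the chapter's exact configuration.
        ParallelTopicModel lda = new ParallelTopicModel(10, 1.0, 0.01);
        lda.addInstances(corpus);
        lda.setNumThreads(2);
        lda.setNumIterations(1000);
        lda.estimate();

        // Assign a topic to a document by sampling its topic distribution
        // with the trained model and picking the most probable topic.
        TopicInferencer inferencer = lda.getInferencer();
        double[] dist = inferencer.getSampledDistribution(corpus.get(0), 100, 10, 10);
        int best = 0;
        for (int k = 1; k < dist.length; k++) {
            if (dist[k] > dist[best]) best = k;
        }
        System.out.println("Most likely topic: " + best);
    }
}

The same InstanceList machinery underlies the spam-filtering example: the pre-processing pipes produce the BoW feature vectors on which the Naive Bayes classifier is trained.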
This chapter concludes our technical demonstrations of how to apply various libraries to solve machine learning tasks. Since we could not cover every interesting application or go into full detail at many points, the next chapter offers further pointers on how to continue learning and dive deeper into particular topics.