Chapter 10. Text Mining with Mallet – Topic Modeling and Spam Detection
In this chapter, we will first discuss what text mining is, what kinds of analysis it offers, and why you might want to use it in your application. We will then discuss how to work with Mallet, a Java library for natural language processing, covering data import and text preprocessing. Afterwards, we will look into two text mining applications: topic modeling, where we will discuss how text mining can identify the topics in a collection of text documents without anyone reading them individually; and spam detection, where we will discuss how to automatically classify text documents into categories.
This chapter will cover the following topics:
- Introducing text mining
- Installing and working with Mallet
- Topic modeling
- Spam detection
Introducing text mining
Text mining, or text analytics, refers to the process of automatically extracting high-quality information from text documents, most often written in natural...