In this chapter, we'll first discuss what text mining is, what kinds of analysis it offers, and why you might want to use it in your application. We'll then look at how to work with Mallet, a Java library for natural-language processing, covering data import and text pre-processing. Afterward, we will look into two text-mining applications: topic modeling, where text mining is used to identify the topics found in a collection of text documents without reading them individually, and spam detection, where text documents are automatically classified into categories.
This chapter will cover the following topics:
- Introducing text mining
- Installing and working with Mallet
- Topic modeling
- Spam detection