Chapter 6. Retrieving Information from Text Data
In this chapter, we will cover the following recipes:
- Detecting tokens (words) using Java
- Detecting sentences using Java
- Detecting tokens (words) and sentences using OpenNLP
- Retrieving lemmas and part-of-speech tags, and recognizing named entities from tokens using Stanford CoreNLP
- Measuring text similarity with the cosine similarity measure using Java 8
- Extracting topics from text documents using Mallet
- Classifying text documents using Mallet
- Classifying text documents using Weka