The core Java SDK does not provide built-in support for detecting parts of speech (POS), so we must use specialized NLP APIs. Tags play a central role in identifying POS. A tag is typically an abbreviation, such as NN, which indicates that the corresponding word is a noun. Tag sets vary somewhat from one API to another, and we will reference the relevant lists as we encounter them.
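To make the idea of a tag concrete, here is a minimal sketch in plain Java that uses no NLP library. It pairs each word with a tag and looks up a short description of that tag. The sentence is tagged by hand rather than by a tagger, the descriptions follow the Penn Treebank tag set (only one of the possible tag sets), and the class and method names are hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hand-tagged tokens and a small subset of
// Penn Treebank tag descriptions. No tagger is involved here.
class TagExample {

    // Hypothetical lookup table for a handful of Penn Treebank tags
    static final Map<String, String> TAG_MEANINGS = new HashMap<>();

    static {
        TAG_MEANINGS.put("DT", "determiner");
        TAG_MEANINGS.put("JJ", "adjective");
        TAG_MEANINGS.put("NN", "noun, singular or mass");
        TAG_MEANINGS.put("VBZ", "verb, 3rd person singular present");
        TAG_MEANINGS.put(".", "sentence-final punctuation");
    }

    // Returns the description of a tag, or "unknown tag" if it is not in the table
    static String describe(String tag) {
        return TAG_MEANINGS.getOrDefault(tag, "unknown tag");
    }

    // Formats a word/tag pair in the common "word/TAG" notation
    static String format(String word, String tag) {
        return word + "/" + tag;
    }

    public static void main(String[] args) {
        // Each row is {word, tag}; the tags were assigned by hand
        String[][] tagged = {
            {"The", "DT"}, {"quick", "JJ"}, {"fox", "NN"},
            {"jumps", "VBZ"}, {".", "."}
        };
        for (String[] pair : tagged) {
            System.out.println(format(pair[0], pair[1]) + " -> " + describe(pair[1]));
        }
    }
}
```

The "word/TAG" notation used here is a common way to display tagger output. The recipes in this chapter replace the hand-assigned tags with tags produced by real POS taggers, whose tag sets may differ from this subset.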
In this chapter, we will cover the following recipes:
- Finding POS using tagging
- Using a chunker to find POS
- Using a tag dictionary
- Finding POS using the Penn Treebank
- Finding POS from textese
- Using a pipeline to perform tagging
- Using a hidden Markov model to perform POS tagging
- Training a specialized POS model