Finding POS using the Penn Treebank
The Penn Treebank project defines a set of English POS tags that many taggers use. We will use the Stanford NLP API to demonstrate how this tag set can be used to find POS elements in text. For this recipe, we will use a tagger model that produces Penn Treebank tags, wsj-0-18-bidirectional-distsim.tagger, which has been trained on a series of Wall Street Journal articles.
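As a rough illustration of what Penn Treebank tags look like in tagged text, the following sketch splits a hand-written tagged sentence into word and tag pairs and looks up each tag's description. It does not call the Stanford NLP API. The class name PennTagDemo, its methods, and the sample sentence are hypothetical, the tag table is only a small subset of the full tag set, and the underscore-separated word_TAG form is assumed to match the tagger's output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Stanford NLP API.
class PennTagDemo {

    // A small subset of the Penn Treebank tag set with its descriptions.
    static final Map<String, String> TAGS = new LinkedHashMap<>();
    static {
        TAGS.put("DT", "determiner");
        TAGS.put("IN", "preposition or subordinating conjunction");
        TAGS.put("JJ", "adjective");
        TAGS.put("NN", "noun, singular or mass");
        TAGS.put("NNS", "noun, plural");
        TAGS.put("VBZ", "verb, 3rd person singular present");
    }

    // Returns the description of a tag, or a placeholder if it is not in the subset.
    static String describe(String tag) {
        return TAGS.getOrDefault(tag, "unknown tag");
    }

    // Splits text of the assumed form "word_TAG word_TAG ..." into [word, tag] pairs.
    static List<String[]> parse(String tagged) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            // Use the last underscore so that words containing '_' keep their text.
            int sep = token.lastIndexOf('_');
            if (sep <= 0) {
                continue; // no separator, or nothing before it
            }
            pairs.add(new String[] {token.substring(0, sep), token.substring(sep + 1)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hand-written sample, not actual tagger output.
        String tagged = "The_DT cat_NN sits_VBZ on_IN the_DT mat_NN";
        for (String[] pair : parse(tagged)) {
            System.out.println(pair[0] + " -> " + pair[1] + " (" + describe(pair[1]) + ")");
        }
    }
}
```

Splitting on the last underscore is a design choice made for this sketch: it keeps any underscores inside a word intact, at the cost of assuming that tags themselves never contain one.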
Getting ready
To prepare, we need to do the following:
- Create a new Maven project.
- Download the following ZIP files:
  - stanford-corenlp-full-2018-10-05.zip: It can be found at https://stanfordnlp.github.io/CoreNLP/download.html. Extract stanford-corenlp-3.9.2.jar from the archive.
  - stanford-postagger-2018-10-16.zip: It can be found at https:...