Partial parsing with regular expressions
So far, we've only been parsing noun phrases. But RegexpParser supports grammars with multiple phrase types, such as verb phrases and prepositional phrases. We can put the rules we've learned to use and define a grammar that can be evaluated against the conll2000 corpus, which has NP, VP, and PP phrases.
How to do it...
Now, we will define a grammar to parse three phrase types. For noun phrases, we have a ChunkRule class that looks for an optional determiner followed by one or more nouns. We then have a MergeRule class for adding an adjective to the front of a noun chunk. For prepositional phrases, we simply chunk any IN word, such as "in" or "on". For verb phrases, we chunk an optional modal word (such as "should") followed by a verb.
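The grammar described above might look something like the following sketch. The exact tag patterns and the hand-tagged sample sentence are assumptions made for illustration, so treat this as one plausible reading of the description rather than the definitive grammar.

```python
from nltk.chunk.regexp import RegexpParser
from nltk.tree import Tree

# Assumed tag patterns, written to match the prose description of each rule.
chunker = RegexpParser(r'''
NP:
    {<DT>?<NN.*>+}    # chunk optional determiner with nouns
    <JJ>{}<NN.*>      # merge adjective with noun chunk
PP:
    {<IN>}            # chunk preposition
VP:
    {<MD>?<VB.*>}     # chunk optional modal with verb
''')

# A hand-tagged sentence (hypothetical data), so no corpus download is needed.
tagged = [('the', 'DT'), ('dog', 'NN'), ('should', 'MD'), ('sit', 'VB'),
          ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]

tree = chunker.parse(tagged)

# Collect the label of each top-level chunk, in sentence order.
labels = [subtree.label() for subtree in tree if isinstance(subtree, Tree)]
print(labels)
```

Each labeled block in the grammar string becomes its own parsing stage, applied in the order written, so the PP and VP stages run on the tree already produced by the NP stage.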
Note
Each grammar rule is followed by a # comment. This comment is passed into each rule as the description. Comments are optional, but they can be helpful notes for understanding what the rule does, and will be included in trace...
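As a small illustration of how a comment becomes a rule's description, the sketch below builds a single rule from a string and reads the description back. The rule string itself is an assumed example, not taken from the recipe.

```python
from nltk.chunk.regexp import RegexpChunkRule

# Build one rule from its string form; the text after '#' is the comment.
rule = RegexpChunkRule.fromstring('{<IN>}  # chunk preposition')

# descr() returns the description stored on the rule.
description = rule.descr()
print(type(rule).__name__, '->', description)
```

One caveat worth keeping in mind: a colon inside a comment may be misread by RegexpParser as the start of a new phrase label, so it is safest to keep colons out of grammar comments.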