Chapter 5. Extracting Chunks
In this chapter, we will cover the following recipes:
- Chunking and chinking with regular expressions
- Merging and splitting chunks with regular expressions
- Expanding and removing chunks with regular expressions
- Partial parsing with regular expressions
- Training a tagger-based chunker
- Classification-based chunking
- Extracting named entities
- Extracting proper noun chunks
- Extracting location chunks
- Training a named entity chunker
- Training a chunker with NLTK-Trainer
Introduction
Chunk extraction, or partial parsing, is the process of extracting short phrases from a part-of-speech tagged sentence. This is different from full parsing in that we're interested in standalone chunks, or phrases, instead of full parse trees (for more on parse trees, see https://en.wikipedia.org/wiki/Parse_tree). The idea is that meaningful phrases can be extracted from a sentence by looking for particular patterns of part-of-speech tags.
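To make the idea of matching patterns of part-of-speech tags concrete, here is a minimal sketch in plain Python. It is not NLTK's implementation or API. The function name `extract_noun_chunks`, the sample sentence, and the noun-phrase pattern (an optional determiner, any number of adjectives, then one or more nouns) are all assumptions chosen for illustration, and the tags are assumed to follow the Penn Treebank convention.

```python
import re

# Hypothetical pattern: optional determiner, any adjectives, one or more nouns.
NOUN_PHRASE = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN[A-Z]*>)+")

def extract_noun_chunks(tagged_sentence):
    """Return lists of words whose tags match the noun-phrase pattern.

    tagged_sentence is a list of (word, tag) tuples.
    """
    # Encode the tag sequence as one string, one <TAG> per token,
    # e.g. "<DT><JJ><NN>", so a regular expression can scan it.
    tag_string = "".join("<%s>" % tag for _, tag in tagged_sentence)
    chunks = []
    for match in NOUN_PHRASE.finditer(tag_string):
        # Map character offsets back to token indices by counting "<".
        start = tag_string.count("<", 0, match.start())
        end = start + match.group().count("<")
        chunks.append([word for word, _ in tagged_sentence[start:end]])
    return chunks

# A hand-tagged sample sentence (illustrative data, not tagger output).
sentence = [
    ("the", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumped", "VBD"),
    ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
chunks = extract_noun_chunks(sentence)
print(chunks)
```

The sketch returns flat word lists instead of a tree, which is enough to show that chunks are standalone phrases picked out by their tag sequence; the verb and preposition between the two noun phrases are simply left outside any chunk.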
As in Chapter 4, Part-of-speech Tagging, we'...