Creating a categorized chunk corpus reader
NLTK provides the CategorizedPlaintextCorpusReader and CategorizedTaggedCorpusReader classes, but there's no categorized corpus reader for chunked corpora. So in this recipe, we're going to make one.
Getting ready
Refer to the earlier recipe, Creating a chunked phrase corpus, for an explanation of ChunkedCorpusReader, and refer to the previous recipe for details on CategorizedPlaintextCorpusReader and CategorizedTaggedCorpusReader, both of which inherit from CategorizedCorpusReader.
How to do it...
We'll create a class called CategorizedChunkedCorpusReader that inherits from both CategorizedCorpusReader and ChunkedCorpusReader. It is heavily based on the CategorizedTaggedCorpusReader class, and also provides three additional methods for getting categorized chunks. The following code is found in catchunked.py:
from nltk.corpus.reader import CategorizedCorpusReader, ChunkedCorpusReader

class CategorizedChunkedCorpusReader(CategorizedCorpusReader, ChunkedCorpusReader...
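The idea behind combining a categorized reader with a chunked reader can be sketched without NLTK. The following is a minimal, stdlib-only illustration of the pattern, not the recipe's code: CategorizedBase and ChunkedBase are hypothetical stand-ins (not NLTK classes) for the roles CategorizedCorpusReader and ChunkedCorpusReader play, the corpus is a made-up in-memory dict rather than files on disk, and the _resolve helper is an assumed name for the step that turns a categories argument into fileids before delegating to the chunked base.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple, Union

# A chunk is modelled as (label, words); a sentence is a list of chunks.
Chunk = Tuple[str, List[str]]
Sentence = List[Chunk]


class CategorizedBase:
    """Hypothetical stand-in for CategorizedCorpusReader: derives each
    fileid's category from a regex whose first group is the category."""

    def __init__(self, cat_pattern: str) -> None:
        self._cat_pattern = re.compile(cat_pattern)

    def categories_of(self, fileid: str) -> List[str]:
        match = self._cat_pattern.match(fileid)
        return [match.group(1)] if match else []


class ChunkedBase:
    """Hypothetical stand-in for ChunkedCorpusReader: serves chunked
    sentences from an in-memory dict keyed by fileid."""

    def __init__(self, files: Dict[str, List[Sentence]]) -> None:
        self._files = files

    def fileids(self) -> List[str]:
        return sorted(self._files)

    def chunked_sents(self, fileids: Optional[Sequence[str]] = None) -> List[Sentence]:
        ids = self.fileids() if fileids is None else fileids
        return [sent for fid in ids for sent in self._files[fid]]


class CategorizedChunkedReader(CategorizedBase, ChunkedBase):
    """Inherits from both bases; category filters become fileid lists."""

    def __init__(self, files: Dict[str, List[Sentence]], cat_pattern: str) -> None:
        # The two bases take different arguments, so initialize each explicitly.
        CategorizedBase.__init__(self, cat_pattern)
        ChunkedBase.__init__(self, files)

    def _resolve(
        self,
        fileids: Optional[Sequence[str]],
        categories: Optional[Union[str, Sequence[str]]],
    ) -> Optional[Sequence[str]]:
        # Accept either fileids or categories, but not both at once.
        if fileids is not None and categories is not None:
            raise ValueError("Specify fileids or categories, not both")
        if categories is not None:
            if isinstance(categories, str):
                categories = [categories]
            wanted = set(categories)
            return [f for f in self.fileids() if wanted & set(self.categories_of(f))]
        return fileids

    def chunked_sents(self, fileids=None, categories=None) -> List[Sentence]:
        # Translate categories to fileids, then delegate to the chunked base.
        return ChunkedBase.chunked_sents(self, self._resolve(fileids, categories))


# Usage with a tiny made-up corpus whose directory name is the category.
files = {
    "news/a.txt": [[("NP", ["The", "market"]), ("VP", ["rose"])]],
    "sports/b.txt": [[("NP", ["The", "team"]), ("VP", ["won"])]],
}
reader = CategorizedChunkedReader(files, cat_pattern=r"(\w+)/")
print(reader.chunked_sents(categories=["sports"]))
```

Calling each base's __init__ explicitly, instead of relying on super(), sidesteps the fact that the two bases expect different constructor arguments; every other category-aware method would follow the same resolve-then-delegate shape as chunked_sents.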