Chunking refers to dividing the input text into pieces based on an arbitrary condition. Unlike tokenization, it imposes no constraints on the pieces, and the chunks do not need to be meaningful at all. Chunking is used very frequently in text analysis: when dealing with large text documents, it is better to process them in chunks.
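To make the idea concrete, here is a minimal sketch that assumes the chunking condition is a fixed number of words per chunk. The function name `chunk_words` and its `words_per_chunk` parameter are illustrative assumptions and are not taken from chunking.py; any other condition (character count, sentence count, and so on) could be substituted.

```python
def chunk_words(text, words_per_chunk):
    """Split text into chunks of at most words_per_chunk words.

    Hypothetical helper for illustration only; the splitting rule
    (a fixed word count) is an assumed example of an arbitrary condition.
    """
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")

    # Split on whitespace; no linguistic rules are applied
    words = text.split()

    # Take consecutive slices of words and join each slice back into a string
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


sample = "one two three four five six seven"
chunks = chunk_words(sample, 3)

for index, chunk in enumerate(chunks, start=1):
    print(f"Chunk {index}: {chunk}")
```

Note that the last chunk may be shorter than the others, and that the chunk boundaries fall wherever the word count dictates, not at sentence or phrase boundaries.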
Dividing text using chunking
How to do it...
Let's look at how to divide text by using chunking:
- Create a new Python file and import the following packages (the full code is in the chunking.py file that's already been provided to you):
import numpy as np
import nltk
nltk.download('brown')
from nltk.corpus import brown
- Let's define a function to split the...