Dividing text using chunking
The chunking procedure can be used to divide a large text into small, meaningful chunks of words.
How to do it...
- Create a new Python file and import the following packages:
import numpy as np
from nltk.corpus import brown
- Define a function that splits the input text into chunks:
# Split a text into chunks
def splitter(content, num_of_words):
    words = content.split(' ')
    result = []
- Initialize the variables that track the current chunk:
    current_count = 0
    current_words = []
- Iterate through the words:
    for word in words:
        current_words.append(word)
        current_count += 1
- Once the current chunk holds the required number of words, append it to the result and reset the variables:
        if current_count == num_of_words:
            result.append(' '.join(current_words))
            current_words = []
            current_count = 0
- After the loop, append any remaining words as the final chunk and return the list of chunks:
    # The guard avoids appending an empty string when the
    # word count divides evenly into chunks
    if current_words:
        result.append(' '.join(current_words))
    return result
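As a quick sanity check, the assembled function can be tried on a short string (the sentence and the chunk size of 4 are arbitrary illustrative choices, not from the original recipe):

sample = 'the quick brown fox jumps over the lazy dog'
print(splitter(sample, 4))
# ['the quick brown fox', 'jumps over the lazy', 'dog']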
- Load the data from the Brown corpus and consider the first 10,000 words (a sketch of this step follows below):
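The code for this step is truncated in the text; a minimal sketch of how it might be completed, assuming the standard NLTK brown.words() API (the corpus must be downloaded once with nltk.download('brown')) and an illustrative chunk size of 1,700 words:

# Read the first 10,000 words of the Brown corpus as a single string
data = ' '.join(brown.words()[:10000])
# Chunk size of 1,700 is an illustrative assumption
num_of_words = 1700
text_chunks = splitter(data, num_of_words)
print('Number of text chunks =', len(text_chunks))

With these assumed values, splitter() returns six chunks: five full chunks of 1,700 words each and a final chunk holding the remaining 1,500.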