2. Basic Feature Extraction Methods
Activity 2: Extracting General Features from Text
Solution
Let's extract general features from the given text. Follow these steps to implement this activity:
- Open a Jupyter notebook.
- Insert a new cell and add the following code to import the necessary libraries:
import pandas as pd
from string import punctuation
import nltk
nltk.download('tagsets')
from nltk.data import load
nltk.download('averaged_perceptron_tagger')
from nltk import pos_tag
from nltk import word_tokenize
from collections import Counter
- Now let's see which PoS tags nltk provides. Add the following code to list them:
tagdict = load('help/tagsets/upenn_tagset.pickle')
list(tagdict.keys())
The code generates the following output:
Figure 2.54: List of PoS
- The number of occurrences of each PoS is calculated by iterating through each document and annotating each word with the corresponding pos tag. Add the following...
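As a minimal sketch of the counting idea described in this step (an illustration, not the activity's own code), the tally can be built with Counter over (word, tag) pairs. The function name count_pos and the hand-tagged toy documents are hypothetical; the pairs have the same shape that pos_tag returns, so the sketch runs without any NLTK data.

```python
from collections import Counter


def count_pos(tagged_docs):
    """Return one Counter of tag frequencies per document.

    tagged_docs is a list of documents, each a list of (word, tag) pairs,
    which is the shape produced by nltk.pos_tag. The name count_pos is
    hypothetical and used only for this illustration.
    """
    return [Counter(tag for _, tag in doc) for doc in tagged_docs]


# Hand-tagged toy input (an assumption for illustration, not real tagger output).
docs = [
    [("The", "DT"), ("cat", "NN"), ("sat", "VBD")],
    [("Dogs", "NNS"), ("chase", "VBP"), ("cats", "NNS")],
]

counts = count_pos(docs)
for i, c in enumerate(counts):
    # Each Counter maps a PoS tag to how often it occurs in that document.
    print(i, dict(c))
```

With NLTK available, each document's pairs could instead come from pos_tag(word_tokenize(text)), using the two functions imported earlier. A tag that never occurs in a document simply counts as 0 when looked up in its Counter.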