1. Introduction to Natural Language Processing
Activity 1.01: Preprocessing of Raw Text
Solution
Let's perform preprocessing on a text corpus. To complete this activity, follow these steps:
- Open a Jupyter Notebook.
- Insert a new cell and add the following code to import the necessary libraries:
    # Download the NLTK resources used in this activity
    from nltk import download
    download('stopwords')
    download('wordnet')
    download('punkt')
    download('averaged_perceptron_tagger')
    # Import the tokenizers, lemmatizer, stop word list, spell checker, and tagger
    from nltk import word_tokenize
    from nltk.stem.wordnet import WordNetLemmatizer
    from nltk.corpus import stopwords
    from autocorrect import Speller
    from nltk.wsd import lesk
    from nltk.tokenize import sent_tokenize
    from nltk import stem, pos_tag
    import string
- Read the content of file.txt and store it in a variable named sentence. Insert a new cell and add the following code to implement this:
    # load the text file into a variable called sentence
    sentence = open("../data/file.txt", 'r').read...
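The reading step can also be written with a context manager, which closes the file handle as soon as the content has been read. The following is a minimal sketch, assuming a UTF-8 text file. The helper name read_text_file, the temporary file, and its sample text are hypothetical stand-ins for illustration; they are not the activity's file.txt or its content.

```python
import os
import tempfile


def read_text_file(path, encoding="utf-8"):
    """Return the whole content of a text file as a single string."""
    # The with-block closes the file even if reading raises an error.
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Hypothetical stand-in for file.txt, written to a temporary directory
# so that the sketch runs without the activity's data folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("A short sample sentence for preprocessing.")
    sample_sentence = read_text_file(sample_path)

print(sample_sentence)
```

In the notebook, the same helper could be called with the path to file.txt and its result assigned to sentence.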