1. Introduction to Natural Language Processing
Activity 1: Preprocessing of Raw Text
Solution
Let's perform preprocessing on a text corpus. To implement this activity, follow these steps:
- Open a Jupyter notebook.
- Insert a new cell and add the following code to import the necessary libraries:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')
nltk.download('wordnet')
from nltk import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import stopwords
from autocorrect import spell
from nltk.wsd import lesk
from nltk.tokenize import sent_tokenize
import string
- Read the content of file.txt and store it in a variable named "sentence". Insert a new cell and add the following code to implement this:
sentence = open("data_ch1/file.txt", 'r').read()
- Apply tokenization on the given text corpus. Insert a new cell...
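As a side note on the file-reading step above: the one-line open(...).read() call leaves the file handle to be closed by the interpreter. A minimal sketch of an alternative, assuming the file is UTF-8 encoded, wraps the read in a with-block. The helper name read_corpus and the temporary sample file are hypothetical and exist only so the sketch runs without data_ch1/file.txt; they are not part of the activity.

```python
import os
import tempfile


def read_corpus(path, encoding="utf-8"):
    """Return the full contents of a text file as a single string."""
    # The with-block closes the file handle even if read() raises.
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Hypothetical stand-in for data_ch1/file.txt so the sketch runs anywhere.
sample_text = "This is a sample sentence. It stands in for the real corpus."
sample_path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write(sample_text)

# Same variable name as in the activity.
sentence = read_corpus(sample_path)
print(sentence)
```

In the notebook itself, the call would presumably become sentence = read_corpus("data_ch1/file.txt"), with the encoding adjusted if the file is not UTF-8.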