Performing rule-based text classification using keywords
In this recipe, we will use keywords to classify the business and sport data. We will build the classifier from keywords that we choose ourselves, based on the frequency distributions from the previous recipe.
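To make the idea concrete before the recipe's own steps, here is a minimal sketch of keyword-based classification that uses only the Python standard library. It is not the recipe's implementation: the keyword lists, the KEYWORDS dictionary, and the tokenize and classify helpers are all hypothetical names and values chosen for illustration. The sketch counts how often each class's keywords appear in a text and returns the class with the highest count.

```python
import string

# Hypothetical keyword lists for illustration only; in the recipe, the
# keywords are hand-picked from the frequency distributions.
KEYWORDS = {
    "business": ["market", "company", "growth", "bank", "economy"],
    "sport": ["game", "team", "win", "match", "season"],
}


def tokenize(text):
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def classify(text, keywords=KEYWORDS):
    """Return the label whose keywords occur most often in the text.

    On a tie (including no keyword hits at all), the label listed first
    in the keywords dictionary is returned.
    """
    tokens = tokenize(text)
    scores = {
        label: sum(tokens.count(word) for word in words)
        for label, words in keywords.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(classify("The team hopes to win the next match."))
    print(classify("The bank expects growth in the market."))
```

A rule like this needs no training, but its quality depends entirely on how well the chosen keywords separate the two classes, which is why it is worth evaluating against the same test data as any other classifier.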
Getting ready
We will continue using classes from the sklearn, numpy, and nltk packages that we used in the previous recipe.
How to do it…
In this recipe, we will use hand-picked business and sport vocabulary to create a keyword classifier that we will evaluate using the same method as the dummy classifier in the previous recipe. The steps for this recipe are as follows:
- Do the necessary imports:
import numpy as np
import string
from sklearn import preprocessing
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from itertools import repeat
from nltk.probability import FreqDist
...