Building a text classifier
A classifier assigns each item in a dataset to one of several classes. The Naive Bayes classifier is widely used in the literature to categorize text with a trained model. This recipe starts with a text dataset; feature extraction pulls the key terms out of the text, and those features are used to train the classifier. A term frequency-inverse document frequency (tf-idf) transformation is applied to weight each word by its importance. Finally, the classifier predicts the category of new input and the result is printed.
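To make these stages concrete before turning to the library code, the following is a minimal from-scratch sketch using only the Python standard library. It is not the recipe's implementation and not scikit-learn's: the toy sentences, the helper names (`tokenize`, `fit_idf`, `tfidf`, `train_nb`, `predict`, `classify`), the smoothed idf formula, and the Laplace smoothing constant are all assumptions chosen for illustration. Only the two category labels are borrowed from the recipe's category mapping.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (invented for illustration): (text, label) pairs.
TRAIN = [
    ("the pitcher threw a fast ball", "Baseball"),
    ("home run in the ninth inning", "Baseball"),
    ("the rocket reached orbit", "OuterSpace"),
    ("astronauts aboard the space station", "OuterSpace"),
]

def tokenize(text):
    # Feature extraction in its simplest form: lowercase, split on whitespace.
    return text.lower().split()

def fit_idf(docs):
    # Assumed smoothed idf variant: ln((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf):
    # Weight each known term's count by its idf; unseen terms are dropped.
    counts = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

def train_nb(vectors, labels, vocab, alpha=1.0):
    # Multinomial Naive Bayes with Laplace smoothing over tf-idf weights.
    totals = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            totals[label][term] += weight
    model = {}
    for label, n_docs in Counter(labels).items():
        denom = sum(totals[label].values()) + alpha * len(vocab)
        log_like = {t: math.log((totals[label][t] + alpha) / denom)
                    for t in vocab}
        model[label] = (math.log(n_docs / len(labels)), log_like)
    return model

def predict(model, vec):
    # Pick the class with the highest log prior + weighted log likelihood.
    scores = {label: prior + sum(w * log_like[t] for t, w in vec.items())
              for label, (prior, log_like) in model.items()}
    return max(scores, key=scores.get)

docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
vectors = [tfidf(doc, idf) for doc in docs]
model = train_nb(vectors, labels, set(idf))

def classify(text):
    return predict(model, tfidf(tokenize(text), idf))

print(classify("the rocket left orbit"))
```

In this sketch, a word such as "the" appears in every training sentence and therefore receives the lowest idf weight, while words that occur in only one sentence count for more. That is the sense in which tf-idf specifies the importance of a word.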
How to do it...
- Add the following lines to a new Python file to load the dataset:
from sklearn.datasets import fetch_20newsgroups
category_mapping = {'misc.forsale': 'Sellings', 'rec.motorcycles': 'Motorbikes',
    'rec.sport.baseball': 'Baseball', 'sci.crypt': 'Cryptography',
    'sci.space': 'OuterSpace'}
training_content = fetch_20newsgroups(subset='train', categories=category_mapping...