Application of Naive Bayes
We will now create a pipeline that takes a tweet and determines whether it is relevant or not, based only on the content of that tweet.
To perform the word extraction, we will use spaCy, a library that contains a large number of tools for analyzing natural language. We will use spaCy in future chapters as well.
Note
To get spaCy on your computer, use pip to install the package: pip install spacy
If that doesn't work, see the spaCy installation instructions at https://spacy.io/ for information specific to your platform.
We are going to create a pipeline to extract the word features and classify the tweets using Naive Bayes. Our pipeline has the following steps:
- Transform the original text documents into a dictionary of counts using spaCy's word tokenization.
- Transform those dictionaries into a vector matrix using the DictVectorizer transformer in scikit-learn. This is necessary to enable the Naive Bayes classifier to read the feature values extracted...
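To make the shape of the data in these first two steps concrete, here is a minimal sketch in plain Python. It is only an illustration: a whitespace split stands in for spaCy's tokenizer, and a hand-written function stands in for scikit-learn's DictVectorizer. The names text_to_counts and counts_to_matrix and the two sample tweets are hypothetical and exist only for this example.

```python
from collections import Counter


def text_to_counts(text):
    """Stand-in for step 1: lowercase, split on whitespace, count each token.

    Assumption: a real pipeline would use spaCy's tokenizer here instead.
    """
    return Counter(text.lower().split())


def counts_to_matrix(count_dicts):
    """Stand-in for step 2: one row per document, one column per distinct word.

    Assumption: in a real pipeline, DictVectorizer would do this conversion.
    """
    # Collect every word seen in any document and fix a column order.
    vocabulary = sorted({word for counts in count_dicts for word in counts})
    # A word missing from a document gets a count of 0 in that row.
    matrix = [[counts.get(word, 0) for word in vocabulary]
              for counts in count_dicts]
    return vocabulary, matrix


# Hypothetical sample tweets, used only for illustration.
tweets = ["Python is great", "a python bit a python"]
count_dicts = [text_to_counts(tweet) for tweet in tweets]
vocabulary, matrix = counts_to_matrix(count_dicts)

print(vocabulary)  # ['a', 'bit', 'great', 'is', 'python']
print(matrix)      # [[0, 0, 1, 1, 1], [2, 1, 0, 0, 2]]
```

Each row of the matrix describes one tweet and each column one word, which is the fixed-width numeric form a classifier such as Naive Bayes expects as input.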