Getting useful features from models
One question you may ask is: what are the best features for determining whether a tweet is relevant? We can extract this information from our Naive Bayes model and find out which features are the best individually, according to Naive Bayes.
First, we fit a new model. While cross_val_score
gives us a score across different folds of cross-validated testing data, it doesn't easily give us the trained models themselves. To get a trained model, we fit our pipeline on the tweets directly, creating a new model. The code is as follows:
model = pipeline.fit(tweets, labels)
Note
Note that we aren't really evaluating the model here, so we don't need to be as careful with the training/testing split. However, before you put these features into practice, you should evaluate them on a separate test split. We skip over that here for the sake of clarity.
A pipeline gives you access to the individual steps through the named_steps
attribute and the name of the step (we defined these names...