NLP for trading
Once text data has been converted into numerical features using the NLP techniques discussed in the previous sections, text classification works just like any other classification task.
In this section, we will apply these preprocessing techniques to news articles, product reviews, and Twitter data and teach various classifiers to predict discrete news categories, review scores, and sentiment polarity.
First, we will introduce the naive Bayes model, a probabilistic classification algorithm that works well with the text features produced by a bag-of-words model.
The code samples for this section are in the notebook news_text_classification
.
The naive Bayes classifier
The naive Bayes algorithm is very popular for text classification because its low computational cost and memory requirements facilitate training on very large, high-dimensional datasets. Its predictive performance can compete with more complex models, provides a good baseline, and...