Naïve Bayes and text mining
Extracting the most relevant features to build a model relies on discovery and data mining. For many applications, the data available to the scientist is unstructured text. The multinomial Naïve Bayes classifier is particularly well suited to text mining.
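The multinomial variant treats a document as a bag of word counts. In its standard form, it scores each class by its prior plus the count-weighted log likelihood of every word. The notation below is introduced here for illustration: $n_w(d)$ is the number of times word $w$ occurs in document $d$, $N_{wC}$ is the number of occurrences of $w$ in the training documents of class $C$, and $V$ is the vocabulary. The add-one (Laplace) smoothing shown is an assumption and not necessarily the estimator used later in this chapter.

```latex
\hat{C} = \arg\max_{C} \Big[ \log p(C) + \sum_{w \in V} n_w(d)\,\log p(w \mid C) \Big],
\qquad
p(w \mid C) = \frac{N_{wC} + 1}{\sum_{w' \in V} N_{w'C} + |V|}
```

Working in log space keeps the product of many small probabilities from underflowing. The smoothing term keeps a word that never appeared with a class from forcing that class's score to zero.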
The Naïve Bayes formula is quite effective at classifying the following entities:
Spam e-mails
Business news stories
Movie reviews
Technical papers per field of expertise
The third use case consists of predicting the direction of a stock given financial news. Two types of news affect the stock of a particular company:
Macro trends: economic or social news, such as conflicts, economic trends, or labor market statistics
Micro updates: financial or market news related to the specific company, such as earnings, changes in ownership, or press releases
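The following is a minimal Python sketch of how a multinomial Naïve Bayes classifier could map company news to a stock direction. It is not the implementation used in this chapter. It assumes whitespace tokenization and add-one (Laplace) smoothing, and it ignores words not seen during training. The class name `MultinomialNB`, the helper `tokenize`, the headlines, and the `"up"`/`"down"` labels are all hypothetical and invented for illustration. They are not real market data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class MultinomialNB:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class, for the prior
        self.word_counts = defaultdict(Counter)    # class -> word -> occurrence count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total number of word occurrences per class (denominator of p(w|C))
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_score(self, doc, c):
        """Unnormalized log posterior: log p(C) + sum of log p(w|C) over tokens."""
        score = math.log(self.class_counts[c] / self.n_docs)
        denom = self.total_words[c] + len(self.vocab)
        for w in tokenize(doc):
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[c][w] + 1) / denom)
        return score

    def predict(self, doc):
        """Return the class with the highest log score."""
        return max(self.class_counts, key=lambda c: self.log_score(doc, c))


# Hypothetical toy training set: company news labeled with the stock's direction
train_docs = [
    "earnings beat expectations revenue growth",
    "strong quarterly earnings and record revenue",
    "ceo resigns amid accounting probe",
    "earnings miss and revenue decline",
]
train_labels = ["up", "up", "down", "down"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("record earnings growth"))              # up
print(model.predict("accounting probe and earnings miss"))  # down
```

Repeated words count multiple times in both training and scoring. This is what distinguishes the multinomial model from a Bernoulli model, which records only word presence or absence.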
Micro-economic news related to a specific company has the potential to affect the sentiment of investors toward...