CRF and text analytics
Most of the examples used to demonstrate the capabilities of conditional random fields are related to text mining, intrusion detection, or bioinformatics. Although these applications have a great commercial merit, they are not suitable as an introductory test case because they usually require a lengthy description of the model and the training process.
The feature functions model
For our example, we will select a simple problem: how to collect and aggregate an analyst's recommendation on any given stock from different sources with different formats.
Analysts at brokerage firms and investment funds routinely publish the list of recommendations or rating for any stock. These analysts used different rating schemes from buy/hold/sell; A, B, C rating; and stars rating to market perform/neutral/market underperform. For this example, the rating is normalized as follows:
- 0 for a strong sell, (or F or 1 star rating)
- 1 for sell (D, 2 stars, marker underperform)
- 2 for neutral...