Chapter 5. Classification – Detecting Poor Answers
Now that we are able to extract useful features from text, we can take on the challenge of building a classifier using real data. Let's go back to our imaginary website in Chapter 3, Clustering – Finding Related Posts, where users can submit questions and get them answered.
A continuous challenge for owners of these Q&A sites is to maintain a decent level of quality in the posted content. Websites such as stackoverflow.com take considerable efforts to encourage users to score questions and answers with badges and bonus points. Higher quality content is the result, as users are trying to spend more energy on carving out the question or crafting a possible answer.
One particular successful incentive is the possibility for the asker to flag one answer to their question as the accepted answer (again, there are incentives for the asker to flag such answers). This will result in more score points for the author of the flagged answer.
Would it...