Given a set of text documents and a set of predefined categories, the objective of text categorization is to assign each document to a category. The output can be a soft assignment or a hard assignment, depending on the problem. A hard assignment gives each document exactly one category, whereas a soft assignment expresses the result as a probability distribution over all categories.
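The difference between soft and hard assignment can be illustrated with a minimal sketch. The category names and the raw scores below are made up for illustration, and softmax is only one common way to turn scores into a probability distribution; an actual classifier may produce its probabilities differently.

```python
import math

# Hypothetical category set, chosen only for this example.
CATEGORIES = ["positive", "negative", "neutral"]


def soft_assignment(scores):
    """Convert raw per-category scores into a probability distribution (softmax)."""
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_assignment(probabilities, categories=CATEGORIES):
    """Return the single category with the highest probability."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return categories[best_index]


# Made-up scores that some classifier might output for one document.
scores = [2.0, 0.5, 1.0]

probs = soft_assignment(scores)   # soft: one probability per category
label = hard_assignment(probs)    # hard: exactly one category

for category, p in zip(CATEGORIES, probs):
    print(f"{category}: {p:.3f}")
print("hard assignment:", label)
```

A hard assignment can always be derived from a soft one by taking the most probable category, but not the other way around, which is why many systems keep the probabilities and apply a threshold only at the end.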
Text categorization has a wide range of applications in industry. The following are a few examples:
- Spam filtering: Given an email, classify it as spam or legitimate email.
- Sentiment classification: Given a review text (movie review, product review), identify the user's polarity, that is, whether it is a positive, negative, or neutral review.
- Problem ticket assignment: Typically, in any industry, whenever a user faces an issue with an IT application or a software/hardware product, the first step is to create a problem...