Training Sentiment Models
The end product of any sentiment analysis project is a sentiment model: an object that stores a learned representation of the data on which it was trained. Such a model can predict sentiment values for text that it has not seen before. To develop a sentiment analysis model, the following steps should be taken:
- The document dataset must be split into training and test datasets. The test dataset is normally a fraction of the overall dataset, usually between 5% and 40%, depending on the total number of examples available. When the dataset is very large, a smaller test fraction can be used, because even a small fraction still yields enough examples for a reliable evaluation.
- Next, the text should be preprocessed by stripping unwanted characters, removing stop words, and performing other common preprocessing steps.
- Features should then be extracted by converting the text to numeric vector representations. These representations are used for training machine learning models...
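
The steps listed so far can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not a recommended pipeline: the toy dataset `DOCS`, the tiny `STOP_WORDS` set, the 20% test fraction, and the helper names `split_dataset`, `preprocess`, `build_vocabulary`, and `vectorize` are all hypothetical, and a simple bag-of-words count is assumed as the vector representation. In practice a library would normally handle each of these steps.

```python
import random
import re
from collections import Counter

# Hypothetical toy dataset of (text, label) pairs; 1 = positive, 0 = negative.
DOCS = [
    ("I loved this movie, it was great!", 1),
    ("What a terrible, boring film.", 0),
    ("The acting was wonderful and the plot was great.", 1),
    ("I hated the ending; it was awful.", 0),
    ("A great cast and a wonderful story.", 1),
]

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "the", "this", "it", "was", "i", "what"}


def split_dataset(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples and split them into (train, test) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def preprocess(text):
    """Lowercase, replace non-letter characters with spaces, drop stop words."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(tokens, vocabulary):
    """Return a bag-of-words count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokens)
    # The vocabulary dict was built in index order, so iterating its keys
    # yields the columns in the right sequence.
    return [counts.get(tok, 0) for tok in vocabulary]


# Step 1: split. Step 2: preprocess. Step 3: convert to numeric vectors.
train, test = split_dataset(DOCS, test_fraction=0.2)
train_tokens = [preprocess(text) for text, _ in train]
vocabulary = build_vocabulary(train_tokens)  # built from training data only
X_train = [vectorize(tokens, vocabulary) for tokens in train_tokens]
X_test = [vectorize(preprocess(text), vocabulary) for text, _ in test]
```

Note that the vocabulary is built from the training split alone, so the test documents stay genuinely unseen; words that appear only in the test set are simply ignored when vectorizing.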