Making Predictions on Unseen Data
Now that you've trained your model and assessed its performance on the test data, the next step is to use it to predict the sentiment of new data. That is, after all, the purpose of the model: predicting the sentiment of data it has never seen. In other words, given any new review in the form of raw text, we should be able to classify its sentiment.
The key step is to create a pipeline that converts raw text into the format the predictive model expects. The new text must undergo exactly the same preprocessing steps that were applied to the text used to train the model. The preprocessing function therefore needs to return formatted text for any raw input text. Its complexity depends on the steps performed on the training data. If tokenization was the only preprocessing step performed, then the function...
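As a rough illustration of reusing the training-time preprocessing on new text, here is a minimal sketch in plain Python. It is not the pipeline used in this chapter. The helper names (tokenize, build_vocab, preprocess_new_text), the regular-expression tokenizer, the padding length, and the two toy training sentences are all assumptions made for the example. The point it shows is that the vocabulary is built once from the training text and then only looked up, never updated, when a new review arrives.

```python
import re

PAD_ID = 0  # assumed id used to pad short sequences
UNK_ID = 1  # assumed id for tokens never seen during training


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(train_texts):
    """Assign an integer id to every token that appears in the training texts."""
    vocab = {}
    for text in train_texts:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def preprocess_new_text(raw_text, vocab, max_len=8):
    """Apply the training-time steps to unseen text: tokenize, map to ids, pad."""
    ids = [vocab.get(token, UNK_ID) for token in tokenize(raw_text)]
    ids = ids[:max_len]                            # truncate long inputs
    return ids + [PAD_ID] * (max_len - len(ids))   # pad short inputs


# Toy training data, invented for this sketch.
train_texts = ["The movie was great", "The plot was dull"]
vocab = build_vocab(train_texts)

# A new review passes through exactly the same steps as the training text.
encoded = preprocess_new_text("The acting was great!", vocab)
print(encoded)
```

Here the word "acting" does not occur in the toy training text, so it maps to the unknown-token id rather than receiving a new id. The resulting fixed-length list of integers is the kind of input a trained model could then score, provided the model was trained on sequences encoded the same way.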