Answer to question 1: We have seen that our trained model performs fairly well on the test set, with an accuracy of 87%. If we look at the model score versus iteration, along with the other parameters in the following graph, we can see that the model did not overfit:
On the sentiment labeled sentences, however, the trained model did not perform well. There could be several reasons for this. For example, our model was trained only on the movie review dataset, yet here we force it to perform on other types of data as well, such as the Amazon and Yelp sentences. In addition, we have not tuned the hyperparameters carefully.
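One way to check whether the drop really comes from the change of domain is to break the accuracy down by data source instead of reporting a single number. The following is only a minimal sketch in plain Python, not the code used in this chapter: the function name `accuracy_by_domain` and the toy records are hypothetical and stand in for the real predictions of the trained model.

```python
from collections import defaultdict

def accuracy_by_domain(records):
    """Return accuracy per domain.

    records: iterable of (domain, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for domain, y_true, y_pred in records:
        total[domain] += 1
        if y_true == y_pred:
            correct[domain] += 1
    return {domain: correct[domain] / total[domain] for domain in total}

# Hypothetical toy predictions, for illustration only (not real results).
toy_records = [
    ("imdb", 1, 1), ("imdb", 0, 0), ("imdb", 1, 1), ("imdb", 0, 1),
    ("amazon", 1, 0), ("amazon", 0, 0), ("amazon", 1, 1), ("amazon", 0, 1),
    ("yelp", 1, 0), ("yelp", 0, 1), ("yelp", 1, 1), ("yelp", 0, 0),
]

per_domain = accuracy_by_domain(toy_records)
for domain, acc in sorted(per_domain.items()):
    print(f"{domain}: {acc:.2f}")
```

If the accuracy on the in-domain source is clearly higher than on the others, the domain mismatch is a likely cause; if all sources score equally poorly, the untuned hyperparameters are the more plausible suspect.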
Answer to question 2: Yes, in fact, this will be very helpful. For this, we have to make sure that our programming environment is ready. In other words...