First, let's define the problem. Given a movie review (raw text), we have to classify it as either positive or negative based on the words it contains; that is, we predict its sentiment. We do this by combining the Word2Vec model and an LSTM: each word in a review is vectorized using the Word2Vec model, and the resulting vectors are fed into an LSTM network. As stated earlier, we will train the model on the Large Movie Review dataset. Now, here is the workflow of the overall project:
- First, we download the movie/product reviews dataset
- Then we create or reuse an existing Word2Vec model (for example, Google News word vectors)
- Then we load each review text, convert its words to vectors, and represent the review as a sequence of those vectors
- Then we create and train the LSTM network
- Then we save the trained model
- Then we evaluate the model on the test set
- Then we restore the trained model and...
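The step that turns a review into a sequence of vectors can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual code: `TOY_VECTORS` is a hypothetical three-dimensional lookup table standing in for a real Word2Vec model, and the helper names `tokenize` and `review_to_sequence` are made up for this example. The sketch also assumes that words missing from the vocabulary are skipped and that every review is truncated or zero-padded to a fixed length, which is one common way to prepare equally sized inputs for an LSTM.

```python
import re
from typing import Dict, List

# Hypothetical toy embedding table standing in for a trained Word2Vec model.
# Real models (for example, the Google News vectors) use hundreds of dimensions.
TOY_VECTORS: Dict[str, List[float]] = {
    "great": [0.9, 0.1, 0.0],
    "boring": [-0.8, 0.2, 0.1],
    "movie": [0.0, 0.5, 0.5],
}
VECTOR_SIZE = 3


def tokenize(review: str) -> List[str]:
    """Lowercase the review and split it into simple word tokens."""
    return re.findall(r"[a-z']+", review.lower())


def review_to_sequence(
    review: str, vectors: Dict[str, List[float]], max_length: int
) -> List[List[float]]:
    """Map a review to exactly max_length word vectors.

    Assumptions of this sketch: out-of-vocabulary words are skipped,
    long reviews are truncated, and short ones are padded with zero vectors.
    """
    known = [vectors[token] for token in tokenize(review) if token in vectors]
    known = known[:max_length]
    padding = [[0.0] * VECTOR_SIZE for _ in range(max_length - len(known))]
    return known + padding


if __name__ == "__main__":
    for row in review_to_sequence("A great movie!", TOY_VECTORS, max_length=4):
        print(row)
```

Each row of the result corresponds to one time step of the LSTM input. In a real pipeline the lookup table would come from the trained or pre-trained Word2Vec model rather than a hand-written dictionary.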