The following function takes our training data, 25,000 lists of integers where each list is a review, and returns a one-hot encoded vector for each list. Each vector has one slot per word in the vocabulary and holds a 1 at the index of every word that occurs in the review, and 0 everywhere else. We then redefine our training and test features by using this function to transform the integer lists into 2D tensors of one-hot encoded review vectors:
import numpy as np

def vectorize_features(features):
    # Define the total number of words in our corpus
    dimension = 12000
    # Make an all-zero 2D tensor of shape (number of reviews, dimension),
    # that is, (25000, 12000) for the training set
    review_vectors = np.zeros((len(features), dimension))
    # Iterate over each review and set the positions of its
    # word indices to 1 in that review's row
    for location, feature in enumerate(features):
        review_vectors[location, feature] = 1
    return review_vectors

x_train = vectorize_features(x_train)
x_test = vectorize_features(x_test)
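To make the transformation easier to picture, here is a minimal sketch of the same encoding on a toy input. The helper name vectorize_features_demo, its dimension parameter, and the toy_reviews data are hypothetical and exist only for this illustration; the logic mirrors the function above, with the vocabulary size exposed as an argument so that a six-word vocabulary can be used.

```python
import numpy as np

def vectorize_features_demo(features, dimension):
    # Hypothetical variant of vectorize_features: same logic, but the
    # vocabulary size is passed in so that a tiny example fits on screen.
    review_vectors = np.zeros((len(features), dimension))
    for location, feature in enumerate(features):
        # `feature` is a list of word indices; NumPy sets every listed
        # column of row `location` to 1 in a single assignment.
        review_vectors[location, feature] = 1
    return review_vectors

# Two made-up "reviews" over a vocabulary of six word indices (0 to 5).
toy_reviews = [[1, 3, 3], [0, 4]]
toy_vectors = vectorize_features_demo(toy_reviews, dimension=6)

print(toy_vectors.shape)  # (2, 6)
print(toy_vectors)
# [[0. 1. 0. 1. 0. 0.]
#  [1. 0. 0. 0. 1. 0.]]
```

Note that the repeated index 3 in the first toy review still produces a single 1: this encoding records whether a word occurs in a review, not how often or in what order.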
You can see the result of...