Chapter 4: Introduction to convolutional networks
Activity 5: Sentiment Analysis on a real-life dataset
Solution:
- Import the necessary classes
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras import layers
from keras.preprocessing.sequence import pad_sequences
import numpy as np
import pandas as pd
- Define your variables and parameters.
epochs = 20
maxlen = 100
embedding_dim = 50
num_filters = 64
kernel_size = 5
batch_size = 32
- Import the data.
data = pd.read_csv('data/sentiment labelled sentences/yelp_labelled.txt',names=['sentence', 'label'], sep='\t')
data.head()
Printing this out on a Jupyter notebook should display:
Figure 4.27: Labelled dataset
- Select the 'sentence' and 'label' columns
sentences=data['sentence'].values
labels=data['label'].values
- Split your data into training and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
sentences, labels, test_size=0.30, random_state=1000)
- Tokenize
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(X_train)
X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)
vocab_size = len(tokenizer.word_index) + 1 #The vocabulary size has an additional 1 due to the 0 reserved index
- Pad in order to ensure that all sequences have the same length
X_train = pad_sequences(X_train, padding='post', maxlen=maxlen)
X_test = pad_sequences(X_test, padding='post', maxlen=maxlen)
- Create the model. Note that we use a sigmoid activation function on the last layer and the binary cross entropy for calculating loss. This is because we are doing a binary classification.
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.Conv1D(num_filters, kernel_size, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
The above code should yield
Figure 4.28: Model summary
The model can be visualized as follows as well:
Figure 4.29: Model visualization
- Train and test the model.
model.fit(X_train, y_train,
epochs=epochs,
verbose=False,
validation_data=(X_test, y_test),
batch_size=batch_size)
loss, accuracy = model.evaluate(X_train, y_train, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy: {:.4f}".format(accuracy))
The accuracy output should be as follows: