Chapter 5: Autoencoders
Activity 8: Modeling Neurons with a ReLU Activation Function
Solution:
Import numpy and matplotlib:
import numpy as np
import matplotlib.pyplot as plt
Allow latex symbols to be used in labels:
plt.rc('text', usetex=True)
Define the ReLU activation function as a Python function:
def relu(x):
    # np.maximum compares element-wise, so it accepts scalars and arrays alike
    return np.maximum(0, x)
Define the inputs (x) and tunable weights (theta) for the neuron. In this example, the inputs (x) will be 100 numbers linearly spaced between -5 and 5. Set theta = 1:
theta = 1
x = np.linspace(-5, 5, 100)
x
The output is as follows:
Compute the output (y):
y = [relu(_x * theta) for _x in x]
Plot the output of the neuron versus the input:
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111)
ax.plot(x, y)
# Raw strings keep the LaTeX backslashes from being read as escape sequences
ax.set_xlabel(r'$x$', fontsize=22)
ax.set_ylabel(r'$h(x\Theta)$', fontsize=22)
ax.spines['left'].set_position(('data', 0))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.tick_params(axis='both', which='major', labelsize=22)
The output is as follows:
Now, set theta = 5 and recompute and store the output of the neuron:
theta = 5
y_2 = [relu(_x * theta) for _x in x]
Now, set theta = 0.2 and recompute and store the output of the neuron:
theta = 0.2
y_3 = [relu(_x * theta) for _x in x]
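Because ReLU passes positive inputs through unchanged, a positive weight only changes the slope of the positive half of the curve. The following is a minimal sketch of that property, assuming a vectorized `relu` built on `np.maximum`; the `max_outputs` dictionary is a name introduced here purely for illustration.

```python
import numpy as np

def relu(x):
    # Element-wise ReLU: negative values become 0, positive values are unchanged
    return np.maximum(0, x)

x = np.linspace(-5, 5, 100)
max_outputs = {}
for theta in (1, 5, 0.2):
    y = relu(x * theta)
    # For a positive weight, scaling the input is the same as scaling the output
    assert np.allclose(y, theta * relu(x))
    max_outputs[theta] = y.max()

print(max_outputs)
```

The largest output in each case is simply 5 times theta, which is why the three curves differ only in the steepness of their positive section.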
Plot the three different output curves of the neuron (theta = 1, theta = 5, theta = 0.2) on one graph:
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111)
ax.plot(x, y, label=r'$\Theta=1$')
ax.plot(x, y_2, label=r'$\Theta=5$', linestyle=':')
ax.plot(x, y_3, label=r'$\Theta=0.2$', linestyle='--')
# The curves are plotted against the raw input x, so the axis is labeled x
ax.set_xlabel(r'$x$', fontsize=22)
ax.set_ylabel(r'$h(x\Theta)$', fontsize=22)
ax.spines['left'].set_position(('data', 0))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.tick_params(axis='both', which='major', labelsize=22)
ax.legend(fontsize=22)
The output is as follows:
In this activity, we created a model of a ReLU-based neuron for an artificial neural network. The output of this neuron is very different from that of a neuron with a sigmoid activation function. There is no saturation region for inputs greater than 0 because the function simply returns its input. In the negative direction, the output saturates: the function returns 0 for any input less than 0. ReLU is a widely used activation function that has been shown to outperform the sigmoid function in some circumstances, and it is often a good first choice.
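The difference in saturation behavior can be checked numerically. The sketch below is illustrative only: the `sigmoid` helper and the sample inputs are assumptions introduced here, not part of the activity.

```python
import numpy as np

def relu(x):
    # Returns 0 for negative inputs and the input itself otherwise
    return np.maximum(0, x)

def sigmoid(x):
    # Logistic sigmoid, which flattens out towards 0 and 1 at the extremes
    return 1 / (1 + np.exp(-x))

x = np.array([-100.0, -5.0, 0.0, 5.0, 100.0])
relu_out = relu(x)
sigmoid_out = sigmoid(x)

print("ReLU:   ", relu_out)
print("sigmoid:", sigmoid_out)
```

For large positive inputs the sigmoid output is pinned close to 1, while the ReLU output keeps growing with the input. For negative inputs both functions flatten out, with ReLU returning exactly 0.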
Activity 9: MNIST Neural Network
Solution:
In this activity, you will train a neural network to identify images in the MNIST dataset and reinforce your skills in training neural networks:
Import pickle, numpy, matplotlib, and the Sequential and Dense classes from Keras:
import pickle
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
Load the mnist.pkl file, which is provided with the accompanying source code and contains the first 10,000 images and corresponding labels from the MNIST dataset. The MNIST dataset is a series of 28 x 28 grayscale images of handwritten digits 0 through 9. Extract the images and labels:
with open('mnist.pkl', 'rb') as f:
    data = pickle.load(f)

images = data['images']
labels = data['labels']
Plot the first 10 samples along with the corresponding labels:
plt.figure(figsize=(10, 7))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(images[i], cmap='gray')
    plt.title(labels[i])
    plt.axis('off')
The output is as follows:
Encode the labels using one hot encoding:
one_hot_labels = np.zeros((images.shape[0], 10))
for idx, label in enumerate(labels):
    one_hot_labels[idx, label] = 1

one_hot_labels
The output is as follows:
Prepare the images for input into a neural network in two steps: flatten each 28 x 28 image into a vector of 784 values, then scale the pixel values to lie between 0 and 1:
images = images.reshape((-1, 28 ** 2))
images = images / 255.
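The effect of these two steps on shape and value range can be checked without the dataset. This is a minimal sketch that assumes the images are stored as 8-bit integers; `fake_images` is a hypothetical random stand-in for the contents of mnist.pkl.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the MNIST sample: 4 images of 28 x 28 8-bit pixels
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

flat = fake_images.reshape((-1, 28 ** 2))  # one row of 784 pixels per image
scaled = flat / 255.                       # floating-point values in [0, 1]

print(flat.shape, scaled.dtype, scaled.min(), scaled.max())
```

Using -1 for the first dimension lets NumPy infer the number of images, so the same line works for any sample size.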
Construct a neural network model in Keras that accepts the prepared images, has a hidden layer of 600 units with a ReLU activation function, and an output of the same number of units as classes. The output layer uses a softmax activation function:
model = Sequential([
    Dense(600, input_shape=(784,), activation='relu'),
    Dense(10, activation='softmax'),
])
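To see what this two-layer network computes, the forward pass can be sketched in plain NumPy. This is an illustration under stated assumptions, not how Keras implements it: the weights below are random and untrained, and the names `W1`, `b1`, `W2`, `b2`, and `forward` are introduced here only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 784, 600, 10

# Random (untrained) weights and zero biases, for illustration only
W1 = rng.normal(scale=0.01, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

def forward(batch):
    hidden = np.maximum(0, batch @ W1 + b1)      # hidden layer with ReLU
    logits = hidden @ W2 + b2                    # one score per class
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)  # softmax: each row sums to 1

batch = rng.random((3, n_inputs))  # three fake flattened images
probs = forward(batch)
n_params = W1.size + b1.size + W2.size + b2.size

print(probs.shape, probs.sum(axis=1), n_params)
```

Each row of `probs` is a probability distribution over the 10 classes, and the parameter count (784 x 600 + 600 + 600 x 10 + 10) gives a sense of the size of the model being trained.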
Compile the model using multiclass cross-entropy, stochastic gradient descent, and an accuracy performance metric:
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
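Multiclass (categorical) cross-entropy penalizes the model according to how little probability it assigns to the correct class. The following NumPy sketch shows the calculation on hand-picked values; the `eps` clipping constant and the example arrays are assumptions chosen for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so that log(0) can never occur
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Only the predicted probability of the true class contributes per sample
    return -np.sum(y_true * np.log(y_pred), axis=1).mean()

y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)  # one hot labels
confident = np.array([[0.05, 0.05, 0.9], [0.8, 0.1, 0.1]])
uncertain = np.full((2, 3), 1 / 3)

loss_confident = categorical_crossentropy(y_true, confident)
loss_uncertain = categorical_crossentropy(y_true, uncertain)
print(loss_confident, loss_uncertain)
```

Predictions that put most of the probability on the correct class give a lower loss than uniform guesses, which score the natural logarithm of the number of classes.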
Train the model. How many epochs are required to achieve at least 95% classification accuracy on the training data? Let's have a look:
model.fit(images, one_hot_labels, epochs=20)
The output is as follows:
In this run, 15 epochs were required to achieve at least 95% classification accuracy on the training set. The exact number can vary between runs because the network weights are initialized randomly.
In this example, we have measured the performance of the neural network classifier using the data that the classifier was trained with. In general, this method should not be used, as it typically reports a higher level of accuracy than one should expect from the model on unseen data. In supervised learning problems, there are a number of cross-validation techniques that should be used instead. As this book covers unsupervised learning, cross-validation lies outside its scope.
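The simplest alternative to scoring on the training data is to hold back part of the sample for evaluation. The sketch below shows one way to do this with NumPy; `holdout_split` and the stand-in arrays are hypothetical names introduced here, and this is a single hold-out split rather than full cross-validation.

```python
import numpy as np

def holdout_split(features, targets, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold back a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(len(features) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (features[train_idx], targets[train_idx],
            features[test_idx], targets[test_idx])

# Hypothetical stand-in data: 50 samples, each tagged with its own index
features = np.arange(50).reshape(50, 1)
targets = np.arange(50)

x_train, y_train, x_test, y_test = holdout_split(features, targets)
print(x_train.shape, x_test.shape)
```

With the activity's data, the model would be fitted on the training portion only and then scored on the held-out portion, for example with the Keras `model.evaluate` method.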
Activity 10: Simple MNIST Autoencoder
Solution:
Import pickle, numpy, and matplotlib, and the Model, Input, and Dense classes from Keras:
import pickle
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Model
from keras.layers import Input, Dense
Load the images from the supplied sample of the MNIST dataset that is provided with the accompanying source code (mnist.pkl):
with open('mnist.pkl', 'rb') as f:
    images = pickle.load(f)['images']
Prepare the images for input into a neural network in two steps: flatten each 28 x 28 image into a vector of 784 values, then scale the pixel values to lie between 0 and 1:
images = images.reshape((-1, 28 ** 2))
images = images / 255.
Construct a simple autoencoder network that reduces the image size to 10 x 10 after the encoding stage:
input_stage = Input(shape=(784,))
encoding_stage = Dense(100, activation='relu')(input_stage)
decoding_stage = Dense(784, activation='sigmoid')(encoding_stage)
autoencoder = Model(input_stage, decoding_stage)
Compile the autoencoder using a binary cross-entropy loss function and the Adadelta optimizer:
autoencoder.compile(loss='binary_crossentropy', optimizer='adadelta')
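For an autoencoder, binary cross-entropy is applied pixel by pixel, treating each scaled pixel value as a target between 0 and 1. The following sketch computes it by hand on a tiny example; the arrays and the `eps` clipping constant are assumptions chosen for illustration.

```python
import numpy as np

def binary_crossentropy(target, reconstruction, eps=1e-7):
    # Clip so that log(0) can never occur
    reconstruction = np.clip(reconstruction, eps, 1 - eps)
    per_pixel = -(target * np.log(reconstruction)
                  + (1 - target) * np.log(1 - reconstruction))
    return per_pixel.mean()

target = np.array([[0.0, 1.0, 0.5, 1.0]])  # four "pixels" of one image
close = np.array([[0.1, 0.9, 0.5, 0.8]])   # reconstruction near the target
far = np.array([[0.9, 0.1, 0.5, 0.2]])     # reconstruction far from it

loss_close = binary_crossentropy(target, close)
loss_far = binary_crossentropy(target, far)
print(loss_close, loss_far)
```

The closer the reconstruction is to the original pixel values, the lower the loss, which is the signal the optimizer uses to improve the decoder output.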
Fit the autoencoder model, passing the images as both the input and the desired output:
autoencoder.fit(images, images, epochs=100)
The output is as follows:
Calculate and store the output of the encoding stage for the first five samples:
encoder_output = Model(input_stage, encoding_stage).predict(images[:5])
Reshape the encoder output to 10 x 10 (10 x 10 = 100) pixels and multiply by 255:
encoder_output = encoder_output.reshape((-1, 10, 10)) * 255
Calculate and store the output of the decoding stage for the first five samples:
decoder_output = autoencoder.predict(images[:5])
Reshape the output of the decoder to 28 x 28 and multiply by 255:
decoder_output = decoder_output.reshape((-1, 28, 28)) * 255
Plot the original image, the encoder output, and the decoder output:
images = images.reshape((-1, 28, 28))
plt.figure(figsize=(10, 7))
for i in range(5):
    # Top row: original images
    plt.subplot(3, 5, i + 1)
    plt.imshow(images[i], cmap='gray')
    plt.axis('off')
    # Middle row: 10 x 10 encoder output
    plt.subplot(3, 5, i + 6)
    plt.imshow(encoder_output[i], cmap='gray')
    plt.axis('off')
    # Bottom row: 28 x 28 decoder output
    plt.subplot(3, 5, i + 11)
    plt.imshow(decoder_output[i], cmap='gray')
    plt.axis('off')
The output is as follows:
So far, we have shown how a simple single hidden layer in both the encoding and decoding stage can be used to reduce the data to a lower dimension space. We can also make this model more complicated by adding additional layers to both the encoding and the decoding stages.
Activity 11: MNIST Convolutional Autoencoder
Solution:
Import pickle, numpy, matplotlib, and the Model class from keras.models and import Input, Conv2D, MaxPooling2D, and UpSampling2D from keras.layers:
import pickle
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
Load the data:
with open('mnist.pkl', 'rb') as f:
    images = pickle.load(f)['images']
Rescale the images to have values between 0 and 1:
images = images / 255.
We need to reshape the images to add a single depth channel for use with convolutional stages. Reshape the images to have a shape of 28 x 28 x 1:
images = images.reshape((-1, 28, 28, 1))
Define an input layer. We will use the same shape input as an image:
input_layer = Input(shape=(28, 28, 1,))
Add a convolutional stage with 16 layers or filters, a 3 x 3 weight matrix, a ReLU activation function, and same padding, which means the output has the same height and width as the input image:
hidden_encoding = Conv2D(
    16,  # Number of layers or filters in the weight matrix
    (3, 3),  # Shape of the weight matrix
    activation='relu',
    padding='same',  # How to apply the weights to the images
)(input_layer)
Add a max pooling layer to the encoder with a 2 x 2 kernel:
encoded = MaxPooling2D((2, 2))(hidden_encoding)
Add a decoding convolutional layer:
hidden_decoding = Conv2D(
    16,  # Number of layers or filters in the weight matrix
    (3, 3),  # Shape of the weight matrix
    activation='relu',
    padding='same',  # How to apply the weights to the images
)(encoded)
Add an upsampling layer:
upsample_decoding = UpSampling2D((2, 2))(hidden_decoding)
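The pooling and upsampling stages can be mimicked on a small array to see what they do to the data. This is a NumPy sketch for a single 2D feature map, assuming nearest-neighbor upsampling; `max_pool_2x2` and `upsample_2x2` are hypothetical helpers, not Keras functions.

```python
import numpy as np

def max_pool_2x2(feature_map):
    # Split the map into 2 x 2 blocks and keep the largest value in each block
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x2(feature_map):
    # Repeat every value twice along both axes (nearest-neighbor upsampling)
    return feature_map.repeat(2, axis=0).repeat(2, axis=1)

feature_map = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
restored = upsample_2x2(pooled)

print(pooled)
print(restored)
```

Upsampling restores the original height and width but not the values discarded by pooling, so the convolutional layers in the decoder have to learn to fill in the missing detail.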
Add the final convolutional stage, using a single filter to match the depth of the original image:
decoded = Conv2D(
    1,  # Number of layers or filters in the weight matrix
    (3, 3),  # Shape of the weight matrix
    activation='sigmoid',
    padding='same',  # How to apply the weights to the images
)(upsample_decoding)
Construct the model by passing the first and last layers of the network to the Model class:
autoencoder = Model(input_layer, decoded)
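The shape of the data and the number of trainable weights at each stage can be worked out by hand from the layer definitions above. The sketch below does the arithmetic in plain Python; `conv2d_params` and `trace` are names introduced here for illustration.

```python
def conv2d_params(kernel_size, in_channels, filters):
    # Each filter has kernel_size * kernel_size * in_channels weights plus a bias
    return kernel_size * kernel_size * in_channels * filters + filters

# (layer, output shape, trainable parameters) for a 28 x 28 x 1 input
trace = [
    ("Conv2D, 16 filters", (28, 28, 16), conv2d_params(3, 1, 16)),
    ("MaxPooling2D, 2 x 2", (14, 14, 16), 0),
    ("Conv2D, 16 filters", (14, 14, 16), conv2d_params(3, 16, 16)),
    ("UpSampling2D, 2 x 2", (28, 28, 16), 0),
    ("Conv2D, 1 filter", (28, 28, 1), conv2d_params(3, 16, 1)),
]

for name, shape, params in trace:
    print(f"{name:<22}{str(shape):<16}{params}")

total_params = sum(params for _, _, params in trace)
print("Total trainable parameters:", total_params)
```

Same padding keeps the height and width unchanged through each convolution, so only the pooling and upsampling stages alter the spatial size. Note that the encoded 14 x 14 x 16 tensor holds 3,136 values, which is more than the 784 pixels of the input image.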
Display the structure of the model:
autoencoder.summary()
The output is as follows:
Compile the autoencoder using a binary cross-entropy loss function and the Adadelta optimizer:
autoencoder.compile(loss='binary_crossentropy', optimizer='adadelta')
Now, let's fit the model; again, we pass the images as the training data and as the desired output. Train for 20 epochs as convolutional networks take a lot longer to compute:
autoencoder.fit(images, images, epochs=20)
The output is as follows:
Calculate and store the output of the encoding stage for the first five samples:
encoder_output = Model(input_layer, encoded).predict(images[:5])
Reshape the encoder output for visualization, so that each sample becomes a single 196 x 16 image (14 x 14 = 196 spatial positions by 16 filters):
encoder_output = encoder_output.reshape((-1, 14 * 14, 16))
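The encoder produces 16 feature maps per image, so some choice has to be made to show them as one picture. The sketch below contrasts the reshape used in this step with an alternative that averages over the filters; `fake_encoder_output` is a hypothetical random stand-in for the real predictions, and the averaging variant is an assumption offered for comparison, not part of the activity code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the encoder predictions: 5 samples of 14 x 14 x 16
fake_encoder_output = rng.random((5, 14, 14, 16))

# Reshape used in this step: one 196 x 16 image per sample
stacked = fake_encoder_output.reshape((-1, 14 * 14, 16))

# Alternative: average over the 16 filters to get one 14 x 14 image per sample
averaged = fake_encoder_output.mean(axis=-1)

print(stacked.shape, averaged.shape)
```

The reshape keeps every activation but produces a tall, narrow image, whereas averaging preserves the spatial layout at the cost of blending the filters together.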
Get the output of the decoder for the first five images:
decoder_output = autoencoder.predict(images[:5])
Reshape the decoder output to 28 x 28 in size:
decoder_output = decoder_output.reshape((-1, 28, 28))
Reshape the original images back to 28 x 28 in size:
images = images.reshape((-1, 28, 28))
Plot the original image, the encoder output, and the decoder output:
plt.figure(figsize=(10, 7))
for i in range(5):
    # Top row: original images
    plt.subplot(3, 5, i + 1)
    plt.imshow(images[i], cmap='gray')
    plt.axis('off')
    # Middle row: reshaped encoder output
    plt.subplot(3, 5, i + 6)
    plt.imshow(encoder_output[i], cmap='gray')
    plt.axis('off')
    # Bottom row: 28 x 28 decoder output
    plt.subplot(3, 5, i + 11)
    plt.imshow(decoder_output[i], cmap='gray')
    plt.axis('off')
The output is as follows:
In this activity, you developed an autoencoder built from convolutional layers. Note the improvements in the decoder's reconstructions. For image data, this architecture typically performs considerably better than fully connected neural network layers, and it is extremely useful both for working with image-based datasets and for generating artificial data samples.