Applied Unsupervised Learning with Python

Chapter 5: Autoencoders


Activity 8: Modeling Neurons with a ReLU Activation Function

Solution:

  1. Import numpy and matplotlib:

    import numpy as np
    import matplotlib.pyplot as plt
  2. Allow latex symbols to be used in labels:

    plt.rc('text', usetex=True)  # requires a working LaTeX installation on the system
  3. Define the ReLU activation function as a Python function:

    def relu(x):
        return np.maximum(0, x)  # element-wise maximum of 0 and x
  4. Define the inputs (x) and tunable weights (theta) for the neuron. In this example, the inputs (x) will be 100 numbers linearly spaced between -5 and 5. Set theta = 1:

    theta = 1
    x = np.linspace(-5, 5, 100)
    x

    The output is as follows:

    Figure 5.35: Printing the inputs

  5. Compute the output (y):

    y = [relu(_x * theta) for _x in x]
  6. Plot the output of the neuron versus the input:

    fig = plt.figure(figsize=(10, 7))
    ax = fig.add_subplot(111)
    
    ax.plot(x, y)
    ax.set_xlabel(r'$x$', fontsize=22);
    ax.set_ylabel(r'$h(x\Theta)$', fontsize=22);
    ax.spines['left'].set_position(('data', 0));
    ax.spines['top'].set_visible(False);
    ax.spines['right'].set_visible(False);
    ax.tick_params(axis='both', which='major', labelsize=22)

    The output is as follows:

    Figure 5.36: Plot of the neuron versus input

  7. Now, set theta = 5 and recompute and store the output of the neuron:

    theta = 5
    y_2 = [relu(_x * theta) for _x in x]
  8. Now, set theta = 0.2 and recompute and store the output of the neuron:

    theta = 0.2
    y_3 = [relu(_x * theta) for _x in x]
  9. Plot the three different output curves of the neuron (theta = 1, theta = 5, theta = 0.2) on one graph:

    fig = plt.figure(figsize=(10, 7))
    ax = fig.add_subplot(111)
    
    ax.plot(x, y, label=r'$\Theta=1$');
    ax.plot(x, y_2, label=r'$\Theta=5$', linestyle=':');
    ax.plot(x, y_3, label=r'$\Theta=0.2$', linestyle='--');
    ax.set_xlabel(r'$x\Theta$', fontsize=22);
    ax.set_ylabel(r'$h(x\Theta)$', fontsize=22);
    ax.spines['left'].set_position(('data', 0));
    ax.spines['top'].set_visible(False);
    ax.spines['right'].set_visible(False);
    ax.tick_params(axis='both', which='major', labelsize=22);
    ax.legend(fontsize=22);

    The output is as follows:

    Figure 5.37: Three output curves of the neuron

In this activity, we created a model of a ReLU-based artificial neuron. We can see that its output is very different from that of the sigmoid activation function. There is no saturation region for inputs greater than 0, because the function simply returns its input. In the negative direction there is a saturation region: the output is 0 for any input less than 0. ReLU is an extremely common and powerful activation function that has been shown to outperform the sigmoid function in some circumstances, and it is often a good first-choice activation function.
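
For contrast with the sigmoid behavior described above, the following minimal sketch (not part of the original activity) plots the sigmoid function alongside ReLU over the same input range; the sigmoid definition here is the standard 1 / (1 + e^(-x)) formula, and the vectorized relu is only a plotting convenience:

    import numpy as np
    import matplotlib.pyplot as plt

    def relu(x):
        return np.maximum(0, x)  # 0 for negative inputs, x otherwise

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))  # saturates towards 0 and 1 at the extremes

    x = np.linspace(-5, 5, 100)

    fig, ax = plt.subplots(figsize=(10, 7))
    ax.plot(x, relu(x), label='ReLU')
    ax.plot(x, sigmoid(x), label='sigmoid', linestyle='--')
    ax.set_xlabel('x', fontsize=22)
    ax.set_ylabel('h(x)', fontsize=22)
    ax.legend(fontsize=22)
    plt.show()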

Activity 9: MNIST Neural Network

Solution:

In this activity, you will train a neural network to classify images of handwritten digits from the MNIST dataset, reinforcing your skills in training neural networks:

  1. Import pickle, numpy, matplotlib, and the Sequential and Dense classes from Keras:

    import pickle
    import numpy as np
    import matplotlib.pyplot as plt
    from keras.models import Sequential
    from keras.layers import Dense
  2. Load the mnist.pkl file, provided with the accompanying source code, which contains the first 10,000 images and corresponding labels from the MNIST dataset. MNIST is a series of 28 x 28 grayscale images of the handwritten digits 0 through 9. Extract the images and labels:

    with open('mnist.pkl', 'rb') as f:
        data = pickle.load(f)
        
    images = data['images']
    labels = data['labels']
  3. Plot the first 10 samples along with the corresponding labels:

    plt.figure(figsize=(10, 7))
    for i in range(10):
        plt.subplot(2, 5, i + 1)
        plt.imshow(images[i], cmap='gray')
        plt.title(labels[i])
        plt.axis('off')

    The output is as follows:

    Figure 5.38: First 10 samples

  4. Encode the labels using one-hot encoding:

    one_hot_labels = np.zeros((images.shape[0], 10))
    
    for idx, label in enumerate(labels):
        one_hot_labels[idx, label] = 1
        
    one_hot_labels

    The output is as follows:

    Figure 5.39: Result of one hot encoding

  5. Prepare the images for input into a neural network. There are two separate steps in this process: flatten each 28 x 28 image into a 784-element vector, and scale the pixel values to lie between 0 and 1:

    images = images.reshape((-1, 28 ** 2))
    images = images / 255.
  6. Construct a neural network model in Keras that accepts the prepared images, has a hidden layer of 600 units with a ReLU activation function, and an output layer with the same number of units as there are classes (10). The output layer uses a softmax activation function:

    model = Sequential([
        Dense(600, input_shape=(784,), activation='relu'),
        Dense(10, activation='softmax'),
    ])
  7. Compile the model using multiclass cross-entropy, stochastic gradient descent, and an accuracy performance metric:

    model.compile(loss='categorical_crossentropy',
                  optimizer='sgd',
                  metrics=['accuracy'])
  8. Train the model. How many epochs are required to achieve at least 95% classification accuracy on the training data? Let's have a look:

    model.fit(images, one_hot_labels, epochs=20)

    The output is as follows:

    Figure 5.40: Training the model

    15 epochs are required to achieve at least 95% classification accuracy on the training set.

In this example, we measured the performance of the neural network classifier using the same data that the classifier was trained with. In general, this method should not be used, as it typically reports a higher level of accuracy than should be expected from the model on unseen data. In supervised learning problems, a number of cross-validation techniques should be used instead; as this is a book on unsupervised learning, cross-validation lies outside its scope.
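
As a rough illustration of a better practice (not part of the original activity), Keras can hold back a fraction of the training samples and report accuracy on that held-out portion via the validation_split argument to fit. This is not full cross-validation, but the reported validation accuracy is usually a more honest estimate than the training accuracy. The sketch below assumes the model, images, and one_hot_labels objects from the steps above are still defined:

    # Hold out 20% of the samples for validation; metrics for this split are reported each epoch
    model.fit(images, one_hot_labels,
              epochs=20,
              validation_split=0.2)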

Activity 10: Simple MNIST Autoencoder

Solution:

  1. Import pickle, numpy, and matplotlib, and the Model, Input, and Dense classes from Keras:

    import pickle
    import numpy as np
    import matplotlib.pyplot as plt
    from keras.models import Model
    from keras.layers import Input, Dense
  2. Load the images from the supplied sample of the MNIST dataset that is provided with the accompanying source code (mnist.pkl):

    with open('mnist.pkl', 'rb') as f:
        images = pickle.load(f)['images']
  3. Prepare the images for input into a neural network. There are two separate steps in this process: flatten each 28 x 28 image into a 784-element vector, and scale the pixel values to lie between 0 and 1:

    images = images.reshape((-1, 28 ** 2))
    images = images / 255.
  4. Construct a simple autoencoder network that reduces each image to a 100-value encoding (which can be displayed as 10 x 10) after the encoding stage:

    input_stage = Input(shape=(784,))
    encoding_stage = Dense(100, activation='relu')(input_stage)
    decoding_stage = Dense(784, activation='sigmoid')(encoding_stage)
    autoencoder = Model(input_stage, decoding_stage)
  5. Compile the autoencoder using a binary cross-entropy loss function and adadelta gradient descent:

    autoencoder.compile(loss='binary_crossentropy',
                        optimizer='adadelta')
  6. Fit the encoder model:

    autoencoder.fit(images, images, epochs=100)

    The output is as follows:

    Figure 5.41: Training the model

  7. Calculate and store the output of the encoding stage for the first five samples:

    encoder_output = Model(input_stage, encoding_stage).predict(images[:5])
  8. Reshape the encoder output to 10 x 10 (10 x 10 = 100) pixels and multiply by 255:

    encoder_output = encoder_output.reshape((-1, 10, 10)) * 255
  9. Calculate and store the output of the decoding stage for the first five samples:

    decoder_output = autoencoder.predict(images[:5])
  10. Reshape the output of the decoder to 28 x 28 and multiply by 255:

    decoder_output = decoder_output.reshape((-1, 28, 28)) * 255
  11. Plot the original images, the encoder output, and the decoder output:

    images = images.reshape((-1, 28, 28))
    plt.figure(figsize=(10, 7))
    for i in range(5):
        plt.subplot(3, 5, i + 1)
        plt.imshow(images[i], cmap='gray')
        plt.axis('off')
        plt.subplot(3, 5, i + 6)
        plt.imshow(encoder_output[i], cmap='gray')
        plt.axis('off')   
        
        plt.subplot(3, 5, i + 11)
        plt.imshow(decoder_output[i], cmap='gray')
        plt.axis('off')    

    The output is as follows:

    Figure 5.42: The original images, the encoder output, and the decoder output

So far, we have shown how a simple, single hidden layer in each of the encoding and decoding stages can be used to reduce the data to a lower-dimensional space. We can also make this model more complex by adding additional layers to both the encoding and decoding stages, as sketched below.
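
As a sketch of that idea (the 256-unit layer size and the deep_autoencoder name are illustrative assumptions, not part of the activity), additional Dense layers can be stacked symmetrically around the 100-unit bottleneck:

    from keras.models import Model
    from keras.layers import Input, Dense

    input_stage = Input(shape=(784,))
    # Encoding: step down through an intermediate layer to the 100-unit bottleneck
    hidden_encoding = Dense(256, activation='relu')(input_stage)
    encoding_stage = Dense(100, activation='relu')(hidden_encoding)
    # Decoding: mirror the encoder on the way back up to the 784 pixel values
    hidden_decoding = Dense(256, activation='relu')(encoding_stage)
    decoding_stage = Dense(784, activation='sigmoid')(hidden_decoding)

    deep_autoencoder = Model(input_stage, decoding_stage)
    deep_autoencoder.compile(loss='binary_crossentropy',
                             optimizer='adadelta')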

Activity 11: MNIST Convolutional Autoencoder

Solution:

  1. Import pickle, numpy, matplotlib, and the Model class from keras.models and import Input, Conv2D, MaxPooling2D, and UpSampling2D from keras.layers:

    import pickle
    import numpy as np
    import matplotlib.pyplot as plt
    from keras.models import Model
    from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
  2. Load the data:

    with open('mnist.pkl', 'rb') as f:
        images = pickle.load(f)['images']
  3. Rescale the images to have values between 0 and 1:

    images = images / 255.
  4. We need to reshape the images to add a single depth channel for use with convolutional stages. Reshape the images to have a shape of 28 x 28 x 1:

    images = images.reshape((-1, 28, 28, 1))
  5. Define an input layer. We will use the same shape input as an image:

    input_layer = Input(shape=(28, 28, 1,))
  6. Add a convolutional stage to the encoder with 16 filters, a 3 x 3 weight matrix (kernel), a ReLU activation function, and same padding, which means the output has the same dimensions as the input image:

    hidden_encoding = Conv2D(
        16, # Number of layers or filters in the weight matrix
        (3, 3), # Shape of the weight matrix
        activation='relu',
        padding='same', # How to apply the weights to the images
    )(input_layer)
  7. Add a max pooling layer to the encoder with a 2 x 2 kernel:

    encoded = MaxPooling2D((2, 2))(hidden_encoding)
  8. Add a decoding convolutional layer:

    hidden_decoding = Conv2D(
        16, # Number of layers or filters in the weight matrix
        (3, 3), # Shape of the weight matrix
        activation='relu',
        padding='same', # How to apply the weights to the images
    )(encoded)
  9. Add an upsampling layer:

    upsample_decoding = UpSampling2D((2, 2))(hidden_decoding)
  10. Add the final convolutional stage, using a single filter to match the depth of the original image:

    decoded = Conv2D(
        1, # Number of layers or filters in the weight matrix
        (3, 3), # Shape of the weight matrix
        activation='sigmoid',
        padding='same', # How to apply the weights to the images
    )(upsample_decoding)
  11. Construct the model by passing the first and last layers of the network to the Model class:

    autoencoder = Model(input_layer, decoded)
  12. Display the structure of the model:

    autoencoder.summary()

    The output is as follows:

    Figure 5.43: Structure of model

  13. Compile the autoencoder using a binary cross-entropy loss function and adadelta gradient descent:

    autoencoder.compile(loss='binary_crossentropy',
                        optimizer='adadelta')
  14. Now, let's fit the model; again, we pass the images as both the training data and the desired output. Train for 20 epochs, as convolutional networks take much longer to compute:

    autoencoder.fit(images, images, epochs=20)

    The output is as follows:

    Figure 5.44: Training the model

  15. Calculate and store the output of the encoding stage for the first five samples:

    encoder_output = Model(input_layer, encoded).predict(images[:5])
  16. Reshape the encoder output for visualization. Each encoded image is 14 x 14 with 16 filters, so flatten the spatial dimensions to display each sample as a 196 x 16 array:

    encoder_output = encoder_output.reshape((-1, 14 * 14, 16))
  17. Get the output of the decoder for the first five images:

    decoder_output = autoencoder.predict(images[:5])
  18. Reshape the decoder output to 28 x 28 in size:

    decoder_output = decoder_output.reshape((-1, 28, 28))
  19. Reshape the original images back to 28 x 28 in size:

    images = images.reshape((-1, 28, 28))
  20. Plot the original images, the encoder output, and the decoder output:

    plt.figure(figsize=(10, 7))
    for i in range(5):
        plt.subplot(3, 5, i + 1)
        plt.imshow(images[i], cmap='gray')
        plt.axis('off')
        
        plt.subplot(3, 5, i + 6)
        plt.imshow(encoder_output[i], cmap='gray')
        plt.axis('off')   
        
        plt.subplot(3, 5, i + 11)
        plt.imshow(decoder_output[i], cmap='gray')
        plt.axis('off')        

    The output is as follows:

    Figure 5.45: The original images, the encoder output, and the decoder output

At the end of this activity, you will have developed an autoencoder comprising convolutional layers within the neural network. Note the improvement in the decoder reconstructions. This architecture has a significant performance benefit over fully connected neural network layers and is extremely useful when working with image-based datasets and generating artificial data samples.
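
To get a feel for that efficiency claim, the following rough sketch (not part of the activity) compares the number of trainable weights in the convolutional autoencoder above with a fully connected autoencoder whose bottleneck matches the 14 x 14 x 16 encoding. It assumes the autoencoder model from this activity is still defined; the dense_autoencoder name and layer sizes are illustrative choices. The convolutional model typically has far fewer parameters:

    from keras.models import Model
    from keras.layers import Input, Dense

    # Fully connected autoencoder with a bottleneck matching the 14 x 14 x 16 encoding
    dense_input = Input(shape=(784,))
    dense_encoded = Dense(14 * 14 * 16, activation='relu')(dense_input)
    dense_decoded = Dense(784, activation='sigmoid')(dense_encoded)
    dense_autoencoder = Model(dense_input, dense_decoded)

    # Compare the total number of trainable weights in each architecture
    print('Convolutional autoencoder parameters:', autoencoder.count_params())
    print('Fully connected autoencoder parameters:', dense_autoencoder.count_params())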
