TensorFlow 2.0 Computer Vision Cookbook

You're reading from TensorFlow 2.0 Computer Vision Cookbook: Implement machine learning solutions to overcome various computer vision challenges

Product type Paperback
Published in Feb 2021
Publisher Packt
ISBN-13 9781838829131
Length 542 pages
Edition 1st Edition
Author: Jesús Martínez

Creating a multi-class classifier to play rock paper scissors

More often than not, we are interested in categorizing an image into more than two classes. As we'll see in this recipe, implementing a neural network to differentiate between many categories is fairly straightforward, and what better way to demonstrate this than by training a model that can play the widely known Rock Paper Scissors game?

Are you ready? Let's dive in!

Getting ready

We'll use the Rock-Paper-Scissors Images dataset, which is hosted on Kaggle at the following location: https://www.kaggle.com/drgfreeman/rockpaperscissors. To download it, you'll need a Kaggle account, so sign in or sign up accordingly. Then, unzip the dataset in a location of your preference. In this recipe, we assume the unzipped folder is inside the ~/.keras/datasets directory, under the name rockpaperscissors.
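Before moving on, it can help to confirm that the unzipped files are where the recipe expects them. The following is a minimal sketch that counts the PNG files in each class folder. The helper name count_images is hypothetical, and the rps-cv-images subfolder and .png extension are assumptions taken from the file pattern used in step 6 of this recipe:

```python
import pathlib


def count_images(root, classes=('rock', 'paper', 'scissors')):
    """Return {class_name: number of .png files} found under root/<class_name>/."""
    root = pathlib.Path(root)
    return {name: len(list((root / name).glob('*.png')))
            for name in classes}


# Location assumed by this recipe; adjust it if you unzipped elsewhere.
dataset_root = (pathlib.Path.home() / '.keras' / 'datasets' /
                'rockpaperscissors' / 'rps-cv-images')
for name, count in count_images(dataset_root).items():
    print(f'{name}: {count} images')
```

If any class reports zero images, the folder is probably in a different location than the one assumed here.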

Here are some sample images:

Figure 2.2 – Example images of rock (left), paper (center), and scissors (right)

Let's begin implementing.

How to do it…

The following steps explain how to train a multi-class Convolutional Neural Network (CNN) to distinguish between the three classes of the Rock Paper Scissors game:

  1. Import the required packages:
    import os
    import pathlib
    import glob
    import numpy as np
    import tensorflow as tf
    from sklearn.model_selection import train_test_split
    from tensorflow.keras import Model
    from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten,
                                         Input, ReLU, Softmax)
    from tensorflow.keras.losses import CategoricalCrossentropy
  2. Define a list with the three classes, and also an alias to tf.data.experimental.AUTOTUNE, which we'll use later:
    CLASSES = ['rock', 'paper', 'scissors']
    AUTOTUNE = tf.data.experimental.AUTOTUNE

    The values in CLASSES match the names of the directories that contain the images for each class.

  3. Define a function to load an image and its label, given its file path:
    def load_image_and_label(image_path, target_size=(32, 32)):
        # Read and decode the PNG file into an RGB tensor.
        image = tf.io.read_file(image_path)
        image = tf.image.decode_png(image, channels=3)
        # Color is not needed for this task, so keep a single channel.
        image = tf.image.rgb_to_grayscale(image)
        # Scale pixel values to [0, 1] as float32.
        image = tf.image.convert_image_dtype(image, tf.float32)
        image = tf.image.resize(image, target_size)
        # The parent directory name is the class label.
        label = tf.strings.split(image_path, os.path.sep)[-2]
        label = (label == CLASSES)  # One-hot encode.
        label = tf.dtypes.cast(label, tf.float32)
        return image, label

    Notice that we are one-hot encoding by comparing the name of the folder that contains the image (extracted from image_path) with the CLASSES list.

  4. Define a function to build the network architecture. In this case, it's a very simple and shallow one, which is enough for the problem we are solving:
    def build_network():
        input_layer = Input(shape=(32, 32, 1))
        x = Conv2D(filters=32,
                   kernel_size=(3, 3),
                   padding='same',
                   strides=(1, 1))(input_layer)
        x = ReLU()(x)
        x = Dropout(rate=0.5)(x)
        x = Flatten()(x)
        x = Dense(units=3)(x)
        output = Softmax()(x)
        return Model(inputs=input_layer, outputs=output)
  5. Define a function that, given a list of image paths, returns a tf.data.Dataset instance of images and labels, in batches and optionally shuffled:
    def prepare_dataset(dataset_path,
                        buffer_size,
                        batch_size,
                        shuffle=True):
        dataset = (tf.data.Dataset
                   .from_tensor_slices(dataset_path)
                   .map(load_image_and_label,
                        num_parallel_calls=AUTOTUNE))
        if shuffle:
            # shuffle() returns a new dataset, so keep the result.
            dataset = dataset.shuffle(buffer_size=buffer_size)
        dataset = (dataset
                   .batch(batch_size=batch_size)
                   .prefetch(buffer_size=buffer_size))
        return dataset
  6. Load the image paths into a list:
    file_pattern = (pathlib.Path.home() / '.keras' / 'datasets' /
                    'rockpaperscissors' / 'rps-cv-images' /
                    '*' / '*.png')
    dataset_paths = glob.glob(str(file_pattern))
  7. Create train, test, and validation subsets of image paths:
    train_paths, test_paths = train_test_split(dataset_paths,
                                               test_size=0.2,
                                               random_state=999)
    train_paths, val_paths = train_test_split(train_paths,
                                              test_size=0.2,
                                              random_state=999)
  8. Prepare the training, test, and validation datasets:
    BATCH_SIZE = 1024
    BUFFER_SIZE = 1024
    train_dataset = prepare_dataset(train_paths,
                                    buffer_size=BUFFER_SIZE,
                                    batch_size=BATCH_SIZE)
    validation_dataset = prepare_dataset(val_paths,
                                         buffer_size=BUFFER_SIZE,
                                         batch_size=BATCH_SIZE,
                                         shuffle=False)
    test_dataset = prepare_dataset(test_paths,
                                   buffer_size=BUFFER_SIZE,
                                   batch_size=BATCH_SIZE,
                                   shuffle=False)
  9. Instantiate and compile the model:
    model = build_network()
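    # build_network() already ends in Softmax(), so the loss receives
    # probabilities and from_logits keeps its default value of False.
    model.compile(loss=CategoricalCrossentropy(),
                  optimizer='adam',
                  metrics=['accuracy'])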
  10. Fit the model for 250 epochs:
    EPOCHS = 250
    model.fit(train_dataset,
              validation_data=validation_dataset,
              epochs=EPOCHS)
  11. Evaluate the model on the test set:
    test_loss, test_accuracy = model.evaluate(test_dataset)

After 250 epochs, our network achieves around 93.5% accuracy on the test set. Let's understand what we just did.

How it works…

We started by defining the CLASSES list, which let us one-hot encode the label of each image from the name of the directory that contains it, as we saw in the body of the load_image_and_label() function. In this same function, we read an image from disk, decoded it (the files in this dataset are PNGs, as the *.png pattern in step 6 shows), converted it to grayscale (color information is not necessary in this problem), and then resized it to more manageable dimensions of 32x32x1.
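The comparison trick used for one-hot encoding can be seen in isolation with plain Python. This is only a sketch of the idea, not the TensorFlow code from the recipe: one_hot_from_path is a hypothetical helper, and example.png is a made-up file name.

```python
import os

CLASSES = ['rock', 'paper', 'scissors']


def one_hot_from_path(image_path, classes=CLASSES):
    """Plain-Python analogue of the label logic in load_image_and_label()."""
    # The parent directory name is the second-to-last path component.
    label = image_path.split(os.path.sep)[-2]
    # Comparing the label against every class name yields a one-hot vector.
    return [float(label == name) for name in classes]


example_path = os.path.join('rps-cv-images', 'paper', 'example.png')
print(one_hot_from_path(example_path))  # [0.0, 1.0, 0.0]
```

In the recipe, the same comparison is broadcast by TensorFlow across the CLASSES list, and the resulting Boolean tensor is cast to tf.float32.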

build_network() creates a very simple and shallow CNN: a single convolutional layer activated with ReLU(), followed by dropout, a flattening step, and a fully connected output layer of three neurons, one for each category in the dataset. Because this is a multi-class classification task, we use Softmax() to activate the outputs.
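To get a sense of how small this network is, we can count its trainable parameters by hand. The sketch below uses the standard formulas for convolutional and dense layers with biases, applied to the shapes defined in build_network(). The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units


conv = conv2d_params(3, 3, 1, 32)    # 3x3 kernel, 1 input channel, 32 filters
flattened = 32 * 32 * 32             # 'same' padding keeps 32x32, times 32 filters
dense = dense_params(flattened, 3)   # three output neurons
total = conv + dense

print(conv, flattened, dense, total)  # 320 32768 98307 98627
```

ReLU(), Dropout(), Flatten(), and Softmax() add no trainable parameters, so almost all of the weights sit in the final dense layer.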

prepare_dataset() leverages the load_image_and_label() function defined previously to convert file paths into batches of image tensors and one-hot encoded labels.
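The arithmetic behind the subsets and batches can be sketched without TensorFlow. Two successive 20% splits, as in step 7, leave roughly 64% of the images for training, 16% for validation, and 20% for testing. The total of 2,000 images below is hypothetical and chosen only to keep the numbers round, the helper names are made up for this example, and the exact counts produced by train_test_split() may differ by one because of rounding.

```python
import math


def split_sizes(total, test_pct=20, val_pct=20):
    """Approximate (train, val, test) sizes for two successive splits."""
    n_test = total * test_pct // 100      # First split: hold out the test set.
    remaining = total - n_test
    n_val = remaining * val_pct // 100    # Second split: validation from the rest.
    return remaining - n_val, n_val, n_test


def num_batches(n_items, batch_size=1024):
    """Batches produced by batching n_items; the last one may be partial."""
    return math.ceil(n_items / batch_size)


TOTAL_IMAGES = 2000  # Hypothetical count, for illustration only.
train, val, test = split_sizes(TOTAL_IMAGES)
print(train, val, test)                              # 1280 320 400
print([num_batches(n) for n in (train, val, test)])  # [2, 1, 1]
```

With a batch size as large as 1,024, each epoch consists of only a handful of weight updates, which is one reason the recipe trains for as many as 250 epochs.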

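Using the three functions explained here, we prepared three subsets of data for training, validating, and testing the neural network. We trained the model for 250 epochs, using the adam optimizer and CategoricalCrossentropy as our loss function. Note that from_logits=True (which can add a bit of numerical stability) is only appropriate when the network outputs raw logits; because build_network() ends in Softmax(), the loss should be given probabilities, which is what the default from_logits=False expects.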
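The interaction between Softmax() and from_logits is easy to get wrong, so here is a small sketch of the underlying math. It is plain Python, not Keras internals, and it makes no claim about how a particular Keras version handles the mismatch; the logits and label are made-up values. It compares the cross-entropy computed on softmax probabilities with the value obtained when softmax is applied a second time, which is what treating probabilities as logits amounts to.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probabilities):
    """Categorical cross-entropy for a single example."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probabilities))


logits = [4.0, 1.0, 0.5]   # Hypothetical raw scores for rock, paper, scissors.
label = [1.0, 0.0, 0.0]    # The true class is 'rock'.

probs = softmax(logits)
loss_single = cross_entropy(label, probs)            # Softmax applied once.
loss_double = cross_entropy(label, softmax(probs))   # Softmax applied twice.

print(f'single softmax: {loss_single:.3f}')
print(f'double softmax: {loss_double:.3f}')
```

Because probabilities lie between 0 and 1, a second softmax flattens them toward a uniform distribution. With three classes, the loss computed this way cannot drop below about 0.55, however confident the network is.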

Finally, we got around 93.5% accuracy on the test set. Based on these results, you could use this network as a component of a Rock Paper Scissors game to recognize the hand gestures of a player and react accordingly.
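As a rough illustration of the reacting part, the sketch below turns a vector of class probabilities, such as the one the model outputs for a single image, into the move that beats the recognized gesture. The counter_move helper and the BEATS table are hypothetical and not part of the recipe.

```python
CLASSES = ['rock', 'paper', 'scissors']
# Maps each gesture to the move that defeats it.
BEATS = {'rock': 'paper', 'paper': 'scissors', 'scissors': 'rock'}


def counter_move(probabilities, classes=CLASSES):
    """Return (recognized gesture, winning reply) for one prediction."""
    best = max(range(len(classes)), key=lambda i: probabilities[i])
    gesture = classes[best]
    return gesture, BEATS[gesture]


print(counter_move([0.1, 0.2, 0.7]))  # ('scissors', 'rock')
```

The order of the probabilities must match the order of CLASSES, because that is the order used to one-hot encode the labels during training.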

See also

For more information on the Rock-Paper-Scissors Images dataset, refer to the official Kaggle page where it's hosted: https://www.kaggle.com/drgfreeman/rockpaperscissors.
