TensorFlow 2.0 Computer Vision Cookbook: Implement machine learning solutions to overcome various computer vision challenges

Product type Paperback

Published in Feb 2021

Publisher Packt

ISBN-13 9781838829131

Length 542 pages

Edition 1st Edition

Languages

Python

Tools

OpenCV

Concepts

Computer Vision

Author (1):

Jesús Martínez

Creating a multi-class classifier to play rock paper scissors

More often than not, we are interested in categorizing an image into more than two classes. As we'll see in this recipe, implementing a neural network to differentiate between many categories is fairly straightforward, and what better way to demonstrate this than by training a model that can play the widely known Rock Paper Scissors game?

Are you ready? Let's dive in!

Getting ready

We'll use the Rock-Paper-Scissors Images dataset, which is hosted on Kaggle at the following location: https://www.kaggle.com/drgfreeman/rockpaperscissors. To download it, you'll need a Kaggle account, so sign in or sign up accordingly. Then, unzip the dataset in a location of your preference. In this recipe, we assume the unzipped folder is inside the ~/.keras/datasets directory, under the name rockpaperscissors.

Here are some sample images:

Figure 2.2 – Example images of rock (left), paper (center), and scissors (right)

Let's begin implementing.

How to do it…

The following steps explain how to train a multi-class Convolutional Neural Network (CNN) to distinguish between the three classes of the Rock Paper Scissors game:

Import the required packages:

import os
import pathlib
import glob
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras import Model
from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten,
                                     Input, ReLU, Softmax)
from tensorflow.keras.losses import CategoricalCrossentropy

Define a list with the three classes, and also an alias to tf.data.experimental.AUTOTUNE, which we'll use later:
```
CLASSES = ['rock', 'paper', 'scissors']
AUTOTUNE = tf.data.experimental.AUTOTUNE
```
The values in CLASSES match the names of the directories that contain the images for each class.

Define a function to load an image and its label, given its file path:

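def load_image_and_label(image_path, target_size=(32, 32)):
    # Read the file and decode it into a 3-channel uint8 tensor.
    # The dataset files are PNGs (see the '*.png' glob pattern).
    image = tf.io.read_file(image_path)
    image = tf.image.decode_png(image, channels=3)
    image = tf.image.rgb_to_grayscale(image)
    # Convert to float32, scaling pixel values to [0, 1].
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, target_size)
    # The parent directory name is the class label.
    label = tf.strings.split(image_path, os.path.sep)[-2]
    label = (label == CLASSES)  # One-hot encode.
    label = tf.dtypes.cast(label, tf.float32)
    return image, label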

Notice that we are one-hot encoding by comparing the name of the folder that contains the image (extracted from image_path) with the CLASSES list.
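
To make the comparison trick concrete, here is a minimal NumPy sketch of the same idea outside of TensorFlow. The helper name one_hot_from_path and the example path are hypothetical, and the path is split on '/' purely for illustration (the recipe itself uses os.path.sep):

```python
import numpy as np

CLASSES = ['rock', 'paper', 'scissors']


def one_hot_from_path(image_path):
    # Hypothetical helper: the parent directory name is the label.
    label = image_path.split('/')[-2]
    # Comparing one string against the array of class names broadcasts
    # the comparison, giving one boolean per class. Casting to float
    # turns that boolean mask into a one-hot vector.
    return (np.array(CLASSES) == label).astype(np.float32)


print(one_hot_from_path('rps-cv-images/paper/example.png'))  # [0. 1. 0.]
```

Because the position of the single 1 follows the order of CLASSES, the same list can later be used to map a predicted index back to a class name.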

Define a function to build the network architecture. In this case, it's a very simple and shallow one, which is enough for the problem we are solving:

def build_network():
    input_layer = Input(shape=(32, 32, 1))
    x = Conv2D(filters=32,
               kernel_size=(3, 3),
               padding='same',
               strides=(1, 1))(input_layer)
    x = ReLU()(x)
    x = Dropout(rate=0.5)(x)
    x = Flatten()(x)
    x = Dense(units=3)(x)
    output = Softmax()(x)
    return Model(inputs=input_layer, outputs=output)

Define a function that takes a list of image paths and returns a tf.data.Dataset instance of images and labels, batched and optionally shuffled:

def prepare_dataset(dataset_path,
                    buffer_size,
                    batch_size,
                    shuffle=True):
    dataset = (tf.data.Dataset
               .from_tensor_slices(dataset_path)
               .map(load_image_and_label,
                    num_parallel_calls=AUTOTUNE))
    if shuffle:
        # shuffle() returns a new dataset, so keep the result.
        dataset = dataset.shuffle(buffer_size=buffer_size)
    dataset = (dataset
               .batch(batch_size=batch_size)
               .prefetch(buffer_size=buffer_size))
    return dataset

Load the image paths into a list:

file_pattern = (pathlib.Path.home() / '.keras' / 'datasets' /
                'rockpaperscissors' / 'rps-cv-images' /
                '*' / '*.png')
file_pattern = str(file_pattern)
dataset_paths = glob.glob(file_pattern)
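
Before splitting the data, it can be worth confirming that the glob pattern actually matched files in every class folder. The following is a minimal sketch of such a check; count_images_per_class is a hypothetical helper, and the throwaway directory tree only stands in for the real dataset:

```python
import collections
import glob
import pathlib
import tempfile


def count_images_per_class(root):
    # Same layout as the recipe: one sub-directory per class,
    # with PNG files inside each of them.
    pattern = str(pathlib.Path(root) / '*' / '*.png')
    paths = glob.glob(pattern)
    # The parent directory name is the class label.
    return collections.Counter(pathlib.Path(p).parent.name for p in paths)


# Build a tiny temporary tree of empty files to exercise the helper.
with tempfile.TemporaryDirectory() as root:
    for class_name, n_files in [('rock', 2), ('paper', 1), ('scissors', 3)]:
        class_dir = pathlib.Path(root) / class_name
        class_dir.mkdir()
        for i in range(n_files):
            (class_dir / f'{i}.png').touch()
    counts = count_images_per_class(root)

print(dict(counts))
```

An empty result, or a missing class, usually means the dataset was unzipped somewhere other than the assumed directory.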

Create train, test, and validation subsets of image paths:

train_paths, test_paths = train_test_split(dataset_paths,
                                           test_size=0.2,
                                           random_state=999)
train_paths, val_paths = train_test_split(train_paths,
                                          test_size=0.2,
                                          random_state=999)
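
Since the second split is taken from what remains after the first, the resulting proportions are not 80/20/20. The short sketch below works out the nominal fractions with exact arithmetic; the actual counts may differ by an image or so, depending on how train_test_split rounds:

```python
from fractions import Fraction

TEST_SIZE = Fraction(1, 5)  # test_size=0.2 in both calls

# First split: 20% of all paths go to the test set.
test_fraction = TEST_SIZE
remaining = 1 - TEST_SIZE

# Second split: 20% of the remaining 80% goes to validation.
validation_fraction = remaining * TEST_SIZE
train_fraction = remaining * (1 - TEST_SIZE)

for name, fraction in [('train', train_fraction),
                       ('validation', validation_fraction),
                       ('test', test_fraction)]:
    print(f'{name}: {float(fraction):.0%}')
```

In other words, roughly 64% of the images are used for training, 16% for validation, and 20% for testing.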

Prepare the training, test, and validation datasets:

BATCH_SIZE = 1024
BUFFER_SIZE = 1024
train_dataset = prepare_dataset(train_paths,
                                buffer_size=BUFFER_SIZE,
                                batch_size=BATCH_SIZE)
validation_dataset = prepare_dataset(val_paths,
                                     buffer_size=BUFFER_SIZE,
                                     batch_size=BATCH_SIZE,
                                     shuffle=False)
test_dataset = prepare_dataset(test_paths,
                               buffer_size=BUFFER_SIZE,
                               batch_size=BATCH_SIZE,
                               shuffle=False)

Instantiate and compile the model:

model = build_network()
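# build_network() already ends in Softmax(), so the loss receives
# probabilities and from_logits keeps its default value of False.
model.compile(loss=CategoricalCrossentropy(),
              optimizer='adam',
              metrics=['accuracy'])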

Fit the model for 250 epochs:

EPOCHS = 250
model.fit(train_dataset,
          validation_data=validation_dataset,
          epochs=EPOCHS)

Evaluate the model on the test set:

test_loss, test_accuracy = model.evaluate(test_dataset)

After 250 epochs, our network achieves around 93.5% accuracy on the test set. Let's understand what we just did.

How it works…

We started by defining the CLASSES list, which allowed us to quickly one-hot encode the label of each image based on the name of the directory that contains it, as we observed in the body of the load_image_and_label() function. In this same function, we read an image from disk, decoded it into a three-channel tensor, converted it to grayscale (color information is not necessary in this problem), scaled its pixel values to floats in the [0, 1] range, and then resized it to the more manageable dimensions of 32x32x1.
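
As a rough illustration of the grayscale and scaling steps, here is a NumPy-only sketch. It assumes the luma weights used by tf.image.rgb_to_grayscale (0.2989, 0.5870, 0.1140), skips the resize step, and ignores the integer rounding TensorFlow applies to uint8 images, so it approximates the pipeline rather than replicating it. The function name to_grayscale_float is hypothetical:

```python
import numpy as np

# Luma weights assumed to match tf.image.rgb_to_grayscale.
RGB_WEIGHTS = np.array([0.2989, 0.5870, 0.1140], dtype=np.float32)


def to_grayscale_float(image_uint8):
    # Weighted sum over the channel axis: (H, W, 3) -> (H, W).
    gray = image_uint8.astype(np.float32) @ RGB_WEIGHTS
    # Restore a trailing channel axis of size 1: (H, W) -> (H, W, 1).
    gray = gray[..., np.newaxis]
    # Scale from [0, 255] to [0, 1].
    return gray / 255.0


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
gray = to_grayscale_float(image)
print(gray.shape, gray.dtype)
```

The single-channel output is what allows the network's input layer to be declared with a shape of (32, 32, 1).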

build_network() creates a very simple and shallow CNN: a single convolutional layer activated with ReLU(), followed by dropout and a fully connected output layer of three neurons, one for each category in the dataset. Because this is a multi-class classification task, we use Softmax() to activate the outputs.
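
To get a feel for how small this network is, we can count its trainable parameters by hand. The sketch below uses the standard formulas for convolutional and dense layers with biases; the helper names are hypothetical, and ReLU, Dropout, Flatten, and Softmax contribute no parameters:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel of weights per filter, plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_units, out_units):
    # One weight per input/output pair, plus one bias per output.
    return in_units * out_units + out_units


conv = conv2d_params(3, 3, in_channels=1, filters=32)

# padding='same' with a stride of 1 keeps the 32x32 spatial size,
# so Flatten() yields 32 * 32 * 32 values.
flat_units = 32 * 32 * 32
dense = dense_params(flat_units, 3)

total = conv + dense
print(conv, dense, total)  # 320 98307 98627
```

Almost all of the parameters sit in the final dense layer, which is typical when a convolutional feature map is flattened without any pooling.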

prepare_dataset() leverages the load_image_and_label() function defined previously to convert file paths into batches of image tensors and one-hot encoded labels.

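Using the three functions explained here, we prepared three subsets of data for training, validating, and testing the neural network. We trained the model for 250 epochs, using the adam optimizer and categorical cross-entropy as our loss function. One detail deserves care: from_logits=True tells CategoricalCrossentropy to expect raw, unnormalized scores, which is the more numerically stable option when the network outputs logits. Because build_network() already ends in Softmax(), the loss receives probabilities, so from_logits should be left at its default of False (alternatively, drop the Softmax() layer and keep from_logits=True).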
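
The following NumPy sketch illustrates why the from_logits setting has to match what the network outputs. It computes the cross-entropy once from properly normalized probabilities and once after applying softmax a second time, which is what treating probabilities as logits amounts to. This only illustrates the arithmetic; how a particular Keras version handles such a mismatch may differ:

```python
import numpy as np


def softmax(scores):
    # Subtract the maximum for numerical stability.
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()


def cross_entropy(probabilities, one_hot_label):
    return float(-np.sum(one_hot_label * np.log(probabilities)))


logits = np.array([5.0, 0.0, 0.0])  # confident, correct prediction
label = np.array([1.0, 0.0, 0.0])

probabilities = softmax(logits)

# Matched: the loss is computed on the softmax output.
loss_matched = cross_entropy(probabilities, label)

# Mismatched: the probabilities are treated as logits, so softmax
# is effectively applied twice and the prediction is flattened.
loss_mismatched = cross_entropy(softmax(probabilities), label)

print(f'matched: {loss_matched:.4f}, mismatched: {loss_mismatched:.4f}')
```

The mismatched loss is noticeably larger even though the prediction is correct and confident, because the second softmax squashes the probabilities toward a uniform distribution.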

Finally, we got around 93.5% accuracy on the test set. Based on these results, you could use this network as a component of a Rock Paper Scissors game to recognize the hand gestures of a player and react accordingly.
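
As a starting point for such a game, the sketch below turns a vector of class probabilities into the move that beats the recognized gesture. It is only an illustration: WINNING_MOVE and counter_move are hypothetical names, and the probabilities are hard-coded here rather than taken from the model's output:

```python
import numpy as np

CLASSES = ['rock', 'paper', 'scissors']

# Hypothetical lookup table: the move that beats each gesture.
WINNING_MOVE = {'rock': 'paper', 'paper': 'scissors', 'scissors': 'rock'}


def counter_move(probabilities):
    # The index of the highest probability maps back to a class name,
    # using the same order as the one-hot encoded labels.
    predicted = CLASSES[int(np.argmax(probabilities))]
    return predicted, WINNING_MOVE[predicted]


print(counter_move([0.1, 0.2, 0.7]))  # ('scissors', 'rock')
```

In a real application, the probabilities would come from calling the trained model on an image preprocessed in the same way as the training data.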