Creating a multi-class classifier to play rock paper scissors
More often than not, we are interested in categorizing an image into more than two classes. As we'll see in this recipe, implementing a neural network to differentiate between many categories is fairly straightforward, and what better way to demonstrate this than by training a model that can play the widely known Rock Paper Scissors game?
Are you ready? Let's dive in!
Getting ready
We'll use the Rock-Paper-Scissors Images dataset, which is hosted on Kaggle at the following location: https://www.kaggle.com/drgfreeman/rockpaperscissors. To download it, you'll need a Kaggle account, so sign in or sign up accordingly. Then, unzip the dataset in a location of your preference. In this recipe, we assume the unzipped folder is inside the ~/.keras/datasets directory, under the name rockpaperscissors.
The dataset contains sample photos of hands making the rock, paper, and scissors gestures.
Let's begin implementing.
How to do it…
The following steps explain how to train a multi-class Convolutional Neural Network (CNN) to distinguish between the three classes of the Rock Paper Scissors game:
- Import the required packages:
import os
import pathlib
import glob

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras import Model
from tensorflow.keras.layers import *
from tensorflow.keras.losses import CategoricalCrossentropy
- Define a list with the three classes, and also an alias to tf.data.experimental.AUTOTUNE, which we'll use later:
CLASSES = ['rock', 'paper', 'scissors']
AUTOTUNE = tf.data.experimental.AUTOTUNE
The values in CLASSES match the names of the directories that contain the images for each class.
- Define a function to load an image and its label, given its file path:
def load_image_and_label(image_path, target_size=(32, 32)):
    image = tf.io.read_file(image_path)
    image = tf.image.decode_png(image, channels=3)  # The images are PNGs.
    image = tf.image.rgb_to_grayscale(image)
    image = tf.image.convert_image_dtype(image, np.float32)
    image = tf.image.resize(image, target_size)

    label = tf.strings.split(image_path, os.path.sep)[-2]
    label = (label == CLASSES)  # One-hot encode.
    label = tf.dtypes.cast(label, tf.float32)

    return image, label
Notice that we are one-hot encoding by comparing the name of the folder that contains the image (extracted from image_path) with the CLASSES list. For example, for an image stored in the paper directory, (label == CLASSES) evaluates to [False, True, False], which the final cast turns into the one-hot vector [0., 1., 0.].
- Define a function to build the network architecture. In this case, it's a very simple and shallow one, which is enough for the problem we are solving:
def build_network():
    input_layer = Input(shape=(32, 32, 1))
    x = Conv2D(filters=32,
               kernel_size=(3, 3),
               padding='same',
               strides=(1, 1))(input_layer)
    x = ReLU()(x)
    x = Dropout(rate=0.5)(x)
    x = Flatten()(x)
    x = Dense(units=3)(x)
    output = Softmax()(x)

    return Model(inputs=input_layer, outputs=output)
- Define a function that, given a list of image paths, returns a tf.data.Dataset instance of images and labels, in batches and optionally shuffled:
def prepare_dataset(dataset_paths,
                    buffer_size,
                    batch_size,
                    shuffle=True):
    dataset = (tf.data.Dataset
               .from_tensor_slices(dataset_paths)
               .map(load_image_and_label,
                    num_parallel_calls=AUTOTUNE))

    if shuffle:
        dataset = dataset.shuffle(buffer_size=buffer_size)

    dataset = (dataset
               .batch(batch_size=batch_size)
               .prefetch(buffer_size=buffer_size))

    return dataset
- Load the image paths into a list:
file_pattern = (pathlib.Path.home() / '.keras' / 'datasets' /
                'rockpaperscissors' / 'rps-cv-images' / '*' / '*.png')
dataset_paths = [*glob.glob(str(file_pattern))]
- Create train, test, and validation subsets of image paths:
train_paths, test_paths = train_test_split(dataset_paths,
                                           test_size=0.2,
                                           random_state=999)
train_paths, val_paths = train_test_split(train_paths,
                                          test_size=0.2,
                                          random_state=999)
- Prepare the training, test, and validation datasets:
BATCH_SIZE = 1024
BUFFER_SIZE = 1024

train_dataset = prepare_dataset(train_paths,
                                buffer_size=BUFFER_SIZE,
                                batch_size=BATCH_SIZE)
validation_dataset = prepare_dataset(val_paths,
                                     buffer_size=BUFFER_SIZE,
                                     batch_size=BATCH_SIZE,
                                     shuffle=False)
test_dataset = prepare_dataset(test_paths,
                               buffer_size=BUFFER_SIZE,
                               batch_size=BATCH_SIZE,
                               shuffle=False)
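Before training, it's worth pulling a single batch to confirm the tensors have the shapes the network expects. This sanity check is our addition to the recipe, reusing the datasets defined above:
# Optional sanity check: fetch one batch and inspect its shapes.
images, labels = next(iter(train_dataset))
print(images.shape)  # (BATCH_SIZE, 32, 32, 1): grayscale images
print(labels.shape)  # (BATCH_SIZE, 3): one-hot encoded labels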
- Instantiate and compile the model:
model = build_network()
model.compile(loss=CategoricalCrossentropy(),
              optimizer='adam',
              metrics=['accuracy'])
- Fit the model for 250 epochs:
EPOCHS = 250
model.fit(train_dataset,
          validation_data=validation_dataset,
          epochs=EPOCHS)
- Evaluate the model on the test set:
test_loss, test_accuracy = model.evaluate(test_dataset)
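You can then report both metrics; a couple of print statements (our addition, not part of the original listing) will do:
# Display the metrics returned by model.evaluate().
print(f'Test loss: {test_loss:.4f}')
print(f'Test accuracy: {test_accuracy:.4f}')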
After 250 epochs, our network achieves around 93.5% accuracy on the test set. Let's understand what we just did.
How it works…
We started by defining the CLASSES list, which allowed us to quickly one-hot encode the label of each image based on the name of the directory that contains it, as we observed in the body of the load_image_and_label() function. In this same function, we read an image from disk, decoded it from its PNG format, converted it to grayscale (color information is not necessary for this problem), and then resized it to the more manageable dimensions of 32x32x1.
build_network() creates a very simple and shallow CNN, comprising a single convolutional layer activated with ReLU(), followed by the output layer: a fully connected layer of three neurons, one for each category in the dataset. Because this is a multi-class classification task, we use Softmax() to activate the outputs, turning the three scores into a probability distribution over the classes.
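If you want to double-check the shapes flowing through the network, instantiating it and printing its summary is a quick way to do so (this snippet is our addition):
model = build_network()
model.summary()
# padding='same' with strides of (1, 1) preserves the 32x32 spatial size,
# so the convolutional block outputs a 32x32x32 volume; Flatten() turns it
# into 32,768 features, which the final Dense layer maps to 3 class scores.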
prepare_dataset() leverages the load_image_and_label() function defined previously to convert file paths into batches of image tensors and one-hot encoded labels.
Using the three functions explained here, we prepared three subsets of data, with the purpose of training, validating, and testing the neural network. We trained the model for 250 epochs, using the adam optimizer and CategoricalCrossentropy() as our loss function. Because the network ends in a Softmax() layer, it already outputs probabilities, so the loss's from_logits argument must stay at its default of False; from_logits=True is only appropriate when the model emits raw, unnormalized scores (logits), in which case the loss applies the softmax internally, gaining a bit of numerical stability.
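If you prefer that more stable logits-based setup, the following sketch shows the changes it requires, reusing the imports from step 1; build_network_logits() is our hypothetical variant, not part of the recipe:
def build_network_logits():
    # Same layers as build_network(), but the final Softmax() is dropped,
    # so the model outputs raw, unnormalized scores (logits).
    input_layer = Input(shape=(32, 32, 1))
    x = Conv2D(filters=32,
               kernel_size=(3, 3),
               padding='same',
               strides=(1, 1))(input_layer)
    x = ReLU()(x)
    x = Dropout(rate=0.5)(x)
    x = Flatten()(x)
    output = Dense(units=3)(x)  # Logits.

    return Model(inputs=input_layer, outputs=output)

model = build_network_logits()
# The loss now applies the softmax internally, which is a bit more
# numerically stable than softmax followed by cross-entropy.
model.compile(loss=CategoricalCrossentropy(from_logits=True),
              optimizer='adam',
              metrics=['accuracy'])
With this variant, remember to pass the model's predictions through tf.nn.softmax() whenever you need actual probabilities.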
Finally, we got around 93.5% accuracy on the test set. Based on these results, you could use this network as a component of a Rock Paper Scissors game to recognize the hand gestures of a player and react accordingly.
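As a starting point, a minimal sketch of such a component could look like the following; predict_gesture() is a hypothetical helper built on this recipe's model, load_image_and_label(), and CLASSES, and the image path is just a placeholder:
def predict_gesture(image_path):
    # Reuse the recipe's preprocessing; the label it extracts from the
    # path is ignored, because we only care about the image tensor.
    image, _ = load_image_and_label(image_path)
    image = tf.expand_dims(image, axis=0)  # Add a batch dimension.
    probabilities = model.predict(image)[0]

    return CLASSES[np.argmax(probabilities)]

print(predict_gesture('path/to/player_hand.png'))  # e.g. 'scissors'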
See also
For more information on the Rock-Paper-Scissors Images dataset, refer to the official Kaggle page where it's hosted: https://www.kaggle.com/drgfreeman/rockpaperscissors.