Creating a binary classifier to detect smiles
In its most basic form, image classification consists of discerning between two classes, or signaling the presence or absence of some trait. In this recipe, we'll implement a binary classifier that tells us whether a person in a photo is smiling.
Let's begin, shall we?
Getting ready
You'll need to install Pillow, which is very easy with pip:
$> pip install Pillow
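The recipe also imports TensorFlow 2.x, scikit-learn, and NumPy. If your environment doesn't already have them, they can be installed the same way (pinning exact versions is left to you; any recent release is assumed to work):
$> pip install tensorflow scikit-learn numpy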
We'll use the SMILEs dataset, located here: https://github.com/hromi/SMILEsmileD. Clone or download a zipped version of the repository to a location of your preference. In this recipe, we assume the data is inside the ~/.keras/datasets directory, under the name SMILEsmileD-master.
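If you want to confirm that the files ended up where the recipe expects them, a quick sanity check like the following can help (the directory layout below simply mirrors the path described above; adjust it if you unpacked the repository elsewhere):
import pathlib

# Assumed location, matching the setup described in this recipe.
dataset_dir = (pathlib.Path.home() / '.keras' / 'datasets' /
               'SMILEsmileD-master' / 'SMILEs')
print(dataset_dir.exists())                    # Should print True.
print(len(list(dataset_dir.rglob('*.jpg'))))   # Rough count of images found.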
Let's get started!
How to do it…
Follow these steps to train a smile classifier from scratch on the SMILEs dataset:
- Import all necessary packages:
import os
import pathlib
import glob

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras import Model
from tensorflow.keras.layers import *
from tensorflow.keras.preprocessing.image import *
- Define a function to load the images and labels from a list of file paths:
def load_images_and_labels(image_paths):
    images = []
    labels = []

    for image_path in image_paths:
        image = load_img(image_path,
                         target_size=(32, 32),
                         color_mode='grayscale')
        image = img_to_array(image)

        label = image_path.split(os.path.sep)[-2]
        label = 'positive' in label
        label = float(label)

        images.append(image)
        labels.append(label)

    return np.array(images), np.array(labels)
Notice that we are loading the images in grayscale, and we're encoding the labels by checking whether the word positive is in the file path of the image.
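For example, this is how that encoding behaves on a couple of hypothetical file paths (the folder and file names below are illustrative, not taken from the dataset listing):
import os

# Hypothetical paths; the label comes from the containing folder name.
paths = [os.path.join('SMILEs', 'positives', 'positives7', '10007.jpg'),
         os.path.join('SMILEs', 'negatives', 'negatives7', '10001.jpg')]
for image_path in paths:
    label = image_path.split(os.path.sep)[-2]   # Containing folder.
    print(image_path, '->', float('positive' in label))
# Prints 1.0 for the first path and 0.0 for the second.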
- Define a function to build the neural network. This model's structure is based on LeNet (you can find a link to LeNet's paper in the See also section):
def build_network():
    input_layer = Input(shape=(32, 32, 1))
    x = Conv2D(filters=20,
               kernel_size=(5, 5),
               padding='same',
               strides=(1, 1))(input_layer)
    x = ELU()(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
    x = Dropout(0.4)(x)

    x = Conv2D(filters=50,
               kernel_size=(5, 5),
               padding='same',
               strides=(1, 1))(x)
    x = ELU()(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
    x = Dropout(0.4)(x)

    x = Flatten()(x)
    x = Dense(units=500)(x)
    x = ELU()(x)
    x = Dropout(0.4)(x)

    output = Dense(1, activation='sigmoid')(x)

    model = Model(inputs=input_layer, outputs=output)

    return model
Because this is a binary classification problem, a single Sigmoid-activated neuron is enough in the output layer.
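To make that concrete, the model outputs one probability per image, which we can turn into a hard class by thresholding at 0.5 (the conventional cut-off; the probabilities below are made up purely for illustration):
import numpy as np

# Made-up sigmoid outputs for three images, shaped like the model's output.
probabilities = np.array([[0.91], [0.07], [0.55]])
predictions = (probabilities.flatten() > 0.5).astype(float)
print(predictions)   # [1. 0. 1.] -> smiling, not smiling, smiling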
- Load the image paths into a list:
files_pattern = (pathlib.Path.home() / '.keras' / 'datasets' /
                 'SMILEsmileD-master' / 'SMILEs' / '*' / '*' / '*.jpg')
files_pattern = str(files_pattern)
dataset_paths = [*glob.glob(files_pattern)]
- Use the load_images_and_labels() function defined previously to load the dataset into memory:
X, y = load_images_and_labels(dataset_paths)
- Normalize the images and compute the number of positive, negative, and total examples in the dataset:
X /= 255.0
total = len(y)
total_positive = np.sum(y)
total_negative = total - total_positive
- Create train, test, and validation subsets of the data:
(X_train, X_test,
 y_train, y_test) = train_test_split(X, y,
                                     test_size=0.2,
                                     stratify=y,
                                     random_state=999)
(X_train, X_val,
 y_train, y_val) = train_test_split(X_train, y_train,
                                    test_size=0.2,
                                    stratify=y_train,
                                    random_state=999)
- Instantiate the model and compile it:
model = build_network()
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
- Train the model. Because the dataset is unbalanced, we are assigning weights to each class proportional to the number of positive and negative images in the dataset:
BATCH_SIZE = 32
EPOCHS = 20
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=EPOCHS,
          batch_size=BATCH_SIZE,
          class_weight={
              1.0: total / total_positive,
              0.0: total / total_negative
          })
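The weights above are computed by hand from the class counts. As an optional alternative (not what this recipe uses), scikit-learn can derive comparable weights for you; its 'balanced' heuristic scales both weights by the same constant factor, so the relative emphasis on each class is unchanged:
from sklearn.utils.class_weight import compute_class_weight

# Optional alternative: 'balanced' yields total / (2 * class_count),
# i.e., exactly half of the values used above for both classes.
weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(y_train),
                               y=y_train)
balanced_class_weight = {float(cls): weight
                         for cls, weight in zip(np.unique(y_train), weights)}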
- Evaluate the model on the test set:
test_loss, test_accuracy = model.evaluate(X_test, y_test)
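It can also be handy to print those metrics explicitly and to sanity-check the model on a single test image; a quick sketch (index 0 is an arbitrary choice):
print(f'Test loss: {test_loss:.4f} - test accuracy: {test_accuracy:.4f}')

# Sanity check on one test image; expand_dims adds the batch dimension.
sample = np.expand_dims(X_test[0], axis=0)   # Shape: (1, 32, 32, 1).
probability = model.predict(sample)[0][0]
print('Smiling' if probability > 0.5 else 'Not smiling',
      f'(probability={probability:.2f}, true label={y_test[0]})')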
After 20 epochs, the network should get around 90% accuracy on the test set. In the following section, we'll explain the previous steps.
How it works…
We just trained a network to determine whether a person in a picture is smiling or not. Our first big task was to take the images in the dataset and load them into a format suitable for our neural network. Specifically, the load_images_and_labels() function is in charge of loading an image in grayscale, resizing it to 32x32x1, and then converting it into a numpy array. To extract the label, we looked at the containing folder of each image: if it contained the word positive, we encoded the label as 1; otherwise, we encoded it as 0 (a trick we used here was casting a Boolean as a float, like this: float(label)).
Next, we built the neural network, which is inspired by the LeNet architecture. The biggest takeaway here is that because this is a binary classification problem, we can use a single Sigmoid-activated neuron to discern between the two classes.
We then took 20% of the images to comprise our test set, and from the remaining 80% we took an additional 20% to create our validation set. With these three subsets in place, we proceeded to train the network over 20 epochs, using binary_crossentropy as our loss function and rmsprop as the optimizer.
To account for the imbalance in the dataset (out of the 13,165 images, only 3,690 contain smiling people, while the remaining 9,475 do not), we passed a class_weight dictionary where we assigned a weight inversely proportional to the number of instances of each class in the dataset, effectively forcing the model to pay more attention to the 1.0 class, which corresponds to smiles.
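Plugging in those numbers, the weights passed through class_weight work out to roughly 13,165 / 3,690 ≈ 3.57 for the smiling class and 13,165 / 9,475 ≈ 1.39 for the non-smiling class, so each smiling example contributes about 2.6 times more to the loss:
# The effective weights, computed from the class counts stated above.
total, total_positive, total_negative = 13165, 3690, 9475
print(total / total_positive)   # ~3.57 (weight for the 1.0, smiling class)
print(total / total_negative)   # ~1.39 (weight for the 0.0, non-smiling class)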
Finally, we achieved around 90.5% accuracy on the test set.
See also
For more information on the SMILEs dataset, you can visit the official GitHub repository here: https://github.com/hromi/SMILEsmileD. You can read the LeNet paper here (it's pretty long, though): http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf.