Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
TensorFlow 2.0 Computer Vision Cookbook
TensorFlow 2.0 Computer Vision Cookbook

TensorFlow 2.0 Computer Vision Cookbook: Implement machine learning solutions to overcome various computer vision challenges

eBook
$29.99
Paperback
$43.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
Table of content icon View table of contents Preview book icon Preview Book

TensorFlow 2.0 Computer Vision Cookbook

Chapter 2: Performing Image Classification

Computer vision is a vast field that takes inspiration from many places. Of course, this means that its applications are wide and varied. However, the biggest breakthroughs over the past decade, especially in the context of deep learning applied to visual tasks, have occurred in a particular domain known as image classification.

As the name suggests, image classification consists of the process of discerning what's in an image based on its visual content. Is there a dog or a cat in this image? What number is in this picture? Is the person in this photo smiling or not?

Because image classification is such an important and pervasive task in deep learning applied to computer vision, the recipes in this chapter will focus on the ins and outs of classifying images using TensorFlow 2.x.

We'll cover the following recipes:

  • Creating a binary classifier to detect smiles
  • Creating a multi-class classifier to play Rock Paper Scissors
  • Creating a multi-label classifier to label watches
  • Implementing ResNet from scratch
  • Classifying images with a pre-trained network using the Keras API
  • Classifying images with a pre-trained network using TensorFlow Hub
  • Using data augmentation to improve performance with the Keras API
  • Using data augmentation to improve performance with the tf.data and tf.image APIs

Technical requirements

Besides a working installation of TensorFlow 2.x, it's highly recommended to have access to a GPU, given that some of the recipes are very resource-intensive, making the use of a CPU an inviable option. In each recipe, you'll find the steps and dependencies needed to complete it in the Getting ready section. Finally, the code shown in this chapter is available in full here: https://github.com/PacktPublishing/Tensorflow-2.0-Computer-Vision-Cookbook/tree/master/ch2.

Check out the following link to see the Code in Action video:

https://bit.ly/3bOjqnU

Creating a binary classifier to detect smiles

In its most basic form, image classification consists of discerning between two classes, or signaling the presence or absence of some trait. In this recipe, we'll implement a binary classifier that tells us whether a person in a photo is smiling.

Let's begin, shall we?

Getting ready

You'll need to install Pillow, which is very easy with pip:

$> pip install Pillow

We'll use the SMILEs dataset, located here: https://github.com/hromi/SMILEsmileD. Clone or download a zipped version of the repository to a location of your preference. In this recipe, we assume the data is inside the ~/.keras/datasets directory, under the name SMILEsmileD-master:

Figure 2.1 – Positive (left) and negative (right) examples

Figure 2.1 – Positive (left) and negative (right) examples

Let's get started!

How to do it…

Follow these steps to train a smile classifier from scratch on the SMILEs dataset:

  1. Import all necessary packages:
    import os
    import pathlib
    import glob
    import numpy as np
    from sklearn.model_selection import train_test_split
    from tensorflow.keras import Model
    from tensorflow.keras.layers import *
    from tensorflow.keras.preprocessing.image import *
  2. Define a function to load the images and labels from a list of file paths:
    def load_images_and_labels(image_paths):
        images = []
        labels = []
        for image_path in image_paths:
            image = load_img(image_path, target_size=(32,32), 
                             color_mode='grayscale')
            image = img_to_array(image)
            label = image_path.split(os.path.sep)[-2]
            label = 'positive' in label
            label = float(label)
            images.append(image)
            labels.append(label)
        return np.array(images), np.array(labels)

    Notice that we are loading the images in grayscale, and we're encoding the labels by checking whether the word positive is in the file path of the image.

  3. Define a function to build the neural network. This model's structure is based on LeNet (you can find a link to LeNet's paper in the See also section):
    def build_network():
        input_layer = Input(shape=(32, 32, 1))
        x = Conv2D(filters=20,
                   kernel_size=(5, 5),
                   padding='same',
                   strides=(1, 1))(input_layer)
        x = ELU()(x)
        x = BatchNormalization()(x)
        x = MaxPooling2D(pool_size=(2, 2),
                         strides=(2, 2))(x)
        x = Dropout(0.4)(x)
        x = Conv2D(filters=50,
                   kernel_size=(5, 5),
                   padding='same',
                   strides=(1, 1))(x)
        x = ELU()(x)
        x = BatchNormalization()(x)
        x = MaxPooling2D(pool_size=(2, 2),
                         strides=(2, 2))(x)
        x = Dropout(0.4)(x)
        x = Flatten()(x)
        x = Dense(units=500)(x)
        x = ELU()(x)
        x = Dropout(0.4)(x)
        output = Dense(1, activation='sigmoid')(x)
        model = Model(inputs=input_layer, outputs=output)
        return model

    Because this is a binary classification problem, a single Sigmoid-activated neuron is enough in the output layer.

  4. Load the image paths into a list:
    files_pattern = (pathlib.Path.home() / '.keras' / 
                     'datasets' /
                     'SMILEsmileD-master' / 'SMILEs' / '*' 
                        / '*' / 
                     '*.jpg')
    files_pattern = str(files_pattern)
    dataset_paths = [*glob.glob(files_pattern)]
  5. Use the load_images_and_labels() function defined previously to load the dataset into memory:
    X, y = load_images_and_labels(dataset_paths)
  6. Normalize the images and compute the number of positive, negative, and total examples in the dataset:
    X /= 255.0
    total = len(y)
    total_positive = np.sum(y)
    total_negative = total - total_positive
  7. Create train, test, and validation subsets of the data:
    (X_train, X_test,
     y_train, y_test) = train_test_split(X, y,
                                         test_size=0.2,
                                         stratify=y,
                                         random_state=999)
    (X_train, X_val,
     y_train, y_val) = train_test_split(X_train, y_train,
                                        test_size=0.2,
                                        stratify=y_train,
                                        random_state=999)
  8. Instantiate the model and compile it:
    model = build_network()
    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
  9. Train the model. Because the dataset is unbalanced, we are assigning weights to each class proportional to the number of positive and negative images in the dataset:
    BATCH_SIZE = 32
    EPOCHS = 20
    model.fit(X_train, y_train,
              validation_data=(X_val, y_val),
              epochs=EPOCHS,
              batch_size=BATCH_SIZE,
              class_weight={
                  1.0: total / total_positive,
                  0.0: total / total_negative
              })
  10. Evaluate the model on the test set:
    test_loss, test_accuracy = model.evaluate(X_test, 
                                              y_test)

After 20 epochs, the network should get around 90% accuracy on the test set. In the following section, we'll explain the previous steps.

How it works…

We just trained a network to determine whether a person is smiling or not in a picture. Our first big task was to take the images in the dataset and load them into a format suitable for our neural network. Specifically, the load_image_and_labels() function is in charge of loading an image in grayscale, resizing it to 32x32x1, and then converting it into a numpy array. To extract the label, we looked at the containing folder of each image: if it contained the word positive, we encoded the label as 1; otherwise, we encoded it as 0 (a trick we used here was casting a Boolean as a float, like this: float(label)).

Next, we built the neural network, which is inspired by the LeNet architecture. The biggest takeaway here is that because this is a binary classification problem, we can use a single Sigmoid-activated neuron to discern between the two classes.

We then took 20% of the images to comprise our test set, and from the remaining 80% we took an additional 20% to create our validation set. With these three subsets in place, we proceeded to train the network over 20 epochs, using binary_crossentropy as our loss function and rmsprop as the optimizer.

To account for the imbalance in the dataset (out of the 13,165 images, only 3,690 contain smiling people, while the remaining 9,475 do not), we passed a class_weight dictionary where we assigned a weight conversely proportional to the number of instances of each class in the dataset, effectively forcing the model to pay more attention to the 1.0 class, which corresponds to smile.

Finally, we achieved around 90.5% accuracy on the test set.

See also

For more information on the SMILEs dataset, you can visit the official GitHub repository here: https://github.com/hromi/SMILEsmileD. You can read the LeNet paper here (it's pretty long, though): http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf.

Creating a multi-class classifier to play rock paper scissors

More often than not, we are interested in categorizing an image into more than two classes. As we'll see in this recipe, implementing a neural network to differentiate between many categories is fairly straightforward, and what better way to demonstrate this than by training a model that can play the widely known Rock Paper Scissors game?

Are you ready? Let's dive in!

Getting ready

We'll use the Rock-Paper-Scissors Images dataset, which is hosted on Kaggle at the following location: https://www.kaggle.com/drgfreeman/rockpaperscissors. To download it, you'll need a Kaggle account, so sign in or sign up accordingly. Then, unzip the dataset in a location of your preference. In this recipe, we assume the unzipped folder is inside the ~/.keras/datasets directory, under the name rockpaperscissors.

Here are some sample images:

Figure 2.2 – Example images of rock (left), paper (center), and scissors (right)

Figure 2.2 – Example images of rock (left), paper (center), and scissors (right)

Let's begin implementing.

How to do it…

The following steps explain how to train a multi-class Convolutional Neural Network (CNN) to distinguish between the three classes of the Rock Paper Scissors game:

  1. Import the required packages:
    import os
    import pathlib
    import glob
    import numpy as np
    import tensorflow as tf
    from sklearn.model_selection import train_test_split
    from tensorflow.keras import Model
    from tensorflow.keras.layers import *
    from tensorflow.keras.losses import CategoricalCrossentropy
  2. Define a list with the three classes, and also an alias to tf.data.experimental.AUTOTUNE, which we'll use later:
    CLASSES = ['rock', 'paper', 'scissors']
    AUTOTUNE = tf.data.experimental.AUTOTUNE

    The values in CLASSES match the names of the directories that contain the images for each class.

  3. Define a function to load an image and its label, given its file path:
    def load_image_and_label(image_path, target_size=(32, 32)):
        image = tf.io.read_file(image_path)
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.rgb_to_grayscale(image)
        image = tf.image.convert_image_dtype(image, 
                                             np.float32)
        image = tf.image.resize(image, target_size)
        label = tf.strings.split(image_path,os.path.sep)[-2]
        label = (label == CLASSES)  # One-hot encode.
        label = tf.dtypes.cast(label, tf.float32)
        return image, label

    Notice that we are one-hot encoding by comparing the name of the folder that contains the image (extracted from image_path) with the CLASSES list.

  4. Define a function to build the network architecture. In this case, it's a very simple and shallow one, which is enough for the problem we are solving:
    def build_network():
        input_layer = Input(shape=(32, 32, 1))
        x = Conv2D(filters=32,
                   kernel_size=(3, 3),
                   padding='same',
                   strides=(1, 1))(input_layer)
        x = ReLU()(x)
        x = Dropout(rate=0.5)(x)
        x = Flatten()(x)
        x = Dense(units=3)(x)
        output = Softmax()(x)
        return Model(inputs=input_layer, outputs=output)
  5. Define a function to, given a path to a dataset, return a tf.data.Dataset instance of images and labels, in batches and optionally shuffled:
    def prepare_dataset(dataset_path,
                        buffer_size,
                        batch_size,
                        shuffle=True):
        dataset = (tf.data.Dataset
                   .from_tensor_slices(dataset_path)
                   .map(load_image_and_label,
                        num_parallel_calls=AUTOTUNE))
        if shuffle:
            dataset.shuffle(buffer_size=buffer_size)
        dataset = (dataset
                   .batch(batch_size=batch_size)
                   .prefetch(buffer_size=buffer_size))
        return dataset
  6. Load the image paths into a list:
    file_patten = (pathlib.Path.home() / '.keras' / 
                   'datasets' /
                   'rockpaperscissors' / 'rps-cv-images' / 
                     '*' /
                   '*.png')
    file_pattern = str(file_patten)
    dataset_paths = [*glob.glob(file_pattern)]
  7. Create train, test, and validation subsets of image paths:
    train_paths, test_paths = train_test_split(dataset_paths,
                                              test_size=0.2,
                                            random_state=999)
    train_paths, val_paths = train_test_split(train_paths,
                                          test_size=0.2,
                                         random_state=999)
  8. Prepare the training, test, and validation datasets:
    BATCH_SIZE = 1024
    BUFFER_SIZE = 1024
    train_dataset = prepare_dataset(train_paths,
                                  buffer_size=BUFFER_SIZE,
                                    batch_size=BATCH_SIZE)
    validation_dataset = prepare_dataset(val_paths,
                                  buffer_size=BUFFER_SIZE,
                                   batch_size=BATCH_SIZE,
                                    shuffle=False)
    test_dataset = prepare_dataset(test_paths,
                                  buffer_size=BUFFER_SIZE,
                                   batch_size=BATCH_SIZE,
                                   shuffle=False)
  9. Instantiate and compile the model:
    model = build_network()
    model.compile(loss=CategoricalCrossentropy
                 (from_logits=True),
                  optimizer='adam',
                  metrics=['accuracy'])
  10. Fit the model for 250 epochs:
    EPOCHS = 250
    model.fit(train_dataset,
              validation_data=validation_dataset,
              epochs=EPOCHS)
  11. Evaluate the model on the test set:
    test_loss, test_accuracy = model.evaluate(test_dataset)

After 250 epochs, our network achieves around 93.5% accuracy on the test set. Let's understand what we just did.

How it works…

We started by defining the CLASSES list, which allowed us to quickly one-hot encode the labels of each image, based on the name of the directory where they were contained, as we observed in the body of the load_image_and_label() function. In this same function, we read an image from disk, decoded it from its JPEG format, converted it to grayscale (color information is not necessary in this problem), and then resized it to more manageable dimensions of 32x32x1.

build_network() creates a very simple and shallow CNN, comprising a single convolutional layer, activated with ReLU(), followed by an output, a fully connected layer of three neurons, corresponding to the number of categories in the dataset. Because this is a multi-class classification task, we use Softmax() to activate the outputs.

prepare_dataset() leverages the load_image_and_label() function defined previously to convert file paths into batches of image tensors and one-hot encoded labels.

Using the three functions explained here, we prepared three subsets of data, with the purpose of training, validating, and testing the neural network. We trained the model for 250 epochs, using the adam optimizer and CategoricalCrossentropy(from_logits=True) as our loss function (from_logits=True produces a bit more numerical stability).

Finally, we got around 93.5% accuracy on the test set. Based on these results, you could use this network as a component of a Rock Paper Scissors game to recognize the hand gestures of a player and react accordingly.

See also

For more information on the Rock-Paper-Scissors Images dataset, refer to the official Kaggle page where it's hosted: https://www.kaggle.com/drgfreeman/rockpaperscissors.

Creating a multi-label classifier to label watches

A neural network is not limited to modeling the distribution of a single variable. In fact, it can easily handle instances where each image has multiple labels associated with it. In this recipe, we'll implement a CNN to classify the gender and style/usage of watches.

Let's get started.

Getting ready

First, we must install Pillow:

$> pip install Pillow

Next, we'll use the Fashion Product Images (Small) dataset hosted in Kaggle, which, after signing in, you can download here: https://www.kaggle.com/paramaggarwal/fashion-product-images-small. In this recipe, we assume the data is inside the ~/.keras/datasets directory, under the name fashion-product-images-small. We'll only use a subset of the data, focused on watches, which we'll construct programmatically in the How to do it… section.

Here are some sample images:

Figure 2.3 – Example images

Figure 2.3 – Example images

Let's begin the recipe.

How to do it…

Let's review the steps to complete the recipe:

  1. Import the necessary packages:
    import os
    import pathlib
    from csv import DictReader
    import glob
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MultiLabelBinarizer
    from tensorflow.keras.layers import *
    from tensorflow.keras.models import Model
    from tensorflow.keras.preprocessing.image import *
  2. Define a function to build the network architecture. First, implement the convolutional blocks:
    def build_network(width, height, depth, classes):
        input_layer = Input(shape=(width, height, depth))
        x = Conv2D(filters=32,
                   kernel_size=(3, 3),
                   padding='same')(input_layer)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = Conv2D(filters=32,
                   kernel_size=(3, 3),
                   padding='same')(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(rate=0.25)(x)
        x = Conv2D(filters=64,
                   kernel_size=(3, 3),
                   padding='same')(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = Conv2D(filters=64,
                   kernel_size=(3, 3),
                   padding='same')(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(rate=0.25)(x)

    Next, add the fully convolutional layers:

        x = Flatten()(x)
        x = Dense(units=512)(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = Dropout(rate=0.5)(x)
        x = Dense(units=classes)(x)
        output = Activation('sigmoid')(x)
        return Model(input_layer, output)
  3. Define a function to load all images and labels (gender and usage), given a list of image paths and a dictionary of metadata associated with each of them:
    def load_images_and_labels(image_paths, styles, 
                               target_size):
        images = []
        labels = []
        for image_path in image_paths:
            image = load_img(image_path, 
                             target_size=target_size)
            image = img_to_array(image)
            image_id = image_path.split(os.path.sep)[-
                                             1][:-4]
            image_style = styles[image_id]
            label = (image_style['gender'], 
                     image_style['usage'])
            images.append(image)
            labels.append(label)
        return np.array(images), np.array(labels)
  4. Set the random seed to guarantee reproducibility:
    SEED = 999
    np.random.seed(SEED)
  5. Define the paths to the images and the styles.csv metadata file:
    base_path = (pathlib.Path.home() / '.keras' / 
                 'datasets' /
                 'fashion-product-images-small')
    styles_path = str(base_path / 'styles.csv')
    images_path_pattern = str(base_path / 'images/*.jpg')
    image_paths = glob.glob(images_path_pattern)
  6. Keep only the Watches images for Casual, Smart Casual, and Formal usage, suited to Men and Women:
    with open(styles_path, 'r') as f:
        dict_reader = DictReader(f)
        STYLES = [*dict_reader]
        article_type = 'Watches'
        genders = {'Men', 'Women'}
        usages = {'Casual', 'Smart Casual', 'Formal'}
        STYLES = {style['id']: style
                  for style in STYLES
                  if (style['articleType'] == article_type 
                                               and
                      style['gender'] in genders and
                      style['usage'] in usages)}
    image_paths = [*filter(lambda p: 
                   p.split(os.path.sep)[-1][:-4]
                                     in STYLES.keys(),
                           image_paths)]
  7. Load the images and labels, resizing the images into a 64x64x3 shape:
    X, y = load_images_and_labels(image_paths, STYLES, 
                                  (64, 64))
  8. Normalize the images and multi-hot encode the labels:
    X = X.astype('float') / 255.0
    mlb = MultiLabelBinarizer()
    y = mlb.fit_transform(y)
  9. Create the train, validation, and test splits:
    (X_train, X_test,
     y_train, y_test) = train_test_split(X, y,
                                         stratify=y,
                                         test_size=0.2,
                                         
                                        random_state=SEED)
    (X_train, X_valid,
     y_train, y_valid) = train_test_split(X_train, y_train,
                                        stratify=y_train,
                                          test_size=0.2,
                                       random_state=SEED)
  10. Build and compile the network:
    model = build_network(width=64,
                          height=64,
                          depth=3,
                          classes=len(mlb.classes_))
    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
  11. Train the model for 20 epochs, in batches of 64 images at a time:
    BATCH_SIZE = 64
    EPOCHS = 20
    model.fit(X_train, y_train,
              validation_data=(X_valid, y_valid),
              batch_size=BATCH_SIZE,
              epochs=EPOCHS)
  12. Evaluate the model on the test set:
    result = model.evaluate(X_test, y_test, 
                           batch_size=BATCH_SIZE)
    print(f'Test accuracy: {result[1]}')

    This block prints as follows:

    Test accuracy: 0.90233546
  13. Use the model to make predictions on a test image, displaying the probability of each label:
    test_image = np.expand_dims(X_test[0], axis=0)
    probabilities = model.predict(test_image)[0]
    for label, p in zip(mlb.classes_, probabilities):
        print(f'{label}: {p * 100:.2f}%')

    That prints this:

    Casual: 100.00%
    Formal: 0.00%
    Men: 1.08%
    Smart Casual: 0.01%
    Women: 99.16%
  14. Compare the ground truth labels with the network's prediction:
    ground_truth_labels = np.expand_dims(y_test[0], 
                                         axis=0)
    ground_truth_labels = mlb.inverse_transform(ground_truth_labels)
    print(f'Ground truth labels: {ground_truth_labels}')

    The output is as follows:

    Ground truth labels: [('Casual', 'Women')]

Let's see how it all works in the next section.

How it works…

We implemented a smaller version of a VGG network, which is capable of performing multi-label, multi-class classification, by modeling independent distributions for the gender and usage metadata associated with each watch. In other words, we modeled two binary classification problems at the same time: one for gender, and one for usage. This is the reason we activated the outputs of the network with Sigmoid, instead of Softmax, and also why the loss function used is binary_crossentropy and not categorical_crossentropy.

We trained the aforementioned network over 20 epochs, on batches of 64 images at a time, obtaining a respectable 90% accuracy on the test set. Finally, we made a prediction on an unseen image from the test set and verified that the labels produced with great certainty by the network (100% certainty for Casual, and 99.16% for Women) correspond to the ground truth categories Casual and Women.

See also

For more information on the Fashion Product Images (Small) dataset, refer to the official Kaggle page where it is hosted: https://www.kaggle.com/paramaggarwal/fashion-product-images-small. I recommend you read the paper where the seminal VGG architecture was introduced: https://arxiv.org/abs/1409.1556.

Implementing ResNet from scratch

Residual Network, or ResNet for short, constitutes one of the most groundbreaking advancements in deep learning. This architecture relies on a component called the residual module, which allows us to ensemble networks with depths that were unthinkable a couple of years ago. There are variants of ResNet that have more than 100 layers, without any loss of performance!

In this recipe, we'll implement ResNet from scratch and train it on the challenging drop-in replacement to CIFAR-10, CINIC-10.

Getting ready

We won't explain ResNet in depth, so it is a good idea to familiarize yourself with the architecture if you are interested in the details. You can read the original paper here: https://arxiv.org/abs/1512.03385.

How to do it…

Follow these steps to implement ResNet from the ground up:

  1. Import all necessary modules:
    import os
    import numpy as np
    import tarfile
    import tensorflow as tf
    from tensorflow.keras.callbacks import ModelCheckpoint
    from tensorflow.keras.layers import *
    from tensorflow.keras.models import *
    from tensorflow.keras.regularizers import l2
    from tensorflow.keras.utils import get_file
  2. Define an alias to the tf.data.expertimental.AUTOTUNE option, which we'll use later:
    AUTOTUNE = tf.data.experimental.AUTOTUNE
  3. Define a function to create a residual module in the ResNet architecture. Let's start by specifying the function signature and implementing the first block:
    def residual_module(data,
                        filters,
                        stride,
                        reduce=False,
                        reg=0.0001,
                        bn_eps=2e-5,
                        bn_momentum=0.9):
        bn_1 = BatchNormalization(axis=-1,
                                  epsilon=bn_eps,
                               momentum=bn_momentum)(data)
        act_1 = ReLU()(bn_1)
        conv_1 = Conv2D(filters=int(filters / 4.),
                        kernel_size=(1, 1),
                        use_bias=False,
                        kernel_regularizer=l2(reg))(act_1)

    Let's now implement the second and third blocks:

        bn_2 = BatchNormalization(axis=-1,
                                  epsilon=bn_eps,
                             momentum=bn_momentum)(conv_1)
        act_2 = ReLU()(bn_2)
        conv_2 = Conv2D(filters=int(filters / 4.),
                        kernel_size=(3, 3),
                        strides=stride,
                        padding='same',
                        use_bias=False,
                        kernel_regularizer=l2(reg))(act_2)
        bn_3 = BatchNormalization(axis=-1,
                                  epsilon=bn_eps,
                                  momentum=bn_momentum)(conv_2)
        act_3 = ReLU()(bn_3)
        conv_3 = Conv2D(filters=filters,
                        kernel_size=(1, 1),
                        use_bias=False,
                        kernel_regularizer=l2(reg))(act_3)

    If reduce=True, we apply a 1x1 convolution:

        if reduce:
            shortcut = Conv2D(filters=filters,
                              kernel_size=(1, 1),
                              strides=stride,
                              use_bias=False,
                        kernel_regularizer=l2(reg))(act_1)

    Finally, we combine the shortcut and the third block into a single layer and return that as our output:

        x = Add()([conv_3, shortcut])
        return x
  4. Define a function to build a custom ResNet network:
    def build_resnet(input_shape,
                     classes,
                     stages,
                     filters,
                     reg=1e-3,
                     bn_eps=2e-5,
                     bn_momentum=0.9):
        inputs = Input(shape=input_shape)
        x = BatchNormalization(axis=-1,
                               epsilon=bn_eps,
                               
                             momentum=bn_momentum)(inputs)
        x = Conv2D(filters[0], (3, 3),
                   use_bias=False,
                   padding='same',
                   kernel_regularizer=l2(reg))(x)
        for i in range(len(stages)):
            stride = (1, 1) if i == 0 else (2, 2)
            x = residual_module(data=x,
                                filters=filters[i + 1],
                                stride=stride,
                                reduce=True,
                                bn_eps=bn_eps,
                                bn_momentum=bn_momentum)
            for j in range(stages[i] - 1):
                x = residual_module(data=x,
                                    filters=filters[i + 
                                                   1],
                                    stride=(1, 1),
                                    bn_eps=bn_eps,
                                    
                                bn_momentum=bn_momentum)
        x = BatchNormalization(axis=-1,
                               epsilon=bn_eps,
                               momentum=bn_momentum)(x)
        x = ReLU()(x)
        x = AveragePooling2D((8, 8))(x)
        x = Flatten()(x)
        x = Dense(classes, kernel_regularizer=l2(reg))(x)
        x = Softmax()(x)
        return Model(inputs, x, name='resnet')
  5. Define a function to load an image and its one-hot encoded labels, based on its file path:
    def load_image_and_label(image_path, target_size=(32, 32)):
        image = tf.io.read_file(image_path)
        image = tf.image.decode_png(image, channels=3)
        image = tf.image.convert_image_dtype(image, 
                                            np.float32)
        image -= CINIC_MEAN_RGB  # Mean normalize
        image = tf.image.resize(image, target_size)
        label = tf.strings.split(image_path, os.path.sep)[-2]
        label = (label == CINIC_10_CLASSES)  # One-hot encode.
        label = tf.dtypes.cast(label, tf.float32)
        return image, label
  6. Define a function to create a tf.data.Dataset instance of images and labels from a glob-like pattern that refers to the folder where the images are:
    def prepare_dataset(data_pattern, shuffle=False):
        dataset = (tf.data.Dataset
                   .list_files(data_pattern)
                   .map(load_image_and_label,
                        num_parallel_calls=AUTOTUNE)
                   .batch(BATCH_SIZE))
        
        if shuffle:
            dataset = dataset.shuffle(BUFFER_SIZE)
            
        return dataset.prefetch(BATCH_SIZE)
  7. Define the mean RGB values of the CINIC-10 dataset, which is used in the load_image_and_label() function to mean normalize the images (this information is available on the official CINIC-10 site):
    CINIC_MEAN_RGB = np.array([0.47889522, 0.47227842, 0.43047404])
  8. Define the classes of the CINIC-10 dataset:
    CINIC_10_CLASSES = ['airplane', 'automobile', 'bird', 'cat',
                        'deer', 'dog', 'frog', 'horse',    'ship',
                        'truck']
  9. Download and extract the CINIC-10 dataset to the ~/.keras/datasets directory:
    DATASET_URL = ('https://datashare.is.ed.ac.uk/bitstream/handle/'
                   '10283/3192/CINIC-10.tar.gz?'
                   'sequence=4&isAllowed=y')
    DATA_NAME = 'cinic10'
    FILE_EXTENSION = 'tar.gz'
    FILE_NAME = '.'.join([DATA_NAME, FILE_EXTENSION])
    downloaded_file_location = get_file(origin=DATASET_URL,
                                        fname=FILE_NAME,
                                        extract=False)
    data_directory, _ = (downloaded_file_location
                         .rsplit(os.path.sep, maxsplit=1))
    data_directory = os.path.sep.join([data_directory, 
                                      DATA_NAME])
    tar = tarfile.open(downloaded_file_location)
    if not os.path.exists(data_directory):
        tar.extractall(data_directory)
  10. Define the glob-like patterns to the train, test, and validation subsets:
    train_pattern = os.path.sep.join(
        [data_directory, 'train/*/*.png'])
    test_pattern = os.path.sep.join(
        [data_directory, 'test/*/*.png'])
    valid_pattern = os.path.sep.join(
        [data_directory, 'valid/*/*.png'])
  11. Prepare the datasets:
    BATCH_SIZE = 128
    BUFFER_SIZE = 1024
    train_dataset = prepare_dataset(train_pattern, 
                                    shuffle=True)
    test_dataset = prepare_dataset(test_pattern)
    valid_dataset = prepare_dataset(valid_pattern)
  12. Build, compile, and train a ResNet model. Because this is a time-consuming process, we'll save a version of the model after each epoch, using the ModelCheckpoint() callback:
    model = build_resnet(input_shape=(32, 32, 3),
                         classes=10,
                         stages=(9, 9, 9),
                         filters=(64, 64, 128, 256),
                         reg=5e-3)
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    model_checkpoint_callback = ModelCheckpoint(
        filepath='./model.{epoch:02d}-{val_accuracy:.2f}.hdf5',
        save_weights_only=False,
        monitor='val_accuracy')
    EPOCHS = 100
    model.fit(train_dataset,
              validation_data=valid_dataset,
              epochs=EPOCHS,
              callbacks=[model_checkpoint_callback])
  13. Load the best model (in this case, model.38-0.72.hdf5) and evaluate it on the test set:
    model = load_model('model.38-0.72.hdf5')
    result = model.evaluate(test_dataset)
    print(f'Test accuracy: {result[1]}')

    This prints the following:

    Test accuracy: 0.71956664

Let's learn how it all works in the next section.

How it works…

The key to ResNet is the residual module, which we implemented in Step 3. A residual module is a micro-architecture that can be reused many times to create a macro-architecture, thus achieving great depths. The residual_module() function receives the input data (data), the number of filters (filters), the stride (stride) of the convolutional blocks, a reduce flag to indicate whether we want to reduce the spatial size of the shortcut branch by applying a 1x1 convolution (a technique used to reduce the dimensionality of the output volumes of the filters), and parameters to adjust the amount of regularization (reg) and batch normalization applied to the different layers (bn_eps and bn_momentum).

A residual module comprises two branches: the first one is the skip connection, also known as the shortcut branch, which is basically the same as the input. The second or main branch is composed of three convolution blocks: a 1x1 with a quarter of the filters, a 3x3 one, also with a quarter of the filters, and finally another 1x1, which uses all the filters. The shortcut and main branches are concatenated in the end using the Add() layer.

build_network() allows us to specify the number of stages to use, and also the number of filters per stage. We start by applying a 3x3 convolution to the input (after being batch normalized). Then we proceed to create the stages. A stage is a series of residual modules connected to each other. The length of the stages list controls the number of stages to create, and each element in this list controls the number of layers in that particular stage. The filters parameter contains the number of filters to use in each residual block within a stage. Finally, we built a fully connected network, Softmax-activated, on top of the stages with as many units as there are classes in the dataset (in this case, 10).

Because ResNet is a very deep, heavy, and slow-to-train architecture, we checkpointed the model after each epoch. In this recipe, we obtained the best model in epoch 38, which produced 72% accuracy on the test set, a respectable performance considering that CINIC-10 is not an easy dataset and that we did not apply any data augmentation or transfer learning.

See also

For more information on the CINIC-10 dataset, visit this link: https://datashare.is.ed.ac.uk/handle/10283/3192.

Classifying images with a pre-trained network using the Keras API

We do not always need to train a classifier from scratch, especially when the images we want to categorize resemble ones that another network trained on. In these instances, we can simply reuse the model, saving ourselves lots of time. In this recipe, we'll use a pre-trained network on ImageNet to classify a custom image.

Let's begin!

Getting ready

We will need Pillow. We can install it as follows:

$> pip install Pillow

You're free to use your own images in the recipe. Alternatively, you can download the one at this link: https://github.com/PacktPublishing/Tensorflow-2.0-Computer-Vision-Cookbook/tree/master/ch2/recipe5/dog.jpg.

Here's the image we'll pass to the classifier:

Figure 2.4 – Image passed to the pre-trained classifier

Figure 2.4 – Image passed to the pre-trained classifier

How to do it…

As we'll see in this section, re-using a pre-trained classifier is very easy!

  1. Import the required packages. These include the pre-trained network used for classification, as well as some helper functions to pre process the images:
    import matplotlib.pyplot as plt
    import numpy as np
    from tensorflow.keras.applications import imagenet_utils
    from tensorflow.keras.applications.inception_v3 import *
    from tensorflow.keras.preprocessing.image import *
  2. Instantiate an InceptionV3 network pre-trained on ImageNet:
    model = InceptionV3(weights='imagenet')
  3. Load the image to classify. InceptionV3 takes a 299x299x3 image, so we must resize it accordingly:
    image = load_img('dog.jpg', target_size=(299, 299))
  4. Convert the image to a numpy array, and wrap it into a singleton batch:
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)
  5. Pre process the image the same way InceptionV3 does:
    image = preprocess_input(image)
  6. Use the model to make predictions on the image, and then decode the predictions to a matrix:
    predictions = model.predict(image)
    prediction_matrix = (imagenet_utils
                         .decode_predictions(predictions))
  7. Examine the top 5 predictions along with their probability:
    for i in range(5):
        _, label, probability = prediction_matrix[0][i]
        print(f'{i + 1}. {label}: {probability * 100:.3f}%')

    This produces the following output:

    1. pug: 85.538%
    2. French_bulldog: 0.585%
    3. Brabancon_griffon: 0.543%
    4. Boston_bull: 0.218%
    5. bull_mastiff: 0.125%
  8. Plot the original image with its most probable label:
    _, label, _ = prediction_matrix[0][0]
    plt.figure()
    plt.title(f'Label: {label}.')
    original = load_img('dog.jpg')
    original = img_to_array(original)
    plt.imshow(original / 255.0)
    plt.show()

    This block generates the following image:

Figure 2.5 – Correctly classified image

Figure 2.5 – Correctly classified image

Let's see how it all works in the next section.

How it works…

As evidenced here, in order to classify images effortlessly, using a pre-trained network on ImageNet, we just need to instantiate the proper model with the right weights, like this: InceptionV3(weights='imagenet'). This will download the architecture and the weights if it is the first time we are using them; otherwise, a version of these files will be cached in our system.

Then, we loaded the image we wanted to classify, resized it to dimensions compatible with InceptionV3 (299x299x3), converted it into a singleton batch with np.expand_dims(image, axis=0), and pre processed it the same way InceptionV3 did when it was trained, with preprocess_input(image).

Next, we got the predictions from the model, which we need to transform to a prediction matrix with the help of imagenet_utils.decode_predictions(predictions). This matrix contains the label and probabilities in the 0th row, which we inspected to get the five most probable classes.

See also

You can read more about Keras pre-trained models here: https://www.tensorflow.org/api_docs/python/tf/keras/applications.

Classifying images with a pre-trained network using TensorFlow Hub

TensorFlow Hub (TFHub) is a repository of hundreds of machine learning models contributed to by the big and rich community that surrounds TensorFlow. Here we can find models for a myriad of different tasks, not only for computer vision but for applications in many different domains, such as Natural Language Processing (NLP) and reinforcement learning.

In this recipe, we'll use a model trained on ImageNet, hosted on TFHub, to make predictions on a custom image. Let's begin!

Getting ready

We'll need the tensorflow-hub and Pillow packages, which can be easily installed using pip, as follows:

$> pip install tensorflow-hub Pillow

If you want to use the same image we use in this recipe, you can download it here: https://github.com/PacktPublishing/Tensorflow-2.0-Computer-Vision-Cookbook/tree/master/ch2/recipe6/beetle.jpg.

Here's the image we'll classify:

Figure 2.6 – Image to be classified

Figure 2.6 – Image to be classified

Let's head to the next section.

How to do it…

Let's proceed with the recipe steps:

  1. Import the necessary packages:
    import matplotlib.pyplot as plt
    import numpy as np
    import tensorflow_hub as hub
    from tensorflow.keras import Sequential
    from tensorflow.keras.preprocessing.image import *
    from tensorflow.keras.utils import get_file
  2. Define the URL of the pre-trained ResNetV2152 classifier in TFHub:
    classifier_url = ('https://tfhub.dev/google/imagenet/'
                      'resnet_v2_152/classification/4')
  3. Download and instantiate the classifier hosted on TFHub:
    model = Sequential([
        hub.KerasLayer(classifier_url, input_shape=(224, 
                                                  224, 3))])
  4. Load the image we'll classify, convert it to a numpy array, normalize it, and wrap it into a singleton batch:
    image = load_img('beetle.jpg', target_size=(224, 224))
    image = img_to_array(image)
    image = image / 255.0
    image = np.expand_dims(image, axis=0)
  5. Use the pre-trained model to classify the image:
    predictions = model.predict(image)
  6. Extract the index of the most probable prediction:
    predicted_index = np.argmax(predictions[0], axis=-1)
  7. Download the ImageNet labels into a file named ImageNetLabels.txt:
    file_name = 'ImageNetLabels.txt'
    file_url = ('https://storage.googleapis.com/'
        'download.tensorflow.org/data/ImageNetLabels.txt')
             labels_path = get_file(file_name, file_url)
  8. Read the labels into a numpy array:
    with open(labels_path) as f:
        imagenet_labels = np.array(f.read().splitlines())
  9. Extract the name of the class corresponding to the index of the most probable prediction:
    predicted_class = imagenet_labels[predicted_index]
  10. Plot the original image with its most probable label:
    plt.figure()
    plt.title(f'Label: {predicted_class}.')
    original = load_img('beetle.jpg')
    original = img_to_array(original)
    plt.imshow(original / 255.0)
    plt.show()

    This produces the following:

Figure 2.7 – Correctly classified image

Figure 2.7 – Correctly classified image

Let's see how it all works.

How it works…

After importing the relevant packages, we proceeded to define the URL of the model we wanted to use to classify our input image. To download and convert such a network into a Keras model, we used the convenient hub.KerasLayer class in Step 3. Then, in Step 4, we loaded the image we wanted to classify into memory, making sure its dimensions match the ones the network expects: 224x224x3.

Steps 5 and 6 perform the classification and extract the most probable category, respectively. However, to make this prediction human-readable, we downloaded a plain text file with all ImageNet labels in Step 7, which we then parsed using numpy, allowing us to use the index of the most probable category to obtain the corresponding label, finally displayed in Step 10 along with the input image.

See also

You can learn more about the pre-trained model we used here: https://tfhub.dev/google/imagenet/resnet_v2_152/classification/4.

Using data augmentation to improve performance with the Keras API

More often than not, we can benefit from providing more data to our model. But data is expensive and scarce. Is there a way to circumvent this limitation? Yes, there is! We can synthesize new training examples by performing little modifications on the ones we already have, such as random rotations, random cropping, and horizontal flipping, among others. In this recipe, we'll learn how to use data augmentation with the Keras API to improve performance.

Let's begin.

Getting ready

We must install Pillow and tensorflow_docs:

$> pip install Pillow git+https://github.com/tensorflow/docs

In this recipe, we'll use the Caltech 101 dataset, which is available here: http://www.vision.caltech.edu/Image_Datasets/Caltech101/. Download and decompress 101_ObjectCategories.tar.gz to your preferred location. From now on, we assume the data is inside the ~/.keras/datasets directory, under the name 101_ObjectCategories.

Here are sample images from Caltech 101:

Figure 2.8 – Caltech 101 sample images

Figure 2.8 – Caltech 101 sample images

Let's implement!

How to do it…

The steps listed here are necessary to complete the recipe. Let's get started!

  1. Import the required modules:
    import os
    import pathlib
    import matplotlib.pyplot as plt
    import numpy as np
    import tensorflow_docs as tfdocs
    import tensorflow_docs.plots
    from glob import glob
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelBinarizer
    from tensorflow.keras.layers import *
    from tensorflow.keras.models import Model
    from tensorflow.keras.preprocessing.image import *
  2. Define a function to load all images in the dataset, along with their labels, based on their file paths:
    def load_images_and_labels(image_paths, target_size=(64, 64)):
        images = []
        labels = []
        for image_path in image_paths:
            image = load_img(image_path, 
                             target_size=target_size)
            image = img_to_array(image)
            label = image_path.split(os.path.sep)[-2]
            images.append(image)
            labels.append(label)
        return np.array(images), np.array(labels)
  3. Define a function to build a smaller version of VGG:
    def build_network(width, height, depth, classes):
        input_layer = Input(shape=(width, height, depth))
        x = Conv2D(filters=32,
                   kernel_size=(3, 3),
                   padding='same')(input_layer)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = Conv2D(filters=32,
                   kernel_size=(3, 3),
                   padding='same')(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(rate=0.25)(x)
        x = Conv2D(filters=64,
                   kernel_size=(3, 3),
                   padding='same')(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = Conv2D(filters=64,
                   kernel_size=(3, 3),
                   padding='same')(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(rate=0.25)(x)
        x = Flatten()(x)
        x = Dense(units=512)(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = Dropout(rate=0.25)(x)
        x = Dense(units=classes)(x)
        output = Softmax()(x)
        return Model(input_layer, output)
  4. Define a function to plot and save a model's training curve:
    def plot_model_history(model_history, metric, 
                           plot_name):
        plt.style.use('seaborn-darkgrid')
        plotter = tfdocs.plots.HistoryPlotter()
        plotter.plot({'Model': model_history}, 
                      metric=metric)
        plt.title(f'{metric.upper()}')
        plt.ylim([0, 1])
        plt.savefig(f'{plot_name}.png')
        plt.close()
  5. Set the random seed:
    SEED = 999
    np.random.seed(SEED)
  6. Load the paths to all images in the dataset, excepting the ones of the BACKGROUND_Google class:
    base_path = (pathlib.Path.home() / '.keras' / 
                 'datasets' /
                 '101_ObjectCategories')
    images_pattern = str(base_path / '*' / '*.jpg')
    image_paths = [*glob(images_pattern)]
    image_paths = [p for p in image_paths if
                   p.split(os.path.sep)[-2] !=
                  'BACKGROUND_Google']
  7. Compute the set of classes in the dataset:
    classes = {p.split(os.path.sep)[-2] for p in 
              image_paths}
  8. Load the dataset into memory, normalizing the images and one-hot encoding the labels:
    X, y = load_images_and_labels(image_paths)
    X = X.astype('float') / 255.0
    y = LabelBinarizer().fit_transform(y)
  9. Create the training and testing subsets:
    (X_train, X_test,
     y_train, y_test) = train_test_split(X, y,
                                         test_size=0.2,
                                        random_state=SEED)
  10. Build, compile, train, and evaluate a neural network without data augmentation:
    EPOCHS = 40
    BATCH_SIZE = 64
    model = build_network(64, 64, 3, len(classes))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    history = model.fit(X_train, y_train,
                        validation_data=(X_test, y_test),
                        epochs=EPOCHS,
                        batch_size=BATCH_SIZE)
    result = model.evaluate(X_test, y_test)
    print(f'Test accuracy: {result[1]}')
    plot_model_history(history, 'accuracy', 'normal')

    The accuracy on the test set is as follows:

    Test accuracy: 0.61347926

    And here's the accuracy curve:

    Figure 2.9 – Training and validation accuracy for a network without data augmentation

    Figure 2.9 – Training and validation accuracy for a network without data augmentation

  11. Build, compile, train, and evaluate the same network, this time with data augmentation:
    model = build_network(64, 64, 3, len(classes))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    augmenter = ImageDataGenerator(horizontal_flip=True,
                                   rotation_range=30,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   fill_mode='nearest')
    train_generator = augmenter.flow(X_train, y_train, 
                                      BATCH_SIZE)
    hist = model.fit(train_generator,
                     steps_per_epoch=len(X_train) // 
                     BATCH_SIZE,
                     validation_data=(X_test, y_test),
                     epochs=EPOCHS)
    result = model.evaluate(X_test, y_test)
    print(f'Test accuracy: {result[1]}')
    plot_model_history(hist, 'accuracy', 'augmented')

    The accuracy on the test set when we use data augmentation is as follows:

    Test accuracy: 0.65207374

    And the accuracy curve looks like this:

Figure 2.10 – Training and validation accuracy for a network with data augmentation

Figure 2.10 – Training and validation accuracy for a network with data augmentation

Comparing Steps 10 and 11, we observe a noticeable gain in performance by using data augmentation. Let's understand better what we did in the next section.

How it works…

In this recipe, we implemented a scaled-down version of VGG on the challenging Caltech 101 dataset. First, we trained a network only on the original data, and then using data augmentation. The first network (see Step 10) obtained an accuracy level on the test set of 61.3% and clearly shows signs of overfitting, because the gap that separates the training and validation accuracy curves is very wide. On the other hand, by applying a series of random perturbations, through ImageDataGenerator(), such as horizontal flips, rotations, width, and height shifting, among others (see Step 11), we increased the accuracy on the test set to 65.2%. Also, the gap between the training and validation accuracy curves is much smaller this time, which suggests a regularization effect resulting from the application of data augmentation.

See also

You can learn more about Caltech 101 here: http://www.vision.caltech.edu/Image_Datasets/Caltech101/.

Using data augmentation to improve performance with the tf.data and tf.image APIs

Data augmentation is a powerful technique we can apply to artificially increment the size of our dataset, by creating slightly modified copies of the images at our disposal. In this recipe, we'll leverage the tf.data and tf.image APIs to increase the performance of a CNN trained on the challenging Caltech 101 dataset.

Getting ready

We must install tensorflow_docs:

$> pip install git+https://github.com/tensorflow/docs

In this recipe, we'll use the Caltech 101 dataset, which is available here: http://www.vision.caltech.edu/Image_Datasets/Caltech101/. Download and decompress 101_ObjectCategories.tar.gz to your preferred location. From now on, we assume the data is inside the ~/.keras/datasets directory, in a folder named 101_ObjectCategories.

Here are some sample images from Caltech 101:

Figure 2.11 – Caltech 101 sample images

Figure 2.11 – Caltech 101 sample images

Let's go to the next section.

How to do it…

Let's go over the steps required to complete this recipe.

  1. Import the necessary dependencies:
    import os
    import pathlib
    import matplotlib.pyplot as plt
    import numpy as np
    import tensorflow as tf
    import tensorflow_docs as tfdocs
    import tensorflow_docs.plots
    from glob import glob
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.layers import *
    from tensorflow.keras.models import Model
  2. Create an alias for the tf.data.experimental.AUTOTUNE flag, which we'll use later on:
    AUTOTUNE = tf.data.experimental.AUTOTUNE
  3. Define a function to create a smaller version of VGG. Start by creating the input layer and the first block of two convolutions with 32 filters each:
    def build_network(width, height, depth, classes):
        input_layer = Input(shape=(width, height, depth))
        x = Conv2D(filters=32,
                   kernel_size=(3, 3),
                   padding='same')(input_layer)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = Conv2D(filters=32,
                   kernel_size=(3, 3),
                   padding='same')(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(rate=0.25)(x)
  4. Continue with the second block of two convolutions, this time each with 64 kernels:
        x = Conv2D(filters=64,
                   kernel_size=(3, 3),
                   padding='same')(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = Conv2D(filters=64,
                   kernel_size=(3, 3),
                   padding='same')(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(rate=0.25)(x)
  5. Define the last part of the architecture, which consists of a series of fully connected layers:
        x = Flatten()(x)
        x = Dense(units=512)(x)
        x = ReLU()(x)
        x = BatchNormalization(axis=-1)(x)
        x = Dropout(rate=0.5)(x)
        x = Dense(units=classes)(x)
        output = Softmax()(x)
        return Model(input_layer, output)
  6. Define a function to plot and save the training curves of a model, given its training history:
    def plot_model_history(model_history, metric, 
                           plot_name):
        plt.style.use('seaborn-darkgrid')
        plotter = tfdocs.plots.HistoryPlotter()
        plotter.plot({'Model': model_history}, 
                      metric=metric)
        plt.title(f'{metric.upper()}')
        plt.ylim([0, 1])
        plt.savefig(f'{plot_name}.png')
        plt.close()
  7. Define a function to load an image and one-hot encode its label, based on the image's file path:
    def load_image_and_label(image_path, target_size=(64, 
                                                     64)):
        image = tf.io.read_file(image_path)
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.convert_image_dtype(image, 
                                             np.float32)
        image = tf.image.resize(image, target_size)
        label = tf.strings.split(image_path, os.path.sep)[-2]
        label = (label == CLASSES)  # One-hot encode.
        label = tf.dtypes.cast(label, tf.float32)
        return image, label
  8. Define a function to augment an image by performing random transformations on it:
    def augment(image, label):
        image = tf.image.resize_with_crop_or_pad(image, 
                                                 74, 74)
        image = tf.image.random_crop(image, size=(64, 64, 3))
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_brightness(image, 0.2)
        return image, label
  9. Define a function to prepare a tf.data.Dataset of images, based on a glob-like pattern that refers to the folder where they live:
    def prepare_dataset(data_pattern):
        return (tf.data.Dataset
                .from_tensor_slices(data_pattern)
                .map(load_image_and_label,
                     num_parallel_calls=AUTOTUNE))
  10. Set the random seed:
    SEED = 999
    np.random.seed(SEED)
  11. Load the paths to all images in the dataset, excepting the ones of the BACKGROUND_Google class:
    base_path = (pathlib.Path.home() / '.keras' / 
                 'datasets' /
                 '101_ObjectCategories')
    images_pattern = str(base_path / '*' / '*.jpg')
    image_paths = [*glob(images_pattern)]
    image_paths = [p for p in image_paths if
                   p.split(os.path.sep)[-2] !=
                  'BACKGROUND_Google']
  12. Compute the unique categories in the dataset:
    CLASSES = np.unique([p.split(os.path.sep)[-2]
                         for p in image_paths])
  13. Split the image paths into training and testing subsets:
    train_paths, test_paths = train_test_split(image_paths,
                                              test_size=0.2,
                                          random_state=SEED)
  14. Prepare the training and testing datasets, without augmentation:
    BATCH_SIZE = 64
    BUFFER_SIZE = 1024
    train_dataset = (prepare_dataset(train_paths)
                     .batch(BATCH_SIZE)
                     .shuffle(buffer_size=BUFFER_SIZE)
                     .prefetch(buffer_size=BUFFER_SIZE))
    test_dataset = (prepare_dataset(test_paths)
                    .batch(BATCH_SIZE)
                    .prefetch(buffer_size=BUFFER_SIZE))
  15. Instantiate, compile, train and evaluate the network:
    EPOCHS = 40
    model = build_network(64, 64, 3, len(CLASSES))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    history = model.fit(train_dataset,
                        validation_data=test_dataset,
                        epochs=EPOCHS)
    result = model.evaluate(test_dataset)
    print(f'Test accuracy: {result[1]}')
    plot_model_history(history, 'accuracy', 'normal')

    The accuracy on the test set is:

    Test accuracy: 0.6532258

    And here's the accuracy curve:

    Figure 2.12 – Training and validation accuracy for a network without data augmentation

    Figure 2.12 – Training and validation accuracy for a network without data augmentation

  16. Prepare the training and testing sets, this time applying data augmentation to the training set:
    train_dataset = (prepare_dataset(train_paths)
                     .map(augment, 
                         num_parallel_calls=AUTOTUNE)
                     .batch(BATCH_SIZE)
                     .shuffle(buffer_size=BUFFER_SIZE)
                     .prefetch(buffer_size=BUFFER_SIZE))
    test_dataset = (prepare_dataset(test_paths)
                    .batch(BATCH_SIZE)
                    .prefetch(buffer_size=BUFFER_SIZE))
  17. Instantiate, compile, train, and evaluate the network on the augmented data:
    model = build_network(64, 64, 3, len(CLASSES))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    history = model.fit(train_dataset,
                        validation_data=test_dataset,
                        epochs=EPOCHS)
    result = model.evaluate(test_dataset)
    print(f'Test accuracy: {result[1]}')
    plot_model_history(history, 'accuracy', 'augmented')

    The accuracy on the test set when we use data augmentation is as follows:

    Test accuracy: 0.74711984

    And the accuracy curve looks like this:

Figure 2.13 – Training and validation accuracy for a network with data augmentation

Figure 2.13 – Training and validation accuracy for a network with data augmentation

Let's understand what we just did in the next section.

How it works…

We just implemented a trimmed down version of the famous VGG architecture, trained on the Caltech 101 dataset. To better understand the advantages of data augmentation, we fitted a first version on the original data, without any modification, obtaining an accuracy level of 65.32% on the test set. This first model displays signs of overfitting, because the gap that separates the training and validation accuracy curves widens early in the training process.

Next, we trained the same network on an augmented dataset (see Step 15), using the augment() function defined earlier. This greatly improved the model's performance, reaching a respectable accuracy of 74.19% on the test set. Also, the gap between the training and validation accuracy curves is noticeably smaller, which suggests a regularization effect coming out from the application of data augmentation.

See also

You can learn more about Caltech 101 here: http://www.vision.caltech.edu/Image_Datasets/Caltech101/.

Left arrow icon Right arrow icon

Key benefits

  • Develop, train, and use deep learning algorithms for computer vision tasks using TensorFlow 2.x
  • Discover practical recipes to overcome various challenges faced while building computer vision models
  • Enable machines to gain a human level understanding to recognize and analyze digital images and videos

Description

Computer vision is a scientific field that enables machines to identify and process digital images and videos. This book focuses on independent recipes to help you perform various computer vision tasks using TensorFlow. The book begins by taking you through the basics of deep learning for computer vision, along with covering TensorFlow 2.x’s key features, such as the Keras and tf.data.Dataset APIs. You’ll then learn about the ins and outs of common computer vision tasks, such as image classification, transfer learning, image enhancing and styling, and object detection. The book also covers autoencoders in domains such as inverse image search indexes and image denoising, while offering insights into various architectures used in the recipes, such as convolutional neural networks (CNNs), region-based CNNs (R-CNNs), VGGNet, and You Only Look Once (YOLO). Moving on, you’ll discover tips and tricks to solve any problems faced while building various computer vision applications. Finally, you’ll delve into more advanced topics such as Generative Adversarial Networks (GANs), video processing, and AutoML, concluding with a section focused on techniques to help you boost the performance of your networks. By the end of this TensorFlow book, you’ll be able to confidently tackle a wide range of computer vision problems using TensorFlow 2.x.

Who is this book for?

This book is for computer vision developers and engineers, as well as deep learning practitioners looking for go-to solutions to various problems that commonly arise in computer vision. You will discover how to employ modern machine learning (ML) techniques and deep learning architectures to perform a plethora of computer vision tasks. Basic knowledge of Python programming and computer vision is required.

What you will learn

  • Understand how to detect objects using state-of-the-art models such as YOLOv3
  • Use AutoML to predict gender and age from images
  • Segment images using different approaches such as FCNs and generative models
  • Learn how to improve your network's performance using rank-N accuracy, label smoothing, and test time augmentation
  • Enable machines to recognize people's emotions in videos and real-time streams
  • Access and reuse advanced TensorFlow Hub models to perform image classification and object detection
  • Generate captions for images using CNNs and RNNs

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Feb 26, 2021
Length: 542 pages
Edition : 1st
Language : English
ISBN-13 : 9781838820688
Vendor :
Google
Category :
Languages :
Tools :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning

Product Details

Publication date : Feb 26, 2021
Length: 542 pages
Edition : 1st
Language : English
ISBN-13 : 9781838820688
Vendor :
Google
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 142.97
The TensorFlow Workshop
$43.99
Hands-On Image Generation with TensorFlow
$54.99
TensorFlow 2.0 Computer Vision Cookbook
$43.99
Total $ 142.97 Stars icon

Table of Contents

13 Chapters
Chapter 1: Getting Started with TensorFlow 2.x for Computer Vision Chevron down icon Chevron up icon
Chapter 2: Performing Image Classification Chevron down icon Chevron up icon
Chapter 3: Harnessing the Power of Pre-Trained Networks with Transfer Learning Chevron down icon Chevron up icon
Chapter 4: Enhancing and Styling Images with DeepDream, Neural Style Transfer, and Image Super-Resolution Chevron down icon Chevron up icon
Chapter 5: Reducing Noise with Autoencoders Chevron down icon Chevron up icon
Chapter 6: Generative Models and Adversarial Attacks Chevron down icon Chevron up icon
Chapter 7: Captioning Images with CNNs and RNNs Chevron down icon Chevron up icon
Chapter 8: Fine-Grained Understanding of Images through Segmentation Chevron down icon Chevron up icon
Chapter 9: Localizing Elements in Images with Object Detection Chevron down icon Chevron up icon
Chapter 10: Applying the Power of Deep Learning to Videos Chevron down icon Chevron up icon
Chapter 11: Streamlining Network Implementation with AutoML Chevron down icon Chevron up icon
Chapter 12: Boosting Performance Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3
(7 Ratings)
5 star 71.4%
4 star 14.3%
3 star 0%
2 star 0%
1 star 14.3%
Filter icon Filter
Top Reviews

Filter reviews by




Vipin Singh Jun 01, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Excellent book for data scientist enthusiam to build carrier in Computer Vision using tensorflow 2. This book provides a comprehensive coverage of everything associated with computer vision, right from the very basics to the latest and greatest models and techniques. The book builds onto concepts in a very gradual manner for the beginners while the hands-on examples keep things interesting for all types of readers. The book provides code examples on a variety of topics starting from image classification, CNNs, RNNs, transfer learning, GAN's, Image segmentation and object detection. The examples are well choosen and embibes the core concepts. The best part with the book is it covers topic like AutoML and method to boost performance.
Amazon Verified review Amazon
Aishwary Mar 09, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
By far, this is one of the best books to understand how to apply deep learning in the field of computer vision. The concepts have been clearly explained. It covers almost everything from image classification, image segmentation, object detection, etc. It introduces both the methods of loading images for model training. One via the Keras API and the other by tf.data (which is the most important module to develop flexible and efficient pipelines). It teaches you how to augment data to improvise a model performance. Then, it goes into the concepts of Auto-encoder, AutoML. This book even has a dedicated chapter about using TensorFlow for video processing. The best part about this book that I liked the most was the references to the relevant materials that the author has provided after each concept explained.
Amazon Verified review Amazon
Ashwini Jul 15, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This is an awesome book for those working in Computer Vision. The books starts with the Tensorflow 2.0 introduction and how Keras can be used in Computer Vision. The following chapters dive deeper into Machine Learning models like ResNet and image classification. A great book for those who are starting their career in Computer Vision as well as for those who are already working in it.
Amazon Verified review Amazon
LeoQuiroa May 06, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
The book covers a variety of topics from basics to experts which makes it attractive for many people interested to work with images and videos. For example, it covers a broad range of algorithms like the classic ones such as Classification but also a lot of new stuff like Transfer Learning, Autoencoders, GAN, and so on. Lastly, the book has a good practice to include a hands-on experience coding for every concept, so you will learn by doing.
Amazon Verified review Amazon
Saquib Mohiuddin Apr 18, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
one of the best books to understand and applying DL CV concepts in the field of computer vision. The content is simple to understand and concise. covers all the relevant concepts from segmentation to classification.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.