The TensorFlow Workshop

Processing Image Data

Vast numbers of images are generated every day by organizations of all kinds, and they can be used to create predictive models for tasks such as object detection, image classification, and object segmentation. When working with image data and other raw data types, you often need to preprocess the data. One of the biggest benefits of using ANNs for modeling is that they can learn from raw data with minimal preprocessing, so the feature engineering step is kept small. Feature engineering usually involves using domain knowledge to create features from the raw data, which is time-consuming and offers no guarantee of improved model performance. Using ANNs with little or no feature engineering streamlines the training process and removes the need for domain knowledge.

For example, locating tumors in medical images requires expert knowledge acquired over many years of training, but an ANN only needs sufficient labeled data. That said, a small amount of preprocessing is generally applied to these images. The steps involved are optional but helpful for standardizing the training process and creating performant models.

One preprocessing step is rescaling. Since images have color values that are integers ranging between 0 and 255, they are scaled to have values between 0 and 1, similar to Activity 2.01, Loading Tabular Data and Rescaling Numerical Fields with a MinMax Scaler. Another common preprocessing step that you will explore later in this section is image augmentation, which is essentially the act of modifying copies of the training images to add a greater number of training examples and build a more robust model.
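As a minimal illustration of what rescaling does (this snippet uses NumPy on its own and is not part of the exercise code), 8-bit pixel values are simply divided by 255:

import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)  # raw 8-bit color values
scaled = pixels / 255.0                               # now in the range 0 to 1
print(scaled)  # approximately [0. 0.251 0.502 1.]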

This section also covers batch processing. Batch processing loads the training data one batch at a time. This can result in slower training times than if all the data were loaded at once; however, it allows you to train your models on very large datasets. Training on images or audio is a common example that requires large volumes of data to achieve performant results.

For example, a typical image may be 100 KB in size. For a training dataset of 1 million images, you would need 100 GB of memory, which is more than most machines have available. If the model is trained in batches of 32 images, the memory requirement is orders of magnitude lower. Batch training also allows you to augment the training data, as you will explore in a later section.
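A quick back-of-the-envelope calculation makes the difference concrete. The figures below are the illustrative assumptions from the preceding paragraph rather than measured values:

image_size_kb = 100       # assumed size of a single image
num_images = 1_000_000    # assumed number of training images
batch_size = 32           # images held in memory at any one time

full_dataset_gb = image_size_kb * num_images / 1e6  # KB to GB
one_batch_mb = image_size_kb * batch_size / 1e3     # KB to MB

print(full_dataset_gb)  # 100.0 GB to load the entire dataset at once
print(one_batch_mb)     # 3.2 MB per batch of 32 images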

Images can be loaded into memory using a class named ImageDataGenerator, which can be imported from Keras' preprocessing package. This is a class originally from Keras that can now be used from TensorFlow. When loading images, you can rescale them; it is common practice to rescale images by a factor of 1/255 so that images with pixel values from 0 to 255 end up with values from 0 to 1.

ImageDataGenerator can be initialized with rescaling, as follows:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagenerator = ImageDataGenerator(rescale = 1./255)

Once the ImageDataGenerator class has been initialized, you can use the flow_from_directory method and pass in the directory where the images are located. The directory should include sub-directories named after the class labels, each containing the images of the corresponding class. Other arguments to pass in are the desired size for the images, the batch size, and the class mode. The class mode determines the type of label arrays that are produced. Using the flow_from_directory method for binary classification with a batch size of 25 and an image size of 64x64 can be done as follows:

dataset = datagenerator.flow_from_directory\
          ('path/to/data',\
           target_size = (64, 64),\
           batch_size = 25,\
           class_mode = 'binary')
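For the preceding call to work, the directory passed to flow_from_directory must contain one sub-directory per class, with each sub-directory holding the images of that class; class indices are assigned to the sub-directories in alphanumeric order. A hypothetical layout for a binary problem might look as follows (the folder and file names are placeholders only):

path/to/data/
    airplane/
        image_001.jpg
        image_002.jpg
        ...
    boat/
        image_001.jpg
        ...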

In the following exercise, you will load images into memory by utilizing the ImageDataGenerator class.

Note

The image data provided comes from the Open Images Dataset, a full description of which can be found here: https://storage.googleapis.com/openimages/web/index.html.

Images can be viewed by plotting them using Matplotlib. This is a useful exercise for verifying that the images match their respective labels.

Exercise 2.03: Loading Image Data for Batch Processing

In this exercise, you'll learn how to load in image data for batch processing. The image_data folder contains a set of images of boats and airplanes. You will load the images of boats and airplanes for batch processing and rescale them so that the image values range between 0 and 1. You are then tasked with printing the labeled images of a batch from the data generator.

Note

You can find image_data here: https://packt.link/jZ2oc.

Perform the following steps to complete this exercise:

  1. Open a new Jupyter notebook to implement this exercise. Save the file as Exercise2-03.ipynb.
  2. In a new Jupyter Notebook cell, import the ImageDataGenerator class from tensorflow.keras.preprocessing.image:
    from tensorflow.keras.preprocessing.image \
         import ImageDataGenerator
  3. Instantiate the ImageDataGenerator class and pass the rescale argument with the value 1./255 to convert image values so that they're between 0 and 1:
    train_datagen = ImageDataGenerator(rescale = 1./255)
  4. Use the data generator's flow_from_directory method to direct the data generator to the image data. Pass in the arguments for the target size, the batch size, and the class mode:
    training_set = train_datagen.flow_from_directory\
                   ('image_data',\
                    target_size = (64, 64),\
                    batch_size = 25,\
                    class_mode = 'binary')
  5. Create a function to display the images in the batch. The function will plot the first 25 images in a 5x5 array with their associated labels:
    import matplotlib.pyplot as plt
    def show_batch(image_batch, label_batch):
        # Map the numeric labels back to their class names
        lookup = {v: k for k, v in \
                  training_set.class_indices.items()}
        label_batch = [lookup[label] for label in \
                       label_batch]
        # Plot the first 25 images in a 5x5 grid, titled with their labels
        plt.figure(figsize=(10,10))
        for n in range(25):
            ax = plt.subplot(5,5,n+1)
            plt.imshow(image_batch[n])
            plt.title(label_batch[n].title())
            plt.axis('off')
  6. Take a batch from the data generator and pass it to the function to display the images and their labels:
    image_batch, label_batch = next(training_set)
    show_batch(image_batch, label_batch)

    The output will be as follows:

    Figure 2.12: The images from a batch

Here, you can see a batch of images of boats and airplanes that could be input into a model. Note that all the images are the same size, which was achieved by modifying their aspect ratio. This ensures consistency as the images are passed into an ANN.
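If you want to verify this consistency yourself, an optional check is to print the shapes of the arrays returned by the generator; with the settings used in this exercise, the expected shapes are noted in the comments:

print(image_batch.shape)  # (25, 64, 64, 3): batch size, height, width, channels
print(label_batch.shape)  # (25,): one binary label per image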

In this exercise, you learned how to import images in batches so that they can be used for training ANNs. Images are loaded one batch at a time, and by limiting the number of training images per batch, you can ensure that the machine's RAM is not exceeded.

In the following section, you will see how to augment images as they are loaded in.
