You're reading from Mastering PyTorch Create and deploy deep learning models from CNNs to multimodal models, LLMs, and beyond

Product type Paperback

Published in May 2024

Publisher Packt

ISBN-13 9781801074308

Length 558 pages

Edition 2nd Edition

Tools

PyTorch

Concepts

Deep Learning

Author (1):

Ashish Ranjan Jha

View More author details

Table of Contents (21) Chapters

Preface

1. Overview of Deep Learning Using PyTorch

2. Deep CNN Architectures FREE CHAPTER

3. Combining CNNs and LSTMs

4. Deep Recurrent Model Architectures

5. Advanced Hybrid Models

6. Graph Neural Networks

7. Music and Text Generation with PyTorch

8. Neural Style Transfer

9. Deep Convolutional GANs

10. Image Generation Using Diffusion

11. Deep Reinforcement Learning

12. Model Training Optimizations

13. Operationalizing PyTorch Models into Production

14. PyTorch on Mobile Devices

15. Rapid Prototyping with PyTorch

16. PyTorch and AutoML

17. PyTorch and Explainable AI

18. Recommendation Systems with PyTorch

19. PyTorch and Hugging Face

20. Index

Running a pretrained VGG model

We have already discussed LeNet and AlexNet, two of the foundational CNN architectures. As we progress in the chapter, we will explore increasingly complex CNN models. That being said, the key principles in building these model architectures will be the same. We will see a modular model-building approach in putting together convolutional layers, pooling layers, and fully connected layers into blocks/modules and then stacking these blocks sequentially or in a branched manner. In this section, we look at the successor to AlexNet – VGGNet.

The name VGG is derived from the Visual Geometry Group of Oxford University, where this model was invented. Compared to the 8 layers and 60 million parameters of AlexNet, VGG consists of 13 layers (10 convolutional layers and 3 fully connected layers) and 138 million parameters. VGG basically stacks more layers onto the AlexNet architecture with smaller convolution kernels (2x2 or 3x3).

Hence, VGG’s novelty lies in the unprecedented level of depth that it brings with its architecture. Figure 2.12 shows the VGG architecture:

Figure 2.12: VGG16 architecture

The preceding VGG architecture is called VGG13, because of the 13 layers. Other variants are VGG16 and VGG19, consisting of 16 and 19 layers, respectively. There is another set of variants – VGG13_bn, VGG16_bn, and VGG19_bn, where bn suggests that these models also consist of batch-normalization layers.

PyTorch’s torchvision.model sub-package provides the pretrained VGG model (with all of the six variants discussed earlier) trained on the ImageNet dataset. In the following exercise, we will use the pretrained VGG13 model to make predictions on a small dataset of bees and ants (used in the previous exercise). We will focus on the key pieces of code here, as most other parts of our code will overlap with that of the previous exercises. We can always refer to our notebooks to explore the full code [7]:

First, we need to import dependencies, including torchvision.models.
Download the data and set up the ants and bees dataset and dataloader, along with the transformations.
In order to make predictions on these images, we will need to download the 1,000 labels of the ImageNet dataset [8].

Once downloaded, we need to create a mapping between the class indices 0 to 999 and the corresponding class labels, as shown here:

import ast
with open('./imagenet1000_clsidx_to_labels.txt') as f:
    classes_data = f.read()
classes_dict = ast.literal_eval(classes_data)
print({k: classes_dict[k] for k in list(classes_dict)[:5]})

This should output the first five class mappings, as shown below:

{0: 'tench, Tinca tinca', 1: 'goldfish, Carassius auratus', 2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias', 3: 'tiger shark, Galeocerdo cuvieri', 4: 'hammerhead, hammerhead shark'}

Define the model prediction visualization function that takes in the pretrained model object and the number of images to run predictions on. This function should output the images with predictions.

Load the pretrained VGG13 model:

model_finetune = models.vgg13(pretrained=True)

The VGG13 model is downloaded in this step.

FAQ – What is the disk size of a VGG13 model?

A VGG13 model will consume roughly 508 MB on your hard disk.

Finally, we run predictions on our ants and bees dataset using this pretrained model:
```
visualize_predictions(model_finetune)
```

This should output the following:

Figure 2.13: VGG13 predictions

The VGG13 model trained on an entirely different dataset seems to predict all the test samples correctly in the ants and bees dataset. Basically, the model grabs the two most similar animals from the dataset out of the 1,000 classes and finds them in the images. By doing this exercise, we see that the model is still able to extract relevant visual features out of the images and the exercise demonstrates the utility of PyTorch’s out-of-the-box inference feature.

In the next section, we are going to study a different type of CNN architecture – one that involves modules that have multiple parallel convolutional layers. The modules are called Inception modules and the resulting network is called the Inception Network – named after the movie Inception – because this model contains several branching modules much like the branching dreams of the movie. We will explore the various parts of this network and the reasoning behind its success. We will also build the Inception modules and the Inception network architecture using PyTorch.

You're reading from Mastering PyTorch Create and deploy deep learning models from CNNs to multimodal models, LLMs, and beyond

Table of Contents (21) Chapters

Running a pretrained VGG model

Authors (1)

Personalised recommendations for you