Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Mastering PyTorch

You're reading from   Mastering PyTorch Create and deploy deep learning models from CNNs to multimodal models, LLMs, and beyond

Arrow left icon
Product type Paperback
Published in May 2024
Publisher Packt
ISBN-13 9781801074308
Length 558 pages
Edition 2nd Edition
Tools
Arrow right icon
Author (1):
Arrow left icon
Ashish Ranjan Jha Ashish Ranjan Jha
Author Profile Icon Ashish Ranjan Jha
Ashish Ranjan Jha
Arrow right icon
View More author details
Toc

Table of Contents (21) Chapters Close

Preface 1. Overview of Deep Learning Using PyTorch 2. Deep CNN Architectures FREE CHAPTER 3. Combining CNNs and LSTMs 4. Deep Recurrent Model Architectures 5. Advanced Hybrid Models 6. Graph Neural Networks 7. Music and Text Generation with PyTorch 8. Neural Style Transfer 9. Deep Convolutional GANs 10. Image Generation Using Diffusion 11. Deep Reinforcement Learning 12. Model Training Optimizations 13. Operationalizing PyTorch Models into Production 14. PyTorch on Mobile Devices 15. Rapid Prototyping with PyTorch 16. PyTorch and AutoML 17. PyTorch and Explainable AI 18. Recommendation Systems with PyTorch 19. PyTorch and Hugging Face 20. Index

Running a pretrained VGG model

We have already discussed LeNet and AlexNet, two of the foundational CNN architectures. As we progress in the chapter, we will explore increasingly complex CNN models. That being said, the key principles in building these model architectures will be the same. We will see a modular model-building approach in putting together convolutional layers, pooling layers, and fully connected layers into blocks/modules and then stacking these blocks sequentially or in a branched manner. In this section, we look at the successor to AlexNet – VGGNet.

The name VGG is derived from the Visual Geometry Group of Oxford University, where this model was invented. Compared to the 8 layers and 60 million parameters of AlexNet, VGG consists of 13 layers (10 convolutional layers and 3 fully connected layers) and 138 million parameters. VGG basically stacks more layers onto the AlexNet architecture with smaller convolution kernels (2x2 or 3x3).

Hence, VGG’s novelty lies in the unprecedented level of depth that it brings with its architecture. Figure 2.12 shows the VGG architecture:

Figure 3.20 – VGG16 architecture

Figure 2.12: VGG16 architecture

The preceding VGG architecture is called VGG13, because of the 13 layers. Other variants are VGG16 and VGG19, consisting of 16 and 19 layers, respectively. There is another set of variants – VGG13_bn, VGG16_bn, and VGG19_bn, where bn suggests that these models also consist of batch-normalization layers.

PyTorch’s torchvision.model sub-package provides the pretrained VGG model (with all of the six variants discussed earlier) trained on the ImageNet dataset. In the following exercise, we will use the pretrained VGG13 model to make predictions on a small dataset of bees and ants (used in the previous exercise). We will focus on the key pieces of code here, as most other parts of our code will overlap with that of the previous exercises. We can always refer to our notebooks to explore the full code [7]:

  1. First, we need to import dependencies, including torchvision.models.
  2. Download the data and set up the ants and bees dataset and dataloader, along with the transformations.
  3. In order to make predictions on these images, we will need to download the 1,000 labels of the ImageNet dataset [8].
  4. Once downloaded, we need to create a mapping between the class indices 0 to 999 and the corresponding class labels, as shown here:
    import ast
    with open('./imagenet1000_clsidx_to_labels.txt') as f:
        classes_data = f.read()
    classes_dict = ast.literal_eval(classes_data)
    print({k: classes_dict[k] for k in list(classes_dict)[:5]})
    

This should output the first five class mappings, as shown below:

{0: 'tench, Tinca tinca', 1: 'goldfish, Carassius auratus', 2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias', 3: 'tiger shark, Galeocerdo cuvieri', 4: 'hammerhead, hammerhead shark'}
  1. Define the model prediction visualization function that takes in the pretrained model object and the number of images to run predictions on. This function should output the images with predictions.
  2. Load the pretrained VGG13 model:
    model_finetune = models.vgg13(pretrained=True)
    

The VGG13 model is downloaded in this step.

FAQ – What is the disk size of a VGG13 model?

A VGG13 model will consume roughly 508 MB on your hard disk.

  1. Finally, we run predictions on our ants and bees dataset using this pretrained model:
    visualize_predictions(model_finetune)
    

This should output the following:

Figure 2.13: VGG13 predictions

The VGG13 model trained on an entirely different dataset seems to predict all the test samples correctly in the ants and bees dataset. Basically, the model grabs the two most similar animals from the dataset out of the 1,000 classes and finds them in the images. By doing this exercise, we see that the model is still able to extract relevant visual features out of the images and the exercise demonstrates the utility of PyTorch’s out-of-the-box inference feature.

In the next section, we are going to study a different type of CNN architecture – one that involves modules that have multiple parallel convolutional layers. The modules are called Inception modules and the resulting network is called the Inception Network – named after the movie Inception – because this model contains several branching modules much like the branching dreams of the movie. We will explore the various parts of this network and the reasoning behind its success. We will also build the Inception modules and the Inception network architecture using PyTorch.

You have been reading a chapter from
Mastering PyTorch - Second Edition
Published in: May 2024
Publisher: Packt
ISBN-13: 9781801074308
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image