Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Mastering PyTorch

You're reading from   Mastering PyTorch Create and deploy deep learning models from CNNs to multimodal models, LLMs, and beyond

Arrow left icon
Product type Paperback
Published in May 2024
Publisher Packt
ISBN-13 9781801074308
Length 558 pages
Edition 2nd Edition
Tools
Arrow right icon
Author (1):
Arrow left icon
Ashish Ranjan Jha Ashish Ranjan Jha
Author Profile Icon Ashish Ranjan Jha
Ashish Ranjan Jha
Arrow right icon
View More author details
Toc

Table of Contents (21) Chapters Close

Preface 1. Overview of Deep Learning Using PyTorch 2. Deep CNN Architectures FREE CHAPTER 3. Combining CNNs and LSTMs 4. Deep Recurrent Model Architectures 5. Advanced Hybrid Models 6. Graph Neural Networks 7. Music and Text Generation with PyTorch 8. Neural Style Transfer 9. Deep Convolutional GANs 10. Image Generation Using Diffusion 11. Deep Reinforcement Learning 12. Model Training Optimizations 13. Operationalizing PyTorch Models into Production 14. PyTorch on Mobile Devices 15. Rapid Prototyping with PyTorch 16. PyTorch and AutoML 17. PyTorch and Explainable AI 18. Recommendation Systems with PyTorch 19. PyTorch and Hugging Face 20. Index

Fine-tuning the AlexNet model

In this section, we will first take a quick look at the AlexNet architecture and how to build one using PyTorch. Then we will explore PyTorch’s pretrained CNN models repository, and finally, use a pretrained AlexNet model for fine-tuning on an image classification task, as well as making predictions.

AlexNet is a successor of LeNet with incremental changes in the architecture, such as 8 layers (5 convolutional and 3 fully connected) instead of 5, and 60 million model parameters instead of 60,000, as well as using MaxPool instead of AvgPool. Moreover, AlexNet was trained and tested on a much bigger dataset – ImageNet, which is over 100 GB in size – as opposed to the MNIST dataset (on which LeNet was trained), which amounts to a few MB. AlexNet truly revolutionized CNNs as it emerged as a significantly more powerful class of models on image-related tasks than the other classical machine learning models, such as SVMs. Figure 2.9 shows the AlexNet architecture:

Figure 3.14 – AlexNet architecture

Figure 2.9: AlexNet architecture

As we can see, the architecture follows the common theme from LeNet of having convolutional layers stacked sequentially, followed by a series of fully connected layers toward the output end. PyTorch makes it easy to translate such a model architecture into actual code. This can be seen in the following PyTorch-code-equivalent of the architecture:

class AlexNet(nn.Module):
    def __init__(self, number_of_classes):
        super(AlexNet, self).__init__()
        self.feats = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=64, 
                      kernel_size=11, stride=4, padding=5),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=64, out_channels=192, 
                      kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=192, out_channels=384, 
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=384, out_channels=256, 
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=256, out_channels=256, 
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.clf = nn.Linear(in_features=256, out_features=number_of_classes)
    def forward(self, inp):
        op = self.feats(inp)
        op = op.view(op.size(0), -1)
        op = self.clf(op)
        return op

The code is quite self-explanatory, wherein the __init__ function contains the initialization of the whole layered structure, consisting of convolutional, pooling, and fully connected layers, along with ReLU activations. The forward function simply runs a data point x through this initialized network. Please note that the second line of the forward method already performs the flattening operation so that we need not define that function separately as we did for LeNet.

But besides the option of initializing the model architecture and training it ourselves, PyTorch, with its torchvision package, provides a models sub-package, which contains definitions of CNN models meant for solving different tasks, such as image classification, semantic segmentation, object detection, and so on. The following is a non-exhaustive list of available models for the task of image classification [3]:

  • AlexNet
  • VGG
  • ResNet
  • SqueezeNet
  • DenseNet
  • Inception v3
  • GoogLeNet
  • ShuffleNet v2
  • MobileNet v2
  • ResNeXt
  • Wide ResNet
  • MnasNet
  • EfficientNet

In the next section, we will use a pretrained AlexNet model as an example and demonstrate how to fine-tune it using PyTorch in the form of an exercise.

Using PyTorch to fine-tune AlexNet

In the following exercise, we will load a pretrained AlexNet model and fine-tune it on an image classification dataset different from ImageNet (on which it was originally trained). Finally, we will test the fine-tuned model’s performance to see if it could transfer-learn from the new dataset. Some parts of the code in the exercise are trimmed for readability but you can find the full code in our GitHub repository [4].

For this exercise, we will need to import a few dependencies. Execute the following import statements:

import os
import time
import copy
import numpy as np
import matplotlib.pyplot as plt
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import datasets, models, transforms
torch.use_deterministic_algorithms(True)

Next, we will download and transform the dataset. For this fine-tuning exercise, we will use a small image dataset of bees and ants. There are 240 training images and 150 validation images divided equally between the two classes (bees and ants).

We download the dataset from Kaggle [5] and store it in the current working directory. More information about the dataset can be found at the dataset’s website [6].

Dataset citation

Elsik, C. G., Tayal, A., Diesh, C. M., Unni, D. R., Emery, M. L., Nguyen, H. N., Hagen, D. E. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine. Nucleic Acids Research, 2016, Jan. 4; 44(D1):D793-800. DOI: 10.1093/nar/gkv1208. Epub 2015, Nov. 17. PubMed PMID: 26578564.

To download the dataset, you will need to log in to Kaggle. If you do not already have a Kaggle account, you will need to register. Let’s download and transform the dataset:

# Creating a local data directory 
ddir = 'hymenoptera_data'
# Data normalization and augmentation transformations 
# for train dataset
# Only normalization transformation for validation dataset
# The mean and std for normalization are calculated as the 
# mean of all pixel values for all images in the training 
# set per each image channel - R, G and B
data_transformers = {
    'train': transforms.Compose([transforms.RandomResizedCrop(224), 
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize(
                                     [0.490, 0.449, 0.411], 
                                     [0.231, 0.221, 0.230])]),
    'val': transforms.Compose([transforms.Resize(256), 
                               transforms.CenterCrop(224), 
                               transforms.ToTensor(), 
                               transforms.Normalize(
                                   [0.490, 0.449, 0.411], 
                                   [0.231, 0.221, 0.230])])}
img_data = {k: datasets.ImageFolder(os.path.join(ddir, k), data_transformers[k])
            for k in ['train', 'val']}
dloaders = {k: torch.utils.data.DataLoader(img_data[k], batch_size=8, 
                                           shuffle=True)
            for k in ['train', 'val']}
dset_sizes = {x: len(img_data[x]) for x in ['train', 'val']}
classes = img_data['train'].classes
dvc = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Now that we have completed the pre-requisites, let’s begin:

  1. Let’s visualize some sample training dataset images:
    def imageshow(img, text=None):
        img = img.numpy().transpose((1, 2, 0))
        avg = np.array([0.490, 0.449, 0.411])
        stddev = np.array([0.231, 0.221, 0.230])
        img = stddev * img + avg
        img = np.clip(img, 0, 1)
        plt.imshow(img)
        if text is not None:
            plt.title(text)
    # Generate one train dataset batch
    imgs, cls = next(iter(dloaders['train']))
    # Generate a grid from batch
    grid = torchvision.utils.make_grid(imgs)
    imageshow(grid, text=[classes[c] for c in cls])
    

We have used the np.clip() method from numpy to ensure that the image pixel values are restricted between 0 and 1 to make the visualization clear. The output will be as follows:

Figure 2.10: Bees versus ants dataset

  1. We now define the fine-tuning routine, which is essentially a training routine performed on a pretrained model:
    def finetune_model(pretrained_model, loss_func, optim, epochs=10):
        ...
        for e in range(epochs):
            for dset in ['train', 'val']:
                if dset == 'train':
                    # set model to train mode 
                    # (i.e. trainbale weights)
                    pretrained_model.train()
                else:
                    # set model to validation mode
                    pretrained_model.eval()
                # iterate over the (training/validation) data.
                for imgs, tgts in dloaders[dset]:
                    ...
                    optim.zero_grad()
                    with torch.set_grad_enabled(dset == 'train'):
                        ops = pretrained_model(imgs)
                        _, preds = torch.max(ops, 1)
                        loss_curr = loss_func(ops, tgts)
                        # backward pass only if in training mode
                        if dset == 'train':
                            loss_curr.backward()
                            optim.step()
                    loss += loss_curr.item() * imgs.size(0)
                    successes += torch.sum(preds == tgts.data)
                loss_epoch = loss / dset_sizes[dset]
                accuracy_epoch = successes.double() / dset_sizes[dset]
                if dset == 'val' and accuracy_epoch > accuracy:
                    accuracy = accuracy_epoch
                    model_weights = copy.deepcopy(
                        pretrained_model.state_dict())
        # load the best model version (weights)
        pretrained_model.load_state_dict(model_weights)
        return pretrained_model
    

In this function, we require the pretrained model (that is, the architecture as well as the weights) as input along with the loss function, optimizer, and number of epochs. Basically, instead of starting from a random initialization of weights, we start with the pretrained weights of AlexNet. The other parts of this function are pretty similar to our previous exercises.

  1. Before starting to fine-tune (train) the model, we will define a function to visualize the model predictions:
    def visualize_predictions(pretrained_model, max_num_imgs=4):
       was_model_training = pretrained_model.training
        pretrained_model.eval()
        imgs_counter = 0
        fig = plt.figure()
        with torch.no_grad():
            for i, (imgs, tgts) in enumerate(dloaders['val']):
                imgs = imgs.to(dvc)
                tgts = tgts.to(dvc)
                ops = pretrained_model(imgs)
                _, preds = torch.max(ops, 1)
                for j in range(imgs.size()[0]):
                    imgs_counter += 1
                    ax = plt.subplot(max_num_imgs//2, 2, imgs_counter)
                    ax.axis('off')
                    ax.set_title(f'Prediction: 
                                 {classes[preds[j]]}, 
                                 Ground Truth: 
                                 {classes[tgts[j]]}')
                    imageshow(imgs.cpu().data[j])
                    if imgs_counter == max_num_imgs:
                       pretrained_model.train(mode=was_model_training)
                        return
            pretrained_model.train(mode=was_model_training)
    
  2. Finally, we get to the interesting part. Let’s use PyTorch’s torchvision.models sub-package to load the pretrained AlexNet model:
    model_finetune = models.alexnet(pretrained=True)
    

This model object has the following two main components:

  1. features: The feature extraction component, which contains all the convolutional and pooling layers
  2. classifier: The classifier block, which contains all the fully connected layers leading to the output layer
  1. We can visualize these components as shown here:
    print(model_finetune.features)
    

This should output the following:

Sequential(
  (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
  (1): ReLU(inplace=True)
  (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (4): ReLU(inplace=True)
  (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): ReLU(inplace=True)
  (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): ReLU(inplace=True)
  (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  1. Next, we inspect the classifier block as follows:
    print(model_finetune.classifier)
    

This should output the following:

Sequential( 
  (0): Dropout(p=0.5, inplace=False)  
  (1): Linear(in_features=9216, out_features=4096, bias=True)  
  (2): ReLU(inplace=True)  
  (3): Dropout(p=0.5, inplace=False)  
  (4): Linear(in_features=4096, out_features=4096, bias=True)  
  (5): ReLU(inplace=True)  
  (6): Linear(in_features=4096, out_features=1000, bias=True)  
  1. As you may have noticed, the pretrained model has an output layer of size 1000, but we only have 2 classes in our fine-tuning dataset. So, we shall alter that, as shown here:
    # change the last layer from 1000 classes to 2 classes
    model_finetune.classifier[6] = nn.Linear(4096, len(classes))
    
  2. And now, we are all set to define the optimizer and loss function, and thereafter run the training routine as follows:
    loss_func = nn.CrossEntropyLoss()
    optim_finetune = optim.SGD(model_finetune.parameters(), lr=0.0001)
    # train (fine-tune) and validate the model
    model_finetune = finetune_model(model_finetune, loss_func, 
                                    optim_finetune, epochs=10)
    

The output will be as follows:

Epoch number 0/9
====================
train loss in this epoch: 0.6528244360548551, accuracy in this epoch: 0.610655737704918
val loss in this epoch: 0.5563900120118085, accuracy in this epoch: 0.7320261437908496
Epoch number 1/9
====================
train loss in this epoch: 0.5144887796190919, accuracy in this epoch: 0.75
val loss in this epoch: 0.4758027388769038, accuracy in this epoch: 0.803921568627451
Epoch number 2/9
====================
train loss in this epoch: 0.4620713156754853, accuracy in this epoch: 0.7950819672131147
val loss in this epoch: 0.4326762077855129, accuracy in this epoch: 0.803921568627451
...
Epoch number 7/9
====================
train loss in this epoch: 0.3297723409582357, accuracy in this epoch: 0.860655737704918
val loss in this epoch: 0.3347476099441254, accuracy in this epoch: 0.869281045751634
Epoch number 8/9
====================
train loss in this epoch: 0.32671376110100353, accuracy in this epoch: 0.8524590163934426
val loss in this epoch: 0.32516936344258923, accuracy in this epoch: 0.8823529411764706
Epoch number 9/9
====================
train loss in this epoch: 0.3130935803055763, accuracy in this epoch: 0.8770491803278688
val loss in this epoch: 0.3200583465251268, accuracy in this epoch: 0.8888888888888888
Training finished in 5.0mins 50.6720712184906secs 
Best validation set accuracy: 0.8888888888888888
  1. Let’s visualize some of the model predictions to see whether the model has indeed learned the relevant features from this small dataset:
    visualize_predictions(model_finetune)
    

This should output the following:

Figure 2.11: AlexNet predictions

Clearly, the pretrained AlexNet model has been able to transfer-learn on this rather tiny image classification dataset. This demonstrates both the power of transfer learning as well as the speed and ease with which we can fine-tune well-known models using PyTorch.

In the next section, we will discuss an even deeper and more complex successor of AlexNet – the VGG network. We have demonstrated the model definition, dataset loading, model training (or fine-tuning), and evaluation steps in detail for LeNet and AlexNet. In subsequent sections, we will focus mostly on model architecture definition, as the PyTorch code for other aspects (such as data loading and evaluation) will be similar.

You have been reading a chapter from
Mastering PyTorch - Second Edition
Published in: May 2024
Publisher: Packt
ISBN-13: 9781801074308
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime