Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Mastering PyTorch

You're reading from   Mastering PyTorch Create and deploy deep learning models from CNNs to multimodal models, LLMs, and beyond

Arrow left icon
Product type Paperback
Published in May 2024
Publisher Packt
ISBN-13 9781801074308
Length 558 pages
Edition 2nd Edition
Tools
Arrow right icon
Author (1):
Arrow left icon
Ashish Ranjan Jha Ashish Ranjan Jha
Author Profile Icon Ashish Ranjan Jha
Ashish Ranjan Jha
Arrow right icon
View More author details
Toc

Table of Contents (21) Chapters Close

Preface 1. Overview of Deep Learning Using PyTorch 2. Deep CNN Architectures FREE CHAPTER 3. Combining CNNs and LSTMs 4. Deep Recurrent Model Architectures 5. Advanced Hybrid Models 6. Graph Neural Networks 7. Music and Text Generation with PyTorch 8. Neural Style Transfer 9. Deep Convolutional GANs 10. Image Generation Using Diffusion 11. Deep Reinforcement Learning 12. Model Training Optimizations 13. Operationalizing PyTorch Models into Production 14. PyTorch on Mobile Devices 15. Rapid Prototyping with PyTorch 16. PyTorch and AutoML 17. PyTorch and Explainable AI 18. Recommendation Systems with PyTorch 19. PyTorch and Hugging Face 20. Index

Evolution of CNN architectures

CNNs have been in existence since 1989, when the first multi-layered CNN was developed by Yann LeCun. This model could perform the visual cognition task of identifying handwritten digits. In 1998, LeCun developed an improved ConvNet model called LeNet. Due to its high accuracy in optical recognition tasks, LeNet was adopted for industrial use soon after its invention. Ever since, CNNs have been successful not only in academic research but also in practical industry use cases. The following diagram shows a brief timeline of architectural developments in the lifetime of CNNs, starting from 1989 all the way to 2020:

Figure 3.5 – CNN architecture evolution – a broad picture

Figure 2.5: CNN architecture evolution – a broad picture

As we can see, there is a significant gap between the years 1998 and 2012. This was for two reasons:

  1. There wasn’t a dataset big and suitable enough to demonstrate the capabilities of CNNs, especially deep CNNs.
  2. The available computing power was limited.

And to add to the first reason, on the existing small datasets of the time such as MNIST, classical machine learning models such as SVMs were starting to beat CNN’s performance.

The above two limitations were alleviated as we transitioned from 1998 to 2012 and beyond. Firstly, we had an exponential growth in digital data thanks to the advent of the internet and access to affordable devices such as digital cameras and smartphones. Secondly, we saw an enormous increase in our computational capabilities including the arrival of GPUs.

These changes led to a few CNN developments. The ReLU activation function was developed in order to deal with the gradient explosion and decay problem during backpropagation. Non-random initialization of network parameter values proved to be crucial. Max pooling was invented as an effective method for subsampling. GPUs were getting popular for training neural networks, especially CNNs, at scale.

Finally, and most importantly, a large-scale dedicated dataset of annotated images called ImageNet [1] was created by a research group at Stanford. This dataset is still one of the primary benchmarking datasets for CNN models to date.

With all of these developments compounding over the years, in 2012, a different architectural design brought about a massive improvement in CNN performance on the ImageNet dataset. This network was called AlexNet (named after the creator, Alex Krizhevsky). AlexNet, along with having various novel aspects such as random cropping and pretraining, established the trend of uniform and modular convolutional layer design. The uniform and modular layer structure was taken forward by repeatedly stacking such modules (of convolutional layers), resulting in very deep CNNs also known as VGGs.

Another approach of branching the blocks/modules of convolutional layers and stacking these branched blocks on top of each other proved extremely effective for tailored visual tasks. This network was called GoogLeNet (as it was developed at Google) or Inception v1 (inception being the term for those branched blocks). Several variants of the VGG and Inception networks followed, such as VGG16, VGG19, Inception v2, Inception v3, and so on.

The next phase of development began with skip connections. To tackle the problem of gradient decay while training CNNs, non-consecutive layers were connected via skip connections lest information dissipate between them due to small gradients. It should be noted that skip connections are essentially a special case of multi-path-based CNNs discussed earlier. A popular type of network that emerged with this trick, among other novel characteristics such as batch normalization, was ResNet.

A logical extension of ResNet was DenseNet, where layers were densely connected to each other; that is, each layer gets the input from all the previous layers’ output feature maps. Furthermore, hybrid architectures were then developed by mixing successful architectures from the past such as Inception-ResNet and ResNeXt, where the parallel branches within a block were increased in number.

Lately, the channel boosting technique has proven useful in improving CNN performance. The idea here is to learn novel features and exploit pre-learned features through transfer learning. Most recently, automatically designing new blocks and finding optimal CNN architectures has been a growing trend in CNN research. Examples of such CNNs are MnasNets and EfficientNets. The approach behind these models is to perform a neural architecture search to deduce an optimal CNN architecture with a uniform model scaling approach.

In the next section, we will go back to one of the earliest CNN models and take a closer look at the various CNN architectures developed since. We will build these architectures using PyTorch, training some of the models on real-world datasets. We will also explore PyTorch’s pretrained CNN models repository, popularly known as model-zoo. We will learn how to fine-tune these pretrained models as well as running predictions on them.

You have been reading a chapter from
Mastering PyTorch - Second Edition
Published in: May 2024
Publisher: Packt
ISBN-13: 9781801074308
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime