Why are CNNs so powerful?

CNNs are among the most powerful machine learning models for solving challenging problems such as image classification, object detection, object segmentation, video processing, natural language processing, and speech recognition. Their success is attributed to several factors, including the following:

  • Weight sharing: This makes CNNs parameter-efficient; that is, different features are extracted using the same set of weights or parameters (see the parameter-count sketch after this list). Features are the high-level representations of the input data that the model generates with its parameters.
  • Automatic feature extraction: Multiple feature extraction stages help a CNN to automatically learn feature representations in a dataset.
  • Hierarchical learning: The multi-layered CNN structure helps CNNs to learn low-, mid-, and high-level features.
  • The ability to explore both spatial and temporal correlations in the data, such as in video-processing tasks.
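
To make the weight-sharing point concrete, consider the following minimal PyTorch sketch (the layer sizes here are illustrative assumptions, not taken from the book), which compares the parameter count of a convolutional layer with that of a fully connected layer producing an output of the same shape:

```python
from torch import nn

# Hypothetical layer sizes for illustration: map a 3 x 32 x 32 input
# to 16 feature maps of the same spatial size.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
fc = nn.Linear(in_features=3 * 32 * 32, out_features=16 * 32 * 32)

def count_params(m):
    return sum(p.numel() for p in m.parameters())

print(f"Conv2d parameters: {count_params(conv):,}")  # 448
print(f"Linear parameters: {count_params(fc):,}")    # 50,348,032
```

The convolution reuses the same small set of kernels at every spatial position, which is why it gets by with a few hundred parameters where the fully connected layer needs tens of millions.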

In addition to these fundamental characteristics, CNNs have advanced over the years thanks to improvements in the following areas:

  • The use of better activation and loss functions, such as ReLU, to overcome the vanishing gradient problem.
  • Parameter optimization, such as using an optimizer based on Adaptive Momentum (Adam) instead of plain stochastic gradient descent.
  • Regularization: applying dropout and batch normalization in addition to L2 regularization. A sketch combining these improvements follows this list.
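
Putting these improvements together, here is a minimal sketch of a toy model (a hypothetical example, not taken from the book) that uses ReLU activations, batch normalization, dropout, and an Adam optimizer with L2 regularization applied via weight decay:

```python
import torch
from torch import nn

# Hypothetical toy model; assumes a 3 x 32 x 32 input.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),   # batch normalization
    nn.ReLU(),            # ReLU helps avoid vanishing gradients
    nn.MaxPool2d(2),      # 32 x 32 -> 16 x 16
    nn.Flatten(),
    nn.Dropout(p=0.5),    # dropout regularization
    nn.Linear(16 * 16 * 16, 10),
)

# Adam instead of plain SGD; weight_decay adds the L2 penalty
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```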

FAQ – What is the vanishing gradient problem?

Backpropagation in neural networks works on the basis of the chain rule of differentiation. According to the chain rule, the gradient of the loss function with respect to the input layer parameters can be written as a product of gradients at each layer. If these gradients are all less than 1 – and worse still, tending toward 0 – then the product of these gradients will be a vanishingly small value. The vanishing gradient problem can cause serious trouble in the optimization process by preventing the network parameters from changing their values, which is equivalent to stunted learning.
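
The effect is easy to reproduce. The following sketch (a hypothetical experiment, not taken from the book) stacks 20 sigmoid layers, whose derivative never exceeds 0.25, and prints the mean gradient magnitude at a few depths; the earliest layers typically receive gradients that are orders of magnitude smaller:

```python
import torch
from torch import nn

# 20 stacked Linear + Sigmoid layers: the chain-rule product of many
# small local derivatives shrinks as it propagates back to the input.
layers = [nn.Sequential(nn.Linear(64, 64), nn.Sigmoid()) for _ in range(20)]
net = nn.Sequential(*layers)

net(torch.randn(8, 64)).sum().backward()

for i in (0, 9, 19):
    grad = layers[i][0].weight.grad
    print(f"layer {i:2d} mean |grad| = {grad.abs().mean().item():.2e}")
```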

But some of the most significant drivers of development in CNNs over the years have been the various architectural innovations:

  • Spatial exploration-based CNNs: The idea behind spatial exploration is using different kernel sizes in order to explore different levels of visual features in the input data. The following diagram shows a sample architecture for a spatial exploration-based CNN model, followed by a code sketch:
Figure 2.1: Spatial exploration-based CNN
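
A minimal sketch of this idea (a hypothetical module in the spirit of Inception-style blocks, not code from the book) runs parallel convolution branches with different kernel sizes over the same input and concatenates their feature maps:

```python
import torch
from torch import nn

class SpatialExplorationBlock(nn.Module):
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        # Each branch sees the input at a different spatial scale
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)

    def forward(self, x):
        # Concatenate branch outputs along the channel dimension
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

block = SpatialExplorationBlock(in_ch=3, branch_ch=8)
print(block(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 24, 32, 32])
```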

  • Depth-based CNNs: The depth here refers to the depth of the neural network, that is, the number of layers. The idea is to create a CNN model with many convolutional layers in order to extract highly complex visual features. The following diagram shows an example of such a model architecture, followed by a code sketch:
Figure 2.2: Depth-based CNN
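
A minimal sketch (hypothetical, not code from the book) of a depth-based design simply stacks many convolutional layers, each refining the features extracted by the one before it:

```python
from torch import nn

def make_deep_cnn(num_layers=8, ch=16):
    # First layer maps the 3 input channels to ch feature maps
    layers = [nn.Conv2d(3, ch, kernel_size=3, padding=1), nn.ReLU()]
    # Remaining layers keep stacking convolutions at the same width
    for _ in range(num_layers - 1):
        layers += [nn.Conv2d(ch, ch, kernel_size=3, padding=1), nn.ReLU()]
    return nn.Sequential(*layers)

model = make_deep_cnn(num_layers=8)
```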

  • Width-based CNNs: Width refers to the number of channels or feature maps in the data or in the features extracted from the data. Width-based CNNs are all about increasing the number of feature maps as we go from the input layer to the output layer, as demonstrated in the following diagram and the code sketch after it:
Figure 2.3: Width-based CNN
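
A minimal sketch (hypothetical, not code from the book) of a width-based design doubles the number of feature maps at each stage while pooling halves the spatial resolution:

```python
from torch import nn

# Channel widths grow from input (3) toward the output (128)
widths = [3, 16, 32, 64, 128]
stages = []
for in_ch, out_ch in zip(widths[:-1], widths[1:]):
    stages += [
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),  # halve the spatial resolution at each stage
    ]
model = nn.Sequential(*stages)
```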

  • Multi-path-based CNNs: The preceding three types of architectures exhibit monotonic connectivity between layers; that is, direct connections exist only between consecutive layers. Multi-path CNNs introduced the idea of making shortcut connections, or skip connections, between non-consecutive layers. The following diagram shows an example of a multi-path CNN model architecture, followed by a code sketch:
Figure 2.4: Multi-path CNN
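
A minimal sketch of a skip connection (a hypothetical, ResNet-flavored block, not code from the book) adds the block's input to its output so that information, and gradients, can bypass the convolutional path:

```python
import torch
from torch import nn

class SkipBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # skip (shortcut) connection

block = SkipBlock(ch=16)
print(block(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 16, 8, 8])
```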

A key advantage of multi-path architectures is a better flow of information across several layers, thanks to the skip connections. This, in turn, also lets the gradient flow back to the input layers without too much dissipation.

Having looked at the different architectural setups found in CNN models, we will now look at how CNNs have evolved over the years ever since they were first used.
