You're reading from Mastering PyTorch Create and deploy deep learning models from CNNs to multimodal models, LLMs, and beyond

Product type Paperback

Published in May 2024

Publisher Packt

ISBN-13 9781801074308

Length 558 pages

Edition 2nd Edition

Tools

PyTorch

Concepts

Deep Learning

Author (1):

Ashish Ranjan Jha

Table of Contents (21) Chapters

Preface

1. Overview of Deep Learning Using PyTorch

2. Deep CNN Architectures

3. Combining CNNs and LSTMs

4. Deep Recurrent Model Architectures

5. Advanced Hybrid Models

6. Graph Neural Networks

7. Music and Text Generation with PyTorch

8. Neural Style Transfer

9. Deep Convolutional GANs

10. Image Generation Using Diffusion

11. Deep Reinforcement Learning

12. Model Training Optimizations

13. Operationalizing PyTorch Models into Production

14. PyTorch on Mobile Devices

15. Rapid Prototyping with PyTorch

16. PyTorch and AutoML

17. PyTorch and Explainable AI

18. Recommendation Systems with PyTorch

19. PyTorch and Hugging Face

20. Index

Why are CNNs so powerful?

CNNs are among the most powerful machine learning models at solving challenging problems such as image classification, object detection, object segmentation, video processing, natural language processing, and speech recognition. Their success is attributed to various factors, such as the following:

Weight sharing: This makes CNNs parameter-efficient; that is, the same set of weights (a convolutional kernel) is reused at every spatial location of the input to extract features. Features are the high-level representations of input data that the model generates with its parameters.
Automatic feature extraction: Multiple feature extraction stages help a CNN to automatically learn feature representations in a dataset.
Hierarchical learning: The multi-layered CNN structure helps CNNs to learn low-, mid-, and high-level features.
The ability to explore both spatial and temporal correlations in the data, such as in video-processing tasks.
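To make the parameter efficiency of weight sharing concrete, the following minimal sketch compares the parameter count of a convolutional layer with that of a fully connected layer producing an output of the same size. The layer sizes and the 16x16 input resolution are illustrative assumptions, not values from the text.

```python
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Return the total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


# Assumed setup: a 3-channel 16x16 image mapped to 16 feature maps of 16x16.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

# A fully connected layer producing the same number of output values
# needs one weight per (input value, output value) pair.
dense = nn.Linear(3 * 16 * 16, 16 * 16 * 16)

conv_params = count_params(conv)    # 16 kernels * (3 * 3 * 3) weights + 16 biases
dense_params = count_params(dense)  # 768 * 4096 weights + 4096 biases

print(f"conv:  {conv_params:,} parameters")
print(f"dense: {dense_params:,} parameters")
```

The convolutional layer's parameter count does not depend on the image resolution, because the same kernels slide over every location, whereas the dense layer's count grows with both the input and output sizes.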

Besides these pre-existing fundamental characteristics, CNNs have advanced over the years with the help of improvements in the following areas:

The use of better activation and loss functions, such as using ReLU to mitigate the vanishing gradient problem.
Parameter optimization, such as using an optimizer based on Adaptive Moment Estimation (Adam) instead of simple stochastic gradient descent.
Regularization: Applying dropout and batch normalization in addition to L2 regularization.
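The improvements listed above can be combined in a few lines of PyTorch. The following is a minimal sketch rather than a recommended recipe: the layer sizes, dropout probability, learning rate, weight decay value, and the placeholder loss are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

# One convolutional stage using batch normalization, ReLU, and dropout.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),   # batch normalization
    nn.ReLU(),            # ReLU activation
    nn.Dropout(p=0.25),   # dropout regularization
)

# Adam optimizer; weight_decay adds an L2 penalty on the parameters.
optimizer = torch.optim.Adam(block.parameters(), lr=1e-3, weight_decay=1e-4)

# A single optimization step on random data with a placeholder loss.
x = torch.randn(8, 3, 32, 32)
out = block(x)
loss = out.pow(2).mean()  # placeholder loss, for illustration only
optimizer.zero_grad()
loss.backward()
optimizer.step()

out_shape = tuple(out.shape)
print(out_shape)
```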

FAQ – What is the vanishing gradient problem?

Backpropagation in neural networks is based on the chain rule of differentiation. According to the chain rule, the gradient of the loss function with respect to the parameters of the earliest layers (those closest to the input) can be written as a product of gradients at each subsequent layer. If the magnitudes of these gradients are all less than 1, and worse still, tending toward 0, then their product will be a vanishingly small value. The vanishing gradient problem can seriously hamper optimization by preventing the network parameters from changing their values, which amounts to stunted learning.
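The effect of multiplying many small gradients can be illustrated with plain arithmetic. The sketch below is a deliberate simplification: it multiplies only the activation-function derivatives across 20 layers, ignores the weight terms that also appear in the chain rule, and assumes every layer sees the same pre-activation value of 0.5.

```python
import math


def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid function; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z: float) -> float:
    """Derivative of ReLU; exactly 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def chained_gradient(grad_fn, pre_activations) -> float:
    """Multiply the per-layer derivatives, as the chain rule does."""
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return product


# Assumed: 20 layers, each with a pre-activation of 0.5.
pre_activations = [0.5] * 20

sigmoid_product = chained_gradient(sigmoid_grad, pre_activations)
relu_product = chained_gradient(relu_grad, pre_activations)

print(f"sigmoid: {sigmoid_product:.3e}")
print(f"relu:    {relu_product:.3e}")
```

With sigmoid, each factor is below 0.25, so the product shrinks geometrically with depth. With ReLU, the factor is 1 for every positive pre-activation, so the product does not shrink along active paths.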

But some of the most significant drivers of development in CNNs over the years have been the various architectural innovations:

Spatial exploration-based CNNs: The idea behind spatial exploration is using different kernel sizes in order to explore different levels of visual features in input data. The following diagram shows a sample architecture for a spatial exploration-based CNN model:

Figure 2.1: Spatial exploration-based CNN
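One way to sketch the spatial exploration idea in code is to apply several kernel sizes to the same input in parallel and concatenate the results. The `MultiKernelBlock` class below is a hypothetical illustration; the branch layout, kernel sizes (1, 3, and 5), and channel counts are assumptions and may differ from the architecture in the figure.

```python
import torch
import torch.nn as nn


class MultiKernelBlock(nn.Module):
    """Hypothetical block that applies 1x1, 3x3, and 5x5 kernels in parallel."""

    def __init__(self, in_channels: int, branch_channels: int):
        super().__init__()
        # padding = k // 2 keeps the spatial size unchanged for odd k.
        self.branches = nn.ModuleList(
            [
                nn.Conv2d(in_channels, branch_channels, kernel_size=k, padding=k // 2)
                for k in (1, 3, 5)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees the same input at a different receptive field size;
        # the outputs are stacked along the channel dimension.
        return torch.cat([branch(x) for branch in self.branches], dim=1)


block = MultiKernelBlock(in_channels=3, branch_channels=8)
out = block(torch.randn(2, 3, 32, 32))
out_shape = tuple(out.shape)
print(out_shape)
```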

Depth-based CNNs: The depth here refers to the depth of the neural network, that is, the number of layers. So, the idea here is to create a CNN model with multiple convolutional layers in order to extract highly complex visual features. The following diagram shows an example of such a model architecture:

Figure 2.2: Depth-based CNN
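A depth-based design can be sketched as a stack of convolutional layers whose length is a parameter. The helper `make_deep_cnn` below is a hypothetical name, and the constant channel count of 16 and the choice of 8 layers are assumptions made for illustration.

```python
import torch
import torch.nn as nn


def make_deep_cnn(num_layers: int, channels: int = 16) -> nn.Sequential:
    """Stack num_layers 3x3 convolutions, each followed by ReLU."""
    layers = [nn.Conv2d(3, channels, kernel_size=3, padding=1), nn.ReLU()]
    for _ in range(num_layers - 1):
        layers += [nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU()]
    return nn.Sequential(*layers)


deep = make_deep_cnn(num_layers=8)

# Count the convolutional layers to confirm the depth of the stack.
num_convs = sum(isinstance(m, nn.Conv2d) for m in deep)

out = deep(torch.randn(1, 3, 32, 32))
out_shape = tuple(out.shape)
print(num_convs, out_shape)
```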

Width-based CNNs: Width refers to the number of channels, or feature maps, in the data or in the features extracted from it. So, width-based CNNs are all about increasing the number of feature maps as we go from the input to the output layers, as demonstrated in the following diagram:

Figure 2.3: Width-based CNN
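The widening pattern can be sketched by growing the channel count from one stage to the next. The channel progression (3, 16, 32, 64, 128) and the max-pooling after each stage are assumptions for illustration and are not taken from the figure.

```python
import torch
import torch.nn as nn

# Assumed channel progression from the input (3) to the last stage (128).
widths = [3, 16, 32, 64, 128]

layers = []
for c_in, c_out in zip(widths, widths[1:]):
    layers += [
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # more feature maps
        nn.ReLU(),
        nn.MaxPool2d(2),  # halve the spatial size as the width grows
    ]
wide = nn.Sequential(*layers)

out = wide(torch.randn(1, 3, 32, 32))
out_shape = tuple(out.shape)
print(out_shape)
```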

Multi-path-based CNNs: The preceding three types of architectures connect layers strictly in sequence; that is, direct connections exist only between consecutive layers. Multi-path CNNs introduced the idea of shortcut connections, or skip connections, between non-consecutive layers. The following diagram shows an example of a multi-path CNN model architecture:

Figure 2.4: Multi-path CNN

A key advantage of multi-path architectures is a better flow of information across several layers, thanks to the skip connections. This, in turn, also lets the gradient flow back to the input layers without too much dissipation.
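A skip connection can be sketched as adding a block's input to its output, so that information and gradients have a direct path around the intermediate layers. The `SkipBlock` class below is a hypothetical, residual-style illustration; the two-convolution layout and channel count are assumptions, not the exact architecture in the figure.

```python
import torch
import torch.nn as nn


class SkipBlock(nn.Module):
    """Hypothetical block with a skip connection around two convolutions."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        # Skip connection: the input bypasses both convolutions and is
        # added back, giving the gradient a direct route to earlier layers.
        return self.relu(out + x)


block = SkipBlock(channels=16)
x = torch.randn(2, 16, 8, 8, requires_grad=True)
out = block(x)
out.sum().backward()  # gradients reach x through both paths

out_shape = tuple(out.shape)
print(out_shape, x.grad is not None)
```

Because the addition requires matching shapes, this sketch keeps the channel count and spatial size unchanged inside the block.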

Having looked at the different architectural setups found in CNN models, we will now look at how CNNs have evolved over the years ever since they were first used.
