Classifying images with Vision Transformer
Vision Transformer (ViT, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, https://arxiv.org/abs/2010.11929) demonstrates the adaptability of the attention mechanism by introducing a clever technique for processing images. One way to use transformers for image inputs is to encode each pixel with four variables – pixel intensity, row, column, and channel location. Each pixel encoding is an input to a simple neural network (NN), which outputs a d-dimensional embedding vector. We can represent the three-dimensional image tensor as a one-dimensional sequence of these embedding vectors. This sequence acts as input to the model in the same way a token embedding sequence does. Each pixel will attend to every other pixel in the attention blocks.
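The following is a minimal sketch of this per-pixel encoding, assuming PyTorch; the class name PixelEmbedding and the two-layer NN are illustrative choices, not part of the ViT paper:

```python
import torch
import torch.nn as nn

class PixelEmbedding(nn.Module):
    """Embed each pixel of a C x H x W image as one sequence token."""

    def __init__(self, d_model: int):
        super().__init__()
        # A simple NN that maps the 4 per-pixel variables
        # (intensity, row, column, channel) to a d-dimensional vector
        self.proj = nn.Sequential(
            nn.Linear(4, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image shape: (C, H, W)
        c, h, w = image.shape
        chan, row, col = torch.meshgrid(
            torch.arange(c), torch.arange(h), torch.arange(w), indexing="ij"
        )
        # Stack (intensity, row, column, channel) per pixel -> (C*H*W, 4)
        feats = torch.stack(
            [
                image.flatten(),
                row.flatten().float(),
                col.flatten().float(),
                chan.flatten().float(),
            ],
            dim=-1,
        )
        # One-dimensional sequence of C*H*W embedding vectors: (C*H*W, d_model)
        return self.proj(feats)

# Even a small 3x32x32 image becomes a sequence of 3*32*32 = 3,072 tokens
tokens = PixelEmbedding(d_model=64)(torch.rand(3, 32, 32))
print(tokens.shape)  # torch.Size([3072, 64])
```

As the usage example shows, the resulting sequence length grows with the full pixel count, which leads directly to the drawback discussed next.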
This approach has some disadvantages related to the length of the input sequence (context window). Unlike a one-dimensional text sequence, an image has a two-dimensional structure (the color...