What this book covers
Chapter 1, Artificial Neural Network Fundamentals, gives you the complete details of how a neural network (NN) works. You will start by learning the key terminology associated with NNs. Next, you will understand the working details of the building blocks and build an NN from scratch on a toy dataset. By the end of this chapter, you will have a solid understanding of how NNs work.
Chapter 2, PyTorch Fundamentals, introduces you to working with PyTorch. You will learn how to create and manipulate tensor objects before exploring the different ways of building a neural network model using PyTorch. You will continue working with a toy dataset so that you can focus on the specifics of working with PyTorch.
Chapter 3, Building a Deep Neural Network with PyTorch, combines all that has been covered in the previous chapters to help you understand the impact of various NN hyperparameters on model accuracy. By the end of this chapter, you will be confident about working with NNs on a realistic dataset.
Chapter 4, Introducing Convolutional Neural Networks, details the challenges of using a vanilla neural network and shows how convolutional neural networks (CNNs) overcome its various limitations. You will dive deep into the working details of CNNs and understand their various components. Next, you will learn the best practices for working with images. In this chapter, you will start working with real-world images and learn the intricacies of how CNNs help with image classification.
Chapter 5, Transfer Learning for Image Classification, exposes you to solving image classification problems in the real world. You will learn about multiple transfer learning architectures and understand how they significantly improve image classification accuracy. Next, you will leverage transfer learning to implement the use cases of facial keypoint detection and age and gender estimation.
Chapter 6, Practical Aspects of Image Classification, provides insight into the practical aspects to take care of while building and deploying image classification models. You will see first-hand the advantages of leveraging data augmentation and batch normalization on real-world data. Further, you will learn how class activation maps help explain why a CNN model predicted a certain outcome. By the end of this chapter, you will be able to confidently tackle the majority of image classification problems and leverage the models discussed in the previous three chapters on your custom dataset.
Chapter 7, Basics of Object Detection, lays the foundation for object detection, where you will learn about the various techniques that are used to build an object detection model. Next, you will learn about region proposal-based object detection techniques through a use case where you will implement a model to locate trucks and buses in an image.
Chapter 8, Advanced Object Detection, exposes you to the limitations of region proposal-based architectures. You will then learn about the working details of more advanced architectures, such as YOLO and SSD, that address these issues. You will implement all the architectures on the same dataset (trucks versus buses detection) so that you can contrast how each architecture works.
Chapter 9, Image Segmentation, builds on the previous chapters and helps you build models that pinpoint the location of objects of various classes, as well as instances of objects, in an image. You will implement the use cases on images of a road and also on images of a common household. By the end of this chapter, you will be able to confidently tackle any image classification and object detection/segmentation problem, and solve it by building a model using PyTorch.
Chapter 10, Applications of Object Detection and Segmentation, sums up what you have learned in all the previous chapters and moves on to implementing object detection and segmentation in a few lines of code, as well as implementing models to perform human crowd counting and image colorization. Next, you will learn about 3D object detection on a real-world dataset. Finally, you will learn about performing action recognition on a video.
Chapter 11, Autoencoders and Image Manipulation, lays the foundation for modifying an image. You will start by learning about various autoencoders that help in compressing images and generating novel ones. Next, you will learn about adversarial attacks that fool a model, before implementing neural style transfer. Finally, you will implement an autoencoder to generate deepfake images.
Chapter 12, Image Generation Using GANs, starts by giving you a deep dive into how GANs work. Next, you will implement fake facial image generation and generate images of interest using GANs.
Chapter 13, Advanced GANs to Manipulate Images, takes image manipulation to the next level. You will implement GANs to convert objects from one class to another, generate images from sketches, and manipulate custom images so that you can generate an image in a specific style. By the end of this chapter, you will be able to confidently perform image manipulation using a combination of autoencoders and GANs.
Chapter 14, Combining Computer Vision and Reinforcement Learning, starts by exposing you to the terminology of reinforcement learning (RL) and how to assign a value to a state. You will appreciate how RL and NNs can be combined as you learn about Deep Q-Learning. Using this knowledge, you will implement an agent to play the game of Pong, as well as an agent for a self-driving car.
Chapter 15, Combining Computer Vision and NLP Techniques, gives you the working details of transformers, which you will use to implement applications such as image classification, handwriting recognition, key-value extraction from passport images, and, finally, visual question answering on images. In the process, you will learn a variety of ways to customize and leverage the transformer architecture.
Chapter 16, Foundation Models in Computer Vision, starts by strengthening your understanding of combining images and text using the CLIP model. Next, you will explore the Segment Anything Model (SAM), which helps with a variety of tasks, such as segmentation, recognition, and tracking, without any training. Finally, you will understand the working details of diffusion models before learning about the importance of prompt engineering and the impact of bigger pre-trained models like SDXL.
Chapter 17, Applications of Stable Diffusion, extends what you learned in the previous chapters by walking you through how a variety of Stable Diffusion applications (image in-painting, ControlNet, DepthNet, SDXL Turbo, and text-to-video) are trained, and then showing you how to leverage different models to perform different tasks.
Chapter 18, Moving a Model to Production, describes the best practices for moving a model to production. You will first learn about deploying a model on a local server before moving it to the AWS public cloud. Next, you will learn about the impact of half-precision on latency, and finally, you will learn about leveraging vector stores (for instance, FAISS) and identifying data drift once a model is moved to production.
As the field evolves, we will periodically add valuable supplements to the GitHub repository. Do check the supplementary_sections folder within each chapter’s directory for new and useful content.