TensorFlow 2.0 Computer Vision Cookbook: Implement machine learning solutions to overcome various computer vision challenges

Product type Paperback

Published in Feb 2021

Publisher Packt

ISBN-13 9781838829131

Length 542 pages

Edition 1st Edition

Languages Python

Tools OpenCV

Concepts Computer Vision

Author (1):

Jesús Martínez


Table of Contents (14) Chapters

Preface

1. Chapter 1: Getting Started with TensorFlow 2.x for Computer Vision

2. Chapter 2: Performing Image Classification

3. Chapter 3: Harnessing the Power of Pre-Trained Networks with Transfer Learning

4. Chapter 4: Enhancing and Styling Images with DeepDream, Neural Style Transfer, and Image Super-Resolution

5. Chapter 5: Reducing Noise with Autoencoders

6. Chapter 6: Generative Models and Adversarial Attacks

7. Chapter 7: Captioning Images with CNNs and RNNs

8. Chapter 8: Fine-Grained Understanding of Images through Segmentation

9. Chapter 9: Localizing Elements in Images with Object Detection

10. Chapter 10: Applying the Power of Deep Learning to Videos

11. Chapter 11: Streamlining Network Implementation with AutoML

12. Chapter 12: Boosting Performance

13. Other Books You May Enjoy


Chapter 7: Captioning Images with CNNs and RNNs

Equipping neural networks with the ability to describe visual scenes in a human-readable fashion has to be one of the most interesting yet challenging applications of deep learning. The main difficulty arises from the fact that this problem combines two major subfields of artificial intelligence: Computer Vision (CV) and Natural Language Processing (NLP).

Most image captioning architectures use a Convolutional Neural Network (CNN) to encode images into a numeric representation that the decoder can consume. The decoder is typically a Recurrent Neural Network (RNN), a kind of network specialized in learning from sequential data, such as time series, video, and text.
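To make the encoder-decoder idea concrete, here is a minimal sketch in Keras. It is not the implementation used later in the chapter: the tiny CNN, the layer sizes, the vocabulary size, and the names IMAGE_SHAPE, VOCAB_SIZE, MAX_CAPTION_LEN, EMBED_DIM, and build_captioning_model are all assumptions chosen for illustration. The model takes an image plus the caption words produced so far and outputs a probability distribution over the next word.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for illustration.
IMAGE_SHAPE = (64, 64, 3)
VOCAB_SIZE = 1000
MAX_CAPTION_LEN = 20
EMBED_DIM = 128


def build_captioning_model():
    # Encoder: a small CNN that maps an image to a fixed-length vector.
    image_input = tf.keras.Input(shape=IMAGE_SHAPE, name="image")
    x = layers.Conv2D(32, 3, activation="relu")(image_input)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    image_features = layers.Dense(EMBED_DIM, activation="relu")(x)

    # Decoder: an RNN (here an LSTM) that reads the word IDs of the
    # partial caption and summarizes them in a single vector.
    caption_input = tf.keras.Input(
        shape=(MAX_CAPTION_LEN,), dtype="int32", name="partial_caption"
    )
    y = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(caption_input)
    caption_features = layers.LSTM(EMBED_DIM)(y)

    # Combine both representations and predict the next word.
    merged = layers.add([image_features, caption_features])
    merged = layers.Dense(EMBED_DIM, activation="relu")(merged)
    next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

    return tf.keras.Model([image_input, caption_input], next_word)


model = build_captioning_model()

# Random stand-ins for a batch of two images and two partial captions.
images = np.random.rand(2, *IMAGE_SHAPE).astype("float32")
partial_captions = np.random.randint(
    0, VOCAB_SIZE, size=(2, MAX_CAPTION_LEN)
).astype("int32")

probabilities = model.predict([images, partial_captions], verbose=0)
print(probabilities.shape)  # (2, 1000): one distribution over the vocabulary per sample
```

In practice, the hand-built CNN in this sketch would usually give way to a pre-trained network, and a full caption would be generated by repeatedly feeding the predicted word back into the decoder.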

As we'll see in this chapter, the challenges of building a system with these capabilities start with preparing the data, which we'll cover in the first recipe. Then, we'll implement an image captioning solution...