Exploring deep learning

Since humans invented computers, we have called them intelligent systems, and yet we keep trying to augment their intelligence. In the early days, anything a computer could do that a human couldn't was considered artificial intelligence: remembering huge amounts of data, performing mathematical operations on millions or billions of numbers, and so on. We called Deep Blue, the machine that beat grandmaster Garry Kasparov at chess, an artificially intelligent machine.

Eventually, things that humans can't do but a computer can became just computer programs. Then we realized that some things humans can do easily are nearly impossible for a programmer to code up explicitly. This realization changed everything: the number of rules we would have to write down to make a computer behave like us is insanely large. Machine learning came to the rescue. People found a way to let computers learn the rules from examples, instead of having to code them up explicitly; that's called machine learning. An example is given in Figure 1.7, which shows how we could predict whether a customer will buy a product based on their past shopping history.

Figure 1.7: Showing the dataset for a customer buying a product

We could predict most of these outcomes ourselves, if not all of them. But what if the number of data points to predict from is so large that no mortal brain can process them? A computer could look through the data and spit out an answer based on what it has seen before. This data-driven approach can help us a lot: the only thing we have to do is choose the relevant features and hand them to the black box, which consists of different algorithms, to learn the rules or patterns from the feature set.
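
To make this concrete, here is a minimal sketch of the idea, not an example from the book: a few made-up shopping-history features are handed to a tiny "black box," in this case a single linear model trained with PyTorch.

```python
import torch
import torch.nn as nn

# Hypothetical features per customer: [items viewed, days since last purchase, past purchases]
features = torch.tensor([[12., 2., 5.], [1., 30., 0.], [7., 5., 3.], [0., 90., 1.]])
bought = torch.tensor([[1.], [0.], [1.], [0.]])           # 1 = bought the product, 0 = did not

model = nn.Linear(3, 1)                                    # the "black box": one learned linear rule
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(200):                                       # learn the rule from the examples
    optimizer.zero_grad()
    loss = loss_fn(model(features), bought)
    loss.backward()
    optimizer.step()

print(torch.sigmoid(model(features)))                      # predicted probabilities of buying
```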

There are problems, though. Even when we know what to look for, cleaning up the data and extracting the features is a tedious task. The foremost trouble isn't this, however: for high-dimensional data and data of other media types, we can't pick out good features efficiently at all. For example, in face recognition, we initially measured particular lengths and distances on the face using rule-based programs and gave those to the neural network as input, because we thought that was the feature set humans use to recognize faces.

Figure 1.8: Human-selected facial features

It turned out that the features that are so obvious for humans are not so obvious for computers and vice versa. The realization of the feature selection problem led us to the era of deep learning. This is a subset of machine learning where we use the same data-driven approach, but instead of selecting the features explicitly, we let the computer decide what the features should be.

Let's consider our face recognition example again. FaceNet, a 2015 paper from Google, tackled it with the help of deep learning. FaceNet implemented the whole application using two deep networks. The first network identified the feature set from faces, and the second network used this feature set to recognize the face (technically speaking, to classify the face into different buckets). Essentially, the first network was doing what we used to do by hand, and the second network was a simple, traditional machine learning algorithm.

Deep networks are capable of identifying features from data, provided we have large, labeled datasets. FaceNet's first network was trained with a huge dataset of faces with corresponding labels. It was trained to predict 128 features from every face (loosely speaking, 128 measurements of the face, such as the distance between the left eye and the right eye), and the second network just used these 128 features to recognize the person.
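
As a rough sketch of such a two-network pipeline (the layer sizes, image dimensions, and number of identities here are assumptions for illustration, not FaceNet's actual architecture):

```python
import torch
import torch.nn as nn

# Network 1: map a face image to a 128-dimensional feature vector
embedding_net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 160 * 160, 512), nn.ReLU(),
    nn.Linear(512, 128),                      # the 128 "measurements" per face
)
# Network 2: classify the feature vector into identity "buckets"
classifier = nn.Linear(128, 1000)             # 1,000 hypothetical identities

face = torch.randn(1, 3, 160, 160)            # a dummy face image batch
features = embedding_net(face)                # network 1: extract the features
identity_scores = classifier(features)        # network 2: recognize the person
```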

Figure 1.9: A simple neural network

A simple neural network has an input layer, a single hidden layer, and an output layer. In theory, a single hidden layer with enough units can approximate any continuous function, so a single layer should be enough. In practice, however, the single-hidden-layer approach doesn't work that well. In deep networks, each layer is responsible for finding certain features: the initial layers find fine-grained, detailed features, and the final layers abstract over these detailed features to find high-level features.
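
A minimal PyTorch sketch of such a network, with arbitrary layer sizes chosen only for illustration:

```python
import torch
import torch.nn as nn

# Input layer -> one hidden layer -> output layer, as in Figure 1.9
simple_net = nn.Sequential(
    nn.Linear(4, 8),    # input layer to hidden layer
    nn.ReLU(),
    nn.Linear(8, 1),    # hidden layer to output layer
)

x = torch.randn(16, 4)    # a batch of 16 samples with 4 features each
y = simple_net(x)         # output shape: (16, 1)
```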

Figure 1.10: A deep neural network

Getting to know different architectures

Deep learning has been around for decades, and different structures and architectures have evolved for different use cases. Some were based on ideas we had about our brains, and others on how the brain actually works. All the upcoming chapters are based on the state-of-the-art architectures that the industry is using now. We'll cover one or more applications of each architecture, and each chapter covers the concepts, the specifications, and the technical details behind all of them, along with PyTorch code.

Fully connected networks

Fully connected, dense, or linear networks are the most basic, yet powerful, architecture. They are a direct extension of what is commonly called machine learning, where you use neural networks with a single hidden layer. Fully connected layers act as the endpoint of most architectures, converting the scores produced by the preceding deep network into a probability distribution. A fully connected network, as the name suggests, has every neuron connected to every neuron in the previous and next layers. The network might eventually decide to switch off some neurons by driving their weights toward zero, but in an ideal situation, initially, all of them take part in the communication.
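
As an illustrative sketch (the class name and sizes are made up), a small fully connected network ending in a softmax, which turns scores into a probability distribution over classes, could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseNet(nn.Module):
    def __init__(self, in_features=64, hidden=128, classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)   # every input unit feeds every hidden unit
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return F.softmax(self.fc2(x), dim=1)        # probability distribution over the classes

probs = DenseNet()(torch.randn(2, 64))              # two samples -> two distributions over 10 classes
```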

Encoders and decoders

Encoders and decoders are probably the next most basic architecture under the deep learning umbrella. Most networks contain one or more encoder-decoder stages. For instance, you can consider the hidden layers of a fully connected network as the encoded form coming from an encoder, and the output layer as a decoder that decodes the hidden representation into the output. More generally, encoders encode the input into an intermediate state in which the input is represented as vectors, and then the decoder network decodes that state into the output form we want.

A canonical example of an encoder-decoder network is the sequence-to-sequence (seq2seq) network, which can be used for machine translation. A sentence, say in English, is encoded into an intermediate vector representation, in which the whole sentence is compressed into a set of floating-point numbers, and the decoder generates the output sentence in another language from that intermediate vector.
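
A minimal seq2seq sketch in PyTorch might look like the following; the vocabulary size, the embedding and hidden dimensions, and the use of token id 0 as the start-of-sentence marker are all assumptions made for illustration:

```python
import torch
import torch.nn as nn

embed = nn.Embedding(1000, 32)              # a shared 1,000-word vocabulary (hypothetical)
encoder = nn.GRU(32, 64, batch_first=True)
decoder = nn.GRU(32, 64, batch_first=True)
to_vocab = nn.Linear(64, 1000)

source = torch.randint(0, 1000, (1, 7))     # one source sentence of 7 token ids
_, state = encoder(embed(source))           # the intermediate vector representation

token = torch.zeros(1, 1, dtype=torch.long) # assumed start-of-sentence token id 0
outputs = []
for _ in range(7):                          # decode up to 7 target tokens
    out, state = decoder(embed(token), state)
    token = to_vocab(out).argmax(dim=-1)    # greedy choice of the next word
    outputs.append(token)
```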

Figure 1.11: Seq2seq network

An autoencoder is a special type of encoder-decoder network that comes under the category of unsupervised learning. Autoencoders try to learn from unlabeled data by setting the target values equal to the input values. For example, if your input is an image of size 100 x 100, the input vector has a dimension of 10,000. The output size will therefore also be 10,000, but the hidden layer size could be 500. In a nutshell, you are trying to compress your input into a hidden state representation of a smaller size and then regenerate the same input from that hidden state.

If you can train a neural network to do that, then voilà, you have found a good compression algorithm: the high-dimensional input can be transferred as a lower-dimensional vector, with an order-of-magnitude gain.
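
A minimal PyTorch sketch of the autoencoder just described, matching the 10,000 -> 500 -> 10,000 shapes (the activations are an arbitrary choice):

```python
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(10_000, 500), nn.ReLU(),       # encoder: input -> compressed hidden state
    nn.Linear(500, 10_000), nn.Sigmoid(),    # decoder: hidden state -> reconstruction
)

image = torch.rand(1, 10_000)                # a flattened 100 x 100 image
reconstruction = autoencoder(image)
loss = nn.MSELoss()(reconstruction, image)   # the target is the input itself
```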

Autoencoders are being used in different situations and industries nowadays. You'll see a similar architecture in Chapter 4, Computer Vision, when we discuss semantic segmentation.

Figure 1.12: Structure of an autoencoder

Recurrent neural networks

RNNs are one of the most common deep learning algorithms, and they took the whole world by storm. Almost all the state-of-the-art performance we have now in natural language processing or understanding comes from a variant of RNNs. In recurrent networks, you try to identify the smallest unit in your data and treat the data as a group of those units. In natural language, the most common approach is to make one word a unit and treat the sentence as a group of words while processing it. You unfold your RNN over the whole sentence and process the sentence one word at a time. RNNs have variants that suit different datasets, and sometimes efficiency is a factor when choosing among them. Long short-term memory (LSTM) and gated recurrent unit (GRU) cells are the most common RNN units.
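
For illustration, here is a minimal sketch of unfolding an LSTM cell over a sentence one word at a time; the vocabulary size and dimensions are made up:

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(5000, 50)   # hypothetical 5,000-word vocabulary
cell = nn.LSTMCell(50, 100)

sentence = torch.randint(0, 5000, (6,))           # six word ids
hidden = torch.zeros(1, 100)
memory = torch.zeros(1, 100)
for word_id in sentence:                           # unfold the RNN across the sentence
    word_vector = embedding(word_id).unsqueeze(0)  # shape: (1, 50)
    hidden, memory = cell(word_vector, (hidden, memory))
# `hidden` now summarizes the whole sentence
```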

Figure 1.13: A vector representation of words in a recurrent network

Recursive neural networks

As the name indicates, recursive neural networks are tree-like networks for understanding the hierarchical structure of sequence data. Recursive networks have been used a lot in natural language processing applications, especially by Richard Socher, a chief scientist at Salesforce, and his team.

Word vectors, which we will see in Chapter 5, Sequential Data Processing, can map the meaning of a word efficiently into a vector space, but when it comes to the meaning of a whole sentence, there is no go-to solution like word2vec for words. Recursive neural networks are among the most used algorithms for such applications. Recursive networks can build a parse tree and compositional vectors, and map other hierarchical relations, which, in turn, help us find the rules that combine words into sentences. The Stanford Natural Language Inference (SNLI) corpus, released by the Stanford NLP group, is a renowned and widely used benchmark on which recursive networks have been applied successfully.
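
As a rough illustration (a simplified composition scheme, not the exact models from the work mentioned above), a recursive network can be sketched as one small network applied repeatedly over a parse tree:

```python
import torch
import torch.nn as nn

# The same small network combines two child vectors into one parent vector, bottom-up.
compose = nn.Sequential(nn.Linear(2 * 50, 50), nn.Tanh())

def compose_tree(node, word_vectors):
    """node is either a word index (a leaf) or a (left, right) pair of sub-trees."""
    if isinstance(node, int):
        return word_vectors[node]
    left, right = node
    left_vec = compose_tree(left, word_vectors)
    right_vec = compose_tree(right, word_vectors)
    return compose(torch.cat([left_vec, right_vec], dim=-1))

words = torch.randn(4, 50)                                 # vectors for "the", "cat", "sat", "down"
sentence_vector = compose_tree(((0, 1), (2, 3)), words)    # parse tree: ((the cat) (sat down))
```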

Figure 1.14: Vector representation of words in a recursive network

Convolutional neural networks

Convolutional neural networks (CNNs) enabled us to achieve superhuman performance in computer vision. We hit human-level accuracy in the early 2010s, and accuracy has kept improving year by year.

Convolutional networks are the best understood networks, since we have visualizers that show what each layer is doing. Yann LeCun, the head of Facebook AI Research (FAIR), invented CNNs back in the 1990s. We couldn't use them then, since we did not have enough data or computational power. CNNs basically scan over your input like a sliding window, build an intermediate representation, and then abstract it layer by layer before it reaches the fully connected layer at the end. CNNs have been used successfully on non-image datasets as well.

The Facebook research team built a state-of-the-art natural language processing system with convolutional networks that outperforms RNNs, which are supposed to be the go-to architecture for any sequence dataset. Although several neuroscientists and a few AI researchers are not fond of CNNs, because they believe the brain doesn't work the way CNNs do, networks based on CNNs keep beating existing implementations.
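
A typical CNN of the kind shown in Figure 1.15 can be sketched in PyTorch as follows; the layer sizes and the 32 x 32 input are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),  # scan the image with 3x3 windows
    nn.MaxPool2d(2),                                         # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                                         # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                               # fully connected endpoint: 10 class scores
)

scores = cnn(torch.randn(1, 3, 32, 32))    # one 32 x 32 RGB image -> 10 class scores
```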

Figure 1.15: A typical CNN

Generative adversarial networks

Generative adversarial networks (GANs) were invented by Ian Goodfellow in 2014 and, since then, they have turned the whole AI community upside down. The idea is one of the simplest and most obvious, yet it has the power to fascinate the world with its capabilities. In a GAN, two networks compete with each other until they reach an equilibrium in which the generator network can generate data that the discriminator network has a hard time distinguishing from the real data. A real-world analogy is the fight between police and counterfeiters.

A counterfeiter tries to make fake currency, and the police try to detect it. Initially, the counterfeiters are not skilled enough to make fake currency that looks genuine. As time passes, they get better at making currency that looks more like the real thing. Then the police start failing to identify the fakes, but eventually they get better at it again. This generation-discrimination process eventually leads to an equilibrium. The advantages of GANs are enormous, and we'll discuss them in depth later.
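
A minimal sketch of this setup in PyTorch might look like the following; the layer sizes, learning rates, and dummy data are assumptions for illustration only:

```python
import torch
import torch.nn as nn

# Generator (the counterfeiter): noise -> fake samples. Discriminator (the police): real or fake?
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

real = torch.rand(8, 784)                        # a dummy batch of "real" images
fake = generator(torch.randn(8, 16))             # a batch of counterfeits

# Train the discriminator: push real toward 1 and fakes toward 0
d_loss = loss_fn(discriminator(real), torch.ones(8, 1)) + \
         loss_fn(discriminator(fake.detach()), torch.zeros(8, 1))
d_opt.zero_grad()
d_loss.backward()
d_opt.step()

# Train the generator: try to make the discriminator output 1 for the fakes
g_loss = loss_fn(discriminator(fake), torch.ones(8, 1))
g_opt.zero_grad()
g_loss.backward()
g_opt.step()
```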

Figure 1.16: GAN setup

Reinforcement learning

Learning through interaction is the foundation of human intelligence, and reinforcement learning is the methodology leading us in that direction. Reinforcement learning used to be a completely separate field, built on the idea that humans learn by trial and error. However, with the advancement of deep learning, a new field called deep reinforcement learning has emerged, combining the power of deep learning and reinforcement learning.

Modern reinforcement learning uses deep networks to learn, unlike the old approach where we coded those rules explicitly. We'll look into Q-learning and deep Q-learning, showing you the difference between reinforcement learning with and without deep learning.
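
As a small illustration of the difference, here is a sketch of plain, tabular Q-learning, where the agent keeps a table of state-action values instead of a deep network; the state and action counts and the hyperparameters are arbitrary:

```python
import random

n_states, n_actions = 5, 2
q_table = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2     # learning rate, discount factor, exploration rate

def choose_action(state):
    if random.random() < epsilon:         # explore occasionally
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_table[state][a])   # otherwise exploit

def update(state, action, reward, next_state):
    # Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = max(q_table[next_state])
    q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])
```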

Reinforcement learning is considered one of the pathways toward general intelligence, where computers or agents learn through interaction with the real world and its objects, through experiments, or from feedback. Teaching a reinforcement learning agent is comparable to training a dog with positive and negative rewards. When you give your dog a biscuit for picking up the ball, or shout at it for not doing so, you are reinforcing knowledge in the dog's brain through positive and negative rewards. We do the same with AI agents, except that the positive reward is a positive number and the negative reward is a negative number. Even though we can't consider reinforcement learning another architecture in the same sense as CNNs, RNNs, and so on, I have included it here as another way of using deep neural networks to solve real-world problems:

Figure 1.17: Pictorial representation of a reinforcement learning setup
