You're reading from R Deep Learning Projects Master the techniques to design and develop neural network models in R

Product type Paperback

Published in Feb 2018

Publisher Packt

ISBN-13 9781788478403

Length 258 pages

Edition 1st Edition

Languages

Tools

Keras

Concepts

Deep Learning

Authors (2):

Yuxi (Hayden) Liu

Pablo Maldonado

View More author details

What is deep learning and why do we need it?

Deep learning is an emerging subfield of machine learning. It employs artificial neural network (ANN) algorithms to process data, derive patterns or to develop abstractions, simulating the thinking process of a biological brain. And those ANNs usually contain more than one hidden layer, which is how deep learning got its name—machine learning with stacked neural networks. Going beyond shallow ANNs (usually with only one hidden layer), a deep learning model with the right architectures and parameters can better represent complex non-linear relationships.

Here is an example of a shallow ANN:

And an example of a deep learning model:

Don't feel scared, regardless of how complicated it might sound or look. We will be going from shallow to deep dives into deep learning throughout five projects in this book.

First of all, as a part of the broad family of machine learning, deep learning can be used in supervised learning, semi-supervised learning, as well as unsupervised learning tasks, even reinforcement learning tasks. So what sets it apart from traditional machine learning algorithms?

What makes deep learning special?

Deep learning employs a stack of multiple hidden layers of non-linear processing units. The input of a hidden layer is the output of its previous layer. This can be easily observed from the examples of a shallow neural network and a deep neural network shown previously.

Features are extracted from each hidden layer. Features from different layers represent abstracts or patterns of different levels. Hence, higher-level features are derived from lower-level features, which are extracted from previous layers. All these together form a hierarchical representation learned from the data.

Take the cats and dogs image classification as an example, in traditional machine learning solutions, the classification step follows a feature extraction process, which is often based on:

Domain knowledge, such as color, shape, color of the animals, shape of the ears in this case, which are usually hand-crafted
Dimensionality reduction, such as principal component analysis (PCA), Latent Dirichlet Allocation (LDA)
Feature engineering techniques, such as histogram of oriented gradients transformation (HOG), Scale Invariant Feature Transform (SIFT), and Speeded up Robust Features (SURF)

The workflow of traditional machine learning solution to cats and dogs classification is displayed as follows:

However, in deep learning based solutions (such as CNNs, which we will be learning shortly), hierarchical representations are derived throughout the latent learning process and features of the highest level are then fed into the final classification step. These features capture the important and distinguishable details in the cat and dog images. Depending on the magic worked in hidden layers:

The low-level features can be edges, lines or dots of whiskers, nose or eyes, ears and so on
The higher-level features can be outlines or contours of the animals

The entire workflow of deep learning solution is shown as follows:

Deep learning removes those manual or explicit feature extraction steps, and instead relies on the training process to automatically discover useful patterns underneath the input data. And through tweaking the layout (number of layers, number of hidden units for a layer, activation function, and so on) of the networks, we can find the most efficient sets of features.

Recall the example of the shallow ANN and that of the deep learning model in the last section, data flow one-way from the input layer to the output. Besides feedforward architectures, deep learning models allow data to proceed in any direction, even to circle back to the input layer. Data looping back from the previous output becomes part of the next input data. Recurrent neural networks (RNNs) are great examples. We will be working on projects using RNNs later in this book. For now, we can still get a sense of what the recurrent or cycle-like architecture looks like from the diagram of RNNs as follows:

The recurrent architecture makes the models applicable to time series data and sequences of inputs. As data from previous time points goes into the training of the current time point, the deep learning recurrent model effectively solves a time series or sequence learning problem in a feedforward manner. In traditional machine learning solutions (read more in Machine Learning for Sequential Data: A Review by T. Dietterich) to time series problems, sliding windows of previous lags are usually provided as current inputs. This can be ineffective as the size of the sliding windows needs to be decided and so does the number of windows, while the recurrent models figure out timely or sequential relationships themselves.

Although we are discussing here all the advantages about deep learning over the other machine learning techniques, we did not make any claim or statement that the modern deep learning is superior to the traditional machine learning. That's right, there is no free lunch in this field, which was also emphasized in my last book, Python Machine Learning By Example. There is no single algorithm that can solve all machine learning problems more efficiently than others. It all depends on specific use cases - in some applications, the "traditional" ones are a better fit, or a deep learning setting makes no difference; in some cases, the "modern" ones yield better performance.

Next, we will see some typical applications of deep learning that will better motivate us to get started in deep learning projects.

What are the applications of deep learning?

Computer vision and image recognition is often considered the first area where breakthroughs of deep learning occurred. Handwritten digit recognition has become a Hello World in this field, and a common evaluation set for image classification algorithms and techniques is the scanned document dataset constructed from the National Institute of Standards and Technology (NIST), called MNIST (M stands for modified, which means data is pre-processed for the ease of machine learning processes).

Some examples from MNIST are shown as follows:

Some researchers have so far achieved the best performance 0.21% error rate on the MNIST dataset using CNNs. Details can be found in the paper, Regularization of Neural Networks using DropConnect, published in the International Conference on Machine Learning (ICML) in 2013. Other comparable results, for example 0.23%, 0.27% and 0.31%, are also yielded by CNNs and deep neural networks. However, traditional machine learning algorithms with sophisticated feature engineering techniques could only yield error rates ranging from 0.52% to 7.6%, which were achieved by using Support Vector Machine (SVMs) and pairwise linear classifiers respectively.

Besides image recognition (such as the well known face recognition), the applications of deep learning are extended to more challenging tasks including:

Image-based search engines, which cover image classification and image similarity encoding, heavily utilizing deep learning techniques.
Machine vision, with self-driving cars as an example, which interprets 360° camera views to make decisions in real time.
Color restoration from black and white photos—the examples after color recovery from http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/extra.html are impressive.
Image generation, including handwriting, cat images, and even video game images, or whatever image you name it. For example, we use an interesting playground, https://www.cs.toronto.edu/~graves/handwriting.html (developed by Alex Graves from the University of Toronto), to create handwritings of the title of this book in three different styles:

Natural language processing (NLP) is another field where deep learning is dominant in modern solutions. Recall we described deep learning models with recurrent architecture are appropriate for sequences of inputs, such as natural language and text. In recent years, deep learning has greatly helped to improve:

Machine translation, for example the sentence-based Google Neural Machine Translation system (GNMT) which utilizes deep RNNs to improve accuracy and fluency
Sentiment analysis, information retrieval, theme detection and many other common NLP applications, where deep learning models have achieved state-of-the-art performance thanks to word embedding techniques
Text generation, where RNNs learn the intricate relationship between words (including punctuation) in sentences and to write text, to become an author or a virtual Shakespeare

Image captioning generation, also known as image to text, couples recent breakthroughs in computer vision and NLP. It leverages CNNs to detect and classify objects in images, and assigns labels to those objects. It then applies RNNs to describe those labels in a comprehensible sentence. The following examples are captured from the web demo from http://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/ (developed by Andrej Karpathy from Stanford University):

Similarly, sound and speech is also a field of sequential learning, where machine learning algorithms are applied to predict time series or label sequence data. Speech recognition has been greatly revolutionized by deep learning. And now, deep learning based products like Apple's Siri, Amazon's Alexa, Google Home, Skype Translator and many others are "invading" our lives, in a good way for sure. Besides an author writing text, deep learning models can also be a music composer. For example, Francesco Marchesani from the Polytechnic University of Milan was able to train RNNs to produce Chopin's music.

Additionally, deep learning also excels in many use cases in video. It makes significant contributions to the boost of virtual reality with its capability of accurate motion detection, and to the advance of real-time behavior analysis in surveillance videos. Scientists from Google, DeepMind, and Oxford even built a computer lip reader called LipNet, achieving a success rate of 93%.

Besides supervised and unsupervised learning cases, deep learning is heavily used in reinforcement learning. Robots who can handle objects, climb stairs, operate in kitchens are not new to us. Recently, Google's AlphaGo beating the world's elite Go players received widespread media coverage. Nowadays, everybody looks forward to seeing self-driving cars being out in the market in just one or two years. These have all benefited from the advance of deep learning in reinforcement learning. Oh, and don't forget computers are taught to play the game, FlappyBird!

We did not even mention bioinformatics, drug discovery, recommendation systems in e-commerce, finance, especially the stock market, insurance and the Internet of Things (IoT). In fact, the list of deep learning applications is already long, and only gets longer and longer.

I hope this section excited you about deep learning and its power of providing better solutions to many machine learning problems we are facing. Artificial intelligence has a brighter future thanks to the advance of deep learning.

So what are we waiting for? Let's get started with handwritten digit recognition!