You're reading from R Deep Learning Essentials A step-by-step guide to building deep learning models using TensorFlow, Keras, and MXNet

Product type Paperback

Published in Aug 2018

Publisher Packt

ISBN-13 9781788992893

Length 378 pages

Edition 2nd Edition

Authors (2):

Joshua F. Wiley

Mark Hodnett

Setting up your R environment

Before you begin your deep learning journey, the first step is to install R, which is available at https://cran.r-project.org/. When you download R and use it, only a few core packages are installed by default, but new packages can be added by selecting from a menu option or by a single line of code. We will not go into detail on how to install R or how to add packages, because we assume that most readers are proficient in these skills. A good integrated development environment (IDE) for working with R is essential. By far the most popular IDE, and my recommendation, is RStudio, which can be downloaded from https://www.rstudio.com/. Another option is Emacs. An advantage of both Emacs and RStudio is that they are available on all major platforms (Windows, macOS, and Linux), so even if you switch computers, you can have a consistent IDE experience. The following is a screenshot of the RStudio IDE:

Figure 1.7: RStudio IDE

Using RStudio is a major improvement over the R GUI in Windows. There are a number of panes in RStudio that provide different perspectives on your work. The top-left pane shows the code, and the bottom-left pane shows the console (the results of running the code). The top-right pane shows the list of variables and their current values, and the bottom-right pane shows the plots created by the code. All of these panes have additional tabs that offer further perspectives.
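As a small illustration of the earlier point that packages can be added with a single line of code, the following sketch checks which packages are missing and installs only those. The helper name missing_packages is hypothetical, and the package list is simply the set of neural network packages mentioned later in this section:

```r
# Hypothetical helper: return the packages in `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Packages mentioned in this chapter (adjust to your needs)
wanted <- c("nnet", "neuralnet", "RSNNS", "deepnet")
to_install <- missing_packages(wanted)

# Install only what is missing, and only in an interactive session;
# this step needs an internet connection
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

The install.packages() call is the single line of code; the surrounding check just avoids reinstalling packages you already have.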

As well as an IDE, RStudio (the company) has either developed or heavily supported other tools and packages for the R environment. We will use some of these tools, including R Markdown and R Shiny. R Markdown is similar to Jupyter or IPython notebooks; it allows you to combine code, output (for example, plots), and documentation in one script. R Markdown was used to create sections of this book where code and descriptive text are interwoven. R Markdown is a very good tool to ensure that your data science experiments are documented correctly. By embedding the documentation within the analysis, the two are more likely to stay synchronized. R Markdown can output to HTML, Word, or PDF. The following is an example of an R Markdown script on the left and the output on the right:

Figure 1.8: R Markdown example; on the left is a mixture of R code and text information. The output on the right is HTML generated from the source script.
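To make the structure of an R Markdown script concrete, here is a minimal sketch that writes a small document to a temporary file and renders it to HTML. The file name, title, and chunk contents are illustrative assumptions and are not taken from the book's code; rendering is attempted only if the rmarkdown package and pandoc are available:

```r
# Three backticks delimit a code chunk in R Markdown
fence <- strrep("`", 3)

# A minimal document: YAML header, one line of text, one R code chunk
rmd_lines <- c(
  "---",
  "title: \"A minimal report\"",
  "output: html_document",
  "---",
  "",
  "The chunk below summarises the built-in cars dataset.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Write the script to a temporary location (hypothetical file name)
rmd_file <- file.path(tempdir(), "minimal_report.Rmd")
writeLines(rmd_lines, rmd_file)

# Render to HTML only if rmarkdown and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```

Changing the output field in the header to word_document or pdf_document selects one of the other output formats.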

We will also use R Shiny to create web applications using R. This is an excellent method to create interactive applications to demonstrate key functionality. The following screenshot is an example of an R Shiny web application, which we will see in Chapter 5, Image Classification Using Convolutional Neural Networks:

Figure 1.9: An example of an R Shiny web application
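For readers who have not seen a Shiny application before, the following is a minimal sketch of the two parts every application has: a user interface and a server function. It is not the application shown in the figure; the slider, the histogram, and the helper draw_values are all illustrative assumptions:

```r
# Hypothetical helper: draw n values from a standard normal distribution
draw_values <- function(n) rnorm(n)

if (requireNamespace("shiny", quietly = TRUE)) {
  # User interface: a title, one input control, and one plot
  ui <- shiny::fluidPage(
    shiny::titlePanel("Histogram of random values"),
    shiny::sliderInput("n", "Number of values:",
                       min = 10, max = 1000, value = 100),
    shiny::plotOutput("hist")
  )

  # Server: the plot is redrawn whenever the slider value changes
  server <- function(input, output) {
    output$hist <- shiny::renderPlot({
      hist(draw_values(input$n), main = NULL, xlab = "Value")
    })
  }

  app <- shiny::shinyApp(ui = ui, server = server)

  # Launch the application only in an interactive session
  if (interactive()) shiny::runApp(app)
}
```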

Once you have R installed, you can look at adding packages that can fit basic neural networks. The nnet package is one such package; it can fit feed-forward neural networks with one hidden layer, such as the one shown in Figure 1.6. For more details on the nnet package, see Venables, W. N. and Ripley, B. D. (2002). The neuralnet package fits neural networks with multiple hidden layers and can train them using back-propagation. It also allows custom error and neuron-activation functions. We will also use the RSNNS package, which is an R wrapper of the Stuttgart Neural Network Simulator (SNNS). The SNNS was originally written in C, but was ported to C++. The RSNNS package makes many model components from SNNS available, making it possible to train a wide variety of models. For more details on the RSNNS package, see Bergmeir, C., and Benitez, J. M. (2012). We will see examples of how to use these models in Chapter 2, Training a Prediction Model.
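As a quick preview of what fitting a single-hidden-layer network with nnet looks like, here is a minimal sketch on the built-in iris dataset. The dataset, the number of hidden units, and the iteration limit are illustrative assumptions; Chapter 2 covers these models properly:

```r
library(nnet)  # a recommended package that ships with standard R installations

set.seed(42)   # initial weights are random, so fix the seed

# One hidden layer with 5 units; Species is a three-class outcome
fit <- nnet(Species ~ ., data = iris, size = 5, maxit = 200, trace = FALSE)

# Predicted class labels for the training data
pred <- predict(fit, newdata = iris, type = "class")

# Confusion matrix of actual versus predicted species
table(actual = iris$Species, predicted = pred)
```

The size argument sets the number of units in the hidden layer, which is the only hidden layer nnet supports.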

The deepnet package provides a number of tools for deep learning in R. Specifically, it can train restricted Boltzmann machines (RBMs) and use these as part of deep belief networks (DBNs) to generate initial values to train deep neural networks. The deepnet package also allows for different activation functions, and the use of dropout for regularization.
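The following sketch shows roughly how this looks in code, assuming the deepnet package is installed. The synthetic data, layer sizes, and training settings are illustrative assumptions chosen only to keep the example small:

```r
set.seed(1)

# Synthetic data: two Gaussian clusters in two dimensions, 50 points each
x <- rbind(matrix(rnorm(100, mean = 1, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = -1, sd = 0.5), ncol = 2))
y <- c(rep(1, 50), rep(0, 50))

if (requireNamespace("deepnet", quietly = TRUE)) {
  # Pretrain a stack of RBMs (a DBN), then use those weights as the
  # starting point for training a neural network with two hidden layers
  dnn <- deepnet::dbn.dnn.train(x, y,
                                hidden = c(5, 5),
                                activationfun = "sigm",
                                numepochs = 10,
                                batchsize = 20,
                                hidden_dropout = 0.1)

  # Predicted output for each observation
  prob <- deepnet::nn.predict(dnn, x)
}
```

Here, activationfun selects the activation function and hidden_dropout sets the dropout rate for the hidden layers.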

Deep learning frameworks for R

There are a number of R packages available for neural networks, but few options for deep learning. When the first edition of this book came out, it used the deep learning functions in h2o (https://www.h2o.ai/). This is an excellent, general machine learning framework written in Java, and has an API that allows you to use it from R. I recommend you look at it, especially for large datasets. However, most deep learning practitioners preferred other deep learning libraries, such as TensorFlow, CNTK, and MXNet, which were not supported in R when the first edition of this book was written. Today, there is a good choice of deep learning libraries that are supported in R, notably MXNet and Keras. Keras is actually a frontend abstraction for other deep learning libraries, and can use TensorFlow in the background. We will use MXNet, Keras, and TensorFlow in this book.

MXNet

MXNet is an open source deep learning library that is heavily supported by Amazon. It can run on CPUs and GPUs. For this chapter, running on CPUs will suffice.

Apache MXNet is a flexible and scalable deep learning framework that supports convolutional neural networks (CNNs) and long short-term memory networks (LSTMs). It can be distributed across multiple processors/machines and achieves almost linear scaling on multiple GPUs/CPUs. It is easy to install in R and it supports a good range of deep learning functionality for R. It is an excellent choice for writing our first deep learning model for image classification.

MXNet originated at Carnegie Mellon University and is heavily supported by Amazon, which chose it as its default deep learning library in 2016. In 2017, MXNet was accepted as an Apache Incubator project, ensuring that it would remain open source software. It has a higher-level programming model similar to Keras, but the reported performance is better. MXNet also scales very well as additional GPUs are added.

To install the MXNet package for Windows, run the following code from an R session:

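# Add the MXNet repository to the existing list of package repositories
cran <- getOption("repos")
cran["dmlc"] <- "https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/CRAN"
options(repos = cran)
# Install the mxnet package from the repositories configured above
install.packages("mxnet")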

This installs the CPU version; for the GPU version, you need to change the second line to:

cran["dmlc"] <- "https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/CRAN/GPU/cu92"

You have to change cu92 to cu80, cu90, or cu91 to match the version of CUDA installed on your machine. For other operating systems (and in case this does not work, as things change very fast in deep learning), you can get further instructions at https://mxnet.incubator.apache.org/install/index.html.
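The choice between the CPU and GPU repositories can be wrapped in a small function, followed by a quick check that the installed package works. This is only a sketch: the helper mxnet_repo_url is hypothetical, the URL pattern is the one given above, and the check runs only if the mxnet package is already installed:

```r
# Hypothetical helper: build the repository URL for a CPU or GPU build
mxnet_repo_url <- function(cuda = NULL) {
  base <- "https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/CRAN"
  if (is.null(cuda)) return(base)  # no CUDA version given: CPU repository
  supported <- c("cu80", "cu90", "cu91", "cu92")
  stopifnot(cuda %in% supported)   # reject CUDA versions not listed above
  paste0(base, "/GPU/", cuda)
}

mxnet_repo_url()        # CPU repository
mxnet_repo_url("cu90")  # GPU repository for CUDA 9.0

# Sanity check: create a small array on the CPU and do some arithmetic
if (requireNamespace("mxnet", quietly = TRUE)) {
  library(mxnet)
  a <- mx.nd.ones(c(2, 3), ctx = mx.cpu())
  b <- a * 2 + 1
  as.array(b)
}
```

If the check returns a 2 x 3 array without errors, the package is installed and able to run computations on the CPU.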

Keras

Keras is a high-level, open source, deep learning framework created by François Chollet from Google that emphasizes iterative and fast development; it is generally regarded as one of the best options to use to learn deep learning. Keras offers a choice of lower-level backend frameworks (TensorFlow, Theano, or CNTK), but it is most commonly used with TensorFlow. Keras models can be deployed on practically any environment, for example, a web server, iOS, Android, a browser, or the Raspberry Pi.

To learn more about Keras, go to https://keras.io/. To learn more about using Keras in R, go to https://keras.rstudio.com; this site also has more examples of R and Keras, as well as a handy Keras cheat sheet that gives a thorough reference to all of the functionality of the R keras package. To install the keras package for R, run the following code:

devtools::install_github("rstudio/keras")
library(keras)
install_keras()

This will install the CPU-based versions of Keras and TensorFlow. If your machine has a suitable GPU, you can refer to the documentation for install_keras() to find out how to install the GPU-based versions.
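Once the installation has finished, a quick way to confirm that everything works is to define and compile a very small model. The following is a sketch only: the layer sizes, the input shape, and the variable name keras_ready are illustrative assumptions, and the model is not trained here:

```r
# TRUE only if the R package is installed and can find a working Keras backend
keras_ready <- requireNamespace("keras", quietly = TRUE) &&
  keras::is_keras_available()

if (keras_ready) {
  library(keras)

  # A small feed-forward network: 4 inputs, 16 hidden units, 3 output classes
  model <- keras_model_sequential() %>%
    layer_dense(units = 16, activation = "relu", input_shape = c(4)) %>%
    layer_dense(units = 3, activation = "softmax")

  # Choose the loss function, optimizer, and reported metric
  model %>% compile(
    loss = "categorical_crossentropy",
    optimizer = "adam",
    metrics = "accuracy"
  )

  summary(model)  # prints the layers and the number of parameters
}
```

The pipe-based style, where layers are added one after another, is what makes Keras quick to iterate with.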

Do I need a GPU (and what is it, anyway)?

Probably the two biggest reasons for the exponential growth in deep learning are:

The ability to accumulate, store, and process large datasets of all types
The ability to use GPUs to train deep learning models

So what exactly are GPUs and why are they so important to deep learning? Probably the best place to start is by actually looking at the CPU and why this is not optimal for training deep learning models. The CPU in a modern PC is one of the pinnacles of human design and engineering. Even the chip in a mobile phone is more powerful now than the entire computer systems of the first space shuttles. However, because CPUs are designed to be good at all tasks, they may not be the best option for niche tasks. One such task is high-end graphics.

If we take a step back to the mid-1990s, most games were 2D, for example, platform games where the character in the game jumps between platforms and/or avoids obstacles. Today, almost all computer games utilize 3D space. Modern consoles and PCs have co-processors that take on the load of rendering 3D space onto a 2D screen. These co-processors are known as GPUs.

GPUs are actually far simpler than CPUs. They are built to do just one task: massively parallel matrix operations. CPUs and GPUs both have cores, where the actual computation takes place. A PC with an Intel i7 CPU typically has four physical cores and eight virtual cores by using Hyper-Threading. The NVIDIA TITAN Xp GPU card has 3,840 CUDA® cores. These cores are not directly comparable; a core in a CPU is much more powerful than a core in a GPU. But if the workload requires a large number of matrix operations that can be done independently, a chip with lots of simple cores is much quicker.

Before deep learning was even a concept, researchers in neural networks realized that doing high-end graphics and training neural networks both involved similar workloads: large amounts of matrix multiplication that could be done in parallel. They realized that training the models on the GPU rather than the CPU would allow them to create much more complicated models.
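To see why this workload parallelizes so well, consider the forward pass of a single fully connected layer. The sketch below uses tiny, randomly generated matrices (the sizes and variable names are illustrative assumptions) and shows that each output value depends only on one row of the input and one column of the weights, so all of them could be computed at the same time:

```r
set.seed(123)
n_obs <- 4      # observations in the batch
n_in <- 3       # input features
n_hidden <- 2   # units in the layer

X <- matrix(rnorm(n_obs * n_in), nrow = n_obs)    # one row per observation
W <- matrix(rnorm(n_in * n_hidden), nrow = n_in)  # one column per unit
b <- rnorm(n_hidden)                              # one bias per unit

# The whole layer as one matrix operation; sweep() adds each unit's bias
# to the corresponding column
Z <- sweep(X %*% W, 2, b, "+")
A <- 1 / (1 + exp(-Z))  # sigmoid activation, applied element by element

# The same value for a single (observation, unit) pair, computed on its own
z_11 <- sum(X[1, ] * W[, 1]) + b[1]
all.equal(Z[1, 1], z_11)
```

Here there are only 4 x 2 = 8 such independent values; in a real network there can be millions per layer, which is where thousands of simple GPU cores pay off.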

Today, all deep learning frameworks run on GPUs as well as CPUs. In fact, if you want to train models from scratch and/or have a large amount of data, you almost certainly need a GPU. The GPU must be an NVIDIA GPU and you also need to install the CUDA® Toolkit, NVIDIA drivers, and cuDNN. These allow you to interface with the GPU and hijack its use from a graphics card to a maths co-processor. Installing these is not always easy, because you have to ensure that the versions of CUDA, cuDNN, and the deep learning libraries you use are compatible. Some people advise that you need to use Unix rather than Windows, but support on Windows has improved greatly. The code in this book was developed on a Windows workstation. Forget about using macOS, because it does not support NVIDIA cards.

That was the bad news. The good news is that you can learn everything about deep learning even if you don't have a suitable GPU. The examples in the early chapters of this book will run perfectly fine on a modern PC. When we need to scale up, the book will explain how to use cloud resources, such as AWS and Google Cloud, to train large deep learning models.

Setting up reproducible results

Software for data science is advancing and changing rapidly. Although this is wonderful for progress, it can make reproducing someone else's results a challenge. Even your own code may not work when you go back to it a few months later. This is one of the biggest issues in scientific research today, across all fields, not just artificial intelligence and machine learning. If you work in research or academia and you want to publish your results in scientific journals, this is something you need to be concerned about. The first edition of this book partially addressed this problem by using the R checkpoint package provided by Revolution Analytics. This makes a record of what versions of software were used and ensures there is a snapshot of them available.

For the second edition, we will not use this package for a number of reasons:

Most readers are probably not publishing their work and are more interested in other concerns (maximizing accuracy, interpretability, and so on).
Deep learning requires large datasets. With a large amount of data, we may not get precisely the same result each time, but the results should be very close (within fractions of a percent).
In production systems, there is more to reproducibility than software. You also have to consider data pipelines and random seed generation.
In order to ensure reproducibility, the libraries used must stay frozen. New versions of deep learning APIs are released constantly and may contain enhancements. If we limited ourselves to old versions, we would get poor results.
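The point about random seeds in the list above is easy to demonstrate in base R. This sketch covers only R's own random number generator; deep learning libraries such as TensorFlow and MXNet keep their own generators, which have to be seeded separately. The seed value and variable names are arbitrary:

```r
# Setting the same seed before each draw gives identical random numbers
set.seed(2018)
first_run <- rnorm(5)

set.seed(2018)
second_run <- rnorm(5)

identical(first_run, second_run)  # TRUE

# Without resetting the seed, the generator state has moved on,
# so the next draw differs from the first
third_run <- rnorm(5)
identical(first_run, third_run)
```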

If you are interested in learning more about the checkpoint package, you can read the online vignette for the package at https://cran.r-project.org/web/packages/checkpoint/vignettes/checkpoint.html.

This book was written using R version 3.5, which is the latest version of R at the time of writing, on Windows 10 Professional x64. The code was run on a machine with an Intel i5 processor and 32 GB RAM; it should run on an Intel i3 processor with 8 GB RAM.
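Even without the checkpoint package, it is worth recording the versions you ran your own code with. The following is a minimal sketch using base R only; the package names in pkgs are placeholders that you would replace with the packages your analysis actually loads:

```r
# The version of R in use, as a single string
r_version <- R.version.string

# Versions of selected packages (placeholder names; substitute your own)
pkgs <- c("stats", "utils")
pkg_versions <- vapply(pkgs,
                       function(p) as.character(packageVersion(p)),
                       character(1))

r_version
pkg_versions

# A fuller record: platform, locale, and all attached packages
sessionInfo()
```

Saving this output alongside your results makes it much easier to work out later why a rerun behaves differently.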

You can download the example code files for this book from your account at http://www.packtpub.com/. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps: