TensorFlow 1.x Deep Learning Cookbook

Table of Contents

Preface
1. TensorFlow - An Introduction
2. Regression
3. Neural Networks - Perceptron
4. Convolutional Neural Networks
5. Advanced Convolutional Neural Networks
6. Recurrent Neural Networks
7. Unsupervised Learning
8. Autoencoders
9. Reinforcement Learning
10. Mobile Computation
11. Generative Models and CapsNet
12. Distributed TensorFlow and Cloud Deep Learning
13. Learning to Learn with AutoML (Meta-Learning)
14. TensorFlow Processing Units

TensorFlow for Deep Learning

DNNs are today's buzzword in the AI community. Many data science/Kaggle competitions have recently been won by candidates using DNNs. While the concept of DNNs has been around since Rosenblatt proposed the perceptron in 1962, and they were made practical by the backpropagation algorithm introduced in 1986 by Rumelhart, Hinton, and Williams, it is only recently that DNNs became the favourite of AI/ML enthusiasts and engineers the world over.

The main reason for this is the availability of modern computing power such as GPUs and tools like TensorFlow that make it easier to access GPUs and construct complex neural networks in just a few lines of code.

As a machine learning enthusiast, you must already be familiar with the concepts of neural networks and deep learning, but for the sake of completeness, we will introduce the basics here and explore what features of TensorFlow make it a popular choice for deep learning.

Neural networks are a biologically inspired model of computation and learning. Like biological neurons, their units take weighted input from other cells (neurons or the environment); this weighted input passes through a processing element to produce an output, which can be binary (fire or not fire) or continuous (a probability or a prediction). Artificial Neural Networks (ANNs) are networks of these neurons, which can be randomly distributed or arranged in a layered structure. These neurons learn through the weights and biases associated with them.
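As a concrete illustration of this computation, the following is a minimal sketch (not taken from the book) of a single artificial neuron: a weighted sum of its inputs plus a bias, passed through a sigmoid activation. The input values, weights, and bias are made up for the example:

import numpy as np

def sigmoid(z):
    # squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical inputs from three upstream neurons, their weights, and a bias
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.4, 0.1, -0.6])
bias = 0.2

# the processing element: weighted sum followed by an activation
output = sigmoid(np.dot(weights, inputs) + bias)
print(output)   # a continuous "firing" value between 0 and 1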

The following figure illustrates the similarity between a biological neural network and an artificial neural network:

Deep learning, as defined by Hinton et al. (https://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf), consists of computational models composed of multiple processing layers (hidden layers). An increase in the number of layers results in an increase in learning time. Learning time increases further with the large datasets that are the norm for present-day CNNs and Generative Adversarial Networks (GANs). Thus, to practically implement DNNs, we require high computational power. The advent of GPUs by NVIDIA® made this feasible, TensorFlow by Google made it possible to implement complex DNN structures without delving into their complex mathematical details, and the availability of large datasets provided the necessary food for DNNs. TensorFlow is the most popular library for deep learning for the following reasons:

  • TensorFlow is a powerful library for performing large-scale numerical computations like matrix multiplication or auto-differentiation. These two computations are necessary to implement and train DNNs.
  • TensorFlow uses C/C++ at the backend, which makes it computationally fast.
  • TensorFlow has a high-level machine learning API (tf.contrib.learn) that makes it easier to configure, train, and evaluate a large number of machine learning models.
  • One can use Keras, a high-level deep learning library, on top of TensorFlow. Keras is very user-friendly and allows easy and fast prototyping. It supports various DNNs such as RNNs, CNNs, and even combinations of the two (see the short sketch after this list).
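As a flavour of that last point, here is a minimal, hypothetical sketch of a small feed-forward classifier built with Keras on top of TensorFlow; the layer sizes and the random data are made up purely for illustration:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# made-up data: 100 samples with 20 features and binary labels
X = np.random.rand(100, 20)
y = np.random.randint(2, size=(100, 1))

# a small fully connected network defined in a few lines
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=20))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, batch_size=16)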

How to do it...

Any deep learning network consists of four important components: the dataset, defining the model (network structure), training/learning, and prediction/evaluation. We can do all of these in TensorFlow; let's see how:

  • Dataset: DNNs depend on large amounts of data. The data can be collected, generated, or taken from a standard dataset. TensorFlow supports three main methods to read the data. Some of the standard datasets we will be using to train the models built in this book are as follows:
  • MNIST: This is a widely used database of handwritten digits (0 - 9). It consists of a training set of 60,000 examples and a test set of 10,000 examples. The dataset is maintained at Yann LeCun's home page (http://yann.lecun.com/exdb/mnist/). It is included in the TensorFlow library in tensorflow.examples.tutorials.mnist (a short loading sketch follows this list).
  • CIFAR10: This dataset contains 60,000 32 x 32 color images in 10 classes, with 6,000 images per class. The training set contains 50,000 images and the test set contains 10,000 images. The ten classes of the dataset are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The data is maintained by the Computer Science Department, University of Toronto (https://www.cs.toronto.edu/~kriz/cifar.html).
  • WORDNET: This is a lexical database of English. It contains nouns, verbs, adverbs, and adjectives, grouped into sets of cognitive synonyms (Synsets), that is, words that represent the same concept, for example, shut and close or car and automobile are grouped into unordered sets. It contains 155,287 words organised in 117,659 synsets for a total of 206,941 word-sense pairs. The data is maintained by Princeton University (https://wordnet.princeton.edu/).
  • ImageNET: This is an image dataset organized according to the WORDNET hierarchy (only nouns at present). Each meaningful concept (synset) is described by multiple words or word phrases. Each synset is represented on average by 1,000 images. At present, it has 21,841 synsets and a total of 14,197,122 images. Since 2010, an annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been organized to classify images in one of the 1,000 object categories. The work is sponsored by Princeton University, Stanford University, A9, and Google (http://www.image-net.org/).
  • YouTube-8M: This is a large-scale labelled video dataset consisting of millions of YouTube videos. It has about 7 million YouTube video URLs classified into 4,716 classes, organized into 24 top-level categories. It also provides preprocessing support and frame-level features. The dataset is maintained by Google Research (https://research.google.com/youtube8m/).
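For instance, the MNIST dataset mentioned above can be loaded straight from the bundled TensorFlow helper module; a minimal sketch (the target directory name "MNIST_data/" is an arbitrary choice):

from tensorflow.examples.tutorials.mnist import input_data

# downloads MNIST if necessary; the helper splits the 60,000 training images
# into 55,000 training and 5,000 validation examples
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(mnist.train.images.shape)   # (55000, 784)
print(mnist.test.images.shape)    # (10000, 784)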

Reading the data: Data can be read in three ways in TensorFlow: feeding through feed_dict, reading from files, and using preloaded data. We will be using the components described in this recipe throughout the book for reading and feeding the data. In the next steps, you will learn each one of them.

  1. Feeding: In this case, data is provided while running each step, using the feed_dict argument in the run() or eval() function call. This is done with the help of placeholders, and this method allows us to pass in NumPy arrays of data. Consider the following part of the code using TensorFlow:
...
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
...
with tf.Session() as sess:
    X_Array = ...  # some NumPy array
    Y_Array = ...  # another NumPy array
    loss = ...
    sess.run(loss, feed_dict={x: X_Array, y: Y_Array})
...

Here, x and y are the placeholders; using them, we pass in the array containing the X values and the array containing the Y values with the help of feed_dict.

  2. Reading from files: This method is used when the dataset is very large, to ensure that not all of the data occupies memory at once (imagine the 60 GB YouTube-8M dataset). The process of reading from files involves the following steps:
    • Filename list: A list of filenames is created using either a string tensor such as ["file0", "file1"], a list comprehension such as [("file%d" % i) for i in range(2)], or the files = tf.train.match_filenames_once('*.JPG') function.
    • Filename queue: A queue is created to keep the filenames until the reader needs them using the tf.train.string_input_producer function:

filename_queue = tf.train.string_input_producer(files) 
# where files is the list of filenames created above

This function also provides options to shuffle the filenames and to set a maximum number of epochs. The whole list of filenames is added to the queue for each epoch. If the shuffling option is selected (shuffle=True), then the filenames are shuffled within each epoch.
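As a hypothetical example, both options can be set when creating the queue (the epoch count of 5 is arbitrary):

filename_queue = tf.train.string_input_producer(files, num_epochs=5, shuffle=True)
# when num_epochs is set, the queue adds a local variable, so the session must also
# run tf.local_variables_initializer() before the queue is used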

    • Reader: A reader is defined and used to read from the files in the filename queue. The reader is selected based on the input file format. The read method returns a key identifying the file and record (useful while debugging) and a scalar string value. For example, in the case of the .csv file format:

reader = tf.TextLineReader() 
key, value = reader.read(filename_queue)
    • Decoder: One or more decoder and conversion ops are then used to decode the value string into Tensors that make up the training example:

record_defaults = [[1], [1], [1]]
col1, col2, col3 = tf.decode_csv(value, record_defaults=record_defaults)
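Note that for this file-reading pipeline to actually deliver data, the queues have to be filled by starting the queue runners inside a session. The following is a minimal sketch of how the pieces above might be tied together; it is an assumption for illustration, not code from the book:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    # start the background threads that fill the filename queue
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for _ in range(10):   # read ten records as an example
        c1, c2, c3 = sess.run([col1, col2, col3])
    coord.request_stop()
    coord.join(threads)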
  3. Preloaded data: This is used when the dataset is small and can be loaded fully into memory. For this, we can store the data in either a constant or a variable. While using a variable, we need to set the trainable flag to False so that the data does not change while training. The data can be preloaded as TensorFlow constants or as variables:
# Preloaded data as constants
training_data = ...
training_labels = ...
with tf.Session() as sess:
    x_data = tf.constant(training_data)
    y_data = tf.constant(training_labels)
...
# Preloaded data as variables
training_data = ...
training_labels = ...
with tf.Session() as sess:
    data_x = tf.placeholder(dtype=training_data.dtype, shape=training_data.shape)
    data_y = tf.placeholder(dtype=training_labels.dtype, shape=training_labels.shape)
    x_data = tf.Variable(data_x, trainable=False, collections=[])
    y_data = tf.Variable(data_y, trainable=False, collections=[])
...

Conventionally, the data is divided into three parts: training data, validation data, and test data.
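As an illustration, one common way to make this split is to shuffle the sample indices and slice the arrays. This is only a sketch; the dataset and the 80/10/10 proportions are made up:

import numpy as np

data = np.random.rand(1000, 10)          # hypothetical dataset: 1,000 samples, 10 features
labels = np.random.randint(2, size=1000)

indices = np.random.permutation(len(data))
n_train, n_val = 800, 100                # 80% training, 10% validation, 10% test
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

x_train, y_train = data[train_idx], labels[train_idx]
x_val, y_val = data[val_idx], labels[val_idx]
x_test, y_test = data[test_idx], labels[test_idx]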

  4. Defining the model: A computational graph is built that describes the network structure. It involves specifying the hyperparameters, the variables and placeholders, the sequence in which information flows from one set of neurons to another, and a loss/error function. You will learn more about computational graphs in a later section of this chapter.
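To make this concrete, here is a minimal, hypothetical graph for linear regression on a single feature: placeholders for the data, trainable variables for the weight and bias, and a mean-squared-error loss (all of the names are made up for this sketch):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None])    # input feature
y = tf.placeholder(tf.float32, shape=[None])    # target value

w = tf.Variable(0.0, name='weight')             # trainable parameters
b = tf.Variable(0.0, name='bias')

y_hat = w * x + b                               # the model's prediction
loss = tf.reduce_mean(tf.square(y_hat - y))     # mean-squared-error loss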

  5. Training/Learning: Learning in DNNs is normally based on the gradient descent algorithm (it will be dealt with in detail in Chapter 2, Regression), where the aim is to find the training variables (weights/biases) such that the error or loss (as defined by the user in step 4) is minimized. This is achieved by initializing the variables and using run():
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ...
    sess.run(...)
    ...
  6. Evaluating the model: Once the network is trained, we evaluate it by making predictions on the validation data and the test data. The evaluations give us an estimate of how well our model fits the dataset, and thus help us avoid the common mistakes of overfitting and underfitting. Once we are satisfied with our model, we can deploy it in production.
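Putting steps 4 to 6 together, the following self-contained sketch trains the small linear-regression graph from above with gradient descent and then evaluates its loss on held-out test data. The synthetic data, learning rate, and epoch count are all arbitrary choices for illustration:

import numpy as np
import tensorflow as tf

# synthetic data: y is roughly 3x + 1 plus noise
x_all = np.random.rand(200).astype(np.float32)
y_all = 3.0 * x_all + 1.0 + 0.1 * np.random.randn(200).astype(np.float32)
x_train, y_train = x_all[:160], y_all[:160]
x_test, y_test = x_all[160:], y_all[160:]

x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0)
b = tf.Variable(0.0)
loss = tf.reduce_mean(tf.square(w * x + b - y))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op, feed_dict={x: x_train, y: y_train})
    # evaluate the trained model on the held-out test data
    print(sess.run(loss, feed_dict={x: x_test, y: y_test}))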

There's more

In TensorFlow 1.3, a new feature called TensorFlow Estimators was added. TensorFlow Estimators make the task of creating neural network models even easier; Estimators are a higher-level API that encapsulates the process of training, evaluation, prediction, and serving. They provide the option of either using pre-made Estimators or writing your own custom Estimators. With pre-made Estimators, one no longer has to worry about building the computational graph or creating a session; the Estimator handles it all.

At present, TensorFlow Estimators provide six pre-made Estimators. Another advantage of using the pre-made Estimators is that they automatically create summaries that can be visualised on TensorBoard. More details about Estimators are available at https://www.tensorflow.org/programmers_guide/estimators.
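As a brief illustration (an assumption for this sketch, not code from this chapter), a pre-made Estimator such as tf.estimator.DNNClassifier can be configured in a few lines; the feature name, layer sizes, and random data below are hypothetical:

import numpy as np
import tensorflow as tf

# made-up data: 100 samples with 4 features and 3 classes
features = {'x': np.random.rand(100, 4).astype(np.float32)}
labels = np.random.randint(3, size=100)

feature_columns = [tf.feature_column.numeric_column('x', shape=[4])]
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                        hidden_units=[10, 10],
                                        n_classes=3)

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x=features, y=labels, batch_size=16, num_epochs=None, shuffle=True)
classifier.train(input_fn=train_input_fn, steps=200)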
