Any deep learning network consists of four important components: the dataset, defining the model (network structure), training/learning, and prediction/evaluation. We can do all of these in TensorFlow; let's see how:
- Dataset: DNNs depend on large amounts of data. The data can be collected, generated, or taken from the standard datasets that are available. TensorFlow supports three main methods of reading the data. Some of the datasets we will use to train the models built in this book are as follows:
- MNIST: This is one of the most widely used databases of handwritten digits (0-9). It consists of a training set of 60,000 examples and a test set of 10,000 examples. The dataset is maintained at Yann LeCun's home page (http://yann.lecun.com/exdb/mnist/). The dataset is included in the TensorFlow library in tensorflow.examples.tutorials.mnist (a short loading sketch follows this list).
- CIFAR10: This dataset contains 60,000 32 x 32 color images in 10 classes, with 6,000 images per class. The training set contains 50,000 images and the test set contains 10,000 images. The ten classes of the dataset are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The data is maintained by the Computer Science Department, University of Toronto (https://www.cs.toronto.edu/~kriz/cifar.html).
- WORDNET: This is a lexical database of English. It contains nouns, verbs, adverbs, and adjectives grouped into sets of cognitive synonyms (synsets), that is, words that represent the same concept; for example, shut and close, or car and automobile, are grouped into unordered sets. It contains 155,287 words organized in 117,659 synsets, for a total of 206,941 word-sense pairs. The data is maintained by Princeton University (https://wordnet.princeton.edu/).
- ImageNET: This is an image dataset organized according to the WORDNET hierarchy (only nouns at present). Each meaningful concept (synset) is described by multiple words or word phrases. Each synset is represented on average by 1,000 images. At present, it has 21,841 synsets and a total of 14,197,122 images. Since 2010, an annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been organized to classify images into one of 1,000 object categories. The work is sponsored by Princeton University, Stanford University, A9, and Google (http://www.image-net.org/).
- YouTube-8M: This is a large-scale labelled video dataset consisting of millions of YouTube videos. It has about 7 million YouTube video URLs classified into 4,716 classes, organized into 24 top-level categories. It also provides preprocessing support and frame-level features. The dataset is maintained by Google Research (https://research.google.com/youtube8m/).
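Since MNIST ships with the TensorFlow 1.x library, it can be loaded in a couple of lines. The following is a minimal sketch; the directory name MNIST_data/ is just an assumed local download path:

# A minimal sketch of loading MNIST from the TensorFlow 1.x library;
# 'MNIST_data/' is an assumed local download directory.
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
print(mnist.train.num_examples)   # 55,000 (5,000 of the 60,000 are held out as mnist.validation)
print(mnist.test.num_examples)    # 10,000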
Reading the data: The data can be read in three ways in TensorFlow--feeding through feed_dict, reading from files, and using preloaded data. We will be using the components described in this recipe throughout the book for reading and feeding the data. In the next steps, you will learn each one of them.
- Feeding: In this case, data is provided while running each step, using the feed_dict argument in the run() or eval() function call. This is done with the help of placeholders, and this method allows us to pass NumPy arrays of data. Consider the following piece of TensorFlow code:
...
y = tf.placeholder(tf.float32)
x = tf.placeholder(tf.float32)
...
with tf.Session() as sess:
    X_Array = ...  # some NumPy array
    Y_Array = ...  # another NumPy array
    loss = ...
    sess.run(loss, feed_dict={x: X_Array, y: Y_Array})
...
Here, x and y are the placeholders; using them, we pass on the array containing X values and the array containing Y values with the help of feed_dict.
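Putting these pieces together, here is a minimal runnable sketch of the feeding mechanism; the toy sum of squared differences stands in for a real loss, and the NumPy arrays are hypothetical stand-ins for actual data:

# A minimal, runnable sketch of feeding data through feed_dict.
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
loss = tf.reduce_sum(tf.square(x - y))   # stand-in for a real loss

X_Array = np.array([1.0, 2.0, 3.0])
Y_Array = np.array([1.5, 2.5, 3.5])

with tf.Session() as sess:
    result = sess.run(loss, feed_dict={x: X_Array, y: Y_Array})
    print(result)   # 0.75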
- Reading from files: This method is used when the dataset is very large, to ensure that not all of the data occupies memory at once (imagine the 60 GB YouTube-8M dataset). The process of reading from files can be done in the following steps:
- A list of filenames is created, using either a string tensor such as ["file0", "file1"], a comprehension such as [("file%d" % i) for i in range(2)], or the files = tf.train.match_filenames_once('*.JPG') function.
filename_queue = tf.train.string_input_producer(files)
# where files is the list of filenames created above
This function also provides options to shuffle the filenames and to set a maximum number of epochs. The whole list of filenames is added to the queue for each epoch. If the shuffling option is selected (shuffle=True), the filenames are shuffled within each epoch.
Next, a reader is defined to read records from the filename queue; for CSV files, each record is then decoded into columns:
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
record_defaults = [[1], [1], [1]]  # default value (and type) for each of the three columns
col1, col2, col3 = tf.decode_csv(value, record_defaults=record_defaults)
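For reference, the individual pieces above can be combined into a small end-to-end pipeline. The following is a minimal sketch, assuming a hypothetical CSV file data.csv whose rows contain three integer columns; note that the queue runners must be started explicitly inside the session:

# A minimal sketch of the file-reading pipeline for an assumed 'data.csv'.
import tensorflow as tf

filename_queue = tf.train.string_input_producer(['data.csv'], num_epochs=1, shuffle=False)
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
col1, col2, col3 = tf.decode_csv(value, record_defaults=[[1], [1], [1]])
features = tf.stack([col1, col2, col3])

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)  # launch the reading threads
    try:
        while not coord.should_stop():
            print(sess.run(features))  # one row per call
    except tf.errors.OutOfRangeError:
        pass  # the single epoch is exhausted
    finally:
        coord.request_stop()
        coord.join(threads)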
- Preloaded data: This is used when the dataset is small and can be loaded fully into memory. For this, we can store the data in either a constant or a variable. When using a variable, we need to set the trainable flag to False so that the data does not change during training. As TensorFlow constants:
# Preloaded data as constant
training_data = ...
training_labels = ...
with tf.Session() as sess:
    x_data = tf.constant(training_data)
    y_data = tf.constant(training_labels)
...
# Preloaded data as Variables
training_data = ...
training_labels = ...
with tf.Session() as sess:
    data_x = tf.placeholder(dtype=training_data.dtype, shape=training_data.shape)
    data_y = tf.placeholder(dtype=training_labels.dtype, shape=training_labels.shape)
    x_data = tf.Variable(data_x, trainable=False, collections=[])
    y_data = tf.Variable(data_y, trainable=False, collections=[])
...
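When the variable form is used, the actual arrays are passed only once, through the placeholders, when the variables are initialized. The following is a minimal sketch with a small hypothetical NumPy array standing in for the real training data:

# A minimal sketch of initializing a preloaded-data Variable; the data is fed
# just once, at initialization time, and never changes during training.
import numpy as np
import tensorflow as tf

training_data = np.random.rand(100, 3).astype(np.float32)  # hypothetical data

data_x = tf.placeholder(dtype=training_data.dtype, shape=training_data.shape)
x_data = tf.Variable(data_x, trainable=False, collections=[])

with tf.Session() as sess:
    sess.run(x_data.initializer, feed_dict={data_x: training_data})
    print(sess.run(x_data).shape)  # (100, 3)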
Conventionally, the data is divided into three parts--training data, validation data, and test data.
- Defining the model: A computational graph is built that describes the network structure. It involves specifying the hyperparameters, variables, and placeholders; the sequence in which information flows from one set of neurons to another; and a loss/error function. You will learn more about computational graphs in a later section of this chapter.
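As a concrete illustration, the following is a minimal sketch of such a graph: a hypothetical one-neuron linear model with a mean-squared-error loss (the models built in this book are, of course, larger):

# A minimal sketch of defining a model graph: y_hat = W*x + b with an MSE loss.
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 1])   # input placeholder
y = tf.placeholder(tf.float32, shape=[None, 1])   # target placeholder
W = tf.Variable(tf.zeros([1, 1]))                 # trainable weight
b = tf.Variable(tf.zeros([1]))                    # trainable bias
y_hat = tf.matmul(x, W) + b                       # model output
loss = tf.reduce_mean(tf.square(y_hat - y))       # loss/error function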
- Training/Learning: The learning in DNNs is normally based on the gradient descent algorithm (it will be dealt with in detail in Chapter 2, Regression), where the aim is to find the training variables (weights/biases) such that the error or loss (as defined by the user in step 2) is minimized. This is achieved by initializing the variables and using run():
with tf.Session() as sess:
    ...
    sess.run(...)
    ...
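A complete, minimal training sketch looks like the following; the single-weight model and the toy data (y = 2x) are assumptions made purely for illustration:

# A minimal, self-contained training sketch: fit a single weight w by gradient descent.
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
w = tf.Variable(0.0)                                  # trainable variable
loss = tf.reduce_mean(tf.square(w * x - y))           # squared-error loss
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

X_Array = np.array([1.0, 2.0, 3.0], dtype=np.float32)  # toy inputs
Y_Array = np.array([2.0, 4.0, 6.0], dtype=np.float32)  # toy targets (y = 2x)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(200):
        sess.run(train_op, feed_dict={x: X_Array, y: Y_Array})
    print('learned w:', sess.run(w))   # should approach 2.0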
- Evaluating the model: Once the network is trained, we evaluate it using predict() on the validation data and test data. The evaluations give us an estimate of how well our model fits the dataset, and thus help us avoid the common mistakes of overfitting and underfitting. Once we are satisfied with the model, we can deploy it in production.
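With the core TensorFlow API (predict() belongs to the higher-level estimator APIs), evaluation typically amounts to running an accuracy or loss operation on the held-out data. The following is a minimal sketch; the logits and one-hot labels are hypothetical stand-ins for a real model's outputs and a real test set:

# A minimal sketch of computing classification accuracy on held-out data.
import numpy as np
import tensorflow as tf

logits = tf.placeholder(tf.float32, shape=[None, 3])   # model outputs (assumed)
labels = tf.placeholder(tf.float32, shape=[None, 3])   # one-hot test labels (assumed)
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

test_logits = np.array([[2.0, 0.1, 0.3], [0.2, 0.5, 1.7]], dtype=np.float32)
test_labels = np.array([[1, 0, 0], [0, 1, 0]], dtype=np.float32)

with tf.Session() as sess:
    print(sess.run(accuracy, feed_dict={logits: test_logits,
                                        labels: test_labels}))   # 0.5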