Here, we use Keras to define a network that recognizes MNIST handwritten digits. We start with a very simple neural network and then progressively improve it.
Keras provides suitable libraries to load the dataset and split it into a training set, X_train, used for training our net, and a test set, X_test, used for assessing its performance. The data is converted into float32 to support GPU computation and normalized to [0, 1]. In addition, we load the true labels into Y_train and Y_test respectively and perform a one-hot encoding on them. Let's see the code:
from __future__ import print_function
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
from keras.utils import np_utils
np.random.seed(1671) # for reproducibility
# network and training
NB_EPOCH = 200
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 10 # number of outputs = number of digits
OPTIMIZER = SGD() # SGD optimizer, explained later in this chapter
N_HIDDEN = 128
VALIDATION_SPLIT = 0.2 # how much TRAIN is reserved for VALIDATION
# data: shuffled and split between train and test sets
#
(X_train, y_train), (X_test, y_test) = mnist.load_data()
#X_train is 60000 rows of 28x28 values --> reshaped to 60000 x 784
RESHAPED = 784
#
X_train = X_train.reshape(60000, RESHAPED)
X_test = X_test.reshape(10000, RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# normalize
#
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, NB_CLASSES)
Y_test = np_utils.to_categorical(y_test, NB_CLASSES)
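As a quick illustration of what the one-hot encoding does (using the np_utils module imported above), the digit 5 becomes a vector of ten entries that is zero everywhere except in position 5:
# illustrative check: the label 5 one-hot encoded over 10 classes
# prints [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]
print(np_utils.to_categorical([5], NB_CLASSES))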
The input layer has one neuron associated with each pixel in the image, for a total of 28 x 28 = 784 neurons, one for each pixel in the MNIST images.
Typically, the values associated with each pixel are normalized in the range [0, 1] (which means that the intensity of each pixel is divided by 255, the maximum intensity value). The output is 10 classes, one for each digit.
The final layer has 10 neurons, one per class, with the activation function softmax, which is a generalization of the sigmoid function. Softmax squashes a k-dimensional vector of arbitrary real values into a k-dimensional vector of real values in the range (0, 1) that sum to 1. In our case, it normalizes the 10 answers provided by the dense layer with 10 neurons into a probability distribution over the digit classes:
# 10 outputs
# final stage is softmax
model = Sequential()
model.add(Dense(NB_CLASSES, input_shape=(RESHAPED,)))
model.add(Activation('softmax'))
model.summary()
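To see concretely what softmax does, here is a small standalone NumPy sketch (not part of the network definition) that squashes an arbitrary 3-dimensional vector into values in (0, 1) that sum to 1:
# standalone illustration of softmax (not part of the model)
scores = np.array([2.0, 1.0, 0.1])
softmax = np.exp(scores) / np.sum(np.exp(scores))
print(softmax)        # roughly [0.659 0.242 0.099] -- values in (0, 1)
print(softmax.sum())  # 1.0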
Once we define the model, we have to compile it so that it can be executed by the Keras backend (either Theano or TensorFlow). There are a few choices to be made during compilation:
- We need to select the optimizer, that is, the specific algorithm used to update the weights while we train our model
- We need to select the objective function, that is, the function used by the optimizer to navigate the space of weights (frequently, objective functions are called loss functions, and the process of optimization is defined as a process of loss minimization)
- We need to select the metrics used to evaluate the trained model
Some common choices for the objective function (a complete list of Keras objective functions is at https://keras.io/objectives/) are as follows:
- MSE: This is the mean squared error between the predictions and the true values. Mathematically, if $\hat{Y}$ is a vector of n predictions and Y is the vector of n observed values, then they satisfy the following equation: $MSE = \frac{1}{n}\sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2$
This objective function averages all the mistakes made over the predictions, and if a prediction is far from the true value, the squaring operation makes this distance even more evident.
- Binary cross-entropy: This is the binary logarithmic loss. Suppose that our model predicts p while the target is t; then the binary cross-entropy is defined as follows: $L(p, t) = -t \log(p) - (1 - t) \log(1 - p)$
This objective function is suitable for binary labels prediction.
- Categorical cross-entropy: This is the multiclass logarithmic loss. If the target is $t_{i,j}$ and the prediction is $p_{i,j}$, then the categorical cross-entropy is this: $L_i = -\sum_{j} t_{i,j} \log(p_{i,j})$
This objective function is suitable for multiclass label predictions. It is also the default choice in association with softmax activation (a small NumPy sketch of all three losses follows this list).
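Here is the small NumPy sketch promised above; the numbers are purely illustrative and use NumPy rather than the Keras backend:
# illustrative computations of the three losses with NumPy
y_true = np.array([1.0, 0.0, 0.0])   # one-hot target
y_pred = np.array([0.7, 0.2, 0.1])   # predicted probabilities
# MSE: mean of squared differences
mse = np.mean((y_pred - y_true) ** 2)
# binary cross-entropy for a single prediction p with target t
t, p = 1.0, 0.7
binary_ce = -(t * np.log(p) + (1 - t) * np.log(1 - p))
# categorical cross-entropy: -sum over classes of t * log(p)
categorical_ce = -np.sum(y_true * np.log(y_pred))
print(mse, binary_ce, categorical_ce)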
Some common choices for metrics (a complete list of Keras metrics is at https://keras.io/metrics/) are as follows:
- Accuracy: This is the proportion of correct predictions with respect to the targets
- Precision: This denotes how many selected items are relevant for a multilabel classification
- Recall: This denotes how many relevant items are selected for a multilabel classification
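As a quick hypothetical illustration of the difference between the two: suppose our model flags 8 items as positive, 6 of which are actually positive, out of 10 positives in total:
# hypothetical example: 8 items selected, 6 of them correct,
# 10 actually-positive items in the whole dataset
true_positives = 6.0
selected = 8.0        # items the model labeled as positive
relevant = 10.0       # items that are actually positive
precision = true_positives / selected   # 0.75 of selected items are relevant
recall = true_positives / relevant      # 0.6 of relevant items are selected
print(precision, recall)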
Metrics are similar to objective functions, the only difference being that they are not used for training a model, but only for evaluating it. Compiling a model in Keras is easy:
model.compile(loss='categorical_crossentropy', optimizer=OPTIMIZER, metrics=['accuracy'])
Once the model is compiled, it can then be trained with the fit() function, which specifies a few parameters:
- epochs: This is the number of times the model is exposed to the training set. At each iteration, the optimizer tries to adjust the weights so that the objective function is minimized.
- batch_size: This is the number of training instances observed before the optimizer performs a weight update.
Training a model in Keras is very simple. Suppose we want to iterate for NB_EPOCH steps:
history = model.fit(X_train, Y_train,
batch_size=BATCH_SIZE, epochs=NB_EPOCH,
verbose=VERBOSE, validation_split=VALIDATION_SPLIT)
We reserved part of the training set for validation: the key idea is that we hold out a portion of the training data to measure the performance on the validation set while training. This is a good practice to follow for any machine learning task, and one we will adopt in all our examples.
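The fit() function returns a History object whose history attribute records, epoch by epoch, the loss and the chosen metrics on both the training and validation portions. For example (the exact key names depend on the Keras version; older releases use 'acc'/'val_acc', newer ones 'accuracy'/'val_accuracy'):
# inspect the per-epoch curves collected during training
print(history.history.keys())
# e.g. validation accuracy after the final epoch
print(history.history['val_acc'][-1])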
Once the model is trained, we can evaluate it on the test set, which contains new, unseen examples. In this way, we can get the minimal value reached by the objective function and the best value reached by the evaluation metric.
Note that the training set and the test set are, of course, rigorously separated. There is no point in evaluating a model on an example that has already been used for training. Learning is essentially a process intended to generalize to unseen observations and not to memorize what is already known:
score = model.evaluate(X_test, Y_test, verbose=VERBOSE)
print("Test score:", score[0])
print('Test accuracy:', score[1])
So, congratulations, you have just defined your first neural network in Keras. A few lines of code, and your computer is able to recognize handwritten numbers. Let's run the code and see what the performance is.