The MNIST dataset is included in keras and can be accessed using the dataset_mnist() function:
- Let's load the data into the R environment:
# Load the keras package before accessing the dataset
library(keras)
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
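Before preprocessing, it is worth checking the dimensions of the arrays we just loaded. This is a quick sanity check; the commented values assume the standard MNIST split of 60,000 training and 10,000 test images:
# Inspect the raw arrays (illustrative check)
dim(x_train)     # 60000 28 28
dim(x_test)      # 10000 28 28
length(y_train)  # 60000
range(x_train)   # pixel intensities from 0 to 255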
- Our training data is a three-dimensional array of the form (images, width, height), where each image is a 28 x 28 grid of pixel intensities. We'll reshape each image into a one-dimensional vector of length 784 (28 x 28) and rescale the pixel values from the 0-255 range to the 0-1 range:
# Reshape each 28 x 28 image into a vector of length 784
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# Rescale pixel values from [0, 255] to [0, 1]
x_train <- x_train/255
x_test <- x_test/255
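Optionally, we can confirm that each image is now a flat numeric vector of 784 values lying between 0 and 1:
# Verify the reshape and rescaling (illustrative check)
dim(x_train)    # 60000 784
range(x_train)  # 0 1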
- Our target data is an integer vector containing values from 0 to 9. We need to one-hot encode the target variable to convert it into a binary matrix format. We use the to_categorical() function from keras to do this:
y_train <- to_categorical(y_train,10)
y_test <- to_categorical(y_test,10)
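As a quick check, the first few rows of y_train now form a binary matrix with one column per digit and a single 1 marking each label:
# Inspect the one-hot encoded labels (illustrative check)
head(y_train, 3)
dim(y_train)  # 60000 10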
- Now, we can build the model. We use the Sequential API from keras to configure this model. Note that in the first layer's configuration, the input_shape argument is the shape of the input data; that is, each input is a numeric vector of length 784 representing a flattened grayscale image. The dropout layers in between randomly zero out a fraction of the activations during training, which helps reduce overfitting. The final layer outputs a length 10 numeric vector (probabilities for each digit from 0 to 9) using a softmax activation function:
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = 'softmax')
Let's look at the details of the model:
summary(model)
Here's the model's summary:
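The parameter counts reported in the summary follow from each dense layer's weights plus biases; the dropout layers add no parameters. As a rough check:
# Dense layer parameters = (number of inputs * units) + units (biases)
784 * 256 + 256  # first hidden layer: 200,960
256 * 128 + 128  # second hidden layer: 32,896
128 * 10 + 10    # output layer: 1,290
# Total trainable parameters: 235,146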
- Next, we compile our model by providing some appropriate arguments, such as the loss function, optimizer, and metrics. Here, we have used the rmsprop optimizer. Unlike plain gradient descent, which applies a single fixed learning rate to every parameter, rmsprop adapts the step size for each parameter by dividing the gradient by a running average of its recent magnitudes; this damps oscillations along steep directions and allows larger steps along flatter ones, so training typically converges faster (see the sketch after the following code block):
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
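To make the optimizer's behaviour concrete, here is a minimal, purely illustrative sketch of a single rmsprop update for one weight. The gradient and hyperparameter values are hypothetical, and keras performs all of this internally:
# Illustrative rmsprop update for a single weight (not part of the recipe)
rho <- 0.9; lr <- 0.001; eps <- 1e-7
grad <- 0.5  # hypothetical gradient for this weight
v <- 0       # running average of squared gradients
v <- rho * v + (1 - rho) * grad^2
w <- 1.0     # hypothetical weight value
w <- w - lr * grad / (sqrt(v) + eps)
w            # the weight after one update step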
- Now, let's fit the training data to the configured model. Here, we've set the number of epochs to 30, the batch size to 128, and the validation split to 0.2, which holds out 20% of the training data to evaluate the model at the end of each epoch:
history <- model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)
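Before plotting, it can be useful to inspect what the history object stores; note that the metric names are acc/val_acc in older keras versions and accuracy/val_accuracy in newer ones:
# Inspect the metrics recorded during training
str(history$metrics)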
- Next, we visualize the model metrics by plotting the accuracy and loss values stored in the history variable. Let's start with the model's accuracy:
# Plot the accuracy of the training data
plot(history$metrics$acc, main = "Model Accuracy", xlab = "epoch", ylab = "accuracy",
     col = "blue", type = "l")
# Add the accuracy of the validation data to the plot
lines(history$metrics$val_acc, col = "green")
# Add a legend
legend("bottomright", c("train", "validation"), col = c("blue", "green"), lty = c(1, 1))
The following plot shows the model's accuracy on the training and validation data:
Now, let's plot the model's loss:
# Plot the loss of the training data
plot(history$metrics$loss, main = "Model Loss", xlab = "epoch", ylab = "loss",
     col = "blue", type = "l")
# Add the loss of the validation data to the plot
lines(history$metrics$val_loss, col = "green")
# Add a legend
legend("topright", c("train", "validation"), col = c("blue", "green"), lty = c(1, 1))
The following plot shows the model's loss on the training and validation data:
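As an alternative to building these plots manually, the keras R package also provides a plot() method for the history object, which draws the accuracy and loss curves together:
plot(history)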
- Now, we predict the classes for the test data instances using the trained model:
# Note: predict_classes() works with older versions of the keras R package;
# in newer versions, use model %>% predict(x_test) %>% k_argmax() instead
model %>% predict_classes(x_test)
- Let's check the accuracy of the model on the test data:
model %>% evaluate(x_test, y_test)
The following output shows the model's metrics on the test data:
Here, we got an accuracy of around 97.9%.
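As a sanity check, the same test accuracy can be recomputed by comparing the predicted classes against the original integer labels, which remain un-encoded in mnist$test$y:
# Recompute test accuracy from the predicted classes (illustrative check)
predictions <- model %>% predict_classes(x_test)
mean(predictions == mnist$test$y)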