Handwritten digit recognition using CNNs
We begin by exploring the handwritten digit dataset.
Get started with exploring MNIST
The MNIST dataset from http://yann.lecun.com/exdb/mnist/ consists of a training set of 60,000 samples and a testing set of 10,000 samples. As mentioned previously, the images were originally taken from NIST, and then centered and resized to the same height and width (28 * 28 pixels).
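To make the 28 * 28 size concrete: each image holds 784 pixel values, and tabular formats typically store one image as a single flat row of that length. The following is a minimal sketch of flattening and restoring such an image. It uses a synthetic random matrix rather than real MNIST data, and the variable names (img, pixels, restored) and the row-by-row pixel ordering are assumptions made for illustration.

```r
# Synthetic 28 x 28 "image" with integer grey levels in 0..255
# (random values for illustration only, not real MNIST data)
set.seed(42)
img <- matrix(sample(0:255, 28 * 28, replace = TRUE), nrow = 28, ncol = 28)

# Flatten the image row by row into a single vector of pixel values
# (row-by-row ordering is an assumption of this sketch)
pixels <- as.vector(t(img))
length(pixels)  # 784

# Restore the 28 x 28 matrix from the flat vector, filling by row
restored <- matrix(pixels, nrow = 28, ncol = 28, byrow = TRUE)

# Check that the round trip recovers the original matrix
identical(restored, img)
```

The same reshaping idea applies whenever a flat row of pixel values needs to be turned back into an image for plotting or for feeding into a convolutional layer.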
Rather than handling the ubyte files, train-images-idx3-ubyte.gz and train-labels-idx1-ubyte.gz, from the preceding website and merging them, we use a well-formatted dataset from the Kaggle competition Digit Recognizer, https://www.kaggle.com/c/digit-recognizer/. We can download the training dataset, train.csv, directly from https://www.kaggle.com/c/digit-recognizer/data. It is the only labeled dataset provided on the site, and we will use it to train classification models, evaluate them, and make predictions. Now let's load it up:
> data <- read.csv("train.csv")
> dim(data...