In this two-part post, we will get our hands dirty with deep learning by solving a real-world problem. The problem we are going to tackle is the German Traffic Sign Recognition Benchmark (GTSRB): recognizing traffic signs from images. Solving this problem is essential for self-driving cars to operate on roads. Here are representative images for each of the traffic sign classes in the GTSRB dataset:
The dataset features 43 different signs under various sizes, lighting conditions, and occlusions and is very similar to real-life data. The training set includes about 39,000 images while the test set has around 12,000 images. Images are not guaranteed to be of fixed dimensions and the sign is not necessarily centered in each image. Each image contains about a 10% border around the actual traffic sign.
Our approach to solving the problem will, of course, be the very successful convolutional neural networks (CNNs). CNNs are multi-layered, feed-forward neural networks that are able to learn task-specific invariant features in a hierarchical manner. You can read more about them in the very readable Neural Networks and Deep Learning book by Michael Nielsen. Chapter 6 is the essential reading. Just read the first section in this chapter if you are in a hurry.
A note about the code: The recommended way to run the code in this post, and to experiment with it, is a Jupyter notebook. A notebook with slightly improved code is available here.
We will implement our CNNs in Keras. Keras is a deep learning library written in Python that allows us to do quick experimentation. Let's start by installing Keras and the other libraries (use the Anaconda Python distribution):
$ sudo pip install keras scikit-image pandas
Then download 'Images and annotations' for the training and test sets from the GTSRB website and extract them into a folder. Also download the 'Extended annotations including class ids' file for the test set. Organize these files so that the directory structure looks like this:
GTSRB
├── GT-final_test.csv
├── Final_Test
│   └── Images
└── Final_Training
    └── Images
        ├── 00000
        ├── 00001
        ├── ...
        ├── 00041
        └── 00042
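Before going further, it can help to confirm that the files landed where the later code expects them. The following is a minimal sketch and not part of the original code: the helper name `check_gtsrb_layout` is hypothetical, and it assumes the dataset was extracted into a folder named `GTSRB` exactly as in the tree above.

```python
import os

NUM_CLASSES = 43  # the GTSRB dataset has 43 sign classes


def check_gtsrb_layout(root='GTSRB'):
    """Return the list of expected paths that are missing under `root`."""
    expected = [
        os.path.join(root, 'GT-final_test.csv'),
        os.path.join(root, 'Final_Test', 'Images'),
    ]
    # One training folder per class: 00000, 00001, ..., 00042
    expected += [
        os.path.join(root, 'Final_Training', 'Images', '%05d' % c)
        for c in range(NUM_CLASSES)
    ]
    return [p for p in expected if not os.path.exists(p)]
```

Calling `check_gtsrb_layout()` from the directory that contains `GTSRB` should return an empty list when the layout matches; any path it returns is one to fix before moving on.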
As you can see from the representative images above, images vary a lot in illumination. They also vary in size. So, let's write a function to do histogram equalization in HSV color space and resize the images to a standard size:
import numpy as np
from skimage import color, exposure, transform

NUM_CLASSES = 43
IMG_SIZE = 48


def preprocess_img(img):
    # Histogram normalization in the V channel of HSV
    hsv = color.rgb2hsv(img)
    hsv[:, :, 2] = exposure.equalize_hist(hsv[:, :, 2])
    img = color.hsv2rgb(hsv)

    # Central square crop
    min_side = min(img.shape[:-1])
    centre = img.shape[0] // 2, img.shape[1] // 2
    img = img[centre[0] - min_side // 2:centre[0] + min_side // 2,
              centre[1] - min_side // 2:centre[1] + min_side // 2,
              :]

    # Rescale to standard size
    img = transform.resize(img, (IMG_SIZE, IMG_SIZE))

    # Roll color axis to axis 0: (height, width, 3) -> (3, height, width)
    img = np.rollaxis(img, -1)

    return img
Input image to preprocess_img (scaled 4x).
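To make the equalization step less opaque, here is a minimal NumPy sketch of the idea behind histogram equalization: every pixel value is mapped through the normalized cumulative histogram of the channel, which spreads a narrow band of intensities over the full range. It is an illustration under simplifying assumptions (a single channel with values in [0, 1], 256 bins), not scikit-image's implementation, and the name `equalize_channel` is hypothetical.

```python
import numpy as np


def equalize_channel(channel, nbins=256):
    """Histogram-equalize a 2-D array whose values lie in [0, 1]."""
    hist, bin_edges = np.histogram(channel.ravel(), bins=nbins, range=(0.0, 1.0))
    cdf = hist.cumsum().astype('float64')
    cdf /= cdf[-1]  # normalize so the cumulative histogram ends at 1.0
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2.0
    # Map each pixel through the CDF, interpolating linearly between bins
    out = np.interp(channel.ravel(), bin_centers, cdf)
    return out.reshape(channel.shape)


# A dark, low-contrast channel: all values squeezed into [0.1, 0.3]
rng = np.random.RandomState(0)
dark = 0.1 + 0.2 * rng.rand(48, 48)
equalized = equalize_channel(dark)
```

After equalization the values of `equalized` cover nearly the whole [0, 1] range, which is why poorly lit signs become easier to tell apart.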
Let's preprocess all the training images and store them in NumPy arrays. We'll also get the labels of the images from their paths. We'll convert the targets to one-hot form, as required by Keras:
from skimage import io
import os
import glob


def get_class(img_path):
    # The class id is the name of the folder that contains the image
    return int(img_path.split('/')[-2])


root_dir = 'GTSRB/Final_Training/Images/'
imgs = []
labels = []

all_img_paths = glob.glob(os.path.join(root_dir, '*/*.ppm'))
np.random.shuffle(all_img_paths)
for img_path in all_img_paths:
    img = preprocess_img(io.imread(img_path))
    label = get_class(img_path)
    imgs.append(img)
    labels.append(label)

X = np.array(imgs, dtype='float32')
# Make one hot targets
Y = np.eye(NUM_CLASSES, dtype='uint8')[labels]
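The `np.eye(...)[labels]` idiom can look cryptic at first, so here is a toy illustration with 4 classes instead of 43. The example path is made up purely to show how the class id is parsed from the folder name; it is not a file the post refers to.

```python
import numpy as np

labels = [2, 0, 3]   # toy class ids
num_classes = 4      # toy value; the post uses NUM_CLASSES = 43

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with a list of labels picks one row per label.
one_hot = np.eye(num_classes, dtype='uint8')[labels]
print(one_hot)
# [[0 0 1 0]
#  [1 0 0 0]
#  [0 0 0 1]]

# argmax along each row recovers the original class ids
recovered = one_hot.argmax(axis=1)

# Parsing a class id from a (hypothetical) image path
example_path = 'GTSRB/Final_Training/Images/00012/example.ppm'
class_id = int(example_path.split('/')[-2])  # 12
```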
Let's now define our models. We'll use a feed-forward network with 6 convolutional layers followed by a fully connected hidden layer. We'll also use dropout layers in between. Dropout regularizes the networks, i.e. it prevents the network from overfitting.
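As a rough picture of what a dropout layer does during training, the sketch below zeroes a random fraction of the activations and rescales the survivors so that the expected value stays the same. This is the so-called inverted dropout variant, written in plain NumPy as an assumption-laden illustration; the function name `dropout` is hypothetical and this is not Keras's own code.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction `rate` of the activations (training only)."""
    if not training or rate == 0.0:
        return activations  # dropout is a no-op at test time
    keep_prob = 1.0 - rate
    mask = rng.rand(*activations.shape) < keep_prob
    # Rescale the kept units so the expected activation is unchanged
    return activations * mask / keep_prob


rng = np.random.RandomState(0)
x = np.ones((1000, 100))
y = dropout(x, rate=0.2, rng=rng)
```

Roughly 20% of the entries of `y` are zero and the rest are 1.25, so the mean stays close to 1. Because each unit can vanish at any step, the network cannot rely on any single feature, which is the regularizing effect.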
All our layers have relu activations except the output layer. The output layer uses a softmax activation, as it has to output a probability for each of the classes.
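For reference, both activations fit in a few lines of NumPy. This is a minimal sketch for intuition rather than the code Keras runs; the function names are chosen here for illustration.

```python
import numpy as np


def relu(x):
    """Pass positive values through unchanged and clamp negatives to zero."""
    return np.maximum(x, 0.0)


def softmax(logits):
    """Turn a row of raw scores into probabilities that sum to 1."""
    # Subtracting the row maximum avoids overflow and leaves the result unchanged
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


scores = np.array([[2.0, 1.0, 0.1]])
probs = softmax(scores)
```

The largest score gets the largest probability, and each row of `probs` sums to 1, which is what lets us read the output layer as a distribution over the 43 sign classes.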
Sequential is a Keras container for a linear stack of layers. Each of the layers in the model needs to know the input shape it should expect, but it is enough to specify input_shape for the first layer of the Sequential model. The rest of the layers do automatic shape inference.
To attach a fully connected layer (that is, dense layer) to a convolutional layer, we will have to reshape/flatten the output of the conv layer. This is achieved with the Flatten layer.
Go through the documentation of Keras (the relevant documentation is here and here) to understand what the parameters for each of the layers mean.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD


def cnn_model():
    # Keras 1.x API with channels-first input: (channels, height, width)
    model = Sequential()

    model.add(Convolution2D(32, 3, 3, border_mode='same',
                            input_shape=(3, IMG_SIZE, IMG_SIZE),
                            activation='relu'))
    model.add(Convolution2D(32, 3, 3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    model.add(Convolution2D(64, 3, 3, border_mode='same', activation='relu'))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    model.add(Convolution2D(128, 3, 3, border_mode='same', activation='relu'))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(NUM_CLASSES, activation='softmax'))
    return model
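To see what the automatic shape inference works out for this model, the arithmetic can be traced by hand. The sketch below assumes stride-1 convolutions, the default 'valid' border mode where none is given, and pooling that rounds down; the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel=3, border_mode='valid'):
    # 'same' pads so the size is unchanged; 'valid' loses kernel - 1 pixels
    return size if border_mode == 'same' else size - kernel + 1


def pool_out(size, pool=2):
    # Non-overlapping pooling divides the size, rounding down
    return size // pool


size = 48  # IMG_SIZE
shapes = []
for filters in (32, 64, 128):
    size = conv_out(size, border_mode='same')
    size = conv_out(size, border_mode='valid')
    size = pool_out(size)
    shapes.append((filters, size, size))

flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes)     # [(32, 23, 23), (64, 10, 10), (128, 4, 4)]
print(flattened)  # 2048
```

Under these assumptions, the Flatten layer hands a vector of 2048 values to the 512-unit dense layer.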
Before training the model, we need to configure its learning algorithm and compile it. We need to specify the loss function to minimize, the optimizer, and the metrics to report:
from keras.optimizers import SGD
model = cnn_model()
# let's train the model using SGD + momentum
lr = 0.01
sgd = SGD(lr=lr, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
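To give a feel for what these settings mean, here is a small NumPy sketch of the categorical cross-entropy loss and of one SGD step with Nesterov momentum and learning-rate decay. The update rule follows my understanding of how the Keras SGD optimizer of that era applies `decay`, `momentum`, and `nesterov`; treat that as an assumption and not as the library's code. The function names and the toy quadratic objective are illustrative only.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


def sgd_step(w, v, grad, iteration, lr=0.01, decay=1e-6,
             momentum=0.9, nesterov=True):
    """One parameter update; returns the new weights and velocity."""
    lr_t = lr / (1.0 + decay * iteration)  # learning rate shrinks slowly
    v = momentum * v - lr_t * grad         # velocity accumulates past gradients
    if nesterov:
        w = w + momentum * v - lr_t * grad  # look-ahead (Nesterov) step
    else:
        w = w + v
    return w, v


# Toy objective: f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for i in range(300):
    w, v = sgd_step(w, v, grad=w, iteration=i)
```

On this toy problem `w` ends up very close to the minimum at zero. For the loss, a model that spreads its probability uniformly over the 43 classes scores log(43), about 3.76, which is a useful baseline to compare against when training starts.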
The next step is for us to actually train our model, which is what we will do in part 2. We will also evaluate our model and then do some augmentation to improve it.
In this post, we have used the Keras deep learning framework to implement CNNs in Python. In part 2, you will see how we achieve performance close to human-level performance. You will also see how to improve the accuracy of the model using augmentation of the training data.
Sasank Chilamkurthy works at Qure.ai. His work involves deep learning on medical images obtained from radiology and pathology. He completed his undergraduate degree in Mumbai at the Indian Institute of Technology, Bombay. He can be found on GitHub here.