The current implementation shows promising results, but it can be tuned further. One option is to train on a larger and more diverse dataset.
Another improvement would be to switch to a more powerful pretrained image classification model, such as InceptionV3 or InceptionResNetV2.
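As a rough sketch of what such a swap might look like, the snippet below builds a frozen feature extractor from `keras.applications`. The helper name `build_feature_extractor`, the 299x299 input size, and the use of global average pooling are assumptions made for illustration, not details of the implementation described here. Weights default to `None` so the sketch runs without a download; passing `weights="imagenet"` would load the pretrained weights.

```python
from tensorflow import keras

# Candidate backbones named in the text; both ship with keras.applications.
BACKBONES = {
    "InceptionV3": keras.applications.InceptionV3,
    "InceptionResNetV2": keras.applications.InceptionResNetV2,
}


def build_feature_extractor(name="InceptionV3", weights=None,
                            input_shape=(299, 299, 3)):
    """Hypothetical helper: return a frozen backbone that emits one feature vector per image."""
    if name not in BACKBONES:
        raise ValueError(f"Unknown backbone: {name!r}")
    backbone = BACKBONES[name](
        include_top=False,      # drop the ImageNet classification head
        weights=weights,        # None = random init; "imagenet" = pretrained
        input_shape=input_shape,
        pooling="avg",          # global average pooling over the last feature map
    )
    backbone.trainable = False  # use the backbone as a fixed feature extractor
    return backbone


extractor = build_feature_extractor("InceptionV3")
print(extractor.output_shape)
```

Because each backbone produces a feature vector of a different length, any layer that consumes these features would need its input size adjusted to match.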
We could also use the Keras functional API to build an ensemble network with a more complex architecture. A further step would be to feed temporal information to the network and test whether it can learn to colorize videos as well.
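The ensemble idea can be illustrated with a minimal functional API sketch. Everything specific here is an assumption chosen for illustration: the single-channel 64x64 input, the two-channel output, the tiny convolutional branches, and the names `build_branch` and `build_ensemble`. The point is only to show how the functional API lets several branches share one input and merge their predictions.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_branch(inputs, filters, name):
    """Hypothetical toy branch: maps the input to a 2-channel color prediction."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu",
                      name=f"{name}_conv")(inputs)
    return layers.Conv2D(2, 3, padding="same", activation="tanh",
                         name=f"{name}_out")(x)


def build_ensemble(input_shape=(64, 64, 1), branch_filters=(16, 32)):
    """Share one input across several branches and average their outputs."""
    inputs = keras.Input(shape=input_shape, name="gray_input")
    branches = [
        build_branch(inputs, filters, f"branch{i}")
        for i, filters in enumerate(branch_filters)
    ]
    # Average merges the branch predictions element-wise.
    outputs = layers.Average(name="ensemble_average")(branches)
    return keras.Model(inputs, outputs, name="colorizer_ensemble")


model = build_ensemble()
print(model.output_shape)
```

Averaging is only one way to combine branches; concatenating their outputs and adding a further convolution would let the network learn how to weight them.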