Computer vision and image recognition are often considered the first areas where deep learning achieved breakthroughs. Handwritten digit recognition has become the "Hello World" of this field, and a common evaluation set for image classification algorithms and techniques is MNIST, a scanned-document dataset constructed from National Institute of Standards and Technology (NIST) data. The M stands for modified, meaning that the data has been pre-processed for ease of use in machine learning.
Some examples from MNIST are shown as follows:
The best performance reported so far on the MNIST dataset is a 0.21% error rate, achieved with CNNs. Details can be found in the paper Regularization of Neural Networks using DropConnect, published at the International Conference on Machine Learning (ICML) in 2013. Comparable results, for example 0.23%, 0.27%, and 0.31%, have also been obtained with CNNs and deep neural networks. In contrast, traditional machine learning algorithms with sophisticated feature engineering yielded error rates ranging from 0.52% to 7.6%, achieved using support vector machines (SVMs) and pairwise linear classifiers respectively.
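To put these percentages in perspective, the short sketch below converts each quoted error rate into a count of misclassified images. It assumes the rates are measured on the standard MNIST test set of 10,000 images; the function name `misclassified_count` is introduced here purely for illustration.

```python
# Assumption: error rates are measured on the standard 10,000-image MNIST test set.
MNIST_TEST_SIZE = 10_000


def misclassified_count(error_rate_percent, n_images=MNIST_TEST_SIZE):
    """Convert an error rate in percent into a number of misclassified images."""
    return round(error_rate_percent / 100 * n_images)


# Error rates quoted in the text above
quoted_rates = [0.21, 0.23, 0.27, 0.31, 0.52, 7.6]
for rate in quoted_rates:
    wrong = misclassified_count(rate)
    print(f"{rate}% error -> about {wrong} of {MNIST_TEST_SIZE} test images wrong")
```

Under that assumption, the gap between 0.21% and 0.52% corresponds to roughly 21 versus 52 misread digits.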
Beyond image recognition (such as the well-known face recognition), deep learning has been applied to more challenging tasks, including:
- Image-based search engines, which rely heavily on deep learning techniques for image classification and image similarity encoding.
- Machine vision, with self-driving cars as an example, which interprets 360° camera views to make decisions in real time.
- Color restoration from black and white photos. The examples after color recovery at http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/extra.html are impressive.
- Image generation, including handwriting, cat images, video game images, or practically any kind of image you can name. For example, we use an interesting playground, https://www.cs.toronto.edu/~graves/handwriting.html (developed by Alex Graves from the University of Toronto), to render the title of this book as handwriting in three different styles:
Natural language processing (NLP) is another field where deep learning dominates modern solutions. Recall that deep learning models with recurrent architectures are well suited to sequential inputs, such as natural language text. In recent years, deep learning has greatly helped to improve:
- Machine translation, for example the sentence-based Google Neural Machine Translation (GNMT) system, which uses deep RNNs to improve accuracy and fluency
- Sentiment analysis, information retrieval, theme detection, and many other common NLP applications, where deep learning models have achieved state-of-the-art performance thanks to word embedding techniques
- Text generation, where RNNs learn the intricate relationships between words (including punctuation) in sentences and then write text of their own, becoming an author or a virtual Shakespeare
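As a rough illustration of the text generation item, the sketch below shows how raw text could be turned into next-token training pairs of the kind a recurrent model learns from, with punctuation kept as separate tokens. It only prepares data and does not train an RNN; the helper names `tokenize` and `next_token_pairs`, the window size, and the sample sentence are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Split text into lowercase words and individual punctuation marks."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text.lower())


def next_token_pairs(tokens, window=3):
    """Build (context, target) pairs: `window` tokens in, the following token out."""
    return [
        (tokens[i:i + window], tokens[i + window])
        for i in range(len(tokens) - window)
    ]


sample = "To be, or not to be."  # hypothetical sample sentence
tokens = tokenize(sample)
for context, target in next_token_pairs(tokens):
    print(context, "->", target)
```

Each pair asks the model to predict the token that follows a short context, which is why punctuation marks can be learned and generated just like words.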
Image caption generation, also known as image to text, couples recent breakthroughs in computer vision and NLP. It uses CNNs to detect and classify objects in images and to assign labels to those objects. It then applies RNNs to describe those labels in a comprehensible sentence. The following examples are captured from the web demo at http://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/ (developed by Andrej Karpathy from Stanford University):
Similarly, sound and speech form another field of sequential learning, where machine learning algorithms are applied to predict time series or label sequence data. Speech recognition has been revolutionized by deep learning, and deep learning-based products such as Apple's Siri, Amazon's Alexa, Google Home, Skype Translator, and many others are now "invading" our lives, in a good way for sure. Besides writing text like an author, deep learning models can also compose music. For example, Francesco Marchesani from the Polytechnic University of Milan was able to train RNNs to produce Chopin's music.
Additionally, deep learning excels in many video use cases. It has contributed significantly to virtual reality through its capability for accurate motion detection, and to advances in real-time behavior analysis in surveillance videos. Scientists from Google, DeepMind, and Oxford even built a computer lip reader called LipNet, achieving a success rate of 93%.
Besides supervised and unsupervised learning, deep learning is heavily used in reinforcement learning. Robots that can handle objects, climb stairs, and operate in kitchens are no longer new to us. Recently, Google's AlphaGo received widespread media coverage for beating the world's elite Go players. Nowadays, everybody looks forward to seeing self-driving cars on the market in just one or two years. These have all benefited from the advances of deep learning in reinforcement learning. Oh, and don't forget that computers have been taught to play the game Flappy Bird!
We have not even mentioned bioinformatics, drug discovery, recommendation systems in e-commerce, finance (especially the stock market), insurance, and the Internet of Things (IoT). In fact, the list of deep learning applications is already long and keeps getting longer.
I hope this section has excited you about deep learning and its power to provide better solutions to many of the machine learning problems we face. Artificial intelligence has a brighter future thanks to the advances in deep learning.
So what are we waiting for? Let's get started with handwritten digit recognition!