There are many misconceptions, half-truths, and downright misleading opinions on deep learning. Here are some common mis-conceptions regarding deep learning:
- Artificial intelligence means deep learning and replaces all other techniques
- Deep learning requires a PhD-level understanding of mathematics
- Deep learning is hard to train, almost an art form
- Deep learning requires lots of data
- Deep learning has poor interpretability
- Deep learning needs GPUs
The following paragraphs discuss these statements, one by one.
Deep learning is not artificial intelligence and does not replace all other machine learning algorithms. It is only one family of algorithms in machine learning. Despite the hype, deep learning probably accounts for less than 1% of the machine learning projects in production right now. Most of the recommendation engines and online adverts that you encounter when you browse the net are not powered by deep learning. Most models used internally by companies to manage their subscribers, for example churn analysis, are not deep learning models. The models used by credit institutions to decide who gets credit do not use deep learning.
Deep learning does not require a deep understanding of mathematics unless your interest is in researching new deep learning algorithms and specialized architectures. Most practitioners use existing deep learning techniques on their data by taking an existing architecture and modifying it for their work. This does not require a deep mathematical foundation, the mathematics used in deep learning are taught at high school level throughout the world. In fact, we demonstrate this in Chapter 3, Deep Learning Fundamentals, where we build an entire neural network from basic code in less than 70 lines of code!
Training deep learning models is difficult but it is not an art form. It does require practice, but the same problems occur over and over again. Even better, there is often a prescribed fix for that problem, for example, if your model is overfitting, add regularization, if your model is not training well, build a more complex model and/or use data augmentation. We will look at this in more depth in Chapter 6, Tuning and Optimizing Models.
There is a lot of truth to the statement that deep learning requires lots of data. However, you may still be able to apply deep learning to the problem by using a pre-trained network, or creating more training data from existing data (data augmentation). We will look at these in later Chapter 6, Tuning and Optimizing Models and Chapter 11, The Next Level in Deep Learning.
Deep learning models are difficult to interpret. By this, we mean being able to explain how the models came to their decision. This is a problem in many machine learning algorithms, not just deep learning. In machine learning, generally there is an inverse relationship between accuracy and interpretation – the more accurate the model needs to be, the less interpretable it is. For some tasks, for example, online advertising, interpretability is not important and there is little cost from being wrong, so the most powerful algorithm is preferred. In some cases, for example, credit scoring, interpretability may be required by law; people could demand an explanation of why they were denied credit. In other cases, such as medical diagnoses, interpretability may be important for a doctor to see why the model decided someone had a disease.
If interpretability is important, some methods can be applied to machine learning models to get an understanding of why they predicted the output for an instance. Some of them work by perturbing the data (that is, making slight changes to it) and trying to find what variables are most influential in the model coming to its decision. One such algorithm is called LIME (Local Interpretable Model-Agnostic Explanations). (Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2016.) This has been implemented in many languages including R; there is a package called lime. We will use this package in Chapter 6, Tuning and Optimizing Models.
Finally, while deep learning models can run on CPUs, the truth is that any real work requires a workstation with a GPU. This does not mean that you need to go out and purchase one, as you can use cloud-computing to train your models. In Chapter 10, Running Deep Learning Models in the Cloud, will look at using AWS, Azure, and Google Cloud to train deep learning models.