In order to understand what deep learning is, we have to explore what Artificial Intelligence (AI) is and how it relates to machine learning and deep learning. Conceptually, deep learning is a form of machine learning, whilst machine learning is a form of AI:
In computer science, Artificial intelligence, is a form of intelligence demonstrated by machines. AI is a term that was invented in the 1950s by scientists doing research in computer science. AI encompasses a large set of algorithms that shows behavior that is more intelligent than the standard software we build for our computers.
Some algorithms demonstrate intelligent behavior but aren't capable of improving themselves. One group of algorithms, called machine learning algorithms, can learn from sample data that you show them and generate models that you then use on similar data to make predictions.
Within the group of machine learning algorithms there's the sub-category of deep learning algorithms. This group of algorithms uses models that are inspired by the structure and function of a biological brain found in humans or animals.
Both machine learning and deep learning learn from sample data that you provide. When we build regular programs, we write business rules by using different language constructs, such as if-statements, loops, and functions. The rules are fixed. In machine learning, we feed samples and an expected answer into an algorithm that then learns the rules that connect the samples to the expected answers:
There are two major components in machine learning: machine learning models and machine learning algorithms.
When you use machine learning to build a program, you first choose a machine learning model. A machine learning model is a mathematical equation containing trainable parameters that transforms input into a predicted answer. This model shapes the rules that the computer will learn. For example: predicting the miles per gallon for a car requires that you model reality in a certain way. Classifying whether a credit card transaction is fraudulent requires a different model.
The representation of the input could be the properties of a car turned into a vector. The output of the model could be the miles per gallon for a car. In the case of credit card fraud, the input could be the properties of the user account and the transaction that was done. The output representation could be a score between 0 and 1 where a value close to 1 means that the transaction should be rejected.
The mathematical transformation in the machine learning model is controlled by a set of parameters that need to be trained for the transformation to produce the correct output representation.
This is where the second part, the machine learning algorithm comes into play. To find the best values for the parameters in the machine learning model we need to perform a multi-step process:
- Initially, the computer will choose a random value for each of the unknown parameters in your model
- It will then use sample data to make an initial prediction
- This prediction is fed into a loss function together with the expected output to get feedback regarding how well the model is performing
- This feedback is then used by the machine learning algorithm to find better values for the parameters in the model
These steps are repeated many times to find the best possible values for the parameters in the model. If all goes well, you end up with a model that is capable of making accurate predictions for many complicated situations.
The fact that we can learn rules from examples is a useful concept. There are many situations where we can't use simple rules to solve a particular problem. For example: credit card fraud cases come in many shapes and sizes. Sometimes a hacker slowly breaks the system injecting smaller hacks over time and then stealing the money. Other times hackers simply try to steal a lot of money in one attempt. A rule-based program would become too hard to maintain because it would need to contain a lot of code to handle all different fraud cases. Machine learning is an elegant way to solve this problem. It understands how to handle different kinds of credit card fraud without a lot of code. And it is also capable of making a judgment on cases that it hasn't seen before, within reasonable boundaries.