In short, deep learning algorithms are mostly ANNs that learn better representations of large-scale datasets and build models on top of those representations. Nowadays, the field is not limited to ANNs; a great many theoretical advances, along with software and hardware improvements, were necessary for us to get to this point. In this regard, Ian Goodfellow et al. (Deep Learning, MIT Press, 2016) defined deep learning as follows:
"Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones."
Let's take an example: suppose we want to develop a predictive analytics model, such as an animal recognizer, where our system has to resolve two problems:
- To classify whether an image represents a cat or a dog
- To cluster images of dogs and cats
If we solve the first problem using a typical ML method, we must define the facial features (ears, eyes, whiskers, and so on) and write a method to identify which features (typically nonlinear) are more important when classifying a particular animal.
However, we cannot address the second problem at the same time, because classical ML algorithms for clustering images (such as k-means) cannot handle nonlinear features. Deep learning algorithms take these two problems one step further: the most important features for classification or clustering are determined and extracted automatically.
In contrast, when using a classical ML algorithm, we would have to provide the features manually. In summary, the deep learning workflow would be as follows:
- A deep learning algorithm would first identify the edges that are most relevant when clustering cats or dogs. It would then try to find various combinations of shapes and edges hierarchically. This step is hierarchical feature learning.
- After several iterations, hierarchical identification of complex concepts and features is carried out. Then, based on the identified features, the DL algorithm automatically decides which of them are statistically the most significant for classifying the animal. This step is feature extraction.
- Finally, it takes out the label column and performs unsupervised training using autoencoders (AEs) to extract the latent features, which are then passed to k-means for clustering.
- The cluster assignment hardening (CAH) loss and the reconstruction loss are then jointly optimized towards an optimal clustering assignment. Deep Embedded Clustering (see https://arxiv.org/pdf/1511.06335.pdf for more) is an example of such an approach. We will discuss deep learning-based clustering approaches in Chapter 11, Discussion, Current Trends, and Outlook.
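The last two steps can be sketched in code. The following is a minimal illustration (not the book's implementation) that uses Keras for the autoencoder and scikit-learn's k-means; the dummy data, the layer sizes, and the omission of the joint CAH/reconstruction optimization are simplifying assumptions:

```python
# Minimal sketch: learn latent features with an autoencoder, then cluster
# them with k-means. The DEC-style joint optimization of the CAH and
# reconstruction losses is omitted for brevity.
import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras import layers, models

# Dummy data: 1,000 flattened 64x64 grayscale images (stand-ins for cat/dog photos)
x = np.random.rand(1000, 64 * 64).astype("float32")

# Encoder: compress each image into a 32-dimensional latent vector
inputs = layers.Input(shape=(64 * 64,))
encoded = layers.Dense(256, activation="relu")(inputs)
latent = layers.Dense(32, activation="relu")(encoded)

# Decoder: reconstruct the image from the latent vector
decoded = layers.Dense(256, activation="relu")(latent)
outputs = layers.Dense(64 * 64, activation="sigmoid")(decoded)

autoencoder = models.Model(inputs, outputs)
encoder = models.Model(inputs, latent)

# Unsupervised training: minimize the reconstruction loss (no labels involved)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=10, batch_size=64, verbose=0)

# Cluster the latent features into two groups (ideally cats versus dogs)
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(encoder.predict(x))
print(clusters[:20])
```

On real images, the quality of the clusters depends almost entirely on how well the latent features separate the two animals, which is exactly what the joint CAH/reconstruction optimization is meant to improve.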
Up to this point, we have seen that deep learning systems are able to recognize what an image represents. A computer does not see an image as we see it because it only knows the position of each pixel and its color. Using deep learning techniques, the image is divided into various layers of analysis.
At the lowest level, the software analyzes, for example, a grid of a few pixels, with the task of detecting a particular color or shade. If it finds something, it informs the next level, which then checks whether that color belongs to a larger form, such as a line. The process continues up through the levels until the network understands what is shown in the image. The following diagram shows what we have discussed in the case of an image classification system:
A deep learning system at work on a dog versus cat classification problem
More precisely, the preceding image classifier can be built layer by layer, as follows:
- Layer 1: The algorithm starts identifying the dark and light pixels from the raw images
- Layer 2: The algorithm then identifies edges and shapes
- Layer 3: It then learns more complex shapes and objects
- Layer 4: The algorithm then learns which objects define the face of a dog or a cat
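As a rough illustration of this layer-by-layer stacking, the following Keras sketch (the image size and filter counts are arbitrary assumptions, not values from the book) builds convolutional stages that mirror the progression above, topped by a binary cat-versus-dog output:

```python
# Minimal sketch of a layered image classifier (cat versus dog).
# Filter counts and sizes are illustrative, not tuned values.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    # Layer 1: low-level filters that respond to light/dark transitions
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Layer 2: combinations of those responses, that is, edges and simple shapes
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Layer 3: more complex shapes and object parts
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Layer 4: whole-object evidence feeding a binary cat/dog decision
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Each convolutional stage builds on the output of the previous one, which is exactly the nested hierarchy of concepts that Goodfellow's definition refers to.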
Although this is a very simple classifier, software capable of doing these types of things is now widespread and is found, for example, in systems for recognizing faces or for searching by image on Google. Such software is based on deep learning algorithms.
In contrast, we cannot build such applications using a linear ML algorithm, since these algorithms are incapable of handling nonlinear image features. Also, with classical ML approaches we typically have to tune only a handful of hyperparameters. However, when neural networks are brought to the party, things become much more complex. Across its layers, a network can have millions or even billions of parameters to train, so many that the cost function becomes non-convex.
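To get a feel for how quickly the number of trainable parameters grows, here is a small back-of-the-envelope count for a fully connected network; the layer sizes are arbitrary examples, not values from the text:

```python
# Back-of-the-envelope count of trainable parameters (weights + biases)
# for a small fully connected network; layer sizes are arbitrary examples.
layer_sizes = [64 * 64 * 3, 1024, 512, 2]  # input pixels -> hidden -> hidden -> output

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out  # weight matrix plus one bias per output unit
    print(f"{n_in:>6} -> {n_out:<5}: {params:,} parameters")
    total += params
print(f"Total: {total:,} parameters")  # roughly 13 million for this tiny network
```

Even this toy network has on the order of 13 million weights, and modern architectures with many more layers quickly reach hundreds of millions or billions.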
Another reason is that the activation functions used in the hidden layers are nonlinear, so the cost function is non-convex. We will discuss this phenomenon in more detail in later chapters, but for now let's take a quick look at ANNs.