What this book covers
Chapter 1, Introduction to Deep Learning, covers how deep learning has gained its popularity over the last decade and is now growing even faster than machine learning due to its enhanced functionalities. This chapter starts with an introduction of the real-life applications of Artificial Intelligence, the associated challenges, and how effectively Deep learning is able to address all of these. The chapter provides an in-depth explanation of deep learning by addressing some of the major machine learning problems such as, The curse of dimensionality, Vanishing gradient problem, and the likes. To get started with deep learning for the subsequent chapters, the classification of various deep learning networks is discussed in the latter part of this chapter. This chapter is primarily suitable for readers, who are interested to know the basics of deep learning without getting much into the details of individual deep neural networks.
Chapter 2, Distributed Deep Learning for Large - Scale Data, explains that big data and deep learning are undoubtedly the two hottest technical trends in recent days. Both of them are critically interconnected and have shown tremendous growth in the past few years. This chapter starts with how deep learning technologies can be furnished with massive amount of unstructured data to facilitate extraction of valuable hidden information out of them. Famous technological companies such as Google, Facebook, Apple, and the like are using this large-scale data in their deep learning projects to train some aggressively deep neural networks in a smarter way. Deep neural networks, however, show certain challenges while dealing with Big data. This chapter provides a detailed explanation of all these challenges. The latter part of the chapter introduces Hadoop, to discuss how deep learning models can be implemented using Hadoop's YARN and its iterative Map-reduce paradigm. The chapter further introduces Deeplearning4j, a popular open source distributed framework for deep learning and explains its various components.
Chapter 3 , Convolutional Neural Network, introduces Convolutional neural network (CNN), a deep neural network widely used by top technological industries in their various deep learning projects. CNN comes with a vast range of applications in various fields such as image recognition, video recognition, natural language processing, and so on. Convolution, a special type of mathematical operation, is an integral component of CNN. To get started, the chapter initially discusses the concept of convolution with a real-life example. Further, an in-depth explanation of Convolutional neural network is provided by describing each component of the network. To improve the performance of the network, CNN comes with three most important parameters, namely, sparse connectivity, parameter sharing, and equivariant representation. The chapter explains all of these to get a better grip on CNN. Further, CNN also possesses few crucial hyperparameters, which help in deciding the dimension of output volume of the network. A detailed discussion along with the mathematical relationship among these hyperparameters can be found in this chapter. The latter part of the chapter focuses on distributed convolutional neural networks and shows its implementation using Hadoop and Deeplearning4j.
Chapter 4, Recurrent Neural Network, explains that it is a special type of neural network that can work over long sequences of vectors to produce different sequences of vectors. Recently, they have become an extremely popular choice for modeling sequences of variable length. RNN has been successfully implemented for various applications such as speech recognition, online handwritten recognition, language modeling, and the like. The chapter provides a detailed explanation of the various concepts of RNN by providing essential mathematical relations and visual representations. RNN possesses its own memory to store the output of the intermediate hidden layer. Memory is the core component of the recurrent neural network, which has been discussed in this chapter with an appropriate block diagram. Moreover, the limitations of uni-directional recurrent neural networks are provided, and to overcome the same, the concept of bidirectional recurrent neural network (BRNN) is introduced. Later, to address the problem of vanishing gradient, introduced in chapter 1, a special unit of RNN, called Long short-term Memory (LSTM) is discussed. In the end, the implementation of distributed deep recurrent neural network with Hadoop is shown with Deeplearning4j.
Chapter 5 , Restricted Boltzmann Machines, covers both the models discussed in chapters 3 and 4 and explains that they are discriminative models. A generative model called Restricted Boltzmann machine (RBM) is discussed in chapter 5. RBM is capable of randomly producing visible data values when hidden parameters are supplied to it. The chapter starts with introducing the concept of an Energy-based model, and explains how Restricted Boltzmann machines are related to it. Furthermore, the discussion progresses towards a special type of RBM known as Convolutional Restricted Boltzmann machine, which is a combination of both Convolution and Restricted Boltzmann machines, and facilitates in the extraction of the features of high dimensional images.
Deep Belief networks (DBN), a widely used multilayer network composed of several Restricted Boltzmann machines gets introduced in the latter part of the chapter. This part also discusses how DBN can be implemented in a distributed environment using Hadoop. The implementation of RBM as well as distributed DBN using Deeplearning4j is discussed in the end of the chapter.
Chapter 6, Autoencoders, introduces one more generative model called autoencoder, which is generally used for dimensionality reduction, feature learning, or extraction. The chapter starts with explaining the basic concept of autoencoder and its generic block diagram. The core structure of an autoencoder is basically divided into two parts, encoder and decoder. The encoder maps the input to the hidden layer, whereas the decoder maps the hidden layer to the output layer. The primary concern of a basic autoencoder is to copy certain aspects of the input layer to the output layer. The next part of the chapter discusses a type of autoencoder called sparse autoencoder, which is based on the distributed sparse representation of the hidden layer. Going further, the concept of deep autoencoder, comprising multiple encoders and decoders is explained in-depth with an appropriate example and block diagram. As we proceed, denoising autoencoder and stacked denoising autoencoder are explained in the latter part of the chapter. In conclusion, chapter 6 also shows the implementation of stacked denoising autoencoder and deep autoencoder in Hadoop using Deeplearning4j.
Chapter 7 , Miscellaneous Deep Learning Operations using Hadoop, focuses, mainly,on the design of three most commonly used machine learning applications in distributed environment. The chapter discusses the implementation of large-scale video processing, large-scale image processing, and natural language processing (NLP) with Hadoop. It explains how the large-scale video and image datasets can be deployed in Hadoop Distributed File System (HDFS) and processed with Map-reduce algorithm. For NLP, an in-depth explanation of the design and implementation is provided at the end of the chapter.