Introducing Autoencoders

In previous chapters, we have seen that neural networks are very powerful algorithms. The power of each network lies in its architecture, activation functions, and regularization terms, plus a few other features. Among the varieties of neural architectures, there is a very versatile one, especially useful for three tasks: detecting unknown events, detecting unexpected events, and reducing the dimensionality of the input space. This neural network is the autoencoder.

Architecture of the Autoencoder

The autoencoder (or autoassociator) is a multilayer feedforward neural network, trained to reproduce the input vector onto the output layer. Like many neural networks, it is trained using the gradient descent algorithm, or one of its modern variations, against a loss function, such as the Mean Squared Error (MSE). It can have as many hidden layers as desired. Regularization terms and other general parameters that are useful for avoiding overfitting or for improving the learning process can be applied here as well.

The only constraint on the architecture is that the number of input units must be the same as the number of output units, as the goal is to train the autoencoder to reproduce the input vector onto the output layer.

The simplest autoencoder has only three layers: one input layer, one hidden layer, and one output layer. More complex autoencoder structures might include additional hidden layers:

Figure 5.1 – A simple autoencoder
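To make the architecture concrete, here is a minimal Keras sketch of such a three-layer autoencoder (Keras is the library behind KNIME's deep learning integration). The sizes n = 30 and h = 5, the activation functions, and x_train are illustrative placeholders, not values prescribed by the book:

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

n = 30   # number of input features (illustrative value)
h = 5    # number of units in the hidden (bottleneck) layer (illustrative value)

# Input and output layers must have the same number of units,
# because the network is trained to reproduce its own input.
inputs = Input(shape=(n,))
hidden = Dense(h, activation="relu")(inputs)
outputs = Dense(n, activation="sigmoid")(hidden)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# x_train stands in for an already normalized training table;
# note that the input is also the training target.
x_train = np.random.rand(1000, n)
autoencoder.fit(x_train, x_train, epochs=50, batch_size=64)
```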

Autoencoders can be used for many different tasks. Let's first see how an autoencoder can be used for dimensionality reduction.

Reducing the Input Dimensionality with an Autoencoder

Let's consider an autoencoder with a very simple architecture: one input layer with n units; one output layer, also with n units; and one hidden layer with h units. If h < n, the autoencoder produces a compression of the input vector onto the hidden layer, reducing its dimensionality from n to h.

In this case, the first part of the network, moving the data from a vector of size n to a vector of size h, plays the role of the encoder. The second part of the network, reconstructing the input vector from the h-dimensional space back into the n-dimensional space, is the decoder. The compression rate is then n/h. The larger the value of n and the smaller the value of h, the higher the compression rate:

Figure 5.2 – Encoder and decoder subnetworks in a three-layer autoencoder
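To put numbers on the ratio: with, say, n = 30 input features and h = 5 hidden units (purely illustrative values), the input is compressed from 30 to 5 dimensions, and the compression rate is 30/5 = 6.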

When using the autoencoder for dimensionality reduction, the full network is first trained to reproduce the input vector onto the output layer. Then, before deployment, it is split into two parts: the encoder (input layer and hidden layer) and the decoder (hidden layer and output layer). The two subnetworks are stored separately.

Tip

If you are interested in the output of the bottleneck layer, you can configure the Keras Network Executor node to output the middle layer. Alternatively, you can split the network within the DL Python Network Editor node by writing a few lines of Python code.
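As an illustration of the second option, the script inside the DL Python Network Editor node could look roughly like the following. The variable names input_network and output_network follow the convention of KNIME's DL Python scripting nodes, and the layer index assumes the simple three-layer network from Figure 5.1; the details may differ between KNIME versions, so treat this as a sketch rather than a ready-made recipe.

```python
from tensorflow.keras.models import Model

# input_network: the trained autoencoder handed to this node by KNIME.
# Keep only the encoder part, from the input layer to the bottleneck.
bottleneck = input_network.layers[1].output   # hidden layer of the 3-layer network
encoder = Model(inputs=input_network.inputs, outputs=bottleneck)

# output_network: the network passed on to the downstream nodes.
output_network = encoder
```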

During the deployment phase, in order to compress an input record, we just pass it through the encoder and save the output of the hidden layer as the compressed record. Then, in order to reconstruct the original vector, we pass the compressed record through the decoder and save the output values of the output layer as the reconstructed vector.
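In plain Keras terms, and sticking with the illustrative three-layer model from the earlier sketch (so autoencoder, n, and h refer to that code), these two deployment steps could look as follows; new_records is a stand-in for normalized deployment data:

```python
import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

# Encoder: from the input layer to the hidden (bottleneck) layer.
encoder = Model(autoencoder.input, autoencoder.layers[1].output)

# Decoder: a new input of size h fed into the trained output layer.
decoder_input = Input(shape=(h,))
decoder = Model(decoder_input, autoencoder.layers[-1](decoder_input))

new_records = np.random.rand(10, n)              # stand-in for real deployment data

# Compression: the hidden-layer activations are the compressed records.
compressed = encoder.predict(new_records)        # shape (10, h)

# Reconstruction: map the compressed records back to the original space.
reconstructed = decoder.predict(compressed)      # shape (10, n)
```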

If a more complex structure is used for the autoencoder (for example, one with more than one hidden layer), one of the hidden layers must serve as the bottleneck, producing the compressed record and separating the encoder subnetwork from the decoder subnetwork.

Now, the key question with any data compression is: how faithfully can the original record be reconstructed? How much information is lost by using the output of the hidden layer instead of the original data vector? Of course, this all depends on how well the autoencoder performs and how large our error tolerance is.

During testing, when we apply the network to new data, we denormalize the output values and calculate the chosen error metric – for example, the Root Mean Square Error (RMSE) – between the original input data and the reconstructed data on the whole test set. This error value gives us a measure of the quality of the reconstructed data. Of course, the higher the compression rate, the higher the reconstruction error. The task thus becomes training the network to achieve acceptable performance within our error tolerance.
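As a rough sketch of that evaluation step in Python (the arrays are placeholders standing in for the denormalized test data and its reconstruction):

```python
import numpy as np

# Placeholders for the denormalized original test set and its reconstruction.
x_test_original = np.random.rand(200, 30)
x_test_reconstructed = x_test_original + np.random.normal(0.0, 0.05, size=(200, 30))

# RMSE between original and reconstructed data over the whole test set:
# a single number summarizing the quality of the reconstruction.
rmse = np.sqrt(np.mean((x_test_original - x_test_reconstructed) ** 2))
print(f"Reconstruction RMSE on the test set: {rmse:.4f}")
```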

Let's move on to the next application field of autoencoders: anomaly detection.

Detecting Anomalies Using an Autoencoder

In most classification or prediction problems, we have a set of examples covering all event classes and, based on this dataset, we train a model to classify events. However, sometimes the event class we want to predict is so rare and unexpected that no (or almost no) examples are available at all. In this case, we do not talk about classification or prediction but about anomaly detection.

An anomaly can be any rare, unexpected, or unknown event: a cardiac arrhythmia, a mechanical breakdown, or a fraudulent transaction, for example. Since no examples of anomalies are available in the training set, we need to use neural networks more creatively than for conventional classification. The autoencoder structure lends itself to such creative usage, as required for the solution of an anomaly detection problem (see, for example, A.G. Gebresilassie, Neural Networks for Anomaly (Outliers) Detection, https://blog.goodaudience.com/neural-networks-for-anomaly-outliers-detection-a454e3fdaae8).

Since no anomaly examples are available, the autoencoder is trained only on non-anomaly examples. Let's call these examples of the "normal" class. On a training set full of "normal" data, the autoencoder network is trained to reproduce the input feature vector onto the output layer.

The idea is that, when required to reproduce a vector of the "normal" class, the autoencoder is likely to perform a decent job because that is what it was trained to do. However, when required to reproduce an anomaly on the output layer, it will hopefully fail because it won't have seen this kind of vector throughout the whole training phase. Therefore, if we calculate the distance – any distance – between the original vector and the reproduced vector, we see a small distance for input vectors of the "normal" class and a much larger distance for input vectors representing an anomaly.

Thus, by setting a threshold, δ, we should be able to detect anomalies with the following rule:

IF d(x) < δ THEN x -> "normal"

IF d(x) ≥ δ THEN x -> "anomaly"

Here, d(x) is the reconstruction error for the input vector, x, and δ is the set threshold.
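As a short Python sketch of this rule (the per-record error metric, the threshold value, and all variable names are illustrative; in practice, δ would be chosen from the error distribution observed on "normal" validation data):

```python
import numpy as np

# Stand-ins: in a real workflow, reconstructions = autoencoder.predict(records).
records = np.random.rand(100, 30)
reconstructions = records + np.random.normal(0.0, 0.1, size=records.shape)

# Per-record reconstruction error d(x): here, the mean squared error per row.
errors = np.mean((records - reconstructions) ** 2, axis=1)

# Threshold delta, e.g. tuned on a validation set of "normal" records.
delta = 0.05
labels = np.where(errors < delta, "normal", "anomaly")
```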

This sort of solution has already been implemented successfully for fraud detection, as described in a blog post, Credit Card Fraud Detection using Autoencoders in Keras -- TensorFlow for Hackers (Part VII), by Venelin Valkov (https://medium.com/@curiousily/credit-card-fraud-detection-using-autoencoders-in-keras-tensorflow-for-hackers-part-vii-20e0c85301bd). In this chapter, we will use the same idea to build a similar solution using a different autoencoder structure.

Let's find out how the idea of an autoencoder can be used to detect fraudulent transactions.
