Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Java Deep Learning Cookbook

You're reading from   Java Deep Learning Cookbook Train neural networks for classification, NLP, and reinforcement learning using Deeplearning4j

Arrow left icon
Product type Paperback
Published in Nov 2019
Publisher Packt
ISBN-13 9781788995207
Length 304 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Rahul Raj Rahul Raj
Author Profile Icon Rahul Raj
Rahul Raj
Arrow right icon
View More author details
Toc

Table of Contents (14) Chapters Close

Preface 1. Introduction to Deep Learning in Java 2. Data Extraction, Transformation, and Loading FREE CHAPTER 3. Building Deep Neural Networks for Binary Classification 4. Building Convolutional Neural Networks 5. Implementing Natural Language Processing 6. Constructing an LSTM Network for Time Series 7. Constructing an LSTM Neural Network for Sequence Classification 8. Performing Anomaly Detection on Unsupervised Data 9. Using RL4J for Reinforcement Learning 10. Developing Applications in a Distributed Environment 11. Applying Transfer Learning to Network Models 12. Benchmarking and Neural Network Optimization 13. Other Books You May Enjoy

Determining the right activation function

The purpose of an activation function is to introduce non-linearity into a neural network. Non-linearity helps a neural network to learn more complex patterns. We will discuss some important activation functions, and their respective DL4J implementations.

The following are the activation functions that we will consider:

  • Tanh
  • Sigmoid
  • ReLU (short for Rectified Linear Unit)
  • Leaky ReLU
  • Softmax

In this recipe, we will walk through the key steps to decide the right activation functions for a neural network.

How to do it...

  1. Choose an activation function according to the network layers: We need to know the activation functions to be used for the input/hidden layers and output layers. Use ReLU for input/hidden layers preferably.
  2. Choose the right activation function to handle data impurities: Inspect the data that you feed to the neural network. Do you have inputs with a majority of negative values observing dead neurons? Choose the appropriate activation functions accordingly. Use Leaky ReLU if dead neurons are observed during training.
  3. Choose the right activation function to handle overfitting: Observe the evaluation metrics and their variation for each training period. Understand gradient behavior and how well your model performs on new unseen data.
  4. Choose the right activation function as per the expected output of your use case: Examine the desired outcome of your network as a first step. For example, the SOFTMAX function can be used when you need to measure the probability of the occurrence of the output class. It is used in the output layer. For any input/hidden layers, ReLU is what you need for most cases. If you're not sure about what to use, just start experimenting with ReLU; if that doesn't improve your expectations, then try other activation functions.

How it works...

For step 1, ReLU is most commonly used because of its non-linear behavior. The output layer activation function depends on the expected output behavior. Step 4 targets this too.

For step 2, Leaky ReLU is an improved version of ReLU and is used to avoid the zero gradient problem. However, you might observe a performance drop. We use Leaky ReLU if dead neurons are observed during training. Dead neurons are referred to as neurons with a zero gradient for all possible inputs, which makes them useless for training.

For step 3, the tanh and sigmoid activation functions are similar and are used in feed-forward networks. If you use these activation functions, then make sure you add regularization to network layers to avoid the vanishing gradient problem. These are generally used for classifier problems.

There's more...

The ReLU activation function is non-linear, hence, the backpropagation of errors can easily be performed. Backpropagation is the backbone of neural networks. This is the learning algorithm that computes gradient descent with respect to weights across neurons. The following are ReLU variations currently supported in DL4J:

  • ReLU: The standard ReLU activation function:
public static final Activation RELU
  • ReLU6: ReLU activation, which is capped at 6, where 6 is an arbitrary choice:
public static final Activation RELU6
  • RReLU: The randomized ReLU activation function:
public static final Activation RRELU
  • ThresholdedReLU: Threshold ReLU:
public static final Activation THRESHOLDEDRELU

There are a few more implementations, such as SeLU (short for the Scaled Exponential Linear Unit), which is similar to the ReLU activation function but has a slope for negative values.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image