TensorFlow Machine Learning Cookbook

Over 60 recipes to build intelligent machine learning systems with the power of Python

Product type Paperback
Published in Aug 2018
Publisher Packt
ISBN-13 9781789131680
Length 422 pages
Edition 2nd Edition
Authors (2):
Sujit Pal
Nick McClure

Table of Contents (13)

Preface
1. Getting Started with TensorFlow
2. The TensorFlow Way
3. Linear Regression
4. Support Vector Machines
5. Nearest-Neighbor Methods
6. Neural Networks
7. Natural Language Processing
8. Convolutional Neural Networks
9. Recurrent Neural Networks
10. Taking TensorFlow to Production
11. More with TensorFlow
12. Other Books You May Enjoy

Implementing activation functions

Activation functions are the key for neural networks to approximate non-linear outputs and adapt to non-linear features. They introduce non-linear operations into neural networks. If we are careful as to which activation functions are selected and where we put them, they are very powerful operations that we can tell TensorFlow to fit and optimize.
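To see why the non-linearity matters, consider a minimal sketch in plain Python (not TensorFlow). The weights w1, b1, w2, b2 and the helper names are hypothetical, chosen only for illustration. Two affine layers stacked without an activation collapse into a single affine function, whereas placing max(0,x) between them gives a function that is no longer a straight line:

```python
# Illustrative scalar "layers"; all weights are made-up values.
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5


def stacked_linear(x):
    """Two affine layers with no activation in between."""
    return w2 * (w1 * x + b1) + b2


def stacked_with_relu(x):
    """The same two layers with max(0, x) applied to the hidden value."""
    hidden = max(0.0, w1 * x + b1)
    return w2 * hidden + b2


def is_midpoint_preserved(fn, a, b):
    """An affine function maps the midpoint of a and b to the
    midpoint of fn(a) and fn(b); a non-linear one generally does not."""
    return abs(fn((a + b) / 2) - (fn(a) + fn(b)) / 2) < 1e-9


# Without an activation the stack is still affine.
print(is_midpoint_preserved(stacked_linear, -1.0, 1.0))
# With ReLU in between, the midpoint property breaks.
print(is_midpoint_preserved(stacked_with_relu, -1.0, 1.0))
```

The midpoint check is just one convenient way to detect a departure from a straight line; any pair of points that straddles the kink of the ReLU would show the same effect.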

Getting ready

When we start to use neural networks, we will use activation functions regularly because they are an essential part of any neural network. The goal of an activation function is to apply a non-linear transformation to a node's weighted input plus bias; the weights and biases themselves are adjusted during training. In TensorFlow, activation functions are non-linear operations that act on tensors. They operate in a similar way to the previous mathematical operations. Activation functions serve many purposes, but the main concept is that they introduce a non-linearity into the graph, and some of them also squash the outputs into a bounded range. Start a TensorFlow graph with the following commands:

import tensorflow as tf 
sess = tf.Session() 

How to do it...

The activation functions live in the neural network (nn) library in TensorFlow. Besides using built-in activation functions, we can also design our own using TensorFlow operations. We can import the predefined activation functions (import tensorflow.nn as nn) or be explicit and write tf.nn in our function calls. Here, we choose to be explicit with each function call:

  1. The rectified linear unit, known as ReLU, is the most common and basic way to introduce non-linearity into neural networks. This function is simply max(0,x). It is continuous, but not smooth. It appears as follows:
print(sess.run(tf.nn.relu([-3., 3., 10.]))) 
[  0.  3.  10.] 
  2. There are times when we will want to cap the linearly increasing part of the preceding ReLU activation function. We can do this by nesting the max(0,x) function in a min() function. TensorFlow's implementation is called the ReLU6 function and is defined as min(max(0,x),6). It is a version of the hard-sigmoid function, is computationally fast, and is less prone to vanishing (infinitesimally near zero) or exploding values. This will come in handy when we discuss deeper neural networks in Chapter 8, Convolutional Neural Networks, and Chapter 9, Recurrent Neural Networks. It appears as follows:
print(sess.run(tf.nn.relu6([-3., 3., 10.]))) 
[ 0. 3. 6.]
  3. The sigmoid function is the most common continuous and smooth activation function. It is also called a logistic function and has the form 1/(1 + exp(-x)). The sigmoid function is not used very often because of its tendency to zero out the backpropagation terms during training. It appears as follows:
print(sess.run(tf.nn.sigmoid([-1., 0., 1.]))) 
[ 0.26894143  0.5         0.7310586 ] 
We should be aware that some activation functions are not zero-centered, such as the sigmoid. This will require us to zero-mean data prior to using it in most computational graph algorithms.
  4. Another smooth activation function is the hyperbolic tangent. The hyperbolic tangent function is very similar to the sigmoid except that instead of having a range between 0 and 1, it has a range between -1 and 1. This function has the form of the ratio of the hyperbolic sine over the hyperbolic cosine. Another way to write this is (exp(x) - exp(-x))/(exp(x) + exp(-x)). This activation function is as follows:
print(sess.run(tf.nn.tanh([-1., 0., 1.]))) 
[-0.76159418  0.         0.76159418 ] 
  5. The softsign function also gets used as an activation function. The form of this function is x/(|x| + 1). The softsign function is a continuous (but not infinitely differentiable) approximation to the sign function. See the following code:
print(sess.run(tf.nn.softsign([-1., 0., 1.])))
[-0.5  0.   0.5] 
  6. Another function, the softplus function, is a smooth version of the ReLU function. The form of this function is log(exp(x) + 1). It appears as follows:
print(sess.run(tf.nn.softplus([-1., 0., 1.])))
[ 0.31326166  0.69314718  1.31326163] 
The softplus function goes to infinity as the input increases, whereas the softsign function goes to 1. As the input gets smaller, however, the softplus function approaches zero and the softsign function goes to -1.
  7. The Exponential Linear Unit (ELU) is very similar to the softplus function except that the bottom asymptote is -1 instead of 0. The form is (exp(x) - 1) if x < 0 else x. It appears as follows:
print(sess.run(tf.nn.elu([-1., 0., 1.])))
[-0.63212055  0.          1.        ] 
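As a cross-check on the formulas in the steps above, here is a minimal sketch that re-implements each function in plain Python using only the standard math module. It is not how TensorFlow computes them internally; the helper name apply_to is hypothetical, and these naive forms can overflow for very large inputs. The inputs are chosen to match the printed outputs in the steps, so the results should agree with them to about six decimal places:

```python
import math


def relu(x):
    return max(0.0, x)


def relu6(x):
    return min(max(0.0, x), 6.0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Ratio of the hyperbolic sine over the hyperbolic cosine.
    return math.sinh(x) / math.cosh(x)


def softsign(x):
    return x / (abs(x) + 1.0)


def softplus(x):
    return math.log(math.exp(x) + 1.0)


def elu(x):
    return math.exp(x) - 1.0 if x < 0 else x


def apply_to(fn, values):
    """Apply a scalar function element-wise and round for display."""
    return [round(fn(v), 6) for v in values]


print("relu    ", apply_to(relu, [-3., 3., 10.]))
print("relu6   ", apply_to(relu6, [-3., 3., 10.]))
print("sigmoid ", apply_to(sigmoid, [-1., 0., 1.]))
print("tanh    ", apply_to(tanh, [-1., 0., 1.]))
print("softsign", apply_to(softsign, [-1., 0., 1.]))
print("softplus", apply_to(softplus, [-1., 0., 1.]))
print("elu     ", apply_to(elu, [-1., 0., 1.]))
```

Small differences in the last digits relative to the TensorFlow output are expected, since TensorFlow works in 32-bit floats by default while Python floats are 64-bit.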

How it works...

These activation functions are ways that we can introduce non-linearities in neural networks or other computational graphs. It is important to note where in our network we are using activation functions. If the activation function has a range between 0 and 1 (sigmoid), then the computational graph can only output values between 0 and 1. If the activation functions sit in hidden layers between nodes, then we want to be aware of the effect that the range can have on our tensors as we pass them through. If our tensors were scaled to have a mean of zero, we will want to use an activation function that preserves as much variance as possible around zero. This would imply that we want to choose an activation function such as the hyperbolic tangent (tanh) or the softsign. If the tensors are all scaled to be positive, then we would ideally choose an activation function that preserves variance in the positive domain.
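The effect of the output range on zero-mean data can be illustrated with a small sketch in plain Python. The input values are made up for illustration and are symmetric around zero; the sketch compares the mean and range of the outputs for the sigmoid, tanh, and softsign functions:

```python
import math
from statistics import mean


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softsign(x):
    return x / (abs(x) + 1.0)


# Hypothetical zero-mean inputs, symmetric around zero.
inputs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

outputs = {
    "sigmoid": [sigmoid(v) for v in inputs],
    "tanh": [math.tanh(v) for v in inputs],
    "softsign": [softsign(v) for v in inputs],
}

for name, values in outputs.items():
    # The sigmoid shifts the mean to about 0.5 and stays inside (0, 1);
    # tanh and softsign stay centered around zero.
    print(name,
          "mean:", round(mean(values), 6),
          "min:", round(min(values), 6),
          "max:", round(max(values), 6))
```

For these inputs the sigmoid outputs are all positive with a mean near 0.5, while the tanh and softsign outputs remain centered around zero, which is the behavior the paragraph above describes.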

There's more...

Here are two graphs that illustrate the different activation functions. The following graphs show the ReLU, ReLU6, softplus, exponential linear unit (ELU), sigmoid, softsign, and hyperbolic tangent functions:

Figure 3: Activation functions of softplus, ReLU, ReLU6, and exponential LU

Here, we can see four of the activation functions: softplus, ReLU, ReLU6, and exponential linear unit (ELU). These functions flatten out to the left of zero and linearly increase to the right of zero, with the exception of ReLU6, which has a maximum value of six.

Figure 4: Sigmoid, hyperbolic tangent (tanh), and softsign activation functions

Here are the sigmoid, hyperbolic tangent (tanh), and softsign activation functions. These activation functions are all smooth and have an S shape. Note that there are two horizontal asymptotes for these functions.
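The limiting behavior described for both groups of functions can be checked numerically. The following is a rough sketch in plain Python, with the evaluation points -20 and 20 chosen arbitrarily as "far left" and "far right"; the function definitions are simple re-implementations, not TensorFlow's own code:

```python
import math


def relu(x):
    return max(0.0, x)


def relu6(x):
    return min(max(0.0, x), 6.0)


def softplus(x):
    return math.log(math.exp(x) + 1.0)


def elu(x):
    return math.exp(x) - 1.0 if x < 0 else x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softsign(x):
    return x / (abs(x) + 1.0)


# Arbitrary points standing in for "far left" and "far right".
FAR_LEFT, FAR_RIGHT = -20.0, 20.0

functions = [
    ("relu", relu), ("relu6", relu6), ("softplus", softplus), ("elu", elu),
    ("sigmoid", sigmoid), ("tanh", math.tanh), ("softsign", softsign),
]

for name, fn in functions:
    print(f"{name:9s} left: {fn(FAR_LEFT):.6f}  right: {fn(FAR_RIGHT):.6f}")
```

At these points the first four functions are flat on the left (near 0, or near -1 for the ELU) and keep growing on the right, except ReLU6, which stops at 6. The last three level off on both sides; softsign is still only at 20/21 (about 0.952) at x = 20, because it approaches its asymptotes much more slowly than the sigmoid and tanh do.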
