Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Machine Learning Using TensorFlow Cookbook

You're reading from   Machine Learning Using TensorFlow Cookbook Create powerful machine learning algorithms with TensorFlow

Arrow left icon
Product type Paperback
Published in Feb 2021
Publisher Packt
ISBN-13 9781800208865
Length 416 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (3):
Arrow left icon
Konrad Banachewicz Konrad Banachewicz
Author Profile Icon Konrad Banachewicz
Konrad Banachewicz
Luca Massaron Luca Massaron
Author Profile Icon Luca Massaron
Luca Massaron
Alexia Audevart Alexia Audevart
Author Profile Icon Alexia Audevart
Alexia Audevart
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface 1. Getting Started with TensorFlow 2.x 2. The TensorFlow Way FREE CHAPTER 3. Keras 4. Linear Regression 5. Boosted Trees 6. Neural Networks 7. Predicting with Tabular Data 8. Convolutional Neural Networks 9. Recurrent Neural Networks 10. Transformers 11. Reinforcement Learning with TensorFlow and TF-Agents 12. Taking TensorFlow to Production 13. Other Books You May Enjoy
14. Index

Implementing loss functions

For this recipe, we will cover some of the main loss functions that we can use in TensorFlow. Loss functions are a key aspect of machine learning algorithms. They measure the distance between the model outputs and the target (truth) values.

In order to optimize our machine learning algorithms, we will need to evaluate the outcomes. Evaluating outcomes in TensorFlow depends on specifying a loss function. A loss function tells TensorFlow how good or bad the predictions are compared to the desired result. In most cases, we will have a set of data and a target on which to train our algorithm. The loss function compares the target to the prediction (it measures the distance between the model outputs and the target truth values) and provides a numerical quantification between the two.

Getting ready

We will first start a computational graph and load matplotlib, a Python plotting package, as follows:

import matplotlib.pyplot as plt 
import TensorFlow as tf 

Now that we are ready to plot, let's proceed to the recipe without further ado.

How to do it...

First, we will talk about loss functions for regression, which means predicting a continuous dependent variable. To start, we will create a sequence of our predictions and a target as a tensor. We will output the results across 500 x values between -1 and 1. See the How it works... section for a plot of the outputs. Use the following code:

x_vals = tf.linspace(-1., 1., 500) 
target = tf.constant(0.) 

The L2 norm loss is also known as the Euclidean loss function. It is just the square of the distance to the target. Here, we will compute the loss function as if the target is zero. The L2 norm is a great loss function because it is very curved near the target and algorithms can use this fact to converge to the target more slowly the closer it gets to zero. We can implement this as follows:

def l2(y_true, y_pred):
    return tf.square(y_true - y_pred) 

TensorFlow has a built-in form of the L2 norm, called tf.nn.l2_loss(). This function is actually half the L2 norm. In other words, it is the same as the previous one but divided by 2.

The L1 norm loss is also known as the absolute loss function. Instead of squaring the difference, we take the absolute value. The L1 norm is better for outliers than the L2 norm because it is not as steep for larger values. One issue to be aware of is that the L1 norm is not smooth at the target, and this can result in algorithms not converging well. It appears as follows:

def l1(y_true, y_pred):
    return tf.abs(y_true - y_pred)

Pseudo-Huber loss is a continuous and smooth approximation to the Huber loss function. This loss function attempts to take the best of the L1 and L2 norms by being convex near the target and less steep for extreme values. The form depends on an extra parameter, delta, which dictates how steep it will be. We will plot two forms, delta1 = 0.25 and delta2 = 5, to show the difference, as follows:

def phuber1(y_true, y_pred):
    delta1 = tf.constant(0.25) 
    return tf.multiply(tf.square(delta1), tf.sqrt(1. +  
                        tf.square((y_true - y_pred)/delta1)) - 1.) 
def phuber2(y_true, y_pred):
    delta2 = tf.constant(5.) 
    return tf.multiply(tf.square(delta2), tf.sqrt(1. +  
                        tf.square((y_true - y_pred)/delta2)) - 1.) 

Now, we'll move on to loss functions for classification problems. Classification loss functions are used to evaluate loss when predicting categorical outcomes. Usually, the output of our model for a class category is a real-value number between 0 and 1. Then, we choose a cutoff (0.5 is commonly chosen) and classify the outcome as being in that category if the number is above the cutoff. Next, we'll consider various loss functions for categorical outputs.

To start, we will need to redefine our predictions (x_vals) and target. We will save the outputs and plot them in the next section. Use the following:

x_vals = tf.linspace(-3., 5., 500) 
target = tf.fill([500,], 1.)

Hinge loss is mostly used for support vector machines but can be used in neural networks as well. It is meant to compute a loss among two target classes, 1 and -1. In the following code, we are using the target value 1, so the closer our predictions are to 1, the lower the loss value:

def hinge(y_true, y_pred):
    return tf.maximum(0., 1. - tf.multiply(y_true, y_pred))

Cross-entropy loss for a binary case is also sometimes referred to as the logistic loss function. It comes about when we are predicting the two classes 0 or 1. We wish to measure a distance from the actual class (0 or 1) to the predicted value, which is usually a real number between 0 and 1. To measure this distance, we can use the cross-entropy formula from information theory, as follows:

def xentropy(y_true, y_pred):
    return (- tf.multiply(y_true, tf.math.log(y_pred)) -   
          tf.multiply((1. - y_true), tf.math.log(1. - y_pred))) 

Sigmoid cross-entropy loss is very similar to the previous loss function except we transform the x values using the sigmoid function before we put them in the cross-entropy loss, as follows:

def xentropy_sigmoid(y_true, y_pred):
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true,  
                                                   logits=y_pred) 

Weighted cross-entropy loss is a weighted version of sigmoid cross-entropy loss. We provide a weight on the positive target. For an example, we will weight the positive target by 0.5, as follows:

def xentropy_weighted(y_true, y_pred):
    weight = tf.constant(0.5) 
    return tf.nn.weighted_cross_entropy_with_logits(labels=y_true,
                                                    logits=y_pred,  
                                                pos_weight=weight)

Softmax cross-entropy loss operates on non-normalized outputs. This function is used to measure a loss when there is only one target category instead of multiple. Because of this, the function transforms the outputs into a probability distribution via the softmax function and then computes the loss function from a true probability distribution, as follows:

def softmax_xentropy(y_true, y_pred):
    return tf.nn.softmax_cross_entropy_with_logits(labels=y_true,                                                    logits=y_pred)
    
unscaled_logits = tf.constant([[1., -3., 10.]]) 
target_dist = tf.constant([[0.1, 0.02, 0.88]])
print(softmax_xentropy(y_true=target_dist,                        y_pred=unscaled_logits))
[ 1.16012561] 

Sparse softmax cross-entropy loss is almost the same as softmax cross-entropy loss, except instead of the target being a probability distribution, it is an index of which category is true. Instead of a sparse all-zero target vector with one value of 1, we just pass in the index of the category that is the true value, as follows:

def sparse_xentropy(y_true, y_pred):
    return tf.nn.sparse_softmax_cross_entropy_with_logits(
                                                    labels=y_true,
                                                    logits=y_pred) 
unscaled_logits = tf.constant([[1., -3., 10.]]) 
sparse_target_dist = tf.constant([2]) 
print(sparse_xentropy(y_true=sparse_target_dist,  
                      y_pred=unscaled_logits))
[ 0.00012564] 

Now let's understand better how such loss functions operate by plotting them on a graph.

How it works...

Here is how to use matplotlib to plot the regression loss functions:

x_vals = tf.linspace(-1., 1., 500) 
target = tf.constant(0.) 
funcs = [(l2, 'b-', 'L2 Loss'),
         (l1, 'r--', 'L1 Loss'),
         (phuber1, 'k-.', 'P-Huber Loss (0.25)'),
         (phuber2, 'g:', 'P-Huber Loss (5.0)')]
for func, line_type, func_name in funcs:
    plt.plot(x_vals, func(y_true=target, y_pred=x_vals), 
             line_type, label=func_name)
plt.ylim(-0.2, 0.4) 
plt.legend(loc='lower right', prop={'size': 11}) 
plt.show()

We get the following plot as output from the preceding code:

Figure 2.1: Plotting various regression loss functions

Here is how to use matplotlib to plot the various classification loss functions:

x_vals = tf.linspace(-3., 5., 500)  
target = tf.fill([500,], 1.)
funcs = [(hinge, 'b-', 'Hinge Loss'),
         (xentropy, 'r--', 'Cross Entropy Loss'),
         (xentropy_sigmoid, 'k-.', 'Cross Entropy Sigmoid Loss'),
         (xentropy_weighted, 'g:', 'Weighted Cross Enropy Loss            (x0.5)')]
for func, line_type, func_name in funcs:
    plt.plot(x_vals, func(y_true=target, y_pred=x_vals), 
             line_type, label=func_name)
plt.ylim(-1.5, 3) 
plt.legend(loc='lower right', prop={'size': 11}) 
plt.show()

We get the following plot from the preceding code:

Figure 2.2: Plots of classification loss functions

Each of these loss curves provides different advantages to the neural network optimizing it. We are now going to discuss this a little bit more.

There's more...

Here is a table summarizing the properties and benefits of the different loss functions that we have just graphically described:

Loss function

Use

Benefits

Disadvantages

L2

Regression

More stable

Less robust

L1

Regression

More robust

Less stable

Pseudo-Huber

Regression

More robust and stable

One more parameter

Hinge

Classification

Creates a max margin for use in SVM

Unbounded loss affected by outliers

Cross-entropy

Classification

More stable

Unbounded loss, less robust

The remaining classification loss functions all have to do with the type of cross-entropy loss. The cross-entropy sigmoid loss function is for use on unscaled logits and is preferred over computing the sigmoid loss and then the cross-entropy loss, because TensorFlow has better built-in ways to handle numerical edge cases. The same goes for softmax cross-entropy and sparse softmax cross-entropy.

Most of the classification loss functions described here are for two-class predictions. This can be extended to multiple classes by summing the cross-entropy terms over each prediction/target.

There are also many other metrics to look at when evaluating a model. Here is a list of some more to consider:

Model metric

Description

R-squared (coefficient of determination)

For linear models, this is the proportion of variance in the dependent variable that is explained by the independent data. For models with a larger number of features, consider using adjusted R-squared.

Root mean squared error

For continuous models, this measures the difference between prediction and actual via the square root of the average squared error.

Confusion matrix

For categorical models, we look at a matrix of predicted categories versus actual categories. A perfect model has all the counts along the diagonal.

Recall

For categorical models, this is the fraction of true positives over all predicted positives.

Precision

For categorical models, this is the fraction of true positives over all actual positives.

F-score

For categorical models, this is the harmonic mean of precision and recall.

In your choice of the right metric, you have to both evaluate the problem you have to solve (because each metric will behave differently and, depending on the problem at hand, some loss minimization strategies will prove better than others for our problem), and to experiment with the behavior of the neural network.

You have been reading a chapter from
Machine Learning Using TensorFlow Cookbook
Published in: Feb 2021
Publisher: Packt
ISBN-13: 9781800208865
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image