You're reading from Java Deep Learning Cookbook Train neural networks for classification, NLP, and reinforcement learning using Deeplearning4j

Product type Paperback

Published in Nov 2019

Publisher Packt

ISBN-13 9781788995207

Length 304 pages

Edition 1st Edition

Languages

Java

Tools

Deeplearning4j

Concepts

Deep Learning

Author (1):

Rahul Raj

View More author details

Combating overfitting problems

As we know, overfitting is a major challenge that machine learning developers face. It becomes a big challenge when the neural network architecture is complex and training data is huge. While mentioning overfitting, we're not ignoring the chances of underfitting at all. We will keep overfitting and underfitting in the same category. Let's discuss how we can combat overfitting problems.

The following are possible reasons for overfitting, including but not limited to:

Too many feature variables compared to the number of data records
A complex neural network model

Self-evidently, overfitting reduces the generalization power of the network and the network will fit noise instead of a signal when this happens. In this recipe, we will walk through key steps to prevent overfitting problems.

How to do it...

Use KFoldIterator to perform k-fold cross-validation-based resampling:

KFoldIterator kFoldIterator = new KFoldIterator(k, dataSet);

Construct a simpler neural network architecture.
Use enough train data to train the neural network.

How it works...

In step 1, k is the arbitrary number of choice and dataSet is the dataset object that represents your training data. We perform k-fold cross-validation to optimize the model evaluation process.

Complex neural network architectures can cause the network to tend to memorize patterns. Hence, your neural network will have a hard time generalizing unseen data. For example, it's better and more efficient to have a few hidden layers rather than hundreds of hidden layers. That's the relevance of step 2.

Fairly large training data will encourage the network to learn better and a batch-wise evaluation of test data will increase the generalization power of the network. That's the relevance of step 3. Although there are multiple types of data iterator and various ways to introduce batch size in an iterator in DL4J, the following is a more conventional definition for RecordReaderDataSetIterator:

public RecordReaderDataSetIterator(RecordReader recordReader,
 WritableConverter converter,
 int batchSize,
 int labelIndexFrom,
 int labelIndexTo,
 int numPossibleLabels,
 int maxNumBatches,
 boolean regression)

There's more...

When you perform k-fold cross-validation, data is divided into k number of subsets. For every subset, we perform evaluation by keeping one of the subsets for testing and the remaining k-1 subsets for training. We will repeat this k number of times. Effectively, we use the entire data for training with no data loss, as opposed to wasting some of the data on testing.

Underfitting is handled here. However, note that we perform the evaluation k number of times only.

When you perform batch training, the entire dataset is divided as per the batch size. If your dataset has 1,000 records and the batch size is 8, then you have 125 training batches.

You need to note the training-to-testing ratio as well. According to that ratio, every batch will be divided into a training set and testing set. Then the evaluation will be performed accordingly. For 8-fold cross-validation, you evaluate the model 8 times, but for a batch size of 8, you perform 125 model evaluations.

Note the rigorous mode of evaluation here, which will help to improve the generalization power while increasing the chances of underfitting.