Hyperparameter tuning and AutoML
The experiments defined above give some opportunities for fine-tuning a net. However, what works for this example will not necessarily work for other examples. For a given net, there are indeed multiple parameters that can be optimized (such as the number of hidden neurons, BATCH_SIZE, number of epochs, and many more depending on the complexity of the net itself). These parameters are called "hyperparameters" to distinguish them from the parameters of the network itself, that is, the values of the weights and biases.
Hyperparameter tuning is the process of finding the combination of hyperparameters that minimizes the cost function. The key idea is that if we have n hyperparameters, then we can imagine that they define a space with n dimensions, and the goal is to find the point in this space that corresponds to an optimal value for the cost function. One way to achieve this goal is to create a grid in this space and systematically check the value assumed by the cost function at each grid vertex. In other words, the range of each hyperparameter is divided into buckets, and every combination of bucket values is checked via a brute-force approach, so the number of combinations to check grows quickly with n.
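The grid idea can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the experiments above: the grid values and the names grid, cost, and grid_search are hypothetical, and cost is a toy stand-in for "train the net with these hyperparameters and return the validation loss," chosen so that the sketch runs instantly.

```python
from itertools import product

# Hypothetical grid: each hyperparameter is divided into a few buckets.
grid = {
    "n_hidden": [64, 128, 256],
    "batch_size": [32, 64, 128],
    "epochs": [20, 50],
}


def cost(n_hidden, batch_size, epochs):
    """Toy stand-in for training a net and returning its validation loss.

    Assumption for illustration only: the cost is a simple analytic
    function whose minimum lies at n_hidden=128, batch_size=64, epochs=50.
    """
    return (
        abs(n_hidden - 128) / 128
        + abs(batch_size - 64) / 64
        + abs(epochs - 50) / 50
    )


def grid_search(grid, cost_fn):
    """Evaluate cost_fn at every grid vertex and return the best one."""
    names = list(grid)
    best_params, best_cost = None, float("inf")
    # product(...) enumerates every combination of bucket values.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = cost_fn(**params)
        if current < best_cost:
            best_params, best_cost = params, current
    return best_params, best_cost


best_params, best_cost = grid_search(grid, cost)
print(best_params, best_cost)
```

With this grid, the loop evaluates 3 x 3 x 2 = 18 combinations. In a real setting each evaluation is a full training run, which is why brute-force search becomes expensive as hyperparameters or buckets are added.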
If you think that this process of fine-tuning the hyperparameters is manual and expensive, then you are absolutely right! However, during the last few years, we have seen significant results in AutoML, a set of research techniques that aim both to tune hyperparameters automatically and to search automatically for an optimal network architecture. We will discuss this further in Chapter 14, An introduction to AutoML.