Optimizing the parameters with HPO
Hyperparameters are the parameters that a deep learning model cannot learn during training, so we need a way to specify them ourselves. The first option is usually to follow common practices in the literature; for example, the typical number of epochs for fine-tuning a BERT model is around 3. One strategy is to adjust these parameters manually while monitoring model performance, in particular the validation loss.
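As a minimal sketch of this manual approach, the snippet below fine-tunes BERT with hand-picked values and reports the validation loss after each epoch via the Hugging Face `Trainer`. The model checkpoint, the IMDb dataset, the subset sizes, and the specific hyperparameter values are illustrative choices, not prescriptions from this section:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Small IMDb subsets keep each trial run short for illustration.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)
train_ds = dataset["train"].shuffle(seed=42).select(range(2000))
val_ds = dataset["test"].shuffle(seed=42).select(range(500))

args = TrainingArguments(
    output_dir="bert-manual-hpo",
    num_train_epochs=3,              # common choice for BERT fine-tuning
    learning_rate=2e-5,              # adjusted by hand between runs
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",     # report validation loss every epoch
)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
print(trainer.evaluate())            # inspect eval_loss, then tweak and rerun
```

Manual tuning like this means editing the values in `TrainingArguments`, retraining, and comparing validation losses across runs by hand.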
Beyond manual tuning, it is possible to determine the optimal set of hyperparameters systematically with search algorithms. One approach is to exhaustively try every combination in a predefined set of hyperparameters, a naive process known as grid search. However, grid search is not always practical and can be tedious, since even a single training run of a Transformer model is a lengthy procedure. Alternatively, random search can be preferred: it samples only a subset of the hyperparameter combinations, so it typically achieves similar results while greatly reducing the search time.
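The sketch below contrasts the two strategies over a small search space. The `train_and_eval` helper is a hypothetical placeholder standing in for a full fine-tuning run; here it returns a dummy loss so the search logic runs end to end:

```python
import itertools
import random

def train_and_eval(learning_rate, num_train_epochs):
    # Placeholder for a full fine-tuning run: in practice this would
    # train the model with the given settings and return validation loss.
    return random.random()  # dummy loss so the sketch runs end to end

search_space = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "num_train_epochs": [2, 3, 4],
}

# Grid search: every combination, here 4 x 3 = 12 full training runs.
grid = [dict(zip(search_space, combo))
        for combo in itertools.product(*search_space.values())]

# Random search: train on only a handful of sampled combinations.
random.seed(42)
candidates = random.sample(grid, k=4)

best = min(candidates, key=lambda cfg: train_and_eval(**cfg))
print("best configuration found:", best)
```

The number of sampled configurations (`k`) is the key knob in random search: it trades search cost against the chance of landing on a good setting, whereas grid search always pays for every combination.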