Managing training
In this section, we will go through some of the common challenges that you may encounter while managing the training of DL models. This includes troubleshooting issues with saving model hyperparameters and debugging the model logic efficiently.
Saving model hyperparameters
There is often a need to save a model's hyperparameters. A few reasons are reproducibility, consistency, and the fact that some network architectures are extremely sensitive to their hyperparameter values.
On more than one occasion, you may find yourself unable to load the model from the checkpoint: the load_from_checkpoint method of the LightningModule class fails with an error.
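To make the failure concrete, the following sketch (the module name, layer sizes, and checkpoint path are hypothetical) defines a LightningModule whose __init__ takes a hyperparameter but never records it; loading it back from a checkpoint without re-supplying that argument then typically raises a TypeError:

import pytorch_lightning as pl
import torch
from torch import nn


class LitClassifier(pl.LightningModule):
    def __init__(self, hidden_dim):
        super().__init__()
        # hidden_dim shapes the network but is never recorded,
        # so it will not be present in the checkpoint
        self.net = nn.Sequential(
            nn.Linear(28 * 28, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 10),
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


# Loading the checkpoint later without re-supplying hidden_dim fails,
# because load_from_checkpoint calls __init__ and the value is missing:
#   model = LitClassifier.load_from_checkpoint("path/to/checkpoint.ckpt")
#   # typically raises: TypeError: __init__() missing 1 required
#   # positional argument: 'hidden_dim'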
Solution
A checkpoint is nothing more than a saved state of the model. Checkpoints contain the precise values of all parameters used by the model. However, hyperparameter arguments passed to the model's __init__ method are not saved in the checkpoint by default. Calling self.save_hyperparameters inside __init__ of the LightningModule stores those arguments in the checkpoint as well, so load_from_checkpoint can restore the model without them being passed again.
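A minimal sketch of the fix, under the same hypothetical names as above: adding self.save_hyperparameters() to __init__ makes Lightning record the arguments in self.hparams and write them into every checkpoint, so load_from_checkpoint can rebuild the model on its own:

import pytorch_lightning as pl
import torch
from torch import nn


class LitClassifier(pl.LightningModule):
    def __init__(self, hidden_dim, learning_rate=1e-3):
        super().__init__()
        # Record every __init__ argument under self.hparams and have
        # Lightning write them into each checkpoint for this module
        self.save_hyperparameters()
        self.net = nn.Sequential(
            nn.Linear(28 * 28, self.hparams.hidden_dim),
            nn.ReLU(),
            nn.Linear(self.hparams.hidden_dim, 10),
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)


# The checkpoint now carries hidden_dim and learning_rate, so the model
# can be restored without passing them again ("path/to/checkpoint.ckpt"
# is a placeholder):
#   model = LitClassifier.load_from_checkpoint("path/to/checkpoint.ckpt")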