How well will the scalable version of evolution strategies perform in the LunarLander environment? Let's find out!
As you may recall, we already used LunarLander with A2C and REINFORCE in Chapter 6, Learning Stochastic and PG Optimization. The task consists of landing a lander on the moon using continuous actions. We chose this environment for its medium difficulty and so that we can compare the ES results with those obtained with A2C.
The hyperparameters that performed the best in this environment are as follows:
| Hyperparameter | Variable name | Value |
| --- | --- | --- |
| Neural network size | hidden_sizes | [32, 32] |
| Training iterations (or generations) | number_iter | 200 |
| Number of workers | num_workers | 4 |
| Adam learning rate | lr | 0.02 |
| Individuals per worker | indiv_per_worker | 12 |
| Standard deviation of the noise | std_noise | 0.05 |
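To make the roles of these values concrete, the following is a minimal sketch that collects them into a single Python configuration, reusing the variable names from the table. The environment ID and the `run_es_training` call are placeholders assumed for illustration, not the chapter's actual code. Note that with 4 workers evaluating 12 individuals each, every generation samples 4 * 12 = 48 perturbed copies of the policy.

```python
# Sketch of a run configuration for scalable ES on LunarLander.
# Variable names follow the table above; the environment ID and the
# training entry point are assumptions, not the book's actual API.
hyperparams = dict(
    env_name='LunarLanderContinuous-v2',  # assumed continuous-action variant
    hidden_sizes=[32, 32],    # two hidden layers of 32 units each
    number_iter=200,          # training iterations (generations)
    num_workers=4,            # parallel workers
    lr=0.02,                  # Adam learning rate
    indiv_per_worker=12,      # candidate individuals evaluated per worker
    std_noise=0.05,           # standard deviation of the parameter noise
)

# Total candidates evaluated per generation: num_workers * indiv_per_worker = 48
# run_es_training(**hyperparams)  # hypothetical call to the ES training loop
```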
The results are shown in the...