GA on Cheetah
In our final example in this chapter, we'll implement the parallelized deep GA on the HalfCheetah environment. The complete code is in Chapter16/04_cheetah_ga.py. The architecture is very close to the parallel ES version, with one master process and several workers. The job of every worker is to evaluate a batch of networks and return the results to the master, which merges the partial results into the complete population, ranks the individuals according to the obtained reward, and generates the next population to be evaluated by the workers.
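To make the data flow concrete, here is a minimal sketch of the worker loop and the master's selection step built on multiprocessing queues. The helper names (build_net, evaluate) and the hyperparameters (PARENTS_COUNT, POPULATION_SIZE, MAX_SEED) are illustrative assumptions, not necessarily the names used in 04_cheetah_ga.py:

```python
import random
import multiprocessing as mp

MAX_SEED = 2 ** 31 - 1   # assumed range of 32-bit seeds
PARENTS_COUNT = 10       # assumed number of parents kept after ranking
POPULATION_SIZE = 200    # assumed population size

def worker_func(input_queue: mp.Queue, output_queue: mp.Queue):
    # Each worker receives batches of individuals (seed lists), rebuilds the
    # policy network from the seeds, evaluates it in the environment, and
    # sends (seeds, reward) pairs back to the master.
    while True:
        batch = input_queue.get()
        if batch is None:                   # poison pill: stop the worker
            break
        for seeds in batch:
            net = build_net(seeds)          # see the seed-decoding sketch below
            reward = evaluate(net)          # hypothetical: one episode's return
            output_queue.put((seeds, reward))

def master_step(results):
    # Merge the workers' partial results, rank by reward, and produce the
    # next population: a child is a random parent's seed list with one
    # fresh mutation seed appended.
    results.sort(key=lambda r: r[1], reverse=True)
    parents = results[:PARENTS_COUNT]
    next_population = [parents[0][0]]       # keep the elite unchanged
    while len(next_population) < POPULATION_SIZE:
        seeds, _ = random.choice(parents)
        next_population.append(list(seeds) + [random.randrange(MAX_SEED)])
    return next_population
```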
Every individual is encoded by a list of random seeds: the first seed is used to initialize the network weights, and every subsequent seed deterministically defines one mutation. This representation allows a very compact encoding of the network, even when the number of parameters in the policy is not very large. For example, our network with two hidden layers of 64 neurons holds 6,278 float values: the input is 26 values and the action is six floats, so, counting biases, (26·64+64) + (64·64+64) + (64·6+6) = 6,278 parameters. Every float occupies 4 bytes...
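The following is a minimal sketch of how such a seed list can be decoded back into network weights, assuming PyTorch, the layer sizes given above, and additive Gaussian-noise mutations; the tanh activations and the NOISE_STD scale are assumptions for illustration:

```python
import torch
import torch.nn as nn

NOISE_STD = 0.005  # assumed mutation strength

def mutate_net(net: nn.Module, seed: int) -> None:
    # Replay one mutation: add seeded Gaussian noise to every parameter,
    # so the same seed always reproduces the same perturbation.
    rng = torch.Generator().manual_seed(seed)
    for p in net.parameters():
        p.data += NOISE_STD * torch.randn(p.size(), generator=rng)

def build_net(seeds: list) -> nn.Module:
    # The first seed fixes the initial weights; every following seed
    # deterministically replays one mutation step.
    torch.manual_seed(seeds[0])
    net = nn.Sequential(
        nn.Linear(26, 64), nn.Tanh(),   # 26 observation values
        nn.Linear(64, 64), nn.Tanh(),
        nn.Linear(64, 6), nn.Tanh(),    # 6 action floats
    )
    for seed in seeds[1:]:
        mutate_net(net, seed)
    return net
```

Because the decoding is fully deterministic, a few integers per individual are enough for any worker to reconstruct the complete 6,278-parameter policy.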