Tweaking wrappers
The final step in our sequence of experiments will be tweaking the wrappers applied to the environment. This is very easy to overlook, as wrappers are normally written once or just borrowed from other code, applied to the environment, and left to sit there. But they can significantly affect both the speed and the convergence of your method. For example, the normal DeepMind-style stack of wrappers applied to an Atari game looks like this:
- NoopResetEnv: Applies a random number of NOOP actions on game reset. In some Atari games, this is needed to remove weird initial observations.
- MaxAndSkipEnv: Applies max to N observations (four by default) and returns this as an observation for the step. This solves the “flickering” problem in some Atari games, when the game draws different portions of the screen on even and odd frames (a normal practice among...