Tweaking observations
Our first series of attempts will focus on feeding more information to the agent. Here, I will only briefly introduce the changes made and the effect they had on the training results. You can find the full example in Chapter13/train_preproc.py.
Tracking visited rooms
First, you will notice that our agent has no idea whether the current room has already been visited. When the agent already knows the optimal way to the goal, this information might not be needed (as generated games always have different rooms). But if the policy is not perfect, it might be useful to have a clear indication that we’re visiting the same room over and over again.
To feed this knowledge into the observation, I implemented simple room tracking in the preproc.LocationWrapper class, which tracks the rooms visited over the episode. The resulting flag is then concatenated to the agent’s observation as a single 1 if the room was visited before or 0 if it is a...
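To make the idea concrete, here is a minimal sketch of per-episode room tracking. It is not the code of preproc.LocationWrapper: the class VisitedRoomTracker, its method names, and the assumption that a hashable room identifier (for example, the room name) is available at every step are all hypothetical and used only for illustration. The sketch also assumes that staying in the same room on the next step counts as a repeated visit, which may differ from the real wrapper.

```python
from typing import Hashable, List, Sequence, Set


class VisitedRoomTracker:
    """Hypothetical helper (not the book's preproc.LocationWrapper).

    Remembers which rooms were seen during the current episode and
    appends a single 0/1 flag to the observation vector.
    """

    def __init__(self) -> None:
        self._seen: Set[Hashable] = set()

    def reset(self) -> None:
        """Forget all rooms; call this at the start of every episode."""
        self._seen.clear()

    def visited_flag(self, room_id: Hashable) -> float:
        """Return 1.0 if room_id was seen earlier in this episode, else 0.0."""
        flag = 1.0 if room_id in self._seen else 0.0
        self._seen.add(room_id)  # remember the room for later steps
        return flag

    def extend_observation(
        self, obs: Sequence[float], room_id: Hashable
    ) -> List[float]:
        """Return a copy of obs with the visited flag concatenated at the end."""
        return list(obs) + [self.visited_flag(room_id)]


# Usage: the room names and observation values below are made up.
tracker = VisitedRoomTracker()
tracker.reset()
first = tracker.extend_observation([0.2, 0.7], "kitchen")   # new room
second = tracker.extend_observation([0.1, 0.3], "pantry")   # new room
third = tracker.extend_observation([0.5, 0.5], "kitchen")   # repeated room
```

In this sketch, only the last element of third is 1.0, because the kitchen was already seen earlier in the same episode. Calling reset() clears the memory, so the same room in the next episode produces 0.0 again.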