"For the things we have to learn before we can do them, we learn by doing them."
– Aristotle
Be sure to complete the following questions or exercises on your own:
- Extend the bandit cube maze in the last section with your own design. Make sure to keep all the cubes connected so that the agent has a clear path to the end.
- Think of another problem in gaming, simulation, or another domain where you could use RL and the Q-Learning algorithm to help an agent learn to solve it. This is just a thought exercise, but give yourself a huge pat on the back if you build a demo.
- Add new properties for the Exploration Epsilon minimum and the amount of change per decision step (see the first sketch after this list). Remember, these are the parameters we hard-coded in order to decrease the epsilon-greedy exploration value.
- Add the ability to show the Q values on the individual BanditCube objects (see the second sketch after this list). If...
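
If you want a starting point for the epsilon properties exercise, here is a minimal sketch in Unity C#. The `QLearningAgent` class name and the `epsilonMinimum` and `epsilonDecayPerStep` field names are assumptions for illustration, not the book's exact code:

```csharp
// A minimal sketch, assuming the agent script holds an exploration epsilon
// that currently decays by hard-coded values. The class and field names
// below (other than the epsilon concept itself) are hypothetical.
using UnityEngine;

public class QLearningAgent : MonoBehaviour
{
    [SerializeField] private float explorationEpsilon = 1.0f;   // current epsilon-greedy value
    [SerializeField] private float epsilonMinimum = 0.1f;       // floor, previously hard-coded
    [SerializeField] private float epsilonDecayPerStep = 0.01f; // decrement, previously hard-coded

    // Call once per decision step to decay epsilon toward its minimum.
    private void DecayEpsilon()
    {
        explorationEpsilon = Mathf.Max(epsilonMinimum,
                                       explorationEpsilon - epsilonDecayPerStep);
    }
}
```

Exposing these as serialized fields lets you tune the exploration schedule in the Inspector without recompiling, which makes it much easier to compare how quickly the agent settles into exploiting what it has learned.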
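For the Q-value display exercise, one approach is to give each cube a floating text label. The `qValue` field, `label` reference, and `ShowQValue` method here are hypothetical additions under that assumption, not the book's exact API:

```csharp
// A minimal sketch, assuming each BanditCube stores its own Q value and has
// a child TextMesh assigned in the Inspector for display.
using UnityEngine;

public class BanditCube : MonoBehaviour
{
    public float qValue;                      // latest Q value learned for this cube/state
    [SerializeField] private TextMesh label;  // assign a child TextMesh in the Inspector

    // Refresh the floating label whenever the Q value is updated.
    public void ShowQValue()
    {
        if (label != null)
            label.text = qValue.ToString("F2");
    }
}
```

Calling `ShowQValue()` after each Q-table update gives you a live view of how value propagates backward from the rewarding cube through the maze as training progresses.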