In the previous chapters, we learned about mapping an input to a target, where both the input and output values are provided. In this chapter, we will learn about reinforcement learning, where the objective that we want to achieve and the environment that we operate in are provided, but no input-to-output mapping is given. Reinforcement learning works by generating its own training data: the input values are the states the agent is in and the actions it takes there, and the output values are the rewards the agent receives for taking those actions. The agent takes random actions at the start and gradually learns from the generated state-action pairs and the rewards they produced.
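As a rough illustration of this loop, the following minimal sketch trains an agent on a made-up five-state corridor. Everything in it is an assumption for illustration only: the environment, the names `N_STATES`, `step`, and `train`, the hyperparameter values, and the choice of tabular Q-learning with a decaying exploration rate as the learning rule. The agent starts by acting purely at random and relies more and more on what it has learned from the rewards it collected.

```python
import random

N_STATES = 5      # hypothetical toy corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # non-negative reward, paid only at the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a table of state-action values from self-generated experience."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    epsilon = 1.0  # probability of acting randomly; 1.0 means fully random
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # The (state, action) pair is the input, the reward-based target the output.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
        epsilon = max(0.05, epsilon * 0.99)  # gradually rely on what was learned
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy setting, the learned greedy policy should choose action `1` (move right) in every non-terminal state, since that is the shortest route to the only reward.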
In this chapter, we will cover the following:
- The optimal action to take in a simulated game with a non-negative reward
- The optimal action to take...