In the previous scenario, we considered a simple case in which a reward is given only when the objective is achieved. In this scenario, we will complicate the game by introducing negative rewards as well. The objective remains the same: maximizing the reward in an environment that has both positive and negative rewards.
The optimal action to take in a state in a simulated game
Getting ready
The environment we are working on is as follows:
[Figure: the grid environment]
We start at the cell marked S, and our objective is to reach the cell where the reward is +1. To maximize the chances of achieving the reward, we will use Bellman's equation, which calculates the value of each cell in the preceding grid as follows...
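To make the idea of assigning a value to each cell concrete, here is a minimal sketch of a Bellman-style update (value iteration) on a small grid. Everything specific in it is an assumption made for illustration, not the recipe's actual environment: the 3x4 layout, the positions of the +1 and -1 cells, the start cell, the discount factor `GAMMA`, deterministic moves, and the helper names `step`, `value_iteration`, and `best_action`. The value of a cell is taken to be the best achievable "reward on entering the next cell plus the discounted value of that next cell".

```python
# Hypothetical 3x4 grid. The layout, rewards, start cell, and discount factor
# are illustrative assumptions, not the values from the original figure.
GAMMA = 0.9
ROWS, COLS = 3, 4
START = (2, 0)
# Reward received on entering a terminal cell; all other moves give 0.
TERMINAL_REWARDS = {(0, 3): 1.0, (1, 3): -1.0}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic move; walking into a wall leaves the agent in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        return (nr, nc)
    return state


def action_value(state, action, values):
    """Reward for entering the next cell plus the discounted value of that cell."""
    nxt = step(state, action)
    return TERMINAL_REWARDS.get(nxt, 0.0) + GAMMA * values[nxt]


def value_iteration(tol=1e-6):
    """Sweep the grid, applying the Bellman update until values stop changing."""
    values = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for state in values:
            if state in TERMINAL_REWARDS:
                continue  # the episode ends here, so the cell's own value stays 0
            best = max(action_value(state, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def best_action(state, values):
    """Greedy action: the move with the highest Bellman action value."""
    return max(ACTIONS, key=lambda a: action_value(state, a, values))


values = value_iteration()
print(round(values[START], 4), best_action((1, 2), values))
```

In this assumed layout, cells closer to the +1 cell end up with higher values, and the greedy action from the cell next to the -1 cell moves away from it. A grid with obstacles or random transitions would need a correspondingly different `step` function.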