Cumulative discounted rewards
For an agent to maximize the cumulative reward, one naive approach is to maximize the reward at each individual time step. This greedy strategy can have a negative effect, because maximizing the reward at an early time step might cause the agent to fail soon afterwards. Take the example of a walking robot. Assuming the robot's speed contributes to the reward, a robot that maximizes its speed at every time step may destabilize itself and fall sooner.
We are training the robot to walk; thus, we can conclude that the agent cannot focus only on the current time step to maximize the reward; it needs to take all future time steps into consideration. This is the case with all reinforcement learning problems: actions have short- and long-term effects, and the agent needs to understand how the consequences of an action unfold in the environment over time.
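The idea above is usually captured by the discounted return: instead of the immediate reward alone, the agent maximizes the sum of all future rewards, each weighted by a discount factor gamma between 0 and 1 so that nearer rewards count more than distant ones. The following is a minimal sketch of that computation; the function name and the example reward sequence are illustrative, not from the original text:

```python
def discounted_return(rewards, gamma=0.99):
    """Compute the discounted return G = r_0 + gamma*r_1 + gamma^2*r_2 + ...

    Iterating over the rewards in reverse lets us accumulate the sum
    with one multiplication per step instead of computing gamma^k each time.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three equal rewards, discounted with gamma = 0.5:
# 1 + 0.5*1 + 0.25*1 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

With gamma close to 1 the agent is far-sighted and values distant rewards almost as much as immediate ones; with gamma close to 0 it behaves almost greedily, which is exactly the failure mode described for the walking robot.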
In the preceding case, the agent will learn that it cannot move...