Here's an updated version of the Bellman equation:
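$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$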
Compare it to the version we used in the last section:
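$$Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$$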
In the new version, we've added an alpha term (the learning rate), which means the update now includes the current Q-value of the state-action pair: alpha determines how strongly the new estimate overrides that current value.
The first equation tells us that the new Q-value of our state-action pair (the left side of the equation) is equal to the old Q-value plus the alpha term multiplied by the quantity in brackets: the current reward, plus the discounted future reward, minus the old Q-value. Because the alpha value is relatively small, most of the old Q-value carries over into the new Q-value. In both versions of the equation, because the gamma value is less than 1, current rewards are valued more highly than future rewards.
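To make the update concrete, here's a minimal sketch in Python, assuming the Q-table is stored as a NumPy array; the function name, the toy table shape, and the particular alpha and gamma values are illustrative choices, not from the text:

```python
import numpy as np

def update_q(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update for a (state, action) pair:
    new Q = old Q + alpha * (reward + gamma * max future Q - old Q)
    """
    old_q = q_table[state, action]               # current Q-value
    best_future_q = np.max(q_table[next_state])  # max over a' of Q(s', a')
    # Blend the new estimate into the old value; a small alpha keeps
    # most of the old Q-value, while alpha = 1 discards it entirely.
    q_table[state, action] = old_q + alpha * (reward + gamma * best_future_q - old_q)
    return q_table[state, action]

# Toy example: 3 states, 2 actions, all Q-values initialized to zero.
q = np.zeros((3, 2))
update_q(q, state=0, action=1, reward=1.0, next_state=2)
print(q[0, 1])  # 0.1: only a fraction (alpha) of the new estimate is kept
```

With a small alpha like 0.1, each update nudges the stored Q-value only slightly toward the new estimate, which matches the behavior described above.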
Notice that, if the alpha value is 1, the first equation reduces to the second: the old Q-value terms cancel, and the new Q-value is simply the current reward plus the discounted future reward.
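Substituting 1 for alpha makes the cancellation explicit:

$$Q(s, a) \leftarrow Q(s, a) + 1 \cdot \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] = r + \gamma \max_{a'} Q(s', a')$$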