Deep Q-Learning
Q-learning, introduced by Chris Watkins in 1989, is an algorithm for learning the value of an action in a particular state. It revolves around representing the expected cumulative reward for each action taken in a given state.
The expected reward of the state-action combination is approximated by the Q function:

$$Q: S \times A \to \mathbb{R}$$
Q is initialized to an arbitrary fixed value, often at random. At each time step $t$, the agent selects an action $a_t$, observes the resulting new state of the environment $s_{t+1}$, and receives a reward $r_t$.
The value function Q can then be updated according to the Bellman equation as the weighted average of the old value and the new information:

$$Q^{\text{new}}(s_t, a_t) \leftarrow (1 - \alpha) \cdot Q(s_t, a_t) + \alpha \cdot \bigl(r_t + \gamma \cdot \max_{a} Q(s_{t+1}, a)\bigr)$$
The weighting is given by $\alpha$, the learning rate ($0 < \alpha \le 1$) – the higher the learning rate, the more adaptive the Q-function. The discount factor $\gamma$ weights the rewards by their immediacy – the lower $\gamma$, the more impatient (myopic) the agent becomes, since future rewards count for less.
$r_t$ represents the current reward, and $\max_a Q(s_{t+1}, a)$ is the estimate of the optimal value obtainable in the next state, weighted by the discount factor $\gamma$.
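To make the update concrete, the following is a minimal sketch of tabular Q-learning in Python. The Gym-style environment interface (`reset()` returning a state, `step(action)` returning `(next_state, reward, done)`) and the hyperparameter values are illustrative assumptions, not part of the original text.

```python
import numpy as np

# Illustrative hyperparameters (assumed values, not from the text)
ALPHA = 0.1    # learning rate: higher -> Q adapts faster to new information
GAMMA = 0.99   # discount factor: lower -> more myopic agent
EPSILON = 0.1  # exploration rate for epsilon-greedy action selection

def q_learning(env, n_states, n_actions, n_episodes=500):
    """Tabular Q-learning with an epsilon-greedy policy.

    `env` is assumed to expose a Gym-style interface:
    reset() -> state, step(action) -> (next_state, reward, done).
    """
    # Initialize Q to arbitrary fixed values (here: small random numbers)
    Q = np.random.uniform(low=-0.01, high=0.01, size=(n_states, n_actions))

    for _ in range(n_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection: explore with probability EPSILON
            if np.random.rand() < EPSILON:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))

            next_state, reward, done = env.step(action)

            # Bellman update: weighted average of the old value and the
            # new information r_t + gamma * max_a Q(s_{t+1}, a)
            target = reward + GAMMA * np.max(Q[next_state]) * (not done)
            Q[state, action] = (1 - ALPHA) * Q[state, action] + ALPHA * target

            state = next_state
    return Q
```

Note that the bootstrap term is zeroed at terminal states via `(not done)`, so at episode end only the immediate reward enters the update.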