Q-learning
In reinforcement learning, we want the Q-function Q(s,a) to estimate the future reward of taking action a in state s, so that the best action for a state can be chosen as the one with the highest Q value. The Q-function is estimated using Q-learning, which updates the Q-function through the Bellman equation over a series of iterations as follows:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$

Here:
Q(s,a) = Q value for the current state s and action a pair
$\alpha$ = learning rate, which controls the speed of convergence
$\gamma$ = discounting factor for future rewards
Q(s',a') = Q value for the state-action pair at the resultant state s' after action a is taken in state s
R = the immediate reward
$\gamma \max_{a'} Q(s',a')$ = the discounted estimate of the future reward
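
As a quick numerical illustration with made-up values: suppose $\alpha = 0.1$, $\gamma = 0.9$, the current estimate is $Q(s,a) = 0.5$, the immediate reward is $R = 1$, and the best Q value at the next state is $\max_{a'} Q(s',a') = 0.8$. The update then gives:

$$Q(s,a) \leftarrow 0.5 + 0.1 \left[ 1 + 0.9 \times 0.8 - 0.5 \right] = 0.5 + 0.1 \times 1.22 = 0.622$$

so the estimate moves a small step, scaled by $\alpha$, toward the reward actually observed.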
In simpler cases, where the state space and action space are discrete, Q-learning is implemented using a Q-table, in which rows represent the states and columns represent the actions.
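
As a minimal sketch of this representation, assuming an illustrative toy problem with 16 states and 4 actions (arbitrary sizes), the Q-table can be held in a NumPy array:

```python
import numpy as np

n_states = 16   # assumed number of discrete states
n_actions = 4   # assumed number of discrete actions

# One row per state, one column per action; Q[s, a] holds the current
# estimate of the future reward for taking action a in state s
Q = np.random.uniform(low=0.0, high=0.01, size=(n_states, n_actions))

# The greedy action for a state is the column with the highest Q value
state = 3                               # an example state index
best_action = int(np.argmax(Q[state]))
```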
The steps involved in Q-learning are as follows (a code sketch that puts these steps together is shown after the list):
- Initialize Q-table randomly
- For each episode, perform the following steps:
    - For the given state s, choose an action a from the Q-table
    - Perform action a
    - Observe the reward R and the resultant state s'
    - Update the Q value for the pair (s,a) using the Bellman equation given earlier, then continue from state s' until the episode ends
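
Putting these steps together, the following is a minimal sketch of tabular Q-learning, not a definitive implementation. It assumes the FrozenLake-v1 environment from the Gymnasium library (any environment with discrete state and action spaces would work) and uses an epsilon-greedy rule, a common exploration strategy, to choose actions from the Q-table; the values of alpha, gamma, epsilon, and the episode count are arbitrary illustrative choices:

```python
import numpy as np
import gymnasium as gym

# FrozenLake has a discrete state space (grid cells) and action space (moves)
env = gym.make("FrozenLake-v1")
n_states = env.observation_space.n   # rows of the Q-table
n_actions = env.action_space.n       # columns of the Q-table

alpha = 0.1       # learning rate
gamma = 0.99      # discounting factor for future rewards
epsilon = 0.1     # exploration probability (epsilon-greedy, assumed here)
n_episodes = 5000

# Initialize the Q-table randomly
Q = np.random.uniform(low=0.0, high=0.01, size=(n_states, n_actions))

for episode in range(n_episodes):
    state, _ = env.reset()
    done = False
    while not done:
        # For the given state s, choose action a from the Q-table
        if np.random.rand() < epsilon:
            action = env.action_space.sample()   # explore a random action
        else:
            action = int(np.argmax(Q[state]))    # exploit the best known action

        # Perform action a; observe reward R and resultant state s'
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

        # Bellman update: Q(s,a) += alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state, action] += alpha * (
            reward + gamma * np.max(Q[next_state]) - Q[state, action]
        )
        state = next_state   # continue from the resultant state
```

Taking a random action with probability epsilon keeps the agent visiting states it would otherwise ignore; acting purely greedily from a randomly initialized Q-table can lock the agent into early, inaccurate estimates.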