Q-learning
In reinforcement learning, the Q-function Q(s, a) estimates the expected future reward of taking action a in state s, so that choosing the action with the highest Q value maximizes the future reward. The Q-function is estimated using Q-learning, which iteratively updates it with the Bellman equation as follows:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
Here:
- Q(s, a) = Q value for the current state s and action a pair
- $\alpha$ = learning rate, which controls the speed of convergence
- $\gamma$ = discount factor for future rewards
- Q(s', a') = Q value for the state-action pair at the resultant state s' after action a was taken at state s
- R = immediate reward
- $\max_{a'} Q(s', a')$ = maximum future reward, that is, the highest Q value over the actions available at s'
In simpler cases, where the state space and action space are discrete, Q-learning is implemented using a Q-table, in which rows represent states and columns represent actions.
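As a minimal illustration of such a Q-table and a single application of the update rule above, assuming NumPy and integer-indexed states and actions (the sizes, names, and transition values here are made up for the example):

```python
import numpy as np

n_states, n_actions = 4, 2   # illustrative sizes
alpha, gamma = 0.1, 0.9      # learning rate and discount factor

# Q-table: rows are states, columns are actions
# (zeros here just to keep the arithmetic easy to follow)
Q = np.zeros((n_states, n_actions))

# one observed transition (s, a, R, s'), with made-up values
s, a, R, s_next = 0, 1, 1.0, 2

# Bellman update: move Q(s, a) toward R + gamma * max_a' Q(s', a')
Q[s, a] += alpha * (R + gamma * np.max(Q[s_next]) - Q[s, a])
print(Q[s, a])  # 0.1 after this single update
```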
The steps involved in Q-learning are as follows (a runnable sketch follows the list):
- Initialize the Q-table randomly
- For each episode, perform the following steps:
    - For the given state s, choose action a from the Q-table (typically ε-greedily, to balance exploration and exploitation)
    - Perform action a
    - Observe the reward R and the resultant state s'
    - Update Q(s, a) using the Bellman update rule given previously
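Putting these steps together, the following is a minimal, self-contained sketch of tabular Q-learning. The tiny chain environment, the hyperparameter values, and the ε-greedy exploration policy are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Hypothetical 5-state chain environment (illustrative only):
# states 0..4; actions: 0 = left, 1 = right; reaching state 4 ends
# the episode with reward 1, and every other step gives reward 0.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

alpha, gamma, epsilon = 0.1, 0.9, 0.1        # learning rate, discount, exploration rate
rng = np.random.default_rng(0)
Q = rng.uniform(size=(N_STATES, N_ACTIONS))  # randomly initialized Q-table
Q[N_STATES - 1, :] = 0.0                     # terminal state has no future value

for episode in range(500):
    state, done = 0, False
    while not done:
        # choose an action from the Q-table, epsilon-greedily
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state, R, done = step(state, action)
        # Bellman update: Q(s,a) += alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state, action] += alpha * (R + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(np.round(Q, 2))
```

Acting greedily with respect to the learned table then gives the policy; in this toy chain, the optimal action in every non-terminal state is to move right.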