In Chapter 3, Markov Decision Process, we discussed the transition model of the environment, which follows the Markov property, and the concepts of delayed rewards and value (or utility) functions. In this chapter, we take a look at the Markov decision process, learn about Q-learning, and study a modified approach called the deep Q-network for generalizing across different environments.
We will cover the following topics in this chapter:
- Supervised and unsupervised learning for artificial intelligence
- Model-based learning and model-free learning
- Q-learning
- Deep Q-networks
- Monte Carlo tree search algorithm
- SARSA algorithm