This chapter covers the basic principles of Reinforcement Learning and the fundamental Q-learning algorithm.
The distinctive feature of Q-learning is that it weighs immediate rewards against delayed rewards when choosing actions. In its simplest form, Q-learning stores its value estimates in a table, an approach that quickly becomes infeasible as the state/action space of the system being monitored or controlled grows.
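The tabular update at the heart of Q-learning can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the environment sizes, the learning rate `alpha`, the discount factor `gamma`, and the sample transition are all hypothetical.

```python
import numpy as np

# Hypothetical toy environment: 5 states, 2 actions.
n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.99          # learning rate and discount factor (assumed values)

Q = np.zeros((n_states, n_actions))  # the Q-table: one estimate per (state, action)

def q_update(state, action, reward, next_state):
    """One Q-learning step: move Q(s, a) toward the bootstrapped target
    r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])

# One hypothetical transition: in state 0, take action 1, receive reward 1.0,
# land in state 2.
q_update(0, 1, 1.0, 2)
print(Q[0, 1])  # the estimate has moved a fraction alpha toward the target
```

Because `max_a' Q(s', a')` bootstraps from future estimates, the update propagates delayed rewards back through the table over repeated visits, which is exactly the immediate-versus-delayed trade-off described above.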
We can overcome this problem by using a neural network as a function approximator: it takes the state and action as input and outputs the corresponding Q-value.
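The idea of replacing the table with a function approximator can be sketched with a tiny two-layer network that maps a (state, action) pair to a scalar Q-value. This is a plain-NumPy sketch for clarity; the dimensions, weight initialization, and the `q_value` helper are all assumptions, and in practice the network would be built and trained in TensorFlow, as in the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4-dimensional state, 2 discrete actions, 16 hidden units.
state_dim, n_actions, hidden = 4, 2, 16

# Randomly initialized two-layer network: (state, one-hot action) -> Q-value.
W1 = rng.normal(0.0, 0.1, (state_dim + n_actions, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (hidden, 1))
b2 = np.zeros(1)

def q_value(state, action):
    """Approximate Q(s, a) with a forward pass instead of a table lookup."""
    one_hot = np.eye(n_actions)[action]          # encode the discrete action
    x = np.concatenate([state, one_hot])         # network input: state + action
    h = np.tanh(x @ W1 + b1)                     # hidden layer
    return float(h @ W2 + b2)                    # scalar Q-value estimate

q = q_value(np.array([0.1, -0.2, 0.0, 0.5]), 1)
```

The key point is that memory no longer scales with the number of states: the same fixed set of weights generalizes across the whole state space, which is what makes large or continuous problems tractable.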
Following this idea, we implemented a Q-learning neural network using the TensorFlow framework and OpenAI Gym, a toolkit for developing and comparing Reinforcement Learning algorithms.
Our journey into Deep Learning with TensorFlow ends here.
Deep learning is a highly active research area; there are many books, courses, and online resources that may help you...