Building value-based reinforcement learning agent algorithms
Value-based reinforcement learning works by learning the state-value function or the action-value function in a given environment. This recipe will show you how to create and update the value function for the Maze environment to obtain an optimal policy. Learning value functions, especially in model-free RL problems where a model of the environment is not available, can prove to be quite effective, especially for RL problems with low-dimensional state space.
Upon completing this recipe, you will have an algorithm that can generate the following optimal action sequence based on value functions:
Let's get started.
Getting ready
To complete this recipe, you will need to activate the tf2rl-cookbook
Python/conda virtual environment and run pip install numpy...