Implementing the SARSA algorithm and an RL agent
This recipe shows you how to implement the State-Action-Reward-State-Action (SARSA) algorithm and how to develop and train an agent that uses it to act in a reinforcement learning environment. SARSA is an on-policy temporal-difference method for model-free control problems, and it allows us to optimize the value function of an unknown Markov decision process (MDP).
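To make the name concrete, the following is a minimal sketch of a single tabular SARSA update, not the recipe's own implementation. The function names (sarsa_update, epsilon_greedy), the nested-list Q-table, and the values chosen for alpha and gamma are all assumptions made for illustration. It shows how the tuple (state, action, reward, next state, next action) is used to move one Q-value toward the on-policy target.

```python
import random


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9, done=False):
    """Apply one SARSA update to the table q in place; return the TD error."""
    # On-policy target: bootstrap from the action actually chosen in next_state.
    # At a terminal transition there is no next action, so the target is the reward.
    target = reward if done else reward + gamma * q[next_state][next_action]
    td_error = target - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


def epsilon_greedy(q, state, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    return max(range(len(q[state])), key=lambda a: q[state][a])


# Hypothetical table with 2 states and 2 actions (illustrative values only).
q_table = [[0.0, 0.0], [0.0, 1.0]]
td = sarsa_update(q_table, state=0, action=1, reward=0.5,
                  next_state=1, next_action=1)
```

Because the target uses the next action selected by the same behavior policy (here, epsilon-greedy) rather than the maximum over actions, the update is on-policy; this is the main difference from Q-learning.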
Upon completing this recipe, you will have a working RL agent that, when acting in the GridworldV2 environment, will generate the following state-action value function using the SARSA algorithm:
Getting ready
To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import statements run...