Introduction
What do a baby learning to walk, a bird learning to fly, and a reinforcement learning (RL) agent learning to play an Atari game have in common? All three involve:
- Trial and error: The child (or the bird) tries many approaches and fails many times before it can really stand (or fly). The RL agent plays many games, winning some and losing many, before it becomes reliably successful.
- Goal: The child's goal is to stand, the bird's is to fly, and the RL agent's is to win the game.
- Interaction with the environment: The only feedback they have is from their environment.
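The three ingredients above can be made concrete with a toy sketch. It is not taken from the text: the function name `run_trials`, the parameters `win_probs` and `epsilon`, and the choice of an epsilon-greedy rule are all assumptions made for illustration. The "environment" is just a list of win probabilities, one per action, and the agent never sees them; it only sees the reward that comes back.

```python
import random


def run_trials(win_probs, episodes=2000, epsilon=0.1, seed=0):
    """Toy agent that learns which action wins most often.

    win_probs is a hypothetical stand-in for the environment: the chance
    that each action yields a reward of 1. The agent never reads it directly.
    """
    rng = random.Random(seed)
    values = [0.0] * len(win_probs)  # agent's running estimate of each action's reward
    counts = [0] * len(win_probs)    # how many times each action has been tried

    for _ in range(episodes):
        # Trial and error: usually pick the best-looking action,
        # but occasionally try a random one.
        if rng.random() < epsilon:
            action = rng.randrange(len(win_probs))
        else:
            action = max(range(len(win_probs)), key=lambda a: values[a])

        # Interaction with the environment: the only feedback is a reward.
        reward = 1.0 if rng.random() < win_probs[action] else 0.0

        # Goal: collect reward, so nudge the estimate toward what was observed
        # (incremental average of the rewards seen for this action).
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


if __name__ == "__main__":
    values, counts = run_trials([0.2, 0.8])
    print("estimated values:", values)
    print("times each action was tried:", counts)
```

After enough episodes the agent should be choosing the second action far more often than the first, even though nobody told it which one was better.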
So the first question is: what is RL, and how is it different from supervised and unsupervised learning? Anyone who owns a pet knows that the best strategy for training it is to reward desirable behavior and punish bad behavior. RL, also called learning with a critic, is a learning paradigm in which the agent learns in the same manner. The agent here corresponds...