An introduction to RL
What do a baby learning to walk, birds learning to fly, and an RL agent learning to play an Atari game have in common? All three involve:
- Trial and error: The child (or the bird) tries many approaches, fails often, and succeeds only some of the time before it can really walk (or fly). The RL agent plays many games, winning some and losing many, before it becomes reliably successful.
- Goal: The child's goal is to walk, the bird's is to fly, and the RL agent's is to win the game.
- Interaction with the environment: The only feedback they have is from their environment.
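The three ingredients above can be made concrete with a toy example. The following is only a minimal sketch, not a full RL algorithm: it assumes a hypothetical two-armed slot machine (the `win_probs` values, the function name `run_bandit`, and the epsilon-greedy rule are all illustrative choices, not something taken from the text). The agent's goal is to win as often as possible, it learns by trial and error, and the only feedback it receives is the win or loss returned by the environment.

```python
import random


def run_bandit(win_probs, episodes=5000, epsilon=0.1, seed=0):
    """Toy trial-and-error learner on a hypothetical slot machine.

    win_probs[a] is the (hidden) chance that action a wins; the agent
    never reads it directly and only observes win/loss feedback.
    """
    rng = random.Random(seed)
    n_actions = len(win_probs)
    estimates = [0.0] * n_actions  # agent's running estimate of each action's value
    counts = [0] * n_actions       # how many times each action was tried

    for _ in range(episodes):
        # Trial and error: usually pick the action that looks best so far,
        # but occasionally try a random one.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Interaction with the environment: the only feedback is win (1) or loss (0).
        feedback = 1.0 if rng.random() < win_probs[action] else 0.0

        # Update the running average for the action that was tried.
        counts[action] += 1
        estimates[action] += (feedback - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", estimates)
print("times each action was tried:", counts)
print("action the agent prefers:", best_action)
```

After enough plays the agent should come to prefer the action that wins more often, even though it was never told which one that is.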
So the first questions that arise are: what is RL, and how does it differ from supervised and unsupervised learning? Anyone who owns a pet knows that the best strategy for training it is to reward desirable behavior and discipline bad behavior. RL, also called learning with a critic, is a learning paradigm in which the agent learns in the same manner. The agent...