Imagine that you want to learn to ride a bike and ask a friend for advice. They explain how the gears work, how to release the brake and a few other technical details. In the end, you ask the secret to keeping balanced. What kind of answer do you expect? In an imaginary supervised world, you should be able to perfectly quantify your actions and correct the errors by comparing the outcomes with precise reference values. In the real world, you have no idea about the quantities underlying your actions and, above all, you will never know what the right value is. Increasing the level of abstraction, the scenario we're considering can be described as: a generic agent performs actions inside an environment and receives feedback that is somehow proportional to the competence of its actions.
According to this feedback, the agent can correct...