Reinforcement learning versus supervised learning
A lot of current research focuses on supervised learning. RL might seem similar to supervised learning, but it is not: supervised learning means learning from labeled samples. While this is a useful technique, it is not sufficient when we need to learn from interaction. If we want to design a machine that navigates unknown terrain, this kind of learning is not going to help us, because we have no training samples available beforehand.
We need an agent that can learn from its own experience by interacting with the unknown terrain. This is where RL really shines.
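To make the contrast concrete, here is a minimal sketch of that interaction loop. The UnknownTerrain class, its reward scheme, and the step size are illustrative assumptions, not a real environment; the point is that the agent has no labeled dataset and builds its value estimates purely from the rewards it observes.

```python
import random

class UnknownTerrain:
    """Hypothetical toy environment: moving 'right' yields +1, 'left' yields -1."""
    def step(self, action):
        return 1 if action == "right" else -1

# There is no labeled dataset here; the agent generates its own experience
# by acting and observing the rewards that come back.
env = UnknownTerrain()
action_values = {"left": 0.0, "right": 0.0}

for _ in range(100):
    action = random.choice(list(action_values))   # try an action
    reward = env.step(action)                     # observe the outcome
    # nudge the value estimate toward the observed reward
    action_values[action] += 0.1 * (reward - action_values[action])

print(action_values)  # 'right' ends up valued higher, learned purely from interaction
```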
Let's consider the exploration stage, when the agent interacts with a new environment in order to learn. How much can it explore? At this point, the agent doesn't know how big the environment is, and in many cases it won't be able to explore all the possibilities. So, what should the agent do? Should it learn from its limited experience...
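One common heuristic for balancing these two pressures is epsilon-greedy action selection: most of the time the agent acts on its current estimates, and a small fraction of the time it tries something else. The two-action setup and reward values below are hypothetical; they simply show how an agent can keep learning while acting on limited experience.

```python
import random

# Hypothetical two-action problem: the true mean rewards are unknown to the agent.
TRUE_MEANS = {"a": 0.3, "b": 0.7}

def pull(action):
    """Return a noisy reward drawn around the action's true mean."""
    return TRUE_MEANS[action] + random.gauss(0, 0.1)

estimates = {"a": 0.0, "b": 0.0}
counts = {"a": 0, "b": 0}
epsilon = 0.1  # fraction of steps spent exploring rather than exploiting

for _ in range(1000):
    if random.random() < epsilon:
        action = random.choice(list(estimates))        # explore: pick any action
    else:
        action = max(estimates, key=estimates.get)     # exploit: use experience so far
    reward = pull(action)
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]  # running mean

print(estimates)  # the estimates approach the true means despite limited exploration
```

With epsilon set to 0.1, the agent spends roughly 10% of its steps exploring; raising it trades immediate reward for better estimates of the environment.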