So far, we've discussed supervised and unsupervised learning techniques. The third pillar of machine learning is reinforcement learning (RL). In reinforcement learning, the task is neither supervised nor unsupervised. Specifically, in RL, an agent pursues an end goal while receiving observations from the environment, but it doesn't necessarily receive feedback at every step. Instead, the agent often gets positive or negative rewards only after a certain number of steps. This is interesting because one could argue that, for some tasks, this is the same way humans learn. What makes this type of problem more complicated than standard supervised learning problems is that we don't explicitly know which action in the previous steps led to the reward. This is called the credit assignment problem.
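To make the delayed-reward setting concrete, here is a minimal sketch of a toy episode. The one-dimensional corridor, the function names `run_episode` and `random_policy`, and the reward values are all hypothetical choices made for illustration; they do not come from any particular RL library. The agent moves left or right for a fixed number of steps and only learns at the very end whether the episode went well.

```python
import random


def random_policy(position, rng):
    """Hypothetical policy: ignore the observation and move at random."""
    return rng.choice([-1, 1])  # -1 = step left, +1 = step right


def run_episode(policy, length=5, seed=0):
    """Walk a toy corridor and return the (action, reward) pair for each step.

    Assumption for illustration: the reward is 0 on every step except the
    last one, where it is +1 if the agent ended right of the start, else -1.
    """
    rng = random.Random(seed)
    position = 0
    history = []
    for step in range(length):
        action = policy(position, rng)
        position += action
        reward = 0  # no feedback during the episode
        if step == length - 1:
            reward = 1 if position > 0 else -1  # feedback only at the end
        history.append((action, reward))
    return history


if __name__ == "__main__":
    for action, reward in run_episode(random_policy):
        print(f"action={action:+d}  reward={reward:+d}")
```

Every step except the last reports a reward of zero, so the reward signal alone doesn't say which of the earlier moves mattered for the final outcome. That ambiguity is the credit assignment problem in miniature.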
Reinforcement learning is a hot topic nowadays because...