So far, we have explored many different ways of learning from observed data. Even generative algorithms are, after all, trained on a dataset, from which they build a general representation of the data they have seen.
Now we are going to examine a completely different learning paradigm, one that needs neither a training dataset nor output labels: reinforcement learning (RL). The main difference is that, with RL, the algorithm explores different solutions and, in a sense, creates its own dataset through that exploration.
This learning paradigm appears closer to general human intelligence, because most of our learning comes not from explicit instruction and clear labels but from trial and error or generalization.
In this chapter, we will present an overview of the...