Now that we understand the concept of meta learning, we can move on to meta reinforcement learning. Meta-RL, sometimes referred to as RL^2 (RL Squared), is evolving quickly, but its additional complexity can still make the method difficult to approach. While the concept is very similar to vanilla meta learning, it introduces a number of subtle nuances for RL. Some of these can be difficult to understand, so hopefully the following diagram helps. It is taken from the paper Reinforcement Learning, Fast and Slow by Botvinick et al., 2019 (https://www.cell.com/action/showPdf?pii=S1364-6613%2819%2930061-0):
In the diagram, you can see the familiar inner and outer loops that are characteristic of meta learning. This means that we go from evaluating a policy for any observed state to now also...
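To make the two-loop structure more concrete, here is a minimal, dependency-light sketch. It assumes a toy two-armed bandit as the task distribution and a simple tabular learner for the fast inner loop, and it uses a Reptile-style averaging update in the slow outer loop purely as a stand-in for the recurrent approach described in the paper. The names (sample_task, inner_loop) and the learning rates are illustrative assumptions, not part of any library or of the paper itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Each task is a two-armed bandit with its own arm reward probabilities."""
    return rng.uniform(0.0, 1.0, size=2)

def inner_loop(task, init_values, episodes=50, lr=0.1):
    """Fast, within-task learning that starts from the meta-learned values."""
    q = init_values.copy()
    total_reward = 0.0
    for _ in range(episodes):
        # Epsilon-greedy action selection within the current task
        arm = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(q))
        reward = float(rng.random() < task[arm])
        q[arm] += lr * (reward - q[arm])   # fast adaptation to this task
        total_reward += reward
    return q, total_reward

# Slow outer loop: learn across many sampled tasks so that the starting
# values let the inner loop adapt quickly to any new task.
meta_values = np.zeros(2)
for _ in range(200):
    task = sample_task()
    adapted, episode_return = inner_loop(task, meta_values)
    # Reptile-style update: nudge the meta-initialization toward the
    # values found by the inner loop (an assumption chosen to keep the
    # example short and dependency-free).
    meta_values += 0.05 * (adapted - meta_values)

print("meta-learned initial action values:", meta_values)
```

The point of the sketch is only the shape of the computation: the inner loop adapts to a single task over a handful of episodes, while the outer loop slowly accumulates knowledge across tasks, which is the same fast/slow split the diagram illustrates.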