Meta-reinforcement learning as partially observed reinforcement learning
Another approach in meta-RL is to focus on the partially observable nature of the tasks and explicitly estimate the state from the observations received up to that point in time:

$$\hat{s}_t = f(o_{0:t})$$
The next step is to form a probability distribution over the possible tasks based on their likelihood of being active in that episode, or more precisely, over some vector $z_t$ that contains the task information:

$$p(z_t \mid \hat{s}_{0:t})$$
Then, we iteratively sample a task vector from this distribution and pass it to the policy in addition to the state (see the code sketch after this list):
- Sample $z_t \sim p(z_t \mid \hat{s}_{0:t})$,
- Take actions from a policy that receives the state and the task vector as input, $a_t \sim \pi(a_t \mid \hat{s}_t, z_t)$.
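To make this loop concrete, here is a minimal Python sketch of the sample-then-act cycle described above. All names here (`TaskBelief`, `policy`, `transition_likelihood`, `env`) are hypothetical placeholders rather than parts of any specific library, and the sketch assumes a small discrete set of candidate tasks so that the posterior over tasks can be tracked exactly with Bayes' rule:

```python
import numpy as np


class TaskBelief:
    """Tracks p(z | history) over a discrete set of candidate tasks."""

    def __init__(self, n_tasks: int):
        self.probs = np.full(n_tasks, 1.0 / n_tasks)  # uniform prior over tasks

    def update(self, likelihoods: np.ndarray):
        # Bayes' rule: posterior is proportional to prior times the
        # likelihood of the latest transition under each candidate task
        self.probs *= likelihoods
        self.probs /= self.probs.sum()

    def sample(self, rng: np.random.Generator) -> int:
        # Draw a task hypothesis z from the current posterior
        return int(rng.choice(len(self.probs), p=self.probs))


def rollout(env, policy, belief, transition_likelihood, rng, max_steps=200):
    """Run one episode, re-sampling the task vector z at every step."""
    obs = env.reset()
    for _ in range(max_steps):
        z = belief.sample(rng)          # sample z ~ p(z | history)
        action = policy(obs, z)         # act with pi(a | s, z)
        next_obs, reward, done = env.step(action)
        # Re-weight each candidate task by how well it explains the transition
        belief.update(transition_likelihood(obs, action, next_obs, reward))
        obs = next_obs
        if done:
            break
```

In a real implementation, the exact Bayes update over a discrete task set would typically be replaced by a learned inference network over a continuous task vector, but the sample-then-act control flow stays the same.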
With that, we conclude our discussion of the three main meta-RL approaches. Before we wrap up the chapter, let's discuss some of the challenges in meta-RL.