- Imagination in an agent means visualizing and planning before taking any action.
- The imagination core consists of a policy network and an environment model, which together perform the imagination.
- The agent repeatedly takes feedback from a human and changes its goal according to the human's preferences.
- DQfD uses demonstration data for training, whereas DQN does not use any demonstration data upfront.
- Refer to the section Hindsight Experience Replay (HER).
- Hierarchical reinforcement learning (HRL) is proposed to solve the curse of dimensionality: we decompose large problems into small subproblems arranged in a hierarchy.
- In RL, we try to find the optimal policy given the reward function, whereas in inverse reinforcement learning, the optimal policy is given and we find the reward function.
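
The contrast between RL and inverse RL in the last answer can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a hypothetical three-state chain with deterministic left/right moves, a discount of 0.9, value iteration for the RL direction, and a brute-force search over a handful of candidate reward vectors for the inverse direction. Practical inverse RL methods do not enumerate rewards like this; the sketch only shows which quantity is the input and which is the output in each setting.

```python
GAMMA = 0.9        # assumed discount factor
N_STATES = 3       # hypothetical chain: states 0, 1, 2
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition: move one cell, clipped at both ends."""
    if action == 0:
        return max(state - 1, 0)
    return min(state + 1, N_STATES - 1)


def q_values(reward, values, state):
    """Q-value of each action: reward for the state entered plus discounted value."""
    return [reward[step(state, a)] + GAMMA * values[step(state, a)]
            for a in ACTIONS]


def optimal_policy(reward, iters=200):
    """RL direction: reward function in, greedy policy out (value iteration)."""
    values = [0.0] * N_STATES
    for _ in range(iters):
        values = [max(q_values(reward, values, s)) for s in range(N_STATES)]
    policy = []
    for s in range(N_STATES):
        q = q_values(reward, values, s)
        policy.append(q.index(max(q)))  # ties resolve to the first action
    return policy


def consistent_rewards(expert_policy, candidates):
    """Inverse direction: policy in, candidate rewards that explain it out."""
    return [r for r in candidates if optimal_policy(r) == expert_policy]


# RL: the reward is known, the policy is computed.
expert_policy = optimal_policy([0, 0, 1])

# Inverse RL (toy version): the policy is known, the reward is searched for.
candidates = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
recovered = consistent_rewards(expert_policy, candidates)
print(expert_policy, recovered)
```

With the reward placed on the rightmost state, the computed policy moves right everywhere, and among the three candidates only that same reward vector reproduces the policy.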