MaxEnt Inverse Reinforcement Learning
The maximum entropy (MaxEnt) inverse reinforcement learning algorithm proceeds as follows:
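- Initialize the parameter $\theta$ and gather the expert demonstrations $D$
- For N iterations:
  - Compute the reward function $R_\theta(\tau) = \theta^T f_\tau$, where $f_\tau$ is the feature vector of the trajectory $\tau$
  - Compute the policy using value iteration with the reward function obtained in the previous step
  - Compute the state visitation frequency $p(s \mid \theta)$ using the policy obtained in the previous step
  - Compute the gradient with respect to $\theta$, that is, $\nabla_\theta L(\theta) = \tilde{f} - \sum_s p(s \mid \theta) f_s$, where $\tilde{f}$ is the feature expectation of the expert demonstrations and $f_s$ is the feature vector of state $s$
  - Update the value of $\theta$ as $\theta = \theta + \alpha \nabla_\theta L(\theta)$, where $\alpha$ is the learning rate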
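
The steps above can be sketched in Python with NumPy on a toy problem. This is a minimal illustration under stated assumptions, not a reference implementation: the 5-state chain environment, the one-hot state features, the finite horizon, and all function names (`soft_value_iteration`, `state_visitation`, `maxent_irl`) are hypothetical choices made for this example. It also assumes the soft (log-sum-exp) variant of value iteration, which yields the stochastic policy commonly used with maximum entropy IRL.

```python
import numpy as np

def soft_value_iteration(P, reward, gamma=0.9, n_iters=100):
    """Return a stochastic policy pi[s, a] via soft (log-sum-exp) value iteration."""
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = reward[:, None] + gamma * (P @ V)          # Q[s, a]
        q_max = Q.max(axis=1, keepdims=True)           # for numerical stability
        V = (q_max + np.log(np.exp(Q - q_max).sum(axis=1, keepdims=True))).ravel()
    return np.exp(Q - V[:, None])                      # each row sums to 1

def state_visitation(P, policy, p0, horizon):
    """Expected number of visits to each state over `horizon` time steps."""
    d = p0.copy()
    total = d.copy()
    for _ in range(horizon - 1):
        # Propagate the state distribution one step under the policy.
        d = np.einsum("s,sa,sat->t", d, policy, P)
        total += d
    return total

def maxent_irl(P, features, demos, p0, alpha=0.1, n_iters=200, gamma=0.9):
    horizon = len(demos[0])
    # Expert feature expectation: summed state features, averaged over demos.
    f_expert = np.mean([features[traj].sum(axis=0) for traj in demos], axis=0)
    theta = np.zeros(features.shape[1])                # initialize the parameter
    for _ in range(n_iters):
        reward = features @ theta                      # reward for every state
        policy = soft_value_iteration(P, reward, gamma)
        svf = state_visitation(P, policy, p0, horizon)
        grad = f_expert - features.T @ svf             # gradient w.r.t. theta
        theta += alpha * grad                          # gradient ascent update
    return theta

# Toy setup (assumed): 5-state chain, action 0 moves left, action 1 moves right.
n_states = 5
P = np.zeros((n_states, 2, n_states))                  # P[s, a, s']
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, n_states - 1)] = 1.0

features = np.eye(n_states)                            # one-hot state features
demos = [[0, 1, 2, 3, 4, 4, 4, 4]] * 3                 # expert walks right, then stays
p0 = np.zeros(n_states)
p0[0] = 1.0                                            # every episode starts in state 0

theta = maxent_irl(P, features, demos, p0)
print(np.round(theta, 2))
```

With one-hot features, each entry of `theta` is directly the learned reward of one state, so the printed vector can be read as a reward map of the chain. In this setup the expert spends most of its time in the last state, so the learned reward there should end up above that of the start state.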