REINFORCE Policy Gradient
The REINFORCE policy gradient algorithm proceeds as follows:
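1. Initialize the policy network parameter θ with random values
2. Generate N trajectories by following the policy π_θ
3. Compute the return R(τ) of each trajectory τ
4. Compute the gradient of the objective: ∇_θ J(θ) ≈ (1/N) Σ_i Σ_t ∇_θ log π_θ(a_t | s_t) R(τ_i)
5. Update the network parameter by gradient ascent: θ = θ + α ∇_θ J(θ), where α is the learning rate
6. Repeat steps 2 to 5 for several iterations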
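
The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not a reference implementation: the "network" is replaced by a tabular softmax policy with one row of logits per state, and the environment is a made-up toy task in which the state is simply the time step and action 1 pays a reward of 1. The names softmax, run_episode, and reinforce, as well as the default hyperparameters, are hypothetical choices made for this example.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def run_episode(theta, rng, horizon):
    """Roll out one trajectory in a toy environment (an assumption for this sketch).

    The state is the time step; action 1 yields reward 1 and action 0 yields 0.
    Returns the visited states, the chosen actions, and the trajectory return.
    """
    states, actions, total_return = [], [], 0.0
    for s in range(horizon):
        a = rng.choice(2, p=softmax(theta[s]))  # sample an action from pi_theta(. | s)
        states.append(s)
        actions.append(a)
        total_return += float(a == 1)
    return states, actions, total_return


def reinforce(horizon=5, n_traj=20, lr=0.2, iterations=300, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize the policy parameter with small random values.
    theta = rng.normal(scale=0.1, size=(horizon, 2))
    # Step 6: repeat steps 2 to 5 for several iterations.
    for _ in range(iterations):
        grad = np.zeros_like(theta)
        # Step 2: generate N trajectories by following the current policy.
        for _ in range(n_traj):
            # Step 3: compute the return of the trajectory.
            states, actions, ret = run_episode(theta, rng, horizon)
            # Step 4: accumulate grad log pi(a_t | s_t) * R(tau).
            for s, a in zip(states, actions):
                grad_log_pi = -softmax(theta[s])  # softmax policy: one_hot(a) - pi(. | s)
                grad_log_pi[a] += 1.0
                grad[s] += grad_log_pi * ret
        # Step 5: gradient ascent on the average over the N trajectories.
        theta += lr * grad / n_traj
    return theta
```

Calling reinforce() returns the learned logits; applying softmax to each row gives the action probabilities per state, which in this toy task should shift toward action 1 as training proceeds. The whole-trajectory return is used as the weight for every step, exactly as in the list above; variants that use a baseline or the reward-to-go are not shown here.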