REINFORCE Policy Gradient
The algorithm for REINFORCE policy gradient is given as follows:
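1. Initialize the network parameter $\theta$ with random values.
2. Generate $N$ trajectories $\{\tau^i\}_{i=1}^{N}$ by following the policy $\pi_\theta$.
3. Compute the return $R(\tau^i)$ of each trajectory.
4. Compute the gradients: $\nabla_\theta J(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left[ \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t^i \mid s_t^i) \, R(\tau^i) \right]$
5. Update the network parameter: $\theta = \theta + \alpha \nabla_\theta J(\theta)$
6. Repeat steps 2 to 5 for several iterations.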
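
The steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not a reference implementation: the "network" is replaced by a tabular softmax policy, and the environment is a hypothetical toy task invented for this example (two states drawn uniformly at each step, reward 1 when the chosen action index matches the state index). The names `generate_trajectory`, `trajectory_return`, `policy_gradient`, and `reinforce`, as well as the horizon, learning rate, and iteration counts, are illustrative choices.

```python
import math
import random

N_STATES, N_ACTIONS, HORIZON = 2, 2, 5  # toy sizes, chosen for illustration


def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def generate_trajectory(theta, rng):
    """Step 2: roll out one episode by following the current policy.

    Toy environment (an assumption): the state is drawn uniformly at each
    step, and the reward is 1 when the action index equals the state index.
    """
    trajectory = []
    for _ in range(HORIZON):
        s = rng.randrange(N_STATES)
        probs = softmax(theta[s])
        a = rng.choices(range(N_ACTIONS), weights=probs)[0]
        r = 1.0 if a == s else 0.0
        trajectory.append((s, a, r))
    return trajectory


def trajectory_return(trajectory):
    """Step 3: undiscounted return of a trajectory (sum of its rewards)."""
    return sum(r for _, _, r in trajectory)


def policy_gradient(theta, trajectories):
    """Step 4: average over trajectories of sum_t grad log pi(a_t|s_t) * R."""
    grad = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for tau in trajectories:
        ret = trajectory_return(tau)
        for s, a, _ in tau:
            probs = softmax(theta[s])
            for b in range(N_ACTIONS):
                # Derivative of log softmax(theta[s])[a] w.r.t. theta[s][b]
                dlog = (1.0 if b == a else 0.0) - probs[b]
                grad[s][b] += dlog * ret / len(trajectories)
    return grad


def reinforce(iterations=300, n_trajectories=20, alpha=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the parameters with small random values
    theta = [[rng.gauss(0.0, 0.1) for _ in range(N_ACTIONS)]
             for _ in range(N_STATES)]
    for _ in range(iterations):  # Step 6: repeat steps 2 to 5
        trajectories = [generate_trajectory(theta, rng)
                        for _ in range(n_trajectories)]
        grad = policy_gradient(theta, trajectories)
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                theta[s][a] += alpha * grad[s][a]  # Step 5: gradient ascent
    return theta
```

Calling `theta = reinforce()` and then inspecting `softmax(theta[0])` and `softmax(theta[1])` shows how strongly the learned policy prefers the rewarded action in each state. Because every action in a trajectory is weighted by the full trajectory return, the gradient estimate is noisy; averaging over more trajectories per iteration reduces that noise at the cost of more rollouts.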