TD Prediction
The algorithm for the TD prediction method is given as follows:
- Initialize the value function $V(s)$ with random values. A policy $\pi$ to evaluate is given.
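- For each episode:
  - Initialize the state $s$.
  - For each step in the episode:
    - Perform an action $a$ in the state $s$ according to the given policy $\pi$, get the reward $r$, and move to the next state $s'$.
    - Update the value of the state: $V(s) \leftarrow V(s) + \alpha\,(r + \gamma V(s') - V(s))$, where $\alpha$ is the learning rate and $\gamma$ is the discount factor.
    - Update $s \leftarrow s'$ (this step means the next state becomes the current state $s$).
    - If $s$ is not a terminal state, repeat the steps above; otherwise, the episode ends.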
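The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `td_prediction`, the helpers `step` and `policy`, and the four-state chain environment are all hypothetical and chosen only to keep the example self-contained. It also assumes the value of a terminal state is fixed at 0, which is the usual convention.

```python
import random


def td_prediction(policy, step, states, terminal_states, start_state,
                  num_episodes=500, alpha=0.1, gamma=0.9, seed=0):
    """Estimate V(s) for a fixed policy with one-step TD updates."""
    rng = random.Random(seed)
    # Initialize V(s) with random values; terminal states are pinned to 0
    # (an assumption of this sketch, following the usual convention).
    V = {s: rng.random() for s in states}
    for s in terminal_states:
        V[s] = 0.0

    for _ in range(num_episodes):
        s = start_state                      # initialize the state s
        while s not in terminal_states:      # for each step in the episode
            a = policy(s)                    # act according to the given policy
            r, s_next = step(s, a)           # observe reward r and next state s'
            # V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next                       # the next state becomes the current state
    return V


# Hypothetical toy environment: a chain 0 -> 1 -> 2 -> 3, where 3 is terminal
# and entering it yields a reward of 1.
def step(s, a):
    s_next = s + a
    r = 1.0 if s_next == 3 else 0.0
    return r, s_next


# Hypothetical fixed policy: always move one state to the right.
def policy(s):
    return 1


V = td_prediction(policy, step, states=range(4), terminal_states={3}, start_state=0)
print({s: round(v, 3) for s, v in V.items()})
```

Because this toy chain and policy are deterministic, the estimates for states 0, 1, and 2 should approach $\gamma^2 = 0.81$, $\gamma = 0.9$, and $1$ respectively. In a stochastic environment with a constant $\alpha$, the estimates would instead keep fluctuating around the true values.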