Temporal difference learning
Temporal Difference (TD) learning is a central and novel idea in reinforcement learning. TD learning combines ideas from Monte Carlo (MC) methods and Dynamic Programming (DP). Like Monte Carlo methods, TD methods can learn directly from raw experience without a model of the environment. Like Dynamic Programming, TD methods update estimates based in part on other learned estimates, without waiting for a final outcome; MC methods, by contrast, update their estimates only after the final outcome is reached.
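To make "updating based in part on other learned estimates" concrete, here is a minimal sketch of the tabular TD(0) state-value update, V(s) ← V(s) + α[r + γV(s') − V(s)]. It assumes value estimates stored in a Python dict; the function name and the constants alpha and gamma are illustrative, not taken from any particular library.

```python
# Minimal sketch of the tabular TD(0) state-value update, assuming value
# estimates are kept in a dict V; td0_update, alpha, and gamma are
# illustrative names, not from a specific library.

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply one TD(0) update after observing a (state, reward, next_state) step."""
    # DP-like part: the target bootstraps from the current estimate of the
    # next state, so no final outcome is required.
    td_target = reward + gamma * V.get(next_state, 0.0)
    # TD error: the gap between the one-step target and the current estimate.
    td_error = td_target - V.get(state, 0.0)
    # MC-like part: the estimate is adjusted from sampled experience,
    # moving a fraction alpha toward the target.
    V[state] = V.get(state, 0.0) + alpha * td_error
```

Because the target r + γV(s') is available immediately after each transition, this update can run online after every step, which is exactly the property contrasted with Monte Carlo methods in the next section.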
Comparison between Monte Carlo methods and temporal difference learning
Though Monte Carlo methods and Temporal Difference learning share similarities, TD learning has inherent advantages over Monte Carlo methods, summarized in the table below.
| Monte Carlo methods | Temporal Difference learning |
| --- | --- |
| MC must wait until the end of the episode before the return is known. | TD can learn online after every step and does not need to wait until the end of the episode. |
| MC has high variance and low bias, since updates use actual sampled returns. | TD has low variance but some bias, since updates bootstrap from the current value estimates. |
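For contrast with the TD(0) sketch above, here is an every-visit Monte Carlo update under the same tabular assumptions. It shows why MC must wait: no return G can be computed until the episode has terminated. The episode format and the constants are illustrative assumptions.

```python
# Every-visit Monte Carlo update, for contrast with the TD(0) sketch above.
# `episode` is assumed to be a list of (state, reward) pairs in time order,
# where each reward is the one received after leaving that state; alpha and
# gamma are illustrative constants, as before.

def mc_update(V, episode, alpha=0.1, gamma=0.99):
    """Update V from the full sampled returns of one finished episode."""
    G = 0.0
    # Walk backwards so G accumulates the discounted return from each state
    # onward; this is only possible once the episode has ended.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        # The target is the actual sampled return G: low bias, but higher
        # variance than TD's bootstrapped one-step target.
        V[state] = V.get(state, 0.0) + alpha * (G - V.get(state, 0.0))
```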