TDL was introduced by the father of RL himself, Dr. Richard Sutton, in 1988. Sutton developed the method as an improvement over Monte Carlo (MC) and dynamic programming (DP) methods, and, as we will see, it in turn led to the development of Q-learning by Chris Watkins in 1989. TDL is model-free and does not require an episode to complete before the agent learns, which makes it very powerful for exploring unknown environments in real time.
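To preview what "learning before episode completion" looks like in practice (we derive the update itself shortly), here is a minimal sketch of tabular TD(0) value prediction on a toy chain environment. The environment, the policy, and the names `step`, `NUM_STATES`, `ALPHA`, and `GAMMA` are illustrative assumptions, not something from this chapter; the point to notice is that the value table is updated after every single transition rather than at the end of the episode.

```python
# A minimal sketch of tabular TD(0) prediction on a hypothetical chain
# environment; the environment, policy, and names (step, NUM_STATES,
# ALPHA, GAMMA) are illustrative assumptions, not part of this chapter.

NUM_STATES = 5          # states 0..4, with state 4 terminal
ALPHA = 0.1             # learning rate (step size)
GAMMA = 1.0             # discount factor

def step(state):
    """Toy deterministic transition: always move right; reward 1 at the goal."""
    next_state = state + 1
    reward = 1.0 if next_state == NUM_STATES - 1 else 0.0
    return next_state, reward

V = [0.0] * NUM_STATES  # value estimates, initialized to zero

for _ in range(100):                      # 100 training episodes
    state = 0
    while state != NUM_STATES - 1:
        next_state, reward = step(state)
        # The TD(0) update fires after every transition -- the agent
        # learns online, without waiting for the episode to end
        V[state] += ALPHA * (reward + GAMMA * V[next_state] - V[state])
        state = next_state

print(V)  # non-terminal values converge toward 1.0 with GAMMA = 1
```

Contrast this with MC, which must wait for the full return of an episode before it can update any state, and with DP, which needs a model of the environment's transitions; the TD update above needs neither.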
Before we get into the mathematics of this approach, it may be helpful to look at the backup diagrams of all the methods covered so far; we do that in the next section.