Dynamic programming (DP) is a general algorithmic paradigm that breaks a problem down into smaller, overlapping subproblems and then solves the original problem by combining the solutions of those subproblems.
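As a minimal illustration of "overlapping subproblems" outside of RL, the sketch below computes Fibonacci numbers. Here `fib(n - 1)` and `fib(n - 2)` share almost all of their subproblems, and caching with Python's standard `functools.lru_cache` ensures each one is solved only once. The function name and the `calls` counter are illustrative choices, not part of any RL library.

```python
from functools import lru_cache

# Records how many times the body of fib runs for each n (illustrative counter).
calls = {}

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, caching every subproblem."""
    calls[n] = calls.get(n, 0) + 1
    if n < 2:
        return n
    # fib(n - 1) and fib(n - 2) overlap heavily; the cache means each
    # distinct subproblem is computed once and then reused.
    return fib(n - 1) + fib(n - 2)

print(fib(30))              # 832040
print(max(calls.values()))  # 1: every subproblem was solved exactly once
```

Without the cache, the same recursion would recompute small subproblems an exponential number of times; combining stored sub-solutions is what makes the approach DP.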
DP can be used in reinforcement learning and is one of the simplest approaches. Given a perfect model of the environment, it can compute optimal policies.
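The following is a minimal sketch of value iteration, one DP method, on a hypothetical three-state MDP. It assumes the "perfect model" is available as a table `P[s][a]` of `(probability, next_state, reward)` triples; the MDP itself, `GAMMA`, `q_value`, and `value_iteration` are illustrative names and numbers, not taken from the text.

```python
# Hypothetical model: P[s][a] is a list of (probability, next_state, reward).
# Actions: 0 = left, 1 = right. State 2 is terminal (absorbing, zero reward).
P = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 0.0), (0.2, 0, 0.0)]},  # "right" slips with prob. 0.2
    1: {0: [(1.0, 0, 0.0)],
        1: [(1.0, 2, 1.0)]},                 # reaching state 2 pays 1
    2: {0: [(1.0, 2, 0.0)],
        1: [(1.0, 2, 0.0)]},
}
GAMMA = 0.9

def q_value(V, s, a):
    """Expected one-step return of action a in state s, bootstrapping from V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])

def value_iteration(theta=1e-10):
    V = [0.0] * len(P)  # tabular value function: one entry per state
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < theta:
            break
    # The optimal policy is greedy with respect to the converged values.
    policy = [max(P[s], key=lambda a: q_value(V, s, a)) for s in P]
    return V, policy

V, policy = value_iteration()
print([round(v, 3) for v in V])  # [0.878, 1.0, 0.0]
print(policy)                    # [1, 1, 0] (the action in terminal state 2 is arbitrary)
```

Every update reads the transition probabilities and rewards directly from `P`; this is exactly why DP requires the model, whereas later RL methods learn from sampled experience instead.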
DP is an important stepping stone in the history of RL and laid the foundation for later generations of algorithms, but it is computationally very expensive. In practice, DP works only for MDPs with a limited number of states and actions, because it must update the value of every state (or every action-value) while taking into account all the possible next states. Moreover, DP algorithms store value functions in an array or a table. This way of storing information is effective...
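To make the cost argument concrete, the rough sketch below runs one sweep of iterative policy evaluation (for a uniform random policy) on a hypothetical dense MDP in which every action can lead to every state. The value function is a plain Python list indexed by state, and the sweep counts how many successor-state terms it sums. `random_dense_mdp` and `policy_evaluation_sweep` are illustrative helpers, and the dense, randomly generated model is an assumption chosen to show the worst case.

```python
import random

def random_dense_mdp(n_states, n_actions, seed=0):
    """Hypothetical MDP where every action can reach every state."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_states):
        actions = []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            # One (probability, next_state, reward) triple per possible successor.
            actions.append([(w / total, s2, rng.random())
                            for s2, w in enumerate(weights)])
        P.append(actions)
    return P

def policy_evaluation_sweep(P, V, gamma=0.9):
    """One synchronous sweep of policy evaluation for the uniform random policy.

    Returns the updated value table and the number of successor terms summed.
    """
    n_actions = len(P[0])
    new_V = [0.0] * len(P)  # the value function is just an array indexed by state
    terms = 0
    for s, actions in enumerate(P):
        total = 0.0
        for transitions in actions:
            for p, s2, r in transitions:
                total += p * (r + gamma * V[s2])
                terms += 1
        new_V[s] = total / n_actions  # each action is chosen with probability 1/|A|
    return new_V, terms

for n in (10, 100, 200):
    P = random_dense_mdp(n, n_actions=4)
    _, terms = policy_evaluation_sweep(P, [0.0] * n)
    print(n, terms)  # 10 400, then 100 40000, then 200 160000
```

With a dense model, each sweep costs |S| x |A| x |S| terms, so doubling the number of states roughly quadruples the work per sweep, and DP typically needs many sweeps to converge. The table itself also needs one entry per state (or per state-action pair), which quickly becomes infeasible for large state spaces.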