Problems with Gradient-Based Methods
In this section, you will learn about the differences between value-based and policy-based methods and the use of gradient-based methods in policy search algorithms. You will then examine the advantages and disadvantages of gradient-based methods in policy-based approaches, and implement stochastic gradient descent in TensorFlow to solve a cubic function with two unknowns.
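As a preview of the TensorFlow exercise, the sketch below shows one way stochastic gradient descent can be applied to a cubic function of two unknowns: the unknowns are treated as trainable variables and the squared residual of the cubic is driven toward zero. The specific function, starting values, and learning rate here are illustrative assumptions, not the chapter's actual setup.

```python
import tensorflow as tf

# Hypothetical cubic (the chapter's actual function may differ):
# find (x, y) such that x**3 + 2*x*y - 7 = 0 by minimizing the
# squared residual with stochastic gradient descent.
x = tf.Variable(1.0)
y = tf.Variable(1.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

for step in range(200):
    with tf.GradientTape() as tape:
        residual = x**3 + 2.0 * x * y - 7.0
        loss = residual**2
    # Compute d(loss)/dx and d(loss)/dy, then take one SGD step.
    grads = tape.gradient(loss, [x, y])
    optimizer.apply_gradients(zip(grads, [x, y]))

print(f"x = {x.numpy():.4f}, y = {y.numpy():.4f}, loss = {loss.numpy():.6f}")
```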
There are two main approaches to RL: value-based and policy-based. Both are used to solve complex decision problems framed as Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs). Value-based approaches derive the optimal policy from an estimate of the optimal value function. Algorithms such as Q-learning and SARSA(λ) fall into this category, and for tasks that can be represented with lookup tables, they are guaranteed to converge on the globally optimal return.
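To make the value-based idea concrete, the following is a minimal sketch of the tabular Q-learning update, in which the lookup table Q(s, a) is nudged toward a bootstrapped target after each transition. The state and action counts and the example transition are hypothetical.

```python
import numpy as np

# Hypothetical toy MDP sizes; a real task would define these.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # the lookup table of action values
alpha, gamma = 0.1, 0.99              # learning rate, discount factor

def q_learning_update(s, a, r, s_next):
    # Q-learning moves Q(s, a) toward the target r + gamma * max_a' Q(s', a').
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Example transition: from state 0, action 1 yields reward 1.0 and lands in state 2.
q_learning_update(0, 1, 1.0, 2)
```

Because the table stores one value per state-action pair, repeated updates of this form converge to the optimal action values on small, fully enumerable tasks, which is the lookup-table guarantee mentioned above.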