The TD method can learn the Q-function during an episode, but it does not scale to large state spaces. For example, the number of states is around 10^40 in a chess game and 10^70 in a Go game. Moreover, it is infeasible to learn values for continuous states with the tabular TD method. Hence, we need to solve such problems using function approximation (FA), which approximates the state space using a set of features.
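As a minimal illustration of the idea (the feature map and weight values below are hypothetical, not from this recipe), a linear function approximator estimates Q(s, a) as a weighted sum of features extracted from the state-action pair:

```python
# Sketch of linear function approximation for Q-values.
# The feature map and the weights are illustrative placeholders.

def features(state, action):
    """Map a continuous state and a discrete action to a feature vector."""
    position, velocity = state
    return [1.0, position, velocity, position * velocity, float(action)]

def q_value(weights, state, action):
    """Approximate Q(s, a) as the dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(state, action)))

weights = [0.0, 0.5, 2.0, 0.0, -0.1]
print(q_value(weights, (-0.5, 0.04), 1))
```

Instead of storing one value per state, we only store one weight per feature, so the memory cost no longer depends on the size of the state space.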
In this first recipe, we will begin by getting familiar with the Mountain Car environment, which we will solve with the help of FA methods in upcoming recipes.
Mountain Car (https://gym.openai.com/envs/MountainCar-v0/) is a typical Gym environment with continuous states. As shown in the following diagram, its goal is to get the car to the top of the hill:
On a one-dimensional track, the car is positioned between -1.2 (leftmost) and 0.6 (rightmost...
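To see why this environment is hard, we can sketch its transition dynamics in plain Python (transcribed from the classic formulation used by the Gym implementation; treat the constants as an approximation of what the installed version does). The car's engine is too weak to drive straight up: always pushing right from the valley never reaches the goal at position 0.5, so the agent must learn to rock back and forth to build momentum:

```python
import math

def step(position, velocity, action):
    """One step of the Mountain Car dynamics.

    action: 0 = push left, 1 = no push, 2 = push right.
    """
    velocity += (action - 1) * 0.001 - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))      # clip velocity
    position += velocity
    position = max(-1.2, min(0.6, position))        # clip position to the track
    if position == -1.2 and velocity < 0:
        velocity = 0.0                              # the car stops at the left wall
    return position, velocity

# Always push right from a typical start state in the valley.
position, velocity = -0.5, 0.0
for _ in range(200):
    position, velocity = step(position, velocity, 2)
print(position < 0.5)  # the car never reaches the goal this way
```

The state is the continuous pair (position, velocity), which is exactly the kind of state space the FA methods in the upcoming recipes are designed to handle.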