In the previous section, we manually calculated the state-action values for all combinations. Technically, now that we have the state-action values we need, we can identify the action to take in every state. However, in more complex scenarios – for example, when playing video games – it gets tricky to fetch state information. OpenAI's Gym environment comes in handy here: it contains a pre-defined environment for the game we're playing and fetches the next state information, given an action taken in the current state. So far, we have considered only the scenario of choosing the optimal path; however, there can be scenarios where we get stuck in a local minimum.
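To make this concrete, the following is a minimal sketch of how next-state information can be fetched from a Gym environment. The FrozenLake-v0 environment and the classic (pre-0.26) reset/step signatures are assumptions used here for illustration and are not part of the original text; the exact environment ID and return values depend on your Gym version:

```python
# A minimal sketch of stepping through an OpenAI Gym environment.
# Assumption: the classic Gym API (pre-0.26), where reset() returns the state
# and step() returns (next_state, reward, done, info). FrozenLake-v0 is used
# purely as an example of a pre-defined environment.
import gym

env = gym.make("FrozenLake-v0")      # load a pre-defined environment
state = env.reset()                  # fetch the initial state
action = env.action_space.sample()   # pick a (random) action in the current state

# Given the action taken in the current state, the environment returns the
# next state, the reward obtained, and whether the episode has ended.
next_state, reward, done, info = env.step(action)
print(state, action, next_state, reward, done)
```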
In this section, we will learn about Q-learning, which helps us calculate the value associated with an action in a state, as well as about leveraging the Gym environment so that we can play various...