In this section, we'll go through how a Gym environment works, along with some of the environment's functions and variables that you'll be using.
Note that we will sometimes use the words state and observation interchangeably, since in this context they refer to the same thing. State is the conventional term for the current condition of a finite-state machine, including the MDPs that we represent and solve with Q-learning. Observation is the term used in the Gym package itself, where it names the value that the environment hands back to the agent.
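To make the terminology concrete, here is a minimal sketch that does not depend on any particular Gym version. `TinyCorridorEnv` is a hypothetical stand-in written for this illustration, not part of Gym; it is assumed to mimic the classic interface in which `reset()` returns an initial observation and `step(action)` returns `observation, reward, done, info`. Newer Gym releases return additional values from both calls, so treat the exact signatures as an assumption.

```python
class TinyCorridorEnv:
    """Hypothetical stand-in that mimics the classic Gym reset/step interface."""

    def __init__(self, length=5):
        self.length = length  # number of discrete states: 0 .. length - 1
        self.state = 0        # internal state of the underlying MDP

    def reset(self):
        # Begin a new episode and return the initial observation.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left;
        # the position is clamped at both ends of the corridor.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1  # episode ends at the right end
        reward = 1.0 if done else 0.0         # reward only on reaching the goal
        info = {}                             # extra diagnostics, unused here
        return self.state, reward, done, info


env = TinyCorridorEnv()
observation = env.reset()
trajectory = [observation]
done = False
while not done:
    action = 1  # always move right, so the run is deterministic
    observation, reward, done, info = env.step(action)
    trajectory.append(observation)

print(trajectory)
```

In this sketch the value named `observation` is exactly the environment's internal `state`, which is why the two words can be used interchangeably here.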