Markov models
The problem is set up as a reinforcement learning problem and is solved by trial and error. The environment is described by state_values, and the state_values are changed by actions. An algorithm chooses each action based on the current state_value, with the goal of reaching a particular state_value; a process modeled this way is termed a Markov model. In general, past state_values can influence future state_values, but here we assume that the current state_value encodes all of the previous state_values (the Markov property). There are two types of state_values: observable and non-observable. A model that also has to take non-observable state_values into account is called a Hidden Markov model.
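As a rough illustration of the Markov assumption described above, the following minimal sketch moves a single integer state_value along a short line toward a goal. The environment, the GOAL constant, the 0.2 noise probability, and the function names step, policy, and run_episode are all hypothetical choices made for this example; they are not taken from the CartPole problem or from any library. The point is only that both the chosen action and the next state_value depend on the current state_value, never on the earlier history.

```python
import random

GOAL = 4  # hypothetical target state_value (assumption for this sketch)


def step(state_value, action, rng):
    """Return the next state_value from the current one and the action only."""
    move = 1 if action == "right" else -1
    # Illustrative noise: with probability 0.2 the action has no effect.
    if rng.random() < 0.2:
        move = 0
    # Keep the state_value inside the range 0..GOAL.
    return min(max(state_value + move, 0), GOAL)


def policy(state_value):
    """Choose an action by looking at the current state_value alone."""
    return "right" if state_value < GOAL else "left"


def run_episode(seed=0, max_steps=50):
    """Run one episode and return the list of visited state_values."""
    rng = random.Random(seed)
    state_value = 0
    history = [state_value]
    for _ in range(max_steps):
        if state_value == GOAL:
            break
        state_value = step(state_value, policy(state_value), rng)
        history.append(state_value)
    return history


if __name__ == "__main__":
    print(run_episode())
```

The history list is recorded only for inspection; neither step nor policy ever reads it, which is what the Markov assumption amounts to in code.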
CartPole
At each step of the cart and pole, four variables can be observed: the position and velocity of the cart, and the angle and angular velocity of the pole. These observations and the two possible actions, moving the cart right or left, are summarized as follows:
state_values: Four dimensions of continuous values...