To keep the computational load manageable, the agent-environment interaction is modeled as a Markov decision process (MDP). An MDP is a discrete-time stochastic control process.
Stochastic processes are mathematical models used to study the evolution of phenomena that follow random or probabilistic laws. Every natural phenomenon contains a random or accidental component, owing both to its very nature and to observational errors.
This component has the following consequence: at every instant t, the result of observing the phenomenon is a random variable, s_t. It is not possible to predict with certainty what the result will be; you can only state that it will take one of several possible values, each of which has a given probability.
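The idea that each observation s_t is a random variable can be illustrated with a minimal Python sketch. The three possible values and their probabilities are hypothetical and chosen only for illustration, and the sketch assumes for simplicity that each observation is drawn independently of the previous ones, which real phenomena need not satisfy.

```python
import random

# Hypothetical phenomenon: the possible values of s_t and their probabilities.
VALUES = ["low", "medium", "high"]
PROBABILITIES = [0.2, 0.5, 0.3]


def observe(rng):
    """Draw one observation s_t according to the given probabilities."""
    return rng.choices(VALUES, weights=PROBABILITIES, k=1)[0]


def simulate(num_steps, seed=0):
    """Return the observations s_0, ..., s_{num_steps - 1} as a list."""
    rng = random.Random(seed)  # seeded only so that runs are reproducible
    return [observe(rng) for _ in range(num_steps)]


# Each entry cannot be predicted in advance; only its distribution is known.
trajectory = simulate(10)
print(trajectory)
```

Running the simulation with a different seed gives a different sequence, but over many steps the relative frequency of each value approaches its assigned probability.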
A stochastic process is called Markovian when, having chosen a...