In any RL formalization, we talk in terms of a state space and an action space. The action space, represented by A, is the finite set of actions that the agent can take. The state space, represented by S, is the finite set of states that the environment can be in.
The goal of the agent is to learn a policy, denoted by π. A policy can be deterministic or stochastic. A policy is essentially the model that the agent uses to select the best action to take. Thus, the policy maps the observations and rewards received from the environment to actions.
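To make the distinction between deterministic and stochastic policies concrete, here is a minimal sketch in Python. The state names, action names, and probabilities are hypothetical and chosen only for illustration; for simplicity the policy here depends only on the current state.

```python
import random

# Hypothetical toy spaces (illustrative only, not from the text).
STATES = ["s0", "s1", "s2"]
ACTIONS = ["left", "right"]

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"s0": "right", "s1": "right", "s2": "left"}

# Stochastic policy: each state maps to a probability distribution over actions.
stochastic_policy = {
    "s0": {"left": 0.1, "right": 0.9},
    "s1": {"left": 0.5, "right": 0.5},
    "s2": {"left": 0.8, "right": 0.2},
}


def act_deterministic(state):
    """Return the single action the deterministic policy assigns to this state."""
    return deterministic_policy[state]


def act_stochastic(state, rng=random):
    """Sample an action according to the stochastic policy's probabilities."""
    dist = stochastic_policy[state]
    actions = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(actions, weights=weights, k=1)[0]
```

Calling `act_deterministic("s0")` always gives the same action, whereas repeated calls to `act_stochastic("s0")` can give different actions, with "right" sampled far more often than "left" under the assumed probabilities.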
When an agent follows a policy, the result is a sequence of the form state, action, reward, next state, next action, and so on. This sequence is known as a trajectory or an episode.
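As an illustration of how following a policy produces a trajectory, the sketch below rolls out a simple random policy in a made-up chain environment. The environment (`step`), the policy, and the reward values are all assumptions introduced for this example, not something defined in the text.

```python
import random


def step(state, action):
    """Hypothetical chain environment with states 0..3; state 3 is terminal."""
    if action == "right":
        next_state = min(state + 1, 3)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == 3 else 0.0  # assumed reward: 1 on reaching the goal
    done = next_state == 3
    return next_state, reward, done


def policy(state, rng):
    """Assumed stochastic policy: move right with probability 0.8, else left."""
    return "right" if rng.random() < 0.8 else "left"


def rollout(start_state=0, max_steps=50, seed=0):
    """Follow the policy and record (state, action, reward) at every step."""
    rng = random.Random(seed)
    state = start_state
    trajectory = []
    for _ in range(max_steps):
        action = policy(state, rng)
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory, state


if __name__ == "__main__":
    trajectory, final_state = rollout()
    for state, action, reward in trajectory:
        print(state, action, reward)
    print("final state:", final_state)
```

Each recorded tuple corresponds to one state, action, reward step of the sequence, and the list as a whole is one trajectory. The `max_steps` cap is a practical safeguard in this sketch so that the rollout ends even if the terminal state is never reached.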
An important component of reinforcement learning formalizations is the return. The return is the estimate of the total...