RL is a general-purpose framework for solving sequential decision-making problems in artificial intelligence. The computer is given a goal to achieve, and it learns how to accomplish that goal through repeated interactions with its environment. A typical RL setup consists of five components: the Agent, the Environment, Actions, States, and Rewards.
In RL, an agent interacts with the environment by taking an action from a set of Actions (A). Based on the action taken by the agent, the environment transitions from its current state to a new state, where each state belongs to the set of States (S) of the environment. The transition generates a feedback Reward signal (a scalar quantity) from the environment. The reward measures the agent's performance, and its value depends on the current state and the action...
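The agent-environment loop described above can be sketched in code. The environment below (a hypothetical one-dimensional walk, invented here purely for illustration) exposes a `step` method that takes an action, transitions the state, and returns the new state together with a scalar reward, while a simple random agent picks actions from the action set:

```python
import random

class LineWalkEnv:
    """Toy environment (hypothetical, for illustration only): the agent
    walks along positions 0..4 and earns +1 for reaching position 4."""

    def __init__(self):
        self.state = 0  # initial state

    def step(self, action):
        # Action set A = {-1, +1}: move left or right, clipped to [0, 4].
        self.state = max(0, min(4, self.state + action))
        # Scalar reward signal: depends on the resulting state.
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

# The interaction loop: agent acts, environment transitions and rewards.
env = LineWalkEnv()
total_reward, done = 0.0, False
while not done:
    action = random.choice([-1, 1])        # agent selects an action from A
    state, reward, done = env.step(action)  # environment returns new state and reward
    total_reward += reward

print(state, total_reward)
```

Here the random agent eventually stumbles into the goal state; a learning agent would instead use the stream of (state, action, reward) feedback to improve its action choices over time.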