The steps involved in a typical RL algorithm are as follows:
- First, the agent interacts with the environment by performing an action
- As a result of that action, the agent moves from one state to another
- The agent then receives a reward based on the action it performed
- Based on the reward, the agent learns whether the action was good or bad
- If the action was good, that is, if the agent received a positive reward, the agent prefers performing that action; otherwise, it tries another action in search of a positive reward. It is essentially a trial-and-error learning process
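
The steps above can be sketched as a short loop. This is only a minimal illustration of trial-and-error learning, not a specific named RL algorithm. The toy environment `CorridorEnv`, the function `run_trial_and_error`, the reward values of +1 and -1, and the small exploration probability `epsilon` are all assumptions made for the example.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, with the goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        next_state = min(max(self.state + action, 0), self.length - 1)
        # assumed reward scheme: +1 for moving toward the goal, -1 otherwise
        reward = 1 if next_state > self.state else -1
        self.state = next_state
        done = next_state == self.length - 1
        return next_state, reward, done


def run_trial_and_error(episodes=20, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = CorridorEnv()
    actions = [-1, 1]
    preference = {}  # (state, action) -> accumulated reward

    for _ in range(episodes):
        state = env.reset()
        done = False
        steps = 0
        while not done and steps < 100:
            # occasionally try a random action, otherwise pick the preferred one
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: preference.get((state, a), 0))

            # the agent acts, moves to a new state, and receives a reward
            next_state, reward, done = env.step(action)

            # a positive reward makes the action more preferred, a negative one less
            key = (state, action)
            preference[key] = preference.get(key, 0) + reward

            state = next_state
            steps += 1

    return preference


if __name__ == "__main__":
    prefs = run_trial_and_error()
    for (state, action), score in sorted(prefs.items()):
        print(f"state={state} action={action:+d} preference={score}")
```

In this sketch, actions that earned positive rewards accumulate a higher preference score and are chosen more often, while actions that earned negative rewards are avoided, which mirrors the good-or-bad feedback described in the list.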