RL problems feature several elements that set them apart from the ML settings we have covered so far. The following two sections outline the key features required to define and solve an RL problem by learning a policy that automates decisions. They adopt the notation of, and generally follow, Reinforcement Learning: An Introduction (http://incompleteideas.net/book/RLbook2018.pdf) by Richard Sutton and Andrew Barto (2018) and David Silver's UCL lectures (http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html), both of which are recommended for further study beyond the brief summary that the scope of this chapter permits.
RL problems aim to optimize an agent's decisions with respect to an objective function as the agent interacts with an environment. The environment presents information about its state to the agent, assigns rewards for actions, and transitions the agent to new...
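
The interaction described above can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration and does not come from the referenced texts or any library: the `CorridorEnv` class, its `reset` and `step` methods, the reward values, and the two example policies are all hypothetical. The environment exposes its state, assigns a reward for each action, and moves the agent to a new state, while the agent simply maps states to actions.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..n_states-1, goal is the last cell."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        # Start every episode in the leftmost cell and reveal the state.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clip the move.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        # Assumed reward scheme: small cost per step, bonus for reaching the goal.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one agent-environment loop; return total reward and visited states."""
    state = env.reset()
    total_reward = 0.0
    trajectory = [state]
    for _ in range(max_steps):
        action = policy(state)                  # agent decides
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        trajectory.append(state)
        if done:
            break
    return total_reward, trajectory


def always_right(state):
    """A fixed policy that ignores the state and always moves right."""
    return 1


_rng = random.Random(0)  # seeded so that runs are repeatable


def random_policy(state):
    """A policy that picks left or right uniformly at random."""
    return _rng.choice([0, 1])


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), always_right))
    print(run_episode(CorridorEnv(), random_policy))
```

With the assumed reward scheme, `always_right` reaches the goal in four steps and visits states 0 through 4 in order, whereas `random_policy` typically wanders and accumulates more step costs. Learning a policy amounts to improving on such fixed rules using the rewards observed along the way.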