RL methods aim to learn from experience how to take actions that achieve a long-term goal. To this end, the agent and the environment interact over a sequence of discrete time steps through the interface of actions, state observations, and rewards that we described in the previous section.
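The interaction over discrete time steps can be sketched as a simple loop. The following is a minimal illustration only, not a reference implementation: the `CorridorEnv` class, its `reset`/`step` methods, and the `run_episode` helper are hypothetical names chosen for this example, and the toy corridor task is an assumption rather than anything defined in the text.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: the agent walks along a corridor
    and receives a reward only when it reaches the rightmost cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the leftmost cell and return the first observation.
        self.position = 0
        return self.position

    def step(self, action):
        # Assumed action encoding: 0 = move left, 1 = move right.
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and record (state, action, reward) for each time step."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                        # agent chooses an action
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def random_policy(state):
    # Placeholder agent that ignores the state and acts uniformly at random.
    return random.choice([0, 1])
```

Calling `run_episode(CorridorEnv(), random_policy)` returns the list of per-step transitions for one episode. In this sketch the only nonzero reward arrives on the final step, so every earlier action is rewarded late, if at all.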
How to solve RL problems
Key challenges in solving RL problems
Solving RL problems requires us to address two unique challenges: the credit assignment problem and the exploration-exploitation trade-off.
Credit assignment
In RL, reward signals can occur significantly later than actions...