How to solve reinforcement learning problems
RL methods aim to learn from experience how to take actions that achieve a long-term goal. To this end, the agent and the environment interact over a sequence of discrete time steps via the interface of actions, state observations, and rewards described in the previous section.
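The interaction described above can be sketched as a simple loop. The following is a minimal illustration, not a reference implementation: the `CoinFlipEnv` class, its `reset`/`step` methods, and the `run_episode` helper are hypothetical names invented for this example and do not come from any particular library.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a coin flip at each step."""

    def __init__(self, n_steps=5, seed=0):
        self.n_steps = n_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial state observation."""
        self.t = 0
        return self.t  # the state is simply the current time step

    def step(self, action):
        """Apply the action; return (next state, reward, episode finished?)."""
        flip = self.rng.randint(0, 1)
        reward = 1.0 if action == flip else 0.0
        self.t += 1
        done = self.t >= self.n_steps
        return self.t, reward, done


def run_episode(env, policy):
    """Let the agent and the environment interact over discrete time steps."""
    state = env.reset()
    done = False
    total_reward = 0.0
    trajectory = []  # experience collected as (state, action, reward) tuples
    while not done:
        action = policy(state)                       # agent chooses an action
        next_state, reward, done = env.step(action)  # environment responds
        trajectory.append((state, action, reward))
        total_reward += reward
        state = next_state
    return trajectory, total_reward


# A fixed policy that always guesses 1; a learning agent would instead
# update its policy based on the collected experience.
trajectory, total_reward = run_episode(CoinFlipEnv(n_steps=5, seed=0),
                                       policy=lambda state: 1)
```

The recorded trajectory of states, actions, and rewards is the experience from which an RL method would learn; here the policy is fixed, so no learning takes place.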
Key challenges in solving RL problems
Solving RL problems requires addressing two unique challenges: the credit-assignment problem and the exploration-exploitation trade-off.
Credit assignment
In RL, rewards can arrive long after the actions that contributed to them, which makes it hard to associate actions with their consequences. For example, when an agent holds 100 different positions and trades repeatedly, how does it learn that certain holdings performed much better than others if it only observes the overall portfolio return?
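The portfolio example can be made concrete with a small sketch. It assumes, purely for illustration, four equally weighted positions instead of 100 and made-up return figures; the `portfolio_return` helper is a hypothetical name. The sketch shows two scenarios in which different holdings do well, yet the single number the agent observes is identical.

```python
def portfolio_return(weights, asset_returns):
    """Weighted sum of per-position returns: the only signal the agent sees."""
    return sum(w * r for w, r in zip(weights, asset_returns))


# Illustrative assumption: four equally weighted positions.
weights = [0.25, 0.25, 0.25, 0.25]

# Made-up per-position returns. In scenario A the first holding does well
# and the second does poorly; in scenario B the roles are reversed.
scenario_a = [0.08, -0.04, 0.00, 0.00]
scenario_b = [-0.04, 0.08, 0.00, 0.00]

reward_a = portfolio_return(weights, scenario_a)
reward_b = portfolio_return(weights, scenario_b)

# Both scenarios produce the same scalar reward, so this signal alone cannot
# tell the agent which position deserves the credit.
same_reward = abs(reward_a - reward_b) < 1e-12
```

Because the aggregate reward does not distinguish the two scenarios, the agent needs repeated experience across many different situations to work out which holdings tend to help and which tend to hurt.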
The credit-assignment problem is the challenge of accurately estimating the benefits and costs...