- Why do we use the words *state* and *observation* interchangeably? When would it be more appropriate to use the word *state*?
- How do we know when the Q-function has converged?
- What happens to the Q-table when the Q-function has converged?
- When do we know the agent has found the optimal path to the goal? Describe in terms of the previous two questions.
- What does `numpy.argmax()` return?
- What does `numpy.max()` return?
- Why does the randomly acting agent take thousands of time steps to reach the goal? How does the Q-learning agent perform better?
- Describe one benefit of decaying alpha.
- What is overfitting and how does it apply in the context of an RL model?
- By what order of magnitude does the number of time steps needed to reach the goal reduce when the number of training episodes is multiplied by 10? Give a general response to this; there may be multiple valid...
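Several of these questions (what `numpy.argmax()` and `numpy.max()` return, how to tell when the Q-function has converged, and why alpha might be decayed) can be explored with a small experiment. The sketch below runs tabular Q-learning on a hypothetical six-cell corridor. The environment, the `step`/`train` helpers, the hyperparameters, the `alpha0 / (1 + 0.01 * episode)` decay schedule, and the "largest Q-table change in an episode is below `tol`" stopping rule are all assumptions for illustration, not the method used in the original exercise.

```python
import numpy as np

# Hypothetical toy environment (an assumption, not from the source):
# a 1-D corridor of N_STATES cells. The agent starts in cell 0 and the goal
# is the last cell. Action 0 moves left, action 1 moves right.
N_STATES = 6
N_ACTIONS = 2
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha0=0.5, gamma=0.9, epsilon=0.2, tol=1e-6, seed=0):
    """Tabular Q-learning; returns (Q-table, number of episodes actually run)."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    for ep in range(episodes):
        # Decaying learning rate: large early steps, smaller later ones,
        # so late updates stop jostling values that are already close.
        alpha = alpha0 / (1 + 0.01 * ep)
        old_q = q.copy()
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))  # explore
            else:
                # Greedy choice with random tie-breaking; a plain np.argmax
                # would always pick the FIRST maximum (action 0) on ties.
                best = np.flatnonzero(q[state] == np.max(q[state]))
                action = int(rng.choice(best))
            next_state, reward, done = step(state, action)
            # np.max gives the largest Q-value itself (the bootstrap target).
            target = reward if done else reward + gamma * np.max(q[next_state])
            q[state, action] += alpha * (target - q[state, action])
            state = next_state
        # Heuristic convergence check: the table barely changed this episode.
        if np.max(np.abs(q - old_q)) < tol:
            return q, ep + 1
    return q, episodes


row = np.array([0.2, 0.7, 0.7])
print(np.argmax(row))  # index of the first maximum
print(np.max(row))     # the maximum value itself

q, n = train()
print("stopped after", n, "episodes")
# np.argmax along axis 1 gives the greedy action per state (1 = move right).
print("greedy policy:", np.argmax(q, axis=1))
```

Note that the goal row of the table is never updated because the episode ends there, so its greedy "action" is just `np.argmax` of a row of zeros (index 0) and carries no meaning. The stopping rule is only a heuristic: an episode that happens to skip rarely chosen actions can look "converged" even though those entries are still changing.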