- Define a control process.
- What is the difference between a Markov chain and a Markov decision process (MDP)?
- What does it mean for a system to have the Markov property? Explain this in the context of memorylessness.
- Explain why the Taxi-v2 environment has 500 states. Describe the three state variables and enumerate the state space.
- Why are some states unreachable, and why do we still include them in our description of the state space?
- Describe a systematic way to choose the optimal hyperparameters for a Q-learning model.
- Why do we choose to decay epsilon, and what do we call the decision-making phenomenon that results?
- For what type of environment is an alpha value of 1 ideal? What does an alpha value of 0 result in?
- What is one good reason to decay gamma? Why might you want a lower value for gamma toward the end of a simulation?
- Briefly describe the greedy strategy and give an example...