- Come up with a situation from your everyday life that can be formulated as an MDP problem.
- Do you think we can use the state-value function to solve an MDP in the same way that we use the action-value function?
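As a hint, recall the standard relation between the two value functions (notation may differ slightly from the chapter's): acting greedily with respect to $q_\pi$ needs only the action values themselves, whereas acting greedily with respect to $v_\pi$ also requires the transition model $p(s' \mid s, a)$:

$$
v_\pi(s) = \sum_a \pi(a \mid s)\, q_\pi(s, a),
\qquad
q_\pi(s, a) = \sum_{s'} p(s' \mid s, a)\bigl[r(s, a, s') + \gamma\, v_\pi(s')\bigr]
$$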
- Explore how the action-value function result changes for the four-state MDP introduced here by varying the following hyperparameters (a sketch of such an experiment follows this list):
- Discount factor
- Learning rate
- Reward for the transition from state 2 to state 3
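The following is a minimal sketch of such an experiment. It assumes a simple four-state chain (states 0 to 3, state 3 terminal) with two actions and a tabular Q-learning agent; the chapter's actual environment, agent, and constants may differ, so treat `step()`, `R_2_TO_3`, and the default values here as illustrative assumptions to vary.

```python
# Hypothetical four-state chain MDP with tabular Q-learning.
import numpy as np

N_STATES, N_ACTIONS = 4, 2
GAMMA = 0.9      # discount factor -- try e.g. 0.5, 0.9, 0.99
ALPHA = 0.1      # learning rate   -- try e.g. 0.01, 0.1, 0.5
R_2_TO_3 = 1.0   # reward for the transition from state 2 to state 3

def step(state, action):
    """Assumed dynamics: action index 1 moves right, index 0 stays put."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else state
    reward = R_2_TO_3 if (state == 2 and next_state == 3) else 0.0
    return next_state, reward, next_state == N_STATES - 1

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))
for episode in range(500):
    state, done, steps = 0, False, 0
    while not done and steps < 100:  # cap episode length for safety
        # epsilon-greedy action selection
        action = int(rng.integers(N_ACTIONS)) if rng.random() < 0.1 else int(Q[state].argmax())
        next_state, reward, done = step(state, action)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + (0.0 if done else GAMMA * Q[next_state].max())
        Q[state, action] += ALPHA * (target - Q[state, action])
        state = next_state
        steps += 1

print(np.round(Q, 3))  # inspect how the action values shift with the hyperparameters
```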
- Try each of the following policies for the four-state MDP introduced in the chapter (a sketch comparing them follows this list):
- Always choose action 1.
- Always choose action 2.
- Always choose the action that maximizes the action value.
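A minimal sketch comparing the three policies, again on the assumed four-state chain rather than the chapter's exact MDP; "action 1" and "action 2" are mapped to the indices 0 and 1 here, and the placeholder `Q` table stands in for the action values learned in the previous exercise.

```python
# Rolling out three fixed policies on the hypothetical four-state chain.
import numpy as np

N_STATES = 4

def step(state, action):
    # Assumed dynamics: action 1 (index 0) stays put, action 2 (index 1) moves right.
    next_state = min(state + 1, N_STATES - 1) if action == 1 else state
    reward = 1.0 if (state == 2 and next_state == 3) else 0.0
    return next_state, reward, next_state == N_STATES - 1

Q = np.zeros((N_STATES, 2))
Q[:, 1] = 1.0  # placeholder action values; substitute the Q table learned above

policies = {
    "always action 1": lambda s: 0,
    "always action 2": lambda s: 1,
    "greedy on Q":     lambda s: int(Q[s].argmax()),
}

for name, policy in policies.items():
    state, total, done, steps = 0, 0.0, False, 0
    while not done and steps < 20:  # cap steps: "always action 1" never terminates
        state, reward, done = step(state, policy(state))
        total += reward
        steps += 1
    print(f"{name}: return = {total}")
```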
- Run the CartPole example in the example code and observe how the behavior changes (a minimal sketch follows).
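A minimal sketch of a CartPole rollout with a random policy, using the Gymnasium API; the book's example code may use a different framework or a trained agent, so this serves only as a baseline to compare against.

```python
# Baseline CartPole rollout with random actions (assumes the Gymnasium package).
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()  # random policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
env.close()
print(f"episode return: {total_reward}")  # a random policy rarely lasts long
```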