Use these questions and exercises to reinforce the material you just learned. The exercises can be fun to work through, so be sure to attempt at least two to four of them:
Questions:
- What are the names of the main components of an RL system? Hint: the first one is the environment.
- Name the four elements of an RL system. Remember that one element is optional.
- Name the three main threads that compose modern RL.
- What property must a state satisfy in order to be a Markov state?
- What is a policy?
Exercises:
- Using Chapter_1_2.py, alter the code so that the agent pulls from a bandit with 1,000 arms. What code changes do you need to make? (A minimal sketch of the general bandit structure appears after this list.)
- Using Chapter_1_3.py, alter the code so that the agent selects arms based on their average value rather than the greedy/max value. How does this affect the agent's exploration?
- Using Chapter_1_3.py, alter the learning_rate variable to see how quickly or slowly you can make the agent learn. What is the fewest number of episodes you need to run for the agent to solve the problem?
- Using Chapter_1_5.py, alter the code so that the agent uses a different policy (the greedy policy, for example, or something else). Take points off yourself if you look ahead in this book or online for solutions.
- Using Chapter_1_4.py, alter the code so that the bandits are connected: when the agent pulls an arm, it receives a reward and is transported to another specific bandit, no longer at random. Hint: this will likely require building a new destination table, and you will now need to include the discounted reward term we removed earlier. (The second sketch after this list shows the general shape of such a table and update.)
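
If you don't have the chapter listings in front of you, here is a minimal, self-contained sketch of the kind of epsilon-greedy bandit loop the first three exercises modify. It is illustrative only: the names num_arms, learning_rate, epsilon, and episodes are assumptions and may not match the chapter code exactly.

```python
import random

# Illustrative stand-in for the Chapter_1 bandit scripts; names are assumptions.
num_arms = 10          # the first exercise asks you to scale this up to 1,000
learning_rate = 0.1    # the third exercise asks you to tune this value
epsilon = 0.1          # chance of exploring a random arm instead of exploiting
episodes = 1000

# Each arm pays out around a hidden mean; the agent must discover the best arm.
true_means = [random.gauss(0, 1) for _ in range(num_arms)]
values = [0.0] * num_arms  # the agent's running value estimate per arm

for episode in range(episodes):
    # Epsilon-greedy selection: usually exploit the best estimate, sometimes explore.
    if random.random() < epsilon:
        arm = random.randrange(num_arms)
    else:
        arm = max(range(num_arms), key=lambda a: values[a])
    reward = random.gauss(true_means[arm], 1)
    # Nudge the estimate for the pulled arm toward the observed reward.
    values[arm] += learning_rate * (reward - values[arm])

best = max(range(num_arms), key=lambda a: values[a])
print(f"best arm estimate: {best}, value: {values[best]:.2f}")
```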
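For the last exercise, the hint about a destination table and the discounted reward term can be hard to picture. The following sketch is not the chapter's solution; it assumes a deterministic next_bandit table and a standard discounted value update, and all names (num_bandits, gamma, next_bandit, Q) are illustrative.

```python
import random

# Illustrative sketch of "connected" bandits: pulling an arm pays a reward
# and moves the agent to a fixed next bandit. All names are assumptions.
num_bandits, num_arms = 4, 3
gamma = 0.9            # the discounted reward term the exercise asks you to restore
learning_rate = 0.1
epsilon = 0.2

rewards = [[random.gauss(0, 1) for _ in range(num_arms)] for _ in range(num_bandits)]
# Destination table: pulling arm a at bandit b always sends you to next_bandit[b][a].
next_bandit = [[random.randrange(num_bandits) for _ in range(num_arms)]
               for _ in range(num_bandits)]
Q = [[0.0] * num_arms for _ in range(num_bandits)]  # value estimate per (bandit, arm)

state = 0
for step in range(5000):
    # Epsilon-greedy selection over the arms of the current bandit.
    if random.random() < epsilon:
        arm = random.randrange(num_arms)
    else:
        arm = max(range(num_arms), key=lambda a: Q[state][a])
    reward = rewards[state][arm]
    new_state = next_bandit[state][arm]
    # Discounted update: the value of this pull includes the best value
    # reachable from the bandit we land on next.
    Q[state][arm] += learning_rate * (reward + gamma * max(Q[new_state]) - Q[state][arm])
    state = new_state

print(Q)
```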
Completing even a few of these questions and exercises will make a huge difference to your learning of this material. This is a hands-on book, after all.