Let's test our newly acquired knowledge by answering the following questions:
- How does RL differ from other ML paradigms?
- What is the environment in the RL setting?
- What is the difference between a deterministic and a stochastic policy?
- What is an episode?
- Why do we need a discount factor?
- How does the value function differ from the Q function?
- What is the difference between deterministic and stochastic environments?