In this project, we are not interested in developing a heuristic (still a valid approach to many problems in artificial intelligence) or in constructing a working PID controller. Instead, we intend to use deep learning to give an agent the intelligence it needs to play a session of the Lunar Lander video game successfully.
Reinforcement learning theory offers a few frameworks to solve such problems:
- Value-based learning: This works by estimating the value of being in a certain state, that is, the reward that can be expected from it. By comparing the values of the different reachable states, the action leading to the best state is chosen. Q-learning is an example of this approach.
- Policy-based learning: Different control policies are evaluated based on the reward they obtain from the environment, and the policy achieving the best results is selected.
- Model-based learning...
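To make the value-based idea concrete, here is a minimal sketch of tabular Q-learning. It does not use Lunar Lander or deep learning: the five-state corridor, the `step` function, and all hyperparameter values are hypothetical, chosen only to show how comparing estimated values leads to choosing an action.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, with the goal at state 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption for this sketch): reward 1.0 at the goal only."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values Q[state][action] with Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick the best-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states: the action with the highest value.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy setting, the learned values should favor moving right in every non-terminal state, since that is the only way to reach the rewarded goal. A table like this is only practical for a handful of discrete states; for a continuous state space such as Lunar Lander's, the table would have to be replaced by a function approximator.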