Thus far, our discussion of RL has covered simpler techniques for building agents, namely bandits and Q-learning. Q-learning is a popular algorithm, and as we learned, deep Q networks (DQNs) give us a solid foundation for solving harder problems, such as balancing a pole on a moving cart. The following table summarizes several RL algorithms, the conditions under which each can operate, and how they work:
| Algorithm | Model | Policy | Action Space | Observation Space | Operator |
| --- | --- | --- | --- | --- | --- |
| Q-learning | Model-free | Off-policy | Discrete | Discrete | Q-value |
| SARSA – State Action Reward State Action | Model-free | On-policy | Discrete | Discrete | Q-value |
| DQN – Deep Q-Network | Model-free | Off-policy | Discrete | Continuous | Q-value |
| DDPG – Deep Deterministic Policy Gradient | Model-free | Off-policy | Continuous | Continuous | Q-value |
| TRPO – Trust Region Policy Optimization | Model-free | On-policy | Continuous | Continuous | Advantage |
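To make the Policy column concrete, here is a minimal sketch contrasting the off-policy Q-learning update with the on-policy SARSA update for a tabular Q-function. The function names, the NumPy array representation of Q, and the `alpha`/`gamma` defaults are illustrative assumptions, not code from this chapter:

```python
import numpy as np

# Illustrative sketch: tabular updates showing why Q-learning is off-policy
# and SARSA is on-policy. Q is assumed to be a NumPy array of shape
# (n_states, n_actions); alpha (learning rate) and gamma (discount factor)
# are assumed hyperparameters.

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: the target bootstraps from the greedy max over next
    # actions, regardless of which action the behavior policy actually takes.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: the target bootstraps from the action the current policy
    # actually selected in the next state.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

# Tiny usage example on a toy 4-state, 2-action problem.
Q = np.zeros((4, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
sarsa_update(Q, s=2, a=0, r=0.0, s_next=3, a_next=1)
```

The only difference between the two updates is the bootstrap term, which is exactly what the table's Off-policy/On-policy distinction captures.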