Reinforcement learning is a type of machine learning in which an agent learns how to act in an environment from reward feedback: it takes actions, observes the rewards that follow, and adjusts its behavior to maximize cumulative reward over time. Q-learning, introduced by Christopher Watkins in his 1989 PhD thesis, Learning from Delayed Rewards, is one of the most popular algorithms in reinforcement learning. The Q stands for quality, meaning the value of taking a given action in a given state, measured by the reward it is expected to generate:
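- For each state-action pair, the Q table stores a Q-value: an estimate of the cumulative reward expected from taking that action in that state.
- To choose its next action, the agent looks up its current state in the Q table and typically picks the action with the highest Q-value, the one expected to maximize long-term cumulative reward.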
- Reinforcement learning differs from supervised and unsupervised learning in one key way: it does not require input labels (supervised) or an underlying structure (unsupervised) to classify objects...
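
To make the Q table idea concrete, here is a minimal sketch of tabular Q-learning in Python. Everything specific in it is an assumption made for illustration and does not come from the text above: the five-state corridor environment, the `step`, `greedy_action`, and `train` helpers, and the values chosen for the learning rate (`ALPHA`), discount factor (`GAMMA`), and exploration rate (`EPSILON`). The update line is the standard Q-learning rule; the epsilon-greedy exploration around it is one common choice, not the only one.

```python
import random

N_STATES = 5                  # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (0, 1)              # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # assumed learning rate, discount, exploration rate


def step(state, action):
    """Toy environment: reward 1.0 on reaching the goal state, otherwise 0.0."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q_table, state, rng):
    """Pick the action with the highest Q-value, breaking ties at random."""
    best = max(q_table[state])
    return rng.choice([a for a in ACTIONS if q_table[state][a] == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # One row per state, one column per action, all estimates start at zero.
    q_table = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the table, occasionally explore.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q_table, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward the observed reward plus
            # the discounted best Q-value available in the next state.
            best_next = 0.0 if done else max(q_table[next_state])
            target = reward + GAMMA * best_next
            q_table[state][action] += ALPHA * (target - q_table[state][action])
            state = next_state
    return q_table


if __name__ == "__main__":
    q = train()
    for s, row in enumerate(q):
        print(s, [round(v, 3) for v in row])
```

In this toy setting the learned Q-values should favor "move right" in every non-goal state, since that is the shortest route to the only reward. No labels are supplied at any point: the table is filled in purely from the rewards the agent observes.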