Reinforcement learning is the last of the three most broad categories of machine learning. We have already studied supervised learning and unsupervised learning. Reinforcement learning is the third broad category and differs from the other two types in significant ways. Reinforcement learning neither trains on labeled data nor adds labels to data. Instead, it seeks to find an optimal solution for an agent to receive the highest reward.
The environment is the space where the agent completes its task. In our case, the environment will be the 3 x 3 grid used to play the game tic-tac-toe. The agent performs tasks within the environment. In this case, the agent places the X's or O's on the grid. The environment also contains rewards and penalties—that is, the agent needs to be rewarded for certain actions and penalized...