Building on the reinforcement learning background from the previous chapter, we will now take GoPiGo3 one step further: besides performing perception tasks, it will trigger chained actions in sequence to achieve a predefined goal. That is to say, it will have to decide what action to execute at every step of the simulation. After each action is executed, the robot will receive a reward, and the amount of that reward indicates how good the decision was. After some training, this reinforcement will naturally drive its subsequent decisions, improving its performance on the task.
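The decide, act, and get-rewarded cycle described above can be illustrated with a minimal sketch. It is not the GoPiGo3 simulation: it assumes a hypothetical one-dimensional corridor of five cells with the goal in the last one, and it uses tabular Q-learning as a stand-in learning rule. All names (`N_CELLS`, `step`, `train`, `greedy_policy`) and the reward values are illustrative assumptions.

```python
import random

# Hypothetical toy world (an assumption for illustration, not the GoPiGo3
# simulation): a 1-D corridor whose last cell is the goal.
N_CELLS = 5
ACTIONS = (-1, +1)  # move one cell backward or forward


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    done = next_state == N_CELLS - 1
    # Assumed reward scheme: small penalty per step, bonus on reaching the goal.
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values with tabular Q-learning and return the Q-table."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Decide: mostly exploit the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            # Act, then receive the reward for that decision.
            next_state, reward, done = step(state, action)
            # Reinforce: move the estimate toward reward + discounted future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


def greedy_policy(q):
    """Return the preferred action for every non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_CELLS - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Under these assumptions, the learned policy should end up preferring the forward action in every cell, which is the sense in which accumulated rewards drive later decisions.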
For example, let's say that we set a target location and instruct the robot to carry an object there. The way in which GoPiGo3 will be told that it is performing well is by giving it...