Setting up the solution
We will call the act of setting the motors to a different position an action, and we will call the position of the robot arm and hand the state. An action applied to a state results in the arm being in a new state.
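A minimal sketch of this idea follows, assuming for illustration that a state is simply a tuple of motor positions and an action is a tuple of position offsets; apply_action and the numbers below are hypothetical placeholders, not the robot's actual interface:

```python
# Hypothetical sketch: a state is a tuple of motor positions, an action is a
# tuple of position offsets. These types and apply_action are illustrative only.
from typing import Tuple

State = Tuple[int, ...]   # one position value per motor
Action = Tuple[int, ...]  # one position offset per motor

def apply_action(state: State, action: Action) -> State:
    """Applying an action to a state results in a new state."""
    return tuple(pos + delta for pos, delta in zip(state, action))

# Example: a three-motor arm moves from one pose to another.
start = (90, 45, 10)
new_state = apply_action(start, (5, -10, 0))  # -> (95, 35, 10)
```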
We are going to have the robot associate each state (a starting position of the hand) and action (the motor commands issued from that state) with the probability of producing a positive or negative outcome; in other words, we will be training the robot to figure out which sets of actions maximize the reward. What is a reward? It is simply an arbitrary value we use to mark whether an outcome was positive (something we wanted) or negative (something we did not want). If an action produces a positive outcome, we increment the reward, and if it does not, we decrement it. The robot will use an algorithm both to maximize the reward and to incrementally...
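A hedged sketch of this reward bookkeeping is shown below. The names reward_table, record_outcome, and best_action are illustrative rather than the actual code we will build; the point is only that each (state, action) pair accumulates a score that goes up for positive outcomes and down for negative ones, and that the robot then tends to prefer the higher-scoring actions:

```python
# Illustrative sketch: accumulate a reward score per (state, action) pair and
# mostly choose the best-scoring action, with occasional random exploration.
from collections import defaultdict
import random

reward_table = defaultdict(float)  # (state, action) -> accumulated reward

def record_outcome(state, action, positive: bool) -> None:
    """Increment the reward for a positive outcome, decrement for a negative one."""
    reward_table[(state, action)] += 1.0 if positive else -1.0

def best_action(state, candidate_actions, explore: float = 0.1):
    """Usually pick the highest-reward action seen so far; sometimes explore."""
    if random.random() < explore:
        return random.choice(candidate_actions)
    return max(candidate_actions, key=lambda a: reward_table[(state, a)])
```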