Notation, Policy, and Utility in RL
You may notice that reinforcement learning jargon anthropomorphizes the algorithm as taking actions in situations to receive rewards. In fact, the algorithm is often referred to as an agent that interacts with the environment. You can think of it as an intelligent hardware agent that senses the environment with sensors and acts on it with actuators.
Therefore, it shouldn't be a surprise that much of RL theory is applied in robotics. Figure 2 demonstrates the interplay between states, actions, and rewards. If you start at state s1, you can perform action a1 to obtain a reward r(s1, a1). Actions are represented by arrows, and states are represented by circles:
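This interplay can be sketched in a few lines of Python. The states, actions, and reward values below are hypothetical placeholders, not taken from Figure 2:

```python
# Minimal sketch of the state-action-reward interplay.
# All state names, action names, and reward values are hypothetical.
transitions = {
    ("s1", "a1"): "s2",  # performing a1 in s1 moves the agent to s2
    ("s2", "a2"): "s3",
}
rewards = {
    ("s1", "a1"): 1.0,   # the reward r(s1, a1)
    ("s2", "a2"): 0.5,   # the reward r(s2, a2)
}

state, action = "s1", "a1"
reward = rewards[(state, action)]       # agent receives r(s1, a1)
next_state = transitions[(state, action)]
print(state, action, reward, next_state)  # s1 a1 1.0 s2
```

The dictionaries play the role of the arrows in the figure: each (state, action) pair points to a next state and a reward.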
A robot performs actions to change between different states. But how does it decide which action to take? Well, that's determined by its policy.
Policy
In reinforcement learning lingo, we call the strategy an agent uses to choose an action in each state a policy.