Action masking
One final machine teaching approach we will use is action masking. With it, we can prevent the agent from taking certain actions in certain steps, based on conditions we define. For mountain car, assume we have the intuition that the car should build momentum before trying to climb the hill. So, we want the agent to apply force to the left if the car is already moving left around the valley. Under these conditions, we mask all the actions except left.
def update_avail_actions(self):
    self.action_mask = np.array([1.0] * self.action_space.n)
    pos, vel = self.wrapped.unwrapped.state
    # 0: left, 1: no action, 2: right
    ...
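To make the masking condition concrete, here is a minimal standalone sketch of the idea, not the elided body of the method above. The function names left_only_mask and apply_mask, and the position and velocity thresholds, are hypothetical values chosen for illustration; only the action indices (0: left, 1: no action, 2: right) come from the text.

```python
import numpy as np

LEFT, NO_ACTION, RIGHT = 0, 1, 2  # action indices as listed in the text


def left_only_mask(pos, vel, n_actions=3,
                   valley=(-0.55, -0.45), vel_range=(-0.05, 0.0)):
    """Return a 0/1 mask over actions.

    The valley and vel_range thresholds are assumptions for illustration:
    they describe "moving left around the valley".
    """
    mask = np.ones(n_actions, dtype=np.float32)  # all actions allowed by default
    near_valley = valley[0] < pos < valley[1]
    moving_left = vel_range[0] < vel < vel_range[1]
    if near_valley and moving_left:
        # Mask everything except LEFT.
        mask[NO_ACTION] = 0.0
        mask[RIGHT] = 0.0
    return mask


def apply_mask(logits, mask):
    """Replace the logits of masked actions with a very large negative value,
    so that a softmax or argmax over the result never selects them."""
    return np.where(mask > 0, logits, np.finfo(np.float32).min)


mask = left_only_mask(pos=-0.5, vel=-0.01)
masked_logits = apply_mask(np.array([0.1, 2.0, 3.0]), mask)
chosen_action = int(np.argmax(masked_logits))
```

In this sketch, the policy would otherwise prefer pushing right (the largest logit), but the mask leaves left as the only selectable action. Outside the masked region, the mask is all ones and the logits pass through unchanged.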