So far, we have randomly picked an action and applied it. Now let us apply some logic to picking the action instead of random chance. The third observation refers to the angle. If the angle is greater than zero, that means the pole is tilting right, thus we move the cart to the right (1). Otherwise, we move the cart to the left (0). Let us look at an example:
- We define two policy functions as follows:
def policy_logic(env,obs):
return 1 if obs[2] > 0 else 0
def policy_random(env,obs):
return env.action_space.sample()
- Next, we define an experiment function that will run for a specific number of episodes; each episode runs until the game is lost, namely when done is True. We use rewards_max to indicate when to break out of the loop as we do not wish to run the experiment forever:
def experiment(policy, n_episodes, rewards_max...