The multi-armed bandit problem
Imagine you are in Las Vegas, in your favorite casino. You are in a room containing five slot machines. For each of them the game is the same: you bet a certain amount of money, say 1 dollar, you pull the arm, and then the machine either takes your money or gives you twice your money back. Remember the rewards we talked about in the previous chapter? Let's say that if the machine takes your money, your reward is -1, and if the machine gives you twice your money back, your reward is +1.
As you can see, you're already starting to define an AI environment, which, I'll remind you, is absolutely fundamental when solving a problem with AI. So far, the AI isn't there, but it will come soon. You always start by defining the environment.
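To make the reward definition concrete, here is a minimal Python sketch of a single pull. It is only an illustration under stated assumptions: the names `reward` and `pull` are hypothetical, and so is the `win_probability` argument, which is a placeholder value chosen for the demo and not a property of any machine described in this chapter.

```python
import random


def reward(won: bool) -> int:
    """Map the outcome of one pull to the reward defined above: +1 or -1."""
    return 1 if won else -1


def pull(win_probability: float, rng: random.Random) -> int:
    """Simulate one pull of a slot machine.

    win_probability is a hypothetical parameter used only for this sketch.
    """
    won = rng.random() < win_probability
    return reward(won)


# Illustrative run: ten pulls on one machine with an assumed 50% win chance.
rng = random.Random(0)
rewards = [pull(0.5, rng) for _ in range(10)]
print(rewards)
```

Each entry in `rewards` is either +1 or -1, which matches the two possible outcomes of a bet: losing your dollar, or getting two dollars back for the one you staked.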
You've defined the rewards; you'll define the states (inputs) and actions (outputs) later. Now, still in the process of defining the environment, let's say that you know, somehow...