The Thompson Sampling model
You're going to build this model straight away. You'll start with a simple implementation of the method, and the theory behind it will come later. Let's get right into it!
As we defined previously, our problem is to find, out of many slot machines, the one with the highest chance of winning. A not-so-optimal solution would be to play 100 rounds on each of our slot machines and see which one has the highest winning rate; this spends just as many rounds on the worst machines as on the best one. A better solution is a method called Thompson Sampling.
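To make the not-so-optimal solution concrete, here is a minimal sketch of it in Python. The function name `play_each_machine`, the list `win_chances`, and the win chances themselves are made up for illustration; in a real casino you would not know these values, which is exactly why you have to play to estimate them.

```python
import random


def play_each_machine(win_chances, rounds_per_machine=100, seed=0):
    """Play every machine the same number of rounds and compare win rates.

    win_chances holds the true (normally unknown) winning chance of each
    machine; it is only used here to simulate the outcome of each round.
    """
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    win_rates = []
    for chance in win_chances:
        # A round is a win when a uniform draw falls below the win chance.
        wins = sum(1 for _ in range(rounds_per_machine) if rng.random() < chance)
        win_rates.append(wins / rounds_per_machine)
    # Pick the machine with the highest observed winning rate.
    best_machine = max(range(len(win_rates)), key=win_rates.__getitem__)
    return best_machine, win_rates


# Hypothetical win chances for five slot machines (assumed values).
win_chances = [0.15, 0.04, 0.13, 0.11, 0.05]
best_machine, win_rates = play_each_machine(win_chances)
print("Observed winning rates:", win_rates)
print("Machine picked as best:", best_machine)
```

Notice that this costs 100 rounds per machine no matter how badly a machine performs, and with chances this close together, 100 rounds may not even be enough to pick the right one.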
I won't go too deeply into the theory behind it; we'll cover that later. For now, it is enough to say that Thompson Sampling uses a distribution called Beta (distributions will be explained later in this chapter), which takes two arguments. For simplicity's sake, let's say that the higher the first argument is, the better our slot machine is, and the higher the second argument is, the worse our slot machine...
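To get a feel for how the two arguments behave, the short sketch below draws many samples from Beta and averages them. It uses `random.betavariate` from Python's standard library; the helper name `average_beta_sample` and the argument values 9 and 3 are assumptions chosen only for illustration, and this is not yet the Thompson Sampling model itself.

```python
import random


def average_beta_sample(first_arg, second_arg, n_samples=10_000, seed=0):
    """Average many draws from a Beta distribution with the given arguments."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    total = sum(rng.betavariate(first_arg, second_arg) for _ in range(n_samples))
    return total / n_samples


# High first argument, low second argument: draws tend to be close to 1.
promising = average_beta_sample(9, 3)
# Low first argument, high second argument: draws tend to be close to 0.
poor = average_beta_sample(3, 9)

print("Average draw with arguments (9, 3):", promising)
print("Average draw with arguments (3, 9):", poor)
```

Every draw lies between 0 and 1, and the average settles near first / (first + second), so roughly 0.75 in the first case and 0.25 in the second. That matches the intuition above: raising the first argument pushes the draws up, raising the second pushes them down.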