In the previous recipe, we estimated values by taking a simple average of the returns generated by the behavior policy, with each return scaled by its importance ratio (the ratio of its probability under the target policy to its probability under the behavior policy). This technique is formally called ordinary importance sampling. It is known to have high variance, so we usually prefer the weighted version of importance sampling, which we will cover in this recipe.
Weighted importance sampling differs from ordinary importance sampling in how it averages the returns. Instead of taking a simple average of the scaled returns, it takes a weighted average, using the importance ratios themselves as the weights:
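In one common notation (introduced here for illustration), where $G_t$ is the return from the $t$-th recorded visit to state $s$ and $\rho_t$ is the corresponding cumulative importance ratio (the product of target-to-behavior action-probability ratios along the rest of that episode), the weighted estimate is:

$$V(s) = \frac{\sum_{t} \rho_t G_t}{\sum_{t} \rho_t}$$

Ordinary importance sampling divides the same numerator by the number of returns rather than by $\sum_t \rho_t$, which is why its variance can be much larger.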
It often has a much lower variance than the ordinary version. If you experimented with ordinary importance sampling for Blackjack, you will have noticed that the results vary a lot from one run to the next.
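To make the difference concrete, here is a minimal sketch of a Monte Carlo evaluator that uses weighted importance sampling. The episode format, the dict-based policies, and the function name weighted_is_evaluation are illustrative assumptions rather than code from this recipe; the book's own implementation may differ.

```python
from collections import defaultdict

def weighted_is_evaluation(episodes, target_policy, behavior_policy, gamma=1.0):
    """Estimate state values under the target policy from behavior-policy
    episodes using weighted importance sampling.

    episodes: list of trajectories, each a list of (state, action, reward).
    target_policy / behavior_policy: dicts mapping state -> {action: prob}.
    """
    # Per-state running numerator (sum of rho * G) and denominator (sum of rho)
    weighted_returns = defaultdict(float)
    ratio_sums = defaultdict(float)

    for episode in episodes:
        G = 0.0    # return, accumulated backwards through the episode
        rho = 1.0  # cumulative importance ratio, also built backwards
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            rho *= target_policy[state][action] / behavior_policy[state][action]
            if rho == 0.0:
                # The target policy would never take this action, so all
                # earlier time steps in the episode carry zero weight.
                break
            weighted_returns[state] += rho * G
            ratio_sums[state] += rho

    # Weighted average: divide by the sum of the ratios, not the visit count
    return {s: weighted_returns[s] / ratio_sums[s]
            for s in weighted_returns if ratio_sums[s] > 0}
```

For Blackjack, the episodes would be generated by running the environment under a behavior policy (for example, a random policy), while target_policy holds the policy whose values you actually want to estimate; the only change from ordinary importance sampling is the denominator in the final average.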