Automatic optimization through reinforcement learning
You can improve your recommendations with online training techniques, which retrain your recommender system after every user-item interaction. By replacing the feedback function with a reward function and adding a reinforcement learning model, we can make recommendations, take decisions, and select the choices that optimize the reward function.
This approach lets a recommender model learn continuously from each interaction. The Azure Personalizer service offers exactly this functionality: the user provides contextual features and a reward function, and the service makes and optimizes decisions and choices based on them. Azure Personalizer uses contextual bandits, an approach to reinforcement learning that is framed around making decisions or choices between discrete actions in a given context.
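To make the contextual bandit idea concrete, the following is a minimal sketch of an epsilon-greedy bandit in plain Python. It is not the Azure Personalizer API and not the algorithm the service runs; the class `EpsilonGreedyBandit`, its methods, the `simulate` helper, and the contexts and actions are all hypothetical names chosen for illustration. The sketch assumes a small set of discrete contexts and a reward of 1.0 when the chosen action matches a made-up user preference.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Toy contextual bandit (illustrative only, not the Personalizer API).

    Keeps one running-average reward estimate per (context, action) pair.
    """

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, action) -> times chosen
        self.values = defaultdict(float)  # (context, action) -> mean reward

    def best(self, context):
        # Pure exploitation: the action with the highest estimated reward.
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best(context)

    def update(self, context, action, reward):
        # Incremental mean update after observing the reward.
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def simulate(rounds=2000, seed=0):
    # Assumed ground truth: which action each context prefers.
    preferred = {"morning": "news", "evening": "sports"}
    bandit = EpsilonGreedyBandit(["news", "sports", "cooking"], seed=seed)
    env = random.Random(seed + 1)
    for _ in range(rounds):
        context = env.choice(list(preferred))
        action = bandit.choose(context)
        reward = 1.0 if action == preferred[context] else 0.0
        bandit.update(context, action, reward)  # learn after every interaction
    return bandit
```

Calling `simulate()` returns a bandit whose `best("morning")` and `best("evening")` can be inspected to see which action it learned for each context. The model is updated after every single interaction, which is the online training loop described above; a production system would replace the lookup table with a model over richer context features.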
Note
Under the hood, Azure Personalizer uses the Vowpal Wabbit (https://github.com/VowpalWabbit/vowpal_wabbit/wiki) learning system from Microsoft Research...