Using Reinforcement Learning for recommendations
The core machine learning concept behind the Personalizer service is Reinforcement Learning (RL). Together with Supervised Learning and Unsupervised Learning, it is one of the foundational pillars of machine learning. As discussed earlier in the chapter, RL balances exploration of new options against exploitation of an existing model. This differs from Supervised Learning, where we must supply the correct answer for each example, as a labeled input/output pair, so that the model can be corrected whenever it predicts the wrong option. RL needs no such labels; it learns from a reward signal that scores the options it actually presents. With Unsupervised Learning, patterns are learned from untagged data, and the model builds its predictions from the structure it extrapolates from that data.
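To make the exploration/exploitation trade-off concrete, the following is a minimal sketch of an epsilon-greedy policy, one common and simple way to balance the two. It is not Personalizer's internal algorithm. The function name `choose_action`, the `epsilon` parameter, and the article names and reward estimates are all hypothetical and exist only for illustration.

```python
import random


def choose_action(estimated_rewards: dict[str, float], epsilon: float, rng: random.Random) -> str:
    """Pick an action with an epsilon-greedy policy (illustrative sketch).

    With probability epsilon, explore by picking a uniformly random action.
    Otherwise, exploit by picking the action with the highest estimated reward.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be between 0 and 1")
    if not estimated_rewards:
        raise ValueError("at least one action is required")
    actions = list(estimated_rewards)
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore: try any option
    return max(actions, key=estimated_rewards.__getitem__)  # exploit: best known option


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical reward estimates from an existing model.
    estimates = {"article_a": 0.2, "article_b": 0.7, "article_c": 0.4}
    picks = [choose_action(estimates, epsilon=0.2, rng=rng) for _ in range(1000)]
    for name in estimates:
        print(name, picks.count(name))
```

With `epsilon=0.2`, roughly four out of five picks exploit the current best estimate, while the remainder keep sampling the other options so the model can discover whether its estimates are wrong.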
So, as this method is applied in the service, we send reward feedback from the application we're using to better train the model for the exploitation...
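The reward feedback loop can be sketched as follows, under simplifying assumptions: the model picks an action, the application reports a reward score, and the model updates a running average of reward per action. This is a toy epsilon-greedy bandit, not the Personalizer SDK; the class `EpsilonGreedyRecommender`, its `rank` and `reward` methods, the `simulate` helper, and the simulated click rates are hypothetical names and data chosen only to illustrate the pattern.

```python
import random


class EpsilonGreedyRecommender:
    """Toy bandit that learns from reward feedback (hypothetical, not an SDK)."""

    def __init__(self, actions, epsilon=0.2, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in actions}    # how often each action was rewarded
        self.values = {a: 0.0 for a in actions}  # running mean reward per action

    def rank(self):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.__getitem__)

    def reward(self, action, score):
        # Incremental mean update: new = old + (score - old) / n
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (score - self.values[action]) / n


def simulate(true_click_rates, steps=5000, epsilon=0.1, seed=42):
    """Run the rank -> reward loop against a simulated (hypothetical) user."""
    model = EpsilonGreedyRecommender(true_click_rates, epsilon=epsilon, seed=seed)
    user = random.Random(seed + 1)
    for _ in range(steps):
        action = model.rank()
        # The simulated user clicks (reward 1.0) with the action's true rate.
        score = 1.0 if user.random() < true_click_rates[action] else 0.0
        model.reward(action, score)
    return model


if __name__ == "__main__":
    model = simulate({"a": 0.1, "b": 0.8, "c": 0.3})
    print({k: round(v, 2) for k, v in model.values.items()})
    print(model.counts)
```

Because every presented action receives a reward score, the estimates for frequently exploited actions sharpen over time, while the exploration steps keep the estimates for the other actions from going stale.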