Reinforcement Learning (RL) has its roots in animal and behavioral psychology. Today it appears in many applications of Machine Learning, from games and simulations to control optimization, information theory, and statistics, with new areas emerging all the time. At its most basic level, RL describes an agent that acts on an environment and receives positive or negative rewards based on those actions. The following is a diagram showing the stateless RL model:
Stateless Reinforcement Learning
Conveniently, the multi-armed bandit problem we built in the last chapter fits this simpler form of RL well. That problem has only a single state, which makes it what we refer to as a one-step RL problem. Since the agent doesn't need to track state, we can greatly simplify our RL equations and write the value of each action using the following equation:
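As a minimal sketch of what a one-step value update can look like, assume the common incremental form V(a) ← V(a) + α × (r − V(a)), where V(a) is the current value estimate for action a, r is the reward just received, and α is a learning rate between 0 and 1. The names `update_value`, `values`, and `alpha` are illustrative assumptions, not taken from the chapter's own code:

```python
def update_value(values, action, reward, alpha=0.1):
    """Nudge the estimate for `action` a fraction `alpha` toward `reward`.

    `values` is a list holding one value estimate per action (hypothetical
    layout). There is no state to track, so a flat list is enough.
    """
    values[action] += alpha * (reward - values[action])
    return values[action]


# Two-armed example: pull arm 0 twice and receive a reward of 1.0 each time.
values = [0.0, 0.0]
update_value(values, action=0, reward=1.0, alpha=0.5)
update_value(values, action=0, reward=1.0, alpha=0.5)
print(values)
```

Each call moves the estimate for the chosen arm part of the way toward the observed reward, while the estimates for arms that were not pulled stay unchanged. A larger `alpha` makes the estimate react faster to recent rewards; a smaller one averages over more pulls.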
Consider the following...