2018 prediction: Was reinforcement learning applied to many real-world situations?

Back in 2017, we predicted that reinforcement learning would be an important subplot in the growth of artificial intelligence. After all, a machine learning agent that adapts and ‘learns’ in response to changes in its environment has all the makings of an incredibly powerful strain of artificial intelligence. Surely, then, the world was going to see more real-world uses for reinforcement learning.

But did that really happen? You can bet it did. However, with all things intelligent subsumed into the sexy, catch-all term artificial intelligence, you might have missed where reinforcement learning was used.

Let’s go all the way back to 2017 to begin. This was the year reinforcement learning broke into the mainstream. The biggest and most memorable event was perhaps when Google DeepMind’s AlphaGo defeated Ke Jie, then the world’s top-ranked Go player. Ultimately, this victory could be attributed to reinforcement learning; AlphaGo ‘played’ against itself many times over, each time becoming ‘better’ at the game and developing an algorithmic understanding of how it could best defeat an opponent.

However, reinforcement learning went well beyond board games in 2018.

Reinforcement learning in cancer treatment

MIT researchers used reinforcement learning to improve brain cancer treatment. Essentially, the reinforcement learning system is trained on data about established treatment regimens for patients, and then ‘learns’ to find the most effective strategy for administering cancer treatment drugs. The important point is that artificial intelligence here can help to find the right balance between administering and withholding the drugs.

Reinforcement learning in self-driving cars

In 2018, UK self-driving car startup Wayve trained a car to drive using its ‘imagination’. Real-world data was collected offline to train the model, which was then used to observe and predict the ‘motion’ of items in a scene and to drive on the road. Even though the data was collected in sunny conditions, the system can also drive in the rain, adjusting to reflections from puddles and similar effects. Because the data is collected from the real world, there is no major gap between what the model was trained on and what it meets on the road, unlike systems trained purely in simulation.

Reinforcement learning in database query optimization

UC Berkeley researchers developed a deep reinforcement learning method to optimize SQL joins. The join ordering problem is formulated as a Markov Decision Process (MDP), and a method called Q-learning is applied to solve it. The resulting deep reinforcement learning optimizer, called DQ, produces solutions that are close to optimal across all cost models. It does so without any prior information about the index structures.
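To make the idea of join ordering as an MDP more concrete, here is a minimal sketch. It is not DQ itself: it swaps the deep Q-network for a plain lookup table, and the table names, row counts, selectivities and cost function are all made up for illustration. The state is the set of tables joined so far, an action picks the next table, and the reward is the negative size of the intermediate result.

```python
import itertools
import random

# Hypothetical toy schema: row counts per table (made-up numbers).
CARDINALITY = {"A": 1_000, "B": 100, "C": 10, "D": 10_000}
# Hypothetical join selectivities; unlisted pairs are cross products (1.0).
SELECTIVITY = {
    frozenset("AB"): 0.01,
    frozenset("BC"): 0.1,
    frozenset("AD"): 0.001,
}
TABLES = sorted(CARDINALITY)


def result_size(joined):
    """Estimated number of rows after joining every table in `joined`."""
    tables = sorted(joined)
    size = 1.0
    for t in tables:
        size *= CARDINALITY[t]
    for a, b in itertools.combinations(tables, 2):
        size *= SELECTIVITY.get(frozenset((a, b)), 1.0)
    return size


def plan_cost(order):
    """Toy cost of a left-deep plan: the sum of its intermediate result sizes."""
    return sum(result_size(order[:i]) for i in range(2, len(order) + 1))


def train(episodes=5000, alpha=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning over the join-ordering MDP (undiscounted, episodic)."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> value; missing entries count as 0.0
    for _ in range(episodes):
        state = frozenset()
        while len(state) < len(TABLES):
            actions = [t for t in TABLES if t not in state]
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda t: q.get((state, t), 0.0))
            next_state = state | {action}
            # Picking the first table is free; each join costs its output size.
            reward = -result_size(next_state) if len(next_state) > 1 else 0.0
            future = max(
                (q.get((next_state, t), 0.0) for t in TABLES if t not in next_state),
                default=0.0,
            )
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + future - old)
            state = next_state
    return q


def greedy_order(q):
    """Read the learned join order off the Q-table."""
    state, order = frozenset(), []
    while len(state) < len(TABLES):
        actions = [t for t in TABLES if t not in state]
        best = max(actions, key=lambda t: q.get((state, t), 0.0))
        order.append(best)
        state = state | {best}
    return order


learned = greedy_order(train())
print(learned, plan_cost(learned))
```

For this made-up schema, an exhaustive search over all 24 left-deep orders finds a minimum toy cost of 11,100 (join B and C first, then A, then D), which gives a baseline to compare the learned order against. A lookup table only works because the example is tiny; with many tables the number of states explodes, which is presumably why DQ approximates the Q-function with a neural network instead.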

Reinforcement learning in robotics

OpenAI researchers created a robot hand called Dactyl in 2018. Dactyl has human-like dexterity for performing complex in-hand manipulations, achieved through the use of reinforcement learning.

Finally, it’s back to Go. Well, not just Go - chess, and a game called shogi too. This time, DeepMind’s AlphaZero was the star. Whereas AlphaGo managed to master Go, AlphaZero mastered all three.

This was significant because it indicated that reinforcement learning could help develop a more general form of intelligence than other approaches to artificial intelligence currently allow. This is an intelligence that is able to adapt to new contexts and situations - to almost literally understand the rules of very different games.

But there was something else impressive about AlphaZero - it was only introduced to a set of basic rules for each game. Without any domain knowledge or example games, the newer program outperformed the state-of-the-art programs of the time in all three games after only a few hours of self-training.
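The idea of learning from nothing but the rules can be illustrated on a much smaller game. The sketch below is not AlphaZero's method (which combines deep neural networks with tree search); it uses a simple value table, and the game, function names and parameters are chosen purely for illustration. Two copies of the same agent play a stick game against each other: players alternately take one to three sticks, and whoever takes the last stick wins.

```python
import random

MAX_TAKE = 3  # rule: take 1 to 3 sticks per turn; taking the last stick wins


def legal_moves(sticks):
    """The only game knowledge the agent is given."""
    return range(1, min(MAX_TAKE, sticks) + 1)


def self_play(pile=10, episodes=30000, alpha=0.5, epsilon=0.2, seed=0):
    """Learn move values by playing both sides of every game."""
    rng = random.Random(seed)
    q = {}  # (sticks_left, move) -> value for the player about to move
    for _ in range(episodes):
        sticks = rng.randint(1, pile)  # vary the start so every position is seen
        while sticks > 0:
            moves = list(legal_moves(sticks))
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore
            else:
                move = max(moves, key=lambda m: q.get((sticks, m), 0.0))
            remaining = sticks - move
            if remaining == 0:
                target = 1.0  # the mover took the last stick and wins
            else:
                # The opponent moves next, so its best outcome is our worst.
                target = -max(q.get((remaining, m), 0.0) for m in legal_moves(remaining))
            old = q.get((sticks, move), 0.0)
            q[(sticks, move)] = old + alpha * (target - old)
            sticks = remaining  # the same table now plays the other side
    return q


def best_move(q, sticks):
    """The move the trained agent prefers with `sticks` left."""
    return max(legal_moves(sticks), key=lambda m: q.get((sticks, m), 0.0))


q_table = self_play()
print({s: best_move(q_table, s) for s in range(1, 11)})
```

The known winning strategy for this game is to leave the opponent a multiple of four sticks. Nothing in the code states that strategy, so if training has gone well, the printed moves should follow it wherever a winning move exists.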