Reinforcement learning concepts
AI is constantly evolving. The classical approach states that:
- AI covers all domains
- Machine learning is a subset of AI, with clustering, classification, regression, and reinforcement learning
- Deep learning is a subset of machine learning that involves neural networks
However, these domains often overlap, and it's difficult to fit neuromorphic computing, with its sub-symbolic approach, into these neat categories (see Chapter 18, Neuromorphic Computing).
In this chapter, RL clearly fits into machine learning. Let's have a brief look at the scientific foundations of the Markov decision process (MDP), the RL framework we are going to explore. The main concepts to keep in mind are the following:
- Optimal transport: In 1781, Gaspard Monge formulated the problem of moving material from one location to another along the shortest, most cost-effective path; for example, transporting mined coal to a factory at minimal cost. The idea was subsequently generalized to any form of path from point A to point B.
- Boltzmann equation and constant: In the late 19th century, Ludwig Boltzmann changed our vision of the world with his probabilistic distribution of particles, beautifully summed up in his entropy formula:
S = k log W
S represents the entropy (the measure of disorder) of a system, k is the Boltzmann constant, and W represents the number of microstates. A small numeric sketch follows this list. We will explore Boltzmann's ideas further in Chapter 14, Preparing the Input of Chatbots with Restricted Boltzmann Machines (RBMs) and Principal Component Analysis (PCA).
- Probabilistic distributions advanced further: Josiah Willard Gibbs took the probabilistic distributions of large numbers of particles a step further. At that point, probabilistic information theory was advancing quickly. At the turn of the 20th century, Andrey Markov applied probabilistic algorithms to language, among other areas. A modern era of information theory was born.
- When Boltzmann and optimal transport meet: 2010 Fields Medal winner Cédric Villani took Boltzmann's equation to yet another level, going on to unify optimal transport with Boltzmann's work. Villani rigorously proved results that 19th-century mathematicians had only intuited.
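To make Boltzmann's formula concrete, here is a minimal Python sketch, ours rather than anything from the chapter's code, that evaluates S = k log W for a few values of W, using the natural logarithm as Boltzmann did; the function name is chosen purely for illustration:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, in joules per kelvin

def boltzmann_entropy(microstates: int) -> float:
    """Return S = k * log(W), where log is the natural logarithm."""
    return k_B * math.log(microstates)

# Entropy grows only logarithmically, even as W explodes
for W in (1, 1_000, 10**23):
    print(f"W = {W:>26}: S = {boltzmann_entropy(W):.3e} J/K")
```

The logarithm is the key design point: a system can gain an astronomical number of microstates while its entropy grows only modestly, which is what makes the quantity tractable.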
Let's take all of the preceding concepts and materialize them in a real-world example that explains why reinforcement learning with the MDP is so innovative.
Analyzing the following cup of tea will take you right into the next generation of AI:
Figure 1.1: Consider a cup of tea
You can look at this cup of tea in two different ways:
- Macrostates: You look at the cup and its contents. You can see the volume of tea in the cup, and you can feel the temperature when holding the cup in your hand.
- Microstates: But can you tell how many molecules are in the tea, which ones are hot, warm, or cold, or their velocities and directions? Impossible, right?
Now, imagine that the tea represents 2,000,000,000+ Facebook accounts, or 100,000,000+ Amazon Prime users with millions of deliveries per year. At this scale, we simply abandon the idea of controlling every item; we work on trends and probabilities.
Boltzmann provides a probabilistic approach to the evaluation of the features of our real world. Materializing Boltzmann in logistics through optimal transport means that the temperature could be the ranking of a product, the velocity could be linked to the distance of a delivery, and the direction could be the itineraries we will study in this chapter.
Markov picked up the ripe fruits of microstate probabilistic descriptions and applied them to the chains of states that now bear his name, the foundation of the MDP. Reinforcement learning takes a huge volume of elements (particles in a cup of tea, delivery locations, social network accounts) and defines the probable paths they take, as sketched below.
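As a small illustration of this probabilistic view, the following sketch assigns each delivery itinerary a Boltzmann (softmax) probability based on its cost. The itineraries, costs, and temperature are all invented for the example; this is a sketch of the idea, not the chapter's algorithm:

```python
import math

def boltzmann_distribution(costs, temperature=1.0):
    """Turn itinerary costs into probabilities: cheaper paths are
    exponentially more likely; the temperature controls how strongly
    we favor the cheapest one."""
    weights = [math.exp(-cost / temperature) for cost in costs]
    total = sum(weights)
    return [w / total for w in weights]

# Three hypothetical delivery itineraries, costs in kilometers (invented data)
costs = [12.0, 15.0, 30.0]
for cost, p in zip(costs, boltzmann_distribution(costs, temperature=5.0)):
    print(f"itinerary cost {cost:5.1f} km -> probability {p:.2f}")
```

Instead of deciding which single itinerary every package "must" take, we describe how likely each path is, exactly the shift from microstate control to macrostate trends described above.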
The turning point in this way of thinking came when we could no longer analyze the state and path of every individual element: our globalized world generates volumes of images, sounds, words, and numbers that exceed traditional software approaches.
With this in mind, we can start exploring the MDP.
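Before we do, here is a minimal sketch, assuming a toy three-location delivery problem, of the ingredients an MDP combines: states, actions, transition probabilities, and rewards. Every name and number below is invented for illustration; the chapter builds the real example step by step.

```python
# A toy MDP: all states, actions, probabilities, and rewards are invented.
states = ["warehouse", "hub", "customer"]
actions = ["drive", "wait"]

# Transition model: P[(state, action)] -> list of (next_state, probability)
P = {
    ("warehouse", "drive"): [("hub", 0.9), ("warehouse", 0.1)],
    ("hub", "drive"): [("customer", 0.8), ("hub", 0.2)],
}

# Reward model: R[(state, action, next_state)] -> immediate reward
R = {
    ("warehouse", "drive", "hub"): -1.0,   # fuel cost on the way to the hub
    ("hub", "drive", "customer"): 10.0,    # successful delivery
}

# Expected immediate reward of driving out of the warehouse
expected = sum(p * R.get(("warehouse", "drive", s2), 0.0)
               for s2, p in P[("warehouse", "drive")])
print(f"Expected reward of ('warehouse', 'drive'): {expected:.2f}")  # -0.90
```

Note that the transitions are probabilities, not certainties: this is Markov's probabilistic worldview applied to decisions, which is exactly what the MDP formalizes.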