The lessons of reinforcement learning

Unsupervised reinforcement learning, such as the MDP combined with Bellman's equation, will topple traditional decision-making software in the next few years. Memoryless reinforcement learning requires few to no business rules and thus does not need human knowledge to run.

Being an adaptive AI thinker involves three requisites, namely the effort to be an SME, working on mathematical models, and understanding the potential and limits of source code:

  • Lesson 1: Machine learning through reinforcement learning can beat human intelligence in many cases. No use fighting! The technology and solutions are already here.
  • Lesson 2: Machine learning has no emotions, but you do. And so do the people around you. Human emotions and teamwork are an essential asset. Become an SME for your team. Learn how to understand what they're trying to say intuitively and make a mathematical representation of it for them. This job will never go away, even if you're setting up solutions such as Google's AutoML that don't require much development.

Reinforcement learning shows that no human can solve a problem the way a machine does; 50,000 iterations with random searching is not an option for us. The days of imitating human thinking are over. Cheap, powerful computers have all the leisure it takes to compute millions of possibilities and choose the best trajectories.

Humans need to be more intuitive: they make a few decisions and see what happens, because they cannot try 50,000 ways of doing something. Reinforcement learning marks a new era for human thinking by surpassing human reasoning power.

On the other hand, reinforcement learning requires mathematical models to function. Humans excel at mathematical abstraction, providing the intellectual fuel for those powerful machines.

The boundaries between humans and machines have changed. Humans' ability to build mathematical models, together with ever-growing cloud platforms, will serve online machine learning services.

Finding out how to use the outputs of the reinforcement learning program we just studied shows how a human will always remain at the center of artificial intelligence.

How to use the outputs

The reinforcement program we studied contains no trace of a specific field, as traditional software does. The program applies Bellman's equation with stochastic (random) choices based on the reward matrix. The goal is to find a route to C (line 3, column 3), which holds the attractive reward (100):

# Markov Decision Process (MDP) - Bellman's equations adapted to
# Reinforcement Learning with the Q action-value (reward) matrix
import numpy as ql   # numpy, aliased as ql as in the chapter's code

# R is the reward matrix for each state
R = ql.matrix([[0, 0, 0,   0, 1, 0],
               [0, 0, 0,   1, 0, 1],
               [0, 0, 100, 1, 0, 0],
               [0, 1, 1,   0, 1, 0],
               [1, 0, 0,   1, 0, 0],
               [0, 1, 0,   0, 0, 0]])

That reward matrix goes through Bellman's equation and produces a result in Python:

Q :
[[ 0. 0. 0. 0. 258.44 0. ]
[ 0. 0. 0. 321.8 0. 207.752]
[ 0. 0. 500. 321.8 0. 0. ]
[ 0. 258.44 401. 0. 258.44 0. ]
[ 207.752 0. 0. 321.8 0. 0. ]
[ 0. 258.44 0. 0. 0. 0. ]]
Normed Q :
[[ 0. 0. 0. 0. 51.688 0. ]
[ 0. 0. 0. 64.36 0. 41.5504]
[ 0. 0. 100. 64.36 0. 0. ]
[ 0. 51.688 80.2 0. 51.688 0. ]
[ 41.5504 0. 0. 64.36 0. 0. ]
[ 0. 51.688 0. 0. 0. 0. ]]

The result contains the values of each state produced by the reinforcement learning process, as well as a normed Q, in which each value is expressed as a percentage of the highest value (500).
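
The normalization itself is a one-line operation. Here is a minimal sketch, assuming Q is the matrix printed above and numpy is still aliased as ql:

# Normed Q: each value divided by the highest value, expressed as a percentage
Q_normed = Q / ql.max(Q) * 100
print("Normed Q :")
print(Q_normed)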

As Python geeks, we are overjoyed: we got something rather difficult to work, namely reinforcement learning. As mathematical amateurs, we are elated: we know what the MDP and Bellman's equation mean.

However, as natural language thinkers, we have made little progress. No customer or user can read that data and make sense of it. Furthermore, we cannot explain how we implemented an intelligent version of their job in the machine. We didn't.

We hardly dare say that reinforcement learning can beat anybody in the company by making random choices 50,000 times until the right answer comes up.

Furthermore, we got the program to work but hardly know what to do with the result ourselves. The consultant on the project cannot help because of the matrix format of the solution.

Being an adaptive thinker means knowing how to be good in all the dimensions of a subject. To solve this new problem, let's go back to step 1 with the result.

By formatting the result in Python, a graphics tool, or a spreadsheet, it can be displayed as follows:

         A        B        C        D        E        F
A        -        -        -        -   258.44        -
B        -        -        -    321.8        -  207.752
C        -        -      500    321.8        -        -
D        -   258.44      401        -   258.44        -
E  207.752        -        -    321.8        -        -
F        -   258.44        -        -        -        -
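
A few lines of Python are enough to produce this kind of labeled view. The following is a minimal sketch (the state labels A to F and the formatting choices are illustrative), assuming the Q matrix computed above:

# Print Q as a table with the states A to F as row and column labels,
# showing "-" where the value is zero
states = "ABCDEF"
print("  " + "".join(f"{s:>9}" for s in states))
for label, row in zip(states, ql.asarray(Q)):
    cells = "".join(f"{'-' if v == 0 else round(v, 2):>9}" for v in row)
    print(f"{label:>2}{cells}")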

Now, we can start reading the solution:

  • Choose a starting state. Take F, for example.
  • The F line represents the state. Since the maximum value on that line is 258.44, in the B column, we go to state B, the second line.
  • The maximum value in state B, on the second line, leads us to the D state in the fourth column.
  • The maximum value of the D state (fourth line) leads us to the C state.

Note that if you start at the C state and decide not to stay at C, the D state becomes the maximum value, which will lead you back to C. However, the MDP will never do this naturally. You would have to force the system to do it.

You have now obtained the sequence F->B->D->C. By choosing other points of departure, you can obtain other sequences by simply reading the table in the same way.
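
The same reading procedure can be automated in a few lines of Python. The following is a minimal sketch (the function name and the safety bound are illustrative, not part of the chapter's program), assuming the Q matrix printed above and the state labels A to F:

# Walk the Q matrix from a chosen starting state: at each step, move to
# the state holding the highest value on the current line, until C is reached.
states = list("ABCDEF")
Q_array = ql.asarray(Q)

def best_route(start, goal="C", max_steps=10):
    route = [start]
    current = states.index(start)
    for _ in range(max_steps):                    # safety bound on the walk
        current = int(Q_array[current].argmax())  # follow the highest value
        route.append(states[current])
        if states[current] == goal:
            break
    return route

print(best_route("F"))   # ['F', 'B', 'D', 'C']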

The most useful way of presenting the result remains the normalized version in percentages. This reflects the stochastic (random) property of the solution, which produces probabilities and not certainties, as shown in the following matrix:

         A        B        C        D        E        F
A        -        -        -        -   51.68%        -
B        -        -        -   64.36%        -   41.55%
C        -        -     100%   64.36%        -        -
D        -   51.68%    80.2%        -   51.68%        -
E   41.55%        -        -   64.36%        -        -
F        -   51.68%        -        -        -        -

Now comes the very tricky part. We started the chapter with a trip on a road, but I made no mention of it in the result analysis.

An important property of reinforcement learning comes from the fact that we are working with a mathematical model that can be applied to anything. No human rules are needed. This means we can use this program for many other subjects without writing thousands of lines of code.

Case 1: Optimizing a delivery for a driver, human or not

This model was described in this chapter.

Case 2: Optimizing warehouse flows

The same reward matrix can apply to going from point F to C in a warehouse, as shown in the following diagram:

In this warehouse, the F->B->D->C sequence makes visual sense. If somebody goes from point F to C, this physical path makes sense: it does not go through walls.

It can be used for a video game, a factory, or any form of layout.

Case 3: Automated planning and scheduling (APS)

By converting the system into a scheduling vector, the whole scenery changes. We have left the more comfortable world of physical processing of letters, faces, and trips. Though fantastic, those applications are social media's tip of the iceberg. The real challenge of artificial intelligence begins in the abstract universe of human thinking.

Every single company, person, or system requires automatic planning and scheduling (see Chapter 12, Automated Planning and Scheduling). The six A to F steps in the example of this chapter could well be six tasks to perform in a given, unknown order, represented by a vector x = (x1, x2, x3, x4, x5, x6).

The reward matrix then reflects the weights of the constraints on the tasks of vector x. For example, in a factory, you cannot assemble the parts of a product before manufacturing them.

In this case, the sequence obtained represents the schedule of the manufacturing process.
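
As an illustration only (the tasks and precedence pairs below are hypothetical, not taken from the chapter), a set of scheduling constraints can be turned into a reward matrix of the same shape as the itinerary example: a positive value at R[i][j] means task j may follow task i, and the final task carries the high reward, just as point C did:

# Hypothetical tasks and precedence constraints for illustration
tasks = ["cut", "drill", "paint", "assemble", "pack", "ship"]
allowed = [("cut", "drill"), ("drill", "paint"), ("paint", "assemble"),
           ("assemble", "pack"), ("pack", "ship")]
goal = "ship"

n = len(tasks)
R_schedule = ql.zeros((n, n))
for before, after in allowed:
    R_schedule[tasks.index(before), tasks.index(after)] = 1   # transition allowed
R_schedule[tasks.index(goal), tasks.index(goal)] = 100        # reward the final task
print(R_schedule)

Running the same Bellman loop on R_schedule then yields a task sequence instead of an itinerary.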

Case 4 and more: Your imagination

By using physical layouts or abstract decision-making vectors, matrices, and tensors, you can build a world of solutions in a mathematical reinforcement learning model. Naturally, the following chapters will enhance your toolbox with many other concepts.

Machine learning versus traditional applications

Reinforcement learning based on stochastic (random) processes will evolve beyond traditional approaches. In the past, we would sit down and listen to future users to understand their way of thinking.

We would then go back to our keyboard and try to imitate the human way of thinking. Those days are over. We need proper datasets and ML/DL equations to move forward. Applied mathematics has taken reinforcement learning to the next level. Traditional software will soon be in the museum of computer science.

An artificial adaptive thinker sees the world through applied mathematics translated into machine representations.

Use the Python source code example provided in this chapter in different ways. Run it; try changing some parameters to see what happens. Play around with the number of iterations as well; lower it from 50,000 to the point where you find the results still hold up. Change the reward matrix a little to see what happens. Design your own reward matrix trajectory. It can be an itinerary or a decision-making process.
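
If you want a self-contained starting point for these experiments, here is a minimal reconstruction of this kind of Bellman/Q-value loop. It is a sketch rather than the chapter's exact code: the structure and names are simplified, and the discount factor of 0.8 is inferred from the printed output (500 = 100 / (1 - 0.8) and 401 = 1 + 0.8 * 500).

import numpy as ql

gamma = 0.8          # discount factor, inferred from the chapter's output
iterations = 50000   # lower this value to see when the results degrade

# Reward matrix from this chapter
R = ql.matrix([[0, 0, 0,   0, 1, 0],
               [0, 0, 0,   1, 0, 1],
               [0, 0, 100, 1, 0, 0],
               [0, 1, 1,   0, 1, 0],
               [1, 0, 0,   1, 0, 0],
               [0, 1, 0,   0, 0, 0]])
Q = ql.matrix(ql.zeros([6, 6]))

for _ in range(iterations):
    # pick a random state, then a random allowed action from that state
    state = ql.random.randint(0, 6)
    possible = ql.where(R[state,] > 0)[1]
    action = int(ql.random.choice(possible))
    # Bellman update: immediate reward plus discounted best future value
    Q[state, action] = R[state, action] + gamma * ql.max(Q[action,])

print("Q :")
print(Q)
print("Normed Q :")
print(Q / ql.max(Q) * 100)

Lowering iterations or editing R and rerunning shows how quickly the Q values stabilize and how the best routes change.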