Artificial intelligence and one of its subsets, machine learning, are extremely broad collections of data-driven techniques and models. They are used to find patterns in information, learn from that information, and then behave more ‘‘intelligently.’’ They also give people help and insight that might otherwise have been difficult to obtain.
Here is a way to start thinking about how quantum computing might be applicable to large, complicated, computation-intensive systems of processes such as those found in AI and elsewhere. These three cases are in some sense the ‘‘small, medium, and large’’ ways quantum computing might complement classical techniques:
- There is a single mathematical computation somewhere in the middle of a software component that might be sped up via a quantum algorithm.
- There is a well-described component of a classical process that could be replaced with a quantum version.
- There is a way to avoid some classical components of the traditional method entirely, or the entire classical algorithm can be replaced by a much faster or more effective quantum alternative.
As I write this, quantum computers are not ‘‘big data’’ machines. This means you cannot take millions of records of information and provide them as input to a quantum calculation. Instead, quantum may be able to help where the number of inputs is modest but the computations ‘‘blow up’’ as you start examining relationships or dependencies in the data. Quantum, with its exponentially growing working memory, as we saw in the caffeine example in section 1.2, may be able to control and work with the blow-up. (See section 2.7 for a discussion of exponential growth.)
In the future, however, quantum computers may be able to input, output, and process much more data. Even if it is just theoretical now, it makes sense to ask if there are quantum algorithms that can be useful in AI someday.
Let’s look at some data. I’m a big baseball fan, and baseball has a lot of statistics associated with it. The analysis of this even has its own name: ‘‘sabermetrics.’’
Suppose I have a table of statistics for a baseball player given by year as shown in Figure 1.1. We can make this look more mathematical by creating a matrix of the same data.
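To make that concrete, here is a minimal sketch of such a matrix in Python using NumPy. The statistics shown (games, at bats, runs, hits, home runs) and their values are invented for illustration; they are not the data from Figure 1.1.

```python
import numpy as np

# Hypothetical yearly batting statistics for one player.
# Columns: games, at bats, runs, hits, home runs
# Rows: one per season (values invented for illustration).
stats = np.array([
    [152, 580,  92, 172, 24],   # year 1
    [148, 541,  85, 160, 21],   # year 2
    [155, 590, 101, 181, 29],   # year 3
])

print(stats.shape)          # (3, 5): 3 seasons by 5 statistics
print(stats.mean(axis=0))   # average of each statistic across seasons
```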
Given such information, we can manipulate it using machine learning techniques to make predictions about the player’s future performance or even how other similar players may do. These techniques make use of the matrix operations we discuss in chapter 5. There are 30 teams in Major League Baseball in the United States. With their training and feeder ‘‘minor league’’ teams, each major league team may have more than 400 players throughout its system. That would give us over 12,000 players, each with a complete player history. There are more statistics than I have listed, so we can easily get more than 100,000 values in our matrix.
In the area of entertainment, it’s hard to estimate how many movies exist, but it is well above 100,000. For each movie, we can list features such as whether it is a comedy, a drama, a romance, or an action film; who the actors are; who the directorial and production staff are; geographic locations shown in the film; languages used; and so on. There are hundreds of such features and millions of people who have watched the films!
For each person, we can also add features such as whether they like or dislike a kind of movie, actor, scene location, or director. Using all this information, which film should I recommend to you on a Saturday night in December based on what you and people similar to you like?
Think of each feature or each baseball player or film as a dimension. While you may be used to thinking of two or three dimensions in nature, in AI we might have thousands or millions of dimensions.
Matrices like these used in AI can grow to millions of rows and entries. How can we make sense of them to get insights and see patterns? Aside from simply storing and manipulating that much information, can we do the math on classical computers quickly and accurately enough?
While it was originally thought that quantum algorithms might offer exponential improvements over such classical recommender systems, a 2019 ‘‘quantum-inspired algorithm’’ by Ewin Tang showed how a classical method could gain such a huge improvement. [17] An example of being exponentially faster is doing something in 6 days instead of 10^6 = 1,000,000 days. That’s approximately 2,740 years.
Tang’s work is a fascinating example of the interplay of progress in classical and quantum algorithms. People who develop algorithms for classical computing look to quantum computing, and vice versa. Also, any particular solution to a problem may include both classical and quantum components.
Nevertheless, many believe that quantum computing will show very large improvements for some matrix computations. One such example is the HHL algorithm, whose abbreviation comes from the first letters of the last names of its authors, Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. HHL tackles the problem of solving certain systems of linear equations. This is also an example of the first case above.
Algorithms such as these may find use in fields as diverse as economics and computational fluid dynamics. They also place requirements on the structure and density of the data and may use properties such as the condition number we discuss in subsection 5.7.6.
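To make the linear-algebra connection concrete, here is a minimal classical sketch of the kind of problem HHL targets: solving A x = b for x. NumPy does the work classically here; this is not the quantum algorithm itself, and the matrix and vector are invented for illustration. The snippet also computes the condition number mentioned above, since HHL’s running time depends on it.

```python
import numpy as np

# The problem HHL addresses, solved classically: given A and b, find x with A x = b.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.linalg.solve(A, b)     # classical solution of A x = b
kappa = np.linalg.cond(A)     # condition number of A (see subsection 5.7.6)

print("x =", x)
print("condition number =", kappa)
```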
To learn more
When you complete this book you will be equipped to read the original paper describing the HHL algorithm and more recent surveys about how to apply quantum computing to linear algebraic problems. [7]
An important problem in machine learning is classification. In its simplest form, a binary classifier separates items into one of two categories, or buckets. Depending on the definitions of the categories, it may be more or less easy to do the classification.
Examples of binary categories include:
- book you like or book you don’t like
- comedy movie or dramatic movie
- gluten-free or not gluten-free
- fish dish or chicken dish
- UK football team or Spanish football team
- hot sauce or very hot sauce
- cotton shirt or permanent press shirt
- open source or proprietary
- spam email or valid email
- American League baseball team or National League team
The second example of distinguishing between comedies and dramas may not be well designed since there are movies that are both.
Mathematically, we can imagine taking some data as input and classifying it as either +1 or −1. We take a reasonably large set of data and label each item by hand as either +1 or −1. We then learn from this training set how to classify future data.
Machine learning binary classification algorithms include random forests, k-nearest neighbors, decision trees, neural networks, naive Bayes classifiers, and support vector machines (SVMs).
In the training phase, we are given a list of pre-classified objects (books, movies, proteins, operating systems, baseball teams, etc.). We then use the above algorithms to learn how to put a new object in one bucket or another.
The SVM is a straightforward approach with a clear mathematical description. In the two-dimensional case, we try to draw a line that separates the objects (represented by points in the plot to the right) into one category or the other.
The line should maximize the gap between the sets of objects.
Below is an example of a line that separates the red points beneath it from the blue points above it.
Given a new point, we plot it and determine whether it is above or below the line. That will classify it as blue or red, respectively.
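As a small sketch of that classification step, a separating line in two dimensions can be written as w · x + b = 0, and the sign of w · x + b tells us which side of the line a new point falls on. The weights and points below are invented for illustration.

```python
import numpy as np

# A hypothetical separating line written as w . x + b = 0 (values invented).
# Points with w . x + b > 0 lie above the line, points with w . x + b < 0 lie below it.
w = np.array([-0.5, 1.0])
b = -0.25

def classify(point):
    """Return +1 (blue, above the line) or -1 (red, below the line)."""
    return 1 if np.dot(w, point) + b > 0 else -1

print(classify(np.array([1.0, 1.5])))   # +1: above the line, so "blue"
print(classify(np.array([1.0, 0.2])))   # -1: below the line, so "red"
```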
Suppose we know that the point really does belong with those above the line, so it is correctly classified. We accept that and move on.
If the point is misclassified, we add the point to the training set and try to compute a new and better line. This may not be possible.
In the plot to the right, I added a new red point above the line close to 2 on the vertical axis. With this extra point, there is no line we can compute to separate the points.
Had we represented the objects in three dimensions, we would try to find a plane that separated the points with maximum gap. We would need to compute some new quantity that tells us how far the points are above or below the plane. In geometric terms, if we are given x and y only, we somehow need to compute a z to work in that third dimension.
For a representation using n dimensions, we try to compute an (n − 1)-dimensional separating hyperplane. We look at two and three dimensions in chapter 4 and the general case in chapter 5.
In this three-dimensional plot, I take the same values from the last two-dimensional version and lay the coordinate plane flat. I then add a vertical dimension. I push the red points below the plane and the blue ones above. With this construction, the coordinate plane itself separates the values.
While we can’t separate the points in two dimensions, we can in three dimensions. This kind of mapping into a higher dimension is called the kernel trick. While the coordinate plane in this case might not be the ideal separating hyperplane, it gives you an idea of what we are trying to accomplish. The benefit of kernel functions (as part of the similarly named ‘‘trick’’) is that we can do far fewer explicit geometric computations than you might expect in these higher dimensional spaces.
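Here is a minimal sketch of the same idea using scikit-learn, assuming that library is available. The toy data (points inside versus outside a circle, not the plotted data above) cannot be separated by any straight line, but an SVM with a nonlinear kernel handles it easily because the kernel implicitly lifts the points into a higher-dimensional space.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(seed=1)

# Toy data: points inside a circle are class -1, points outside are class +1.
# No straight line in the plane separates these two sets.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.where(np.linalg.norm(X, axis=1) < 0.6, -1, 1)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)   # kernel trick: implicit higher-dimensional map

print("linear kernel accuracy:", linear_svm.score(X, y))   # noticeably below 1.0
print("RBF kernel accuracy:", rbf_svm.score(X, y))          # close to 1.0
```

The kernel never builds the higher-dimensional points explicitly, which is exactly the saving the ‘‘trick’’ refers to.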
It’s worth mentioning now that we don’t need to try quantum methods on small problems that are handled quite well using traditional means. We won’t see any kind of quantum advantage until the problems are big enough to overcome the quantum circuit overhead versus classical circuits. Also, if we come up with a quantum approach that can be simulated easily on a classical computer, we don’t really need a quantum computer.
A quantum computer with 1 qubit provides us with a two-dimensional working space. Every time we add a qubit, we double the number of dimensions. This is due to the properties of superposition and entanglement that I introduce in chapter 7. For 10 qubits, we get 2^10 = 1,024 dimensions. Similarly, for 50 qubits we get 2^50 = 1,125,899,906,842,624 dimensions.
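One way to see this doubling is a minimal NumPy sketch: a single qubit’s state has two amplitudes, and combining qubit states uses the Kronecker (tensor) product, so every additional qubit doubles the length of the state vector.

```python
import numpy as np

zero = np.array([1.0, 0.0])   # single-qubit |0> state: 2 amplitudes

state = zero
for n in range(1, 6):
    print(n, "qubit(s):", state.size, "dimensions")
    state = np.kron(state, zero)   # adding a qubit doubles the dimension
```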
Remember all those dimensions for the features and baseball players and films? We want to use a sufficiently large quantum computer to do the AI calculations in a quantum feature space. This is the main point: handle the extremely large number of dimensions coming out of the data in a large quantum feature space.
There is a quantum approach that can generate the separating hyperplane in the quantum feature space. There is another that skips the hyperplane step and produces a highly accurate classifying kernel function. As the ability to entangle more qubits increases, the successful classification rate improves as well. [8] This is an active area of research: how can we use entanglement, which does not exist classically, to find new or better patterns than we can do with strictly traditional methods?
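As a rough sketch of what such a quantum kernel classifier can look like in code, here is one way to set it up with Qiskit and its qiskit-machine-learning package. This assumes recent versions of both packages; the APIs evolve between releases, so treat the class names and arguments as illustrative rather than definitive. The data is another invented two-dimensional toy set.

```python
import numpy as np
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from qiskit_machine_learning.algorithms import QSVC

rng = np.random.default_rng(seed=2)

# Toy two-dimensional data with labels in {-1, +1} (invented for illustration).
X = rng.uniform(0.0, 1.0, size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)

# Encode each 2-dimensional data point into a 2-qubit quantum state ...
feature_map = ZZFeatureMap(feature_dimension=2, reps=2)

# ... and use overlaps between those states as the kernel for a support vector classifier.
quantum_kernel = FidelityQuantumKernel(feature_map=feature_map)
qsvc = QSVC(quantum_kernel=quantum_kernel)

qsvc.fit(X, y)
print("training accuracy:", qsvc.score(X, y))
```

Run on a simulator, this executes classically, which echoes the earlier point: the quantum feature space only becomes interesting when it is too large to simulate with traditional means.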
To learn more
An increasing number of research papers are being written connecting quantum computing with machine learning and other AI techniques, although the results are somewhat fragmented. The best book pulling together the state of the art is Wittek. [19]
I do warn you again that quantum computers cannot process much data now!
For an advanced application of machine learning for quantum computing and chemistry, see Torlai et al. [18]