The capability of machine learning is actually the capability of adding new knowledge, or refining previous knowledge, that will help us in making the best or optimum decisions. Note the following, according to the economist and political scientist, Herbert Simon:
"Learning is any process by which a system improves performance from experience."
There is a standard definition that has been given by Tom Mitchell, who is a computer scientist and E. Fredkin University Professor at the Carnegie Mellon University (CMU), that is as follows:
"A program is said to learn from experience E with respect to some class of task T and performance measure P. If its performance at tasks in T, as measured by P, improves with experience E, then it is machine learning."
What this means is that when we have certain data and experiences available to us along with the help of a human expert, we are able to classify that particular data. For example, let's say we have some emails. With the help of a human, we can filter the emails as spam, business, marketing, and so on. This means that we are classifying our emails based on our experiences and classes of task T are the classes/filters that we have assigned to the emails.
With this data in mind, if we train our model, we can make a model that will classify emails according to our preferences. This is machine learning. We can always check whether the system has learned perfectly or not, which would be considered as a performance measure.
In this way, we will receive more data in the form of emails and we will be able to classify them, and it would be an improvement of the data. With that gained experience from the new data, the system's performance would improve.
This is the basic idea of machine learning.
The question is, why are we actually doing this?
We do this because we want to develop systems that are too difficult or expensive to construct manually – whether that's because they require specific detailed skills or knowledge tuned to a specific task. This is known as a knowledge engineering bottleneck. As humans, we don't have enough time to actually develop rules for each and every thing, so we look at data and we learn from data in order to make our systems predict things based on learning from data.
The following diagram illustrates the basic architecture of a learning system:
In the preceding diagram, we have a Teacher, we have Data, and we add Labels onto them, and we also have a Teacher who has assigned these labels. We give it to a Learner Component, which keeps it in a Knowledge Base, from which we can evaluate its performance and send it to a Performance Component. Here, we can have different evaluation measures, which we'll look at in future chapter, using which we can send Feedback to the Learner Component. This process can be improved and built upon over time.
The following diagram illustrates a basic architecture of how our supervised learning system looks:
Suppose we have some Training Data. Based on that, we can do some Preprocessing and extract features that are important. These Features will be given to a Learning Algorithm with some Labels attached that have been assigned by a human expert. This algorithm will then learn and create a Model. Once the Model has been created, we can take the new data, preprocess it, and extract features from it; based on those Features, we then send the data to a Model, which will do some kind of a Classification before providing a Decision. When we complete this process, and when we have a human who provides us with Labels, this kind of learning is known as supervised learning.
On the other hand, there is unsupervised learning, which is illustrated in the following diagram:
In unsupervised learning, we extract data and later Features before giving it to a Learning Algorithm, but there is no kind of human intervention that provides classification. In this case, the machine would group the data into smaller clusters, which is how the Model will learn. The next time features are extracted and given to a Model, the Model will provide us with four emails that belong to cluster 1, five emails that belong to cluster 3, and so on. This is known as unsupervised learning, and the algorithms that we use are known as clustering algorithms.