Learning curve


The basic premise behind the learning curve is that the more time you spend doing something, the better you tend to get at it. Eventually, the time needed to perform the task keeps decreasing. This concept is known by different names, such as the improvement curve, the progress curve, and the startup function.

For example, when you start learning to drive a manual car, you undergo a learning cycle. Initially, you are extra careful about operating the brake, clutch, and gears. You have to keep reminding yourself when and how to operate these components.

But as the days go by and you continue practicing, your brain gets accustomed and trained to the entire process. With each passing day, your driving keeps getting smoother, and your brain reacts to situations without you consciously realizing it. This is called subconscious intelligence. You reach this stage with lots of practice, gradually transitioning from conscious intelligence to subconscious intelligence.

Machine learning

Let me define machine learning and its components so that you don't get bamboozled by the jargon that gets thrown around.

In the words of Tom Mitchell, "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." Another common definition says that machine learning is the field that gives computers the ability to learn without being explicitly programmed.

For example, if a computer is given cases such as [(father, mother), (uncle, aunt), (brother, sister)], it needs to find out (son, ?). That is, given son, what will be the associated item? To solve this problem, a computer program goes through the previous records and tries to understand and learn the association and pattern in these combinations as it hops from one record to another. This is called learning, and it takes place through algorithms. With more records, that is, more experience, the machine gets smarter and smarter.

Let's take a look at the different branches of machine learning, as indicated in the following diagram:

We will explain the preceding diagram as follows (a short code sketch follows this list):

  • Supervised learning: In this type of learning, both the input variables and the output variable are known to us. Here, we are supposed to establish a relationship between the input variables and the output, and the learning is based on that. There are two types of problems under it, as follows:
    • Regression problem: This has a continuous output. For example, in a housing price dataset, the price of a house needs to be predicted based on input variables such as area, region, city, number of rooms, and so on. The price to be predicted is a continuous variable.
    • Classification problem: This has a discrete output. For example, predicting whether or not an employee will leave an organization, based on salary, gender, the number of members in their family, and so on.
  • Unsupervised learning: In this type of scenario, there is no output variable. We are supposed to extract a pattern based on all of the variables given. For example, the segmentation of customers based on age, gender, income, and so on.
  • Reinforcement learning: This is an area of machine learning wherein suitable actions are taken to maximize reward. For example, when training a dog to catch a ball and return it, we reward the dog if it carries out this action; otherwise, we scold it, which amounts to a punishment.
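
To make the first two branches concrete, the following is a minimal sketch in Python using scikit-learn's built-in Iris data; the dataset and model choices here are illustrative assumptions rather than the book's, and reinforcement learning is left out since it requires an interactive environment rather than a fixed dataset:

# A minimal, illustrative sketch of the supervised and unsupervised
# branches using scikit-learn's built-in Iris data (assumed choices,
# not from the book)
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised learning (classification): discrete output, labels known
clf = LogisticRegression(max_iter=200).fit(X, y)
print("Predicted class:", clf.predict(X[:1]))

# Supervised learning (regression): continuous output
# (here, predicting petal width from the other three measurements)
reg = LinearRegression().fit(X[:, :3], X[:, 3])
print("Predicted petal width:", reg.predict(X[:1, :3]))

# Unsupervised learning (clustering): no output variable, only a pattern
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", km.labels_[:5])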

Wright's model

In Wright's model, the learning curve function is defined as follows:

Y = aX^b

The variables are as follows:

  • Y: The cumulative average time per unit
  • X: The cumulative number of units produced
  • a: Time required to produce the first unit
  • b: Slope of the function when plotted on log-log graph paper (the log of the learning rate divided by the log of 2)
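
To see the model's behavior numerically, here is a minimal sketch; the 80% learning rate and the 100-hour first unit are illustrative assumptions, not values from the book:

# Wright's model: Y = a * X**b, with b = log(learning rate) / log(2).
# The 80% learning rate and 100-hour first unit are assumed for illustration.
import numpy as np

a = 100.0                                # time to produce the first unit (hours)
learning_rate = 0.80                     # each doubling cuts the average time by 20%
b = np.log(learning_rate) / np.log(2)    # slope on log-log paper (negative)

for X in [1, 2, 4, 8, 16]:
    Y = a * X ** b                       # cumulative average time per unit
    print(f"units={X:>2}  cumulative average time={Y:6.2f} hours")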

The following curve has a vertical axis (the y axis) representing learning with respect to a particular task and a horizontal axis corresponding to the time taken to learn. A learning curve with a steep beginning can be interpreted as a sign of rapid progress. The following diagram shows Wright's learning curve model:

However, the question that arises is: how is this connected to machine learning? We will discuss this in detail now.

Let's discuss a scenario that happens to be a supervised learning problem by going over the following steps:

  1. We take the data and partition it into a training set (on which we are making the system learn and come out as a model) and a validation set (on which we are testing how well the system has learned).
  2. The next step would be to take one instance (observation) of the training set and make use of it to estimate a model. The model error on the training set will be 0.
  3. Finally, we would find out the model error on the validation data.

Steps 2 and 3 are repeated with increasing numbers of instances (training sizes), such as 10, 50, and 100, while studying the training error and validation error and how they relate to the training size. This curve, or relationship, is called a learning curve in a machine learning scenario.
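
Before reaching for scikit-learn's built-in helper, the preceding procedure can be sketched by hand; the synthetic data, coefficients, and training sizes below are illustrative assumptions:

# A hand-rolled version of the preceding steps (the book uses
# sklearn's learning_curve below); the synthetic data is assumed
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.rand(1000, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.randn(1000)

# Step 1: partition into a training set and a validation set
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Steps 2 and 3, repeated for several training sizes
for size in [10, 50, 100, 500]:
    model = LinearRegression().fit(X_train[:size], y_train[:size])
    train_err = mean_squared_error(y_train[:size], model.predict(X_train[:size]))
    val_err = mean_squared_error(y_val, model.predict(X_val))
    print(f"size={size:>4}  train MSE={train_err:.4f}  validation MSE={val_err:.4f}")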


Let's work on a combined cycle power plant dataset. The features comprise hourly average ambient variables, that is, ambient temperature (AT), exhaust vacuum (V), ambient pressure (AP), and relative humidity (RH), which are used to predict the net hourly electrical energy output (PE) of the plant:

# importing all the libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt

# reading the data
data = pd.read_excel("Powerplant.xlsx")

# investigating the data
print(data.info())
data.head()

From this, we can see the data types and structure of the variables in the data:

The output can be seen as follows:

The second output gives you a good feel for the data.

The dataset has five variables: the four input features, ambient temperature (AT), exhaust vacuum (V), ambient pressure (AP), and relative humidity (RH), along with PE, which is the target variable.


Let's vary the training size of the data and study its impact on learning. A list called train_size is created with varying training sizes, as shown in the following code:

# As discussed, we are trying to vary the size of the training set
train_size = [1, 100, 500, 2000, 5000]
features = ['AT', 'V', 'AP', 'RH']
target = 'PE'

# estimating the training score & validation score
train_sizes, train_scores, validation_scores = learning_curve(
    estimator=LinearRegression(),
    X=data[features],
    y=data[target],
    train_sizes=train_size,
    cv=5,
    scoring='neg_mean_squared_error')
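
As a quick sanity check (an addition of mine, not part of the book's listing), learning_curve returns the absolute training sizes it used, plus one score per training size and cross-validation fold:

# learning_curve returns one score per (training size, CV fold) pair,
# so with five sizes and cv=5 both score arrays should be of shape (5, 5)
print(train_sizes)                 # absolute training sizes actually used
print(train_scores.shape)          # expected: (5, 5)
print(validation_scores.shape)     # expected: (5, 5)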

Let's generate the learning curve:

# Generating the learning curve; the scorer returned negative MSE,
# so we flip the sign to get positive MSE values
train_scores_mean = -train_scores.mean(axis=1)
validation_scores_mean = -validation_scores.mean(axis=1)

plt.style.use('seaborn')
plt.plot(train_sizes, train_scores_mean, label='Train_error')
plt.plot(train_sizes, validation_scores_mean, label='Validation_error')
plt.ylabel('MSE', fontsize=16)
plt.xlabel('Training set size', fontsize=16)
plt.title('Learning_Curves', fontsize=20, y=1)
plt.legend()
plt.show()

We get the following output:

From the preceding plot, we can see that when the training size is just 1, the training error is 0, but the validation error shoots beyond 400.

As we go on increasing the training set's size (from 1 to 100), the training error keeps rising. However, the validation error starts to plummet as the model performs better on the validation set. After the training size hits the 500 mark, the validation error and the training error begin to converge. So, what can be inferred from this? The performance of the model won't change, irrespective of how much further the training set grows. However, if you try to add more features, it might make a difference, as shown in the following diagram:

The preceding diagram shows that the validation and training curves have converged, so adding more training data will not help at all. However, in the following diagram, the curves haven't converged, so adding more training data will be a good idea:
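
As one hedged illustration of adding features, new columns can be derived from the existing ones; the second-degree polynomial expansion below is my choice for demonstration, not necessarily what the book's diagrams were based on, and it assumes the earlier imports and data are still in scope:

# One illustrative way to add features: second-degree polynomial
# combinations of the existing columns. Whether this helps depends
# on the data; treat it as a sketch, not a prescription.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
train_sizes, train_scores, validation_scores = learning_curve(
    estimator=poly_model,
    X=data[features],
    y=data[target],
    train_sizes=train_size,
    cv=5,
    scoring='neg_mean_squared_error')
print(-validation_scores.mean(axis=1))   # compare with the earlier validation MSEs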
