Introducing least squares
In a simple one-feature model, our hypothesis function is as follows:

$h_w(x) = w_0 + w_1 x$
If we graph this, we can see that it is a straight line crossing the y axis at $w_0$ and having a slope of $w_1$. The aim of a linear model is to find the values of $w_0$ and $w_1$ that create the straight line that most closely matches the data; these are the function's parameter values. To do this, we define an objective function, $J(w)$, which we want to minimize:

$J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2$
Here, $m$ is the number of training samples, $h_w(x^{(i)})$ is the estimated value for the ith training sample, and $y^{(i)}$ is its actual value. This is the cost function of h, because it measures the cost of the error; the greater the error, the higher the cost. This way of deriving the cost function is sometimes referred to as the sum of squared errors because it sums the squared differences between the predicted values and the actual values. The sum is halved as a convenience, as we will see. There are actually two ways that we can solve this. We can either use an iterative...
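As a quick illustration of how this cost behaves, here is a minimal Python sketch that computes $h_w(x)$ and $J(w)$ for a small made-up dataset; the array values, parameter choices, and function names are purely illustrative and are not taken from the text.

import numpy as np

def hypothesis(w0, w1, x):
    # h_w(x) = w0 + w1 * x: a straight line with intercept w0 and slope w1
    return w0 + w1 * x

def cost(w0, w1, X, y):
    # J(w): half the mean of the squared errors over the m training samples
    m = len(y)
    errors = hypothesis(w0, w1, X) - y       # h_w(x^(i)) - y^(i) for each sample
    return np.sum(errors ** 2) / (2 * m)     # halved sum of squared errors

# Toy data lying roughly on the line y = 1 + 2x (invented for illustration)
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

print(cost(1.0, 2.0, X, y))   # small cost: these parameters fit the data well
print(cost(0.0, 0.0, X, y))   # much larger cost: these parameters fit poorly

Parameters close to the line that generated the data yield a small cost, while poorly chosen parameters yield a much larger one, which is exactly the behavior the minimization exploits.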