Understanding the maths behind linear regression
Let us assume that we have a hypothetical dataset containing information about the costs of several houses and their sizes (in square feet):
Size (square feet) X | Cost (lakh INR) Y |
---|---|
1500 | 45 |
1200 | 38 |
1700 | 48 |
800 | 27 |
There are two kinds of variables in a model:
The input or predictor variable, the one which helps predict the value of the output variable
The output variable, the one which is predicted
In this case, cost is the output variable and size is the input variable. The output and input variables are generally referred to as Y and X respectively.
In the case of linear regression, we assume that Y (Cost) is a linear function of X (Size). To estimate Y, we write:
$$Y_e = \alpha + \beta X$$
where $Y_e$ is the estimated or predicted value of Y based on our linear equation, $\alpha$ is the intercept and $\beta$ is the slope.
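To make the linear form $Y_e = \alpha + \beta X$ concrete, here is a minimal Python sketch that evaluates it on the four houses from the table. The values of `alpha` and `beta` are illustrative guesses chosen by hand, not fitted coefficients, and the names `predict`, `predictions` and `residuals` are hypothetical helpers introduced only for this example.

```python
# Data from the table above: size in square feet (X), cost in lakh INR (Y).
sizes = [1500, 1200, 1700, 800]
costs = [45, 38, 48, 27]

# ASSUMPTION: hand-picked illustrative coefficients, not fitted values.
alpha = 10.0   # intercept (lakh INR)
beta = 0.023   # slope (lakh INR per square foot)


def predict(x, alpha, beta):
    """Return the estimate Y_e = alpha + beta * x for a single input x."""
    return alpha + beta * x


# Estimated cost Y_e for every house in the table.
predictions = [predict(x, alpha, beta) for x in sizes]

# Difference between the observed Y and the estimate Y_e for each house.
residuals = [y - y_e for y, y_e in zip(costs, predictions)]

for x, y, y_e, r in zip(sizes, costs, predictions, residuals):
    print(f"size={x:5d}  actual={y:5.1f}  estimated={y_e:5.1f}  difference={r:+.1f}")
```

Each difference printed here is the gap between Y and $Y_e$ for one house. Different choices of `alpha` and `beta` would make these gaps larger or smaller.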
The purpose of linear regression is to find statistically significant values of $\alpha$ and $\beta$, which minimize the difference between Y and $Y_e$. If we are able to determine...