Classifying with linear regression
Even though we know classification problems involve qualitative outputs, it is natural to ask whether we could take what we know about linear regression and apply it to the classification setting. One way to do this is to train a linear regression model to predict a value in the interval [0, 1], recalling that we have labeled our two classes as 0 and 1. We could then apply a threshold to the model's output: if the model outputs a value below 0.5, we predict class 0; otherwise, we predict class 1.
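A minimal sketch of this idea, assuming NumPy and a small hypothetical dataset with a single feature X1. The helper names `fit_linear_regression` and `predict_class` are illustrative, not taken from the text. The sketch fits ordinary least squares directly to the 0/1 labels and then thresholds the fitted output at 0.5. It also solves for the X1 value where the fitted line crosses 0.5, which is the point where the predicted class switches.

```python
import numpy as np

def fit_linear_regression(x, y):
    """Ordinary least squares for one feature; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with a bias column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

def predict_class(x, intercept, slope, threshold=0.5):
    """Threshold the regression output: below threshold -> 0, otherwise -> 1."""
    y_hat = intercept + slope * x
    return (y_hat >= threshold).astype(int)

# Hypothetical toy data: one input feature X1 and binary labels 0/1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

b0, b1 = fit_linear_regression(x1, y)  # intercept ≈ -0.4, slope ≈ 0.257
labels = predict_class(x1, b0, b1)     # [0, 0, 0, 1, 1, 1]

# X1 value where the model output equals 0.5 (assumes a nonzero slope).
boundary = (0.5 - b0) / b1             # ≈ 3.5
```

On this toy data every point is classified correctly. Note, however, that the raw output at X1 = 6 is about 1.14: least squares places no constraint that keeps predictions inside [0, 1], so only the thresholded class labels are meaningful here.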
The following graph illustrates this idea for a binary classification problem, using simple linear regression with a single input feature X1.
Our output variable y is either 0 or 1, so all the data lies on two horizontal lines. The solid line shows the output of the model, and the dashed line shows the decision boundary, which arises when we put a threshold on the model's predicted output at the value 0...