Introducing logistic regression
Logistic regression is a binary classification model. It is still a linear model, but now the output is constrained to be a binary variable, taking the value of 0
or 1
, instead of modeling a continuous outcome as in the case of linear regression. In other words, we will observe and model the outcome y = 1 or y = 0. For example, in the case of credit risk modeling, y = 0 refers to a non-default loan application, while y = 1 indicates a default loan.
However, instead of directly predicting the binary outcome, the logistic regression model predicts the probability of y taking a specific value, such as P(y = 1). The probability of assuming the other category is P(y = 0) = 1 − P(y = 1), since the total probability should always sum to 1
. The final prediction would be the winner of the two, taking the value of 1
if P(y = 1) > P(y = 0), and 0
otherwise. In the credit risk example, P(y = 1) would be interpreted as the probability of a loan defaulting...