Logistic Regression
If a response variable has binary values, the assumptions of linear regression are not valid for the following reasons:
- The relationship between the independent variable and the predictor variable is not linear.
- The error terms are heteroscedastic. Recall that heteroscedastic means that the variance of the error terms is not the same throughout the range of x (input data).
- The error terms are not normally distributed.
If we proceed, considering these violations, the results would be as follows:
- The predicted probabilities could be greater than 1 or less than 0.
- The magnitude of the effects of independent variables may be underestimated.
With logistic regression, we are interested in modeling the mean of the response variable, p, in terms of an explanatory variable, x, as a probabilistic model in terms of the odds ratio. The odds ratio is the ratio of two probabilities – the probability of the event occurring, and the probability...