Logistic regression
Even though it is called regression, this is a classification method based on the probability that a sample belongs to a class. Since these probabilities must be continuous in R and bounded in (0, 1), it's necessary to introduce a function that maps the linear term z into this interval. The name logistic comes from the decision to use the sigmoid (or logistic) function:

σ(z) = 1 / (1 + exp(-z))
A partial plot of this function is shown in the following figure:
As you can see, the function crosses the y-axis at the ordinate 0.5 (that is, σ(0) = 0.5), with y < 0.5 for x < 0 and y > 0.5 for x > 0. Moreover, its domain is R and it has two horizontal asymptotes, at 0 and 1. So, we can define the probability that a sample belongs to a class (from now on, we'll call the two classes 0 and 1) as:

P(y=1 | x) = σ(z) and P(y=0 | x) = 1 - σ(z)
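To make this concrete, here is a minimal NumPy sketch of the sigmoid and of the resulting class probabilities; the weight vector w, bias b, and sample x below are arbitrary illustrative values, not taken from the text:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# The curve crosses the y-axis at 0.5 and saturates towards 0 and 1
print(sigmoid(0.0))    # 0.5
print(sigmoid(-8.0))   # ~0.000335
print(sigmoid(8.0))    # ~0.999665

# Class probabilities for a single sample, with z as a linear combination
w = np.array([0.8, -0.4])   # illustrative weights
b = 0.1                     # illustrative bias
x = np.array([1.2, 0.5])    # illustrative sample
z = np.dot(w, x) + b
p1 = sigmoid(z)             # P(y=1 | x)
p0 = 1.0 - p1               # P(y=0 | x)
print(p1, p0)
```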
At this point, finding the optimal parameters is equivalent to maximizing the log-likelihood given the output classes:

log L(θ) = Σ_i log P(y_i | x_i; θ)
Therefore, the optimization problem can be expressed, using the indicator notation, as the minimization of the loss function:

J(θ) = -Σ_i [ y_i log σ(z_i) + (1 - y_i) log(1 - σ(z_i)) ]
If y=0, the first term becomes null and the second one becomes log(1 - σ(z)), which is the log-probability of class 0; conversely, if y=1, the second term becomes null and the first one represents the log-probability of class 1.
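A minimal sketch of this minimization with plain batch gradient descent follows; the toy dataset, learning rate, number of iterations, and the helper names fit and log_loss are illustrative assumptions rather than values from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, b, X, y):
    """Average negative log-likelihood (the loss J defined above)."""
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit(X, y, lr=0.1, n_steps=1000):
    """Minimize the log-loss with plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)   # gradient of J with respect to w
        b -= lr * np.mean(p - y)           # gradient of J with respect to b
    return w, b

# Toy dataset: two Gaussian blobs labelled 0 and 1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = fit(X, y)
print("Final loss:", log_loss(w, b, X, y))
print("Training accuracy:", np.mean((sigmoid(X @ w + b) > 0.5) == y))
```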