Classification with logistic regression
Logistic regression is a type of a classification algorithm (see http://en.wikipedia.org/wiki/Logistic_regression). This algorithm can be used to predict probabilities associated with a class or an event occurring. A classification problem with multiple classes can be reduced to a binary classification problem. In this simplest case, a high probability for one class, means a low probability for another class. Logistic regression is based on the logistic function, which has values in the range between 0 and 1—just like for probabilities. The logistic function can therefore be used to transform arbitrary values into probabilities.
We can define a function that performs classification with logistic regression. Create a classifier object as follows:
clf = LogisticRegression(random_state=12)
The random_state
parameter acts like a seed for a pseudorandom generator. We touched upon the importance of cross-validation earlier in this book as a technique...