Model validation and evaluation
The preceding logistic regression model was built on the entire dataset. Let us now split the data into training and testing sets, build the model using the training set, and then check its accuracy using the testing set. The ultimate goal is to see whether this improves the accuracy of the prediction:
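# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split
# Hold out 30% of the rows for testing; random_state fixes the shuffle so the split is reproducible
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)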
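To see what the split returns, here is a minimal, self-contained sketch on toy data. The arrays `X_demo` and `Y_demo` are hypothetical stand-ins for the chapter's `X` and `Y`, and the import path assumes a scikit-learn version that provides `sklearn.model_selection`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data (not the chapter's dataset): 10 rows, 2 predictors, binary outcome
X_demo = np.arange(20).reshape(10, 2)
Y_demo = np.array([0, 1] * 5)

# Same arguments as in the text: 30% of the rows go to the testing set
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.3, random_state=0
)

# Predictors and outcomes are split row-for-row, so the lengths always match
print(X_tr.shape, X_te.shape)
print(len(Y_tr), len(Y_te))
```

With 10 rows and test_size=0.3, three rows end up in the testing set and seven in the training set.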
The preceding code snippet creates training and testing sets for both the predictor variables (X) and the outcome variable (Y), with 30% of the observations held out for testing. Let us now build a logistic regression model over the training set:
from sklearn import linear_model
from sklearn import metrics
clf1 = linear_model.LogisticRegression()
# Fit on the training rows only; the testing rows stay unseen until evaluation
clf1.fit(X_train, Y_train)
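As a rough, self-contained sketch of the whole train-then-test workflow, the following example fits a model on synthetic data and scores it on the held-out rows. Everything named with a `_demo` suffix is hypothetical, the data is generated rather than taken from the chapter, and the 0.5 cutoff used to turn probabilities into classes is only an illustrative assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: the class is 1 when the two predictors sum to more than 0
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 2))
Y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.3, random_state=0
)

# Fit on the training set only
clf_demo = LogisticRegression()
clf_demo.fit(X_tr, Y_tr)

# predict_proba returns one column per class, ordered as in clf_demo.classes_
proba = clf_demo.predict_proba(X_te)

# Illustrative assumption: label a row as class 1 when P(class 1) is at least 0.5
pred = (proba[:, 1] >= 0.5).astype(int)

# Accuracy is measured on rows the model never saw during fitting
test_accuracy = accuracy_score(Y_te, pred)
print(test_accuracy)
```

Scoring on the held-out rows, rather than on the rows used for fitting, is what makes the reported accuracy an estimate of how the model behaves on new data.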
The preceding code snippet creates the model. If you remember the equation behind the model, you will know that the model predicts probabilities and not the classes (binary output, that is, 0 or 1). One needs to select a...