Exploring the Response Variable and Concluding the Initial Exploration
We have now looked through all the features to see whether any data is missing, as well as to generally examine them. The features are important because they constitute the inputs to our machine learning algorithm. On the other side of the model lies the output, which is a prediction of the response variable. For our problem, this is a binary flag indicating whether or not an account will default the next month, which would have been October for our historical dataset.
Our overall job on this project is to come up with a predictive model for this target. Since the response variable is a yes/no flag, this problem is called a binary classification task. In our labeled data, the samples (accounts) that defaulted (that is, 'default payment next month' = 1) are said to belong to the positive class, while those that didn't belong to the negative class. The key piece of information to examine regarding the response of a binary...