HR attrition data example
In this section, we will be using IBM Watson's HR Attrition data (the data has been utilized in the book after taking prior permission from the data administrator) shared in Kaggle datasets under open source license agreement https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset to predict whether employees would attrite or not based on independent explanatory variables:
>>> import pandas as pd
>>> hrattr_data = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
>>> print (hrattr_data.head())
There are about 1470 observations and 35 variables in this data, the top five rows are shown here for a quick glance of the variables:
The following code is used to convert Yes or No categories into 1 and 0 for modeling purposes, as scikit-learn does not fit the model on character/categorical variables directly, hence dummy coding is required to be performed for utilizing the variables in models:
>>> hrattr_data['Attrition_ind...