Predicting users who will churn
In this example, we will train logistic regression, random forest, and SVM machine learning models to predict which users will churn based on the observed variables. We need to scale the variables first, and we will use the sklearn MinMaxScaler functionality to do so:
- We will start with logistic regression and scale all the variables to a range of 0 to 1:
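import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# Target: the churn label for each customer
y = data['Churn'].values
# Features: drop the identifier and the target, and fill missing values with 0
x = data.drop(columns=['customerID', 'Churn']).fillna(0)
# Rescale every feature column to the range 0 to 1
scaler = MinMaxScaler(feature_range=(0, 1))
x_scaled = scaler.fit_transform(x)
# fit_transform returns a NumPy array, so restore the column names
x_scaled = pd.DataFrame(x_scaled, columns=x.columns)
x_scaled.head()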
The preceding code will create the x and y variables, out of which we only need to scale x.
Figure 7.14: Model input features
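It is important to scale the variables for logistic regression so that all of them fall within a range of 0 to 1. Features measured on very different scales can slow the solver's convergence and make the fitted coefficients hard to compare with one another.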
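To make the effect of the scaler concrete, the following minimal sketch applies MinMaxScaler to a small toy DataFrame and compares the result with the min-max formula, (x - min) / (max - min), computed by hand for each column. The toy DataFrame, its column names, and its values are made up for illustration and are not taken from the churn dataset:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data standing in for two numeric features
toy = pd.DataFrame({
    "tenure": [1, 12, 72],
    "MonthlyCharges": [20.0, 70.0, 120.0],
})

# Scale each column independently to the range 0 to 1
scaler = MinMaxScaler(feature_range=(0, 1))
toy_scaled = pd.DataFrame(scaler.fit_transform(toy), columns=toy.columns)

# The same transformation written out by hand, column by column
manual = (toy - toy.min()) / (toy.max() - toy.min())

# Both approaches should agree up to floating-point error
matches = np.allclose(toy_scaled.values, manual.values)
print(toy_scaled)
print("Matches manual formula:", matches)
```

After scaling, the smallest value in each column becomes 0 and the largest becomes 1, regardless of the original units, so no single feature dominates purely because of its magnitude.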
- Next, we can...