Variable selection
The model we just worked with had only a few variables, so automated variable selection methods, which are designed for models with many variables, were not really needed. We were able to identify the important variables directly from the regression output. For a model with a large number of variables, however, we could use the glmulti package to perform variable selection.
Because the churn example we generated has only a small number of variables, it makes a quick and easy demonstration of variable selection.
In the following code, we will set the maximum number of terms to include in the best regression to 10 (glmulti's maxsize argument) in order to limit the computational time needed for the search. We will also use the genetic algorithm option (method = "g"), which can be much faster when there are many candidate variables, because it evolves a population of candidate models toward better ones and fits only a fraction of all possible combinations rather than every one.
If you wish to perform an exhaustive search, use method = "h". However, be forewarned...
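A rough count of the candidate models shows why the exhaustive option can be slow. As an illustration only, assume main effects only (no interaction terms, which corresponds to glmulti's level = 1; the default level = 2 also considers pairwise interactions and makes the space far larger) and k candidate predictors. An exhaustive search capped at m terms must then fit every subset of at most m predictors, including the intercept-only model:

$$
N(k, m) = \sum_{j=0}^{m} \binom{k}{j}
$$

For a hypothetical k = 20 predictors and the cap m = 10 used above, the symmetry \binom{20}{j} = \binom{20}{20-j} gives

$$
N(20, 10) = \frac{2^{20} + \binom{20}{10}}{2} = \frac{1048576 + 184756}{2} = 616666
$$

candidate models, compared with 2^{20} = 1048576 without the cap. Each of these is a separate model fit, which is why the genetic algorithm, fitting only a sample of the candidates, is usually preferred once the number of variables grows.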