Chapter 9: Capstone Project - Based on Research Papers
Activity 14: Getting the Binary Performance with the classif.C50 Learner Instead of classif.rpart
Define the algorithm adaptation methods:
multilabel.lrn3 = makeLearner("multilabel.rFerns")
multilabel.lrn4 = makeLearner("multilabel.randomForestSRC")
multilabel.lrn3
The output is as follows:
## Learner multilabel.rFerns from package rFerns
## Type: multilabel
## Name: Random ferns; Short name: rFerns
## Class: multilabel.rFerns
## Properties: numerics,factors,ordered
## Predict-Type: response
## Hyperparameters:
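These algorithm adaptation learners can be applied to the multilabel task directly, without any wrapper. The following is a minimal sketch, assuming the rFerns package is installed and scene.task from the earlier steps is still in the workspace; the object names mod_rferns and pred_rferns are only illustrative, and the snippet is meant to contrast this approach with the problem transformation method used next:

# Train the rFerns algorithm adaptation learner directly on the multilabel task
mod_rferns = train(multilabel.lrn3, scene.task)
# Predict on the same task and inspect the per-label responses
pred_rferns = predict(mod_rferns, task = scene.task)
head(as.data.frame(pred_rferns))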
Use the problem transformation method, and change the classif.rpart learner to classif.C50:
lrn = makeLearner("classif.C50", predict.type = "prob")
multilabel.lrn1 = makeMultilabelBinaryRelevanceWrapper(lrn)
multilabel.lrn2 = makeMultilabelNestedStackingWrapper(lrn)
Note
You need to install the C50 package for this code to work.
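If the package is not already available, it can be installed from CRAN as a one-time setup step:

# Install the C50 package (only needed once per R installation);
# mlr loads it automatically when the classif.C50 learner is created
install.packages("C50")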
Print the learner details:
lrn
The output is as follows:
## Learner classif.C50 from package C50
## Type: classif
## Name: C50; Short name: C50
## Class: classif.C50
## Properties: twoclass,multiclass,numerics,factors,prob,missings,weights
## Predict-Type: prob
## Hyperparameters:
Print the multilabel learner details:
multilabel.lrn1
The output is as follows:
## Learner multilabel.binaryRelevance.classif.C50 from package C50
## Type: multilabel
## Name: ; Short name:
## Class: MultilabelBinaryRelevanceWrapper
## Properties: numerics,factors,missings,weights,prob,twoclass,multiclass
## Predict-Type: prob
## Hyperparameters:
Split the scene dataset into training and test sets, and train the model on the training subset:
df_nrow <- nrow(df_scene)
df_all_index <- c(1:df_nrow)
train_index <- sample(1:df_nrow, 0.7 * df_nrow)
test_index <- setdiff(df_all_index, train_index)
scene_classi_mod = train(multilabel.lrn1, scene.task, subset = train_index)
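Because sample() draws the training indices at random, the exact split (and therefore the numbers reported below) will differ from run to run. A minimal sketch for making the split reproducible; the seed value 123 is arbitrary:

# Fixing the RNG seed before drawing the sample makes the 70/30 split reproducible
set.seed(123)
train_index <- sample(1:df_nrow, 0.7 * df_nrow)
test_index <- setdiff(df_all_index, train_index)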
Print the model details:
scene_classi_mod
The output is as follows:
## Model for learner.id=multilabel.binaryRelevance.classif.C50; learner.class=MultilabelBinaryRelevanceWrapper
## Trained on: task.id = multi; obs = 1684; features = 294
## Hyperparameters:
Predict the output for the test dataset using the C50 model we just created:
pred = predict(scene_classi_mod, task = scene.task, subset = test_index)
names(as.data.frame(pred))
The output is as follows:
## [1] "id" "truth.Beach" "truth.Sunset" ## [4] "truth.FallFoliage" "truth.Field" "truth.Mountain" ## [7] "truth.Urban" "prob.Beach" "prob.Sunset" ## [10] "prob.FallFoliage" "prob.Field" "prob.Mountain" ## [13] "prob.Urban" "response.Beach" "response.Sunset" ## [16] "response.FallFoliage" "response.Field" "response.Mountain" ## [19] "response.Urban"
Print the performance measures:
MEASURES = list(multilabel.hamloss, multilabel.f1, multilabel.subset01,
                multilabel.acc, multilabel.tpr, multilabel.ppv)
performance(pred, measures = MEASURES)
The output is as follows:
## multilabel.hamloss       multilabel.f1 multilabel.subset01
##          0.1258645           0.5734901           0.5532503
##     multilabel.acc      multilabel.tpr      multilabel.ppv
##          0.5412633           0.6207930           0.7249104
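As a sanity check, the Hamming loss can be reproduced by hand: it is simply the fraction of (observation, label) pairs in the test set where the predicted response disagrees with the truth. A minimal sketch, assuming pred is the prediction object from the previous step; the names pred_df, truth_mat, and resp_mat are only illustrative:

# Extract the truth and response columns shown above and count mismatches;
# the result should reproduce the multilabel.hamloss value reported above
pred_df <- as.data.frame(pred)
truth_mat <- as.matrix(pred_df[, grepl("^truth\\.", names(pred_df))])
resp_mat <- as.matrix(pred_df[, grepl("^response\\.", names(pred_df))])
mean(truth_mat != resp_mat)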
List all the performance measures available for multilabel classification using the listMeasures() function:
listMeasures("multilabel")
The output is as follows:
## [1] "featperc" "multilabel.tpr" "multilabel.hamloss" ## [4] "multilabel.subset01" "timeboth" "timetrain" ## [7] "timepredict" "multilabel.ppv" "multilabel.f1" ## [10] "multilabel.acc"
Run resampling using the cross-validation method:
rdesc = makeResampleDesc(method = "CV", stratify = FALSE, iters = 3)
r = resample(learner = multilabel.lrn1, task = scene.task, resampling = rdesc,
             measures = list(multilabel.hamloss), show.info = FALSE)
r
The output is as follows:
## Resample Result
## Task: multi
## Learner: multilabel.binaryRelevance.classif.C50
## Aggr perf: multilabel.hamloss.test.mean=0.1335695
## Runtime: 72.353
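The nested stacking wrapper created earlier (multilabel.lrn2) can be evaluated with exactly the same resampling description, which makes it easy to compare the two problem transformation methods on an equal footing. A minimal sketch, reusing rdesc and scene.task from above; the object name r2 is only illustrative, and the run will take a comparable amount of time:

# Resample the nested stacking C50 wrapper with the same 3-fold CV
r2 = resample(learner = multilabel.lrn2, task = scene.task, resampling = rdesc,
              measures = list(multilabel.hamloss), show.info = FALSE)
r2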
Print the binary (per-label) performance:
getMultilabelBinaryPerformances(r$pred, measures = list(acc, mmce, auc))
The output is as follows:
##              acc.test.mean mmce.test.mean auc.test.mean
## Beach            0.8608226     0.13917740     0.8372448
## Sunset           0.9401745     0.05982551     0.9420085
## FallFoliage      0.9081845     0.09181554     0.9008202
## Field            0.8998754     0.10012464     0.9134458
## Mountain         0.7710843     0.22891566     0.7622767
## Urban            0.8184462     0.18155380     0.7837401
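The same helper also accepts other binary classification measures, so per-label precision and recall can be inspected alongside the aggregate multilabel.ppv and multilabel.tpr values reported earlier. A minimal sketch using measures that ship with mlr:

# Per-label F1, true positive rate (recall), and positive predictive value (precision)
getMultilabelBinaryPerformances(r$pred, measures = list(f1, tpr, ppv))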