You're reading from R Data Mining Implement data mining techniques through practical use cases and real-world datasets

Product type Paperback

Published in Nov 2017

Publisher Packt

ISBN-13 9781787124462

Length 442 pages

Edition 1st Edition

Tools ggplot

Concepts Data Mining

Author (1):

Andrea Cirillo

Table of Contents (16) Chapters

Preface

1. Why to Choose R for Your Data Mining and Where to Start

2. A First Primer on Data Mining – Analysing Your Bank Account Data

3. The Data Mining Process - CRISP-DM Methodology

4. Keeping the House Clean – The Data Mining Architecture

5. How to Address a Data Mining Problem – Data Cleaning and Validation

6. Looking into Your Data Eyes – Exploratory Data Analysis

7. Our First Guess – a Linear Regression

8. A Gentle Introduction to Model Performance Evaluation

9. Don't Give up – Power up Your Regression Including Multiple Variables

10. A Different Outlook to Problems with Classification Models

11. The Final Clash – Random Forests and Ensemble Learning

12. Looking for the Culprit – Text Data Mining with R

13. Sharing Your Stories with Your Stakeholders through R Markdown

14. Epilogue

15. Dealing with Dates, Relative Paths and Functions

Applying the majority vote ensemble technique on predicted data

It is now time to draw up our final list by applying the majority vote technique we learned previously to our predictions. As before, we apply a threshold of 0.5 to the values predicted by the logistic and SVM models, mapping the original predictions onto the binary {0, 1} domain. Then, with code very similar to what we have seen before, we create an ensemble_prediction attribute that stores the final prediction derived from the results of the three estimated models:

me_customer_list %>%
  mutate(logistic_threshold = case_when(as.numeric(logistic) > 0.5 ~ 1,
                                        TRUE ~ 0),
         svm_threshold = case_when(as.numeric(svm) > 0.5 ~ 1,
                                   TRUE ~ 0)) %>%
  mutate(ensemble_prediction = case_when(logistic_threshold + svm_threshold + as.numeric(as.character(random_forest)) >= 2 ~ 1,
                                         TRUE...
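To make the voting rule concrete, here is a minimal, self-contained sketch of the same idea on a toy data frame. The four rows of values are invented for illustration, the helper columns random_forest_vote and votes are hypothetical names not used in the chapter, and if_else() is used here as one possible way to express the two-way mapping. It is a sketch under those assumptions, not the chapter's full code.

```r
library(dplyr)

# Hypothetical toy predictions (made-up values, for illustration only).
# logistic and svm hold scores; random_forest holds a class label stored
# as a factor, which is why it is converted via as.character() below.
me_customer_list <- data.frame(
  logistic      = c(0.91, 0.12, 0.67, 0.40),
  svm           = c(0.80, 0.30, 0.20, 0.55),
  random_forest = factor(c("1", "0", "0", "1"), levels = c("0", "1"))
)

ensemble <- me_customer_list %>%
  mutate(
    # Map the two scores onto {0, 1} with a 0.5 threshold
    logistic_threshold = if_else(as.numeric(logistic) > 0.5, 1, 0),
    svm_threshold      = if_else(as.numeric(svm) > 0.5, 1, 0),
    # Convert factor labels to numbers; going through as.character()
    # avoids picking up the internal level codes (1, 2) instead of 0, 1
    random_forest_vote = as.numeric(as.character(random_forest))
  ) %>%
  mutate(
    # Count the votes and predict 1 when at least two of three models agree
    votes               = logistic_threshold + svm_threshold + random_forest_vote,
    ensemble_prediction = if_else(votes >= 2, 1, 0)
  )

print(ensemble[, c("votes", "ensemble_prediction")])
```

With three voters a tie is impossible, so the rule "at least two votes" always yields a clear majority. Keeping the intermediate votes column also makes it easy to see how strongly the models agree on each row.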

Authors (1)

Andrea Cirillo

Andrea Cirillo is currently working as an audit quantitative analyst at Intesa Sanpaolo Banking Group. He gained financial and external audit experience at Deloitte Touche Tohmatsu and internal audit experience at FNM, a listed Italian company. His main responsibilities involve the evaluation of credit risk management models and their enhancement, mainly within the field of the Basel III capital agreement. He is married to Francesca and is the father of Tommaso, Gianna, Zaccaria, and Filippo. Andrea has written and contributed to a few useful R packages such as updateR, ramazon, and paletteR, and regularly shares insightful advice and tutorials on R programming. His research and work mainly focus on the use of R in the fields of risk management and fraud detection, largely by modeling custom algorithms and developing interactive applications. Andrea has previously authored RStudio for R Statistical Computing Cookbook for Packt Publishing.
