R Data Mining

You're reading from R Data Mining: Implement data mining techniques through practical use cases and real-world datasets

Product type Paperback
Published in Nov 2017
Publisher Packt
ISBN-13 9781787124462
Length 442 pages
Edition 1st Edition
Author (1):
Andrea Cirillo

Table of Contents (16 chapters)

Preface
1. Why to Choose R for Your Data Mining and Where to Start
2. A First Primer on Data Mining Analysing Your Bank Account Data
3. The Data Mining Process - CRISP-DM Methodology
4. Keeping the House Clean – The Data Mining Architecture
5. How to Address a Data Mining Problem – Data Cleaning and Validation
6. Looking into Your Data Eyes – Exploratory Data Analysis
7. Our First Guess – a Linear Regression
8. A Gentle Introduction to Model Performance Evaluation
9. Don't Give up – Power up Your Regression Including Multiple Variables
10. A Different Outlook to Problems with Classification Models
11. The Final Clash – Random Forests and Ensemble Learning
12. Looking for the Culprit – Text Data Mining with R
13. Sharing Your Stories with Your Stakeholders through R Markdown
14. Epilogue
15. Dealing with Dates, Relative Paths and Functions

Introducing summary EDA


Have you ever heard about summary EDA? Since you are new to the job, I guess the answer is no. I will tell you something about this while I download the data you sent me, and open it within the RStudio project I prepared for the occasion. I hope you don't mind if I tell you something you already know.

Summary EDA encompasses all the activities based on the computation of one or more indexes that describe the data we are dealing with. What differentiates this branch of EDA from its relatives is the non-graphical nature of its measures: here, we are going to compute just a bunch of numbers, while in the graphical EDA we will perform later, plots and visualizations will be the core of our techniques.
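
To give you a feeling for what "just a bunch of numbers" means, here is a minimal sketch in base R. Take it as an illustration only: the cash_flows_example data frame, its area and cash_flow columns, and every value in it are made up for the occasion and have nothing to do with the real cash_flows report. The functions used (mean(), median(), sd(), range(), summary(), and aggregate()) all come with a standard R installation.

```r
# Hypothetical data: column names and values are invented for illustration
# and are NOT taken from the cash_flows report discussed in the text.
cash_flows_example <- data.frame(
  area      = c("north", "north", "south", "south", "centre", "centre"),
  cash_flow = c(120, 135, 98, 102, 150, 95)
)

# Position indexes: where the values are centred
mean_cf   <- mean(cash_flows_example$cash_flow)
median_cf <- median(cash_flows_example$cash_flow)

# Variability indexes: how spread out the values are
sd_cf    <- sd(cash_flows_example$cash_flow)
range_cf <- range(cash_flows_example$cash_flow)  # minimum and maximum

# summary() reports minimum, quartiles, median, mean and maximum in one call
summary(cash_flows_example$cash_flow)

# The same kind of index computed separately for each group
mean_by_area <- aggregate(cash_flow ~ area, data = cash_flows_example, FUN = mean)
print(mean_by_area)
```

No plot is produced anywhere in this sketch: every result is a number or a small table of numbers, which is exactly what sets summary EDA apart from its graphical relative.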

While we were talking, our data became ready, so we can start working on it. I will start by looking at the cash_flows report, since it probably has enough information to reveal where this drop is coming from.

Describing the population distribution

First...
