You're reading from Mastering Machine Learning with R Master machine learning techniques with R to deliver insights for complex projects

Product type Paperback

Published in Oct 2015

ISBN-13 9781783984527

Length 400 pages

Edition 1st Edition

Tools

RStudio

Concepts

Machine Learning

Author (1):

Cory Lesmeister

Table of Contents (15) Chapters

Preface

1. A Process for Success

2. Linear Regression – The Blocking and Tackling of Machine Learning

3. Logistic Regression and Discriminant Analysis

4. Advanced Feature Selection in Linear Models

5. More Classification Techniques – K-Nearest Neighbors and Support Vector Machines

6. Classification and Regression Trees

7. Neural Networks

8. Cluster Analysis

9. Principal Components Analysis

10. Market Basket Analysis and Recommendation Engines

11. Time Series and Causality

12. Text Mining

A. R Fundamentals

Index

Data understanding

After enduring the all-important pain of the first step, you can now get your hands on the data. The tasks in this process consist of the following:

Collect the data
Describe the data
Explore the data
Verify the data quality

This step is the classic case of Extract, Transform, Load (ETL). There are some considerations here. You need to make an initial determination that the available data is adequate to meet your analytical needs. As you explore the data, visually and otherwise, determine whether the variables are sparse and identify the extent to which data is missing. This may drive the learning method that you use and determine whether imputation of the missing data is necessary and feasible.
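As a minimal sketch of the sparsity and missingness checks described above, the following base R code profiles each variable in a data frame. The data frame `df`, its column names, and the helper `profile_missing()` are hypothetical and exist only for illustration; treating the share of zeros as a stand-in for sparsity is also an assumption, not a definition from the text.

```r
# Hypothetical toy data; the column names are illustrative only
df <- data.frame(
  age    = c(34, NA, 51, 28, NA),
  visits = c(0, 0, 0, 3, 0),
  region = c("N", "S", NA, "N", "S"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per variable with the share of
# missing values and, for numeric variables, the share of zeros
profile_missing <- function(data) {
  data.frame(
    variable    = names(data),
    pct_missing = vapply(data, function(x) mean(is.na(x)),
                         numeric(1), USE.NAMES = FALSE),
    pct_zero    = vapply(data, function(x) {
      # Share of zeros is only meaningful for numeric columns
      if (is.numeric(x)) mean(x == 0, na.rm = TRUE) else NA_real_
    }, numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

missing_profile <- profile_missing(df)
print(missing_profile)
```

A variable with a high `pct_missing` is a candidate for imputation or removal, while a high `pct_zero` hints at a sparse variable that some learning methods handle poorly.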

Verifying the data quality is critical. Take the time to understand who collects the data, how it is collected, and even why it is collected. You are likely to stumble upon incomplete data collection, cases where unintended IT issues led to errors in the data, or planned changes in the business rules. This is especially critical with time series, where the business rules on how the data is classified often change over time. Finally, it is a good idea to begin documenting any code at this step. As a part of the documentation process, if a data dictionary is not available, save yourself the heartache later on and make one.
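If no data dictionary exists, a skeleton can be generated from the data itself and then completed by hand. The sketch below is one possible starting point rather than a prescribed format: the helper `build_data_dictionary()` and its column layout are assumptions, and the built-in `iris` dataset stands in for your own data.

```r
# Hypothetical helper: builds a skeleton data dictionary with one
# row per variable; the description column is left blank on purpose
build_data_dictionary <- function(data) {
  data.frame(
    variable    = names(data),
    type        = vapply(data, function(x) class(x)[1],
                         character(1), USE.NAMES = FALSE),
    n_missing   = vapply(data, function(x) sum(is.na(x)),
                         integer(1), USE.NAMES = FALSE),
    n_unique    = vapply(data, function(x) length(unique(x[!is.na(x)])),
                         integer(1), USE.NAMES = FALSE),
    description = "",  # to be filled in by someone who knows the source
    stringsAsFactors = FALSE
  )
}

# iris ships with R and is used here only as stand-in data
dictionary <- build_data_dictionary(iris)
print(dictionary)

# Optionally save it alongside the project code:
# write.csv(dictionary, "data_dictionary.csv", row.names = FALSE)
```

The generated columns only record what R can infer. Who collects each variable, how, and why still have to be written into `description` by hand.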
