You're reading from Machine Learning with R

R gives you access to the cutting-edge software you need to prepare data for machine learning. No previous knowledge required – this book will take you methodically through every stage of applying machine learning.

Product type Paperback

Published in Oct 2013

Publisher Packt

ISBN-13 9781782162148

Length 396 pages

Edition 1st Edition

Concepts

Machine Learning

Author (1):

Brett Lantz

Table of Contents (19) Chapters

Machine Learning with R

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

1. Introducing Machine Learning

2. Managing and Understanding Data

3. Lazy Learning – Classification Using Nearest Neighbors

4. Probabilistic Learning – Classification Using Naive Bayes

5. Divide and Conquer – Classification Using Decision Trees and Rules

6. Forecasting Numeric Data – Regression Methods

7. Black Box Methods – Neural Networks and Support Vector Machines

8. Finding Patterns – Market Basket Analysis Using Association Rules

9. Finding Groups of Data – Clustering with k-means

10. Evaluating Model Performance

11. Improving Model Performance

12. Specialized Machine Learning Topics

Index

Understanding clustering

Clustering is an unsupervised machine learning task that automatically divides data into clusters, or groupings of similar items. It does this without having been told ahead of time what the groups should look like. Because we may not even know what we're looking for, clustering is used for knowledge discovery rather than prediction. It provides insight into the natural groupings found within data.

Without advance knowledge of what comprises a cluster, how could a computer possibly know where one group ends and another begins? The answer is simple. Clustering is guided by the principle that records inside a cluster should be very similar to each other, but very different from those outside. As you will see later, the definition of similarity might vary across applications, but the basic idea is always the same: group the data such that related elements are placed together.
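To make the similarity principle concrete, here is a minimal sketch. It is written in Python for illustration only and is not the book's own code. The six two-dimensional records, the two group assignments, and the helper names are all hypothetical, and Euclidean distance is assumed as the similarity measure, which is only one of many possible choices. The sketch compares the average distance between records inside a cluster with the average distance between records in different clusters.

```python
from itertools import combinations, product
from math import dist

# Hypothetical 2-D records, already split into two candidate groups.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]


def mean_within_distance(cluster):
    """Average Euclidean distance over all pairs inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_between_distance(first, second):
    """Average Euclidean distance over all cross-cluster pairs."""
    pairs = list(product(first, second))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Average the two within-cluster figures, then compare with the
# distance between records that sit in different clusters.
within = (mean_within_distance(cluster_a) + mean_within_distance(cluster_b)) / 2
between = mean_between_distance(cluster_a, cluster_b)

print(f"mean within-cluster distance:  {within:.2f}")
print(f"mean between-cluster distance: {between:.2f}")
```

For a grouping that follows the principle, the within-cluster figure should be small relative to the between-cluster figure. A grouping that mixed records from the two regions would push the two numbers closer together.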

The resulting clusters can then be used for action. For instance, you might find clustering...