Training Systems Using Python Statistical Modeling

You're reading from Training Systems Using Python Statistical Modeling: Explore popular techniques for modeling your data in Python

Product type: Paperback
Published in: May 2019
Publisher: Packt
ISBN-13: 9781838823733
Length: 290 pages
Edition: 1st Edition
Author: Curtis Miller

Exploring the k-means algorithm

In this section, we will look at how the k-means clustering algorithm works and demonstrate how it's used.

When clustering with k-means, we start with a dataset we want to cluster, as seen here:

We then choose the initial cluster centers. This is an important step, because badly chosen centers can lead to bad clusters, as shown in the following diagram:
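To make the effect of the starting centers concrete, here is a minimal NumPy sketch of the standard k-means iteration (Lloyd's algorithm) run twice on the same data. The four-point dataset, the function name `lloyd_kmeans`, and both sets of starting centers are assumptions made up for illustration; they are not taken from the book, and this is not how scikit-learn implements the algorithm.

```python
import numpy as np


def lloyd_kmeans(X, init_centers, n_iter=100):
    """Plain k-means (Lloyd's algorithm) from the given starting centers.

    Hypothetical helper for illustration only.
    """
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    # Final assignment and within-cluster sum of squared distances
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    inertia = float((dists[np.arange(len(X)), labels] ** 2).sum())
    return centers, labels, inertia


# Toy data (an assumption): two tight pairs of points, far apart horizontally
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

# Start with one center in each pair
good_centers, good_labels, good_inertia = lloyd_kmeans(X, X[[0, 2]])

# Start with both centers in the same pair
bad_centers, bad_labels, bad_inertia = lloyd_kmeans(X, X[[0, 1]])

print("good start:", good_labels, good_inertia)
print("bad start: ", bad_labels, bad_inertia)
```

With one starting center in each pair, the algorithm separates the left pair from the right pair (sum of squared distances 1.0). With both starting centers in the left pair, it converges to a top/bottom split that pairs points four units apart (sum of squared distances 16.0) and never recovers, because each iteration only improves on the current assignment.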

The default options for the KMeans class in scikit-learn, however, help you avoid the problems associated with badly chosen starting cluster centers. I won't go into the details of how this class does this. In this section, all I'm going to do is choose a random subset of the dataset to serve as the initial cluster centers. This is not necessarily the best approach, and you probably shouldn't deviate from what the class is doing by default...
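As a rough sketch of the approach described here, the following picks a random subset of rows as starting centers and passes them to `KMeans` through its `init` parameter, alongside a fit that leaves `init` at its default (`'k-means++'`). The synthetic data from `make_blobs`, the choice of three clusters, and the seeds are assumptions for illustration; the book's own dataset and code are not shown in this excerpt.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data (an assumption; not the book's dataset)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

k = 3  # number of clusters (assumed for this sketch)
rng = np.random.default_rng(0)

# Choose k distinct rows of X to serve as the initial cluster centers
init_idx = rng.choice(X.shape[0], size=k, replace=False)
init_centers = X[init_idx]

# An array passed as `init` fixes the starting centers, so a single run
# (n_init=1) is all that makes sense here
km_manual = KMeans(n_clusters=k, init=init_centers, n_init=1).fit(X)

# For comparison: the default initialization, with several restarts
km_default = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

print("manual init inertia: ", km_manual.inertia_)
print("default init inertia:", km_default.inertia_)
```

On well-separated data like this, both fits may well end up with the same clusters. The difference matters when an unlucky random subset places two starting centers in the same group, which the default initialization makes less likely.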
