K-means is a method for clustering data. The problem is posed like this: given a dataset of N items, we wish to partition the data into K groups. How do you do so?
Allow me to take a sidebar and explore the wonderful world of coordinates. No, no, don't run! It's very visual.
data:image/s3,"s3://crabby-images/c5167/c51672b2c06b4e078b297ec49522061cd0927d4e" alt=""
Which line is longer? How do you know?
data:image/s3,"s3://crabby-images/46aad/46aad4d2544daa5bc6acc06bb644a744d4c01d9b" alt=""
You know which line is longer because you can measure the length of each line using its endpoints: a, b, c, and d. Now, let's try something different:
data:image/s3,"s3://crabby-images/a8261/a82616bc22f48ba88df870e7b3a03f5b8ea2d0ba" alt=""
Which dot is closest to X? How do you know?
You know because, again, you can measure the distance between the dots. And now, for our final exercise:
data:image/s3,"s3://crabby-images/b7273/b727308f4a912b4e3c5e6d2ba1088c7491700ace" alt=""
Consider the distance between the following:
- A and X
- A and Y
- A and Z
- B and X
- B and Y
- B and Z
- C and X
- C and Y
- C and Z
What is the average distance between A and X, B and X, and C and X? What is the average distance between A and Y, B and Y, and C and Y? What is the...
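
If you'd rather let a computer do the measuring, here is a minimal sketch of the exercise in Python. The coordinates are made up for illustration (the picture doesn't give actual positions), and the helper names `distance` and `average_distance` are my own, not part of any K-means library. It assumes "distance" means ordinary straight-line (Euclidean) distance.

```python
import math

# Hypothetical coordinates: the figure does not give real positions,
# so these values are invented purely for illustration.
dots = {"A": (1.0, 1.0), "B": (2.0, 3.0), "C": (3.0, 1.0)}
candidates = {"X": (2.0, 2.0), "Y": (6.0, 5.0), "Z": (0.0, 6.0)}


def distance(p, q):
    """Straight-line (Euclidean) distance between two points."""
    return math.dist(p, q)  # available in Python 3.8+


def average_distance(target, points):
    """Average distance from `target` to every point in `points`."""
    points = list(points)
    return sum(distance(target, p) for p in points) / len(points)


# Average distance from each of X, Y, Z to the dots A, B, and C.
averages = {
    name: average_distance(position, dots.values())
    for name, position in candidates.items()
}

# The candidate with the smallest average distance is, on average,
# the closest one to the dots.
closest = min(averages, key=averages.get)

for name, value in averages.items():
    print(f"average distance to {name}: {value:.3f}")
print("closest on average:", closest)
```

With these made-up coordinates, X sits in the middle of A, B, and C, so it comes out with the smallest average distance. Change the numbers and the answer changes with them, which is exactly the point of measuring.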