K-means is a method of clustering data. The problem is posed like this: given a dataset of N items, we wish to partition the data into K groups. How do you do so?
Allow me to take a sidebar and explore the wonderful world of coordinates. No, no, don't run! It's very visual.
Which line is longer? How do you know?
You know which line is longer because you can measure each line using its endpoints, the points a, b, c, and d. Now, let's try something different:
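If you'd rather see the measuring done in code, here is a minimal sketch. It assumes one line runs from a to b and the other from c to d, and the coordinates are made up for illustration, since the picture doesn't give any. The helper name `length` is hypothetical too.

```python
import math

# Hypothetical endpoints; the figure's real coordinates are not given.
a, b = (0.0, 0.0), (3.0, 4.0)  # assumed: first line runs from a to b
c, d = (1.0, 1.0), (2.0, 1.0)  # assumed: second line runs from c to d


def length(p, q):
    """Straight-line (Euclidean) distance between two 2-D points."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


len_ab = length(a, b)
len_cd = length(c, d)

# Whichever measurement is larger names the longer line.
longer = "a-b" if len_ab > len_cd else "c-d"
print(len_ab, len_cd, longer)
```

With these made-up points, the a-b line comes out longer. The point is only that "longer" becomes a plain comparison of two numbers once you have coordinates.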
Which dot is closest to X? How do you know?
You know because again, you can measure the distance between the dots. And now, for our final exercise:
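The same idea can be sketched for the dots. The dot names P, Q, and R, their coordinates, and the helper `closest` are all assumptions for illustration; only X comes from the exercise above. The sketch uses `math.dist` from the standard library (Python 3.8+) to measure each dot against X and keeps the smallest.

```python
import math

# Hypothetical coordinates; the figure does not give real ones.
x = (2.0, 2.0)
dots = {"P": (0.0, 0.0), "Q": (2.0, 3.0), "R": (5.0, 2.0)}


def closest(target, candidates):
    """Return the name of the candidate point nearest to target."""
    return min(candidates, key=lambda name: math.dist(target, candidates[name]))


nearest = closest(x, dots)
print(nearest)
```

With these made-up dots, Q is one unit away from X and wins. "Closest" is just the minimum over a handful of distance measurements.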
Consider the distance between the following:
- A and X
- A and Y
- A and Z
- B and X
- B and Y
- B and Z
- C and X
- C and Y
- C and Z
What is the average distance between A and X, B and X, and C and X? What is the average distance between A and Y, B and Y, and C and Y? What is the...
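Here is a rough sketch of that last exercise, again with invented coordinates for A, B, C and X, Y, Z, since the picture supplies none. The helper name `average_distance` is hypothetical. For each of X, Y, and Z it averages the distances to A, B, and C, then reports which one has the smallest average.

```python
import math

# Hypothetical coordinates for illustration only.
points = {"A": (0.0, 0.0), "B": (0.0, 2.0), "C": (2.0, 0.0)}
candidates = {"X": (1.0, 1.0), "Y": (5.0, 5.0), "Z": (0.0, 4.0)}


def average_distance(candidate, pts):
    """Mean Euclidean distance from one candidate to every point in pts."""
    return sum(math.dist(candidate, p) for p in pts.values()) / len(pts)


# One average per candidate: nine distances boiled down to three numbers.
averages = {name: average_distance(pos, points) for name, pos in candidates.items()}

# The candidate with the smallest average sits closest to the group overall.
best = min(averages, key=averages.get)
print(averages, best)
```

With these particular numbers, X has the smallest average distance to A, B, and C. Different coordinates would of course give a different answer.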