Clustering countries by population
We will first explore this with one indicator that we are familiar with (population), and then make it interactive. We will group countries into clusters based on their population.
Let's start with a practical situation. Imagine you were asked to split countries into two groups by population: high and low. How do you do that? Where do you draw the line, and how large does a country's population have to be in order for it to qualify as "high"? Now imagine that you were asked to group countries into three or four groups based on their population. How would you update your clusters?
We can easily see how KMeans clustering is ideal for that.
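To make the question of where to draw the line more concrete, here is a minimal sketch of the idea behind k-means in one dimension. It is written in plain Python for illustration and is not scikit-learn's KMeans implementation, which uses a smarter initialization. The function name kmeans_1d and the population figures (in millions) are hypothetical and invented for this example.

```python
def kmeans_1d(values, k, max_iter=100):
    """Group numbers into k clusters with basic k-means (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # nothing moved, so the grouping is stable
            break
        centers = new_centers
    return centers, clusters


# Hypothetical populations in millions, made up for illustration only.
populations = [1, 3, 5, 8, 60, 80, 200, 1300, 1400]

for k in (2, 3):
    centers, clusters = kmeans_1d(populations, k)
    print(f"{k} clusters:")
    for center, members in zip(centers, clusters):
        print(f"  center {center:.1f}: {members}")
```

With these made-up figures, asking for two clusters separates the two largest values from the rest, and asking for three splits the remaining countries again. Nobody had to pick a threshold by hand: the boundaries follow from the data and from the number of clusters requested.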
Let's now do the same exercise with KMeans using one dimension, and then combine that with our knowledge of mapping, as follows:
- Import pandas and open the poverty dataset, like this:
  import pandas as pd
  poverty = pd...