To gain further insight into our dataset's structure and relationships, we will use the t-SNE approach, with ensembles of size 20 and base K-means clusterers with a K value of 10. First, we create and train the cluster ensemble. Then, we add the cluster assignments to the DataFrame as an additional pandas column. Finally, we calculate the feature means for each cluster and create a bar plot for each feature:
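As a minimal sketch of the last two steps only (attaching the assignments and averaging per cluster), the following uses toy data rather than the chapter's dataset. The random values, the fixed `labels` array, and the helper `plot_cluster_means` are all assumptions made for illustration; in the chapter the labels come from the trained ensemble.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the 2017 subset (random values).
rng = np.random.default_rng(0)
features = ['Log GDP per capita', 'Social support', 'Generosity']
toy = pd.DataFrame(rng.normal(size=(12, len(features))), columns=features)

# Hypothetical cluster labels; the real ones would come from the ensemble.
labels = np.array([0, 1, 2] * 4)

# Attach the assignments as an additional column.
toy['Cluster'] = labels

# Mean of every feature within each cluster: one row per cluster.
means = toy.groupby('Cluster')[features].mean()
print(means)


def plot_cluster_means(means):
    """Draw one bar subplot per feature (requires matplotlib)."""
    import matplotlib.pyplot as plt
    axes = means.plot(kind='bar', subplots=True,
                      layout=(1, len(means.columns)),
                      figsize=(12, 3), legend=False)
    plt.tight_layout()
    return axes
```

Calling `plot_cluster_means(means)` followed by `plt.show()` would display the bars. Grouping on the added column keeps the labels and the features in a single DataFrame, which is the same layout the listing for the actual dataset works with.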
# DATA LOADING SECTION START #
# Use the 2017 data and fill any NaNs
recents = data[data.Year == 2017]
recents = recents.dropna(axis=1, how="all")
# Fill remaining gaps with per-column medians (numeric columns only)
recents = recents.fillna(recents.median(numeric_only=True))
# Use only these specific features
columns = ['Log GDP per capita',
           'Social support', 'Healthy life expectancy at birth',
           'Freedom to make life choices', 'Generosity',
           'Perceptions of corruption'...