Clustering model
Now that you have learned how to perform prediction and classification tasks in data analytics, this chapter turns to clustering analysis. In clustering, we strive to group the data objects in a dataset in a meaningful way. We will learn about clustering analysis through an example.
Clustering example using a two-dimensional dataset
In this example, we will use WH Report_preprocessed.csv to cluster the countries based on two scores, Life_Ladder and Perceptions_of_corruption, in 2019.
The following code reads the data into report_df and uses Boolean masking to preprocess the dataset into report2019_df, which only includes the data of 2019:
import pandas as pd

report_df = pd.read_csv('WH Report_preprocessed.csv')
BM = report_df.year == 2019  # Boolean mask: True for rows from 2019
report2019_df = report_df[BM]  # keep only the 2019 rows
The result of the preceding code is a DataFrame, report2019_df, that only includes the data of 2019, as the example requires.
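To make the Boolean masking step concrete, here is a minimal sketch on a small in-memory DataFrame. The name toy_df, the Name column, and all of the values are hypothetical and stand in for the real file; only the column names year, Life_Ladder, and Perceptions_of_corruption come from the example above.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV file; these values are made up
# purely to illustrate the masking step.
toy_df = pd.DataFrame({
    'Name': ['A', 'A', 'B', 'B'],
    'year': [2018, 2019, 2018, 2019],
    'Life_Ladder': [5.1, 5.3, 6.8, 7.0],
    'Perceptions_of_corruption': [0.80, 0.78, 0.35, 0.33],
})

# A Boolean mask is a Series of True/False values, one per row.
BM = toy_df.year == 2019

# Indexing with the mask keeps only the rows where the mask is True.
toy2019_df = toy_df[BM]

print(BM.tolist())
print(toy2019_df)
```

Note that the filtered DataFrame keeps the original row labels (here, 1 and 3). If a fresh 0-based index is preferred, calling reset_index(drop=True) on the result would renumber the rows.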
Since we only have two dimensions...