Preprocessing the data
The first step in preprocessing for clustering analysis is to decide which data objects will be clustered. Here the answer is clear: counties. So, by the end of the data preprocessing, we need a dataset whose rows are counties and whose columns are the attributes we want to group the counties by. The following screenshot summarizes the data preprocessing that we will perform during this chapter; it ends with county_df, which has exactly these characteristics.
As shown in the preceding summarizing screenshot, we will first transform election_df into partisan_df, and then integrate the partisan_df, edu_df, pov_df, pop_df, and employ_df DataFrames. Of course, each of these steps involves more detail than the preceding screenshot shows; however, it serves as a useful summary and a general map for our understanding.
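Before going into the details, it may help to see the overall shape of this plan in code. The following is only a minimal sketch on made-up data, not the chapter's actual steps: the column names (county, party, votes, pct_bachelor, and so on), the two toy counties, and the choice of vote shares as the partisan measure are all assumptions for illustration. It shows one way a long election table could be reshaped to one row per county and then combined with other county-indexed DataFrames.

```python
import pandas as pd

# Hypothetical toy data; the real columns and values are not shown in this section.
election_df = pd.DataFrame({
    "county": ["A", "A", "B", "B"],
    "party": ["DEM", "REP", "DEM", "REP"],
    "votes": [60, 40, 30, 70],
})

# Assumed transformation: reshape to one row per county, one column per party,
# then convert raw votes into each party's share of the county total.
votes_wide = election_df.pivot(index="county", columns="party", values="votes")
partisan_df = votes_wide.div(votes_wide.sum(axis=1), axis=0).add_prefix("share_")
partisan_df.columns.name = None

# Hypothetical stand-ins for the other four DataFrames, each indexed by county.
counties = pd.Index(["A", "B"], name="county")
edu_df = pd.DataFrame({"pct_bachelor": [35.0, 18.0]}, index=counties)
pov_df = pd.DataFrame({"poverty_rate": [11.5, 19.2]}, index=counties)
pop_df = pd.DataFrame({"population": [120000, 45000]}, index=counties)
employ_df = pd.DataFrame({"unemployment_rate": [4.1, 6.3]}, index=counties)

# Integrate: align all five DataFrames on the county index and keep only
# the counties that appear in every one of them.
county_df = pd.concat(
    [partisan_df, edu_df, pov_df, pop_df, employ_df], axis=1, join="inner"
)
print(county_df)
```

The inner join is a deliberate choice in this sketch: a county missing from any one source is dropped rather than kept with missing values. Whether that is appropriate depends on the actual data.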
Let...