Cluster analysis is generally done in a series of steps. Here are things to consider in a typical cluster analysis:
- Objects to cluster: What are the objects? Typically, they should be representative of the cluster structure expected to be present. They should also be randomly sampled if the results are to be generalized to a population.
- Variables to be used: The input variables are the basis on which clusters are formed. Popular clustering techniques assume that the variables are numeric in scale, although you might work with binary data or a mix of numeric and categorical data.
- Missing values: Typically, you begin with a flat file with objects in rows and variables in columns. If data are missing, you can either delete the affected case or impute the missing value; some specialized clustering methods handle missing data in other ways.
- Scale the data...
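
The two basic options from the missing-values step (deleting the case or imputing the value) can be sketched as follows. This is a minimal illustration, not a recommended pipeline: the toy `rows` data, the function names `drop_incomplete` and `impute_column_means`, and the choice of column-mean imputation are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy data: objects in rows, variables in columns.
# None marks a missing value.
rows = [
    [1.0, 2.0],
    [None, 4.0],
    [3.0, None],
    [5.0, 6.0],
]

def drop_incomplete(data):
    """Case deletion: keep only rows that have no missing values."""
    return [row for row in data if all(v is not None for v in row)]

def impute_column_means(data):
    """Mean imputation: replace each missing value with the mean of the
    observed values in the same column (assumes every column has at
    least one observed value)."""
    columns = list(zip(*data))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in data
    ]

complete_cases = drop_incomplete(rows)   # two of the four rows survive
imputed = impute_column_means(rows)      # all four rows are kept
```

Deletion shrinks the sample, here from four objects to two, while imputation keeps every object at the cost of introducing estimated values; which trade-off is acceptable depends on how much data is missing and why.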