Performing cluster analysis with the built-in KMeans algorithm
In this recipe, we will use the KMeans algorithm to perform cluster analysis on the synthetic dataset. Cluster analysis identifies subgroups of records within a dataset that exhibit similar properties. It is commonly applied to problems such as market segmentation, fraud detection, and document analysis.
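To make the idea of grouping similar records concrete, here is a minimal pure-Python sketch of the k-means procedure (Lloyd's algorithm). It is not the SageMaker built-in KMeans implementation used in this recipe. The function name kmeans, the farthest-point initialization, and the six sample points are assumptions made purely for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Illustrative k-means (Lloyd's algorithm); returns (centroids, labels).

    Initialization is an assumption of this sketch: start from the first
    point, then repeatedly add the point farthest from the centroids
    chosen so far. Production implementations use other strategies.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Hypothetical sample data: two visibly separated groups of 2-D records.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Each record receives the label of its nearest centroid, so records with similar values end up in the same subgroup. The built-in algorithm used in the rest of this recipe works on the same principle, but runs as a managed training job on the full dataset.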
Getting ready
This recipe continues from Generating a synthetic dataset for analysis and transformation.
How to do it…
The next set of steps focuses on using the unlabeled dataset we generated in the Generating a synthetic dataset for analysis and transformation recipe to prepare the KMeans model we will use for cluster analysis:
- Navigate to the my-experiments/chapter04 directory inside your SageMaker notebook instance. Feel free to create this directory if it does not exist yet.
- Create a new notebook using the conda_python3 kernel inside...