Generating a synthetic dataset for analysis and transformation
In this recipe, we will generate a synthetic dataset that will be used in the next four recipes, which cover dimensionality reduction, cluster analysis, and conversion to the protobuf recordIO format. We will generate two versions of the dataset: one labeled and one unlabeled. The dataset will have two easily identifiable clusters, as shown in Figure 4.32. The labeled version will have six columns, and the unlabeled version will have five.
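To make the shape of this dataset concrete, here is a minimal sketch of one way to produce two well-separated clusters with a labeled and an unlabeled version. It is an illustration under stated assumptions, not the recipe's actual code: the function name generate_synthetic_dataset, the cluster centers, the spread, the number of rows, and the choice to put the label in the first column are all hypothetical. It uses only the Python standard library. A helper such as scikit-learn's make_blobs would be a natural alternative.

```python
import random


def generate_synthetic_dataset(n_per_cluster=100, n_features=5, seed=42):
    """Return (labeled_rows, unlabeled_rows) for two well-separated clusters.

    All defaults here are illustrative assumptions, not values from the recipe.
    """
    rng = random.Random(seed)

    # Hypothetical cluster centers, far apart relative to the spread of 1.0,
    # so that the two clusters are easy to tell apart.
    centers = {0: -5.0, 1: 5.0}

    labeled = []
    for label, center in centers.items():
        for _ in range(n_per_cluster):
            features = [rng.gauss(center, 1.0) for _ in range(n_features)]
            # Assumption: the label is stored in the first column.
            labeled.append([label] + features)

    # Shuffle so that rows are not grouped by cluster.
    rng.shuffle(labeled)

    # The unlabeled version is the same data with the label column dropped.
    unlabeled = [row[1:] for row in labeled]
    return labeled, unlabeled


labeled, unlabeled = generate_synthetic_dataset()
print(len(labeled[0]), len(unlabeled[0]))  # 6 5
```

With five feature columns, the labeled rows have six values each and the unlabeled rows have five, which matches the column counts described above.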
After we have completed this recipe, we should have a synthetic dataset similar to what is shown in Figure 4.32. In the Performing dimensionality reduction with the built-in PCA algorithm recipe, we will use the PCA algorithm to perform dimensionality reduction with this synthetic dataset. In the Performing cluster analysis with the built-in KMeans algorithm recipe, we...