Generating a synthetic dataset with additional columns containing random values
In this recipe, we will generate a synthetic dataset using scikit-learn. This dataset will serve as a dummy dataset for the experiments in this chapter:
Just by looking at the preceding scatterplot, we can infer that we are generating a synthetic dataset for a binary classification problem. In addition to the primary predictor columns, a and b, which are generated by the make_blobs() function of scikit-learn, the dataset will include two columns, c and d, that contain random values. We add these columns so that we can see what the generated model explainability report looks like when they are present. This model explainability report will be generated in the Creating and monitoring a SageMaker Autopilot experiment in SageMaker Studio (console) recipe.
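As a rough preview of the idea, the following minimal sketch builds such a dataset. It is not the recipe's exact code: the helper name make_synthetic_dataset, the sample count, the seed, the label column name, and the choice of uniformly distributed values for c and d are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs


def make_synthetic_dataset(n_samples=1000, seed=42):
    """Hypothetical helper: two informative columns plus two random ones."""
    # Two centers give two classes, so the target is binary.
    # The two generated features become the predictor columns a and b.
    X, y = make_blobs(
        n_samples=n_samples,
        n_features=2,
        centers=2,
        random_state=seed,
    )

    # Columns c and d are drawn independently of the label (assumed uniform
    # in [0, 1) here), so they carry no information about the class.
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "label": y,
            "a": X[:, 0],
            "b": X[:, 1],
            "c": rng.uniform(size=n_samples),
            "d": rng.uniform(size=n_samples),
        }
    )


df = make_synthetic_dataset()
print(df.head())
```

Because c and d are generated without reference to the label, we would expect an explainability report to attribute little importance to them compared with a and b.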
Tip
Since we will show the steps for how to generate a synthetic...