Applying transformation
SageMaker Data Wrangler makes data transformation easy, with numerous built-in transformations that you can use out of the box without any coding. From the analyses so far, we have observed the following issues, which we need to handle next in order to build an ML dataset:
- Missing data in some features.
- The Churn? column is now in string format, with True. and False. as values.
- Redundant CustomerID_* columns after joins.
- Features that are not providing predictive power, including but not limited to Phone, VMail Plan, and Int'l Plan.
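For readers who think more easily in code, the cleanup items above can be sketched in pandas. This is only an illustration under stated assumptions, not what Data Wrangler runs internally: the helper name clean_churn_frame, the Day Mins column, and the toy rows are all hypothetical, and dropping rows with missing values is just one possible way to handle missing data (imputation is another).

```python
import pandas as pd


def clean_churn_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the cleanup items listed above."""
    out = df.copy()

    # Redundant join keys left over after the joins (CustomerID_0, CustomerID_1, ...).
    redundant = [c for c in out.columns if c.startswith("CustomerID_")]
    # Features assumed to carry little predictive power.
    low_value = [c for c in ("Phone", "VMail Plan", "Int'l Plan") if c in out.columns]
    out = out.drop(columns=redundant + low_value)

    # Assumption: rows with missing values are dropped rather than imputed.
    out = out.dropna()

    # Convert the string labels "True." / "False." into 1 / 0.
    out["Churn?"] = out["Churn?"].map({"True.": 1, "False.": 0})
    return out.reset_index(drop=True)


# Toy rows invented purely for illustration; they are not taken from the dataset.
sample = pd.DataFrame(
    {
        "CustomerID_0": [1, 2, 3],
        "CustomerID_1": [1, 2, 3],
        "Phone": ["000-0001", "000-0002", "000-0003"],
        "VMail Plan": ["yes", "no", "no"],
        "Int'l Plan": ["no", "no", "yes"],
        "Day Mins": [265.1, None, 243.4],
        "Churn?": ["False.", "False.", "True."],
    }
)

cleaned = clean_churn_frame(sample)
print(cleaned)
```

In this sketch, the second toy row is removed because of its missing value, and only the Day Mins and Churn? columns remain.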
We would also like to perform the following transformation for ML purposes, because we want to train an XGBoost model to predict the Churn? status afterwards:
- Encoding categorical variables, that is, the State and Area Code features.
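To make the encoding step concrete, here is a minimal pandas sketch. It assumes one-hot encoding is the intended scheme, which the text above does not state explicitly; the helper name encode_categoricals and the toy rows are hypothetical.

```python
import pandas as pd


def encode_categoricals(df: pd.DataFrame, columns=("State", "Area Code")) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode the given categorical columns."""
    # Area Code looks numeric but is a category; naming it in `columns`
    # forces get_dummies to encode it regardless of its dtype.
    return pd.get_dummies(df, columns=list(columns), dtype=int)


# Toy rows invented purely for illustration.
sample = pd.DataFrame(
    {
        "State": ["KS", "OH", "KS"],
        "Area Code": [415, 408, 415],
        "Churn?": [0, 0, 1],
    }
)

encoded = encode_categoricals(sample)
print(encoded.columns.tolist())
```

Each distinct value becomes its own 0/1 indicator column (for example, State_KS and Area Code_415), which gives XGBoost purely numeric inputs.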
Let's get started:
- In the Data Flow tab, click on the plus sign next to the 2nd Join node, and select Add transform. You should...