Transforming data with Amazon SageMaker Data Wrangler
Collecting and labeling data samples is only the first step in preparing a dataset. You will very likely have to preprocess the dataset as well, for example, to:
- Convert it to the input format expected by the machine learning algorithm you're using.
- Rescale or normalize numerical features.
- Engineer higher-level features, for example, by one-hot encoding categorical variables.
- Clean and tokenize text for natural language processing applications.
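To make these transformations concrete, here is a minimal sketch in plain Python of three of them: min-max rescaling, one-hot encoding, and basic text tokenization. The function names and sample values are hypothetical and chosen only for illustration. This is not how Data Wrangler implements its transforms, and real projects would typically rely on a library such as pandas or scikit-learn.

```python
import re


def min_max_scale(values):
    """Rescale numerical values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """One-hot encode categorical values; columns follow sorted category order."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical sample data, for illustration only.
ages = [20, 30, 40]
colors = ["red", "blue", "red"]

scaled = min_max_scale(ages)
encoded, categories = one_hot(colors)
tokens = tokenize("Hello, SageMaker!")

print(scaled)
print(categories, encoded)
print(tokens)
```

Even in this tiny example, each step involves a choice (which scaling method, how to order categories, how to split text), which is why preprocessing usually takes some experimentation.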
In the early stages of a machine learning project, it's not always obvious which transformations are required, or which ones are most efficient. Thus, practitioners often need to experiment with many different combinations: transforming data in different ways, training models, and evaluating the results.
In this section, we're going to learn about Amazon SageMaker Data Wrangler, a graphical interface integrated in SageMaker...