The data transformation logic used to prepare data for model training is the same logic needed to prepare data when obtaining inferences. Maintaining two copies of that logic is redundant and error-prone.
The goal of this chapter is to walk you through how SageMaker and other AWS services can be employed to create machine learning (ML) pipelines that process big data, train algorithms, deploy trained models, and run inferences, all while reusing a single implementation of the data processing logic for both model training and inference.
In this chapter, we will cover the following topics:
- Understanding the architecture of the inference pipeline in SageMaker
- Creating features using AWS Glue and SparkML
- Identifying topics by training a Neural Topic Model (NTM) in SageMaker
- Running online versus batch inference in SageMaker
Let's look at the technical requirements...