Building the model
Before we get into the heart of using Amazon SageMaker to develop the ML model, there is a little more data engineering to consider. SageMaker provides a good number of built-in algorithms and several pre-trained models. In this example, we will use one of the built-in algorithms, Random Cut Forest (RCF). RCF is an unsupervised learning algorithm that detects anomalous data points within a dataset, that is, data points that diverge from an otherwise well-structured data series.
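To build some intuition for the "random cut" idea, here is a toy, standard-library sketch for one-dimensional readings. It cuts the value range at a uniformly random point and keeps only the side containing the reading being scored, repeating until the reading is isolated. Readings that are isolated in very few cuts are more anomalous. This is an assumption-laden simplification: it scores by average isolation depth, which is closer to the Isolation Forest scoring rule, and it is not SageMaker's RCF implementation. The function names and the `num_trees`, `sample_size`, and `seed` parameters are hypothetical.

```python
import random
from statistics import mean


def cut_depth(point, sample, rng, depth=0, max_depth=64):
    """Count random cuts needed to isolate `point` within `sample` (1-D values)."""
    lo, hi = min(sample), max(sample)
    # Stop when the partition holds only one distinct value (or we hit the cap).
    if lo == hi or depth >= max_depth:
        return depth
    # A "random cut": a split position drawn uniformly across the current range.
    cut = rng.uniform(lo, hi)
    # Keep only the side of the cut that still contains `point`.
    same_side = [x for x in sample if (x <= cut) == (point <= cut)]
    return cut_depth(point, same_side, rng, depth + 1, max_depth)


def anomaly_scores(values, num_trees=20, sample_size=64, seed=0):
    """Score each value; a higher score means it was isolated in fewer cuts."""
    rng = random.Random(seed)
    scores = []
    for v in values:
        depths = []
        for _ in range(num_trees):
            # Each hypothetical "tree" sees a random subsample plus the scored point.
            sample = rng.sample(values, min(sample_size, len(values))) + [v]
            depths.append(cut_depth(v, sample, rng))
        # Invert the average depth so that shallow (easily isolated) points score high.
        scores.append(1.0 / (1.0 + mean(depths)))
    return scores


# Hypothetical sensor readings with slight variation and one injected spike.
readings = [10.0 + 0.1 * (i % 5) for i in range(200)]
readings[120] = 50.0
scores = anomaly_scores(readings)
print(max(range(len(scores)), key=scores.__getitem__))  # index of the top score
```

The spike sits far from the cluster, so almost any cut across the full range separates it immediately, while readings inside the cluster need several cuts. The real algorithm works on multidimensional data, choosing the cut dimension in proportion to each dimension's range, and uses a different scoring function.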
RCF works well on time series data, where it can detect spikes, and possibly latency, caused by production or seasonal issues. Our raw data is well structured, because we assume the simulator produces a constant value or one with only slight variations. As a result, RCF can analyze this data and determine when data points fall outside the expected target.
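Once every reading has an anomaly score, we still need a rule for deciding which scores count as "outside the target." A common rule of thumb is to flag scores more than three standard deviations above the mean score. The sketch below applies that rule to a list of scores; the function name `flag_anomalies`, its `num_std` parameter, and the example scores are hypothetical.

```python
from statistics import mean, pstdev


def flag_anomalies(scores, num_std=3.0):
    """Return indices whose score exceeds mean + num_std * standard deviation."""
    cutoff = mean(scores) + num_std * pstdev(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]


# Hypothetical scores: a steady signal with a single outlying reading at index 50.
example_scores = [1.0] * 50 + [5.0]
print(flag_anomalies(example_scores))  # → [50]
```

The multiplier is a tuning choice: lowering `num_std` flags more borderline readings, while raising it reports only the most extreme ones.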
A note about architecture and data science
Data science is a growing and complex field. I...