Polynomial regression in Amazon ML
We will use Boto3, the AWS SDK for Python, and follow the same method of generating the datasource parameters that we used in Chapter 7, Command Line and SDK, for the Monte Carlo validation. For each model, we will generate features corresponding to the powers of x, from power 2 up to power P, and run N Monte Carlo cross-validation trials. The pseudo-code is as follows:
for each power from 2 to P:
    write sql that extracts power 1 to P from the nonlinear table
    do N times
        Create training and evaluation datasource
        Create model
        Evaluate model
        Get evaluation result
        Delete datasource and model
    Average results
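The loop above can be sketched in Python as follows. This is a minimal sketch, not the exact code used for the exercise: the column names x and y, the aliases x2, x3, and so on, the use of order by random() to shuffle the rows, and the run_trial callback are all assumptions made for illustration. The run_trial callback stands in for the inner steps (create the datasources, create and evaluate the model, fetch the result, and delete the resources) and is expected to return one evaluation score.

```python
from statistics import mean
from typing import Callable, Dict


def build_sql(max_power: int, table: str = "nonlinear") -> str:
    """Return a query selecting x, x^2, ..., x^max_power and y.

    The column names (x, y), the aliases (x2, x3, ...) and the random
    ordering are assumptions for this sketch.
    """
    powers = [f"power(x, {p}) as x{p}" for p in range(2, max_power + 1)]
    columns = ", ".join(["x"] + powers + ["y"])
    return f"select {columns} from {table} order by random()"


def monte_carlo(
    run_trial: Callable[[str, int], float],
    max_p: int = 5,
    n_trials: int = 5,
) -> Dict[int, float]:
    """Average the score of n_trials trials for each power from 2 to max_p.

    run_trial(sql, trial_index) is a hypothetical callback that creates the
    datasources, trains and evaluates one model, cleans up, and returns
    the evaluation score.
    """
    results = {}
    for power in range(2, max_p + 1):
        sql = build_sql(power)
        scores = [run_trial(sql, trial) for trial in range(n_trials)]
        results[power] = mean(scores)
    return results
```

Passing the trial logic in as a callback keeps the loop itself independent of AWS, so it can be checked with a stub before any datasource is created.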
In this exercise, we will go from power 2 to power 5 of x and run 5 trials for each model. The Python code to create a datasource from Redshift using create_data_source_from_redshift() is as follows:
response = client.create_data_source_from_redshift(
    DataSourceId='string',
    DataSourceName='string',
    DataSpec={
        'DatabaseInformation': {
            ...
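To make the shape of this call easier to see, here is a hedged sketch of a helper that assembles its keyword arguments as a plain dictionary. The parameter names follow the Boto3 documentation for create_data_source_from_redshift() as far as we know it and should be checked against the current reference. Every value (database name, cluster identifier, credentials, S3 paths, role ARN) is a placeholder, and the helper name redshift_datasource_params is hypothetical.

```python
import json
from typing import Any, Dict


def redshift_datasource_params(
    ds_id: str,
    name: str,
    sql: str,
    percent_begin: int,
    percent_end: int,
) -> Dict[str, Any]:
    """Build keyword arguments for create_data_source_from_redshift().

    All literal values below are placeholders, not real resources.
    """
    # DataRearrangement is passed as a JSON string; here it selects the
    # slice of rows between percent_begin and percent_end.
    rearrangement = json.dumps(
        {"splitting": {"percentBegin": percent_begin, "percentEnd": percent_end}}
    )
    return {
        "DataSourceId": ds_id,
        "DataSourceName": name,
        "DataSpec": {
            "DatabaseInformation": {
                "DatabaseName": "my_database",       # placeholder
                "ClusterIdentifier": "my-cluster",   # placeholder
            },
            "SelectSqlQuery": sql,
            "DatabaseCredentials": {
                "Username": "my_user",               # placeholder
                "Password": "my_password",           # placeholder
            },
            "S3StagingLocation": "s3://my-bucket/staging/",   # placeholder
            "DataRearrangement": rearrangement,
            "DataSchemaUri": "s3://my-bucket/nonlinear.schema",  # placeholder
        },
        "RoleARN": "arn:aws:iam::123456789012:role/my-role",  # placeholder
        "ComputeStatistics": True,
    }


# Usage (requires AWS credentials and a boto3 'machinelearning' client):
# params = redshift_datasource_params("ds_train_001", "train", sql, 0, 70)
# response = client.create_data_source_from_redshift(**params)
```

Building the training datasource with a 0 to 70 split and the evaluation datasource with a 70 to 100 split over the same query is one way to obtain the two datasources needed in each trial. The 70/30 ratio is only an example.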