We will use Boto3, the AWS SDK for Python, and follow the same method of generating the datasource parameters that we used in Chapter 7, Command Line and SDK, for the Monte Carlo validation: we will generate features corresponding to the powers of x from 2 up to P and run N Monte Carlo cross-validation trials for each set of features. The pseudo-code is as follows:
for each power p from 2 to P:
    write SQL that extracts powers 1 to p of x from the nonlinear table
    do N times:
        create training and evaluation datasources
        create model
        evaluate model
        get evaluation result
        delete datasources and model
    average results
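The loop above can be sketched in Python as follows. This is a minimal illustration rather than the chapter's actual script: the column names x and y, the x2, x3, ... aliases, the random ordering of rows, and the run_trial callable are all assumptions introduced for the example. Here, run_trial stands in for the steps that create the datasources, train and evaluate the model, delete the resources, and return a single score.

```python
from statistics import mean


def build_select_sql(power, table="nonlinear"):
    """Return SQL selecting x, x^2, ..., x^power and the target y."""
    # Assumption: the table has a predictor column x and a target column y.
    columns = ["x"] + [f"power(x, {k}) as x{k}" for k in range(2, power + 1)]
    # Assumption: rows are shuffled so each trial gets a different split.
    return f"select {', '.join(columns)}, y from {table} order by random()"


def monte_carlo_validation(max_power, n_trials, run_trial):
    """Average the score returned by run_trial(sql) over n_trials per power."""
    results = {}
    for power in range(2, max_power + 1):
        sql = build_select_sql(power)
        # run_trial is a hypothetical callable supplied by the caller.
        scores = [run_trial(sql) for _ in range(n_trials)]
        results[power] = mean(scores)
    return results
```

Passing the trial logic in as a callable keeps the averaging loop independent of the AWS calls, so the loop can be checked locally with a stub before any resources are created.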
In this exercise, we will use powers 2 to 5 of x and run 5 trials for each model. The Python code to create a datasource from Redshift using create_data_source_from_redshift() is as follows:
response = client.create_data_source_from_redshift(
DataSourceId...