Recall our regression problem from Chapter 3, Predicting House Value with Regression Algorithms, in which we constructed a linear regression to estimate the value of houses. At that point, we used a few arbitrary values for our hyperparameters.
In the following code block, we will show how Apache Spark can test all 18 hyperparameter combinations (3 × 2 × 3) of values for elasticNetParam, regParam, and solver:
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.regression import LinearRegression
from pyspark.ml import Pipeline

linear = LinearRegression(featuresCol="features", labelCol="medv")

# Build the grid of 3 x 2 x 3 = 18 hyperparameter combinations
param_grid = ParamGridBuilder() \
    .addGrid(linear.elasticNetParam, [0.01, 0.02, 0.05]) \
    .addGrid(linear.solver, ['normal', 'l-bfgs']) \
    .addGrid(linear.regParam, [0.4, 0.5, 0.6]) \
    .build()

pipeline = Pipeline(stages=[vector_assembler, linear])
crossval =...
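ParamGridBuilder enumerates the Cartesian product of the value lists supplied to it, which is why three values for elasticNetParam, two solvers, and three values for regParam yield 18 candidate models. A quick, library-free sketch of the same enumeration (plain Python, purely illustrative; the dictionary keys mirror the parameter names above):

```python
from itertools import product

# Candidate values, matching the grid above
elastic_net_params = [0.01, 0.02, 0.05]
solvers = ['normal', 'l-bfgs']
reg_params = [0.4, 0.5, 0.6]

# The grid is the Cartesian product of all value lists
grid = [
    {'elasticNetParam': e, 'solver': s, 'regParam': r}
    for e, s, r in product(elastic_net_params, solvers, reg_params)
]

print(len(grid))  # 3 * 2 * 3 = 18 combinations
```

Each of these 18 dictionaries corresponds to one model that the cross-validator will fit and evaluate, so grid size grows multiplicatively with every added hyperparameter.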