Forecasting the income levels of census respondents
In this recipe, we will show you how to solve a classification problem with MLlib by building two models: the ubiquitous logistic regression and a slightly more sophisticated model, the SVM (Support Vector Machine).
Getting ready
To execute this recipe, you need a working Spark environment. You should have already gone through the Creating an RDD for training recipe, where we created the training and testing datasets for estimating classification models.
No other prerequisites are required.
How to do it...
Just as with linear regression, building a logistic regression model starts with creating a LogisticRegressionWithSGD object:
import pyspark.mllib.classification as cl
income_model_lr = cl.LogisticRegressionWithSGD.train(final_data_income_train)
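To build some intuition for what a call like this does with the labeled points, the following is a minimal plain-Python sketch of logistic regression fitted with stochastic gradient descent. It is not MLlib's implementation and does not use Spark: the helper names (train_logistic_sgd, predict), their arguments, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_sgd(points, iterations=1000, step=0.1, seed=42):
    """Fit weights by stochastic gradient descent on the logistic loss.

    points: list of (label, features) pairs with label in {0, 1}.
    This mirrors the idea of a labeled point, but it is a hypothetical
    helper and not part of MLlib.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(points[0][1])
    for _ in range(iterations):
        shuffled = points[:]
        rng.shuffle(shuffled)
        for label, features in shuffled:
            score = sum(w * x for w, x in zip(weights, features))
            # Gradient of the logistic loss for a single observation
            error = sigmoid(score) - label
            weights = [w - step * error * x
                       for w, x in zip(weights, features)]
    return weights


def predict(weights, features, threshold=0.5):
    """Return class 1 when the predicted probability exceeds the threshold."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(score) > threshold else 0


# Toy, linearly separable data (made up for this sketch); the first
# feature is a constant 1.0 that plays the role of an intercept.
toy = [(0, [1.0, 0.0]), (0, [1.0, 1.0]), (1, [1.0, 3.0]), (1, [1.0, 4.0])]

weights = train_logistic_sgd(toy)
predictions = [predict(weights, features) for _, features in toy]
```

Each pass shuffles the observations and nudges the weights against the gradient of the loss for one observation at a time; the number of passes and the step size control how far and how quickly the weights move.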
How it works...
As with the LinearRegressionWithSGD model, the only required parameter is the RDD of labeled points. You can also specify the same set of optional parameters:
- The number of iterations; the...