Beyond logistic regression
We have concentrated on logistic regression in this chapter, but MLlib offers many alternative algorithms that can capture non-linearity in the data more effectively than a linear model. Because every algorithm exposes the same pipeline API, it is easy to swap one in and compare how it performs. For classification, the pipeline API offers decision trees, random forests and gradient-boosted trees, as well as a simple feed-forward neural network, which is still experimental. For regression, it offers lasso and ridge regression and decision trees. It also provides PCA for dimensionality reduction.
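To illustrate how little changes when you swap algorithms, here is a minimal sketch that fits the same feature-extraction pipeline twice, once ending in a logistic regression and once in a random forest, and compares the area under the ROC curve. It assumes Spark 2.x or later with `spark-mllib` on the classpath. The toy e-mails, the `text`/`label` column names and the `CompareClassifiers` object are hypothetical, and the random forest is just one arbitrary choice among the alternatives listed above. To keep the sketch short, it scores each model on its own training data. A real comparison would use a held-out test set.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.{LogisticRegression, RandomForestClassifier}
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object CompareClassifiers {

  /** Fits one pipeline per candidate classifier and returns its (training-set) AUC. */
  def run(): Map[String, Double] = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("compare-classifiers")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy corpus: label 1.0 = spam, 0.0 = ham.
    val emails = Seq(
      ("win money now", 1.0),
      ("cheap pills online now", 1.0),
      ("meeting moved to noon", 0.0),
      ("lunch tomorrow with the team", 0.0)
    ).toDF("text", "label")

    // Feature-extraction stages shared by every candidate pipeline.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF()
      .setInputCol("words")
      .setOutputCol("features")
      .setNumFeatures(1024)

    // Only the final stage differs between candidates.
    val candidates: Seq[(String, PipelineStage)] = Seq(
      "logistic regression" -> new LogisticRegression(),
      "random forest" -> new RandomForestClassifier().setNumTrees(20)
    )

    // Default metric is area under the ROC curve, read from the rawPrediction column.
    val evaluator = new BinaryClassificationEvaluator()

    val scores = candidates.map { case (name, classifier) =>
      val pipeline = new Pipeline()
        .setStages(Array[PipelineStage](tokenizer, hashingTF, classifier))
      val model = pipeline.fit(emails)
      // Scored on the training data only to keep the sketch short.
      name -> evaluator.evaluate(model.transform(emails))
    }.toMap

    spark.stop()
    scores
  }

  def main(args: Array[String]): Unit =
    run().foreach { case (name, auc) => println(f"$name%s: AUC = $auc%.3f") }
}
```

Trying a gradient-boosted tree or a decision tree instead would only mean adding another entry to `candidates`. The tokenizer, the hashing stage and the evaluation loop stay untouched.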
The lower-level, RDD-based MLlib API also offers principal component analysis for dimensionality reduction, several clustering methods (including k-means and latent Dirichlet allocation) and recommender systems based on alternating least squares.
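As a minimal sketch of the lower-level API, the snippet below runs k-means on an RDD of vectors rather than on a DataFrame. It assumes Spark 2.x or later. The four points and the `KMeansSketch` object are hypothetical and exist only for illustration.

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SparkSession

object KMeansSketch {

  /** Clusters four 2-D points into k = 2 groups and returns each point's cluster index. */
  def run(): Array[Int] = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("kmeans-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical data: two well-separated groups of points.
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.2),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.2, 8.8)
    )).cache()

    // The RDD-based API takes an RDD[Vector] rather than a DataFrame.
    val model = KMeans.train(points, 2, 20) // k = 2, at most 20 iterations

    // collect() returns the predictions in the same order as the input points.
    val assignments = model.predict(points).collect()
    spark.stop()
    assignments
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", "))
}
```

The cluster indices themselves are arbitrary. Only the grouping is meaningful: the first two points should share one index and the last two another.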