Using SparkR for machine learning
SparkR supports a growing list of learning algorithms, such as Generalized Linear Model (glm), Naive Bayes Model, K-Means Model, Logistic Regression Model, Latent Dirichlet Allocation (LDA) Model, Multilayer Perceptron Classification Model, Gradient Boosted Tree Model for Regression and Classification, Random Forest Model for Regression and Classification, Alternating Least Squares (ALS) matrix factorization Model, and so on.
SparkR uses Spark MLlib to train the model. The summary and predict functions are used to print a summary of the fitted model and make predictions on new data, respectively. The write.ml/read.ml operations can be used to save/load the fitted models. SparkR also supports a subset of the available R formula operators for model fitting, such as ~, ., :, +, and -.
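The fit, summarize, predict, and save/load cycle described above can be sketched as follows. This is a minimal illustration and not the example from this chapter: it assumes a local Spark installation that SparkR can find, uses R's built-in iris data as a stand-in dataset, and writes the model to a temporary directory (modelPath is a hypothetical location chosen for the sketch).

```r
library(SparkR)

# Assumption: Spark is installed locally and discoverable (for example via SPARK_HOME).
sparkR.session(master = "local[*]", appName = "sparkr-ml-sketch")

# Stand-in data: R's built-in iris. SparkR replaces "." in column names
# with "_" and warns about it, so the warning is suppressed here.
df <- suppressWarnings(createDataFrame(iris))

# Fit a Gaussian GLM; the formula uses the ~ and + operators.
model <- spark.glm(df, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")

# Print a summary of the fitted model.
print(summary(model))

# Score the data; the result carries an added "prediction" column.
preds <- predict(model, df)
localPreds <- head(select(preds, "Sepal_Length", "prediction"))
print(localPreds)

# Save the fitted model and load it back (hypothetical temporary path).
modelPath <- file.path(tempdir(), "glm-model")
write.ml(model, modelPath, overwrite = TRUE)
restored <- read.ml(modelPath)
print(summary(restored))

sparkR.session.stop()
```

The same pattern should carry over to the other spark.* model functions, with the formula selecting the label and the features.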
For the following examples, we use a wine quality dataset available at https://archive.ics.uci.edu/ml/Datasets/Wine+Quality:
> library(magrittr)
> csvPath <- "file:///Users/aurobindosarkar...