Linear regression using SparkR
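The following example illustrates how to use SparkR for machine learning. We use the same dataset of energy efficiency measurements that we used for linear regression in Chapter 5, Bayesian Regression Models. The data is first read into a local R data frame, then converted to a SparkR DataFrame, and finally a Gaussian-family generalized linear model is fitted to it: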
> library(SparkR)
> sc <- sparkR.init(master = "local")
> sqlContext <- sparkRSQL.init(sc)
> # Importing data
> df <- read.csv("/Users/harikoduvely/Projects/Book/Data/ENB2012_data.csv", header = TRUE)
> # Excluding variables Y2, X6, and X8, and dropping the records after row 768, which contain mainly null values
> df <- df[1:768, c(1, 2, 3, 4, 5, 7, 9)]
> # Converting to a SparkR DataFrame
> dfsr <- createDataFrame(sqlContext, df)
> model <- glm(Y1 ~ X1 + X2 + X3 + X4 + X5 + X7, data = dfsr, family = "gaussian")
> summary(model)
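A Gaussian-family GLM with the default identity link is an ordinary linear regression, which is why glm() serves as the linear regression routine in the listing above. The following minimal sketch checks this equivalence on a local data frame using only base R, so it runs without a Spark installation. The data frame toy, its coefficients, and the random seed are illustrative assumptions and are not taken from the energy efficiency dataset. The calls are written as stats::glm and stats::lm so that they refer to the base R functions even when SparkR is attached.

```r
# Synthetic stand-in data (an assumption for illustration only;
# this is NOT the ENB2012 energy efficiency dataset).
set.seed(42)
n <- 200
toy <- data.frame(X1 = runif(n), X2 = rnorm(n))
toy$Y1 <- 3 + 2 * toy$X1 - 1.5 * toy$X2 + rnorm(n, sd = 0.1)

# Gaussian-family GLM (identity link by default) on a local data.frame.
fit_glm <- stats::glm(Y1 ~ X1 + X2, data = toy, family = gaussian())

# Ordinary least squares on the same data.
fit_lm <- stats::lm(Y1 ~ X1 + X2, data = toy)

# Largest absolute difference between the two coefficient vectors.
coef_diff <- max(abs(coef(fit_glm) - coef(fit_lm)))

print(coef(fit_glm))
print(coef_diff)
```

The two fits should agree up to numerical tolerance, so the coefficient table reported by summary(model) in the SparkR session can be read in the same way as the output of a least-squares fit.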