Standalone programs
So far, we have been using Spark SQL and DataFrames through the Spark shell, which creates a SQLContext for us automatically. In a standalone program, you will need to create the SQLContext explicitly, from a Spark context:
import org.apache.spark.{SparkConf, SparkContext}

// Configure and create the Spark context, then wrap it in a SQLContext
val conf = new SparkConf().setAppName("applicationName")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
Additionally, importing the implicits object nested in sqlContext enables the conversion of RDDs to DataFrames:
import sqlContext.implicits._
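For instance, a minimal sketch of such a conversion might look like the following (the Person case class and the sample data are illustrative assumptions, not taken from the text):

// Case class defining the DataFrame schema (hypothetical example)
case class Person(name: String, age: Int)

// Build an RDD of case class instances
val people = sc.parallelize(Seq(Person("Alice", 34), Person("Bob", 29)))

// toDF is made available by the sqlContext.implicits._ import above
val peopleDF = people.toDF()
peopleDF.show()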
We will use DataFrames extensively in the next chapter to prepare data for use with MLlib.