Executing SQL commands and SQL-style functions on a DataFrame
Alright, open up the sparksql.py file that's included in the download files for this book. Let's take a look at it as a real-world example of using SparkSQL in Spark 2.0. You should see the following code in your editor:
Notice that we're importing a few things here: the SparkSession object and the Row object. The SparkSession object is basically Spark 2.0's way of creating a context to work with SparkSQL. We'll also import collections here:
from pyspark.sql import SparkSession
from pyspark.sql import Row

import collections
Earlier, we created SparkContext objects directly; now, we'll create a SparkSession object instead:
# Create a SparkSession (Note, the config section is only for Windows!)
spark = SparkSession.builder.config("spark.sql.warehouse.dir", "file:///C:/temp").appName("SparkSQL").getOrCreate()
So what we're doing here is creating something called spark that's going to be a SparkSession object. We'll use...