Using SQL to interact with DataFrames
In the previous recipe, we learned how to create or replace temporary views.
In this recipe, we will learn how to query and transform the data in a DataFrame using SQL.
Getting ready
To execute this recipe, you need a working Spark 2.3 environment. You should have gone through the Specifying the schema programmatically recipe, as we will be using the sample_data_schema DataFrame we created there.
There are no other requirements.
How to do it...
In this example, we will extend our original data with the form factor of each Apple computer model:
models_df = sc.parallelize([
      ('MacBook Pro', 'Laptop')
    , ('MacBook', 'Laptop')
    , ('MacBook Air', 'Laptop')
    , ('iMac', 'Desktop')
]).toDF(['Model', 'FormFactor'])

models_df.createOrReplaceTempView('models')
sample_data_schema.createOrReplaceTempView('sample_data_view')

spark.sql('''
    SELECT a.*
        , b.FormFactor
    FROM sample_data_view AS a
    LEFT JOIN models AS b
    ...