Querying with SQL
Let's run the same queries, except this time, we will do so using SQL queries against the same DataFrame. Recall that this DataFrame is accessible because we executed the .createOrReplaceTempView
method for swimmers
.
Number of rows
The following is the code snippet to get the number of rows within your DataFrame using SQL:
spark.sql("select count(1) from swimmers").show()
The output is as follows:
Running filter statements using the where Clauses
To run a filter statement using SQL, you can use the where
clause, as noted in the following code snippet:
# Get the id, age where age = 22 in SQL spark.sql("select id, age from swimmers where age = 22").show()
The output of this query is to choose only the id
and age
columns where age
= 22
:
As with the DataFrame API querying, if we want to get back the name of the swimmers who have an eye color that begins with the letter b
only, we can use the like
syntax as well:
spark.sql( "select name, eyeColor from swimmers where eyeColor like 'b%...