Basic analysis of data with Spark SQL
Spark SQL is a Spark module for structured data processing. Most developers already know SQL, and Spark SQL provides a SQL interface to your Spark data (RDDs). Using Spark SQL, you can run SQL or SQL-like queries on your big data set and fetch the results in objects called DataFrames.
A DataFrame is like a relational database table: it has named columns, and we can apply functions such as groupBy to those columns. It is very easy to learn and use.
In the next section, we will cover a few examples of how we can use DataFrames to run regular analysis tasks.
Building SparkConf and context
This is boilerplate code and the entry point for our Spark SQL code; every Spark program starts with this initialization. In this code, we build the Spark configuration, apply the configuration parameters (such as the application name and the master location), and then build the SparkSession
object. This SparkSession...