Architecture of Spark SQL
Spark SQL is a library on top of the Spark core execution engine, as shown in Figure 4.2. It exposes SQL interfaces through JDBC/ODBC for data warehousing applications and through a command-line console for executing queries interactively, so any Business Intelligence (BI) tool that supports JDBC or ODBC can connect to Spark SQL and run analytics at in-memory speeds. It also exposes the DataFrame API, which is supported in Java, Scala, Python, and R, and the Dataset API, which is supported in Scala and Java. Users can use the Data Source API to read data from, and write data to, a variety of sources when creating a DataFrame or a Dataset. Figure 4.2 also shows the traditional path of creating and operating on RDDs directly against the Spark core engine from these programming languages.
Spark SQL also makes the Dataset API, DataFrame API, and Data Source API available to the other Spark libraries, such as SparkR, Spark Streaming, Structured Streaming, the machine learning libraries, and GraphX, as shown in Figure...