Apache Spark as a distributed SQL engine
One common application of SQL is its use with business intelligence (BI) and SQL analysis tools. These tools connect to a relational database management system (RDBMS) over a JDBC or ODBC connection, relying on the JDBC/ODBC connectivity that traditional RDBMSs have built in. In the previous chapters, you have seen that Spark SQL can be used in notebooks and intermixed with PySpark, Scala, Java, or R applications. However, Apache Spark can also double as a powerful and fast distributed SQL engine, accessed through a JDBC/ODBC connection or from the command line.
Note
JDBC is a SQL-based application programming interface (API) used by Java applications to connect to an RDBMS. Similarly, ODBC is a SQL-based API created by Microsoft to provide RDBMS access to Windows-based applications. A JDBC/ODBC driver is a client-side software component, developed either by the RDBMS vendor or by a third party, that can be used with external tools to connect to an RDBMS via the...