Summary
In this chapter, you explored how to take advantage of Apache Spark's Thrift server to enable JDBC/ODBC connectivity and use Apache Spark as a distributed SQL engine. You learned how the HiveServer2 service allows external tools to connect to Apache Hive using the JDBC/ODBC standards, and how Spark Thrift Server extends HiveServer2 to enable similar functionality on Apache Spark clusters. The chapter presented the steps for connecting SQL analysis tools such as SQL Workbench/J, along with detailed instructions for connecting BI tools such as Tableau Online to Spark clusters. Finally, it covered the steps for connecting arbitrary Python applications, whether running locally on your machine or on remote servers in the cloud or a data center, to Spark clusters using pyodbc; a brief connection sketch follows this summary. In the following and final chapter of this book, we will explore the Lakehouse paradigm, which can help organizations seamlessly cater to all three workloads of data analytics...
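As a quick recap of that last workflow, the following is a minimal pyodbc sketch, not a definitive implementation. The DSN name SparkThriftServer and the table sample_table are hypothetical placeholders; the sketch assumes an ODBC driver for Spark has already been installed and configured as an ODBC data source pointing at your Spark Thrift Server:

import pyodbc

# Connect via a preconfigured ODBC data source; the DSN name here is
# a hypothetical placeholder for whatever you configured in your driver.
connection = pyodbc.connect("DSN=SparkThriftServer", autocommit=True)

# Run a Spark SQL query through the Thrift server and print the results.
cursor = connection.cursor()
cursor.execute("SELECT * FROM sample_table LIMIT 10")  # hypothetical table
for row in cursor.fetchall():
    print(row)

cursor.close()
connection.close()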