Integrating data with Synapse Spark pools
If you are a Spark developer who wants to use Synapse Spark to wrangle data and load it into your dedicated SQL pools, this is straightforward to accomplish.
JDBC was, and still is, a common way to establish the connection and exchange the data. There is one caveat when using JDBC to interact with dedicated SQL pools: it only talks to the control node of your dedicated pool. This is suboptimal, as both Spark and dedicated SQL pools have a lot of parallelism to offer.
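To make the control-node bottleneck concrete, here is a minimal sketch of what a plain JDBC read against a dedicated SQL pool looks like from Spark. The server, database, credentials, and table names are placeholders, not real endpoints, and the actual `spark.read` call is left commented because it needs a live pool and a SparkSession:

```python
# Hypothetical sketch: a plain JDBC read against a dedicated SQL pool.
# All connection values below are illustrative placeholders.

def jdbc_options(server: str, database: str, user: str, password: str) -> dict:
    """Build the option set a Spark JDBC read against a dedicated SQL pool needs."""
    return {
        # The JDBC endpoint resolves to the control node of the pool.
        "url": f"jdbc:sqlserver://{server}:1433;database={database};encrypt=true",
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }

opts = jdbc_options("myworkspace.sql.azuresynapse.net", "salesdw",
                    "loader", "<password>")

# With a live pool and a SparkSession, the read itself would look like:
# df = (spark.read.format("jdbc")
#           .options(**opts)
#           .option("dbtable", "dbo.FactSales")
#           .load())
```

Every row returned by such a read is funneled through the single control-node connection, which is exactly the limitation the adjusted approach described next works around.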
Microsoft adjusted the JDBC driver slightly to benefit from the parallel workers on both sides. The JDBC driver establishes a connection between the control node of the dedicated SQL pool and the driver node of the Spark cluster. The Spark engine issues CETAS (CREATE EXTERNAL TABLE AS SELECT) statements and sends filters and projections over this channel, while the data itself is exchanged using PolyBase and the Data Lake storage that is attached to the Synapse workspace.
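To illustrate the control channel's role, the following sketch composes the kind of CETAS statement that gets issued on the dedicated SQL pool to stage data in the attached Data Lake, where Spark's workers can then pick it up in parallel. The external data source, file format, location, and table names are assumed for illustration, not taken from any real workspace:

```python
# Hypothetical sketch of a CETAS (CREATE EXTERNAL TABLE AS SELECT) statement
# staging pool data as files in the attached Data Lake storage.
# Object names (data source, file format, paths) are illustrative only.

def build_cetas(external_table: str, location: str, data_source: str,
                file_format: str, select_sql: str) -> str:
    """Compose a CETAS statement that exports a query result to external storage."""
    return (
        f"CREATE EXTERNAL TABLE {external_table} "
        f"WITH (LOCATION = '{location}', "
        f"DATA_SOURCE = {data_source}, "
        f"FILE_FORMAT = {file_format}) "
        f"AS {select_sql}"
    )

stmt = build_cetas(
    "staging.FactSales_export",                       # assumed staging table name
    "/spark-exchange/factsales/",                      # assumed lake folder
    "SynapseLakeStore",                                # assumed external data source
    "ParquetFormat",                                   # assumed file format object
    "SELECT SaleId, Amount FROM dbo.FactSales WHERE Amount > 0",
)
```

Note how the filters and projections sent over the JDBC channel end up inside the SELECT part of the CETAS statement, so only the needed columns and rows are materialized in the lake before Spark reads them.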