Integrating data with Synapse Spark pools
If you are a Spark developer who wants to use Synapse Spark to wrangle data and load it into your dedicated SQL pools, this is straightforward to accomplish.
JDBC was, and still is, a common way to establish the connection and exchange the data. There is one caveat when using JDBC to interact with dedicated SQL pools: it only talks to the control node of your dedicated pool. This is suboptimal, as both Spark and dedicated SQL pools have a lot of parallelism to offer.
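To make the control-node bottleneck concrete, here is a minimal sketch of what a plain JDBC read against a dedicated SQL pool looks like from Spark. The server, database, credentials, and table names are placeholders, not real endpoints, and the actual `spark.read` call is left commented because it needs a live pool and a SparkSession:

```python
# Hypothetical sketch: a plain JDBC read against a dedicated SQL pool.
# All connection values below are illustrative placeholders.

def jdbc_options(server: str, database: str, user: str, password: str) -> dict:
    """Build the option set a Spark JDBC read against a dedicated SQL pool needs."""
    return {
        # The JDBC endpoint resolves to the control node of the pool.
        "url": f"jdbc:sqlserver://{server}:1433;database={database};encrypt=true",
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }

opts = jdbc_options("myworkspace.sql.azuresynapse.net", "salesdw",
                    "loader", "<password>")

# With a live pool and a SparkSession, the read itself would look like:
# df = (spark.read.format("jdbc")
#           .options(**opts)
#           .option("dbtable", "dbo.FactSales")
#           .load())
```

Every row returned by such a read is funneled through the single control-node connection, which is exactly the limitation the adjusted approach described next works around.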
Microsoft adjusted the JDBC driver slightly to benefit from the parallel workers on both sides. The JDBC driver establishes a connection between the control node of the dedicated SQL pool and the driver node of the Spark cluster. The Spark engine issues CETAS (CREATE EXTERNAL TABLE AS SELECT) statements and sends filters and projections over this channel, while the data itself is exchanged using PolyBase and the Data Lake storage that is attached to the Synapse workspace.
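To illustrate the control channel's role, the following sketch composes the kind of CETAS statement that gets issued on the dedicated SQL pool to stage data in the attached Data Lake, where Spark's workers can then pick it up in parallel. The external data source, file format, location, and table names are assumed for illustration, not taken from any real workspace:

```python
# Hypothetical sketch of a CETAS (CREATE EXTERNAL TABLE AS SELECT) statement
# staging pool data as files in the attached Data Lake storage.
# Object names (data source, file format, paths) are illustrative only.

def build_cetas(external_table: str, location: str, data_source: str,
                file_format: str, select_sql: str) -> str:
    """Compose a CETAS statement that exports a query result to external storage."""
    return (
        f"CREATE EXTERNAL TABLE {external_table} "
        f"WITH (LOCATION = '{location}', "
        f"DATA_SOURCE = {data_source}, "
        f"FILE_FORMAT = {file_format}) "
        f"AS {select_sql}"
    )

stmt = build_cetas(
    "staging.FactSales_export",                       # assumed staging table name
    "/spark-exchange/factsales/",                      # assumed lake folder
    "SynapseLakeStore",                                # assumed external data source
    "ParquetFormat",                                   # assumed file format object
    "SELECT SaleId, Amount FROM dbo.FactSales WHERE Amount > 0",
)
```

Note how the filters and projections sent over the JDBC channel end up inside the SELECT part of the CETAS statement, so only the needed columns and rows are materialized in the lake before Spark reads them.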