Cloud Scale Analytics with Azure Data Services: Build modern data warehouses on Microsoft Azure

Product type Paperback

Published in Jul 2021

Publisher Packt

ISBN-13 9781800562936

Length 520 pages

Edition 1st Edition

Author (1):

Patrik Borosch

Understanding the loading strategy with Synapse dedicated SQL pools

The table design options available in a dedicated SQL pool all influence how you load data into it: whether a table is distributed or replicated, whether it is stored as a columnstore or a heap, and whether it is partitioned on top of that.

Loading into a hash-distributed table can be quite quick. However, each incoming row requires an additional compute step: its hash key must be calculated to determine its target distribution. A round-robin-distributed table does not require this step, so loading data into it will generally be faster.
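The difference between the two placement strategies can be illustrated with a small simulation. This is only a sketch under stated assumptions: the SHA-256 hash is a stand-in (the hash function a dedicated SQL pool uses internally is not reproduced here), the strict row-by-row rotation is a simplification of round-robin placement, and the `customer_id` column and sample rows are hypothetical. The only fact taken from the platform is that a dedicated SQL pool spreads a table over 60 distributions.

```python
import hashlib
from collections import Counter

# A dedicated SQL pool spreads a distributed table over 60 distributions.
DISTRIBUTIONS = 60


def round_robin_target(row_index: int) -> int:
    """Pick the next distribution in turn; the row content is never inspected."""
    return row_index % DISTRIBUTIONS


def hash_target(key: str) -> int:
    """Pick a distribution from the key's hash (stand-in hash, an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


# Hypothetical incoming rows: 6,000 rows spread over 500 customer keys.
rows = [{"customer_id": f"C{i % 500}", "amount": i} for i in range(6000)]

# Round robin: placement depends only on arrival order.
rr_counts = Counter(round_robin_target(i) for i, _ in enumerate(rows))

# Hash: every row pays for a hash computation, but equal keys are co-located.
hash_counts = Counter(hash_target(row["customer_id"]) for row in rows)

print("round-robin rows per distribution:", set(rr_counts.values()))
print("hash rows per distribution (min, max):",
      min(hash_counts.values()), max(hash_counts.values()))
```

In this sketch, the round-robin path does no work on the row itself, whereas the hash path computes a digest per row. In exchange, the hash path guarantees that rows with the same key always land in the same distribution, which is why hash distribution is typically reserved for the final tables rather than the staging tables.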

Another consideration for a staging table in a dedicated SQL pool is to use a heap table instead of a columnstore table. Again, this avoids additional compute overhead, in this case the work of building and compressing the columnstore, and lets you load data quickly.
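Taken together, the two recommendations suggest a staging table that is both round-robin distributed and stored as a heap, followed by a second step that moves the data into its final shape. The following sketch generates the corresponding T-SQL statements as strings. The table names, column names, and helper functions are hypothetical, and the use of CREATE TABLE AS SELECT (CTAS) for the second step is an assumption made for illustration rather than the method prescribed in this section.

```python
from typing import Mapping


def staging_table_ddl(table: str, columns: Mapping[str, str]) -> str:
    """Build DDL for a staging table: round-robin distributed, stored as a heap."""
    column_list = ",\n    ".join(
        f"{name} {sql_type}" for name, sql_type in columns.items()
    )
    return (
        f"CREATE TABLE {table}\n"
        f"(\n    {column_list}\n)\n"
        "WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);"
    )


def ctas_to_target(target: str, staging: str, hash_column: str) -> str:
    """Build a CTAS statement that hashes and compresses the data in a second step."""
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (DISTRIBUTION = HASH({hash_column}), CLUSTERED COLUMNSTORE INDEX)\n"
        f"AS SELECT * FROM {staging};"
    )


# Hypothetical table and column names, for illustration only.
staging_sql = staging_table_ddl(
    "stg.Sales",
    {"customer_id": "INT NOT NULL", "amount": "DECIMAL(18, 2)"},
)
target_sql = ctas_to_target("dbo.FactSales", "stg.Sales", "customer_id")

print(staging_sql)
print(target_sql)
```

The design choice here is to keep the initial load as cheap as possible and to pay the hashing and columnstore costs once, inside the pool, when the staged data is transformed into the final table.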

In the end, it all comes down to performance. Therefore, following...