You're reading from Cloud Scale Analytics with Azure Data Services Build modern data warehouses on Microsoft Azure

Product type Paperback

Published in Jul 2021

Publisher Packt

ISBN-13 9781800562936

Length 520 pages

Edition 1st Edition

Author (1):

Patrik Borosch

Loading data

The database offers many options for parallel processing, and you will want to take advantage of them when you load data, too. Remember the purpose of the control and compute nodes? When loading data into your database, you want a technique that uses the compute nodes as much as possible.

Using the COPY statement

The COPY statement helps you do exactly that. It talks directly to the compute nodes and can therefore use the full parallelism that the database offers. It is part of the T-SQL dialect of the Synapse Analytics database and provides many options to control how data is loaded.
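To make the shape of such a statement more concrete, here is a minimal sketch in Python that only assembles the T-SQL text of a COPY INTO statement. The helper name `build_copy_statement`, the table `dbo.SalesStaging`, and the storage URL are hypothetical and chosen purely for illustration. The options shown (FILE_TYPE, FIELDTERMINATOR, FIRSTROW, CREDENTIAL) are a small subset of what COPY accepts, and the managed identity credential is an assumption about how access to the storage account is set up.

```python
def build_copy_statement(table, source_url, file_type="CSV",
                         first_row=2, field_terminator=","):
    """Assemble the text of a T-SQL COPY INTO statement (illustrative helper)."""
    options = [f"FILE_TYPE = '{file_type}'"]
    if file_type == "CSV":
        # Delimiter and header-skipping options only make sense for CSV files.
        options.append(f"FIELDTERMINATOR = '{field_terminator}'")
        options.append(f"FIRSTROW = {first_row}")
    # Assumption: the workspace's managed identity may read the storage account.
    options.append("CREDENTIAL = (IDENTITY = 'Managed Identity')")
    option_block = ",\n    ".join(options)
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_url}'\n"
        f"WITH (\n    {option_block}\n);"
    )


# Hypothetical target table and source location, for illustration only.
statement = build_copy_statement(
    "dbo.SalesStaging",
    "https://examplestorage.blob.core.windows.net/landing/sales/*.csv",
)
print(statement)
```

The generated text could then be run against the dedicated SQL pool from whichever SQL client you prefer. The sketch does not connect to a database or validate the options.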

If you load through the control node instead of using the COPY statement, you create a bottleneck. The load would be single-threaded, and all the rows to be written to the database would first flow through the control node and would then be spread to the distributions using...