Achieving parallelism in data loading using PolyBase
PolyBase is the one of best data loading methods when it comes to performance after the COPY Into
command. So, it's recommended to use PolyBase where it's supported or possible when it comes to parallelism.
How it achieves parallelism is similar to what a parallel data warehouse system does. There will be a control node, followed by multiple compute nodes, which are then linked to multiple data nodes. The data nodes will consist of the actual data, which will get the process by the compute nodes in parallel, and all these operations and instructions are governed by the control node.
In the following architecture diagrams, each Hadoop Distributed File System (HDFS) bridge of the service from every compute node can connect to an external resource, such as Azure Blob storage, and then bidirectionally transfer data between a SQL data warehouse and the external resource. This is fully scalable and highly robust: as you...