Implementing distributions
A dedicated SQL pool's massively parallel processing (MPP) engine splits the data into 60 partitions and runs queries against them in parallel. Each of these smaller partitions, along with the compute resources to run the queries, is called a distribution. A distribution is the basic unit of processing and storage for a dedicated SQL pool.
Dedicated SQL pools provide three options for distribution: hash, round-robin, and replicated. Let's look at each of them in detail.
Hash distribution
This type of distribution assigns rows to distributions by applying a hash function to a chosen column. Rows with the same value in the hashed column always land in the same distribution. To use it, specify DISTRIBUTION = HASH (COLUMN_ID) in the WITH clause of CREATE TABLE. Here is an example:
CREATE TABLE dbo.TripTable ( [tripId] INT NOT NULL, [driverId] INT NOT NULL, [customerID] INT NOT NULL, [tripDate...
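As a complementary sketch, the statements below create a small hash-distributed table and then check how its rows are spread across the distributions. The table dbo.SalesFact and its columns are hypothetical names chosen for illustration and do not come from the example above. The sketch assumes it runs in a dedicated SQL pool, and the clustered columnstore index is just one common index choice.

```sql
-- Hypothetical table, used only to illustrate the WITH clause.
CREATE TABLE dbo.SalesFact
(
    [saleId]     INT            NOT NULL,
    [customerId] INT            NOT NULL,
    [amount]     DECIMAL(10, 2) NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    -- Rows with the same customerId hash to the same distribution.
    DISTRIBUTION = HASH ([customerId])
);

-- Reports rows and space used per distribution, which helps
-- reveal skew once the table has been loaded.
DBCC PDW_SHOWSPACEUSED('dbo.SalesFact');
```

If a few distributions hold far more rows than the rest, the hash column probably has too few distinct values or a skewed value frequency, and a different column may be a better choice.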