Optimizing analytics with dedicated SQL pool and working on data distribution
In this section, we look in depth at how dedicated SQL pool optimizes analytics on larger datasets. We cover the basics of columnstore storage; when to use round-robin, hash, and replicated data distributions; when to partition a table; how to check for data skew and space usage; and best practices, including how to use workload management effectively for dedicated SQL pool.
Understanding columnstore storage details
A columnstore is logically organized as a table with rows and columns, but it is physically stored in a column-wise data format. The columnstore index slices the table into rowgroups (groups of rows) and then compresses each rowgroup column by column. A rowgroup holds up to 1,048,576 rows, which is the maximum number of rows per rowgroup.
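The slicing step can be pictured with a small sketch. The Python below is only an illustration of the idea, not how the engine is implemented: the function names are hypothetical, compression is not modeled, and the way the pool spreads rows across distributions is ignored. The only real value used is the limit of 1,048,576 rows per rowgroup.

```python
from typing import Dict, List

# Maximum number of rows a single columnstore rowgroup can hold.
MAX_ROWS_PER_ROWGROUP = 1_048_576

Row = Dict[str, object]


def slice_into_rowgroups(rows: List[Row],
                         max_rows: int = MAX_ROWS_PER_ROWGROUP) -> List[List[Row]]:
    """Split a table into consecutive rowgroups of at most max_rows rows."""
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


def to_column_segments(rowgroup: List[Row]) -> Dict[str, List[object]]:
    """Pivot one rowgroup so that each column's values are stored together."""
    if not rowgroup:
        return {}
    return {col: [row[col] for row in rowgroup] for col in rowgroup[0]}


def rowgroup_count(total_rows: int,
                   max_rows: int = MAX_ROWS_PER_ROWGROUP) -> int:
    """Smallest number of rowgroups that can hold total_rows (ceiling division)."""
    return -(-total_rows // max_rows)


if __name__ == "__main__":
    # Hypothetical five-row table; a tiny max_rows keeps the output readable.
    table = [{"id": i, "region": "EU" if i % 2 else "US"} for i in range(5)]
    groups = slice_into_rowgroups(table, max_rows=2)
    print(len(groups))                    # number of rowgroups
    print(to_column_segments(groups[0]))  # first rowgroup, stored column-wise
    print(rowgroup_count(3_000_000))      # rowgroups needed at the real limit
```

Storing each column's values side by side is what makes the compression effective, because values within one column tend to be similar to each other.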
A clustered columnstore index is the primary storage for the entire...