Unique counts and Snowflake
Estimating the number of unique values across rows in a distributed system is compute-intensive. Snowflake uses a state-of-the-art distributed algorithm for this task, a technique that differs from those used in other databases. It is faster than an exact distinct count, but its result is approximate. This recipe shows you how to use the uniqueness functions available in Snowflake.
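The following Python sketch shows why an approximate distinct count parallelizes well. It is a minimal, textbook HyperLogLog written for illustration, not Snowflake's internal implementation. The class name, the precision parameter `p`, the SHA-1-based 64-bit hash, and the sample user IDs are assumptions made for this example. Each node stores only a small, fixed-size array of registers instead of every value it has seen. Two sketches merge by taking the register-wise maximum, so partial results from many nodes can be combined cheaply. The cost is an estimate with a relative standard error of roughly 1.04/√m, which is about 1.6% for the 4,096 registers used here.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative; not Snowflake's implementation)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                        # index bits (assumed precision)
        self.m = 1 << p                   # number of registers (4,096 for p = 12)
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128

    def add(self, value: object) -> None:
        # 64-bit hash taken from the first 8 bytes of a SHA-1 digest (assumption)
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                        # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)           # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1    # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> None:
        # Sketches built on different nodes combine via register-wise maximum
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting) when many registers are empty
            return self.m * math.log(self.m / zeros)
        return raw


# Two hypothetical "nodes" see overlapping slices of the same user IDs
node_a, node_b = HyperLogLog(), HyperLogLog()
for i in range(0, 60_000):
    node_a.add(f"user-{i}")
for i in range(40_000, 100_000):
    node_b.add(f"user-{i}")

node_a.merge(node_b)          # combine the partial sketches into one
estimated = node_a.estimate()
print(f"estimated distinct users: {estimated:.0f} (exact: 100000)")
```

Only the registers travel between nodes, not the raw values. As a result, the memory per group stays constant (here, 4,096 small integers) no matter how many distinct values arrive.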
Getting ready
Note that this recipe's steps can be run either in the Snowflake WebUI or in the SnowSQL command-line client. The recipe uses an extra-small warehouse.
How to do it…
As part of this recipe, we shall explore the different count functions and capabilities of Snowflake. We shall start with the typical count and distinct count functions and see how different combinations can yield different results. Then, we shall explore Snowflake's implementation of the HyperLogLog algorithm, which can efficiently approximate counts over groups. This is recommended for use cases where...
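Before running the real queries, the NULL-handling rules behind "different combinations can yield different results" can be modeled in plain Python. This sketch assumes Python's `None` stands in for SQL NULL. The sample data and the helper names (`count_star`, `count_col`, `count_distinct`) are hypothetical. The functions only model standard SQL semantics for `COUNT(*)`, `COUNT(col)`, and `COUNT(DISTINCT col)` and do not call Snowflake.

```python
from typing import Optional

# Hypothetical single-column sample; None plays the role of SQL NULL
rows: list[Optional[str]] = ["a", "b", "b", None, "c", None, "a"]


def count_star(values: list[Optional[str]]) -> int:
    # COUNT(*) counts every row, whether or not the column is NULL
    return len(values)


def count_col(values: list[Optional[str]]) -> int:
    # COUNT(col) skips rows where the column is NULL
    return sum(1 for v in values if v is not None)


def count_distinct(values: list[Optional[str]]) -> int:
    # COUNT(DISTINCT col) skips NULLs and counts each remaining value once
    return len({v for v in values if v is not None})


print(count_star(rows), count_col(rows), count_distinct(rows))  # 7 5 3
```

The same seven rows produce three different answers, depending on whether NULLs and duplicates are counted.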