SQL Server 2019 big data cluster components
Now that you understand big data, how it is processed, its architecture and technologies, and how SQL Server can use those technologies, it's time to put it all together.
All of the components for SQL Server 2019 big data clusters run on a Kubernetes cluster, which you've already learned about. Various nodes on the cluster provide the capabilities you need to run your data workloads, from SQL Server itself to control systems, storage, Spark, and HDFS. This architecture increases the number of external data sources that you can query, scales out your query processing, provides push-down compute to Parquet and CSV files, and even allows you to mount other storage.
Once again, a diagram is useful for understanding how each part fits together:
Figure 9.5: SQL Server 2019 big data cluster components
Note
You can specify the number of Kubernetes pods/nodes for many of these components, so the preceding...