Explaining the SQL Server Big Data Clusters architecture and deployment
SQL Server Big Data Clusters (BDC) is a feature of SQL Server 2019 that combines three distinct services: SQL Server, Apache Spark, and the Hadoop Distributed File System (HDFS), which stores the data. All three components run side by side in a Kubernetes environment, which lets you process and analyze big data as well as combine a relational workload with a big data workload.
BDC relies heavily on numerous open source technologies, which are used together to deploy, maintain, and monitor the solution.
A BDC deployment is based on a full installation of SQL Server 2019 running in a container built from a Linux OS image and orchestrated by Kubernetes. You can use various Kubernetes environments, such as the following:
- Azure Kubernetes Service (AKS)
- Azure Red Hat OpenShift (ARO) ...