Working with a SQL Server Big Data Clusters workload
While working with BDC, you can combine two types of data: relational data stored in databases hosted by SQL Server, and data stored in HDFS, which is hosted by the data nodes.
The BDC team has provided a sample script that will load data into your BDC deployment, both for your SQL Server workload and HDFS. You can use this script to populate your environment with usable sample data for experiments.
One way to combine the two is to query the data stored on the data nodes directly through an external table, as shown in the Using PolyBase to access external data section. The major difference here is that the external data source can be hosted on the storage pool. To configure such a data source, use the following query:
CREATE EXTERNAL DATA SOURCE SqlStoragePool WITH (LOCATION = 'sqlhdfs://controller-svc/default');
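If the statement may be executed more than once, for example as part of a setup script, it can be guarded so that it does not fail when the data source already exists. The following is a minimal sketch, assuming the same SqlStoragePool name and location used above and relying on the sys.external_data_sources catalog view; adjust the name if your environment uses a different one.

```sql
-- Create the storage pool data source only if it is not already defined
-- in the current database (the name SqlStoragePool is taken from the query above).
IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
BEGIN
    CREATE EXTERNAL DATA SOURCE SqlStoragePool
    WITH (LOCATION = 'sqlhdfs://controller-svc/default');
END;

-- Confirm that the data source is registered and check where it points.
SELECT name, location, type_desc
FROM sys.external_data_sources
WHERE name = 'SqlStoragePool';
```

External data sources are defined per database, so run the sketch in the database where you plan to create the external tables.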
Considering we have a CSV file stored...