Creating a parallel scoring pipeline
Standard ML pipelines work just fine for the majority of ML use cases, but when you need to score a large amount of data at once, you need a more powerful solution. That's where ParallelRunStep
comes in. ParallelRunStep
is Azure's answer to scoring big data in batch. When you use ParallelRunStep
, you leverage all of the cores on your compute cluster simultaneously.
Say you have a compute cluster consisting of eight Standard_DS3_v2
virtual machines. Each Standard_DS3_v2
node has four cores, so you can perform 32 parallel scoring processes at once. This parallelization essentially lets you score data many times faster than if you used a single machine. Furthermore, it can easily scale vertically (increasing the size of each virtual machine in the cluster) and horizontally (increasing the node count).
This section will allow you to become a big data scientist who can score large batches of data. Here, you will again be using simulated...