Exploring parallel processing with ForEach-Object
Situations often arise where you want to run many commands in parallel. For example, you might have a list of computer names and want to run a script on each of those computers, perhaps to verify the status and resource usage of various services. In this scenario, you might use Get-Content to get an array of computer names and then use either ForEach or ForEach-Object to run the script on each computer. If there are 10 computers and the script takes 10 minutes per computer, running it serially takes at least 100 minutes.
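As a rough sketch of that serial pattern, the following uses three hypothetical computer names and a hypothetical $CheckScript that stands in for the real per-computer script. In practice, you might read the names from a text file with Get-Content. Each iteration must finish before the next starts, so the total runtime is the sum of the individual runtimes.

```powershell
# Hypothetical computer names; in practice, you might read them
# from a text file with Get-Content
$Computers = 'SRV1', 'SRV2', 'DC1'

# Hypothetical stand-in for the real per-computer script:
# it sleeps briefly to simulate work and returns a status string
$CheckScript = {
  param($ComputerName)
  Start-Sleep -Milliseconds 500
  "Checked $ComputerName"
}

# Serial processing: each computer is handled only after the previous one finishes
$Results = $Computers | ForEach-Object { & $CheckScript $_ }
$Results
```

In a real deployment, you would typically run the script remotely, for example with Invoke-Command -ComputerName, which requires PowerShell remoting to be configured.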
With Windows PowerShell, the only built-in methods of running scripts in parallel were background jobs and workflows. With background jobs, you create a set of jobs, each of which starts a script on a single computer. PowerShell runs each job in a separate process, which isolates the jobs from one another but is resource-intensive. The Windows PowerShell team added workflows in Windows PowerShell 3.0; workflows also allow you to run script blocks in parallel. However, workflows were not carried forward into PowerShell 7. As with other features no longer available in PowerShell 7, you can continue to use Windows PowerShell to run workflows and convert them gradually as appropriate.
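A minimal sketch of the background-job approach, using the same hypothetical computer names and a one-second sleep to simulate the per-computer work, starts one job per computer, waits for them all, and then collects and removes them. Start-Job, Wait-Job, Receive-Job, and Remove-Job work in both Windows PowerShell and PowerShell 7.

```powershell
# Hypothetical computer names, as in the serial example
$Computers = 'SRV1', 'SRV2', 'DC1'

# Start one background job per computer. Each job runs in its own
# PowerShell process, so the jobs run in parallel but cost more resources.
$Jobs = foreach ($Computer in $Computers) {
  Start-Job -ScriptBlock {
    param($ComputerName)
    Start-Sleep -Seconds 1      # simulate the per-computer work
    "Checked $ComputerName"
  } -ArgumentList $Computer
}

# Wait for all the jobs, collect their output, then clean up
$Results = $Jobs | Wait-Job | Receive-Job
$Jobs | Remove-Job
$Results
```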
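An alternative to background jobs is the ThreadJob module, which you can download from the PowerShell Gallery. Thread jobs run on separate threads within the current process rather than in separate processes, so they are lighter-weight than background jobs. For more details on this module, see its repository page at https://github.com/PaulHigin/PSThreadJob.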
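The thread-job equivalent of the previous pattern looks almost identical; only the cmdlet that starts each job changes. This sketch assumes Start-ThreadJob is available in your session (if it is not, you can install the module with Install-Module -Name ThreadJob) and again uses hypothetical computer names.

```powershell
# Assumes Start-ThreadJob (from the ThreadJob module) is available
$Computers = 'SRV1', 'SRV2', 'DC1'   # hypothetical computer names

# Thread jobs run on threads inside the current process, so they start
# faster and use fewer resources than jobs created with Start-Job
$Jobs = foreach ($Computer in $Computers) {
  Start-ThreadJob -ScriptBlock {
    param($ComputerName)
    Start-Sleep -Seconds 1            # simulate the per-computer work
    "Checked $ComputerName"
  } -ArgumentList $Computer
}

# The same job cmdlets manage thread jobs
$Results = $Jobs | Wait-Job | Receive-Job
$Jobs | Remove-Job
$Results
```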
With PowerShell 7, the PowerShell team added a -Parallel parameter to the ForEach-Object command that allows you to run script blocks in parallel. This parameter simplifies running script blocks or scripts, especially long-running ones, in parallel and avoids the need for additional modules or the complexity of workflows.
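Here is a minimal sketch of the same per-computer task with ForEach-Object -Parallel, again with hypothetical computer names. One detail this sketch relies on, which the recipe steps do not need, is the $using: scope modifier: each parallel script block runs in its own runspace, so it cannot see variables from the calling scope directly.

```powershell
# Requires PowerShell 7.0 or later (ForEach-Object -Parallel)
$Computers = 'SRV1', 'SRV2', 'DC1'   # hypothetical computer names
$Prefix    = 'Checked'               # a variable defined in the calling scope

# Each iteration runs on its own thread in a separate runspace, so it
# cannot see $Prefix directly; the $using: scope modifier passes its value in
$Results = $Computers | ForEach-Object -Parallel {
  $P = $using:Prefix
  Start-Sleep -Seconds 1              # simulate the per-computer work
  "$P $_"
} -ThrottleLimit 5

# Parallel output can arrive in any order, so sort it before displaying
$Results | Sort-Object
```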
This recipe demonstrates running operations serially, using background jobs, and using ForEach-Object -Parallel.
Getting ready
You run this recipe on SRV1
after you have installed PowerShell 7 and, optionally, VS Code.
How to do it...
- Simulating a long-running script block
$SB1 = {
  1..3 | ForEach-Object {
    "In iteration $_"
    Start-Sleep -Seconds 5
  }
}
Invoke-Command -ScriptBlock $SB1
- Timing the expression
Measure-Command -Expression $SB1
- Refactoring into code that uses jobs
$SB2 = {
  1..3 | ForEach-Object {
    Start-Job -ScriptBlock {
      param($X)
      "Iteration $X "
      Start-Sleep -Seconds 5
    } -ArgumentList $_
  }
  Get-Job | Wait-Job | Receive-Job -Keep
}
- Invoking the script block
Invoke-Command -ScriptBlock $SB2
- Removing any old jobs and timing the script block
Get-Job | Remove-Job
Measure-Command -Expression $SB2
- Defining a script block using ForEach-Object -Parallel
$SB3 = {
  1..3 | ForEach-Object -Parallel {
    "In iteration $_"
    Start-Sleep -Seconds 5
  }
}
- Executing the script block
Invoke-Command -ScriptBlock $SB3
- Measuring the script block execution time
Measure-Command -Expression $SB3
- Creating and running two short script blocks
$SB4 = {
  1..3 | ForEach-Object { "In iteration $_" }
}
Invoke-Command -ScriptBlock $SB4
$SB5 = {
  1..3 | ForEach-Object -Parallel { "In iteration $_" }
}
Invoke-Command -ScriptBlock $SB5
- Measuring the execution time for both script blocks
Measure-Command -Expression $SB4
Measure-Command -Expression $SB5
How it works...
In step 1, you create and then invoke a script block that simulates running a long-running task several times, one after the other, using the ForEach-Object cmdlet, with output like this:
Figure 2.16: Simulating a long-running script block
In step 2, you determine how long it takes PowerShell to run this script block, with output like this:
Figure 2.17: Timing the expression
In step 3, you refactor the $SB1 script block to use PowerShell background jobs. The new script block starts one job per iteration, each running the simulated long-running task, and then waits for all the jobs and displays their output. The idea is that instead of running each iteration serially, the jobs all run in parallel. Defining the script block creates no output.
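Because $SB2 collects output with Get-Job, it picks up every job in the current session, which is why step 5 first removes the old jobs. A minimal alternative sketch, using a hypothetical script block named $SB2a, keeps track of the jobs it starts and removes them when it finishes, so repeated runs do not interfere with one another.

```powershell
# Hypothetical variant of $SB2 that manages only its own jobs
$SB2a = {
  # Capture the job objects this script block starts
  $Jobs = 1..3 | ForEach-Object {
    Start-Job -ScriptBlock {
      param($X)
      "In iteration $X"
      Start-Sleep -Seconds 5
    } -ArgumentList $_
  }
  # Wait for, and receive output from, only these jobs
  $Jobs | Wait-Job | Receive-Job
  # Remove the jobs so they do not accumulate in the session
  $Jobs | Remove-Job
}
$Output = Invoke-Command -ScriptBlock $SB2a
$Output
```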
In step 4, you invoke the script block to view the results, which look like this:
Figure 2.18: Invoking the script block
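In step 5, you remove any existing jobs and then re-run the script block, this time timing the entire expression. You remove the old jobs first because $SB2 uses Get-Job, which returns every job in the current session; without this cleanup, the script block would also wait for and receive the jobs left over from step 4. The output of this step looks like this: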
Figure 2.19: Removing any existing jobs and timing the script block
In step 6, you create another script block that uses the PowerShell 7 ForEach-Object -Parallel construct. When you define this script block, PowerShell creates no output.
In step 7, you run the script block, with output like this:
Figure 2.20: Executing the script block created in step 6
In step 8, you time the execution of the script block that uses the ForEach-Object -Parallel feature, with output like this:
Figure 2.21: Timing the script block execution
In step 9, you define and then invoke two short script blocks, with output like this:
Figure 2.22: Creating and running two short script blocks
In the final step in this recipe, step 10, you measure the execution time of these two script blocks, with output like this:
Figure 2.23: Measuring execution time of the script blocks created in step 9
There's more...
In steps 1 and 2, you invoke a long-running task multiple times. As you can see from Figure 2.17, running the three iterations one at a time takes just over 15 seconds. In step 5, you see that by refactoring the long-running task into PowerShell background jobs, you reduce the runtime to 6.83 seconds; much of the extra time beyond 5 seconds comes from starting a new process for each job. Finally, in step 8, you measure the elapsed runtime when you use ForEach-Object -Parallel, which is now a little over 5 seconds.
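To compare the three approaches side by side, you can pull the TotalSeconds property out of each Measure-Command result. This sketch recreates simplified versions of $SB1, $SB2, and $SB3 under hypothetical names, shortening each sleep to 2 seconds so the comparison runs quickly, and uses the job-tracking form of the job-based version so no stale jobs are involved.

```powershell
# Simplified versions of $SB1, $SB2, and $SB3, using 2-second sleeps
$SerialSB = {
  1..3 | ForEach-Object { "In iteration $_"; Start-Sleep -Seconds 2 }
}
$JobsSB = {
  $Jobs = 1..3 | ForEach-Object {
    Start-Job -ScriptBlock { param($X) "In iteration $X"; Start-Sleep -Seconds 2 } -ArgumentList $_
  }
  $Jobs | Wait-Job | Receive-Job
  $Jobs | Remove-Job
}
$ParallelSB = {
  1..3 | ForEach-Object -Parallel { "In iteration $_"; Start-Sleep -Seconds 2 }
}

# Measure-Command discards the script block's output and returns a
# TimeSpan; TotalSeconds expresses the elapsed time as a double
$Timings = [ordered]@{
  Serial   = (Measure-Command -Expression $SerialSB).TotalSeconds
  Jobs     = (Measure-Command -Expression $JobsSB).TotalSeconds
  Parallel = (Measure-Command -Expression $ParallelSB).TotalSeconds
}
$Timings.GetEnumerator() | ForEach-Object { '{0,-8} {1:N2} s' -f $_.Key, $_.Value }
```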
As this recipe shows, if you have independent script blocks, you can run them in parallel to reduce the overall runtime, in this case from just over 15 seconds to just over 5. The gains grow as you run the loop more times. Running the loop serially 10 times would take over 50 seconds. With ForEach-Object -Parallel and the default throttle limit discussed below, the 10 iterations would run in two waves of five and take just over 10 seconds; raising the throttle limit to 10 would bring that down to just over 5 seconds.
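The arithmetic above can be checked with a quick sketch that uses 1-second sleeps instead of 5-second ones: ten iterations should take about 10 seconds serially, about 2 seconds with the default throttle limit of five (two waves), and about 1 second with -ThrottleLimit 10, plus some startup overhead in each parallel case.

```powershell
# Requires PowerShell 7.0 or later
# Ten 1-second iterations with the default throttle limit of five
$Default = Measure-Command -Expression {
  1..10 | ForEach-Object -Parallel { Start-Sleep -Seconds 1 }
}
# The same ten iterations, all allowed to run at once
$Ten = Measure-Command -Expression {
  1..10 | ForEach-Object -Parallel { Start-Sleep -Seconds 1 } -ThrottleLimit 10
}
'{0:N1} seconds with the default limit' -f $Default.TotalSeconds
'{0:N1} seconds with -ThrottleLimit 10' -f $Ten.TotalSeconds
```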
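However, by default, ForEach-Object -Parallel runs at most five script blocks simultaneously; any additional pipeline input waits until a running script block finishes. You can use the -ThrottleLimit parameter to raise or lower that limit. Keep in mind that if you run more CPU-bound script blocks at once than you have processor cores, the threads compete for the available cores, so you gain no further speedup, and the extra threads add overhead that can raise the overall runtime. Script blocks that spend most of their time waiting, such as those that sleep or wait on remote computers, are much less affected. The good news is that PowerShell and the operating system handle this scheduling for you: if you pipe, say, 1,000 items to ForEach-Object -Parallel on a system with 12 processor cores, PowerShell queues the items, runs at most -ThrottleLimit script blocks at a time, and completes the work as fast as your host computer allows.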
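For CPU-bound work, one reasonable starting point (a rule of thumb, not a PowerShell requirement) is to set -ThrottleLimit to the number of logical processors reported by [Environment]::ProcessorCount. This sketch uses a hypothetical CPU-bound workload that sums the integers from 1 to 1,000,000 in each of eight script blocks.

```powershell
# Requires PowerShell 7.0 or later
$Cores = [Environment]::ProcessorCount   # logical processors on this host

# Hypothetical CPU-bound workload: each item sums the integers 1..1,000,000
$Sums = 1..8 | ForEach-Object -Parallel {
  [long] $Total = 0
  for ($N = 1; $N -le 1000000; $N++) { $Total += $N }
  $Total
} -ThrottleLimit $Cores

"Ran $($Sums.Count) CPU-bound script blocks with -ThrottleLimit $Cores"
```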
It is also worth remembering that ForEach-Object -Parallel carries some overhead. Under the hood, the command has to create and manage separate threads of execution, each with its own runspace. If the script block is very short, that overhead can make the parallel version slower than the serial one. In step 10, for example, the runtime went from 2.9 ms for the serial script block ($SB4) to 83.7 ms for the parallel one ($SB5). The critical point is that this construct pays off for non-trivial script blocks (or scripts) that you run in parallel, and for CPU-bound work, you benefit only up to the number of processor cores available.
Another thing to note is that when you use the ForEach-Object {script} syntax, you are using a positional parameter (-Process). By contrast, -Parallel is a named parameter, so you must specify it explicitly; its value is the script block you want PowerShell to run in parallel.
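The difference in parameter binding can be seen in a short sketch: the first two commands are equivalent because the script block binds to -Process by position, while the third must name -Parallel. Because parallel output can arrive in any order, the sketch sorts it before displaying it.

```powershell
# Requires PowerShell 7.0 or later
# Equivalent: the script block binds to -Process by position
$A = 1..3 | ForEach-Object { $_ * 2 }
$B = 1..3 | ForEach-Object -Process { $_ * 2 }

# -Parallel must be named; its value is the script block to run on separate threads
$C = 1..3 | ForEach-Object -Parallel { $_ * 2 }

# Sort the parallel output, since its order is not guaranteed
"$A | $B | $($C | Sort-Object)"
```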