Pipelines
PowerShell expressions involving cmdlets and functions can be connected together using pipelines. Pipelines are not a new concept, and have been in DOS for a long time (and in Unix/Linux forever). The idea of a pipeline is similar to a conveyor belt in a factory. Materials on a conveyor belt move from one station to the next as workers or machinery work on the materials to connect, construct, or somehow modify the work in progress. In the same way, pipelines allow data to move from one command to the next, as the output of one command is treated as the input for the next. There is no practical limit to the number of commands that can be connected this way, but readability does keep command lines from continuing forever. It can be tempting to string more and more expressions together to create a single-line solution, but troubleshooting a pipeline evaluation can be tricky.
Tip
When working with long pipeline constructions, consider breaking the line into several expressions to make the execution clearer.
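As a minimal sketch of this tip, the same work can be written as one long pipeline or as a few named steps. The numbers and variable names are made up for illustration, and the example relies on Where-Object and Sort-Object, which are discussed later in this section.

```powershell
# One long pipeline: keep the even numbers, square them, sort largest first.
$oneLine = 1..10 | Where-Object { $_ % 2 -eq 0 } | ForEach-Object { $_ * $_ } | Sort-Object -Descending

# The same work split into named steps, each of which can be inspected on its own.
$evens   = 1..10 | Where-Object { $_ % 2 -eq 0 }
$squares = $evens | ForEach-Object { $_ * $_ }
$stepped = $squares | Sort-Object -Descending

$stepped   # 100, 64, 36, 16, 4
```

When a result looks wrong, the intermediate variables ($evens, $squares) can be examined one at a time, which is harder to do with the single-line version.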
Prior to PowerShell, pipelines dealt with output and input in terms of text, passing strings from one program to the next regardless of what kind of information was being processed. PowerShell makes a major change to this paradigm by treating all input and output as objects. By doing this, PowerShell cmdlets are able to work with the properties, methods, and events that are exposed by the data rather than simply dealing with the string representation. The PowerShell community often refers to the methods used by string-based pipelines as parse-and-pray, which is named after the twin operations of string parsing based on an understanding of the text format and hoping that the format of the output doesn't ever change. An example, shown in the following screenshot, illustrates this quite well:
It's easy to think of the output of the MS-DOS dir command as a sequence of files and folders, but if the output is carefully studied, something different becomes clear. There is a tremendous amount of other information provided:
- Volume information
- Volume serial number
- A directory-level caption
- A list of files and folders
- A count of files
- The total size of those files
- The number of directories
- The space free on the drive
To work with this output and extract, for instance, just the file names, a great deal of work would be needed to analyze the formatting of all of these elements. In addition, several formatting parameters of the MS-DOS dir command change the layout of the output. By passing data between cmdlets as objects, all of this work is eliminated. The PowerShell Get-ChildItem cmdlet, which is similar to the MS-DOS dir command, outputs a sequence of file and directory objects if the current location is a filesystem location.
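The following sketch illustrates the point. It creates a scratch folder (the folder and the notes.txt file name are hypothetical, chosen only so the listing is predictable) and shows that what Get-ChildItem returns are objects whose properties can be read directly.

```powershell
# Build a scratch folder with one file so the listing is predictable.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $folder | Out-Null
Set-Content -Path (Join-Path $folder 'notes.txt') -Value 'hello'

# Each item is an object (System.IO.FileInfo for a file), not a line of text.
$items = @(Get-ChildItem -Path $folder)
$items | ForEach-Object { $_.GetType().FullName }

# Properties are read directly; no text parsing is involved.
$items | Select-Object -Property Name, Length, LastWriteTime

# Clean up the scratch folder.
Remove-Item -Path $folder -Recurse -Force
```

None of the volume, caption, or summary lines from the dir output appear here; only the file and directory objects travel down the pipeline.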
How pipelines change the game
To see how the choice of an object-oriented pipeline changes the way work is done, it is sufficient to look at the MS-DOS dir command. I am picking on the dir command because it has a simple function and everyone in IT has some level of experience with it. If you wanted to sort the output of a dir command, you would need to know what the parameters built into the command are. To do that, you'd do something like this:
It's clear that the designer of the command had sorting in mind, because there is a /O option with five different ways to sort (ten if you include reverse). That is helpful, but files have a lot more than five properties. What if you wanted to sort by more than one property? Ignoring those questions for a moment, does the collection of sorting options for this command help you at all if you were trying to sort the output of a different command (say an ATTRIB or SET command)? You might hope that the same developer wrote the code for the second command, or that they used the same specifications, but you would be disappointed. Even the simple operation of sorting output is either not implemented or implemented differently by MS-DOS commands.
PowerShell takes an entirely different approach. If you were to look at the help for Get-ChildItem, you would find no provision for sorting at all. In fact, PowerShell cmdlets do not use parameters to supply sorting information. Instead, they use the object-oriented pipeline. MS-DOS developers needed to encode the sort parameters for the dir command inside the dir command itself because that is the only place where the properties (including the sorting criteria) exist. Once the command has been executed, all that is left is text, and sorting text based on properties is a complex parse-and-pray operation (which we have already discussed). In PowerShell, however, the output of Get-ChildItem is a sequence of objects, so cmdlets downstream can still access the properties of the objects directly. Sorting in PowerShell is accomplished with the Sort-Object cmdlet, which is able to take a list of properties (among other things) on which to sort the sequence of objects that it receives as input. The following are some examples of sorting a directory listing in MS-DOS and also in PowerShell:
Sorting method | DOS command | PowerShell equivalent |
---|---|---|
Sort by filename | | |
Sort by extension | | |
Sort by size | | |
Sort by write date | | |
Sort by creation date | | |
Sort by name and size | | |
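As a sketch of how the sorts named in the table might be written, the commands below pair each dir switch (shown as a comment) with a Sort-Object call. The property names (Name, Extension, Length, LastWriteTime, CreationTime) are those exposed by filesystem objects; treating them as the intended equivalents is an assumption. Note that a file's size is exposed through the Length property.

```powershell
# Sort by filename            (cmd.exe: dir /O:N)
Get-ChildItem | Sort-Object -Property Name

# Sort by extension           (cmd.exe: dir /O:E)
Get-ChildItem | Sort-Object -Property Extension

# Sort by size                (cmd.exe: dir /O:S)
Get-ChildItem | Sort-Object -Property Length

# Sort by write date          (cmd.exe: dir /O:D)
Get-ChildItem | Sort-Object -Property LastWriteTime

# Sort by creation date       (cmd.exe: dir /T:C /O:D)
Get-ChildItem | Sort-Object -Property CreationTime

# Sort by name, then by size  (cmd.exe: dir /O:NS)
Get-ChildItem | Sort-Object -Property Name, Length
```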
It can be clearly seen by these examples that:
- PowerShell examples are longer
- PowerShell examples are easier to read (at least the sorting options)
- PowerShell techniques are more flexible
The most important thing about learning how to sort directory entries using Sort-Object is that sorting any kind of object is done in exactly the same way. For instance, if you retrieved a list of applied hotfixes on the current computer using Get-HotFix, you would sort it by HotFixID by issuing the Get-HotFix | Sort-Object -Property HotFixID command:
Another point to note about sorting objects by referring to properties is that the sorting is done according to the type of the property. For instance, sorting objects by a numeric property would order the objects by the magnitude of the property values, not by the string representation of those values. That is, a value of 10 would sort after 9, not between 1 and 2. This is just one more thing that you don't have to worry about.
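A small sketch of type-aware sorting, using made-up objects that carry the same numbers both as integers (Value) and as strings (Text); the property names are hypothetical:

```powershell
# Four sample objects; Value holds an integer, Text holds the same number as a string.
$items = 9, 10, 1, 2 | ForEach-Object { [pscustomobject]@{ Value = $_; Text = "$_" } }

# Sorting on the integer property orders by magnitude.
$byValue = $items | Sort-Object -Property Value | ForEach-Object { $_.Value }

# Sorting on the string property orders character by character.
$byText = $items | Sort-Object -Property Text | ForEach-Object { $_.Text }

$byValue -join ', '   # 1, 2, 9, 10
$byText -join ', '    # 1, 10, 2, 9
```

The second result is what a text-based pipeline would produce unless the numbers were parsed first.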
What's the fuss about sorting?
You might be asking why sorting is such a big deal. You'd be right to ask; sorting by itself is not a tremendously important concept. The point is that the approach the designers of PowerShell took with the pipeline (that is, using objects rather than strings) is what makes this sorting method possible, and it also allows other powerful operations such as filtering, aggregating, summarizing, and narrowing.
Filtering is the operation of selecting which (entire) objects in the pipeline will continue in the pipeline. Think of filtering like a worker who is inspecting objects on the conveyor belt, picking up objects that are bad and throwing them away (in the bit bucket). In the same way, objects that do not match the filter criteria are discarded and do not continue as output. Filtering in PowerShell is done via the Where-Object cmdlet and takes two forms. The first form is somewhat complicated to look at, and requires some explaining. We will start with an example such as the following:
Get-ChildItem | Where-Object {$_.Length -lt 100}
Hopefully, even without an explanation, it is clear that the output would be a list of files that have a size of less than 100 bytes (file objects expose their size through the Length property). This form of the Where-Object cmdlet takes a piece of code as a parameter (called a scriptblock), which is evaluated for each object in the pipeline. Each object in turn is assigned to the special variable $_, and if the scriptblock evaluates to true, the object will continue on the pipeline. If it evaluates to false, the object is discarded.
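To make the per-object evaluation concrete, here is a sketch that filters a few hand-built objects instead of a real directory. The file names and sizes are invented, and the objects use a Length property to mirror what file objects expose.

```powershell
# Hypothetical stand-ins for file objects.
$files = @(
    [pscustomobject]@{ Name = 'a.txt'; Length = 40 }
    [pscustomobject]@{ Name = 'b.log'; Length = 75 }
    [pscustomobject]@{ Name = 'c.txt'; Length = 2048 }
)

# The scriptblock runs once per object, with $_ bound to that object.
$small = $files | Where-Object { $_.Length -lt 100 }

# Any expression that yields true or false will do, including compound tests.
$smallText = $files | Where-Object { $_.Length -lt 100 -and $_.Name -like '*.txt' }

$small.Name       # a.txt, b.log
$smallText.Name   # a.txt
```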
PowerShell 3.0 made a couple of changes that affect the Where-Object cmdlet. First, it added an easier-to-read alternative name for the $_ variable, called $PSItem. Using that, the previous command can be rewritten as follows:
Get-ChildItem | Where-Object {$PSItem.Length -lt 100}
This is slightly more readable, but Version 3.0 also added a second form that simplifies it even more. If the scriptblock refers to a single property, a single operator, and a constant value, the simplified syntax can be used, shown as follows:
Where-Object Property Operator Value
Note that there are no braces indicating a scriptblock, and no $_ or $PSItem. The simplified syntax for our sample filter command is this:
Get-ChildItem | Where-Object Length -lt 100
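As a quick check that the three spellings behave the same way, the sketch below runs each of them over the same made-up objects (the names and sizes are hypothetical, and the example requires PowerShell 3.0 or later).

```powershell
# Hypothetical stand-ins for file objects with sizes of 40, 75, and 2048 bytes.
$files = 40, 75, 2048 | ForEach-Object { [pscustomobject]@{ Name = "file$($_).txt"; Length = $_ } }

$withUnderscore = $files | Where-Object { $_.Length -lt 100 }        # scriptblock with $_
$withPSItem     = $files | Where-Object { $PSItem.Length -lt 100 }   # scriptblock with $PSItem
$simplified     = $files | Where-Object Length -lt 100               # simplified syntax

$simplified.Name   # file40.txt, file75.txt
```

The simplified syntax covers only a single comparison; a filter that combines conditions with -and or -or still needs the scriptblock form.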