Shell Wildcards and Globbing
In the preceding exercises and activities, you may have noticed that we often perform the same operation on multiple files or folders. The point of a computer is to never have to manually instruct it to do something more than once. If we perform any repeated action using a computer, there is usually some way to automate it and reduce the drudgery. In the shell, too, we need an abstraction that lets us handle a group of files together. This abstraction is called a wildcard.
The term wildcard originates from card games where a certain card can substitute for whatever card the player wishes. Before the shell executes a command, it performs an operation called wildcard expansion, or globbing, on each of the strings that make up the command line. Globbing replaces a wildcard expression with all the file names or path names that match it.
Note
Wildcard expansion is not performed on strings enclosed in single or double quotes. Quoted arguments will be discussed in detail in a future chapter.
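The difference can be seen in a minimal sketch like the following. It assumes a throwaway directory created with mktemp, and the file names are made up for illustration:

```shell
# Work in a scratch directory so that no real files are involved.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch cat bat rat

unquoted=$(echo ?at)   # the shell expands the pattern before echo runs
single=$(echo '?at')   # single quotes: echo receives the literal string ?at
double=$(echo "?at")   # double quotes suppress the expansion as well

echo "unquoted: $unquoted"
echo "single:   $single"
echo "double:   $double"
```

Only the unquoted form is replaced by the matching file names; both quoted forms reach echo unchanged.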
Wildcard Syntax and Semantics
A wildcard is any string that contains any of the following special characters:
- A ? matches one occurrence of any character. For example, ?at matches cat, bat, and rat, and every other three-letter string that ends with "at".
- A * matches zero or more occurrences of any character. For example, image.* matches image.png, image.jpg, image.bmp.zip, and so on.
- A ! followed by a pair of parentheses containing another wildcard expands to strings that do not match the contained expression.
Note
The exclamation operator is an "extended glob" syntax and may not be enabled by default on your system. To enable it, the following command needs to be executed:
shopt -s extglob
There are a few more advanced shell glob expressions, but we will restrict ourselves to these most commonly used ones for now.
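The three forms described above can be tried safely with a sketch like this one. It assumes bash (shopt and extglob are bash features) and uses a scratch directory with made-up file names:

```shell
shopt -s extglob      # bash-specific; needed for the !( ) form below
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch cat bat rat at image.png image.jpg image.bmp.zip

one_char=$(echo ?at)           # exactly one character before "at"
any_ext=$(echo image.*)        # anything, including nothing, after "image."
not_image=$(echo !(image.*))   # every name that does not match image.*

echo "$one_char"
echo "$any_ext"
echo "$not_image"
```

Note that the file named at is not matched by ?at, because ? must match exactly one character.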
Wildcard Expansion or Globbing
When the shell encounters a wildcard expression on the command line, it is internally expanded to all the files or pathnames that match it. This process is called globbing. Even though it looks as though one wildcard argument is present, the shell has converted that into multiple ones before the command runs.
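One way to observe this is to let the shell count the arguments it produces. The sketch below is only an illustration with hypothetical file names; it uses set -- to load the expanded words into the positional parameters:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a.txt b.txt c.txt

# set -- replaces the positional parameters with the expanded words,
# so $# reports how many arguments a command would actually receive.
set -- *.txt
arg_count=$#
first_arg=$1

echo "argument count: $arg_count"
echo "first argument: $first_arg"
```

Although only one pattern was typed, a command in place of set would have received three separate arguments.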
Note that a wildcard can match paths across the whole filesystem:
- * matches all the directories and files in the current directory (except hidden ones, whose names begin with a dot)
- /* matches everything in the root directory
- /*/* matches everything exactly two levels deep from the root directory
- /home/*/.bashrc matches the file named .bashrc in each user's home directory
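The same patterns can be tried without touching the real root directory. The following sketch builds a small stand-in tree under a scratch directory (the user names alice and bob are hypothetical) and uses relative patterns in place of the absolute ones:

```shell
root=$(mktemp -d)
mkdir -p "$root/home/alice" "$root/home/bob" "$root/etc"
touch "$root/home/alice/.bashrc" "$root/home/bob/.bashrc" "$root/etc/hosts"
cd "$root" || exit 1

top=$(echo *)                    # one level: the entries of the scratch root
two_deep=$(echo */*)             # exactly two levels below the scratch root
rc_files=$(echo home/*/.bashrc)  # the same file name inside every home directory

echo "$top"
echo "$two_deep"
echo "$rc_files"
```

Each * stands for one path component, which is why */* never reaches the .bashrc files three levels down, and why the hidden files show up only when the pattern names them explicitly.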
At this point, a warning is due: this powerful matching mechanism of wildcards can end up matching files that the user never intended if the wildcard was not specified correctly. Hence, you must exercise great care when running commands that use wildcards and modify or delete files. For safety, run echo with the glob expression to view what files it gets expanded to. Once we are sure that the wildcard is correct, we can run the actual command that affects the files.
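A minimal sketch of this preview-then-run habit, assuming a scratch directory and made-up file names:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch report.tmp notes.tmp report.txt

# Dry run: echo only prints what the pattern expands to.
preview=$(echo *.tmp)
echo "would remove: $preview"

# The list looks right, so run the real command with the same pattern.
rm -- *.tmp
remaining=$(echo *)
echo "remaining: $remaining"
```

The -- tells rm that everything after it is a file name, which guards against matched names that happen to begin with a dash.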
Note
Since the shell expands wildcards as individual arguments, we can run into a situation where the number of arguments exceeds the limit that the system supports. We should be aware of this limitation when using wildcards.
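On most systems the limit can be queried with getconf. The sketch below also shows one possible workaround, offered here as an assumption rather than as this chapter's method: a for loop hands the matched names to the command one at a time, so no single oversized command line is ever built.

```shell
# Upper bound, in bytes, on the arguments (plus environment) passed to a new program.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"

scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a.log b.log c.log keep.txt

# The loop expands the pattern inside the shell itself and starts rm
# once per file, so each rm receives only a single file name.
for f in *.log; do
    rm -- "$f"
done

left=$(echo *)
echo "left: $left"
```

The loop is slower than a single rm with a wildcard, so it is worth reaching for only when the plain form fails with an "Argument list too long" error.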
Let's dive into an exercise and see how we can use wildcards.
Exercise 8: Using Wildcards
In this exercise, we will practice the use of wildcards for file management by creating folders and moving files with specific file formats to those folders.
Note
Some of the commands used in this exercise produce many screenfuls of output, so we only show them partially or not at all.
- Open the command line shell and navigate to the ~/Lesson1/data1 folder:
robin ~ $ cd Lesson1/data1
There are over 11,000 files in this folder, all of which are empty dummy files, but their names come from a set of real-world files.
- Use a wildcard to list all the GIF files (*.gif matches every file name that ends with .gif):
robin ~/Lesson1/data1 $ ls *.gif
The output is shown here:
Figure 1.17: A screenshot of the output displaying a list of all GIF files within the folder
- Create a new folder named gif, and use the wildcard representing all GIF files to move all of them into that folder:
robin ~/Lesson1/data1 $ mkdir gif
robin ~/Lesson1/data1 $ mv *.gif gif
- Verify that there are no GIF files left in the CWD:
robin ~/Lesson1/data1 $ ls *.gif
ls: cannot access '*.gif': No such file or directory
- Verify that all of the GIFs are in the gif folder:
robin ~/Lesson1/data1 $ ls gif/
The output is shown here:
Figure 1.18: A screenshot of a partial output of the gif files within the folder
- Make a new folder called jpeg and use multiple wildcard arguments with mv to move all JPEG files into that folder:
robin ~/Lesson1/data1 $ mkdir jpeg
robin ~/Lesson1/data1 $ mv *.jpeg *.jpg jpeg
- Verify with ls that no JPEG files remain in the CWD:
robin ~/Lesson1/data1 $ ls *.jpeg *.jpg
ls: cannot access '*.jpeg': No such file or directory
ls: cannot access '*.jpg': No such file or directory
- List the jpeg folder to verify that all the JPEGs are in it:
robin ~/Lesson1/data1 $ ls jpeg
The output is shown here:
Figure 1.19: A screenshot of a partial output of the .jpeg files within the folder
- List all .so (shared object library) files that have only a single digit as the trailing version number:
robin ~/Lesson1/data1 $ ls *.so.?
The output is shown here:
Figure 1.20: A screenshot of a partial output of the .so files ending with a dot, followed by a one-character version number
- List all files that start with "google" and have an extension:
robin ~/Lesson1/data1 $ ls google*.*
google_analytics.png google_cloud_dataflow.png google_drive.png google_fusion_tables.png google_maps.png google.png
- List all files that start with "a", have the third character "c", and have an extension:
robin ~/Lesson1/data1 $ ls a?c*.*
archer.png archive_entry.h archive.h archlinux.png avcart.png
- List all of the files that do not have the .jpg extension:
robin ~/Lesson1/data1 $ ls !(*.jpg)
The output is shown here:
Figure 1.21: A screenshot of a partial output of the non-.jpg files in the folder
- Before we conclude this exercise, get the sample data back to how it was before in preparation for the next activity. First, move the files within the jpeg and gif folders back to the current directory:
robin ~/Lesson1/data1 $ mv gif/* .
robin ~/Lesson1/data1 $ mv jpeg/* .
Then, delete the empty folders:
robin ~/Lesson1/data1 $ rm -r gif jpeg
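The "cannot access '*.gif'" messages in the verification steps above arise because bash, by default, passes a pattern that matches nothing to the command unchanged, so ls looks for a file literally named *.gif. The following sketch, which assumes bash and a scratch directory, shows that default alongside the nullglob option, a bash setting that makes unmatched patterns expand to nothing instead:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch photo.png

# No .gif files exist, so bash leaves the pattern as the literal word *.gif.
default_result=$(echo *.gif)
echo "default:  [$default_result]"

# With nullglob set, an unmatched pattern is removed from the command line.
shopt -s nullglob
nullglob_result=$(echo *.gif)
echo "nullglob: [$nullglob_result]"
shopt -u nullglob   # restore the default behavior
```

The default is the reason an error message mentioning the raw pattern is a reliable sign that nothing matched.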
Now, having learned the basic syntax, we can write wildcards to match almost any group of files and paths, so we rarely ever need to specify filenames individually.
Even in a GUI, it takes more effort than this to select groups of files in a file manager (for example, all .gifs) and this can be error-prone or frustrating when hundreds or thousands of files are involved.
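To rehearse the flow of the exercise without touching the sample data, something like the following can be run in a scratch directory. The handful of file names here is made up and merely imitates the kinds of names in data1:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a.gif b.gif c.jpg d.jpeg libz.so.1 libz.so.12 google_maps.png

mkdir gif jpeg
mv *.gif gif            # one pattern, many files
mv *.jpeg *.jpg jpeg    # several patterns in a single command

gif_files=$(echo gif/*)
jpeg_files=$(echo jpeg/*)
single_digit=$(echo *.so.?)   # one character after the final dot
echo "$gif_files"
echo "$jpeg_files"
echo "$single_digit"

# Put everything back and remove the now-empty folders.
mv gif/* jpeg/* .
rmdir gif jpeg
restored=$(echo *.gif *.jpeg *.jpg)
echo "$restored"
```

Here rmdir is used in place of rm -r because it refuses to delete a directory that still contains files, which makes a mistake in the preceding mv easier to notice.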
Activity 4: Using Simple Wildcards
The supplied sample data in the Lesson1/data1 folder has about 11,000 empty files of various types. Use wildcards to move each file to a directory representing its category, namely images, binaries, and misc, and count how many files of each category exist. Through this activity, you will get familiar with using simple wildcards for file management. Follow these steps to complete this activity:
- Create the three directories representing the categories specified.
- Move all of the files with the extensions .jpg, .jpeg, .gif, and .png to the images folder.
- Move all of the files with the extensions .a, .so, and .so followed by a period and a version number, into the binaries folder.
- Move the remaining files with any extension into the misc folder.
- Count the files in each folder using a shell command.
You should get the following answers: 3,674 images, 5,368 binaries, and 1,665 misc.
Note
The solution for this activity can be found on page 273.
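For the counting step, one common approach is to pipe ls into wc -l. The sketch below is not the book's solution; it only illustrates the idea on a scratch directory with three made-up files:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir images
touch images/a.png images/b.jpg images/c.gif

# ls prints one name per line when its output goes to a pipe,
# and wc -l counts those lines.
image_count=$(ls images | wc -l)
echo "images: $image_count"
```

This count is accurate as long as no file name contains a newline character, which holds for ordinary file names.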
Activity 5: Using Directory Wildcards
The supplied sample data inside the Lesson1/data folder has a taxonomy of tree species. Use wildcards to get the count of the following:
- The species whose family starts with the character p, and whose genus has a as the second character.
- The species whose family starts with the character p, whose genus has i as the second character, and whose species has u as the second character.
- The species whose family as well as genus starts with the character t.
This activity will help you get familiar with using simple wildcards that match directories.
Follow these steps to complete this activity:
- Navigate to the data folder.
- Use the tree command with a wildcard for each of the three conditions to get the count of species.
You should get the following answers: 83 species, 26 species, and 19 species.
Note
The solution for this activity can be found on page 273.
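As a warm-up for directory wildcards, the following sketch applies a different condition from the three in the activity to a tiny made-up taxonomy. It assumes a layout of family/genus/species with one directory level each, which may not match the sample data exactly, and it counts matches with the positional parameters rather than with tree, since tree is not installed on every system:

```shell
data=$(mktemp -d)
cd "$data" || exit 1
# Hypothetical layout: family/genus/species, one path component each.
mkdir -p pinaceae/pinus pinaceae/larix taxaceae/taxus
touch pinaceae/pinus/strobus pinaceae/pinus/nigra
touch pinaceae/larix/decidua pinaceae/larix/kaempferi
touch taxaceae/taxus/baccata

# Any family, a genus whose third character is "r", any species.
set -- */??r*/*
species_count=$#
first_match=$1

echo "matching species: $species_count"
echo "first match: $first_match"
```

Each path component gets its own pattern, so a condition on the family, genus, or species translates into the first, second, or third part of the wildcard respectively.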