Command Line Fundamentals

You're reading from Command Line Fundamentals: Learn to use the Unix command-line tools and Bash shell scripting

Product type: Paperback
Published in: Dec 2018
ISBN-13: 9781789807769
Length: 314 pages
Edition: 1st Edition
Author: Vivek Nagarajan

Shell Wildcards and Globbing

In the preceding exercises and activities, notice that we often perform the same operation on multiple files or folders. The point of a computer is to never have to manually instruct it to do something more than once. If we perform any repeated action using a computer, there is usually some way that it can be automated to reduce the drudgery. Hence, in the context of the shell too, we need an abstraction that lets us handle a bunch of files together. This abstraction is called a wildcard.

The term wildcard originates from card games where a certain card can substitute for whatever card the player wishes. When any command is sent to the shell, before it is executed, the shell performs an operation called wildcard expansion or globbing on each of the strings that make up the command line. Globbing replaces a wildcard expression with all the file names or pathnames that match it.

Note

Wildcard expansion is not performed on strings enclosed in single or double quotes. Quoted arguments will be discussed in detail in a future chapter.
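The difference between an unquoted and a quoted pattern is easy to check for ourselves. The following is a minimal sketch that works in a scratch directory; the file names and the variable names (demo_dir, unquoted, quoted) are made up for illustration and are not part of the book's sample data.

```shell
# Work in a throwaway directory so the example touches no real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch cat.txt bat.txt

# Unquoted: the shell replaces the pattern with the matching names
# (in sorted order) before echo runs, so echo receives two arguments.
unquoted=$(echo *.txt)

# Quoted: no expansion takes place, so echo receives the literal pattern.
quoted=$(echo '*.txt')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The first line of output lists the two file names, while the second shows the pattern itself, because the command never saw a wildcard expansion.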

Wildcard Syntax and Semantics

A wildcard is any string that contains any of the following special characters:

  • A ? matches exactly one occurrence of any character. For example, ?at matches cat, bat, and rat, and every other three-character name that ends with "at".
  • A * matches any sequence of zero or more characters. For example, image.* matches image.png, image.jpg, image.bmp.zip, and so on.
  • A ! followed by a pair of parentheses containing another wildcard, as in !(*.jpg), expands to the names that do not match the contained expression.

    Note

    The exclamation operator is an "extended glob" syntax and may not be enabled by default on your system. To enable it, the following command needs to be executed: shopt -s extglob.

There are a few more advanced shell glob expressions, but we will restrict ourselves to these most commonly used ones for now.
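To see the three wildcards side by side, here is a minimal sketch that works in a scratch directory. The file names and the variable names (demo_dir, one_char, any_run, negated) are hypothetical, and the last pattern assumes a Bash shell in which extglob can be enabled.

```shell
# Enable the !(...) syntax; this must run before the line that uses it.
shopt -s extglob

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch cat bat rat boat image.png image.jpg image.bmp.zip

# ? stands for exactly one character, so boat is not matched.
one_char=$(echo ?at)

# * stands for any run of characters, including ones that contain dots.
any_run=$(echo image.*)

# !(pattern) matches every name that the inner pattern does not match.
negated=$(echo !(image.*))

echo "?at        -> $one_char"
echo "image.*    -> $any_run"
echo "!(image.*) -> $negated"
```

Note that shopt -s extglob sits on its own line ahead of the pattern: Bash has to know that the option is on when it reads the line containing !(...).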

Wildcard Expansion or Globbing

When the shell encounters a wildcard expression on the command line, it internally expands the expression to all the files or pathnames that match it. This process is called globbing. Even though it looks as though only one wildcard argument is present, the shell converts it into multiple arguments before the command runs.

Note that a wildcard can match paths across the whole filesystem:

  • * matches all the directories and files in the current directory, except (by default) hidden ones whose names begin with a dot
  • /* matches everything in the root directory
  • /*/* matches everything exactly two levels deep from the root directory
  • /home/*/.bashrc matches the file named .bashrc in each user's home directory under /home that has one

At this point, a warning is due: this powerful matching mechanism of wildcards can end up matching files that the user never intended if the wildcard was not specified correctly. Hence, you must exercise great care when running commands that use wildcards and modify or delete files. For safety, run echo with the glob expression to view what files it gets expanded to. Once we are sure that the wildcard is correct, we can run the actual command that affects the files.
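The preview-then-run habit can be sketched as follows. This is only an illustration in a scratch directory; the file names and the variables (demo_dir, preview, remaining) are hypothetical.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch report.log debug.log notes.txt .hidden.log

# Step 1: preview. echo only prints what the pattern expands to and
# changes nothing on disk.
preview=$(echo *.log)
echo "would remove: $preview"

# Step 2: once the preview looks right, run the real command.
rm *.log

# By default a pattern does not match names that begin with a dot,
# so .hidden.log survives along with notes.txt.
remaining=$(ls -A)
echo "left behind:"
echo "$remaining"
```

The preview shows exactly the arguments that rm will receive, which makes a mistyped pattern visible before any file is deleted.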

Note

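Since the shell expands a wildcard into individual arguments, a pattern that matches a very large number of files can exceed the limit on the total size of the arguments that the system allows for a command. When that happens, the command fails with an "Argument list too long" error. We should be aware of this limitation when using wildcards.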
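As a rough sketch of how we might inspect this limit and sidestep it, the following uses getconf to print the limit and find to do the matching itself, so that the shell never builds one huge argument list. It is one possible workaround, not the book's method; it assumes a find that supports -maxdepth (as the GNU and BSD versions do), and the file and variable names are hypothetical.

```shell
# Upper bound, in bytes, on the arguments plus environment that can be
# passed to a newly executed program.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.gif b.gif c.gif
mkdir gif

# The pattern is quoted, so the shell does not expand it; find does the
# matching and runs mv once per file, keeping every command line short.
find . -maxdepth 1 -type f -name '*.gif' -exec mv {} gif/ \;

moved=$(ls gif | wc -l)
echo "files moved: $moved"
```

Running one mv per file is slower than a single mv with many arguments, but it works no matter how many files match.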

Let's dive into an exercise and see how we can use wildcards.

Exercise 8: Using Wildcards

In this exercise, we will practice the use of wildcards for file management by creating folders and moving files with specific file formats to those folders.

Note

Some of the commands used in this exercise produce many screenfuls of output, so we only show them partially or not at all.

  1. Open the command line shell and navigate to the ~/Lesson1/data1 folder:
    robin ~ $ cd Lesson1/data1 

    There are over 11,000 files in this folder, all of which are empty dummy files, but their names come from a set of real-world files.

  2. Use a wildcard to list all the GIF files. The pattern *.gif matches every file whose name ends with .gif:
    robin ~/Lesson1/data1 $ ls *.gif

    The output is shown here:

    Figure 1.17: A screenshot of the output displaying a list of all GIF files within the folder
  3. Create a new folder named gif, and use the wildcard representing all GIF files to move all of them into that folder:
    robin ~/Lesson1/data1 $ mkdir gif 
    robin ~/Lesson1/data1 $ mv *.gif gif 
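  4. Verify that there are no GIF files left in the CWD. Since the pattern no longer matches anything, the shell passes it to ls unchanged, and ls reports that no file with that literal name exists: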
    robin ~/Lesson1/data1 $ ls *.gif 
    ls: cannot access '*.gif': No such file or directory 
  5. Verify that all of the GIFs are in the gif folder:
    robin ~/Lesson1/data1 $ ls gif/ 

    The output is shown here:

    Figure 1.18: A screenshot of a partial output of the gif files within the folder
  6. Make a new folder called jpeg and use multiple wildcard arguments with mv to move all JPEG files into that folder:
    robin ~/Lesson1/data1 $ mkdir jpeg 
    robin ~/Lesson1/data1 $ mv *.jpeg *.jpg jpeg 
  7. Verify with ls that no JPEG files remain in the CWD:
    robin ~/Lesson1/data1 $ ls *.jpeg *.jpg 
    ls: cannot access '*.jpeg': No such file or directory 
    ls: cannot access '*.jpg': No such file or directory
  8. List the jpeg folder to verify that all the JPEGs are in it:
    robin ~/Lesson1/data1 $ ls jpeg 

    The output is shown here:

    Figure 1.19: A screenshot of a partial output of the .jpeg files within the folder
  9. List all .so (shared object library) files that have a single character as the trailing version number (note that ? matches any one character, not only a digit):
    robin ~/Lesson1/data1 $ ls *.so.? 

    The output is shown here:

    Figure 1.20: A screenshot of a partial output of the .so files ending with a dot, followed by a one-character version number
  10. List all files that start with "google" and have an extension:
    robin ~/Lesson1/data1 $ ls google*.* 
    google_analytics.png  google_cloud_dataflow.png  google_drive.png  google_fusion_tables.png google_maps.png  google.png
  11. List all files that start with "a", have the third character "c", and have an extension:
    robin ~/Lesson1/data1 $ ls a?c*.* 
    archer.png  archive_entry.h  archive.h  archlinux.png  avcart.png
  12. List all of the files that do not have the .jpg extension (this requires extglob to be enabled, as described earlier):
    robin ~/Lesson1/data1 $ ls !(*.jpg) 

    The output is shown here:

    Figure 1.21: A screenshot of a partial output of the non-.jpg files in the folder
  13. Before we conclude this exercise, get the sample data back to how it was before in preparation for the next activity. First, move the files within the jpeg and gif folders back to the current directory:
    robin ~/Lesson1/data1 $ mv gif/* .
    robin ~/Lesson1/data1 $ mv jpeg/* .

    Then, delete the empty folders:

    robin ~/Lesson1/data1 $ rm -r gif jpeg
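Steps 4 and 7 of the exercise rely on what Bash does when a pattern matches nothing. The following minimal sketch shows the default behaviour and, as an aside that the exercise itself does not use, the nullglob shell option that changes it. The file and variable names are hypothetical.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.jpg b.jpeg

# No .gif files exist here, so by default Bash leaves the pattern as a
# literal string and the command receives "*.gif" as its argument.
literal=$(echo *.gif)
echo "default:  [$literal]"

# With nullglob set, an unmatched pattern expands to nothing instead.
shopt -s nullglob
empty=$(echo *.gif)
shopt -u nullglob
echo "nullglob: [$empty]"
```

This is why ls complained about a file literally named '*.gif' in the exercise: the unmatched pattern reached ls as plain text.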

Now, having learned the basic syntax, we can write wildcards to match almost any group of files and paths, so we rarely ever need to specify filenames individually.

Even in a GUI, it takes more effort than this to select groups of files in a file manager (for example, all the .gif files), and doing so can be error-prone or frustrating when hundreds or thousands of files are involved.

Activity 4: Using Simple Wildcards

The supplied sample data in the Lesson1/data1 folder has about 11,000 empty files of various types. Use wildcards to move each file to a directory representing its category, namely images, binaries, and misc, and count how many files of each category exist. Through this activity, you will get familiar with using simple wildcards for file management. Follow these steps to complete this activity:

  1. Create the three directories representing the categories specified.
  2. Move all of the files with the extensions .jpg, .jpeg, .gif, and .png to the images folder.
  3. Move all of the files with the extension .a or .so, as well as those whose names end with .so followed by a period and a version number, into the binaries folder.
  4. Move the remaining files with any extension into the misc folder.
  5. Count the files in each folder using a shell command.

You should get the following answers: 3,674 images, 5,368 binaries, and 1,665 misc.
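For the counting step, one common approach is to pipe a listing into wc -l. The sketch below shows the idea on a handful of made-up files in a scratch directory rather than on the sample data; the variable names (demo_dir, image_count) are hypothetical.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir images
touch one.png two.gif three.jpg notes.txt

# Several wildcard arguments can be given to a single mv.
mv *.png *.gif *.jpg images

# ls prints one name per line when its output goes to a pipe, so
# wc -l reports the number of entries in the folder.
image_count=$(ls images | wc -l)
echo "images: $image_count"
```

This counts correctly as long as no file name contains a newline character, which holds for ordinary data such as this.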

Note

The solution for this activity can be found on page 273.

Activity 5: Using Directory Wildcards

The supplied sample data inside the Lesson1/data folder has a taxonomy of tree species. Use wildcards to get the count of the following:

  1. The species whose family starts with the character p and whose genus has a as the second character.
  2. The species whose family starts with the character p, whose genus has i as the second character, and whose species name has u as the second character.
  3. The species whose family and genus both start with the character t.

This activity will help you get familiar with using simple wildcards that match directories.

Follow these steps to complete this activity:

  1. Navigate to the data folder.
  2. Use the tree command with a wildcard for each of the three conditions to get the count of species.

You should get the following answers: 83 species, 26 species, and 19 species.
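A wildcard can appear in every component of a path, which is what makes this kind of query possible. Here is a minimal sketch on a tiny, made-up family/genus/species layout; it assumes that each species is an entry two levels below its family, which may differ from the actual sample data. It uses ls -d and wc instead of tree so that it runs even where tree is not installed.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical layout: family/genus/species (not the book's data).
mkdir -p pinaceae/pinus pinaceae/larix poaceae/bambusa taxaceae/taxus
touch pinaceae/pinus/nigra pinaceae/pinus/mugo pinaceae/larix/decidua \
      poaceae/bambusa/vulgaris taxaceae/taxus/baccata

# Family starts with p, genus has a as its second character, any species.
matches=$(echo p*/?a*/*)
echo "$matches"

# -d lists the matched paths themselves, one per line, ready to count.
count=$(ls -d p*/?a*/* | wc -l)
echo "species matched: $count"
```

Each path component is matched separately, so p* never crosses a slash: the first pattern selects the family, the second the genus, and the third the species.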

Note

The solution for this activity can be found on page 273.
