Mastering Bash is the art of taking advantage of your environment to make the best of it. It is not just a matter of automating boring routine tasks; it is about crafting your working space so that it becomes more efficient for your goals. Even though Bash scripting is not as expressive as more complex languages such as Python or JavaScript, it is simple enough to be grasped in a short time and flexible enough to handle most of your everyday tasks, even the trickiest ones.
But is Bash so plain and easy? Let's have a look at our first lines in Bash. Let's begin with something easy:
gzarrelli:~$ time echo $0
/bin/bash
real 0m0.000s
user 0m0.000s
sys 0m0.000s
gzarrelli:~$
Now, let us do it again in a slightly different way:
gzarrelli:~$ time /bin/echo $0
/bin/bash
real 0m0.001s
user 0m0.000s
sys 0m0.000s
What is interesting here is that the value of real is slightly different between the two commands. OK, but why? Let's dig a bit further with the following commands:
gzarrelli:~$ type echo
echo is a shell builtin
gzarrelli:~$ type /bin/echo
/bin/echo is /bin/echo
Interestingly enough, the first is a shell builtin, while the second is simply a system program, an external utility, and this is where the difference lies. A builtin is a command that is built into the shell itself, as opposed to a system program, which the shell has to invoke: an internal command versus an external one.
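As a quick check, the type builtin also accepts a -t option that prints a single word describing how a name would be interpreted. The following is a minimal sketch that assumes /bin/echo exists on your system, as it does in the session above:

```shell
# Ask Bash how it would interpret each name.
kind_builtin=$(type -t echo)        # "builtin": handled inside the shell
kind_external=$(type -t /bin/echo)  # "file": an executable on disk
echo "echo      -> $kind_builtin"
echo "/bin/echo -> $kind_external"

# type -a lists every match for a name, in the order Bash would try them.
type -a echo
```

The one-word form is handy in scripts, where parsing the longer sentence printed by a plain type would be fragile.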
To understand why internal and external shell commands lead to such different timings, we have to understand how an external program is invoked by the shell. When an external program is to be executed, Bash creates a copy of itself with the same environment as the parent shell, giving birth to a new process with a different process ID. This is what forking means. Inside the new address space, an exec system call then loads the new program.
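The new process ID can be observed directly. The sketch below is only an illustration and assumes that an external sh program is available in your PATH:

```shell
# $$ expands to the process ID of the current shell.
shell_pid=$$

# sh is an external program: Bash forks, and the child execs sh.
# Inside the child, $$ is the child's own process ID (the single quotes
# stop the parent shell from expanding it first).
external_pid=$(sh -c 'echo $$')

echo "shell PID:    $shell_pid"
echo "external PID: $external_pid"
```

The two numbers differ because the external program runs in a process of its own.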
For builtin commands it is a different story: Bash executes them without forking, and this leads to a couple of interesting outcomes:
- The builtin execution is faster because no copy of the shell is made and no executable is loaded. This advantage is most evident with short-running commands, because the overhead is paid before the external program starts running: once it is running, the difference in pure execution time between the builtin and the program is negligible.
- Being internal to Bash, builtin commands can affect its internal state, which is not possible for an external program. Take the classic example of the builtin cd. If cd were an external program, then once invoked from the shell as:
cd /this_dir
the first operation would be our shell forking a process for cd, and the latter would change the current directory for its own process, not for the shell that was forked to give birth to it. The parent shell would remain unaffected, so we would not go anywhere.
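Both outcomes can be reproduced with a short experiment. This is a sketch, not a benchmark: the iteration count of 500 is an arbitrary choice, the timings depend on your machine, and a subshell in parentheses stands in for the hypothetical external cd.

```shell
# Outcome 1: the fork/exec overhead adds up over many short runs.
# time is a Bash keyword, so it can time a whole loop.
time for i in {1..500}; do echo hi; done >/dev/null
time for i in {1..500}; do /bin/echo hi; done >/dev/null

# Outcome 2: a directory change made in a child process is lost.
start_dir=$PWD
( cd / )                  # the parentheses run cd in a forked subshell
after_subshell=$PWD       # still the starting directory
cd /                      # the builtin runs inside the current shell
after_builtin=$PWD        # now /
cd "$start_dir"           # go back to where we started

echo "after subshell cd: $after_subshell"
echo "after builtin cd:  $after_builtin"
```

The loop over /bin/echo should report a noticeably larger real time than the loop over the builtin, since each iteration pays for a fork and an exec.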
Curious about which builtins are available? You have a couple of options. Either execute the following builtin:
compgen -b
Or this other builtin:
enable -a | awk '{ print $NF }'
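The same listing can be used from a script to test whether a given name is a builtin. The following sketch uses three example names chosen only for illustration:

```shell
# Count the builtins known to this shell.
builtin_count=$(compgen -b | wc -l)
echo "This Bash has $builtin_count builtins"

# Check whether specific names are among them (grep -x matches whole lines).
for name in cd echo ls; do
    if compgen -b | grep -qx "$name"; then
        echo "$name is a builtin"
    else
        echo "$name is not a builtin"
    fi
done
```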
To better understand why there is a difference between the execution of a builtin and an external program, we must see what happens when we invoke a command.
- First, remember that the shell works from left to right: it picks out all the variable assignments and redirections and saves them for later processing.
- The remaining words are expanded. If any words are left after expansion, the shell takes the first one as the name of the command, while all the rest are considered its arguments.
- The next step is performing the required input and output redirections.
- Finally, before being assigned to a variable, all the text following the = sign is subject to tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal.
- If no command name results from these operations, the variable assignments affect the current shell environment; otherwise, they are added only to the environment of the executed command. If an assignment fails (for example, an attempt to assign to a read-only variable), an error is raised and the command exits with a non-zero status.
- If no command name results, the redirections are still performed but, unlike variable assignments, they do not affect the current shell environment. Again, if any error occurs, the command exits with a non-zero status.
Once the preceding operations are performed, if a command name is left, the command is executed as described next. Otherwise, the command simply exits, with a status that depends on whether any of the expansions contained a command substitution: the exit status is that of the last command substitution performed, or zero if no command substitution was performed.
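The rules about assignments and exit statuses are easier to see in action. The following is a minimal sketch; the variable names greeting, result, and plain are made up for the example, and an external sh is assumed to be available:

```shell
unset greeting

# Assignment in front of a command name: the variable is added only to
# the environment of that command.
child_view=$(greeting=hello sh -c 'echo "$greeting"')
parent_view=${greeting:-unset}
echo "child sees: $child_view, parent sees: $parent_view"

# Assignment with no command name: the current shell is affected.
greeting=hello
echo "parent now sees: $greeting"

# No command name, with a command substitution: the exit status is
# that of the last command substitution performed.
result=$(false)
status_with_substitution=$?

# No command name and no command substitution: the status is zero.
plain=value
status_plain=$?
echo "statuses: $status_with_substitution $status_plain"
```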
At this point, we are finally left with a command name and some optional arguments. This is where the roads of builtins and external programs diverge.
- First, the shell looks at the command name, and if it contains no slashes, it tries to locate the command
- The shell first checks whether there is a function with that name and, if so, executes it
- If no function is found, the shell searches its builtins, and if there is one with that name, it is executed
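The search order above can be checked by deliberately shadowing a builtin with a function. This is only a sketch: redefining echo is done here for demonstration and is rarely a good idea in real scripts.

```shell
# A function named echo shadows the builtin of the same name.
echo() { builtin echo "function says: $*"; }

kind_with_function=$(type -t echo)     # function
via_function=$(echo hi)                # found first: the function
via_builtin=$(builtin echo hi)         # force the builtin
via_command=$(command echo hi)         # skip functions; the builtin is next

# Remove the function: the builtin is found again.
unset -f echo
kind_without_function=$(type -t echo)  # builtin

printf '%s\n' "$kind_with_function" "$via_function" "$via_builtin" \
    "$via_command" "$kind_without_function"
```

The builtin and command builtins are the usual way to reach the original behavior from inside a wrapper function of the same name.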
OK, so if there is a matching builtin, it has already been invoked. What about an external program?
- Bash goes on, and if it finds no builtin with the name given on the command line, there are three possibilities:
  - The full path of the command is already stored in Bash's internal hash table, a structure used to speed up the search
  - If the full path is not in the hash table, the shell searches for the command in the directories listed in the PATH environment variable, and if it finds it, the full path is added to the hash table
  - The command is not found in any PATH directory, so the shell prints an error message and returns an exit status of 127
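The last two cases can be tried out as follows. In this sketch, no_such_command_for_this_demo is a hypothetical name assumed not to exist in any PATH directory, and ls is assumed to be installed:

```shell
# type -P forces a PATH search and prints the file that would be run.
ls_path=$(type -P ls)
echo "ls resolves to: $ls_path"

# Hypothetical name, assumed not to exist anywhere in PATH.
missing=no_such_command_for_this_demo

# The error message goes to standard error; we discard it here
# and keep only the exit status.
"$missing" 2>/dev/null
status=$?
echo "exit status for $missing: $status"
```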
The hash table can be inspected with the hash builtin:
gzarrelli:~$ hash
hits command
1 /usr/bin/which
1 /usr/bin/ld
24 /bin/sh
1 /bin/ps
1 /usr/bin/who
1 /usr/bin/man
1 /bin/ls
1 /usr/bin/top
The output tells you not only which commands have been hashed (second column), but also how many times each of them has been executed during the current session (hits, first column).
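The table can also be emptied and queried for a single command. The following sketch assumes that ls is an external program found through PATH and that command hashing is enabled, which is the default in Bash:

```shell
hash -r                    # forget every remembered location
ls >/dev/null              # first use: PATH is searched, the result is stored
ls >/dev/null              # second use: served from the hash table
remembered=$(hash -t ls)   # print the path stored for ls
echo "ls is remembered as: $remembered"
hash                       # show the whole table, with hit counts
```

Running hash -r is useful after installing or moving a program, when a remembered path may no longer be valid.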
Let's say that the search found the full path to the command we want to execute. We are now in the same situation as if Bash had found one or more slashes in the command name: in either case, the shell considers that it has a valid path and executes the command in a forked environment.
This is when we are lucky, but it can happen that the file invoked is not in an executable format. In this case, provided that the path points to a file and not to a directory, Bash makes an educated guess and assumes it is a shell script. The script is then executed in a subshell that is to all intents a new environment, except that it inherits the content of the parent shell's hash table.
Before doing anything else, the shell looks at the first line of the script for an optional sha-bang (we will see later what this is), which is followed by the path to the interpreter used to run the script and some optional arguments.
At this point, and only at this point, your external command is executed if it is a script. If it is a binary executable, it is invoked a little earlier, but still well after any builtin.
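Both kinds of script can be tried with two throwaway files. This is a sketch under a few assumptions: mktemp is available, the temporary directory allows files to be executed, and /bin/sh exists. The file names with_first_line and without_first_line are made up for the example.

```shell
demo_dir=$(mktemp -d)

# A script whose first line names the interpreter to use.
printf '%s\n' '#!/bin/sh' 'echo "run by /bin/sh"' > "$demo_dir/with_first_line"

# A script with no interpreter line: Bash runs it as a shell script itself.
printf '%s\n' 'echo "run by the calling shell"' > "$demo_dir/without_first_line"

chmod +x "$demo_dir/with_first_line" "$demo_dir/without_first_line"

# A slash in the command name means no function, builtin, or PATH search.
out_with=$("$demo_dir/with_first_line")
out_without=$("$demo_dir/without_first_line")
echo "$out_with"
echo "$out_without"

rm -r "$demo_dir"
```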
During these first paragraphs, we saw some commands and concepts that should sound familiar to you. The next paragraphs of this chapter will quickly deal with some basic elements of Bash, such as variables, expansions, and redirections. If you already know them, you will be able to use the next pages as a reference while working on your scripts. If, on the contrary, you are not so familiar with them, have a look at what comes next, because everything you read there will be fundamental to understanding what you can do in and with the shell.