Time for the interpreter: the sha-bang

When the game gets tougher, a few concatenations on the command line are no longer enough to perform the tasks we are meant to accomplish. Too many commands crammed onto a single line become messy and hard to read, so it is better to store our commands or builtins in a file and have it executed.

When a script is executed, the system loader parses the first line looking for a two-character sequence known as the sha-bang or shebang:

#!

This will force the loader to treat the following characters as the path to the interpreter, along with its optional arguments, to be used to further parse the script, which is then passed as another argument to the interpreter itself. So, in the end, the interpreter will parse the script and, this time, it will ignore the sha-bang, since its first character is a hash, which usually indicates a comment inside a script, and comments do not get executed. To go a little further, the sha-bang is what we call a 2-byte magic number, a constant sequence of numbers or text values used in Unix to identify file or protocol types. So, 0x23 0x21 is actually the ASCII representation of #!.
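The magic number can be checked by hand. The following is a minimal sketch, assuming head, od, and tr are available; has_shebang is a hypothetical helper name used only for illustration, not a standard tool.

```shell
# has_shebang FILE: succeed if the first two bytes of FILE are 0x23 0x21 ("#!").
# Hypothetical helper, named here only for illustration.
has_shebang() {
    local magic
    # Read the first two bytes and render them as hex digits, e.g. "2321".
    magic=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "2321" ]
}

# Try it on a throwaway file that starts with a sha-bang.
sample=$(mktemp)
printf '#!/bin/sh\necho hi\n' > "$sample"
if has_shebang "$sample"; then
    echo "starts with 0x23 0x21"
fi
rm -f "$sample"
```

This only looks at the first two bytes; it does not check that the interpreter path after them exists.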

So, let's run a little experiment and create a tiny one-line script:

gzarrelli:~$ echo "echo \"This should go under the sha-bang\"" > test.sh

Just one line. Let's have a look:

gzarrelli:~$ cat test.sh 
echo "This should go under the sha-bang"

Nice, everything is as we expected. Has Linux something to say about our script? Let's ask:

gzarrelli:~$ file test.sh 
test.sh: ASCII text

Well, the file utility says that it is a plain file, and this is a simple text file indeed. Time for a nice trick:

gzarrelli:~$ sed -i '1s/^/#!\/bin\/sh\n/' test.sh

Nothing special; we just added a sha-bang pointing to /bin/sh:

gzarrelli:~$ cat test.sh 
#!/bin/sh
echo "This should go under the sha-bang"

As expected, the sha-bang is there at the beginning of our file:

gzarrelli:~$ file test.sh 
test.sh: POSIX shell script, ASCII text executable

No way, now it is a script! The file utility performs three sets of tests to identify the type of file it is dealing with. In order: filesystem tests, magic number tests, and language tests. In our case, it identified the magic number that represents the sha-bang, and thus a script, and this is what it told us: it is a script.

Now, a couple of final notes before moving on.

- You can omit the sha-bang if your script does not use any shell builtins or shell internals
- Pay attention to /bin/sh, not everything that looks like an innocent executable is what it seems:

gzarrelli:~$ ls -lah /bin/sh
lrwxrwxrwx 1 root root 4 Nov  8  2014 /bin/sh -> dash

In some systems, /bin/sh is a symbolic link to a different kind of interpreter, and if you are using some internals or builtins of Bash, your script could have unwanted or unexpected outcomes.
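To see which interpreter a /bin/sh sha-bang would really start, the link can be resolved from a script. This is a small sketch, assuming a readlink that supports the -f option (as in GNU coreutils); the variable name sh_target is only illustrative.

```shell
# Follow every symbolic link in the chain and print the final target.
# Assumption: readlink supports -f, as the GNU coreutils version does.
sh_target=$(readlink -f /bin/sh)
echo "/bin/sh resolves to: $sh_target"
```

If the result is not Bash and the script relies on Bash features, pointing the sha-bang at /bin/bash is the safer choice.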

Calling your script

Well, we have our two-line script; time to see if it really does what we want it to do:

gzarrelli:~$ ./test.sh
-bash: ./test.sh: Permission denied

No way! It is not executing, and from the error message, it seems related to the file permissions:

gzarrelli:~$ ls -lah test.sh 
-rw-r--r-- 1 gzarrelli gzarrelli 41 Jan 21 18:56 test.sh

Interesting. Let us recap what the file permissions are. As you can see, the line describing the properties of a file starts with a series of letters and dashes.

Type	User	Group	Others
`-`	`rw-`	`r--`	`r--`

For the type, there are two main values: d means this is a directory, and - means this is a regular file. Then, we can see which permissions are set for the user owning the file, for the group owning the file, and for all other users. As you may guess, r stands for permission to read, w for permission to write, and x for permission to execute; - means no right. These always appear in the same order: first r, then w, then x. So, wherever you see a - instead of an r, w, or x, it means that particular right is not granted.

The same works for directory permissions, except that x means you can traverse the directory, r means that you can enumerate its content, and w means that you can create, rename, and remove the entries in it.

Indicator	File type
`-`	Regular file
`b`	Block device (disk or partition)
`c`	Character device, like the terminal under /dev
`d`	Directory
`l`	Symbolic link
`p`	Named pipe (FIFO)
`s`	Socket

So, going back to our file, we do not see any execution bit set. Why? Here, a shell builtin can help us:

gzarrelli:~$ umask
0022

Does it make any sense to you? Well, it should, once we see how the permissions on files can be represented in numeric form. Think of permissions as bits of metadata pertaining to a file, one bit for each grant; no grant is 0:

r-- = 100
-w- = 010
--x = 001

Now, let's convert from binary to decimal:

Permission	Binary	Decimal
`r`	`100`	4
`w`	`010`	2
`x`	`001`	1

Now, just combine the decimal values to obtain the final permission, but remember that you have to calculate read, write, and execution grants in triplets - one set for the user owning the file, one for the group, and one for the others.
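The triplet arithmetic can be sketched as a small Bash function. The name rwx_to_octal is hypothetical and the function only handles the nine plain permission characters, not the special modes.

```shell
# rwx_to_octal PERMS: convert nine permission characters (e.g. rw-r--r--)
# into three octal digits. Hypothetical helper, for illustration only.
rwx_to_octal() {
    local perms=$1 result="" i triplet digit
    for i in 0 3 6; do
        triplet=${perms:i:3}      # user, then group, then others
        digit=0
        [ "${triplet:0:1}" = "r" ] && digit=$((digit + 4))
        [ "${triplet:1:1}" = "w" ] && digit=$((digit + 2))
        [ "${triplet:2:1}" = "x" ] && digit=$((digit + 1))
        result+=$digit
    done
    echo "$result"
}

rwx_to_octal "rw-r--r--"   # prints 644, the mode our test.sh currently has
```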

Back again to our file, we can change its permissions in a couple of ways. Let's say we want it to be readable, writable, and executable by the user; readable and writable by the group; and only readable by the others. We can use the command chmod to accomplish this goal:

chmod u+rwx filename
chmod g+w filename

So, + and - add or remove permissions on the given file or directory, while u, g, and o define which of the three sets of attributes we are referring to: user, group, or others.
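The symbolic form can be tried safely on a scratch file. The sketch below assumes GNU stat, whose -c '%a' format prints the octal mode; other stat implementations use different options.

```shell
# Work on a temporary file so nothing important is touched.
scratch=$(mktemp)
chmod 644 "$scratch"            # start from rw-r--r--, like our test.sh

# Several symbolic clauses can be combined in one call, separated by commas.
chmod u+rwx,g+w "$scratch"

# Assumption: GNU stat; -c '%a' prints the permissions in octal.
mode=$(stat -c '%a' "$scratch")
echo "$mode"                    # prints 764
rm -f "$scratch"
```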

But we can speed things up using the numeric values:

User - rwx: 4+2+1 = 7
Group - rw: 4+2 = 6
Other - r: 4

So, the following command should do the trick in one line:

chmod 764 test.sh

Time to verify:

gzarrelli:~$ ls -lah test.sh 
-rwxrw-r-- 1 gzarrelli gzarrelli 41 Jan 21 18:56 test.sh

Here we are. So we just need to see whether our user can execute the file, as the permissions granted suggest:

gzarrelli:~$ ./test.sh
This should go under the sha-bang

Great, it works. Well, the script is not that complex, but served our purposes. But we left one question behind: Why was the file created with that set of permissions? As a preliminary explanation, I ran the command umask, and the result was 0022 but did not go further.

Count the digits in the umask and those in the numeric modes for chmod. Four against three. What does that leading digit mean? We have to introduce some special permission modes that enable some interesting features:

- Sticky bit: Think of it as a user right assertion on a file or directory. If the sticky bit is set on a directory, the files inside it can be deleted or renamed only by the file owner, the owner of the directory the file is in, or by root. This is really useful in a shared directory to prevent one user from deleting or renaming some other user's file. The sticky bit is represented by the letter t at the end of the list of permissions or by the octal digit 1 at the beginning. Let's see how it works:

gzarrelli:~$ chmod +t test.sh
gzarrelli:~$ ls -lah test.sh
-rwxrw-r-T 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh

Interestingly, the t is capital, not lower, as we were talking about. Maybe this sequence of commands will make everything clearer:

gzarrelli:~$ chmod +t test.sh 
gzarrelli:~$ ls -lah test.sh 
-rwxrw-r-T 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh 
gzarrelli:~$ chmod o+x test.sh 
gzarrelli:~$ ls -lah test.sh 
-rwxrw-r-t 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh

You probably got it: the t attribute is a capital when, on the file or directory, the execution bit (x) is not set for the others (o).

And now, back to the origins:

gzarrelli:~$ chmod 0764 test.sh 
gzarrelli:~$ ls -lah test.sh 
-rwxrw-r-- 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh

We used the four-digit notation, and the leading 0 cleared the 1 that referred to the sticky bit. Obviously, we could also use chmod -t to accomplish the same goal. One final note: if the sticky bit and SGID are in conflict, the sticky bit prevails in granting permissions.

- Set UID: The SUID (Set User ID upon execution) bit marks an executable so that, when it runs, it does so as the file owner, with the owner's privileges, and not as the user invoking it. Note that Linux ignores this bit on interpreted scripts such as ours and on directories; some other Unix systems can be configured so that files created in a SUID directory are owned by the owner of the directory and not by the user actually performing the operation. Visually, it is represented by an s in the position of the user execution rights. The octal number referring to it is 4:

gzarrelli:~$ chmod u+s test.sh
gzarrelli:~$ ls -lah test.sh
-rwsrw-r-- 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh

- Set GID: The SGID (Set Group ID upon execution) bit marks an executable so that, when it is run, it behaves as if the user invoking it were in the group that owns the file. If applied to a directory, every file created in the directory will have its group set to the group owning the directory rather than the one the user performing the operation belongs to. Visually, it is represented by an s in the position of the group execution rights. The octal number referring to it is 2.

Let's reset the permissions on our test file:

gzarrelli:~$ chmod 0764 test.sh
gzarrelli:~$ ls -lah test.sh
-rwxrw-r-- 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh

Now we apply SGID using the octal digit referring to it:

gzarrelli:~$ chmod 2764 test.sh
gzarrelli:~$ ls -lah test.sh
-rwxrwSr-- 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh

In this example, the s is capital because we do not have the execution permission granted on the group; the same applies for SUID.
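The effect of the leading octal digit, and the lowercase versus capital letters, can be compared side by side on throwaway directories. This is a sketch under a few assumptions: GNU stat (its -c '%A' format prints the symbolic mode), a system that lets the owner set these bits, and a hypothetical helper named show_dir_mode.

```shell
# show_dir_mode MODE: create a temporary directory, apply MODE with chmod,
# and print the resulting symbolic permissions. Hypothetical helper.
show_dir_mode() {
    local dir mode_string
    dir=$(mktemp -d)
    chmod "$1" "$dir"
    mode_string=$(stat -c '%A' "$dir")   # assumption: GNU stat
    rmdir "$dir"
    echo "$mode_string"
}

show_dir_mode 1777   # sticky bit, others have x:      drwxrwxrwt
show_dir_mode 1776   # sticky bit, others lack x:      drwxrwxrwT
show_dir_mode 2775   # SGID, group has x:              drwxrwsr-x
show_dir_mode 2765   # SGID, group lacks x:            drwxrwSr-x
```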

So, now we can go back to our umask, and at this point you probably already know what the four-digit notation means. umask is a command that modifies the permissions assigned at file creation by denying the permission bits set in the mask. Taking the default creation mode for a directory, 0777, we can think of a umask of 0022 as:

0777 -
0022
------ 
0755

Do not pay attention to the first 0; it is the digit for the special permission modes (SUID, SGID, and sticky bit). Simply subtract the value of the umask from the default grant mask for a directory, rwx for user, group, and others. The remaining value is the permission set that newly created directories receive. If you are not comfortable with the numeric notation, you can see the resulting permissions in the familiar rwx notation using:

gzarrelli:~$ umask -S
u=rwx,g=rx,o=rx

For the files, the default mask is 666, so:

0666 -
0022
--------
0644

It is actually a tad more complicated than this, since the mask is applied bit by bit rather than by true subtraction, but this rule of thumb will let you calculate the masks quickly. Let us try to create a new umask. First, let's reset the umask value:

gzarrelli:~$ umask 0000
gzarrelli:~$ umask
0000
gzarrelli:~$ umask -S
u=rwx,g=rwx,o=rwx

As we can see, nothing gets subtracted:

gzarrelli:~$ touch test-file
gzarrelli:~$ mkdir test-dir
gzarrelli:~$ ls -lah test-*
-rw-rw-rw- 1 gzarrelli gzarrelli    0 Jan 22 18:01 test-file

test-dir:
total 8.0K
drwxrwxrwx 2 gzarrelli gzarrelli 4.0K Jan 22 18:01 .
drwxr-xr-x 4 gzarrelli gzarrelli 4.0K Jan 22 18:01 ..

The test file has 666 access rights and the directory 777. This is really way too much, so let's restrict the mask:

gzarrelli:~$ umask o-rwx,g-w
gzarrelli:~$ umask -S
u=rwx,g=rx,o=

gzarrelli:~$ touch 2-test-file
gzarrelli:~$ mkdir 2-test-dir
gzarrelli:~$ ls -lah 2-test-*
-rw-r----- 1 gzarrelli gzarrelli    0 Jan 22 18:03 2-test-file

2-test-dir:
total 8.0K
drwxr-x--- 2 gzarrelli gzarrelli 4.0K Jan 22 18:03 .
drwxr-xr-x 5 gzarrelli gzarrelli 4.0K Jan 22 18:03 ..

As you can see, the permissions are 750 for directories and 640 for files. A bit of math will help:

0777 -
0750
--------
0027

You would get the same result from the umask command:

gzarrelli:~$ umask
0027
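The subtraction shown above works as a rule of thumb for directories, but plain arithmetic would give 0666 - 0027 = 0637 for files, while the listing shows 640. The mask is really applied bit by bit: every bit set in the umask is cleared from the default mode. A minimal sketch in Bash arithmetic follows; apply_umask is a hypothetical helper name.

```shell
# apply_umask DEFAULT MASK: clear from DEFAULT every bit that is set in MASK.
# Both arguments are octal strings; 8#... tells Bash to read them in base 8.
# Hypothetical helper, for illustration only.
apply_umask() {
    local default=$1 mask=$2
    printf '%04o\n' $(( 8#$default & ~8#$mask ))
}

apply_umask 0777 0027   # directories: prints 0750
apply_umask 0666 0027   # files:       prints 0640
apply_umask 0666 0022   # files:       prints 0644
```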

All these settings last only for the current session, so if you want to make them permanent, add the umask call with the appropriate argument to /etc/bash.bashrc or /etc/profile for a system-wide effect or, for a single user's mask, to the .bashrc file inside the user's home directory.
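Before making a mask permanent, it can be tried without touching the current session, because the umask belongs to the process that sets it. The sketch below runs umask inside a subshell; it assumes GNU stat for reading the mode back and a temporary directory without default ACLs, and the file name is arbitrary.

```shell
workdir=$(mktemp -d)
before=$(umask)

# The parentheses start a subshell: the new mask disappears when it exits.
(
    umask 027
    touch "$workdir/restricted-file"
)

after=$(umask)
file_mode=$(stat -c '%a' "$workdir/restricted-file")   # assumption: GNU stat
echo "file mode: $file_mode, session umask before: $before, after: $after"
rm -rf "$workdir"
```

The file is created with mode 640, while the umask reported before and after the subshell is the same.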

Something went wrong, let's trace it

So, we have a new tiny script named disk.sh:

gzarrelli:~$ cat disk.sh
#!/bin/bash
echo "The total disk allocation for this system is: "
echo -e "\n"
df -h
echo -e "\n"
df -h | grep /dm-0 | awk '{print "Space left on root partition: " $4}'

Nothing special: a shebang, a couple of echoes that print new lines just to have some vertical spacing, the output of df -h, and the same command again, this time filtered by grep and awk to give us a meaningful message. Let's run it:

gzarrelli:~$ ./disk.sh

The total disk allocation for this system is:

Filesystem      Size  Used Avail Use% Mounted on
/dev/dm-0        19G   15G  3.0G  84% /
udev             10M     0   10M   0% /dev
tmpfs            99M  9.1M   90M  10% /run
tmpfs           248M   80K  248M   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           248M     0  248M   0% /sys/fs/cgroup
/dev/sda1       236M   33M  191M  15% /boot
tmpfs            50M   12K   50M   1% /run/user/1000
tmpfs            50M     0   50M   0% /run/user/0
Space left on root partition: 3.0G
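The last line of the script matches on a device name that is specific to this machine. As a hedged alternative sketch, df can be asked about the root mount point directly; the -P option (POSIX output format) keeps each filesystem on a single line, and the variable name root_free is only illustrative.

```shell
# Ask df about "/" only: line 1 is the header, line 2 is the root filesystem.
# -h prints human-readable sizes, -P forces one line per filesystem.
root_free=$(df -hP / | awk 'NR == 2 {print $4}')
echo "Space left on root partition: $root_free"
```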

Nothing too complicated: a bunch of easy commands which, in case of failure, print an error message on the standard error. However, let's imagine for a moment that we have a more complex script, with more lines, some variable assignments, loops, and other constructs, and something goes wrong, but the output does not tell us anything. In this case, it would be handy to see what is actually running inside our script, so that we can follow the commands, the variable assignments, and so forth. In Bash, this is possible thanks to the set command with the -x option, which prints each command and its arguments to the standard error after they have been expanded and before they are actually invoked. The same behavior can be obtained by invoking bash with the -x option. Let's see what happens when it is used with our script:

gzarrelli:~$ bash -x disk.sh
+ echo 'The total disk allocation for this system is: '
The total disk allocation for this system is:
+ echo -e '\n'    
+ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/dm-0        19G   15G  3.0G  84% /
udev             10M     0   10M   0% /dev
tmpfs            99M  9.1M   90M  10% /run
tmpfs           248M   80K  248M   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           248M     0  248M   0% /sys/fs/cgroup
/dev/sda1       236M   33M  191M  15% /boot
tmpfs            50M   12K   50M   1% /run/user/1000
tmpfs            50M     0   50M   0% /run/user/0
+ echo -e '\n'    
+ awk '{print "Space left on root partition: " $4}'
+ grep /dm-0
+ df -h
Space left on root partition: 3.0G

Now it is quite easy to understand how the stream of data flows inside the script: all the lines beginning with a + sign are commands, and the following lines are outputs.
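On a terminal the trace and the regular output are interleaved, but they travel on different streams: Bash writes the trace lines to the standard error. The following sketch separates the two; the one-line snippet and the variable names are made up for the demonstration.

```shell
# A tiny made-up script body to trace.
snippet='name=world; echo "hello $name"'

# Keep only the regular output by discarding the standard error.
stdout_only=$(bash -xc "$snippet" 2>/dev/null)

# Keep only the trace: send stderr to the capture, stdout to /dev/null.
trace_only=$(bash -xc "$snippet" 2>&1 >/dev/null)

printf 'stdout: %s\n' "$stdout_only"
printf 'trace:\n%s\n' "$trace_only"
```

This makes it possible to save the trace to a file with a redirection such as 2> trace.log while leaving the script's normal output untouched.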

Let's imagine for a moment that we have longer scripts; for the most part, we are sure that things work fine, but for some lines, we are not completely sure of the outcome. Debugging everything would be noisy. In this case, we can use set -x to enable the tracing only for those lines we need to inspect, turning it off with set +x when it is no longer needed. Time to modify the script, as follows:

#!/bin/bash  
set -x 
echo "The total disk allocation for this system is: "  
echo -e "\n"  
df -h  
echo -e "\n"  
set +x  
df -h | grep /dm-0 | awk '{print "Space left on root partition: " $4}'

And now, time to run it again, as follows:

gzarrelli:~$ ./disk.sh
+ echo 'The total disk allocation for this system is: '
The total disk allocation for this system is:
+ echo -e '\n'    
+ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/dm-0        19G   15G  3.0G  84% /
udev             10M     0   10M   0% /dev
tmpfs            99M  9.1M   90M  10% /run
tmpfs           248M   80K  248M   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           248M     0  248M   0% /sys/fs/cgroup
/dev/sda1       236M   33M  191M  15% /boot
tmpfs            50M   12K   50M   1% /run/user/1000
tmpfs            50M     0   50M   0% /run/user/0
+ echo -e '\n'    
+ set +x
Space left on root partition: 3.0G

As you can see, we see the instructions given in the block marked by set -x, and we also see the set +x instruction itself, but after this the line with awk disappears and we see only its output. This filters out what is not so interesting for us and leaves only the part we want to focus on.
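If the set +x line itself is unwanted noise, a commonly used idiom is to run it inside a command group whose standard error is discarded. The sketch below applies it inside a made-up function named trace_block; the arithmetic is only there to have something to trace.

```shell
# trace_block: trace one assignment, then switch tracing off silently.
# Hypothetical function, for illustration only.
trace_block() {
    set -x
    local total=$(( 2 + 3 ))
    # The trace of "set +x" is written while stderr points to /dev/null,
    # so it never reaches the terminal.
    { set +x; } 2>/dev/null
    echo "total=$total"
}

# Capture both streams to inspect what was actually printed.
output=$(trace_block 2>&1)
printf '%s\n' "$output"
```

The captured text contains the traced assignment and the final echo output, but no set +x line.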

This is not the powerful debugging system typical of more complex programming languages, but it can be really helpful in scripts of hundreds of lines where we can lose track of sophisticated structures, such as evaluations, loops, or variable assignments, which make the scripts more expressive but also more difficult to get hold of and master. So, now that we are clear on how to debug a script, which permissions are needed to make it safely executable, and how the shell parses the command line, we are ready to spice things up by looking at how we can use variables to add more flexibility to our hand-crafted tools.