Chain of trust
An important part of collaborative efforts during the development of a project is ensuring the quality of its code. This includes protection against accidental corruption of the repository and against malicious intent, a task that the version control system can help with. Git needs to ensure trust in the repository contents: both your own and other developers’ (especially trust in the canonical repository of the project).
Content-addressed storage
In Chapter 4, Exploring Project History, in the SHA-1 and the shortened SHA-1 identifier section, we learned that Git currently uses SHA-1 hashes as a native identifier of commit objects (which represent revisions of the project and form the project’s history). This mechanism makes it possible to generate commit identifiers in a distributed way, by taking a cryptographic hash of the commit object. Each commit also records the hash of its parent commit (or commits), which is how it links to the previous commit.
Moreover, all other data stored in the repository (including the file contents in the revision, represented by blob objects, and the file hierarchy, represented by tree objects) uses the same mechanism. Objects of every type are addressed by their contents or, to be more accurate, by the hash of the object. You can say that the base of a Git repository is a content-addressed object database.
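To make content addressing concrete, here is a minimal sketch (not from the original text) that computes the ID of a blob object twice: once with git hash-object, and once by hand, by hashing the header blob <size>, a NUL byte, and the raw file contents with SHA-1. It assumes the default SHA-1 object format and GNU coreutils' sha1sum (on macOS, shasum computes the same digest); the file name and contents are arbitrary.

```shell
#!/bin/sh
# Sketch: a blob's object ID is SHA-1 of "blob <size>", a NUL byte, then the contents.
# Assumes the default SHA-1 object format and GNU coreutils' sha1sum.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

printf 'test content\n' >example.txt        # arbitrary file contents

# 1. Let Git compute the object ID (hash-object without -w stores nothing)
git_id=$(git hash-object example.txt)

# 2. Recompute it by hand: header "blob <size in bytes>", a NUL byte, then the data
size=$(wc -c <example.txt | tr -d ' ')
manual_id=$( { printf 'blob %s\000' "$size"; cat example.txt; } | sha1sum | cut -d' ' -f1 )

echo "git hash-object: $git_id"
echo "by hand:         $manual_id"
```

Changing even a single byte of example.txt changes both values, which is what allows Git to detect modified objects.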
Thus, Git provides a built-in chain of trust through cryptographic SHA-1 hashes, in the form of a hash tree, also known as a Merkle tree. In one dimension, the SHA-1 hash of a commit depends on its contents, which include the SHA-1 hash of the parent commit or commits, which in turn depends on the contents of the parent commit, and so forth down to the initial root commit. In the other dimension, the content of a commit object includes the SHA-1 hash of the tree representing the top directory of a project, which in turn depends on its contents; these contents include the SHA-1 hashes of the subdirectory trees and of the blobs of file contents, and so forth down to the individual files.
Figure 6.6 – Hash tree of a short history of a project, with a tag, two commits, and their contents. The SHA-1 hashes, shown in shortened form, depend on their contents
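The two dimensions of the hash tree can be inspected directly. The following sketch (an illustration, not part of the original text) builds a two-commit history in a throwaway repository and uses git cat-file -p to walk from the commit to its parent, and from the commit down through trees to a blob. The identity, file names, and commit messages are made up.

```shell
#!/bin/sh
# Sketch: follow the hash chain commit -> tree -> subtree -> blob,
# and commit -> parent, in a throwaway repository.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

mkdir src
echo 'first' >src/file.txt
git add src/file.txt
git commit -q -m 'Initial commit'
echo 'second' >src/file.txt
git commit -q -a -m 'Second commit'

git cat-file -p HEAD            # "tree <id>" and "parent <id>" lines, then metadata
git cat-file -p 'HEAD^{tree}'   # top-level tree: an entry for the "src" subtree
git cat-file -p HEAD:src        # subtree: an entry for the blob of file.txt

# The IDs recorded inside the commit object itself
tree_in_commit=$(git cat-file -p HEAD | sed -n 's/^tree //p')
parent_in_commit=$(git cat-file -p HEAD | sed -n 's/^parent //p')
```

Because the second commit changed the blob of src/file.txt, the IDs of the src tree, the top-level tree, and the commit all changed as well; this propagation is what lets a single commit ID vouch for a whole snapshot and its history.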
All of this allows SHA-1 hashes to be used to verify whether objects obtained from a (potentially untrusted) source have been corrupted or modified since they were created.
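As a rough demonstration of such verification, the sketch below (not from the original text) simulates corruption by overwriting the file of one loose object with the bytes of another, and then lets git fsck --full detect that the stored data no longer hashes to its name. It only works for loose (unpacked) objects, which is what a fresh commit creates; the file names and contents are hypothetical. The last line sets transfer.fsckObjects, which asks Git to also validate objects received through fetch and push.

```shell
#!/bin/sh
# Sketch: simulate corruption of a loose object and let git fsck detect it.
# Works on loose (unpacked) objects only; names and contents are made up.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

printf 'original\n' >a.txt
printf 'tampered\n' >b.txt
git add a.txt b.txt
git commit -q -m 'Add two files'

if git fsck --full; then before=ok; else before=failed; fi

# Overwrite the object file of a.txt's blob with the bytes of b.txt's blob
a_id=$(git rev-parse HEAD:a.txt)
b_id=$(git rev-parse HEAD:b.txt)
a_path=.git/objects/$(echo "$a_id" | cut -c1-2)/$(echo "$a_id" | cut -c3-)
b_path=.git/objects/$(echo "$b_id" | cut -c1-2)/$(echo "$b_id" | cut -c3-)
chmod u+w "$a_path"                 # object files are created read-only
cp "$b_path" "$a_path"

# The stored data no longer hashes to the name it is stored under
if git fsck --full; then after=ok; else after=failed; fi
echo "fsck before tampering: $before, after tampering: $after"

# Validate objects received by fetch and push as well
git config transfer.fsckObjects true
```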
Lightweight, annotated, and signed tags
The trust chain allows us to verify the contents, but it does not verify the identity of the person who created the content (the author and committer names are fully configurable and under user control). This is the task of GPG/PGP signatures: signed tags, signed commits, and signed merges.
Since Git version 2.34, you can also use SSH keys for signing by setting the gpg.format configuration variable to the value ssh, for example with git config gpg.format ssh (you may also need to use your public key as the configuration value for the user.signingKey configuration variable).
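A minimal sketch of this configuration, applied to a single throwaway repository (use --global instead to apply it to all your repositories). The key path is a hypothetical example; according to the Git documentation, user.signingKey may name your private key file, or your public key when the private key is held by ssh-agent.

```shell
#!/bin/sh
# Sketch: configure SSH-based signing for one repository (Git 2.34 or later).
# The key path below is hypothetical; use --global to apply it everywhere.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

git config gpg.format ssh
# Private key file, or public key when the private key is held by ssh-agent
git config user.signingKey "$HOME/.ssh/id_ed25519.pub"

git config --get gpg.format
git config --get user.signingKey
```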
Lightweight tags
Git uses two types of tags: lightweight tags and annotated tags (there are also signed tags, which are a special case of annotated tags).
A lightweight tag is very much like a branch that doesn’t change: it’s just a pointer (reference) to a specific commit in the graph of revisions, though in the refs/tags/ namespace rather than in refs/heads/.
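The following sketch (illustrative only; the repository, identity, and tag name wip-label are hypothetical) shows that a lightweight tag creates no object of its own: the ref under refs/tags/ holds the commit ID directly.

```shell
#!/bin/sh
# Sketch: a lightweight tag is only a ref under refs/tags/ naming a commit.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
echo 'draft' >notes.txt
git add notes.txt
git commit -q -m 'Draft notes'

git tag wip-label              # no -a, -s, or -m: lightweight tag
git show-ref wip-label         # the ref refs/tags/wip-label and the ID it holds
git cat-file -t wip-label      # the ref points straight at a commit object
```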
Annotated tags
Annotated tags, however, involve tag objects. Here the tag reference (in the refs/tags/ namespace) points to a tag object, which in turn points to a commit. Tag objects contain a creation date, the tagger identity (name and e-mail), and a tagging message. You create an annotated tag with git tag -a (or --annotate). If you don’t specify a message for an annotated tag on the command line (for example, with -m "<message>"), Git will launch your editor so you can enter it.
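For contrast with lightweight tags, this sketch (hypothetical names, throwaway repository) creates an annotated tag and shows the extra level of indirection: the ref names a tag object, and peeling it with ^{commit} reaches the tagged commit.

```shell
#!/bin/sh
# Sketch: an annotated tag ref points to a tag object, which points to a commit.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
echo 'v0.2' >VERSION
git add VERSION
git commit -q -m 'Prepare v0.2'

git tag -a v0.2 -m 'random v0.2'   # annotated: creates a tag object
git cat-file -t v0.2               # the ref now points to a tag object...
git cat-file -p v0.2               # ...holding object, type, tag, tagger, and message
git rev-parse 'v0.2^{commit}'      # peel the tag to reach the tagged commit
```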
You can view the tag data along with the tagged commit with the git show command as follows (commit details skipped):
$ git show v0.2
tag v0.2
Tagger: Joe R Hacker <joe@company.com>
Date:   Sun Jun 1 03:10:07 2014 -0700

random v0.2

commit 5d2584867fe4e94ab7d211a206bc0bc3804d37a9
Signed tags
Signed tags are annotated tags with a clear-text PGP signature (or, with modern Git, an SSH signature) of the tag data attached. You can create them with git tag -s (which uses your committer identity to select the signing key, or user.signingKey if set), or with git tag -u <key-id>; both versions assume that you have a private GPG key (created, for example, with gpg --gen-key), or an SSH key if gpg.format is set to ssh.
Lightweight tags versus annotated and signed tags
Annotated or signed tags are meant for marking a release, while lightweight tags are meant for private or temporary revision labels. For this reason, some Git commands (such as git describe) ignore lightweight tags by default.
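A short sketch of the git describe behavior mentioned above (throwaway repository, hypothetical tag names): with only a lightweight tag present, plain git describe fails, --tags makes it consider lightweight tags, and an annotated tag is found by default.

```shell
#!/bin/sh
# Sketch: git describe skips lightweight tags unless --tags is given.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
echo 'draft' >notes.txt
git add notes.txt
git commit -q -m 'Draft notes'

git tag wip-label                               # lightweight tag only
if git describe 2>/dev/null; then plain=found; else plain=none; fi
with_tags=$(git describe --tags)                # also considers lightweight tags

git tag -a v1.0 -m 'Release 1.0'               # annotated tag on the same commit
annotated=$(git describe)

echo "describe: $plain; describe --tags: $with_tags; after annotating: $annotated"
```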
Of course, in collaborative workflows it is important that the signed tag is made public, and that there is a way to verify it; both of those operations are described in the following sections.
Publishing tags
Git does not push tags by default: you need to do it explicitly. One solution is to push a tag individually with git push <remote> tag <tag-name> (here, tag <tag-name> is shorthand for the longer refspec refs/tags/<tag-name>:refs/tags/<tag-name>; when pushing, a refspec describes how a ref in the local repository maps to a ref in the remote one). If there is no naming conflict between a branch and a tag (that is, you don’t have a branch and a tag with the same name), you can omit the word tag in this command.
Another solution is to push tags en masse: either all the tags (both lightweight and annotated) with the --tags option, or just the annotated tags that point to pushed commits with --follow-tags. This explicitness allows you to re-tag (using git tag -f) with impunity if it turns out that you tagged the wrong commit, or if a last-minute fix is needed, but only as long as the tag has not been made public. Git does not (and should not) change tags behind the user’s back; thus, if you pushed the wrong tag, you need to ask others to delete the old tag so that they can fetch the corrected one.
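The following sketch (not from the original text) makes these rules visible, with a local bare repository standing in for the public remote; all names are hypothetical. Pushing the branch sends no tags, --follow-tags adds the annotated tag pointing into the pushed history but not the lightweight one, and the tag <tag-name> form pushes a single tag explicitly.

```shell
#!/bin/sh
# Sketch: tags are pushed only on request. A local bare repository stands in
# for the public remote; all names are hypothetical.
set -e
remote=$(mktemp -d)
git init -q --bare "$remote"
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add public "$remote"

echo 'v1' >file.txt
git add file.txt
git commit -q -m 'First release'
git tag wip-label                           # lightweight
git tag -a v1.0 -m 'Release 1.0'           # annotated

git push -q public HEAD                     # pushes the current branch, no tags
after_branch=$(git ls-remote --tags public)

git push -q --follow-tags public HEAD       # adds annotated tags on pushed history
after_follow=$(git ls-remote --tags public)

git push -q public tag wip-label            # pushes one tag explicitly
after_single=$(git ls-remote --tags public)
printf '%s\n' "$after_single"
```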
When fetching changes, Git automatically follows tags, downloading annotated tags that point to fetched commits. This means that downstream developers will automatically get signed tags, and will be able to verify releases.
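Tag following on fetch can be observed with two local repositories, as in this sketch (hypothetical names and messages). A plain git fetch, without --tags, brings in the maintainer's new commit and, along with it, the annotated tag pointing at that commit.

```shell
#!/bin/sh
# Sketch: fetch automatically follows tags that point into fetched history.
set -e
upstream=$(mktemp -d)
git init -q "$upstream"
git -C "$upstream" config user.name "Example Maintainer"
git -C "$upstream" config user.email "maintainer@example.com"
echo 'v0.1' >"$upstream/VERSION"
git -C "$upstream" add VERSION
git -C "$upstream" commit -q -m 'Start'

downstream=$(mktemp -d)
git clone -q "$upstream" "$downstream/clone"

# The maintainer adds a commit and tags it
echo 'v0.2' >"$upstream/VERSION"
git -C "$upstream" commit -q -a -m 'Release v0.2'
git -C "$upstream" tag -a v0.2 -m 'project v0.2'

# A plain fetch brings the new commit and, with it, the tag
git -C "$downstream/clone" fetch -q origin
git -C "$downstream/clone" tag --list
```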
Tag verification
To verify a signed tag, you use git tag --verify <tag-name> (or -v for short). For GPG signatures, you need the signer’s public key in your keyring (imported using gpg --import or gpg --keyserver <key-server> --recv-key <key-id>), and of course the tagger’s key needs to be vetted in your chain of trust. For SSH keys there is no web of trust; instead, you list the trusted public keys in an allowed signers file and point the gpg.ssh.allowedSignersFile configuration variable at that file.
$ git tag --verify v0.2
object 1085f3360e148e4b290ea1477143e25cae995fdd
type commit
tag signed
tagger Joe Random 1411122206 +0200

project v0.2
gpg: Signature made Fri Jul 19 12:23:33 2014 CEST using RSA key ID A0218851
gpg: Good signature from "Joe Random <jrandom@example.com>"
Signed commits
Signed tags are a good solution for users and developers to verify that the tagged release was created by the maintainer. But how do we make sure that a commit purporting to be by somebody named Jane Doe, with the jane@company.com e-mail, is actually a commit from her? How can we make it so that anybody can check this?
One possible solution is to sign individual commits. You can do this with git commit --gpg-sign[=<keyid>] (or -S for short). The key identifier is optional; without it, Git selects the key based on your committer identity (or uses user.signingKey, if set). Note that -S (capital S) is different from -s (small s); the latter adds a Signed-off-by line at the end of the commit message, as used for the Developer Certificate of Origin. Signing a commit looks as follows:
$ git commit -a --gpg-sign

You need a passphrase to unlock the secret key for
user: "Jane Doe <jane@company.com>"
2048-bit RSA key, ID A0218851, created 2014-03-19

[master 1085f33] README: eol at eof
 1 file changed, 1 insertion(+), 1 deletion(-)
To make commits available for verification, just push them. Anyone can then verify them with the --show-signature option to git log (or git show), or with one of the signature-related placeholders, such as %G?, in git log --format=<format>:
$ git log -1 --show-signature
commit 1085f3360e148e4b290ea1477143e25cae995fdd
gpg: Signature made Wed Mar 19 11:53:49 2014 CEST using RSA key ID A0218851
gpg: Good signature from "Jane Doe <jane@company.com>"
Author: Jane Doe <jane@company.com>
Date:   Wed Mar 19 11:53:48 2014 +0200

    README: eol at eof
You can also use the git verify-commit command for this.
Merging signed tags (merge tags)
The signed commit mechanism, described in the previous section, may be useful in some workflows, but it is inconvenient in an environment where you push commits out early, and only after a while do you decide whether they are worth including in the upstream. In such cases, you would want to sign only those parts that are ready to be published.
This situation can happen if you follow the recommendations in Chapter 10, Keeping History Clean; you know only after the fact (long after the commit was created) that the given iteration of the commit series passes code review. Commits need to be signed at commit creation time, but you can create a signed tag after the fact, after the series of commits gets accepted.
You can deal with this issue by rewriting the whole commit series after its shape is finalized (after passing the review), signing each rewritten commit, or just by amending and signing only the top commit. Both of those solutions would require a forced push to replace the old history, where commits were not signed. Alternatively, you can sign every commit from the start, or you can create an empty commit (with --allow-empty), sign it, and push it on top of the series. But there is a better solution: requesting the pull of a signed tag.
In this workflow, you work on your changes and, when they are ready, you create and push a signed tag (tagging the last commit in the series). You don’t have to push your working branch—pushing the tag is enough. If the workflow involves sending a pull request to the integrator, you create it using a signed tag instead of the end commit:
$ git tag -s 1252-for-maintainer
$ git request-pull origin/master public-repo \
    1252-for-maintainer >msg.txt
The signed tag message is shown between the dashed lines in the pull request, which means that you may want to explain your work in the tag message when creating the signed tag. The maintainer, after receiving such a pull request, can copy the repository line from it, fetching and integrating the named tag. When recording the merge result of pulling the named tag, Git will open an editor and ask for a commit message. The integrator will see a template starting with the following:
Merge tag '1252-for-maintainer'

Work on task tsk-1252

# gpg: Signature made Wed Mar 19 12:23:33 2014 CEST using RSA key ID A0218851
# gpg: Good signature from "Jane Doe <jane@company.com>"
This commit template includes the commented-out output of the verification of the signed tag object being merged (so it won’t be in the final merge commit message). The tag message helps describe the merge better.
The signed tag being pulled is not stored in the integrator’s repository, at least not as a tag object. Its content is stored, hidden, in the merge commit (in its mergetag header). This is done to avoid polluting the tag namespace with a large number of such working tags. The developer can safely delete the tag (git push public-repo --delete 1252-for-maintainer) after it gets integrated.
Recording the signature inside the merge commit allows for after-the-fact verification with the --show-signature option:
$ git log -1 --show-signature
commit 0507c804e0e297cd163481d4cb20f3f48ceb87cb
merged tag '1252-for-maintainer'
gpg: Signature made Wed Mar 19 12:23:33 2014 CEST using RSA key ID A0218851
gpg: Good signature from "Jane Doe <jane@company.com>"
Merge: 5d25848 1085f33
Author: Jane Doe <jane@company.com>
Date:   Wed Mar 19 12:25:08 2014 +0200

    Merge tag 'for-maintainer'

    Work on task tsk-1252