Using column-level encryption
A user can encrypt data in a database so that it is not visible to the hosting provider. In general, this means that the data cannot then be used for searching or indexing unless you use homomorphic encryption.
The strictest form of encryption would be client-side encryption so that all the database knows about is a blob of data, which would then normally be stored in a bytea
database column but could be others.
Data can also be encrypted server-side before it is returned to the user, using the pgcrypto
contrib
package, provided as an extension with PostgreSQL.
Getting ready
Make sure you (and/or your database server) are in a country where encryption is legal—in some countries, it is either banned completely or a license is required.
In order to create and manage Pretty Good Privacy (PGP) keys, you also need the well-known GnuPG
command-line utility, which is available on practically all distributions.
pgcrypto
is part of the contrib
collection. Starting from version 10, on Debian and Ubuntu, it is part of the main postgresql-NN
server package.
Install it on the database in which you want to use it, following the Adding an external module to PostgreSQL recipe from Chapter 3, Server Configuration.
You also need to have PGP keys set up:
pguser@laptop:~$ gpg --gen-key
Answer some questions here (the defaults are OK unless you are an expert), select the key type as DSA and Elgamal
, and enter an empty password.
Now, export the keys:
pguser@laptop:~$ gpg -a --export "PostgreSQL User (test key for PG Cookbook) <pguser@somewhere.net>" > public.key
pguser@laptop:~$ gpg -a --export-secret-keys "PostgreSQL User (test key for PG Cookbook) <pguser@somewhere.net>" > secret.key
Make sure only you and the postgres
database user have access to the secret key:
pguser@laptop:~$ sudo chgrp postgres secret.key
pguser@laptop:~$ chmod 440 secret.key
pguser@laptop:~$ ls -l *.key
-rw-r--r-- 1 pguser pguser 1718 2016-03-26 13:53 public.key
-r--r----- 1 pguser postgres 1818 2016-03-26 13:54 secret.key
Last but not least, make a copy of the public and the secret key; if you lose them, you’ll lose the ability to encrypt/decrypt.
How to do it…
To ensure that secret keys are never visible in database logs, you should write a wrapper function to retrieve the keys from the file. You need to create a SECURITY DEFINER
function as a user who has the pg_read_server_files
privilege or role assigned to them. For convenience, below is an example that illustrates how to write a function to read a key from a secret file:
CREATE OR REPLACE FUNCTION get_my_public_key() RETURNS TEXT
SECURITY DEFINER
LANGUAGE SQL
AS
$function_body$
SELECT pg_read_file ('/home/pguser/public.key');
$function_body$;
REVOKE ALL ON FUNCTION get_my_public_key() FROM PUBLIC;
CREATE OR REPLACE FUNCTION get_my_secret_key() RETURNS TEXT
SECURITY DEFINER
LANGUAGE SQL
AS
$function_body$
SELECT pg_read_file ('/home/pguser/secret.key');
$function_body$;
REVOKE ALL ON FUNCTION get_my_secret_key() FROM PUBLIC;
This can also be fully implemented in PL/pgSQL, using the built-in pg_read_file
(filename) PostgreSQL system function. To use this function, you must place the files in the data
directory, as required by that function for added security, so that the database user cannot access the rest of the filesystem directly. However, using that file needs pg_read_server_files
privilege unless granted via a role or accessed using security definer functions.
If you don’t want other database users to be able to see the keys, you also need to write wrapper functions for encryption and decryption, and then give end users access to these wrapper functions.
The encryption
function could look like this:
create or replace function encrypt_using_my_public_key(
cleartext text,
ciphertext out bytea
)
AS $$
DECLARE
pubkey_bin bytea;
BEGIN
-- text version of public key needs to be passed through function dearmor() to get to raw key
pubkey_bin := dearmor(get_my_public_key());
ciphertext := pgp_pub_encrypt(cleartext, pubkey_bin);
END;
$$ language plpgsql security definer;
revoke all on function encrypt_using_my_public_key(text) from public;
grant execute on function encrypt_using_my_public_key(text) to bob;
The decryption
function could look like this:
create or replace function decrypt_using_my_secret_key(
ciphertext bytea,
cleartext out text
)
AS $$
DECLARE
secret_key_bin bytea;
BEGIN
-- text version of secret key needs to be passed through function dearmor() to get to raw binary key
secret_key_bin := dearmor(get_my_secret_key());
cleartext := pgp_pub_decrypt(ciphertext, secret_key_bin);
END;
$$ language plpgsql security definer;
revoke all on function decrypt_using_my_secret_key(bytea) from public;
grant execute on function decrypt_using_my_secret_key(bytea) to bob;
Finally, we test the encryption:
test2=# select encrypt_using_my_public_key('X marks the spot!');
This function returns a bytea
(that is, raw binary) result that looks something like this:
encrypt_using_my_public_key |
\301\301N\003\223o\215\2125\203\252;\020\007\376-z\233\211H...
To see that it actually works, you must run both commands:
test2=# select decrypt_using_my_secret_key(encrypt_using_my_public_key('X marks the spot!'));
decrypt_using_my_secret_key
-----------------------------
X marks the spot!
(1 row)
Yes—we got back our initial string!
How it works…
What we have done here is this:
- Hidden the keys from non-superuser database users
- Provided wrappers for authorized users to use encryption and decryption functionalities
To ensure that your sensitive data is not stolen while in transit between the client and the database server, make sure you connect to PostgreSQL, either using an SSL-encrypted connection or from localhost
.
You also have to trust your server administrators and all of the other users with superuser privileges to be sure that your encrypted data is safe. And, of course, you must trust the safety of the entire environment; PostgreSQL can decrypt the data, so any other user or software that has access to the same files can do the same.
If you are using EDB Postgres Advanced Server (EPAS), you can create this function using EDB-SPL, and you can also utilize EDB*Wrap. EDB*Wrap obfuscates code and protects the secret key code from superusers or any other privileged users. For more information, please refer to the following link:
https://www.enterprisedb.com/docs/epas/latest/epas_security_guide/03_edb_wrap/
There’s more…
A higher level of security is possible with more complex procedures and architecture, as shown in the next sections. We also mention a limited pgcrypto
version that does not use OpenSSL.
For really sensitive data
For some data, you wouldn’t want to risk keeping the decryption password on the same machine as the encrypted data.
In those cases, you can use public-key cryptography, also known as asymmetric cryptography, and carry out only the encryption part on the database server. This also means that you only have the encryption key on the database host and not the key needed for decryption. Alternatively, you can deploy a separate, extra-secure encryption server in your server infrastructure that provides just the encrypting and decrypting functionality as a remote call.
This solution is secure because, in asymmetric cryptography, the private (that is, decryption) key cannot be derived from the corresponding public (that is, encryption) key, hence the names public
and private
, which denote the appropriate dissemination policies.
If you wish to prove the identity of the author of a file, the correct method is to use a digital signature, which is an entirely different application of cryptography. Note that this is not currently supported by pgcrypto
, so you must implement your own methods as C functions, or in a procedural language capable of using cryptographic libraries.
For really, really, really sensitive data
For even more sensitive data, you may never want the data to leave the client’s computer unencrypted; therefore, you need to encrypt the data before sending it to the database. In that case, PostgreSQL receives already encrypted data and never sees the unencrypted version. This also means that the only useful indexes you can have are for use in WHERE encrypted_column = encrypted_data
and to ensure uniqueness.
Even these forms can be used only if the encryption algorithm always produces the same ciphertext (output) for the same plaintext (input), which is true only for weaker encryption algorithms. For example, it would be easy to determine the age or sex of a person if the same value were always encrypted into the same ciphertext. To avoid this vulnerability, strong encryption algorithms are able to produce a different ciphertext for the same value.
The versions of pgcrypto
are usually compiled to use the OpenSSL library (http://www.openssl.org). If, for some reason, you don’t have OpenSSL or just don’t want to use it, it is possible to compile pgcrypto
without it, with a smaller number of supported encryption algorithms and a slightly reduced performance.
See also
- The page on
pgcrypto
in the PostgreSQL online documentation, available at http://www.postgresql.org/docs/current/static/pgcrypto.html - The OpenSSL web page, accessed at http://www.openssl.org/
- The GNU Privacy Handbook at http://www.gnupg.org/gph/en/manual.html