Exploring API building blocks
Before we can understand how to secure APIs, we need to look at the building blocks of APIs. This section covers rate limiting, cryptography, hashes and signatures, transport security, and encoding. We will not go into a lot of detail, but it is important to grasp these basics.
Rate limiting
Public APIs are exposed to the internet and can easily be discovered by adversaries. One of the simplest attacks against an API is a denial-of-service (DoS) attack, in which automation is used to repeatedly and persistently attempt to access an API. Sustained DoS attacks can exhaust server resources, causing the API to fail or, more commonly, denying access to legitimate users.
Brute-force attacks can also be used in account takeover (ATO) attacks, where either a sign-up endpoint or a password reset endpoint is flooded with attempts to guess passwords or hashes, for example using a dictionary attack (which works through a list of commonly used passwords).
Both types of attacks can be mitigated using rate limiting, which restricts repeated and frequent access from a particular IP address to a given endpoint. A rate limiter counts requests within a time window and, once the limit for that window is exceeded, rejects further requests with a 429 Too Many Requests error until the window expires.
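The windowing idea can be sketched in a few lines of Python. This is a minimal, in-memory illustration only, assuming a fixed-window strategy; the class name `FixedWindowRateLimiter`, the `check` method, and the example IP addresses are hypothetical, and a production API would normally rely on a gateway or a shared store rather than a local dictionary.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per (client, endpoint) in each window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self._windows = {}      # (client_ip, endpoint) -> (window_start, request_count)

    def check(self, client_ip, endpoint):
        """Return the HTTP status to send: 200 if allowed, 429 if rate limited."""
        now = self.clock()
        key = (client_ip, endpoint)
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0   # the previous window expired; start a new one
        if count >= self.limit:
            self._windows[key] = (start, count)
            return 429              # Too Many Requests
        self._windows[key] = (start, count + 1)
        return 200


# Example: three requests per minute are allowed, the fourth is rejected.
limiter = FixedWindowRateLimiter(limit=3, window_seconds=60)
statuses = [limiter.check("203.0.113.7", "/password-reset") for _ in range(4)]
print(statuses)
```

A fixed window is the simplest scheme to reason about; sliding-window or token-bucket variants smooth out bursts at window boundaries at the cost of slightly more bookkeeping.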
Cryptography
Cryptography is a foundational element in securing data electronically; most simply, it is a mathematical transformation applied to data. Typically, cleartext (unencrypted data) is transformed into ciphertext (encrypted data) using an algorithm and a key. The ciphertext is no longer recognizable as the original cleartext and cannot feasibly be reversed to reveal the original cleartext without applying the inverse transformation (decryption), which requires the same algorithm and the appropriate key.
The choice of algorithm depends on the application; two broad types of algorithms are used:
- Symmetric algorithm: In this type, the same key is used to encrypt and decrypt data. The benefit of symmetric ciphers is that they are fast and, with a modern algorithm, safe; however, they pose a challenge in terms of the distribution of the shared key. Common symmetric key algorithms are DES, AES, and IDEA (DES is now considered insecure and has largely been replaced by AES).
- Asymmetric algorithm: In this type, different keys are used to encrypt (using the public key) and decrypt (using the private key) data. Common asymmetric key algorithms are DSA (from the Digital Signature Standard, DSS), RSA, and ElGamal.
A fundamental challenge with cryptography is the exchange (and management) of keys between both parties. To this end, robust key-exchange protocols have been developed that allow keys to be agreed securely, preventing an eavesdropper from obtaining them in transit. Diffie-Hellman key exchange is one of the most widely used of these protocols.
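To make the key-exchange idea concrete, the following sketch walks through a Diffie-Hellman exchange with deliberately tiny numbers. It is a toy under stated assumptions: the parameters `P = 23` and `G = 5` are a textbook example and far too small to be secure, and the function names are hypothetical. Real systems use vetted libraries with much larger groups or elliptic curves.

```python
import secrets

# Toy public parameters (assumed for illustration; insecure at this size).
P = 23  # prime modulus, known to everyone
G = 5   # generator, known to everyone


def generate_keypair():
    """Pick a random private exponent and derive the matching public value."""
    private = secrets.randbelow(P - 3) + 2   # random value in [2, P - 2]
    public = pow(G, private, P)              # G^private mod P
    return private, public


def shared_secret(own_private, peer_public):
    """Combine our private value with the peer's public value."""
    return pow(peer_public, own_private, P)


# Each party keeps its private value and sends only the public value.
alice_private, alice_public = generate_keypair()
bob_private, bob_public = generate_keypair()

# Both sides arrive at the same secret without ever transmitting it.
assert shared_secret(alice_private, bob_public) == shared_secret(bob_private, alice_public)
```

An eavesdropper sees only `P`, `G`, and the two public values; recovering the shared secret from those requires solving a discrete logarithm, which is infeasible for properly sized parameters.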
Cryptography provides the following benefits:
- Authentication: By using public-key cryptography, it is possible to verify the identity of the originating party by using their public key to confirm they signed a message with their private key, which only they can access. By using certificates, it is possible to verify the validity (or trust) associated with public keys – this is the foundation of Transport Layer Security (TLS).
- Nonrepudiation: Using cryptography principles, transactions or documents can be audited to verify which parties took part in them. Because a signature can only be produced with the signer's private key, a party cannot later deny having sent or acknowledged a message; typically, this is used for bank transactions or document signatures.
- Confidentiality: The most obvious advantage of cryptography is to ensure that data is kept private, both in transit and at rest in storage. Only persons in possession of a valid key can decrypt and access the data.
- Integrity: Finally, cryptography can be used to ensure the integrity of data, verifying that it has not been modified in transit. By transmitting a fingerprint of the data along with it, the receiver can verify that the received data is the same that was transmitted by re-calculating the fingerprint and validating it against the one received.
Hashes, HMACs, and signatures
An important application of cryptography principles relates to ensuring the integrity of messages in transit.
Hashes are the most elementary technique, in which a block of data is passed through an algorithm to produce a digest of the data; typically this is a fixed-length string much shorter than the input data. Common hashing algorithms include SHA-2 and MD5 (MD5 is no longer considered collision-resistant and should be avoided for security purposes). Key properties of hash functions are that they are one-way, or irreversible (the input cannot feasibly be obtained from the digest), and collision-resistant (it is computationally infeasible to find two blocks of data that produce the same digest). Hashes are used to verify the integrity of data.
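As a brief illustration of integrity checking with a hash, the sketch below uses SHA-256 (a member of the SHA-2 family) from Python's standard `hashlib` module. The helper names `fingerprint` and `is_intact` and the sample payload are assumptions made for this example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_digest: str) -> bool:
    """Recompute the digest and compare it with the one that was supplied."""
    return fingerprint(data) == expected_digest


payload = b'{"amount": 100, "currency": "EUR"}'
digest = fingerprint(payload)            # sent or published alongside the payload

print(is_intact(payload, digest))                                   # unchanged data
print(is_intact(b'{"amount": 900, "currency": "EUR"}', digest))     # modified data
```

Note that a plain hash only detects accidental or unauthenticated changes: an attacker who can modify the data can usually replace the digest as well.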
Hash-based Message Authentication Codes (HMACs) are similar to hashes in that they produce a digest; however, a secret key shared by the sender and the recipient is mixed into the hash computation. The digest is passed to the recipient along with the data. If the recipient has the correct key, they can recompute the digest and compare it with the one received, verifying the integrity of the data and also the authenticity of the sender (via their shared key).
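A minimal sketch of the HMAC flow, using Python's standard `hmac` and `hashlib` modules, might look as follows. The function names `make_tag` and `verify_tag`, the key, and the message are illustrative assumptions.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the message using the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"example-shared-secret"      # assumed to be agreed out of band
message = b"POST /transfer amount=100"
tag = make_tag(shared_key, message)        # sender attaches this to the request

print(verify_tag(shared_key, message, tag))                        # genuine message
print(verify_tag(shared_key, b"POST /transfer amount=900", tag))   # tampered message
print(verify_tag(b"wrong-key", message, tag))                      # wrong key
```

Because both parties hold the same key, either of them could have produced the tag, which is why an HMAC proves authenticity between the two parties but not non-repudiation.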
Signatures are the final piece of the puzzle. Similar to HMACs, they bind a key to the digest; however, in this case, an asymmetric algorithm is used. The sender signs the digest with their private key, and at the receiver end, the public key of the sender is used to verify the signature and, with it, the integrity of the data. Because only the sender holds the private key, a valid signature also provides non-repudiation. Using robust principles of public key infrastructure (PKI), public keys can be trusted (their ownership can be verified).
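The sign-with-private-key, verify-with-public-key flow can be illustrated with a toy RSA example. This is a sketch for intuition only: the primes are tiny textbook values, the digest is reduced to fit the toy modulus (something never done in practice), there is no padding, and all names are hypothetical. Real signatures should come from a vetted cryptographic library.

```python
import hashlib

# Toy RSA key built from tiny primes (assumed for illustration; trivially breakable).
P, Q = 61, 53
N = P * Q                              # public modulus
E = 17                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (requires Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and squeeze the digest into the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: transform the digest with the private exponent."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: undo the transformation with the public exponent and compare."""
    return pow(signature, E, N) == digest_as_int(message)


document = b"I agree to the terms."
signature = sign(document)
print(verify(document, signature))               # valid signature
print(verify(document, (signature + 1) % N))     # altered signature is rejected
```

Anyone holding the public values `N` and `E` can verify the signature, but only the holder of `D` can create it.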
The following table summarizes the differences between the three types:
Objective | Hash | HMAC | Signature |
Integrity | Yes | Yes | Yes |
Authenticity | No | Yes | Yes |
Non-repudiation | No | No | Yes |
Table 1.1 – A comparison of digest types
Transport security
The TLS protocol is a transport-level cryptographic protocol that ensures secure communications over a TCP/IP network. An encrypted transport layer is essential for APIs to ensure that attackers are unable to eavesdrop on data or tokens over the network and to ensure that the client can validate the identity of the server (via certificate validation). Certificate management has traditionally presented challenges to organizations; however, with the emergence of providers such as Let’s Encrypt, certificate deployment and management have become a lot simpler.
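On the client side, certificate validation is usually a matter of using the platform defaults and not switching them off. The sketch below, using Python's standard `ssl` module, builds a client context that verifies the server certificate and hostname; the function name `make_client_context` and the choice of TLS 1.2 as a minimum version are assumptions for this example.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that validates the server's certificate."""
    # The default context verifies the certificate chain against the system
    # trust store and checks that the certificate matches the hostname.
    context = ssl.create_default_context()
    # Refuse legacy protocol versions (assumed policy for this example).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)   # certificate must be valid
print(context.check_hostname)                     # hostname must match
```

Such a context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=context)`. Disabling `check_hostname` or setting `verify_mode` to `ssl.CERT_NONE` removes exactly the server identity guarantee described above.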
Encoding
The final building block is that of encoding, which involves changing the representation of data for the purposes of storage or transmission. Encoding converts the character set of input data to a format that can be safely stored or transmitted, and decoding converts that data back to its original format.
This concept is best understood by looking at a few common encoding schemes:
- HTML encoding: In HTML, certain characters have special significance, for example, < and >. If a text block contains these characters, it will change the structure of the rendered HTML, which is undesirable. By encoding these special characters in another format ("&lt;" and "&gt;"), they can be safely rendered in an HTML document, where they will be displayed correctly as < and > but stored in a different form.
- URL encoding: Similarly, in a URL, only a limited subset of the ASCII character set is allowed; all other characters must be encoded. Unfortunately, path locations may contain such characters (spaces and non-ASCII characters, for example). By encoding these to an ASCII text representation, it is possible to get a valid URL version; for example, a space is converted to %20.
- ASCII, UTF-8, and Unicode: Text can be represented in a number of different formats and can be converted from one to the other, depending on the platforms and locales in use.
- Base64: This is a commonly used encoder to transform binary data to text data suitable for transmission over HTTP.
Encoding does not use a key to perform the transformation but, rather, a fixed algorithm, and any content that has been encoded can be decoded to produce exactly the same original content.
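The encoding schemes listed above can be tried out with Python's standard library. The sample strings are arbitrary; the point of the sketch is that each transformation needs no key and round-trips exactly.

```python
import base64
import html
from urllib.parse import quote, unquote

# HTML encoding: special characters become entities.
text = "<b>1 < 2</b>"
encoded_html = html.escape(text)            # "&lt;b&gt;1 &lt; 2&lt;/b&gt;"
assert html.unescape(encoded_html) == text

# URL encoding: characters that are not allowed are percent-encoded.
path = "annual report.pdf"
encoded_path = quote(path)                  # "annual%20report.pdf"
assert unquote(encoded_path) == path

# Base64: arbitrary binary data becomes ASCII text.
blob = bytes([0, 255, 16])
encoded_blob = base64.b64encode(blob).decode("ascii")   # "AP8Q"
assert base64.b64decode(encoded_blob) == blob
```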
Encoding versus encryption
A common source of confusion is the difference between encoding and encryption. They are two very different techniques that solve different problems.
Encoding transforms data from one representation to another using a fixed algorithm. No keys are used, and the encoded data can be trivially converted back to the original format. It does not offer any form of integrity or confidentiality functions.
Encryption performs a transformation of data using a key; the resultant output does not resemble the input at all, and the only way the original data can be obtained is by applying the reverse decryption function using the correct key. Encryption does not transform the representation or character set of the data.
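The contrast can be demonstrated in a few lines. Python's standard library does not include a cipher such as AES, so this sketch substitutes a one-time pad (XOR with a random key as long as the message) purely to show the role of the key; the helper name `xor_bytes` and the sample token are assumptions, and this is not a recommendation for protecting real data.

```python
import base64
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """One-time-pad style transform: XOR each data byte with a key byte."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


token = b"api-token-123"

# Encoding: no key involved, so anyone can reverse it.
encoded = base64.b64encode(token)
assert base64.b64decode(encoded) == token

# Encryption: the original is recoverable only with the key.
key = secrets.token_bytes(len(token))
ciphertext = xor_bytes(token, key)
assert xor_bytes(ciphertext, key) == token
```

Base64-encoding a credential therefore offers no confidentiality at all, whereas the ciphertext is useless to anyone who does not hold `key`.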