Back to our web world, there are some notable API protocols:
- Simple Object Access Protocol (SOAP): This allows access to objects, maintains communication using HTTP, and is based on Extensible Markup Language (XML). It is simple and presents a good way to establish communications between web applications, as it is OS-independent and agnostic about technologies and programming languages.
- REST: Maybe one of the most famous web API protocols in use nowadays, REST is an architectural style to design web services. Therefore, the services that follow such a style are said to be RESTful. The predefined set of REST operations is stateless, and the services have access to constructs to manipulate text-based representations of the data.
- Google Remote Procedure Call (gRPC): Developed by the company behind the search engine, it is another HTTP-based architecture that happens to be open source. It uses protocol buffers to serialize the data transmitted between peers.
- JavaScript Object Notation – Remote Procedure Call (JSON-RPC): Just like REST, JSON-RPC is also stateless, uses objects (like SOAP), and can be applied instead of REST when higher performance is necessary.
- Graph Query Language (GraphQL): It was created by Meta (previously Facebook) and designed as a query language for APIs. GraphQL is open source and allows for complex responses by using simple data structures such as JSON.
Let’s analyze each one of them in more depth.
SOAP
Since SOAP is based on objects, both peers in a conversation must agree on which elements they will use to exchange information. SOAP messages are regular XML documents built from the following elements:
- Body: It keeps information about the call and the response. It is mandatory.
- Envelope: This identifies the document as a SOAP message. It is mandatory and is the root element.
- Fault: It carries information about errors and status. It is optional.
- Header: As the name implies, it holds header information. It is optional.
Although SOAP messages must use XML as their structure, such documents cannot contain processing instructions or Document Type Definitions (DTDs). A DTD declares which elements and attributes an XML document may contain. The SOAP 1.1 specification had three parts:
- The envelope, where the contents of the message are defined, the responsible structures that should handle it, and a specification if it is mandatory or optional.
- The encoding rules that define the mechanism to be used when serializing the datatype.
- The RPC representation that indicates how to represent remote calls and their responses.
The SOAP 1.2 specification has only two parts:
- The message envelope.
- The data model and protocol bindings.
In terms of organizational structure, SOAP messages rely on XML namespaces. The root element is the SOAP envelope. The Header and Body elements are inside of it, and an eventual Fault element is carried inside the Body. All SOAP 1.2 envelopes must specify the http://www.w3.org/2003/05/soap-envelope Uniform Resource Identifier (URI) as their namespace. The encodingStyle attribute may appear to indicate which encoding schema is used inside the message. The envelope declaration would look something like this:
<soap:Envelope
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
soap:encodingStyle="http://www.w3.org/2003/05/soap-encoding">
A header in a SOAP message is optional, but if one is present, it must be at the beginning of the message, just after the Envelope declaration. Its purpose is to store data that is specific to the application, such as payment information or an Authentication (AuthN) mechanism. Each header block can carry attributes such as env:role, env:mustUnderstand, and env:relay. The first one defines which role (that is, which node on the message path) the header block is addressed to. The second one is a Boolean attribute. When true, the recipient of the message must process the header block; if it cannot, a fault is generated. Finally, env:relay matters to relays (intermediary nodes): when true, an intermediary that is targeted by the header block but does not process it must forward the block to the next node. It is a new feature of the SOAP 1.2 specification. An example header with two blocks could look like this (the tags were wrapped in multiple lines to facilitate reading):
<env:Header>
<BA:BlockA xmlns:BA="http://mysoap.com"
env:role="http://mysoap.com/role/A" env:mustUnderstand="true">
...
</BA:BlockA>
<BB:BlockB xmlns:BB="http://mysoap.com"
env:role="http://mysoap.com/role/B" env:relay="true">
...
</BB:BlockB>
</env:Header>
In this example, block A has a mustUnderstand attribute that is true, which means that the recipient must process it. Block B has the env:relay attribute set to true, so an intermediary node that does not process it must pass it along. Both blocks have role specifications.
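To make the envelope layout more concrete, the following is a minimal Python sketch that builds a SOAP 1.2 envelope with the standard library and then lists the header blocks flagged with mustUnderstand. The Ping payload element and the helper names are assumptions made for illustration only; the namespaces are the ones used in the examples above.

```python
import xml.etree.ElementTree as ET

ENV_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://mysoap.com"  # application namespace taken from the header example

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("env", ENV_NS)
ET.register_namespace("BA", APP_NS)


def build_envelope() -> ET.Element:
    """Build an envelope with one header block and a body."""
    envelope = ET.Element(f"{{{ENV_NS}}}Envelope")
    # The header, when present, is the first child of the envelope.
    header = ET.SubElement(envelope, f"{{{ENV_NS}}}Header")
    block_a = ET.SubElement(header, f"{{{APP_NS}}}BlockA")
    block_a.set(f"{{{ENV_NS}}}role", "http://mysoap.com/role/A")
    block_a.set(f"{{{ENV_NS}}}mustUnderstand", "true")
    body = ET.SubElement(envelope, f"{{{ENV_NS}}}Body")
    ET.SubElement(body, f"{{{APP_NS}}}Ping")  # hypothetical payload element
    return envelope


def must_understand_blocks(envelope: ET.Element) -> list:
    """Return the tags of header blocks the recipient is obliged to process."""
    header = envelope.find(f"{{{ENV_NS}}}Header")
    if header is None:
        return []
    attribute = f"{{{ENV_NS}}}mustUnderstand"
    return [block.tag for block in header if block.get(attribute) == "true"]


envelope = build_envelope()
xml_text = ET.tostring(envelope, encoding="unicode")
print(xml_text)
print(must_understand_blocks(envelope))
```

A real SOAP stack would also validate the message and dispatch on the body contents; the sketch only shows how the envelope, header, and body nest.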
XML Protocol (XMLP) was another XML-based message-exchanging protocol that was in the spotlight until 2009, two years after the second edition of the SOAP 1.2 specification was released. XMLP proposed an abstract model, whereas SOAP details the primitives to allow for the practical application of this model. SOAP and XMLP share the concept of a binding, which determines the underlying protocol that carries the messages. One of the most popular bindings for SOAP (if not the most popular) is HTTP. This means that SOAP messages can be, and in practice are, used to let peers communicate over HTTP.
REST
The predefined set of REST operations is stateless (as is also the case with XMLP), and the services have access to constructs to manipulate text-based representations of the data. While SOAP and XMLP have bindings that allow both to connect to other application-layer protocols and even to the transport layer (TCP or UDP), REST is closely tied to HTTP (also stateless). Reusing HTTP constructs reduces the learning curve for developers and sysadmins who are already used to HTTP terms. While using HTTP, all the protocol’s methods are available with REST: CONNECT, DELETE, GET, HEAD, OPTIONS, PATCH, POST, PUT, and TRACE. REST was used to guide the design of the HTTP version 1.1 specification.
Intermediary nodes may be present, which, in the case of REST, take the form of gateways such as cache or proxy servers, or even firewalls. Those nodes help the architecture scale, since requests are stateless and explicit cache information can be inserted into the responses. According to Roy Fielding’s dissertation, there are six constraints that determine whether a system can be categorized as RESTful. They are as follows:
- Client-server: Although there might be intermediary nodes, the communication usually happens between two peers only.
- Stateless: The server stores no session state between requests; each request carries all the information needed to process it. The session state must be managed by the client. As the server does not track state, this grants scalability to the architecture.
- Cache: Intermediary nodes can present themselves as cache servers. The server labels which content can be cached, and this is respected by the client.
- Uniform interface: Using generality, the architecture becomes simpler, which improves the visibility of interactions.
- Layered system: Through the adoption of a hierarchy, each layer only has visibility to the layers it directly interacts with, which allows for the encapsulation of legacy services.
- Code-on-demand: Client functionality can be extended through the download and execution of additional code from the server, which simplifies the client design. This is the only optional constraint.
The heart of any REST-based design is the state transfer operations. They are universal to any retrieval or storage system, and the acronym that encompasses them is Create, Read, Update, Delete (CRUD). There are direct associations between those operations and HTTP verbs (or commands). Create relates to POST, Read relates to GET, Update relates to PUT, and Delete relates to DELETE (HTTP verbs are usually represented in technical literature with all capital letters).
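The CRUD-to-verb association can be sketched in a few lines of Python. The snippet below only builds request objects with the standard library and never sends them; the URL and the build_request helper are hypothetical and exist only for illustration.

```python
from urllib.request import Request

# Association between CRUD operations and HTTP verbs described in the text.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation: str, url: str, body: bytes = None) -> Request:
    """Return an unsent HTTP request whose verb matches the CRUD operation."""
    method = CRUD_TO_HTTP[operation]
    return Request(url, data=body, method=method)


# Hypothetical resource URL, used only to show the mapping.
request = build_request(
    "update", "http://api.example.com/students/100", b'{"name": "Ana"}'
)
print(request.get_method(), request.full_url)
```

Keeping the mapping in one dictionary makes it easy to see that the resource URL stays the same while only the verb changes from one operation to another.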
Despite the similarities, some notable differences exist between REST and SOAP. They are especially related to how remote invocations (RPCs) are done. With REST, a client locates a resource in a server and chooses what to do with it (change it, delete it, or get info about it, which could be mapped to the PUT, DELETE, and GET HTTP methods, respectively). With SOAP, there is no direct interaction with a resource. Instead, the client needs to call a service and the service, in turn, does all the required actions with related objects and resources.
To work around this limitation, SOAP leverages some frameworks that give additional capabilities to the clients. One of those frameworks is Web Services Description Language (WSDL), a World Wide Web Consortium (W3C) recommendation from 2007. By describing the messages a service accepts, such as getTermRequest, and the types of their parts, such as string, WSDL lets clients discover how to call a SOAP web service.
We need to understand why REST largely displaced SOAP in the modern web API landscape. One of the points that counted in favor of REST when compared to SOAP was that SOAP is based on XML. This language can produce quite complex and verbose documents that need to be correctly crafted by the sender and parsed by the receiver. Parsing an XML document (or structure) means reading it and transforming its elements into some data structure that can be further handled by the application. One of the most well-known parsing models is called Document Object Model (DOM). One drawback of using DOM is its high memory consumption, which might be many times bigger than the size of the document itself.
In computer science, data serialization is the activity of transforming abstract objects (or elements) present in data structures into something that can be stored on or transferred between computers. Deserialization means the opposite. Data serialization becomes more complex as nesting is used in documents. XML allows element nesting. There is no formal limit for this in the XML specification, which essentially means that elements can be nested to an arbitrary depth. Complexity may raise security threats. Through the parsing of an XML document, an application could store its elements in a Structured Query Language (SQL) database, translating them to tables, rows, and columns, or even as Key-Value (KV) pairs in a NoSQL database. Accepting serialized objects from unknown or untrusted sources therefore exposes the application to unnecessary risk.
Open Web Application Security Project (OWASP) is a global organization that regularly releases cyber security best practices, including secure code development, and maintains some notable security projects. One of them is Top Ten, which lists the ten most critical security risks to web applications. In the 2021 edition, insecure deserialization falls under the A08:2021 Software and Data Integrity Failures group, which means that it is part of the eighth-ranked risk category.
Under the same project, ranked fifth, is the A05:2021 Security Misconfiguration group, which includes the XML External Entities (XXE) attack. If an XML document makes use of DTDs, a poorly configured XML parser may resolve external entities declared in it and read or fetch data it should not. A DTD was the first way to specify the structure of an XML document, and it can also declare entities, which are named pieces of content that the parser expands in place.
With the usage of DTDs, a vulnerable XML parser might be the victim of a Denial of Service (DoS) attack called an XML bomb (also known as a billion laughs attack). The attacker declares ten DTD entities, where each entity after the first consists of ten references to the previous one. Expanding the last entity therefore yields one billion (10^9) copies of the first entity. As previously explained, to accommodate all of them in memory, the XML parser needs to allocate a considerable amount of memory, eventually crashing and making the application unavailable.
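The arithmetic behind the attack is easier to appreciate with numbers. The Python sketch below builds the entity declarations as plain strings and computes how large the expansion would be, without ever handing the text to an XML parser. The entity names (e0 to e9) and the helper functions are assumptions made for illustration.

```python
def build_entity_declarations(entities: int = 10, fanout: int = 10, base: str = "lol") -> list:
    """Return DTD entity declarations where each entity references the previous one."""
    declarations = [f'<!ENTITY e0 "{base}">']
    for level in range(1, entities):
        references = f"&e{level - 1};" * fanout
        declarations.append(f'<!ENTITY e{level} "{references}">')
    return declarations


def expanded_copies(entities: int = 10, fanout: int = 10) -> int:
    """Copies of the first entity produced by fully expanding the last one."""
    # The first entity is the base string, so there are (entities - 1) nesting levels.
    return fanout ** (entities - 1)


declarations = build_entity_declarations()
payload_size = sum(len(line) for line in declarations)
copies = expanded_copies()
expanded_size = copies * len("lol")

print(f"{len(declarations)} declarations, {payload_size} characters sent by the attacker")
print(f"{copies:,} copies of the base string, {expanded_size:,} characters after expansion")
```

The contrast between the tiny payload and the expanded size is the reason parsers that handle untrusted input are usually configured to reject DTDs or to limit entity expansion.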
REST APIs, on the other hand, are primarily based on JSON data structures. Those are simpler documents organized as maps that leverage the concept of KV pairs. JSON does not require a heavyweight parser, and many programming languages can read it with their standard libraries. It supports different types of data, such as strings, Booleans, numbers, arrays, and objects. JSON files are also usually smaller than their XML equivalents. JSON does not support comments. JSON structures are therefore more compact, as well as easy to craft and process. The code block that follows contains an example of a JSON structure:
{
"config_file": "apache.conf",
"number_of_replicas": 2, "active": true,
"host_names": [
"server1.domain", "server2.domain"
]
}
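As a quick illustration of how little effort JSON parsing takes, the following Python sketch loads the structure above with the standard json module and shows which native type each value becomes. The variable names are chosen for this example only.

```python
import json

document = """
{
  "config_file": "apache.conf",
  "number_of_replicas": 2, "active": true,
  "host_names": [
    "server1.domain", "server2.domain"
  ]
}
"""

# Deserialization: JSON text becomes native Python objects.
config = json.loads(document)
types = {key: type(value).__name__ for key, value in config.items()}
print(types)

# Serialization: the same data goes back to a compact JSON string.
round_trip = json.dumps(config, sort_keys=True)
print(round_trip)
```

The JSON object maps to a dictionary, the array to a list, and the Boolean and number to their Python counterparts, so the application can use the data without any intermediate document model.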
gRPC
The core idea of gRPC is to let you, a developer, invoke a remote method (located on your colleague’s computer or on the other side of the world) as if it were in your codebase itself. In other words, a client (or stub, as it is referred to inside the specification) calls a function, with its expected parameters, but that function is not even inside its code. It is implemented somewhere else. To make this work, you need to follow definitions established by the server side of the gRPC invocation. Such definitions include the acceptable data types and what the methods return when their invocations end. Everything is based on creating a service that will leverage such methods to provide data to clients.
Another interesting part of gRPC is the support of modern programming languages, which allows you to split the development efforts among your team, with, for example, the Go programmers being responsible for the server and the Python programmers being occupied with building the client. As the protocol was created by Google, a gRPC server can also be hosted on the company’s public cloud.
There is one major difference between gRPC and the other two protocols already covered: it uses protocol buffers, although it can also be configured to work with other data formats, such as JSON. Protocol buffers is a data serialization technology created by Google in which you define the data structures you are going to use in your applications and, by running the protoc protocol buffer compiler, generate the corresponding classes for your code. The data structures are stored in text files with the .proto extension. In a .proto file, you create a service and define the messages that will flow between the client and server. When you run protoc, it creates or updates the corresponding classes. The code block that follows shows an example of a file like this:
syntax = "proto3";
service MyService {
  rpc ProcessFile (FileRequest) returns (ExitCode);
} // Comments are supported.
message FileRequest {
  string FileName = 1;
}
message ExitCode {
  int32 code = 1;
}
In the preceding code, you are creating a service called MyService that exposes a method called ProcessFile. The client side of your application invokes this method with a FileRequest message and receives an ExitCode message as the output. The method itself is implemented on the server portion of your application. As per the definition of gRPC, client and server portions can be on separate machines. Service methods can be of four different types:
- Unary: The client sends a single request and waits for a single response.
- Server Streaming: The client sends a request, and the response is returned as a stream of messages. The messages are sent in sequence.
- Client Streaming: The client sends a sequence of messages and waits for a single response from the server.
- Bidirectional Streaming: Both sides send sequences of messages.
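The four interaction shapes can be pictured with plain Python functions and generators. The sketch below does not use the gRPC library at all; the function names and the string transformations are hypothetical and only mimic who sends one message and who sends a stream.

```python
from typing import Iterable, Iterator


def unary(request: str) -> str:
    """One request in, one response out."""
    return request.upper()


def server_streaming(request: str) -> Iterator[str]:
    """One request in, a sequence of responses out, in order."""
    for word in request.split():
        yield word


def client_streaming(requests: Iterable[str]) -> str:
    """A sequence of requests in, one response out."""
    return " ".join(requests)


def bidirectional(requests: Iterable[str]) -> Iterator[str]:
    """A sequence of requests in, a sequence of responses out."""
    for request in requests:
        yield request.upper()


print(unary("ok"))
print(list(server_streaming("a b c")))
print(client_streaming(iter(["a", "b"])))
print(list(bidirectional(iter(["a", "b"]))))
```

In this simplified bidirectional case, each response is produced as soon as its request arrives; in a real service the two streams are independent and either side may send at any time.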
It is interesting to realize how gRPC also works as a Software Development Kit (SDK). This means that the package has some software development support foundations that can be leveraged to design and deploy applications. It is not only a protocol per se but also a toolbelt to help you create your applications, led by the protoc compiler. In Python, the compiler is distributed as a package that can be installed with the Package Installer for Python (PIP).
JSON-RPC
As we’ve introduced, JSON-RPC can replace REST when performance is an important factor. One characteristic of this protocol is that a client can send a request with no need to wait for a server response (such a request is called a notification). Another feature allows clients to send multiple requests to the server, and the server may return the responses out of the original requested order. In other words, the server’s responses arrive asynchronously.
The current specification is 2.0 and it is not fully compatible with the previous one (1.0). JSON-RPC 2.0 request and response objects may not be correctly understood when the client and server are not running the same version of the protocol, although it is easy to identify the 2.0 specification, since it uses a jsonrpc key whose value is the string 2.0. All JSON primitives (strings, numbers, Booleans, null) and structures (arrays and objects) are fully supported.
There is a strict syntax (remember when we started talking about API definitions?) that must be respected when sending requests and receiving responses. The following are possible members of a request:
- jsonrpc: This contains 2.0 when this is the specification in use.
- method: String containing the name of the remote method to be invoked.
- params: Optional member that’s structured (either an array or object) and contains parameters to be passed to the invoked method.
- id: Optional member that can be a string, number, or null and contains the identification of the request. A request without an id is a notification, for which the server sends no response.
Likewise, there is a definition for the response structure. Its members are as follows:
- jsonrpc: Same description as for the request.
- result: Exists only when the method was successfully invoked; the contents are provided by the invoked method.
- error: Only exists when the method is not successfully invoked; this is an object member that carries a numeric code and a message describing the failure.
- id: Same description as for the request; it needs to carry the same value as the one specified in the request.
The error object has its own structure. You can easily realize another difference between REST and JSON-RPC. The operation is not expressed through HTTP methods, such as GET, PUT, or POST. Instead, a simple JSON structure names the method to be called. Another difference lies in the response. Where REST can use JSON or XML formats, JSON-RPC only supports JSON. For error handling, you just saw that JSON-RPC has its own error member. REST provides HTTP status codes, such as 200 (OK), 404 (Not Found), or 500 (Internal Server Error). Caching is supported by REST but not by JSON-RPC, and finally, JSON-RPC is simpler than REST simply because it only supports the request and response JSON structures. The code block that follows shows examples of requests and responses. A method called IsStudent is invoked to return true or false depending on whether a provided numeric enrollment id belongs to a registered student. The first request succeeds, while the second request generates an error:
{"jsonrpc": "2.0", "method": "IsStudent", "params": [100], "id": 1}
{"jsonrpc": "2.0", "result": true, "id": 1}
{"jsonrpc": "2.0", "method": "IsStudent", "params": ["ABC"], "id": 2}
{"jsonrpc": "2.0", "error": {"code": -1, "message": "Invalid enrollment id format"}, "id": 2}
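The exchange above can be reproduced with a small dispatcher. The following Python sketch is one possible way to handle such requests on the server side; the set of registered ids, the handle function, and the lookup table are assumptions made for illustration, and only positional (array) parameters are handled.

```python
import json

REGISTERED_IDS = {100, 200}  # hypothetical enrollment records


def is_student(enrollment_id) -> bool:
    """Tell whether a numeric enrollment id belongs to a registered student."""
    if not isinstance(enrollment_id, int) or isinstance(enrollment_id, bool):
        raise ValueError("Invalid enrollment id format")
    return enrollment_id in REGISTERED_IDS


# Names exposed to clients, mapped to the functions that implement them.
METHODS = {"IsStudent": is_student}


def handle(raw_request: str) -> str:
    """Process one JSON-RPC 2.0 request and return the serialized response."""
    request = json.loads(raw_request)
    response = {"jsonrpc": "2.0", "id": request.get("id")}
    method = METHODS.get(request.get("method"))
    if method is None:
        # -32601 is the code the 2.0 specification reserves for unknown methods.
        response["error"] = {"code": -32601, "message": "Method not found"}
        return json.dumps(response)
    try:
        response["result"] = method(*request.get("params", []))
    except ValueError as exc:
        # Application-defined error code, matching the example in the text.
        response["error"] = {"code": -1, "message": str(exc)}
    return json.dumps(response)


print(handle('{"jsonrpc": "2.0", "method": "IsStudent", "params": [100], "id": 1}'))
print(handle('{"jsonrpc": "2.0", "method": "IsStudent", "params": ["ABC"], "id": 2}'))
```

Notice that every response carries either a result or an error member, never both, and echoes the id of the request it answers.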
GraphQL
GraphQL, as the name implies, is a language to allow querying data served by an API. Wait a moment! This is inside a subsection on protocols. What is a language doing here? A generic definition of protocol could be “a set of rules that need to be properly followed to allow the successful establishment of communication between two or more peers.” GraphQL implements this as well.
It was created by Meta (then Facebook) in 2012 and released as an open source project in 2015. Later, in 2018, the project moved to the newly created GraphQL Foundation, which is hosted by the Linux Foundation. One notable feature is the fact that a single endpoint is exposed, making it easier for developers to request and receive the desired data. Other API protocols may eventually expose multiple endpoints to fulfill the needs of providing different types of data, or data spread in various databases or systems.
The data formats are also like JSON with some slight changes. There is a significant difference between GraphQL and REST. Rather than making requests, fetching the results, and adjusting the requests after analyzing the results to then submit new requests, with GraphQL, the application can interactively change the request until the received results are satisfactory. This can be supported by WebSockets, a technology that allows continuous bidirectional communication between a client and a server: after an initial HTTP handshake, both sides send and receive data, and either side can close the connection.
Since either side, client or server, can send data to the other at any time, WebSockets is also useful for sending notifications, especially from server to client, while the connection is still open. One possible application for this protocol is a currency exchange website. A client queries the server for the rate once. Every time the rate changes, the server notifies the client of the new rate. GraphQL also supports query parameters. You can filter results based on a criterion or ask the server to make data conversions or calculations all in the same query. The code block that follows shows an example of a request:
{
  student(id: 100) {
    name
    grade(average: true)
  }
}
The preceding code queries the server for a student whose id is 100. The client wants the student’s name and their grade, but only the average grade (calculated over the course modules), not the grade itself (average: true). A possible answer is in the code block that follows. Observe that responses in GraphQL follow the structure of the request:
{
  "data": {
    "student": {
      "name": "Mauricio Harley",
      "grade": 85.2128
    }
  }
}
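The idea that a response mirrors the shape of the request can be sketched in Python. This is not a GraphQL implementation; it only projects a stored record onto the fields a client selected. The select helper, the selection dictionary, and the extra email field are hypothetical.

```python
def select(record: dict, selection: dict) -> dict:
    """Keep only the requested fields, recursing into nested selections."""
    result = {}
    for field, sub_selection in selection.items():
        value = record[field]
        # A nested selection means the client asked for specific sub-fields.
        result[field] = select(value, sub_selection) if sub_selection else value
    return result


# Data held by the server; the email field is never requested by the client.
record = {
    "student": {
        "name": "Mauricio Harley",
        "grade": 85.2128,
        "email": "student@example.com",
    }
}

# Shape of the query: the student object with its name and grade only.
selection = {"student": {"name": None, "grade": None}}

response = {"data": select(record, selection)}
print(response)
```

Only the selected fields travel back to the client, which is why a single endpoint can serve many different data needs.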
GraphQL data structures have a schema. This way, when designing queries, a developer will know the possible types of data that could be returned in a response in advance. It is useful to know that a single query may generate a list of items as a response with not much effort, considering the schemas have been properly set.