Examining API data formats
Finally, in this section, let’s take a quick look at common data formats used in APIs. For REST APIs, information is transferred as plain text (although this text may be encoded) in one of three places: as key-value pairs in request parameters, in one or more headers, or in an optional request body. Responses consist of a status code, headers, and an optional response body.
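To make these request components concrete, here is a minimal Python sketch that builds (but does not send) a request using only the standard library. The host api.example.com, the /v1/pets path, and the parameter and body values are hypothetical and chosen purely for illustration:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, for illustration only; nothing is sent over the network.
BASE_URL = "https://api.example.com/v1/pets"

# 1. Key-value pairs carried as request (query) parameters.
query = urlencode({"limit": 10, "status": "available"})

# 2. An optional request body: text (here JSON) encoded to bytes.
body = json.dumps({"name": "Rex"}).encode("utf-8")

# 3. Headers describing the body and the response format we would accept.
request = Request(
    url=f"{BASE_URL}?{query}",
    data=body,
    method="POST",
    headers={"Content-Type": "application/json", "Accept": "application/json"},
)

print(request.get_method())
print(request.full_url)
print(request.header_items())
print(request.data)
```

Everything in the request is ultimately text: the parameters are URL-encoded into the query string, and the body is a JSON string encoded as UTF-8 bytes.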
XML
eXtensible Markup Language (XML) is the original heavyweight format for internet data storage and transmission. The format is designed to be agnostic of data type, separates data from presentation, and is of course extensible, not being reliant on any strict schema definition (unlike HTML, which uses fixed tags and keywords).
Although XML was dominant several years ago, it suffered from some significant drawbacks, namely complexity and large data payloads. These two factors make it difficult to process and parse XML on resource-limited systems. XML is still encountered, although much less so in APIs.
A simple example of XML shows the basic structure of tags and values:
<note>
  <to>Colin</to>
  <priority>High</priority>
  <heading>Reminder</heading>
  <body>Learn about API security</body>
</note>
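As a rough sketch of what processing this document involves, the following Python snippet parses the note with the standard library's xml.etree.ElementTree module and flattens it into a dictionary. The variable names are our own, and the approach assumes trusted input; for untrusted XML, a hardened parser would be a safer choice:

```python
import xml.etree.ElementTree as ET

# The note document from the example above.
NOTE_XML = """
<note>
  <to>Colin</to>
  <priority>High</priority>
  <heading>Reminder</heading>
  <body>Learn about API security</body>
</note>
"""

# Parse the text into an element tree; the root element is <note>.
root = ET.fromstring(NOTE_XML.strip())

# Each child element contributes one tag/value pair.
note = {child.tag: child.text for child in root}

print(root.tag)
print(note)
```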
JSON
JavaScript Object Notation (JSON) is now the dominant transfer format for data over HTTP, particularly in REST APIs. JSON originated as a lightweight alternative to the more heavyweight XML format and is particularly efficient in its use of transmission bandwidth and client-side processing.
Data is represented by key-value pairs, with number, string, Boolean, and null data types supported. Keys are delimited with double quotes, as are strings. Records (objects) can be nested, and array data is supported. Comments are not permitted in JSON data.
A simple example of JSON shows the key-value pair structure:
{ "name": "Colin", "age": 52, "car": null }
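The following short Python sketch, using only the standard json module, parses this record and shows how the JSON types map onto native types. The helper is_valid_json is our own illustrative name, not a library function; it is used to show that comments and single-quoted keys are rejected:

```python
import json

RAW = '{ "name": "Colin", "age": 52, "car": null }'

record = json.loads(RAW)

# JSON string -> str, JSON number -> int (or float), JSON null -> None.
print(type(record["name"]).__name__)
print(type(record["age"]).__name__)
print(record["car"] is None)


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(RAW))
# A trailing comment makes the document invalid.
print(is_valid_json('{ "name": "Colin" } // a comment'))
# Keys and strings must use double quotes, not single quotes.
print(is_valid_json("{ 'name': 'Colin' }"))
```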
YAML
YAML Ain’t Markup Language (YAML) is another common internet format, similar to JSON in its design goals. YAML is in fact a superset of JSON, with the addition of some processing features. JSON can be easily converted to YAML, and often, they are used interchangeably, depending on personal preference, particularly for OpenAPI definitions.
The same data from the JSON example can be expressed in YAML as follows:
---
name: Colin
age: 52
car:
OpenAPI Specification
The final format we need to understand is the OpenAPI Specification (OAS), which is a human-readable (and machine-readable) specification for defining the behavior of an API. The OAS is an open standard run under the auspices of the OpenAPI Initiative. The standard was previously known as Swagger (a name still commonly attached to version 2.0) before being formalized as an open standard. Currently, version 3.0 is in general use, and version 3.1 has also been published.
An OAS definition can be expressed either as YAML or JSON and comprises several sections, as shown here:
Figure 1.1 – OpenAPI Specification sections
Using an OAS definition at the inception of the API life cycle (referred to as design-first) offers several key benefits, namely the following:
- Description validation and linting: Parsers and audit tools can automatically validate a definition to confirm its correctness and completeness.
- Data validation: Request and response data can be fully specified, allowing validation of API behavior at runtime.
- Documentation generation: Documentation can be automatically generated from a definition, including a test UI, allowing the API to be exercised.
- Code generation: Tools exist that allow server and client code stubs to be generated in a variety of languages, easing the burden on developers.
- Graphical editors: Fully featured graphical editors make it a simple task to design OAS definitions in an interactive, intuitive manner.
- Mock servers: OAS definitions can be used to build mock servers that simulate the behavior of an actual API backend. This is extremely useful in the early stages of API development and integration.
- Security analysis: Most important for us are the security benefits that the use of an OAS definition brings. Definitions can be examined for security constraints (authorization and authentication, for example), and deficiencies can be highlighted. Data structures can be fully specified to allow the validation of data, preventing excessive information exposure.
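To give a flavor of the security analysis benefit, here is a minimal Python sketch that walks a definition and reports operations with no security requirement. It relies on the OAS rule that a root-level security list applies to every operation unless the operation supplies its own. The sample definition, the apiKey scheme name, and the function name unsecured_operations are hypothetical, and a real audit tool would check far more than this:

```python
from typing import List

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

# Hypothetical definition fragment, for illustration only.
SAMPLE = {
    "openapi": "3.0.0",
    "paths": {
        "/pets": {
            "get": {"security": [{"apiKey": []}]},  # explicitly secured
            "post": {},                             # inherits the (absent) default
        },
        "/health": {
            "get": {"security": []},                # security explicitly removed
        },
    },
}


def unsecured_operations(definition: dict) -> List[str]:
    """List operations whose effective security requirement is empty."""
    default_security = definition.get("security", [])
    findings = []
    for path, path_item in definition.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            # Operation-level security overrides the root-level default.
            security = operation.get("security", default_security)
            if not security:
                findings.append(f"{method.upper()} {path}")
    return findings


print(unsecured_operations(SAMPLE))
```

An unsecured operation is not necessarily a defect (a health check may be deliberately public), so the output is best treated as a list of candidates for review.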
A sample OAS definition is shown in the following snippet. This is an example of a bare-minimum specification of an API and includes the following in the header section:
- The OpenAPI version
- Information metadata
- Server information, including the host URL:
{
  "openapi": "3.0.0",
  "info": {
    "version": "1.0.0",
    "title": "Swagger Petstore",
    "license": {
      "name": "MIT"
    }
  },
  "servers": [
    {
      "url": "http://petstore.swagger.io/v1"
    }
  ],
  ..
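As a hedged illustration of the kind of linting a tool might perform on this header, the following Python sketch checks the OpenAPI version, the title and version fields that the specification requires in info, and the server URLs. The function name lint_header is our own, and the HTTPS check is an assumed house rule rather than a requirement of the specification:

```python
import json

# The header fields shown above, closed off so that the fragment parses on its own.
HEADER_JSON = """
{
  "openapi": "3.0.0",
  "info": {
    "version": "1.0.0",
    "title": "Swagger Petstore",
    "license": {"name": "MIT"}
  },
  "servers": [{"url": "http://petstore.swagger.io/v1"}]
}
"""


def lint_header(definition: dict) -> list:
    """Return a list of human-readable problems found in the header section."""
    problems = []
    if not str(definition.get("openapi", "")).startswith("3."):
        problems.append("missing or unsupported 'openapi' version")
    info = definition.get("info", {})
    for field in ("title", "version"):
        if field not in info:
            problems.append(f"info.{field} is required")
    # Assumed policy (not part of the OAS): servers should be reached over HTTPS.
    for server in definition.get("servers", []):
        url = server.get("url", "")
        if not url.startswith("https://"):
            problems.append(f"server URL is not HTTPS: {url}")
    return problems


print(lint_header(json.loads(HEADER_JSON)))
```

Run against the sample header, the only finding is the plain HTTP server URL, which is the kind of deficiency a security-focused audit would be expected to flag.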
The next section in the OAS definition describes an endpoint, showing details such as the following:
- The endpoint path name
- The HTTP method to be used
- Request parameters
- Status codes
- The response format:
"paths": {
  "/pets": {
    "get": {
      "summary": "List all pets",
      "operationId": "listPets",
      "parameters": [
        {
          "name": "limit",
          "in": "query",
          "description": "Maximum items (max 100)",
          "required": false,
          "schema": {
            "type": "integer",
            "format": "int32"
          }
        }
      ],
      "responses": {
        "200": {
          "description": "A paged array of pets",
          "headers": {
            "x-next": {
              "description": "Next page",
              "schema": {
                "type": "string"
              }
            }
          },
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/Pets"
              }
            }
          }
        },
        ..
At this point, we understand the building blocks of APIs and the associated data formats. It is now time to look at the elements of API security.