Handling resources and URIs
Every document on the World Wide Web (WWW) is represented as a resource in terms of HTTP. Each resource is identified by a URI, which acts as an endpoint that points to a unique resource on a server.
Roy Fielding states that a URI has been known by many names – a WWW address, a Universal Document Identifier (UDI), a Universal Resource Identifier, and finally the combination of a Uniform Resource Locator (URL) and a Uniform Resource Name (URN).
So, what is a URI? In the WWW world, a URI is a string (that is, a sequence of characters) that identifies a resource by its location, name, or both. There are two types of URIs: URLs and URNs.
URLs are widely used and are familiar even to non-developers. URLs are not restricted to HTTP; they are also used with many other protocols such as FTP, JDBC, and MAILTO. A URL is an identifier that specifies the network location of a resource. We will go into more detail in the later sections.
The URI syntax
The URI syntax is as follows:
scheme:[//authority]path[?query][#fragment]
As per the syntax, the following is a list of components of a URI:
- Scheme: This refers to a non-empty sequence of characters followed by a colon (:). A scheme starts with a letter and is followed by any combination of digits, letters, periods (.), hyphens (-), or plus characters (+). Scheme examples include HTTP, HTTPS, MAILTO, FILE, FTP, and more. URI schemes should be registered with the Internet Assigned Numbers Authority (IANA).
- Authority: This is an optional field and is preceded by //. It consists of the following subfields:
  a. Userinfo: This is a subcomponent that might contain a username and a password, which are both optional.
  b. Host: This is a subcomponent containing either an IP address or a registered host or domain name.
  c. Port: This is an optional subcomponent that is preceded by a colon (:).
- Path: A path contains a sequence of segments separated by slash characters (/). In the preceding GitHub REST API example, /licenses is the path.
- Query: This is an optional component and is preceded by a question mark (?). The query component contains a query string of non-hierarchical data. Each parameter is separated by an ampersand (&) in the query component, and parameter values are assigned using an equals (=) operator.
- Fragment: This is an optional field and is preceded by a hash (#). The fragment component includes a fragment identifier that points to a secondary resource.
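The components listed above can be inspected with the JDK's java.net.URI class. The following is a minimal sketch rather than a definitive implementation; the URI itself (host www.example.com, the user:secret credentials, port 8443, and the query parameters) is a made-up value chosen only so that every component is present.

```java
import java.net.URI;

class UriComponents {

    /** Lists each component of the generic URI syntax, one per line. */
    static String describe(URI uri) {
        return String.join("\n",
                "scheme   = " + uri.getScheme(),
                "userinfo = " + uri.getUserInfo(),
                "host     = " + uri.getHost(),
                "port     = " + uri.getPort(),     // -1 when no port is given
                "path     = " + uri.getPath(),
                "query    = " + uri.getQuery(),    // null when absent
                "fragment = " + uri.getFragment()); // null when absent
    }

    public static void main(String[] args) {
        // Hypothetical URI used only for illustration.
        URI uri = URI.create(
                "https://user:secret@www.example.com:8443/api/v1/orders?status=open&page=2#summary");
        System.out.println(describe(uri));
    }
}
```

Note that getPort() returns -1 and the optional string components return null when they are missing, so callers should check for those values before using them.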
The following list contains examples of URIs:
- www.packt.com: This doesn't contain the scheme. It just contains the domain name. There is no port either, which means it points to the default port.
- index.html: This contains neither a scheme nor an authority. It only contains the path.
- https://www.packt.com/index.html: This contains the scheme, authority, and path.
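As a rough illustration of how a generic parser sees these three strings, the sketch below classifies each one with java.net.URI. The class and method names are hypothetical. One caveat worth stating: a generic parser has no way of knowing that www.packt.com is meant as a domain name, so without a scheme and the // prefix it reports that string as a relative reference made up of a path only; browsers add the scheme themselves.

```java
import java.net.URI;

class RelativeReferences {

    /** Reports whether the text has a scheme and, if not, what it parses as. */
    static String classify(String text) {
        URI uri = URI.create(text);
        if (uri.isAbsolute()) {
            // isAbsolute() is true exactly when a scheme is present.
            return "absolute (scheme=" + uri.getScheme() + ", host=" + uri.getHost() + ")";
        }
        return "relative (path=" + uri.getPath() + ")";
    }

    public static void main(String[] args) {
        for (String text : new String[] {
                "www.packt.com", "index.html", "https://www.packt.com/index.html"}) {
            System.out.println(text + " -> " + classify(text));
        }
    }
}
```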
Here are some examples of different scheme URIs:
mailto:support@packt.com
telnet://192.168.0.1:23/
ldap://[2020:ab9::9]/c=AB?objectClass?obj
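To see how the scheme changes the shape of a URI, the three examples above can be run through java.net.URI. This is only a sketch with hypothetical class and method names. The mailto URI has no // authority, so Java treats it as an opaque URI and exposes only a scheme-specific part, whereas the telnet and ldap URIs are hierarchical and expose a host, port, path, and query.

```java
import java.net.URI;

class SchemeExamples {

    /** Summarizes a URI differently depending on whether it is opaque. */
    static String summarize(String text) {
        URI uri = URI.create(text);
        if (uri.isOpaque()) {
            // Opaque URIs (for example, mailto:) have no authority or path.
            return uri.getScheme() + " -> opaque, scheme-specific part: "
                    + uri.getSchemeSpecificPart();
        }
        return uri.getScheme() + " -> host: " + uri.getHost()
                + ", port: " + uri.getPort()
                + ", path: " + uri.getPath()
                + ", query: " + uri.getQuery();
    }

    public static void main(String[] args) {
        System.out.println(summarize("mailto:support@packt.com"));
        System.out.println(summarize("telnet://192.168.0.1:23/"));
        // An IPv6 host is written inside square brackets.
        System.out.println(summarize("ldap://[2020:ab9::9]/c=AB?objectClass?obj"));
    }
}
```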
From a REST perspective, the path component of a URI is very important because it identifies the resource, and your API endpoint paths are built from it. For example, take a look at the following:
GET https://www.domain.com/api/v1/order/1
Here, /api/v1/order/1 represents the path, and GET represents the HTTP method.
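The same request can be described in code. The sketch below assumes Java 11 or later and uses the JDK's java.net.http.HttpRequest only to build the request, not to send it; the OrderRequest class and getOrder method are hypothetical names, and www.domain.com is the placeholder host from the example above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class OrderRequest {

    /** Builds (but does not send) a GET request for a single order. */
    static HttpRequest getOrder(String baseUrl, int orderId) {
        // The path identifies the resource: /api/v1/order/{id}
        URI uri = URI.create(baseUrl + "/api/v1/order/" + orderId);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = getOrder("https://www.domain.com", 1);
        System.out.println("method = " + request.method());
        System.out.println("path   = " + request.uri().getPath());
    }
}
```

Keeping the HTTP method and the path separate in this way mirrors how REST endpoints are usually defined: the path names the resource and the method names the operation on it.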
URLs
If you look closely, most of the URI examples mentioned earlier can also be called URLs. A URI is an identifier; a URL, on the other hand, is not only an identifier but also tells you how to reach the resource.
As per Request for Comments (RFC)-3986 on URIs (https://xml2rfc.tools.ietf.org/public/rfc/html/rfc3986.html), the term URL refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (for example, its network "location").
A URL represents the full web address of a resource, including the protocol name (the scheme), the hostname and port (both part of the authority component; the port can be omitted when it is the default, which is 80 for HTTP and 443 for HTTPS), the path, and the optional query and fragment components.
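Because the port may be left out of a URL, code that needs the actual port has to fall back to the scheme's default. The following is a minimal sketch of that fallback; the EffectivePort class is a hypothetical helper, and it deliberately knows only the two defaults mentioned above (80 for HTTP and 443 for HTTPS).

```java
import java.net.URI;

class EffectivePort {

    /** Returns the explicit port, or the scheme default, or -1 if unknown. */
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();          // port written in the URL
        }
        String scheme = uri.getScheme();
        if ("https".equalsIgnoreCase(scheme)) {
            return 443;                    // default HTTPS port
        }
        if ("http".equalsIgnoreCase(scheme)) {
            return 80;                     // default HTTP port
        }
        return -1;                         // this sketch knows no other defaults
    }

    public static void main(String[] args) {
        System.out.println(effectivePort(URI.create("https://www.packt.com/index.html")));
        System.out.println(effectivePort(URI.create("http://www.packt.com/index.html")));
        System.out.println(effectivePort(URI.create("http://localhost:8080/api")));
    }
}
```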
URNs
URNs are not commonly used. They are also a type of URI, one that starts with the urn scheme. The following URN example is taken directly from RFC-3986 for URIs (https://xml2rfc.tools.ietf.org/public/rfc/html/rfc3986.html):
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
This example follows the "urn:" <NID> ":" <NSS> syntax, where <NID> is the Namespace Identifier, and <NSS> is the Namespace Specific String. We are not going to use URNs in our REST implementation. However, you can read more about them in RFC-2141 (https://tools.ietf.org/html/rfc2141).
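To make the <NID> and <NSS> parts concrete, the sketch below splits the example URN at the first colon after the urn scheme. UrnParts is a hypothetical helper, and it does not validate the character rules that RFC-2141 defines for each part; it only demonstrates the overall structure.

```java
import java.net.URI;

class UrnParts {

    /** Splits a URN into { NID, NSS }; throws if the text is not a URN. */
    static String[] split(String urn) {
        URI uri = URI.create(urn);
        if (!"urn".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not a URN: " + urn);
        }
        // Everything after "urn:" has the form <NID>:<NSS>.
        String rest = uri.getSchemeSpecificPart();
        int colon = rest.indexOf(':');
        if (colon <= 0 || colon == rest.length() - 1) {
            throw new IllegalArgumentException("Missing NID or NSS: " + urn);
        }
        return new String[] {rest.substring(0, colon), rest.substring(colon + 1)};
    }

    public static void main(String[] args) {
        String[] parts = split("urn:oasis:names:specification:docbook:dtd:xml:4.1.2");
        System.out.println("NID = " + parts[0]);
        System.out.println("NSS = " + parts[1]);
    }
}
```

For the example URN, the NID is oasis and the NSS is names:specification:docbook:dtd:xml:4.1.2.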
As per RFC-3986 on URIs (https://xml2rfc.tools.ietf.org/public/rfc/html/rfc3986.html): The term URN has been used historically to refer to both URIs under the "urn" scheme RFC-2141, which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable, and to any other URI with the properties of a name.