Introducing HTTP

Hypertext Transfer Protocol (HTTP) is the foundation of data communication for the World Wide Web (WWW). To comprehend HTTP, it helps to understand what hypertext means. The major constraint of written text is its linearity, that is, it cannot easily reference other text in a way that lets the reader jump straight to it. Hypertext overcomes this constraint with the concept of hyperlinks, which allow the user to navigate easily to the referenced section. HTTP is an application layer protocol that defines how hypertext messages are formatted, transmitted, and processed over the internet. Let's have a quick recap of HTTP in this section.

HTTP versions

HTTP has been consistently evolving over time, and several versions have been published so far. HTTP/0.9 was the first documented version, released in 1991. It was very primitive and supported only the GET method. Later, HTTP/1.0 was released in 1996 with more features and corrections for the shortcomings of the previous release. HTTP/1.0 supported three request methods: GET, HEAD, and POST. The next release was HTTP/1.1, a revision of HTTP/1.0 that was first standardized in 1997 and updated in 1999. This version is in common use today.

HTTP/2 (originally named HTTP 2.0) was published in 2015. It is mainly focused on how the data is framed and transported between the client and server. It is currently supported by major browsers and as of May 2017, 13.7% of the top 10 million websites support HTTP/2.

To learn more about HTTP, refer to the relevant Wikipedia page at http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol.

Understanding the HTTP request-response model

HTTP works in a request-response manner. Let's take an example to understand this model better.

The following sequence diagram illustrates the request and response messages exchanged between a web browser (the client) and a server over HTTP:

Here is the detailed explanation for the sequence of actions shown in the preceding diagram.

The user enters http://www.example.com/index.html in the browser and then submits the request. The browser establishes a connection with the server and sends a request consisting of a request method, a URI, and a protocol version, followed by a message containing the request modifiers, client information, and possible body content. The sample request looks as follows:

GET /index.html HTTP/1.1 
Host: www.example.com 
User-Agent: Mozilla/5.0 
Accept: text/html
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate 
Connection: keep-alive

Let's take a minute to understand the structure of the preceding message. The first line of the request in our example is the request line:

GET /index.html HTTP/1.1

The general format for the request line is an HTTP method, followed by the resource to retrieve and the HTTP version supported by the client. The client can be any application that understands HTTP, although this example refers to a web browser as the client. The request line and each header field must end with a carriage return character followed by a line feed character. In the preceding example, the browser instructs the server to get the index.html file through the HTTP/1.1 protocol.

The rest of the information that you may see in the request message is the HTTP header values for use by the server. The header fields are colon-separated key-value pairs in the plain-text format, terminated by a carriage return and followed by a line feed character. The header fields in the request, such as the acceptable content types, languages, and connection type, are the operating parameters for an HTTP transaction. The server can use this information while preparing the response to the request. A blank line is used at the end of the header to indicate the end of the header portion in a request.

The last part of an HTTP request is the HTTP body. Typically, the body is left blank unless the client has some data to submit to the server. In our example, the body part is empty as this is a GET request for retrieving a page from the server.
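The request structure described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper names build_request and parse_request are hypothetical, header names are treated as case-sensitive dictionary keys, and repeated headers are not handled.

```python
CRLF = "\r\n"


def build_request(method, target, headers, body=""):
    """Serialize a request: request line, header fields, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an empty line closes the header section.
    return CRLF.join(lines) + CRLF + CRLF + body


def parse_request(raw):
    """Split a raw request into (method, target, version, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)
    request_line, *header_lines = head.split(CRLF)
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, target, version, headers, body


raw = build_request(
    "GET",
    "/index.html",
    {"Host": "www.example.com", "Accept": "text/html", "Connection": "keep-alive"},
)
method, target, version, headers, body = parse_request(raw)
print(method, target, version)  # GET /index.html HTTP/1.1
print(headers["Host"])          # www.example.com
print(repr(body))               # '' (a GET request carries no body here)
```

Note how the empty body of the GET request shows up as an empty string after the blank line that ends the header section.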

So far, we have been discussing the HTTP request sent by the client. Now, let's take a look at what happens on the server when the message is received. Once the server receives the HTTP request, it will process the message and return a response to the client. The response is made up of the reply status code from the server, followed by the HTTP header and a response content body:

HTTP/1.1 200 OK 
Accept-Ranges: bytes 
Cache-Control: max-age=604800 
Content-Type: text/html 
Date: Wed, 03 Dec 2014 15:05:59 GMT 
Content-Length: 1270 
 
<html> 
<head> 
  <title>An Example Page</title> 
</head> 
<body> 
  Hello World ! 
</body> 
</html>

The first line in the response is a status line. It contains the HTTP version that the server is using, followed by a numeric status code and its associated textual phrase. The status code falls into one of the following classes: informational, success, redirection, client error, or server error. In our example, the status line is as follows:

HTTP/1.1 200 OK

The next item in the response is the HTTP response header. Similar to the request header, the response header follows the colon-separated name-value pair format, with each field terminated by carriage return and line feed characters. The HTTP response header can contain useful information about the resource being fetched, the server hosting the resource, and some parameters controlling the client behavior while dealing with the resource, such as content type, cache expiry, and refresh rate.

The last part of the response is the response body. Upon successful processing of the request, the server will add the requested resource in the HTTP response body. It can be HTML, binary data, image, video, text, XML, JSON, and so on. Once the response body has been sent to the requestor, the HTTP server closes the connection unless it is a persistent (keep-alive) one. HTTP/1.0 clients ask for a persistent connection with the Connection: keep-alive header, whereas HTTP/1.1 connections are persistent by default unless Connection: close is sent.
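To make the three parts of a response concrete, here is a minimal Python sketch that splits a raw response into the status line, the header fields, and the body. The function name parse_response is hypothetical, the sample response is a shortened version of the one shown earlier, and the sketch assumes that the whole response is already in memory and that the body is not chunked.

```python
def parse_response(raw):
    """Split raw response bytes into (version, code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The reason phrase may contain spaces ("Not Found"), so split at most twice.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


page = b"<html><body>Hello World !</body></html>"
head = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    f"Content-Length: {len(page)}\r\n"
    "\r\n"
)
raw_response = head.encode("ascii") + page

version, code, reason, headers, payload = parse_response(raw_response)
print(version, code, reason)                          # HTTP/1.1 200 OK
print(headers["content-type"])                        # text/html
print(int(headers["content-length"]) == len(payload)) # True
```

The Content-Length check at the end is how a client on a persistent connection can tell where one response body ends and the next response begins.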

Uniform resource identifier

You will see the term Uniform Resource Identifier (URI) used very frequently in the rest of the chapter. A URI is a string of characters that identifies a resource or name on the internet. A URI can be further classified as a Uniform Resource Locator (URL) if it also specifies the means of accessing the resource, such as the HTTP or FTP protocol. The following is one such example:

https://www.packtpub.com/application-development

In general, all URLs such as https://www.packtpub.com/application-development are URIs.
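As a rough illustration of this distinction, the following Python sketch uses the standard urllib.parse module to split a URI into its components. The is_url helper and its small set of access schemes are assumptions made for this example only, and the urn:isbn value is just a sample of a URI that names a resource without saying how to reach it.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately small set of schemes that tell us how to
# access a resource; a real classification would cover many more.
ACCESS_SCHEMES = {"http", "https", "ftp"}


def is_url(uri):
    """Return True if the URI names an access mechanism we recognize."""
    return urlsplit(uri).scheme in ACCESS_SCHEMES


parts = urlsplit("https://www.packtpub.com/application-development")
print(parts.scheme)  # https
print(parts.netloc)  # www.packtpub.com
print(parts.path)    # /application-development

print(is_url("https://www.packtpub.com/application-development"))  # True
print(is_url("urn:isbn:0451450523"))                               # False
```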

To learn more about URIs, visit http://en.wikipedia.org/wiki/Uniform_resource_identifier.

Understanding the HTTP request methods

In the previous section, we discussed the HTTP GET request method for retrieving a page from the server. HTTP provides more request methods besides GET, each performing a specific action on the target resource. Let's learn about these methods and their role in client-server communication over HTTP.

The set of common methods for HTTP/1.1 is listed in the following table:

Method	Description
`GET`	This method is used for retrieving resources from the server by using the given URI.
`HEAD`	This method is the same as the `GET` request, but it only transfers the status line and the header section without the response body.
`POST`	This method is used for posting data to the server. The server stores the data (entity) as a new subordinate of the resource identified by the URI. If you execute `POST` multiple times on a resource, it may yield different results.
`PUT`	This method is used for updating the resource pointed by the URI. If the URI does not point to an existing resource, the server can create the resource with that URI.
`DELETE`	This method deletes the resource pointed by the URI.
`TRACE`	This method is used for echoing the contents of the received request. This is useful for debugging, as the client can see what changes (if any) have been made by the intermediate servers.
`OPTIONS`	This method returns the HTTP methods that the server supports for the specified URI.
`CONNECT`	This method is used for establishing a tunnel to the target server, typically through an HTTP proxy.
`PATCH`	This method is used for applying partial modifications to a resource identified by the URI.

We may use some of these HTTP methods, such as GET, POST, PUT, and DELETE, while building RESTful web services in the later chapters.
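The difference between these four methods is easier to see with a toy model. The following Python sketch is not a web service; it is a hypothetical in-memory store (the ResourceStore class and its method names are invented for this example) that mimics what a server might do for each method. In particular, it shows why repeating POST may yield different results while repeating PUT leaves the resource in the same state.

```python
import itertools


class ResourceStore:
    """Toy in-memory collection illustrating GET, POST, PUT, and DELETE."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def get(self, item_id):
        # GET: read a resource without changing it.
        return self._items.get(item_id)

    def post(self, data):
        # POST: store the data as a new subordinate resource; the server picks the ID.
        item_id = next(self._ids)
        self._items[item_id] = data
        return item_id

    def put(self, item_id, data):
        # PUT: update the resource at the given ID, creating it if it is missing.
        self._items[item_id] = data

    def delete(self, item_id):
        # DELETE: remove the resource if it exists.
        self._items.pop(item_id, None)


store = ResourceStore()
first = store.post({"name": "Alice"})
second = store.post({"name": "Alice"})  # the same POST again creates another resource
store.put(first, {"name": "Bob"})
store.put(first, {"name": "Bob"})       # the same PUT again changes nothing further
store.delete(second)

print(first, second)      # 1 2
print(store.get(first))   # {'name': 'Bob'}
print(store.get(second))  # None
```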

Continuing our discussion on HTTP, the next section discusses the HTTP header parameter, which identifies the content type for the message body.

Representing content types using HTTP header fields

When we discussed the HTTP request-response model in the Understanding the HTTP request-response model section, we talked about the HTTP header parameters (the name-value pairs) that define the operating parameters of an HTTP transaction. In this section, we will cover the header parameter used for describing the content types present in the request and the response message body.

The Content-Type header in an HTTP request or response describes the content type for the message body. The Accept header in the request tells the server the content types that the client is expecting in the response body. The content types are represented using the internet media type. The internet media type (also known as the MIME type) indicates the type of data that a file contains. Here is an example:

Content-Type: text/html

This header indicates that the body content is presented in the HTML format. The format of a content type value is a primary type and a subtype separated by a slash (type/subtype), followed by optional semicolon-delimited attribute-value pairs (known as parameters).
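The format just described can be taken apart with a few string operations. The following Python sketch is a simplified illustration: the parse_content_type function is a hypothetical helper, and it assumes that parameter values do not contain semicolons inside quoted strings.

```python
def parse_content_type(value):
    """Split a Content-Type value into (primary type, subtype, parameters)."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    primary, _, subtype = media_type.partition("/")
    parameters = {}
    for param in raw_params:
        if not param:
            continue  # tolerate a trailing semicolon
        name, _, param_value = param.partition("=")
        parameters[name.strip().lower()] = param_value.strip().strip('"')
    # Type and subtype names are case-insensitive, so normalize them.
    return primary.lower(), subtype.lower(), parameters


print(parse_content_type("text/html"))
# ('text', 'html', {})
print(parse_content_type("application/json; charset=utf-8"))
# ('application', 'json', {'charset': 'utf-8'})
```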

The internet media types are broadly classified into the following categories on the basis of the primary (or top-level) type that appears in the Content-Type header:

text: This type indicates that the content is plain text and no special software is required to read the contents. The subtype represents more specific details about the content, which can be used by the client for special processing, if any. For instance, Content-Type: text/html indicates that the body content is HTML, and the client can use this hint to kick off an appropriate rendering engine while displaying the response.
multipart: As the name indicates, this type consists of multiple parts of independent data types. For instance, Content-Type: multipart/form-data is used for submitting forms that contain files, non-ASCII data, and binary data.
message: This type encapsulates more messages. It allows messages to contain other messages or pointers to other messages. For instance, the Content-Type: message/partial content type allows for large messages to be broken up into smaller messages. The full message can then be read by the client (user agent) by putting all the broken messages together.
image: This type represents the image data. For instance, Content-Type: image/png indicates that the body content is a .png image.
audio: This type indicates the audio data. For instance, Content-Type: audio/mpeg indicates that the body content is MP3 or other MPEG audio.
video: This type indicates the video data. For instance, Content-Type: video/mp4 indicates that the body content is an MP4 video.
application: This type represents the application data or binary data. For instance, Content-Type: application/json; charset=utf-8 designates the content to be in the JavaScript Object Notation (JSON) format, encoded with UTF-8 character encoding.

JSON is a lightweight data-interchange format. If you are not familiar with the JSON format, do not worry for now; we will cover this topic in Chapter 2, Java APIs for JSON Processing.

We may need to use some of these content types in the next chapters while developing the RESTful web services. The client uses the content type as a hint to process the response body correctly.
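As a small illustration of how a client might act on this hint, the following Python sketch picks a decoding strategy from the Content-Type value. The decode_body function is hypothetical, it handles only a few media types, and falling back to UTF-8 when no charset parameter is present is an assumption of this example rather than a rule of HTTP.

```python
import json


def decode_body(content_type, body):
    """Decode response bytes according to the Content-Type value."""
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    charset = "utf-8"  # assumed default for this sketch
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value:
            charset = value.strip().strip('"')

    if media_type == "application/json":
        return json.loads(body.decode(charset))
    if media_type.startswith("text/"):
        return body.decode(charset)
    # Images, audio, video, and other binary data are returned unchanged.
    return body


print(decode_body("application/json; charset=utf-8", b'{"id": 1}'))  # {'id': 1}
print(decode_body("text/html", b"<p>Hello</p>"))                     # <p>Hello</p>
print(decode_body("image/png", b"\x89PNG"))                          # b'\x89PNG'
```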

We are not covering all possible subtypes for each category of media types here. To refer to the complete list, visit the website of the Internet Assigned Numbers Authority (IANA) at http://www.iana.org/assignments/media-types/media-types.xhtml.

The next topic, a simple but important one, is on HTTP status codes.

HTTP status codes

For every HTTP request, the server returns a status code indicating the processing status of the request. In this section, we will see some of the frequently used HTTP status codes. A basic understanding of status codes will definitely help us later while designing RESTful web services:

1xx Informational: This series of status codes indicates informational content. This means that the request is received and processing is going on. Here are the frequently used informational status codes:
- 100 Continue: This code indicates that the server has received the request header and the client can now send the body content. In this case, the client first makes a request (with the Expect: 100-continue header) to check whether it can start with a partial request. The server can then respond either with 100 Continue (OK) or 417 Expectation Failed (No) along with an appropriate reason.
- 101 Switching Protocols: This code indicates that the server agrees to the protocol switch requested by the client.
- 102 Processing: This code is an informational status code used for long running processing to prevent the client from timing out. This tells the client to wait for the future response, which will have the actual response body.
2xx Success: This series of status codes indicates the successful processing of requests. Some of the frequently used status codes in this class are as follows:
- 200 OK: This code indicates that the request is successful and the response content is returned to the client as appropriate.
- 201 Created: This code indicates that the request is successful and a new resource is created.
- 204 No Content: This code indicates that the request is processed successfully, but there's no return value for this request. For instance, you may find such status codes in response to the deletion of a resource.
3xx Redirection: This series of status codes indicates that the client needs to perform further actions to logically end the request. A frequently used status code in this class is as follows:
- 304 Not Modified: This status indicates that the resource has not been modified since it was last accessed. The server returns this code only for a conditional request, that is, when the client sends the If-Modified-Since or If-None-Match request header. The client can take appropriate action on the basis of this status code, such as reusing its cached copy.
4xx Client Error: This series of status codes indicates an error in processing the request. Some of the frequently used status codes in this class are as follows:
- 400 Bad Request: This code indicates that the server failed to process the request because of malformed syntax in the request. The client can try again after correcting the request.
- 401 Unauthorized: This code indicates that authentication is required for the resource. The client can try again with appropriate authentication.
- 403 Forbidden: This code indicates that the server is refusing to fulfill the request even though the request is valid. The reason may be given in the body content if the request is not a HEAD method.
- 404 Not Found: This code indicates that the requested resource is not found at the location specified in the request.
- 405 Method Not Allowed: This code indicates that the HTTP method specified in the request is not allowed on the resource identified by the URI.
- 408 Request Timeout: This code indicates that the client failed to send the complete request within the time window set on the server.
- 409 Conflict: This code indicates that the request cannot be completed because it conflicts with the current state of the resource or with rules established on it, such as a validation failure.
5xx Server Error: This series of status codes indicates server failures while processing a valid request. Here is one of the frequently used status codes in this class:
- 500 Internal Server Error: This code is a generic error message indicating that an unexpected error occurred on the server and that the request cannot be fulfilled.
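Because the first digit of a status code identifies its class, a client can classify any code with simple integer division. The following Python sketch shows one way to do this; the describe function and the "unrecognized" label are inventions for this example, while the reason phrases come from the HTTPStatus enumeration in Python's standard library, which knows only a fixed set of codes.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return a string such as '404 Not Found (Client Error)'."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not in the standard library's list.
        phrase = "unrecognized"
    return f"{code} {phrase} ({status_class})"


for code in (100, 204, 304, 404, 500):
    print(describe(code))
# 100 Continue (Informational)
# 204 No Content (Success)
# 304 Not Modified (Redirection)
# 404 Not Found (Client Error)
# 500 Internal Server Error (Server Error)
```

Classifying by the first digit lets a client handle a code it has never seen before in a sensible way, for example by treating any unfamiliar 4xx code as a client error.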

To refer to the complete list of HTTP status codes maintained by IANA, visit http://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml.

With this topic, we have finished the crash course on HTTP basics. We will be resuming our discussion on RESTful web services in the next section. Take a deep breath and be ready for an exciting journey.