Introducing HTTP
Hypertext Transfer Protocol (HTTP) is the foundation of data communication for the World Wide Web (WWW). This protocol defines how messages are formatted, transmitted, and processed over the Internet. Let's have a quick recap of HTTP in this section.
HTTP versions
HTTP has evolved steadily over time. HTTP/0.9 was the first documented version, released in 1991. It was very primitive and supported only the GET method. HTTP/1.0 followed in 1996 with more features and corrections for the shortcomings of the previous release; it supported more request methods, such as GET, HEAD, and POST. The next release, HTTP/1.1, was first specified in 1997 and revised in 1999. This revision of HTTP/1.0 is in common use today.
HTTP/2 (originally named HTTP 2.0), published in 2015, is the successor to HTTP/1.1. It is mainly focused on how the data is framed and transported between the client and the server.
Tip
To learn more about HTTP, you can refer to the Wikipedia resources that you may find at http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol.
Understanding the HTTP request-response model
HTTP works in a request-response manner. Let's take an example to understand this model better.
The example covers the basic exchange between a web browser and a server over HTTP. The following sequence diagram illustrates the request and response messages sent between the client and the server:
Here is a detailed explanation of the sequence of actions shown in the preceding diagram.
The user enters the following URL in the browser, http://www.example.com/index.html, and then submits the request. The browser establishes a connection with the server and sends a request to the server in the form of a request method, URI, and protocol version, followed by a message containing request modifiers, client information, and possible body content. The sample request looks like the following:
GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0
Accept: text/html
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Let's take a minute to understand the structure of the preceding message. The following is the first line of the request in our example:
GET /index.html HTTP/1.1
The general format for the request line is an HTTP method, followed by the resource to retrieve and the HTTP version supported by the client. The client can be any application that understands HTTP, although this example refers to a web browser as the client. The request line and the header fields must each end with a carriage return character followed by a line feed character. In the preceding example, the browser instructs the server to get the index.html file through the HTTP/1.1 protocol.
The rest of the information that you may see in the request message is the HTTP header values for use by the server. The header fields are colon-separated key-value pairs in the plain-text format, terminated by a carriage return followed by a line feed character. The header fields in the request, such as the acceptable content types, languages, and connection type, are the operating parameters for an HTTP transaction. The server can use this information while preparing the response for the request. A blank line is used at the end of the header to indicate the end of the header portion in a request.
The last part of an HTTP request is the HTTP body. Typically, the body is left blank unless the client has some data to submit to the server. In our example, the body part is empty as this is a GET request for retrieving a page from the server.
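To make the structure of the request message more concrete, here is a minimal Python sketch that splits the sample request into its request line, header fields, and body. The function name parse_request is hypothetical and chosen only for this illustration, and the parsing is deliberately simplified; it is not how a production HTTP server reads requests.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.x request into request line, headers, and body.

    Hypothetical helper for illustration only; it assumes a well-formed
    message and does no validation.
    """
    # The blank line (CRLF CRLF) marks the end of the header portion.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: method, resource to retrieve, and protocol version.
    method, target, version = lines[0].split(" ", 2)

    # Header fields are colon-separated name-value pairs.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw_request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"User-Agent: Mozilla/5.0\r\n"
    b"Accept: text/html\r\n"
    b"Accept-Language: en-US,en;q=0.5\r\n"
    b"Accept-Encoding: gzip, deflate\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)

method, target, version, headers, body = parse_request(raw_request)
print(method, target, version)
print(headers["host"])
print(len(body))
```

Header names are lowercased in this sketch because HTTP header field names are case-insensitive. The body is empty here, as expected for a GET request.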
So far, we have been discussing the HTTP request sent by the client. Now, let's take a look at what happens on the server when the message is received. Once the server receives the HTTP request, it will process the message and return a response to the client. The response is made up of the reply status code from the server, followed by the HTTP header and a response content body:
HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=604800
Content-Type: text/html
Date: Wed, 03 Dec 2014 15:05:59 GMT
Content-Length: 1270

<html>
<head>
<title>An Example Page</title>
</head>
<body>
Hello World !
</body>
</html>
The first line in the response is a status line. It contains the HTTP version that the server is using, followed by a numeric status code and its associated textual phrase. The status code falls into one of five classes: informational, success, redirection, client error, or server error. In our example, the status line is as follows:
HTTP/1.1 200 OK
The next item in the response is the HTTP response header. Similar to the request header, the response header follows the colon-separated name-value pair format terminated by the carriage return and line feed characters. The HTTP response header can contain useful information about the resource being fetched, the server hosting the resource, and some parameters controlling the client behavior while dealing with the resource, such as content type, cache expiry, and refresh rate.
The last part of the response is the response body. Upon successful processing of the request, the server adds the requested resource to the HTTP response body. It can be HTML, binary data, an image, video, text, XML, JSON, and so on. Once the response body has been sent to the requestor, the HTTP server closes the connection unless the connection is persistent. HTTP/1.0 clients request a persistent connection with the Connection: keep-alive header, whereas HTTP/1.1 connections are persistent by default unless either side sends Connection: close.
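The three parts of a response (status line, header fields, and body) can be pulled apart in the same way as a request. The following Python sketch is only an illustration under simplifying assumptions: parse_response is a hypothetical helper, the response is a shortened version of the sample above, and the Content-Length value is computed from the shortened body rather than copied from the sample.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into status line parts, headers, and body.

    Hypothetical helper for illustration; assumes a well-formed message.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: HTTP version, numeric status code, textual phrase.
    version, code, reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


html = b"<html><body>Hello World !</body></html>"
raw_response = b"\r\n".join([
    b"HTTP/1.1 200 OK",
    b"Content-Type: text/html",
    b"Date: Wed, 03 Dec 2014 15:05:59 GMT",
    b"Content-Length: %d" % len(html),
    b"",          # blank line that ends the header portion
    html,
])

version, status, reason, headers, body = parse_response(raw_response)
print(version, status, reason)
print(headers["content-type"])
print(int(headers["content-length"]) == len(body))
```

Comparing Content-Length with the number of body bytes received is one simple way for a client to know that the whole body has arrived.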
Uniform resource identifier
You will see the term uniform resource identifier (URI) used very frequently in the rest of the chapter. A URI is a string of characters that identifies a resource or name on the Internet. A URI can be further classified as a Uniform Resource Locator (URL) if it also specifies the means for accessing the resource, such as HTTP or FTP. The following is one such example:
https://www.packtpub.com/application-development
In general, all URLs are URIs. To learn more about URIs, visit http://en.wikipedia.org/wiki/Uniform_resource_identifier.
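As a small illustration of the difference, the following Python sketch uses the standard urllib.parse.urlsplit function to break the URL above into its components. The second identifier, a URN, is not taken from this chapter; it is added here only as an assumed example of a URI that names a resource without saying how to access it.

```python
from urllib.parse import urlsplit

# A URL: the scheme tells the client how to access the resource.
url = urlsplit("https://www.packtpub.com/application-development")
print(url.scheme)   # access mechanism
print(url.netloc)   # host that serves the resource
print(url.path)     # location of the resource on that host

# A URI that is not a URL (illustrative URN, not from the text):
# it identifies a resource by name and carries no host to connect to.
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme)
print(repr(urn.netloc))
```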
Understanding the HTTP request methods
In the previous section, we discussed the HTTP GET request method for retrieving a page from the server. HTTP provides more request methods like GET, each performing a specific action on the target resource. Let's learn about these methods and their role in client-server communication over HTTP.
The set of common methods for HTTP/1.1 is listed in the following table:
Method | Description |
---|---|
GET | This method is used for retrieving resources from the server by using the given URI. |
HEAD | This method is the same as the GET method, except that the server returns only the status line and the headers, without a response body. |
POST | This method is used for posting data to the server. The server stores the data (entity) as a new subordinate of the resource identified by the URI. If you execute POST multiple times on a resource, it may yield different results. |
PUT | This method is used for updating the resource pointed at by the URI. If the URI does not point to an existing resource, the server can create the resource with that URI. |
DELETE | This method deletes the resource pointed at by the URI. |
TRACE | This method is used for echoing the contents of the received request. This is useful for debugging, as the client can see what changes (if any) have been made by the intermediate servers. |
OPTIONS | This method returns the HTTP methods that the server supports for the specified URI. |
CONNECT | This method is used for establishing a tunnel to the target server through an HTTP proxy. |
PATCH | This method is used for applying partial modifications to a resource identified by the URI. |
We may use some of these HTTP methods, such as GET, POST, PUT, and DELETE, while building RESTful web services in the later chapters.
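To get a feel for how these four methods differ, consider the following toy Python sketch of an in-memory resource store. Everything in it is an assumption made for illustration: the ResourceStore class, the /departments URI, and the choice of status codes returned are hypothetical and do not represent a real server or any API used later in this book.

```python
class ResourceStore:
    """Toy in-memory server; names and status choices are illustrative."""

    def __init__(self):
        self.resources = {}   # maps a URI to its stored representation
        self.next_id = 1      # used to name resources created by POST

    def handle(self, method, uri, body=None):
        """Return a (status_code, payload) pair for the given request."""
        if method == "GET":
            if uri in self.resources:
                return 200, self.resources[uri]
            return 404, None
        if method == "POST":
            # Store the entity as a new subordinate of the resource at `uri`.
            new_uri = f"{uri}/{self.next_id}"
            self.next_id += 1
            self.resources[new_uri] = body
            return 201, new_uri
        if method == "PUT":
            # Update the resource at `uri`, or create it if it is missing.
            created = uri not in self.resources
            self.resources[uri] = body
            return (201 if created else 200), body
        if method == "DELETE":
            if uri in self.resources:
                del self.resources[uri]
                return 204, None
            return 404, None
        return 405, None  # other methods are not supported by this sketch


store = ResourceStore()
first_post = store.handle("POST", "/departments", "HR")
second_post = store.handle("POST", "/departments", "HR")  # same request again
put_result = store.handle("PUT", "/departments/1", "Finance")
get_result = store.handle("GET", "/departments/1")
delete_result = store.handle("DELETE", "/departments/1")
get_after_delete = store.handle("GET", "/departments/1")

print(first_post, second_post)
print(put_result, get_result)
print(delete_result, get_after_delete)
```

Note how the two identical POST requests create two different resources, whereas repeating the PUT request would leave the store in the same state.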
Continuing our discussion on HTTP, the next section discusses the HTTP header parameter that identifies the content type for the message body.
Representing content types using HTTP header fields
When we discussed the HTTP request-response model in the Understanding the HTTP request-response model section, we talked about the HTTP header parameters (the name-value pairs) that define the operating parameters of an HTTP transaction. In this section, we will cover the header parameter used for describing the content types present in the request and the response message body.
The Content-Type header in an HTTP request or response describes the content type for the message body. The Accept header in the request tells the server the content types that the client is expecting in the response body. The content types are represented using the Internet media type. The Internet media type (also known as the MIME type) indicates the type of data that a file contains. Here is an example:
Content-Type: text/html
This header indicates that the body content is presented in the HTML format. The format of the content type value is a primary type/subtype, optionally followed by semicolon-delimited attribute-value pairs (known as parameters).
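The type/subtype format with optional parameters can be sketched in a few lines of Python. The parse_content_type function below is a hypothetical helper written for this illustration; it assumes simple header values and does not handle quoted parameter values that themselves contain a semicolon.

```python
def parse_content_type(value: str):
    """Split a Content-Type value into (primary type, subtype, parameters).

    Hypothetical helper; a simplified parser for illustration only.
    """
    # The media type comes first; parameters follow, separated by semicolons.
    media_type, *params = [part.strip() for part in value.split(";")]
    primary, _, subtype = media_type.partition("/")

    parameters = {}
    for param in params:
        if not param:
            continue
        name, _, val = param.partition("=")
        parameters[name.strip().lower()] = val.strip().strip('"')
    return primary.lower(), subtype.lower(), parameters


html_type = parse_content_type("text/html")
json_type = parse_content_type("application/json; charset=utf-8")
print(html_type)
print(json_type)
```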
The Internet media types are broadly classified into the following categories on the basis of the primary (or initial) type in the Content-Type header:
- text: This type indicates that the content is plain text and no special software is required to read the contents. The subtype represents more specific details about the content, which can be used by the client for special processing, if any. For instance, Content-Type: text/html indicates that the body content is html, and the client can use this hint to kick off an appropriate rendering engine while displaying the response.
- multipart: As the name indicates, this type consists of multiple parts of independent data types. For instance, Content-Type: multipart/form-data is used for submitting forms that contain files, non-ASCII data, and binary data.
- message: This type encapsulates other messages. It allows messages to contain other messages or pointers to other messages. For instance, the Content-Type: message/partial content type allows large messages to be broken up into smaller messages. The full message can then be read by the client (user agent) by putting all the broken messages together.
- image: This type represents the image data. For instance, Content-Type: image/png indicates that the body content is a .png image.
- audio: This type indicates the audio data. For instance, Content-Type: audio/mpeg indicates that the body content is MP3 or other MPEG audio.
- video: This type indicates the video data. For instance, Content-Type: video/mp4 indicates that the body content is MP4 video.
- application: This type represents the application data or binary data. For instance, Content-Type: application/json; charset=utf-8 designates the content to be in the JavaScript Object Notation (JSON) format, encoded with UTF-8 character encoding.
Tip
JSON is a lightweight data-interchange format. If you are not familiar with the JSON format, do not worry for now; we will cover this topic in Chapter 2, Java APIs for JSON Processing.
We may need to use some of these content types in the next chapters while developing RESTful web services. The client uses the content type as a hint to correctly process the response body.
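The following Python sketch shows one way a client might act on this hint. It is only an illustration: decode_body is a hypothetical function, the sample payloads are made up, and falling back to UTF-8 when no charset parameter is present is an assumption of this sketch rather than a rule stated in this chapter.

```python
import json


def decode_body(content_type: str, body: bytes):
    """Pick a way to process the body based on the Content-Type value.

    Hypothetical helper for illustration only.
    """
    media_type, _, rest = content_type.partition(";")
    media_type = media_type.strip().lower()

    # Assumption of this sketch: use UTF-8 unless a charset is given.
    charset = "utf-8"
    for param in rest.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            charset = value.strip().strip('"')

    if media_type == "application/json":
        return json.loads(body.decode(charset))   # structured data
    if media_type.startswith("text/"):
        return body.decode(charset)               # human-readable text
    return body                                   # leave other types as bytes


as_json = decode_body("application/json; charset=utf-8", b'{"name": "HR"}')
as_text = decode_body("text/html", b"<p>Hello World</p>")
as_bytes = decode_body("image/png", b"\x89PNG")
print(as_json, as_text, as_bytes)
```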
Note
We are not covering all the possible subtypes for each category of media type here. To refer to the complete list, visit the website of Internet Assigned Numbers Authority (IANA) at http://www.iana.org/assignments/media-types/media-types.xhtml.
The next topic, a simple but important one, is on HTTP status codes.
HTTP status codes
For every HTTP request, the server returns a status code indicating the processing status of the request. In this section, we will see some of the frequently used HTTP status codes. A basic understanding of status codes will definitely help us later while designing RESTful web services:
- 1xx Informational: This series of status codes indicates informational content. This means that the request is received and processing is going on. Here are the frequently used informational status codes:
  - 100 Continue: This code indicates that the server has received the request header and the client can now send the body content. In this case, the client first makes a request (with the Expect: 100-continue header) to check whether it can start with a partial request. The server can then respond either with 100 Continue (OK) or 417 Expectation Failed (No) along with an appropriate reason.
  - 101 Switching Protocols: This code indicates that the server accepts a protocol switch request from the client.
  - 102 Processing: This code is an informational status code used for long-running processing to prevent the client from timing out. This tells the client to wait for the future response, which will have the actual response body.
- 2xx Success: This series of status codes indicates the successful processing of requests. Some of the frequently used status codes in this class are as follows:
  - 200 OK: This code indicates that the request is successful and the response content is returned to the client as appropriate.
  - 201 Created: This code indicates that the request is successful and a new resource is created.
  - 204 No Content: This code indicates that the request is processed successfully, but there's no return value for this request. For instance, you may find such status codes in response to the deletion of a resource.
- 3xx Redirection: This series of status codes indicates that the client needs to perform further actions to complete the request. A frequently used status code in this class is as follows:
  - 304 Not Modified: This status indicates that the resource has not been modified since it was last accessed. This code is returned only when the client allows it by sending the If-Modified-Since or If-None-Match request header. The client can take appropriate action on the basis of this status code.
- 4xx Client Error: This series of status codes indicates that the request cannot be processed because of an error on the client side. Some of the frequently used status codes in this class are as follows:
  - 400 Bad Request: This code indicates that the server failed to process the request because of the malformed syntax in the request. The client can try again after correcting the request.
  - 401 Unauthorized: This code indicates that authentication is required for the resource. The client can try again with the appropriate authentication.
  - 403 Forbidden: This code indicates that the server understood the request but refuses to fulfil it. The reason will be listed in the body content if the request is not a HEAD method.
  - 404 Not Found: This code indicates that the requested resource is not found at the location specified in the request.
  - 405 Method Not Allowed: This code indicates that the HTTP method specified in the request is not allowed on the resource identified by the URI.
  - 408 Request Timeout: This code indicates that the client failed to send the complete request within the time frame set on the server.
  - 409 Conflict: This code indicates that the request cannot be completed because it conflicts with the current state of the resource, for example, when two clients update the same resource at the same time.
- 5xx Server Error: This series of status codes indicates server failures while processing a valid request. Here is one of the frequently used status codes in this class:
  - 500 Internal Server Error: This code is a generic error message, which tells the client that an unexpected error occurred on the server and the request cannot be fulfilled.
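Because the first digit of a status code identifies its class, a client can classify any response with simple integer arithmetic. The following Python sketch illustrates this by using the standard http.HTTPStatus enumeration for the textual phrases; the describe function and the STATUS_CLASSES dictionary are hypothetical names introduced only for this example.

```python
from http import HTTPStatus

# The class names follow the five series described above.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return a readable description of an HTTP status code."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not listed in http.HTTPStatus; report the class only.
        return f"{code} ({status_class})"
    return f"{code} {phrase} ({status_class})"


for code in (100, 200, 204, 304, 404, 500):
    print(describe(code))
```

A client that does not recognize a specific code can still fall back to the behavior for its class, which is why the sketch reports the class even for unlisted codes.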
Note
To refer to the complete list of HTTP status codes maintained by IANA, visit http://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml.
With this topic, we have finished the crash course on HTTP basics. We will be resuming our discussion on RESTful web services in the next section. Take a deep breath and get ready for an exciting journey.
The evolution of RESTful web services
Before getting into the details of REST-enabled web services, let's take a step back and define what a web service is. Then, we will see what makes a web service RESTful.
A web service is one of the most popular methods of communication between client and server applications over the Internet. In simple words, web services are web application components that can be published, found, and used over the web. Typically, a web service has an interface describing the web service APIs, which is written in Web Services Description Language (WSDL). A WSDL file can be easily processed by machines, which reduces the integration complexities that you may see with large systems. Other systems interact with the web service by using Simple Object Access Protocol (SOAP) messages. The contract for communication is driven by the WSDL exposed by the web service. Typically, communication happens over HTTP with XML in conjunction with other web-related standards.
What kind of problems do the web services solve? There are two main areas where web services are used:
- Many companies specializing in Internet-related services and products have opened their doors to developers using publicly available APIs. For instance, companies such as Google, Yahoo, Amazon, and Facebook are using web services to offer new products that rely on their massive hardware infrastructures. Google and Yahoo offer their search services; Amazon offers its on-demand hosting storage infrastructure, and Facebook offers its platform for targeted marketing and advertising campaigns. With the help of web services, these companies have opened the door for the creation of products that did not exist some years ago.
- Web services are being used within the enterprises to connect previously disjointed departments such as marketing and manufacturing. Each department or line of business (LOB) can expose its business processes as a web service, which can be consumed by the other departments.
By connecting more than one department to share information by using web services, we begin to enter the territory of the Service-Oriented Architecture (SOA). The SOA is essentially a collection of services, each talking to one another in a well-defined manner, in order to complete relatively large and logically complete business processes.
All these points lead to the fact that a web service has evolved into a powerful and effective channel of communication between a client and a server over a period of time. The good news is that we can integrate RESTful systems into a web service-oriented computing environment without much effort. Although you may have a fair idea about RESTful web services by now, let's see the formal definition before proceeding further.
Note
What is a RESTful web service?
Web services that adhere to the REST architectural constraints are characterized as RESTful web services. Refer to the section, The REST architectural style, at the beginning of this chapter if you need a quick brush up on the architectural constraints for a RESTful system.
Remember that REST is not the system's architecture in itself, but it is a set of constraints that when applied to the system's design leads to a RESTful architecture. As our definition of a web service does not dictate the implementation details of a computing unit, we can easily incorporate RESTful web services to solve large-scale problems. We can even fully use RESTful web services under the larger umbrella of the SOA.
With this larger view of the SOA, we begin to see how REST has the potential to impact the new computing models being developed.
Note
The RESTful web API or REST API is an API implemented using HTTP and the REST architectural constraints. Technically speaking, this is just another term for a RESTful web service. In this book, we will use these terms interchangeably.