URL encoding – percent encoding
In this section, I'll explain percent encoding, a commonly used technique for encoding URLs.
URL encoding substitutes certain characters with % followed by the two-digit hexadecimal value of the character. Developers often use encoding because a character sent to the server can otherwise be changed or misinterpreted in transit. Certain protocols, such as OAuth, also require some of their parameters, such as redirect_uri, to be percent encoded so that the browser can tell them apart from the rest of the URL.
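As a rough illustration of the redirect_uri case, the following sketch builds an authorization URL with Python's standard urllib.parse module. The endpoint, client ID, and callback URL are made-up placeholders, not values from any real OAuth provider.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical values, for illustration only.
authorize_endpoint = "https://auth.example.com/authorize"
params = {
    "client_id": "demo-client",
    "redirect_uri": "https://app.example.com/callback?next=/home",
}

# urlencode() percent-encodes each value, so the ':', '/', '?' and '='
# inside redirect_uri cannot be confused with the outer URL's delimiters.
query = urlencode(params)
authorize_url = authorize_endpoint + "?" + query
print(authorize_url)

# Parsing the URL back recovers the original callback unchanged.
recovered = parse_qs(urlsplit(authorize_url).query)["redirect_uri"][0]
print(recovered)
```

Without the encoding step, the ? and = inside the callback URL would read as part of the outer query string.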
Example: < is represented as %3C in percent-encoded form.
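The substitution can be sketched in a few lines of Python. The helper percent_encode_char is a hypothetical name introduced here for illustration, and it assumes a single ASCII character. The quote and unquote functions come from the standard urllib.parse module.

```python
from urllib.parse import quote, unquote

def percent_encode_char(ch: str) -> str:
    """Encode one ASCII character as '%' plus its two-digit hex code."""
    return "%{:02X}".format(ord(ch))

print(percent_encode_char("<"))   # %3C
print(quote("<"))                 # the standard library agrees: %3C
print(unquote("%3c"))             # decoding accepts either case: <
```

Decoders treat the hexadecimal digits case-insensitively, so %3c and %3C both decode to <.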
URL encoding is typically applied to the URI characters defined in RFC 3986. The RFC divides these characters into two sets: reserved characters and unreserved characters.
Reserved characters have special meanings in the context of URLs and must be percent encoded whenever they are meant as literal data rather than as delimiters, to avoid ambiguity. A classic example is /, which separates path segments in a URL. If we need to transmit a literal / character in a URL, we must encode it so that the receiver or parser of the URL does not parse the URL incorrectly. In that case, / is encoded as %2F, which the URL parser decodes back into /.
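The ambiguity can be seen in a small Python sketch using the standard urllib.parse module. The host example.com and the name reports/2024 are placeholders chosen for illustration.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical file name that contains a literal slash.
name = "reports/2024"

# Unencoded, the slash splits the name into two path segments.
raw_url = "https://example.com/files/" + name
print(urlsplit(raw_url).path.split("/"))

# safe="" tells quote() to encode '/' as well (by default it is left alone).
encoded_url = "https://example.com/files/" + quote(name, safe="")
segments = urlsplit(encoded_url).path.split("/")
print(segments)

# Decoding each segment only after splitting recovers the original name.
print([unquote(s) for s in segments])
```

The order matters: splitting on / first and decoding afterwards keeps the encoded slash inside a single segment.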
Unreserved characters
The following characters are not encoded as part of the URL encoding technique:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 - _ . ~
Reserved characters
The following characters are encoded as part of the URL encoding technique:
! * ' ( ) ; : @ & = + $ , / ? # [ ]
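The two sets can be checked against Python's standard encoder. This is a minimal sketch, assuming Python 3.7 or later (where quote treats ~ as unreserved). The constant names UNRESERVED and RESERVED are introduced here for illustration.

```python
from urllib.parse import quote

UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)
RESERVED = "!*'();:@&=+$,/?#[]"

# safe="" disables quote()'s default exemption for '/'.
unchanged = [c for c in UNRESERVED if quote(c, safe="") == c]
encoded = {c: quote(c, safe="") for c in RESERVED}

# True when every character in the first list passes through untouched.
print(len(unchanged) == len(UNRESERVED))

# Each character in the second list becomes '%' plus two hex digits.
for char, code in encoded.items():
    print(char, "->", code)
```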
Encoding table
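The following table lists the reserved characters shown above with their percent-encoded forms:
Character | Encoded |
---|---|
`!` | `%21` |
`*` | `%2A` |
`'` | `%27` |
`(` | `%28` |
`)` | `%29` |
`;` | `%3B` |
`:` | `%3A` |
`@` | `%40` |
`&` | `%26` |
`=` | `%3D` |
`+` | `%2B` |
`$` | `%24` |
`,` | `%2C` |
`/` | `%2F` |
`?` | `%3F` |
`#` | `%23` |
`[` | `%5B` |
`]` | `%5D` |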
Encoding unreserved characters
Although percent encoding is typically applied to reserved characters, it is also possible to encode unreserved characters by writing the character's ASCII code in hexadecimal, preceded by %.
For example, to percent-encode A, we can simply write %41; here, 41 is the hexadecimal form of 65, which, in turn, is the ASCII code for capital A.
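The arithmetic above can be verified with a short Python sketch. The helper encode_every_char is a hypothetical name introduced for illustration. It is needed because standard encoders such as urllib.parse.quote leave letters and digits untouched.

```python
from urllib.parse import quote, unquote

def encode_every_char(text: str) -> str:
    """Percent-encode every byte, including letters and digits."""
    return "".join("%{:02X}".format(b) for b in text.encode("utf-8"))

print(ord("A"), hex(ord("A")))     # 65 0x41
print(encode_every_char("A"))      # %41
print(unquote("%41"))              # A
print(quote("A"))                  # A (the standard encoder leaves it alone)
print(unquote(encode_every_char("admin")))  # admin
```

A decoder cannot tell whether a character arrived literally or in encoded form, which is why %41 and A are interchangeable to the receiving parser.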
A web-based URL encoder/decoder can be found here: