Base64 encoding
Base64 is an encoding mechanism that was originally designed for representing binary data as text. It was first used in e-mail systems, which required binary attachments such as images and rich-text documents to be sent in ASCII format.
Base64 is also common on websites, not for encoding binary data but for obscuring things such as request parameter values, session data, and so on. Security through obscurity offers no real protection here. Developers are often unaware that even a slightly skilled person can decode a value disguised as a Base64 string, because Base64 is an encoding, not encryption, and reversing it needs no key. Base64 is also used to embed media such as images and fonts in pages through data URIs.
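To show how little the disguise achieves, here is a minimal Python sketch using the standard base64 module. The parameter value is a hypothetical example, not taken from any real application:

```python
import base64

# Hypothetical request parameter value "hidden" with Base64.
obscured = "dXNlcj1hZG1pbg=="

# Decoding needs no key or secret: the encoding is simply reversed.
revealed = base64.b64decode(obscured).decode("ascii")
print(revealed)  # user=admin

# Tampering is just as easy: edit the plaintext and re-encode it.
tampered = base64.b64encode(b"user=root").decode("ascii")
print(tampered)  # dXNlcj1yb290
```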
JavaScript also provides built-in functions for encoding and decoding Base64 strings:
- btoa(): Encode to Base64
- atob(): Decode from Base64
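btoa() expects a "binary string" in which every character is a single byte (code point 0 to 255), and atob() returns the same kind of string. As a rough sketch of that behavior in Python, assuming the standard base64 module, the hypothetical helpers js_btoa and js_atob below use the latin-1 codec, which maps code points 0 to 255 one-to-one onto bytes:

```python
import base64

def js_btoa(binary_string: str) -> str:
    """Hypothetical btoa() analogue: each char must be code point 0-255."""
    # latin-1 turns each code point 0-255 into one byte and raises
    # UnicodeEncodeError for larger code points, similar to btoa() throwing.
    return base64.b64encode(binary_string.encode("latin-1")).decode("ascii")

def js_atob(encoded: str) -> str:
    """Hypothetical atob() analogue: one character per decoded byte."""
    return base64.b64decode(encoded).decode("latin-1")

print(js_btoa("God"))   # R29k
print(js_atob("R29k"))  # God
```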
Character set of Base64 encoding
Base64 encoding uses a character set of 64 printable ASCII characters, plus one padding character. The following characters are used to encode binary data as text:
- A to Z characters
- a to z characters
- 0 to 9 characters
- + (plus character)
- / (forward-slash character)
- = (equal character), used only for padding
The following table maps each 6-bit value to its Base64 character:
Value | Enc | Value | Enc | Value | Enc | Value | Enc |
---|---|---|---|---|---|---|---|
0 | A | 16 | Q | 32 | g | 48 | w |
1 | B | 17 | R | 33 | h | 49 | x |
2 | C | 18 | S | 34 | i | 50 | y |
3 | D | 19 | T | 35 | j | 51 | z |
4 | E | 20 | U | 36 | k | 52 | 0 |
5 | F | 21 | V | 37 | l | 53 | 1 |
6 | G | 22 | W | 38 | m | 54 | 2 |
7 | H | 23 | X | 39 | n | 55 | 3 |
8 | I | 24 | Y | 40 | o | 56 | 4 |
9 | J | 25 | Z | 41 | p | 57 | 5 |
10 | K | 26 | a | 42 | q | 58 | 6 |
11 | L | 27 | b | 43 | r | 59 | 7 |
12 | M | 28 | c | 44 | s | 60 | 8 |
13 | N | 29 | d | 45 | t | 61 | 9 |
14 | O | 30 | e | 46 | u | 62 | + |
15 | P | 31 | f | 47 | v | 63 | / |
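The table follows a simple pattern, so it can be generated rather than typed in. A short Python sketch (the names ALPHABET and REVERSE are illustrative); note that = has no index because it is only used for padding:

```python
import string

# Index 0-25: A-Z, 26-51: a-z, 52-61: 0-9, 62: '+', 63: '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
assert len(ALPHABET) == 64

# Value -> character, as in the table above.
print(ALPHABET[17], ALPHABET[54], ALPHABET[61], ALPHABET[36])  # R 2 9 k

# Character -> value, the reverse lookup a decoder needs.
REVERSE = {ch: i for i, ch in enumerate(ALPHABET)}
print(REVERSE["w"], REVERSE["/"])  # 48 63
```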
The encoding process
The encoding process is as follows:
- The input data (binary or not) is read from left to right.
- Three 8-bit bytes from the input are joined to make a 24-bit group.
- The 24-bit group is divided into four 6-bit groups.
- Each 6-bit group is converted into its Base64 character using the previous lookup table.
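The steps above can be sketched in Python for a single, complete 3-byte group. This is an illustrative helper (encode_group is a hypothetical name), not a full encoder; it ignores padding:

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def encode_group(group: bytes) -> str:
    """Encode exactly three bytes (24 bits) as four Base64 characters."""
    if len(group) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Join the three 8-bit values into one 24-bit integer, left to right.
    n = (group[0] << 16) | (group[1] << 8) | group[2]
    # Split it into four 6-bit values, most significant first.
    indices = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Map each 6-bit value through the lookup table.
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"God"))  # R29k
```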
Example:
Let us take the word God. We'll make a table to demonstrate the process more easily:
Character | G | o | d | |
---|---|---|---|---|
8-bit groups | 01000111 | 01101111 | 01100100 | |
6-bit groups | 010001 | 110110 | 111101 | 100100 |
6-bit groups in decimal | 17 | 54 | 61 | 36 |
Base64 lookup | R | 2 | 9 | k |
Therefore, the Base64 equivalent of God is R29k.
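Each row of the table can be reproduced with a few lines of Python, which makes the regrouping of the same 24 bits easy to see. A minimal sketch:

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

data = "God".encode("ascii")

# Row 2: each character as an 8-bit binary string.
eight_bit = [format(b, "08b") for b in data]
bits = "".join(eight_bit)
# Row 3: the same 24 bits regrouped six at a time.
six_bit = [bits[i:i + 6] for i in range(0, len(bits), 6)]
# Row 4: each 6-bit group as a decimal value.
decimal = [int(g, 2) for g in six_bit]
# Row 5: the lookup-table character for each value.
encoded = "".join(ALPHABET[d] for d in decimal)

print(eight_bit)  # ['01000111', '01101111', '01100100']
print(six_bit)    # ['010001', '110110', '111101', '100100']
print(decimal)    # [17, 54, 61, 36]
print(encoded)    # R29k
```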
However, a problem arises when the input does not divide exactly into 24-bit groups. Let me illustrate this. Consider the word PACKT, which has five characters (40 bits), so we cannot divide it into 24-bit groups equally. Hypothetically speaking, the first 24-bit group is PAC and the second group is KT?, where ? signifies a missing 8-bit character. This is where the padding mechanism of Base64 kicks in. I'll explain that in the next section.
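Slicing the input into 3-byte chunks in Python shows the short final group directly. A minimal sketch:

```python
data = b"PACKT"

# Slice the input into 3-byte (24-bit) groups; the last one may be short.
groups = [data[i:i + 3] for i in range(0, len(data), 3)]

for g in groups:
    print(g, len(g) * 8, "bits")
# b'PAC' 24 bits
# b'KT' 16 bits
```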
Padding in Base64
Wherever an 8-bit character is missing from a 24-bit group, the group is completed with zero bits, and for every missing character one = is written in place of an output character. So, for one missing character, = is used; for two missing characters, == is used:
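Putting the lookup table, the 24-bit grouping, and the padding rule together gives a complete encoder. The sketch below is for teaching purposes (b64encode here is a hypothetical helper, not Python's base64.b64encode, which is what real code should use):

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def b64encode(data: bytes) -> str:
    """Base64-encode data, writing one '=' per missing input byte."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        missing = 3 - len(group)             # 0, 1 or 2 missing bytes
        padded = group + b"\x00" * missing   # complete the group with zero bits
        n = int.from_bytes(padded, "big")    # one 24-bit integer
        chars = [ALPHABET[(n >> s) & 0x3F] for s in (18, 12, 6, 0)]
        # Each missing byte replaces one trailing output character with '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)

print(b64encode(b"PACKT"))  # UEFDS1Q=
print(b64encode(b"PACK"))   # UEFDSw==
```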
Input | Output | Padding | Padding Length |
---|---|---|---|
PACKT | UEFDS1Q= | = | 1 |
PACK | UEFDSw== | == | 2 |
PAC | UEFD | None | 0 |
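Because the padding depends only on how many bytes are left over after forming complete 24-bit groups, both the padding length and the padded output length can be computed from the input length alone. A minimal sketch (padding_length and encoded_length are hypothetical helper names):

```python
def padding_length(n_bytes: int) -> int:
    """Number of '=' characters for an input of n_bytes bytes."""
    return (3 - n_bytes % 3) % 3

def encoded_length(n_bytes: int) -> int:
    """Padded Base64 output length: 4 characters per started 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)

for word in ["PACKT", "PACK", "PAC"]:
    n = len(word)
    print(word, padding_length(n), encoded_length(n))
# PACKT 1 8
# PACK 2 8
# PAC 0 4
```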