URL Encoding and Decoding

URL encoding of a sequence of octets (bytes) is defined by RFC 1738 to allow the transmission of arbitrary data in a URL as ASCII text.

Any octet can be expressed as %XX, where XX is the hexadecimal value of the octet. The XX is always exactly two hexadecimal digits long.

For example, a tab would be encoded as %09, a space as %20, the % as %25, etc.
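The %XX rule can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the code of the urlendec utilities; the function name percent_encode_octet is made up for this example.

```python
def percent_encode_octet(octet: int) -> str:
    """Return the %XX form of a single octet (0-255)."""
    if not 0 <= octet <= 255:
        raise ValueError("an octet must be in the range 0-255")
    # The 02X format pads the value to exactly two uppercase hex digits.
    return "%{:02X}".format(octet)


# The three examples from the text: tab, space, and the percent sign.
for ch in ("\t", " ", "%"):
    print(repr(ch), "->", percent_encode_octet(ord(ch)))
```

Run as a script, this prints %09, %20, and %25 for the three characters.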

Not all bytes need to be encoded. For example, the letters A-Z and a-z and the digits 0-9 are usually not encoded (though a decoder should handle the case when they are). Some octets are considered safe, others unsafe. Which ones are safe depends on the protocol used. Refer to RFC 1738, section 2.2, for details.
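To illustrate why a decoder should not care which octets were encoded, here is a minimal decoder sketch in Python. It is an assumption about how such a decoder might look, not the urldecode source; percent_decode is a hypothetical name. It decodes every %XX it finds, including encoded letters and digits, and copies everything else unchanged.

```python
import re


def percent_decode(text: str) -> bytes:
    """Decode every %XX escape in text; copy all other characters as-is."""
    raw = text.encode("ascii")
    # Replace each % followed by two hex digits with the octet it names.
    # Malformed escapes (for example "%zz") are left untouched.
    return re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        raw,
    )


# "H" is encoded here although it did not need to be.
print(percent_decode("%48ello%2C%20world%21"))
```

Because the substitution works left to right without rescanning its own output, a string such as %2525 decodes to %25 rather than to a bare percent sign.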

Additionally, whenever CGI arguments are transmitted, a space is often encoded as a plus sign (+). Any literal plus sign is then URL encoded (as %2B) to prevent confusion. To learn more about CGI, read the CGI Programming Is Simple! tutorial.
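The plus convention can be tried out with Python's standard library, which offers quote_plus and unquote_plus in urllib.parse. This is shown only as an independent illustration of the convention; it says nothing about how the urlendec utilities implement it.

```python
from urllib.parse import quote_plus, unquote_plus

original = "1 + 1 = 2"

# Spaces become plus signs; the literal plus becomes %2B.
encoded = quote_plus(original)
print(encoded)

# Decoding reverses both steps.
print(unquote_plus(encoded))
```

The encoded form is 1+%2B+1+%3D+2, and decoding it gives back the original string.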

The urlendec collection comes with two utilities, urlencode and urldecode, that allow you to URL encode or decode any sequence of data.

They can take the data from the command line or from standard input. They write to standard output. Their usage is:

urlencode [options] [string ...]
urldecode [options] [string ...]

If a string is specified, that is the input data. For example,

urlencode Hello, world!

will output:

Hello%2C%20world%21

Similarly,

urldecode Hello%2C%20world%21

will result in:

Hello, world!
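For comparison, the same round trip can be sketched in Python. The encoder below follows the default behavior described in this document (encode everything except A-Z, a-z, and 0-9); it is a sketch under that assumption, and encode_default is a hypothetical name, not part of urlendec. Decoding uses the standard library's unquote_to_bytes.

```python
import string
from urllib.parse import unquote_to_bytes

# Octets that are passed through unencoded: ASCII letters and digits only.
UNENCODED = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def encode_default(data: bytes) -> str:
    """Encode every octet except ASCII letters and digits as %XX."""
    return "".join(
        chr(b) if b in UNENCODED else "%{:02X}".format(b)
        for b in data
    )


encoded = encode_default(b"Hello, world!")
print(encoded)
print(unquote_to_bytes(encoded).decode("ascii"))
```

This prints Hello%2C%20world%21 followed by Hello, world!, matching the two command-line examples.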

If no string is given on the command line (with an exception described under the -e option), input is taken from standard input, so the utilities can be used in a pipe. For example,

date | urlencode

will produce something like this:

Wed%20Oct%2025%2019%3A35%3A03%20CDT%202000%0A

Note that it ends with %0A, which is the URL-encoded newline character. That is because the date command prints a newline at the end of its output. If you do not want it there, you can type:

urlencode `date`

In that case, the output would be:

Wed%20Oct%2025%2019%3A35%3A03%20CDT%202000
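The difference between the two outputs comes down to one trailing octet. The Python sketch below reproduces it with a fixed sample string instead of calling date. It uses urllib.parse.quote with an empty safe list as a stand-in encoder; note that quote never encodes the characters _ . - ~, which differs from the default described here, but the sample contains none of them.

```python
from urllib.parse import quote

# What date writes to a pipe: the text plus a trailing newline.
sample = "Wed Oct 25 19:35:03 CDT 2000\n"

# Encoding the piped data keeps the newline, so the result ends in %0A.
print(quote(sample, safe=""))

# Command substitution in the shell strips trailing newlines first,
# which is modeled here with rstrip.
print(quote(sample.rstrip("\n"), safe=""))
```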

Urlencode Options

By default, urlencode will URL encode everything, except letters A-Z, a-z, and digits 0-9. Various command line options allow you to determine what should or should not be URL encoded. The options must precede the string (if you specify one). To see all options in alphabetical order, type:

urlencode -h

I will describe most of them here, but not alphabetically.

You can exclude individual values, or ranges of values, with the list option. A list, like all options, is preceded by a dash (-). The list is enclosed in square brackets. Non-printable characters may be URL encoded within the list. Four characters have special meaning:

Options can be grouped. In other words, several options (including lists) can follow a single dash.

Note: When entering the list from a Unix shell, you need to prevent the shell from interpreting the [ itself. You can do so by preceding it with a \ or by enclosing the option in single quotes (but not both at the same time).

Examples:

urlencode -a\[0-7]
URL encode everything except octal digits.

urlencode '-x%d[89]'
URL encode the % sign and the octal digits. Leave everything else unencoded.

urlencode '-[%00-%1F]'
URL encode everything, except alphanumeric and control characters.

urlencode '-[:/.\-_]p'
URL encode everything except A-Z, a-z, 0-9, colon, slash, dot, dash, and underscore. Encode spaces as plus signs.
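The effect described for the last example can be imitated in Python. This sketch only models the described result (a set of extra characters left unencoded, plus the space-to-plus rule); it does not parse urlencode's option syntax, and the name encode_keeping and its parameters are invented for this illustration.

```python
import string

ALNUM = string.ascii_letters + string.digits


def encode_keeping(data: bytes, keep: str = "", space_as_plus: bool = False) -> str:
    """Encode all octets except letters, digits, and those listed in keep."""
    unencoded = frozenset((ALNUM + keep).encode("ascii"))
    out = []
    for b in data:
        if space_as_plus and b == 0x20:
            out.append("+")              # space becomes a plus sign
        elif b in unencoded:
            out.append(chr(b))           # excluded from encoding
        else:
            out.append("%{:02X}".format(b))
    return "".join(out)


url = b"http://example.com/my file_1.txt?a=1+2"
print(encode_keeping(url, keep=":/.-_", space_as_plus=True))
```

For the sample input this prints http://example.com/my+file_1.txt%3Fa%3D1%2B2: the colon, slashes, dots, and underscore survive, the space becomes a plus sign, and the literal plus is encoded as %2B.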

Urldecode Options

Type urldecode -h for the full list.

Common Options

The following options are common to both urlencode and urldecode:

The -e option

The -e option is common to both urlencode and urldecode. It serves a double purpose: