URL Encoding: The Hidden Language of the Web

Why URLs Need Encoding

A URL is made up of characters, and not all of them are safe to use directly. Some have special meaning in URL syntax (? starts the query string, & separates parameters, # starts the fragment identifier). Others fall outside ASCII (Chinese or Arabic letters, for example), and some can be garbled by older HTTP infrastructure.

URL encoding (officially called percent-encoding) solves this by replacing each byte of an unsafe character with a % followed by that byte's two-digit hexadecimal value.

A space (0x20) → %20
An at sign (0x40) → %40
A forward slash (0x2F) → %2F
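
The mapping above can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical helper named percentEncodeChar (not a built-in) and input limited to a single ASCII character:

```javascript
// Hypothetical helper (not a built-in): percent-encode one ASCII character.
function percentEncodeChar(ch) {
  const code = ch.charCodeAt(0);
  if (ch.length !== 1 || code > 0x7f) {
    throw new RangeError("expected a single ASCII character");
  }
  // "%" followed by two uppercase hex digits, zero-padded.
  return "%" + code.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeChar(" ")); // "%20"
console.log(percentEncodeChar("@")); // "%40"
console.log(percentEncodeChar("/")); // "%2F"
```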

The RFC Standard

URL encoding is defined in RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax). According to this standard:

Unreserved characters — safe to use anywhere in a URL without encoding:

A–Z, a–z, 0–9
-, _, ., ~

Reserved characters — have special meaning in URIs and must be percent-encoded when used as data:

: / ? # [ ] @ ! $ & ' ( ) * + , ; =

Everything else (spaces, non-ASCII characters, and characters such as ", <, >, {, }) must be percent-encoded. The % character itself is written as %25.
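
The three categories can be expressed as a small classifier. This is only a sketch: the helper name classifyUriChar and the category labels are assumptions made for illustration, and the function looks at one character at a time.

```javascript
// Character classes from RFC 3986 (unreserved and reserved sets).
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const RESERVED = /^[:\/?#\[\]@!$&'()*+,;=]$/;

// Hypothetical helper: report which class a single character falls into.
function classifyUriChar(ch) {
  if (UNRESERVED.test(ch)) return "unreserved";
  if (RESERVED.test(ch)) return "reserved";
  return "other"; // must always be percent-encoded
}

console.log(classifyUriChar("~")); // "unreserved"
console.log(classifyUriChar("?")); // "reserved"
console.log(classifyUriChar(" ")); // "other"
```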

Percent-Encoding in Practice

Encoding a Space

The space character (U+0020) is encoded as %20. You may also see + used instead of %20 in query strings. This comes from the HTML form encoding (application/x-www-form-urlencoded), where spaces are encoded as +. The two conventions are distinct:

%20 is RFC 3986 percent-encoding, valid in any URL component (paths, query strings, fragments)
+ is HTML form encoding, used only in form-submitted query strings and request bodies

When decoding, always know which convention applies to avoid bugs.
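
Both conventions are available from standard JavaScript, which makes the difference easy to see side by side. A short sketch, where the parameter name q is just an example:

```javascript
const value = "hello world";

// RFC 3986 style: the space becomes %20.
const percentStyle = encodeURIComponent(value);

// Form style (application/x-www-form-urlencoded): the space becomes +.
const formStyle = new URLSearchParams({ q: value }).toString();

console.log(percentStyle); // "hello%20world"
console.log(formStyle);    // "q=hello+world"
```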

Non-ASCII Characters

Non-ASCII characters are first encoded in UTF-8, then each byte is percent-encoded:

Chinese character 中 (U+4E2D)
UTF-8 bytes: 0xE4 0xB8 0xAD
Percent-encoded: %E4%B8%AD

So the Chinese word 你好 becomes %E4%BD%A0%E5%A5%BD.
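
The two steps (UTF-8 first, then one escape per byte) can be traced with TextEncoder. This is a sketch with a hypothetical helper name; note that, unlike encodeURIComponent(), it escapes every byte, including ASCII letters.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string.
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text); // step 1: UTF-8 bytes
  // step 2: one %XX escape per byte
  return Array.from(bytes, (b) =>
    "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeAllBytes("中"));   // "%E4%B8%AD"
console.log(percentEncodeAllBytes("你好")); // "%E4%BD%A0%E5%A5%BD"

// For text with no ASCII characters the built-in gives the same result.
console.log(encodeURIComponent("你好") === percentEncodeAllBytes("你好")); // true
```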

JavaScript Encoding Functions

JavaScript provides several built-in functions for URL encoding:

`encodeURI()`

Encodes a complete URI: it is designed to encode a full URL while preserving its structure. It does not encode characters that are part of the URI syntax (; , / ? : @ & = + $ #), nor letters, digits, and - _ . ! ~ * ' ( ).

encodeURI("https://example.com/search?q=hello world&lang=中文")
// "https://example.com/search?q=hello%20world&lang=%E4%B8%AD%E6%96%87"

`encodeURIComponent()`

Encodes a URI component — designed to encode individual values like query parameter values. It encodes everything except A–Z a–z 0–9 - _ . ! ~ * ' ( ).

encodeURIComponent("hello world & more")
// "hello%20world%20%26%20more"

encodeURIComponent("https://example.com")
// "https%3A%2F%2Fexample.com"
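
Because encodeURIComponent() leaves ! ' ( ) * untouched even though RFC 3986 lists them as reserved, a common pattern is to escape those five characters in a second pass. The wrapper below is a sketch of that pattern; its name is made up for this example.

```javascript
// Hypothetical wrapper: encodeURIComponent plus the five characters it skips.
function encodeStrict(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (ok)!*")); // "it's%20(ok)!*"
console.log(encodeStrict("it's (ok)!*"));       // "it%27s%20%28ok%29%21%2A"
```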

When to Use Which

Scenario	Function
Encode an entire URL	`encodeURI()`
Encode a query parameter value	`encodeURIComponent()`
Encode a path segment	`encodeURIComponent()`
Encode a URL to embed in another URL	`encodeURIComponent()`
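
The last three rows of the table can be combined in one example. The values and the parameter names (q, return) below are hypothetical; the point is that each piece of data is passed through encodeURIComponent() on its own before being joined into the URL.

```javascript
// Hypothetical values for illustration.
const username = "ana/maria";                      // path segment containing a slash
const query = "cats & dogs";                       // query parameter value
const returnTo = "https://example.com/home?tab=1"; // URL embedded in another URL

const url =
  "https://example.com/users/" + encodeURIComponent(username) +
  "?q=" + encodeURIComponent(query) +
  "&return=" + encodeURIComponent(returnTo);

console.log(url);
// https://example.com/users/ana%2Fmaria?q=cats%20%26%20dogs&return=https%3A%2F%2Fexample.com%2Fhome%3Ftab%3D1
```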

The Complementary Decode Functions

decodeURI(encodedURI)
decodeURIComponent(encodedComponent)

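Never use the deprecated escape() and unescape() functions. escape() does not produce UTF-8 percent-encoding: it writes characters above U+00FF in a non-standard %uXXXX form and characters up to U+00FF as a single byte, so its output is not a valid percent-encoded URL component.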
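
A short sketch of how the two decode functions behave. Both throw a URIError on malformed input, so code that decodes untrusted strings may want a guard; tryDecodeComponent is a hypothetical helper name, not a built-in.

```javascript
console.log(decodeURIComponent("hello%20world%20%26%20more")); // "hello world & more"

// decodeURI leaves escapes of reserved characters such as %26 (&) intact.
console.log(decodeURI("q=a%20b%26c")); // "q=a b%26c"

// Hypothetical helper: return a fallback instead of throwing on malformed input.
function tryDecodeComponent(text, fallback = null) {
  try {
    return decodeURIComponent(text);
  } catch (err) {
    if (err instanceof URIError) return fallback;
    throw err;
  }
}

console.log(tryDecodeComponent("100%"));   // null (a lone % is malformed)
console.log(tryDecodeComponent("%E4%B8")); // null (truncated UTF-8 sequence)
```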

Common Gotchas and Pitfalls

Double Encoding

A frequent bug is encoding a string that is already encoded:

encodeURIComponent(encodeURIComponent("hello world"))
// "hello%2520world"
// %25 is the encoding of %, so %20 became %2520

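Encode each value exactly once, at the point where the URL is assembled, and keep track of which values are already encoded. Guessing from the string alone is unreliable: 100%25 could be raw data or the encoded form of 100%.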

The `+` vs `%20` Trap

If you decode a query string that used + for spaces with decodeURIComponent, the + will not be decoded to a space. You must replace + with %20 (or a space) first, or use the URLSearchParams API:

new URLSearchParams("q=hello+world").get("q")
// "hello world" ✓

decodeURIComponent("hello+world")
// "hello+world" ✗ — still has a literal plus
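
When URLSearchParams is not an option, the manual replacement can be wrapped in a small function. A minimal sketch, assuming the input is a single form-encoded value (the helper name is hypothetical):

```javascript
// Hypothetical helper: decode one application/x-www-form-urlencoded value.
function decodeFormComponent(text) {
  // Turn + into a space first. A literal plus sign arrives as %2B,
  // so it is not affected by this replacement.
  return decodeURIComponent(text.replace(/\+/g, " "));
}

console.log(decodeFormComponent("hello+world")); // "hello world"
console.log(decodeFormComponent("1%2B1%3D2"));   // "1+1=2"
```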

Fragment Identifiers

The # character in URLs marks the start of a fragment identifier (for in-page anchors). If you have a # in data, it must be encoded as %23, otherwise the browser will treat everything after it as a fragment.
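
The effect is easy to observe with the URL API. In this sketch the parameter name tag and the value c#sharp are made up for illustration:

```javascript
// Unencoded: everything after # is parsed as the fragment.
const raw = new URL("https://example.com/search?tag=c#sharp");
console.log(raw.search); // "?tag=c"
console.log(raw.hash);   // "#sharp"

// Encoded: the # stays inside the query value.
const encoded = new URL(
  "https://example.com/search?tag=" + encodeURIComponent("c#sharp")
);
console.log(encoded.search);                  // "?tag=c%23sharp"
console.log(encoded.searchParams.get("tag")); // "c#sharp"
console.log(encoded.hash);                    // ""
```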

Internationalized Domain Names (IDN)

Domain names with non-ASCII characters (like bücher.de) use Punycode encoding, not percent-encoding. Browsers convert IDNs to Punycode internally: bücher.de → xn--bcher-kva.de.
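
The URL API performs the same conversion, which gives a quick way to check the Punycode form of a hostname. A small sketch (the /katalog path is an arbitrary example):

```javascript
const url = new URL("https://bücher.de/katalog");

// The host is converted to its ASCII (Punycode) form, not percent-encoded.
console.log(url.hostname); // "xn--bcher-kva.de"
console.log(url.href);     // "https://xn--bcher-kva.de/katalog"
```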

URL Structure Reference

A URL has the following components (per RFC 3986):

scheme://userinfo@host:port/path?query#fragment

Component	Example	Encoding rules
Scheme	`https`	Letters, digits, `+`, `-`, `.`
Host	`example.com`	Domain labels + dots
Port	`8080`	Digits only
Path	`/search/results`	Percent-encode everything except unreserved characters, `!$&'()*+,;=`, `:` and `@`
Query	`q=hello+world`	`+` for spaces in form data, `%20` in general
Fragment	`#section-2`	Not sent to server; browser-only
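
The components in the table map directly onto properties of a parsed URL object. The sketch below uses a made-up URL; note that the URL API follows the WHATWG URL Standard, which differs from RFC 3986 in some edge cases.

```javascript
const url = new URL(
  "https://user@example.com:8080/search/results?q=hello+world#section-2"
);

console.log(url.protocol); // "https:"
console.log(url.username); // "user"
console.log(url.hostname); // "example.com"
console.log(url.port);     // "8080"
console.log(url.pathname); // "/search/results"
console.log(url.search);   // "?q=hello+world"
console.log(url.hash);     // "#section-2"
```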

Server-Side Considerations

URL Normalization

Servers should normalize URLs before processing, for example by treating %41 (which decodes to A) the same as A. Not every escape is interchangeable with its literal character, though. In paths, / and %2F differ: a literal / separates segments, while %2F is data inside a single segment. Many web servers handle the two differently for security reasons (path traversal protection).
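
One way to implement this kind of normalization is to decode only the escapes that stand for unreserved characters and leave every other escape in place, with its hex digits uppercased. The function below is a sketch of that idea with a hypothetical name, not a description of what any particular server does.

```javascript
// Hypothetical helper: normalize percent-escapes without changing meaning.
function normalizePercentEncoding(text) {
  return text.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Unreserved characters are safe to decode; everything else
    // (including %2F) stays encoded, with uppercase hex digits.
    return /^[A-Za-z0-9\-._~]$/.test(ch) ? ch : "%" + hex.toUpperCase();
  });
}

console.log(normalizePercentEncoding("/%41pi/a%2fb/%7euser"));
// "/Api/a%2Fb/~user"
```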

SQL Injection via URL Parameters

Always validate URL parameters before using them in database queries, even after URL decoding, and pass them to the database through parameterized queries rather than string concatenation. URL encoding is not a security boundary.
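
The sketch below shows why decoding alone is not enough: a percent-encoded payload decodes into ordinary SQL syntax. The parameter name id and the validator parseId are hypothetical, and the validation rule (a positive integer of at most nine digits) is only an example of an allow-list check.

```javascript
// Hypothetical validator: accept only a plain positive integer id.
function parseId(value) {
  if (typeof value !== "string" || !/^[1-9][0-9]{0,8}$/.test(value)) {
    return null;
  }
  return Number(value);
}

const params = new URLSearchParams("id=1%27%20OR%20%271%27%3D%271");

console.log(params.get("id"));          // "1' OR '1'='1"
console.log(parseId(params.get("id"))); // null (rejected)
console.log(parseId("42"));             // 42
```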

Tools and APIs

Browser's `URL` API

Modern browsers and Node.js provide the URL API for working with URLs in a structured way:

const url = new URL("https://example.com/search?q=hello world&page=1");
console.log(url.searchParams.get("q")); // "hello world" (auto-decoded)

url.searchParams.set("q", "new value & special");
console.log(url.href);
// https://example.com/search?q=new+value+%26+special&page=1

The URL API handles encoding/decoding transparently, which is generally the preferred approach over manual encodeURIComponent calls.

Summary

URL encoding is a foundational concept that every web developer encounters daily, often without noticing it. The key points to remember:

Percent-encoding (%XX) is the standard mechanism to encode unsafe characters in URIs.
Use encodeURIComponent() for individual values; encodeURI() for full URLs.
Be aware of the + vs %20 distinction in query strings.
Avoid double-encoding: encode each value exactly once, when the URL is assembled.
Prefer the modern URL and URLSearchParams APIs for working with URLs programmatically.
