Why URLs Need Encoding
A URL is made up of characters — and not all characters are "safe" to use directly. URLs are transmitted over the internet through systems that may interpret certain characters as having special meaning (like ? for query strings, & for separating parameters, or # for fragment identifiers). Other characters may be non-ASCII (like Chinese or Arabic letters) and some characters may be garbled by older HTTP infrastructure.
URL encoding (officially called percent-encoding) solves this by replacing each unsafe character with a % followed by the two-digit hexadecimal value of its byte. A character that takes several bytes produces one %XX triplet per byte.
A space (0x20) → %20
An at sign (0x40) → %40
A forward slash (0x2F) → %2F
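The mapping above can be reproduced with a few lines of JavaScript. This is a minimal sketch rather than a replacement for the built-in encoders; `percentEncodeAscii` is a hypothetical helper name, and it deliberately handles ASCII input only.

```javascript
// Hypothetical helper: percent-encode one ASCII character as %XX.
function percentEncodeAscii(ch) {
  const code = ch.charCodeAt(0);
  if (code > 0x7f) {
    // Non-ASCII characters must be converted to UTF-8 bytes first.
    throw new RangeError("ASCII only");
  }
  // Two uppercase hex digits, zero-padded.
  return "%" + code.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeAscii(" ")); // "%20"
console.log(percentEncodeAscii("@")); // "%40"
console.log(percentEncodeAscii("/")); // "%2F"
```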
The RFC Standard
URL encoding is defined in RFC 3986 (Uniform Resource Identifiers). According to this standard:
Unreserved characters — safe to use anywhere in a URL without encoding:
A–Z, a–z, 0–9, -, _, ., ~
Reserved characters — have special meaning in URIs and must be percent-encoded when used as data:
: / ? # [ ] @ ! $ & ' ( ) * + , ; =
Everything else — including spaces, non-ASCII characters, and characters like ", <, >, {, } — must be percent-encoded.
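As an illustration of the three categories, the sketch below classifies a single character. The names `UNRESERVED`, `RESERVED`, and `classifyChar` are made up for this example; the character sets are the ones listed above.

```javascript
// Character sets taken from the lists above (RFC 3986).
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const RESERVED = /^[:\/?#\[\]@!$&'()*+,;=]$/;

// Hypothetical helper: report which category a single character falls into.
function classifyChar(ch) {
  if (UNRESERVED.test(ch)) return "unreserved";
  if (RESERVED.test(ch)) return "reserved";
  return "other"; // must always be percent-encoded
}

console.log(classifyChar("~")); // "unreserved"
console.log(classifyChar("?")); // "reserved"
console.log(classifyChar(" ")); // "other"
```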
Percent-Encoding in Practice
Encoding a Space
The character space (U+0020) is encoded as %20. You may also see + used instead of %20 in query strings — this comes from the HTML form URL encoding (application/x-www-form-urlencoded) standard, where spaces are encoded as +. The two are distinct:
- %20: RFC 3986 percent-encoding (spaces in paths, headers)
- +: HTML form encoding (spaces in query strings)
When decoding, always know which convention applies to avoid bugs.
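To see both conventions side by side, the following sketch encodes the same value twice. The variable names are illustrative only.

```javascript
const value = "hello world";

// RFC 3986 style: the space becomes %20.
const percentStyle = encodeURIComponent(value);

// Form style (application/x-www-form-urlencoded): the space becomes +.
const formStyle = new URLSearchParams({ q: value }).toString();

console.log(percentStyle); // "hello%20world"
console.log(formStyle);    // "q=hello+world"
```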
Non-ASCII Characters
Non-ASCII characters are first encoded in UTF-8, then each byte is percent-encoded:
Chinese character 中 (U+4E2D)
UTF-8 bytes: 0xE4 0xB8 0xAD
Percent-encoded: %E4%B8%AD
So the Chinese word 你好 becomes %E4%BD%A0%E5%A5%BD.
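The two steps (UTF-8 first, then one %XX per byte) can be sketched with the standard `TextEncoder`. `percentEncodeUtf8` is a hypothetical name, and unlike `encodeURIComponent` this sketch encodes every byte, including unreserved ones, purely to make the byte-level mapping visible.

```javascript
// Hypothetical helper: UTF-8 encode the text, then percent-encode every byte.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeUtf8("中"));   // "%E4%B8%AD"
console.log(percentEncodeUtf8("你好")); // "%E4%BD%A0%E5%A5%BD"
```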
JavaScript Encoding Functions
JavaScript provides several built-in functions for URL encoding:
encodeURI()
Encodes a complete URI: designed to encode a full URL while preserving its structure. Besides the characters that encodeURIComponent() also leaves alone (A–Z a–z 0–9 - _ . ! ~ * ' ( )), it does not encode the characters that are part of URI syntax: ; , / ? : @ & = + $ #
encodeURI("https://example.com/search?q=hello world&lang=中文")
// "https://example.com/search?q=hello%20world&lang=%E4%B8%AD%E6%96%87"
encodeURIComponent()
Encodes a URI component — designed to encode individual values like query parameter values. It encodes everything except A–Z a–z 0–9 - _ . ! ~ * ' ( ).
encodeURIComponent("hello world & more")
// "hello%20world%20%26%20more"
encodeURIComponent("https://example.com")
// "https%3A%2F%2Fexample.com"
When to Use Which
| Scenario | Function |
|---|---|
| Encode an entire URL | encodeURI() |
| Encode a query parameter value | encodeURIComponent() |
| Encode a path segment | encodeURIComponent() |
| Encode a URL to embed in another URL | encodeURIComponent() |
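The table can be put into practice with a small URL builder. This is a sketch under assumptions: `buildSearchUrl` and its parameters are hypothetical, and the base URL is assumed to be already well-formed with no trailing slash.

```javascript
// Hypothetical helper: assemble a URL from a base, one path segment, and
// a flat object of query parameters, encoding each piece separately.
function buildSearchUrl(base, segment, params) {
  const query = Object.entries(params)
    .map(([key, val]) => `${encodeURIComponent(key)}=${encodeURIComponent(val)}`)
    .join("&");
  return `${base}/${encodeURIComponent(segment)}?${query}`;
}

console.log(
  buildSearchUrl("https://example.com", "a/b", {
    q: "hello world & more",
    next: "https://example.com",
  })
);
// "https://example.com/a%2Fb?q=hello%20world%20%26%20more&next=https%3A%2F%2Fexample.com"
```

Encoding each piece on its own keeps a / in the segment or an & in a value from being mistaken for URL structure.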
The Complementary Decode Functions
decodeURI(encodedURI)
decodeURIComponent(encodedComponent)
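Never use the deprecated escape() and unescape() functions. They do not use UTF-8: escape() emits a Latin-1 style %XX for code points up to U+00FF and a non-standard %uXXXX form for everything else, so the results are not valid percent-encoding.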
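The decode functions mirror their encode counterparts, which is easy to overlook: decodeURI() leaves escapes for URI syntax characters (such as %2F) untouched, while decodeURIComponent() decodes everything. A short sketch with illustrative variable names:

```javascript
// decodeURI keeps %2F encoded because / is part of URI syntax.
const whole = decodeURI("https://example.com/a%2Fb?q=hello%20world");

// decodeURIComponent decodes every escape, including %2F.
const part = decodeURIComponent("a%2Fb");

console.log(whole); // "https://example.com/a%2Fb?q=hello world"
console.log(part);  // "a/b"
```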
Common Gotchas and Pitfalls
Double Encoding
A frequent bug is encoding a string that is already encoded:
encodeURIComponent(encodeURIComponent("hello world"))
// "hello%2520world"
// %25 is the encoding of %, so %20 became %2520
Always check if the value is already encoded before encoding it.
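One possible way to guard against this is to decode first and then encode, so an already-encoded value comes out unchanged. This is only a heuristic sketch: `encodeOnce` is a hypothetical helper, and it cannot distinguish an encoded value from raw data that happens to contain a literal sequence such as %20.

```javascript
// Hypothetical helper: normalize to the decoded form, then encode once.
function encodeOnce(value) {
  let decoded;
  try {
    decoded = decodeURIComponent(value);
  } catch (err) {
    // Malformed escapes (for example a lone "%") mean the value is raw text.
    decoded = value;
  }
  return encodeURIComponent(decoded);
}

console.log(encodeOnce("hello world"));   // "hello%20world"
console.log(encodeOnce("hello%20world")); // "hello%20world"
console.log(encodeOnce("100%"));          // "100%25"
```

Where possible, it is simpler to keep values unencoded in application code and encode exactly once when the URL is assembled.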
The + vs %20 Trap
If you decode a query string that used + for spaces with decodeURIComponent, the + will not be decoded to a space. You must replace + with %20 first, or use the URLSearchParams API:
new URLSearchParams("q=hello+world").get("q")
// "hello world" ✓
decodeURIComponent("hello+world")
// "hello+world" ✗ — still has a literal plus
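When URLSearchParams is not an option, the replace-then-decode approach might look like the sketch below. `decodeFormValue` is a hypothetical name. The replacement has to happen before decoding so that an encoded plus sign (%2B) survives as a literal +.

```javascript
// Hypothetical helper: decode one form-encoded value.
function decodeFormValue(encoded) {
  // Turn + into %20 first, then decode all percent escapes.
  return decodeURIComponent(encoded.replace(/\+/g, "%20"));
}

console.log(decodeFormValue("hello+world")); // "hello world"
console.log(decodeFormValue("1%2B1"));       // "1+1"
```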
Fragment Identifiers
The # character in URLs marks the start of a fragment identifier (for in-page anchors). If you have a # in data, it must be encoded as %23, otherwise the browser will treat everything after it as a fragment.
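The effect is easy to observe with the URL API. In this sketch the search value "c#sharp" is just an illustrative example.

```javascript
// Unencoded #: everything after it is parsed as the fragment.
const raw = new URL("https://example.com/search?q=c#sharp");
console.log(raw.searchParams.get("q")); // "c"
console.log(raw.hash);                  // "#sharp"

// Encoded as %23: the # stays part of the query value.
const encoded = new URL(
  "https://example.com/search?q=" + encodeURIComponent("c#sharp")
);
console.log(encoded.searchParams.get("q")); // "c#sharp"
console.log(encoded.hash);                  // ""
```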
Internationalized Domain Names (IDN)
Domain names with non-ASCII characters (like bücher.de) use Punycode encoding, not percent-encoding. Browsers convert IDNs to Punycode internally: bücher.de → xn--bcher-kva.de.
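The URL API applies the same conversion, which gives a quick way to check it. This sketch assumes a browser or a Node.js build with the usual internationalization support.

```javascript
// The host is converted to Punycode; percent-encoding is not involved.
const idn = new URL("https://bücher.de/");

console.log(idn.hostname); // "xn--bcher-kva.de"
```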
URL Structure Reference
A URL has the following components (per RFC 3986):
scheme://userinfo@host:port/path?query#fragment
| Component | Example | Encoding rules |
|---|---|---|
| Scheme | https | Letters, digits, +, -, . |
| Host | example.com | Domain labels and dots |
| Port | 8080 | Digits only |
| Path | /search/results | Encoded with %XX except unreserved plus :@!$&'()*+,;= |
| Query | q=hello+world | + for spaces in form data, %20 in general |
| Fragment | #section-2 | Not sent to server; browser-only |
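The components can be inspected with the URL API. The sample URL below simply combines the example values from the table with a made-up userinfo of "user".

```javascript
const parts = new URL(
  "https://user@example.com:8080/search/results?q=hello+world#section-2"
);

console.log(parts.protocol); // "https:"
console.log(parts.username); // "user"
console.log(parts.hostname); // "example.com"
console.log(parts.port);     // "8080"
console.log(parts.pathname); // "/search/results"
console.log(parts.search);   // "?q=hello+world"
console.log(parts.hash);     // "#section-2"
```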
Server-Side Considerations
URL Normalization
Servers should normalize URLs before processing — for example, treating %41 (which decodes to A) the same as A. However, some characters have different meanings encoded vs unencoded: / vs %2F in paths — many web servers treat these differently for security reasons (path traversal protection).
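A minimal sketch of this kind of normalization: decode only the escapes that stand for unreserved characters, and leave everything else (including %2F) encoded, with uppercase hex digits. `normalizeEscapes` is a hypothetical helper, not what any particular server does.

```javascript
// Hypothetical helper: decode %XX only when it represents an unreserved
// character; keep all other escapes, normalized to uppercase hex.
function normalizeEscapes(path) {
  return path.replace(/%[0-9a-fA-F]{2}/g, (escape) => {
    const ch = String.fromCharCode(parseInt(escape.slice(1), 16));
    return /[A-Za-z0-9\-._~]/.test(ch) ? ch : escape.toUpperCase();
  });
}

console.log(normalizeEscapes("/%41/b%2fc")); // "/A/b%2Fc"
```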
SQL Injection via URL Parameters
Always sanitize and validate URL parameters before using them in database queries, even after URL decoding. URL encoding is not a security boundary.
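To illustrate why decoding is not a safeguard, the sketch below decodes a classic injection payload and then validates the parameter against the shape it is supposed to have. `parsePositiveInt` is a hypothetical validator for an integer parameter such as a page number; validation like this complements, and does not replace, safe query construction in the database layer.

```javascript
// Decoding happily returns the attack string unchanged in meaning.
const decodedPayload = decodeURIComponent("1%27%20OR%20%271%27%3D%271");
console.log(decodedPayload); // "1' OR '1'='1"

// Hypothetical validator: accept only a short run of digits greater than 0.
function parsePositiveInt(rawValue) {
  if (typeof rawValue !== "string" || !/^[0-9]{1,9}$/.test(rawValue)) {
    return null;
  }
  const n = Number(rawValue);
  return n > 0 ? n : null;
}

console.log(parsePositiveInt("42"));           // 42
console.log(parsePositiveInt(decodedPayload)); // null
```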
Tools and APIs
Browser's URL API
Modern browsers and Node.js provide the URL API for working with URLs in a structured way:
const url = new URL("https://example.com/search?q=hello world&page=1");
console.log(url.searchParams.get("q")); // "hello world" (auto-decoded)
url.searchParams.set("q", "new value & special");
console.log(url.href);
// https://example.com/search?q=new+value+%26+special&page=1
The URL API handles encoding/decoding transparently, which is generally the preferred approach over manual encodeURIComponent calls.
Summary
URL encoding is a foundational concept that every web developer encounters daily, often without noticing it. The key points to remember:
- Percent-encoding (%XX) is the standard mechanism to encode unsafe characters in URIs.
- Use encodeURIComponent() for individual values; encodeURI() for full URLs.
- Be aware of the + vs %20 distinction in query strings.
- Avoid double-encoding: check if a string is already encoded before encoding it.
- Prefer the modern URL and URLSearchParams APIs for working with URLs programmatically.