URL Encoding (Percent Encoding) and its Applications
This comprehensive guide explains URL encoding (percent encoding), a crucial process for handling non-ASCII characters in web addresses. Learn why URL encoding is necessary, how it works, and how to use encoding functions in JavaScript, PHP, and ASP. Includes a practical encoding example and a detailed character encoding reference table.
URL Encoding: Understanding Percent Encoding
Web browsers use URLs (Uniform Resource Locators) to request web pages from servers. A URL is a web address, such as https://www.w3schools.com.
Why URL Encoding?
URLs can only be sent over the Internet using the ASCII character set. Because URLs often contain characters outside that set, those characters must be converted (encoded) into a valid ASCII format.
URL encoding, also known as percent encoding, replaces unsafe characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces, so a space is encoded as %20, or as a plus sign (+) in form-encoded query strings.
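The replacement rule above can be sketched in a few lines of JavaScript. The helper name `percentEncodeBytes` is hypothetical, not a built-in. It assumes UTF-8 (via the standard `TextEncoder`) and encodes every byte, which is more aggressive than real encoders, since those leave letters and digits untouched.

```javascript
// Hypothetical helper: percent-encode every byte of a string's UTF-8 form.
// Real encoders leave unreserved characters (letters, digits, etc.) as-is.
function percentEncodeBytes(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    // Each byte becomes "%" plus two uppercase hexadecimal digits.
    out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

console.log(percentEncodeBytes(" ")); // %20
console.log(percentEncodeBytes("ü")); // %C3%BC
```

A character outside ASCII, such as "ü", takes more than one byte in UTF-8, so it produces more than one %XX group.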
URL Encoding in Practice
When a form is submitted, the browser URL-encodes the input before sending it to the server.
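To get a rough idea of what a browser sends for a form on a UTF-8 page, the standard `URLSearchParams` class serializes name/value pairs using form-encoding rules. The field name `text` and the sample value are assumptions chosen for illustration.

```javascript
// Form-style (application/x-www-form-urlencoded) serialization.
// The field name "text" and its value are illustrative assumptions.
const params = new URLSearchParams({ text: "Hello Günter" });
const query = params.toString();

console.log(query); // text=Hello+G%C3%BCnter
```

Note that the space becomes a plus sign here, while the non-ASCII "ü" becomes two percent-encoded bytes.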
URL Encoding Functions
Programming languages like JavaScript, PHP, and ASP provide built-in functions to perform URL encoding:
- JavaScript: encodeURIComponent()
- PHP: rawurlencode()
- ASP: Server.URLEncode()
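In JavaScript, encodeURIComponent() encodes every character except letters, digits, and the characters - _ . ! ~ * ' ( ), and decodeURIComponent() reverses the encoding.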
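A short sketch of the JavaScript function in use is shown here. The sample string and the example.com URL are assumptions for illustration only.

```javascript
// Encode a value before placing it in a query string.
const value = "Hello Günter & co?";
const encoded = encodeURIComponent(value);
console.log(encoded); // Hello%20G%C3%BCnter%20%26%20co%3F

// decodeURIComponent() reverses the encoding.
console.log(decodeURIComponent(encoded) === value); // true

// Hypothetical URL, used only to show where the encoded value goes.
const url = "https://example.com/search?q=" + encoded;
console.log(url);
```

Encoding the value separately keeps characters such as & and ? from being read as part of the URL's own structure.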
ASCII and UTF-8 Encoding Reference
Your browser encodes text based on the character set specified in the HTML document. UTF-8 is the default in HTML5. The table below shows some examples of how characters are encoded using Windows-1252 and UTF-8.
Character | Windows-1252 | UTF-8 |
---|---|---|
space | %20 | %20 |
! | %21 | %21 |
" | %22 | %22 |
# | %23 | %23 |
... | ... | ... |
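The rows shown above are identical in both columns because they are ASCII characters. The two encodings diverge outside ASCII, which the following sketch illustrates. `percentEncodeWin1252` is a hypothetical helper that only handles code points where the Windows-1252 byte equals the Unicode code point (0x00-0x7F and 0xA0-0xFF); it is not a full Windows-1252 encoder.

```javascript
// Sketch only: covers code points where a Windows-1252 byte equals the
// Unicode code point (0x00-0x7F and 0xA0-0xFF). Other characters throw.
function percentEncodeWin1252(ch) {
  const cp = ch.codePointAt(0);
  const sameAsCodePoint = cp <= 0x7f || (cp >= 0xa0 && cp <= 0xff);
  if (!sameAsCodePoint) {
    throw new RangeError("outside the range this sketch handles");
  }
  return "%" + cp.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeWin1252("ü")); // %FC (one byte)
console.log(encodeURIComponent("ü"));   // %C3%BC (UTF-8, two bytes)
console.log(percentEncodeWin1252(" ")); // %20, same as UTF-8
```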
ASCII Control Characters (00-1F)
ASCII control characters (%00-%1F) were initially intended for controlling hardware. They are not suitable for use within URLs.
ASCII Character | Description | URL-encoding |
---|---|---|
NUL | null character | %00 |
SOH | start of header | %01 |
STX | start of text | %02 |
... | ... | ... |
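As a quick check, the encodings in the table above can be reproduced with encodeURIComponent(). The helper `hasControlChars` is hypothetical and shows one possible way to detect control characters in a string before using it in a URL.

```javascript
// Print the URL encoding for the first three control characters.
const names = ["NUL", "SOH", "STX"]; // names taken from the table above
names.forEach((name, code) => {
  const encoded = encodeURIComponent(String.fromCharCode(code));
  console.log(name, encoded); // NUL %00, SOH %01, STX %02
});

// Hypothetical helper: does a string contain any control character 00-1F?
function hasControlChars(text) {
  return /[\x00-\x1f]/.test(text);
}

console.log(hasControlChars("a\tb")); // true (tab is 09)
console.log(hasControlChars("abc"));  // false
```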