URL Encoding (Percent Encoding) and its Applications

This guide explains URL encoding (percent encoding), the process that converts reserved and non-ASCII characters into a form that is safe to use in web addresses. Learn why URL encoding is necessary, how it works, and which encoding functions to use in JavaScript, PHP, and ASP. Includes a practical encoding example and character encoding reference tables.



URL Encoding: Understanding Percent Encoding

Web browsers use URLs (Uniform Resource Locators) to request web pages from servers. A URL is the address of a web page, such as https://www.w3schools.com.

Why URL Encoding?

URLs can only be sent over the internet using the ASCII character set. Many characters used in web pages are not part of ASCII, and some ASCII characters have a reserved meaning inside a URL. These characters must therefore be converted (encoded) into a valid ASCII format before they can appear in a URL.

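URL encoding, also known as percent encoding, replaces each unsafe character with a "%" followed by two hexadecimal digits that give the value of one byte. A character that takes several bytes in the chosen character set becomes several "%XX" groups. URLs cannot contain spaces, so a space is encoded as %20, or as a plus sign (+) when form data is encoded into a query string.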
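
The replacement rule can be sketched in a few lines of JavaScript. The helper name percentEncode is hypothetical, and the choice to leave only letters, digits, and "-", "_", ".", "~" unencoded is an assumption made for this sketch; built-in functions keep a slightly different set of characters.

```javascript
// Hypothetical helper (not a built-in): percent-encode every byte that is
// not an ASCII letter, digit, "-", "_", "." or "~".
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-_.~]$/;
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes of the input
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && unreserved.test(ch)) {
      out += ch; // safe ASCII character: copy unchanged
    } else {
      // "%" followed by two uppercase hexadecimal digits
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("hello world")); // hello%20world
console.log(percentEncode("50% off"));     // 50%25%20off
console.log(percentEncode("caf\u00E9"));   // caf%C3%A9 (é is two bytes in UTF-8)
```

In real code the built-in encoding functions are the better choice; this sketch only makes the byte-by-byte substitution visible.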

URL Encoding in Practice

When a browser submits an HTML form, it URL-encodes the text entered in each field before sending it to the server. With the GET method, the encoded input becomes part of the URL's query string.
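
What a submitted form field looks like after encoding can be approximated with the standard URLSearchParams class, which applies the same form-style encoding (spaces become "+"). The field name "text", the sample input, and the example.com address are assumptions chosen for illustration.

```javascript
// Simulate a form with one field named "text" (hypothetical name and value).
const formData = new URLSearchParams();
formData.append("text", "Hello G\u00FCnter & co");

// Form-style encoding: space -> "+", ü -> UTF-8 bytes, & -> %26
const query = formData.toString();
console.log(query); // text=Hello+G%C3%BCnter+%26+co

// Placeholder address; a real form would use its own action URL.
const requestUrl = "https://example.com/action?" + query;

// The receiving side decodes the value back to the original text.
const received = new URL(requestUrl).searchParams.get("text");
console.log(received); // Hello Günter & co
```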




URL Encoding Functions

Programming languages like JavaScript, PHP, and ASP provide built-in functions to perform URL encoding:

  • JavaScript: encodeURIComponent()
  • PHP: rawurlencode()
  • ASP: Server.URLEncode()

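In JavaScript, encodeURIComponent() takes a string and returns a copy in which every character except letters, digits, and - _ . ! ~ * ' ( ) is replaced by the percent-encoded bytes of its UTF-8 representation.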
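
A minimal sketch of encodeURIComponent() in use follows. The sample text and the example.com addresses are placeholders; encodeURI() and decodeURIComponent() are shown only for contrast and are standard JavaScript functions as well.

```javascript
// Encode a single value before placing it in a query string.
const value = "Tom & Jerry: 100% fun?";
const encoded = encodeURIComponent(value);
console.log(encoded); // Tom%20%26%20Jerry%3A%20100%25%20fun%3F

// Placeholder address used only to show where the encoded value goes.
const searchUrl = "https://example.com/search?q=" + encoded;

// decodeURIComponent() reverses the encoding.
const decoded = decodeURIComponent(encoded);
console.log(decoded === value); // true

// encodeURI() is meant for whole URLs: it leaves characters such as
// ":", "/", "?", "&" and "=" untouched and only encodes the space here.
const wholeUrl = encodeURI("https://example.com/my page.html?x=1&y=2");
console.log(wholeUrl); // https://example.com/my%20page.html?x=1&y=2
```

Use encodeURIComponent() for individual values and encodeURI() for complete addresses; applying encodeURIComponent() to a full URL would also encode its separators.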

ASCII and UTF-8 Encoding Reference

Your browser encodes text based on the character set specified in the HTML document. UTF-8 is the default in HTML5. The table below shows some examples of how characters are encoded using Windows-1252 and UTF-8.

Character    Windows-1252    UTF-8
space        %20             %20
!            %21             %21
"            %22             %22
#            %23             %23
.........
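
The two columns agree for ASCII characters and differ above that range. The sketch below shows this for a few characters. Both helper names are hypothetical, and encodeSingleByte relies on the assumption that the character lies in U+0000-U+007F or U+00A0-U+00FF, where the Windows-1252 byte equals the Unicode code point; it rejects anything else.

```javascript
// Hypothetical helper: encode one character as a single Windows-1252 byte.
// Only valid where the byte value equals the code point (see lead-in).
function encodeSingleByte(ch) {
  const code = ch.codePointAt(0);
  if ((code > 0x7f && code < 0xa0) || code > 0xff) {
    throw new RangeError("character outside the range this sketch supports");
  }
  return "%" + code.toString(16).toUpperCase().padStart(2, "0");
}

// Hypothetical helper: encode every UTF-8 byte of the character as %XX.
function encodeUtf8Bytes(ch) {
  const bytes = new TextEncoder().encode(ch);
  return Array.from(bytes, (b) =>
    "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

for (const ch of [" ", "!", "#", "\u00E9"]) {
  console.log(JSON.stringify(ch), encodeSingleByte(ch), encodeUtf8Bytes(ch));
}
// The last character, é, is %E9 as a single byte and %C3%A9 in UTF-8.
```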

ASCII Control Characters (00-1F)

ASCII control characters (%00-%1F) were initially intended for controlling hardware. They are not suitable for use within URLs.

ASCII Character    Description        URL-encoding
NUL                null character     %00
SOH                start of header    %01
STX                start of text      %02
.........
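
The URL-encoding column for this range follows directly from the character codes. As a small sketch, the loop below asks encodeURIComponent() for the encoded form of each code from 00 to 1F; the variable name controlCodes is chosen only for this example.

```javascript
// Collect the percent-encoded form of every control character 0x00-0x1F.
const controlCodes = [];
for (let code = 0x00; code <= 0x1f; code++) {
  controlCodes.push(encodeURIComponent(String.fromCharCode(code)));
}

console.log(controlCodes.length);           // 32
console.log(controlCodes.slice(0, 3));      // [ '%00', '%01', '%02' ]
console.log(controlCodes[0x1f]);            // %1F
```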
