A Name-Encoding for URIs

Copyright © 2012 GlobalMentor, Inc. This specification may be freely used but only in unmodified form.

Author
Garret Wilson
Version
2012-02-12

Overview

Name-encoding is a transformation of a URI so that it includes only name-token characters, that is, characters normally allowed in "names" by other specifications such as XML, namely the characters '0'-'9', 'a'-'z', 'A'-'Z', '.', '-', and '_'. Because existing percent-encodings are normalized, URIs that differ only in the case of their percent-encoded hexadecimal digits produce the same name-encoded form. The transformation is reversible: decoding a name-encoded URI yields a URI equivalent to the original, although the two may not be identical when compared as strings.

Name-encoding is useful for representing a full URI reference in a context that does not allow arbitrary URI characters. For example, a name-encoded URI can be used as the name of a Subversion property or as a key in a Java properties file.

Rules

A name-encoded URI is a transformation of a normal URI that follows these rules:

  1. Every existing percent-encoded value in the URI (using the '%' character as an escape) is normalized to use the lowercase hexadecimal form.
  2. The URI scheme separator ':' is replaced by the hyphen character '-' (U+002D).
  3. Every '/' character in the URI scheme-specific part, including the "//" that introduces the authority, is replaced by the hyphen character '-' (U+002D).
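  4. Every remaining character that is not one of '0'-'9', 'a'-'z', 'A'-'Z', or '.' is encoded by escaping each byte of the character's UTF-8 encoding, using the '_' character (U+005F) as an escape followed by the two-digit lowercase hexadecimal representation of the byte value. Any '-' or '_' present in the original URI is encoded in this way (as '_2d' and '_5f'), so that it cannot be confused with the hyphens produced by steps 2 and 3 or with the escape character.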
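The four rules can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the function names are hypothetical, the input is assumed to be an absolute URI in string form, and two choices follow the examples below rather than the rule text alone. The character '.' is copied unchanged while any '-' or '_' in the original URI is escaped, and the scheme-specific part is assumed to end at the first '#' (as java.net.URI defines it), so a '/' inside a fragment is escaped rather than replaced.

```python
import re
import string

# Characters copied through unchanged. ASSUMPTION: '.' is included because every
# example leaves it unescaped; '-' and '_' are excluded, so any '-' or '_' already
# in the URI is escaped, as the examples show.
PLAIN = frozenset(string.ascii_letters + string.digits + ".")

# An existing percent-encoded octet such as "%2A" (rule 1).
PERCENT_OCTET = re.compile(r"%[0-9A-Fa-f]{2}")


def escape_char(ch: str) -> str:
    """Rule 4: '_' plus two lowercase hex digits for each UTF-8 byte of ch."""
    return "".join(f"_{b:02x}" for b in ch.encode("utf-8"))


def encode_chars(text: str, slash_to_hyphen: bool) -> str:
    """Apply rule 3 (optionally) and rule 4 to each character of text."""
    out = []
    for ch in text:
        if slash_to_hyphen and ch == "/":
            out.append("-")  # rule 3: '/' becomes '-'
        elif ch in PLAIN:
            out.append(ch)  # name character: copied unchanged
        else:
            out.append(escape_char(ch))  # rule 4
    return "".join(out)


def name_encode(uri: str) -> str:
    """Name-encode an absolute URI string (hypothetical helper name)."""
    # Rule 1: normalize existing percent-encodings to lowercase hex.
    uri = PERCENT_OCTET.sub(lambda m: m.group(0).lower(), uri)
    scheme, colon, rest = uri.partition(":")
    if not colon:
        raise ValueError(f"not an absolute URI: {uri!r}")
    # ASSUMPTION: the scheme-specific part ends at the first '#', as in
    # java.net.URI, so a '/' inside the fragment is escaped, not replaced.
    ssp, hash_mark, fragment = rest.partition("#")
    # Rule 2: the scheme separator ':' becomes '-'.
    encoded = encode_chars(scheme, False) + "-" + encode_chars(ssp, True)
    if hash_mark:
        # '#' is not a name character, so rule 4 escapes it as "_23".
        encoded += escape_char("#") + encode_chars(fragment, False)
    return encoded
```

For instance, name_encode("http://www.example.com/foo%2Abar") returns "http---www.example.com-foo_252abar", matching the last of the examples below. The sketch works on the URI string directly rather than parsing and re-serializing it, so that nothing other than step 1 alters the input before encoding.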

The character-encoding in step 4 is identical in form to the percent-encoding defined by RFC 3986, except that the '_' character is used in place of the '%' character and the hexadecimal digits must be lowercase.
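The correspondence with percent-encoding can be checked with Python's urllib.parse.quote, which percent-encodes each UTF-8 byte of a character using uppercase hex. In this minimal sketch (helper names hypothetical), lowercasing quote's output and swapping '%' for '_' gives the same escape as the byte-by-byte construction of step 4, for any character that quote escapes. Unreserved characters such as '-', '_' and '~' are left alone by quote but are still escaped by name-encoding, so the shortcut does not replace step 4.

```python
from urllib.parse import quote


def name_escape(ch: str) -> str:
    """Step 4: '_' plus two lowercase hex digits for each UTF-8 byte of ch."""
    return "".join(f"_{b:02x}" for b in ch.encode("utf-8"))


def via_percent_encoding(ch: str) -> str:
    """Derive the same escape from RFC 3986 percent-encoding (hypothetical helper)."""
    # quote() with safe="" percent-encodes every UTF-8 byte of a character it
    # does not treat as unreserved, e.g. "%C3%A9" for U+00E9. Only valid for
    # such characters: unreserved ones come back unchanged (and get lowercased).
    percent = quote(ch, safe="")
    # Name-encoding differs only in the escape character and the hex case.
    return percent.lower().replace("%", "_")


# Characters that quote() escapes get identical results from both functions.
for ch in ["!", "#", "%", "/", "\u00e9"]:
    print(repr(ch), name_escape(ch), via_percent_encoding(ch))
```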

Examples

URI                                     Name-Encoding
http://www.example.com/foo/bar          http---www.example.com-foo-bar
x-foo.bar://www.example.com/foo/bar     x_2dfoo.bar---www.example.com-foo-bar
http://www.example.com/foo-bar          http---www.example.com-foo_2dbar
http://www.example.com/foo_bar          http---www.example.com-foo_5fbar
http://www.example.com/foo/bar#fooBar   http---www.example.com-foo-bar_23fooBar
http://www.example.com/foo!bar          http---www.example.com-foo_21bar
http://www.example.com/foo%2Abar        http---www.example.com-foo_252abar
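The round-trip guarantee from the overview can be illustrated by inverting the rules. The following is a minimal Python sketch with a hypothetical function name, assuming well-formed input produced as in the examples above (lowercase escapes, with any original '-' and '_' escaped). Decoding the last example yields "http://www.example.com/foo%2abar", which is equivalent to the original "http://www.example.com/foo%2Abar" under RFC 3986, since hexadecimal digits in percent-encodings are case-insensitive, but is not identical to it as a string.

```python
import re

# A run of consecutive "_xx" escapes (lowercase hex); together they hold the
# UTF-8 bytes of one or more characters.
ESCAPE_RUN = re.compile(r"(?:_[0-9a-f]{2})+")


def name_decode(name: str) -> str:
    """Invert name-encoding (hypothetical helper name)."""
    # Literal '-' and '_' are always escaped, so every unescaped '-' was
    # introduced by rule 2 (the first one, for ':') or rule 3 (the rest, for '/').
    scheme, sep, rest = name.partition("-")
    if not sep:
        raise ValueError(f"no scheme separator in {name!r}")
    uri = scheme + ":" + rest.replace("-", "/")
    # Undo rule 4: decode each run of escapes back to UTF-8 text.
    return ESCAPE_RUN.sub(
        lambda m: bytes.fromhex(m.group(0).replace("_", "")).decode("utf-8"),
        uri,
    )
```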

References

IETF RFC 3986
Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, 2005. (See http://www.ietf.org/rfc/rfc3986.txt.)