A Name-Encoding for URIs
Copyright © 2012 GlobalMentor, Inc. This specification may be freely used but only in unmodified form.
- Author: Garret Wilson
- Version: 2012-02-12
Overview
Name-encoding is a transformation of a URI so that it includes only name-token characters, that is, characters normally used in "names" in other specifications such as XML: '0'-'9', 'a'-'z', 'A'-'Z', '-', and '_'. The transformation yields a single normalized representation of equivalent URIs. A round trip through the transformation is guaranteed to produce a URI equivalent to the original URI, although the result may not be identical to the original when the two are compared as strings.
Name-encoding is useful for representing a full URI reference in some specification that does not allow URIs. For example, a name-encoded URI can be used as the name of a Subversion property or as a key in a Java properties file.
Rules
A name-encoded URI is a transformation of a normal URI that follows these rules:
- Every existing percent-encoded value in the URI (using the '%' character as an escape) is normalized to use the lowercase hexadecimal form.
- The URI scheme separator ':' is replaced by the hyphen character '-' (U+002D).
- Every path separator '/' in the URI scheme-specific part is replaced by the hyphen character '-' (U+002D).
- Every remaining character that is not one of '0'-'9', 'a'-'z', 'A'-'Z', '-', or '_' is encoded by escaping each byte of the UTF-8 encoding of the character, using the '_' character (U+005F) as an escape followed by the two-character hexadecimal representation of the byte value in lowercase.
The character encoding in the last rule is identical to URI percent-encoding of reserved characters, except that the '_' character is used in place of the '%' character and the hexadecimal representation must be in lowercase.
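The rules can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the function name `name_encode` is hypothetical, and the sketch assumes an absolute URI that has a scheme. It also makes two assumptions drawn from the Examples section rather than from the rule text: a literal '-' or '_' already present in the URI is escaped (as `_2d` and `_5f`), and the '.' character is left unescaped.

```python
import re

# Characters copied through unchanged.
# ASSUMPTION: '.' is included because the examples leave it unescaped,
# although the rule text does not list it.
_UNESCAPED = set(
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ."
)


def _escape(ch: str) -> str:
    """Escape one character as '_' plus lowercase hex for each UTF-8 byte."""
    return "".join(f"_{b:02x}" for b in ch.encode("utf-8"))


def name_encode(uri: str) -> str:
    """Name-encode an absolute URI (hypothetical helper, see assumptions)."""
    # Rule 1: normalize existing percent-escapes to lowercase hex.
    uri = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).lower(), uri)

    # Split at the scheme separator, which is the first ':' in the URI.
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError("expected an absolute URI with a scheme")

    out = []
    # ASSUMPTION: a literal '-' or '_' in the input is escaped, as in the
    # examples, so it cannot be confused with a separator or an escape.
    for ch in scheme:
        out.append(ch if ch in _UNESCAPED else _escape(ch))
    out.append("-")  # Rule 2: scheme separator ':' becomes '-'.
    for ch in rest:
        if ch == "/":
            out.append("-")  # Rule 3: path separator '/' becomes '-'.
        elif ch in _UNESCAPED:
            out.append(ch)
        else:
            out.append(_escape(ch))  # Rule 4: '_' escape of UTF-8 bytes.
    return "".join(out)


print(name_encode("http://www.example.com/foo-bar"))
# http---www.example.com-foo_2dbar
```

Under these assumptions every '-' in the output stands for a separator and every '_' introduces an escape, which is what makes the encoding reversible.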
Examples
| URI | Name-Encoding |
| --- | --- |
| http://www.example.com/foo/bar | http---www.example.com-foo-bar |
| x-foo.bar://www.example.com/foo/bar | x_2dfoo.bar---www.example.com-foo-bar |
| http://www.example.com/foo-bar | http---www.example.com-foo_2dbar |
| http://www.example.com/foo_bar | http---www.example.com-foo_5fbar |
| http://www.example.com/foo/bar#fooBar | http---www.example.com-foo-bar_23fooBar |
| http://www.example.com/foo!bar | http---www.example.com-foo_21bar |
| http://www.example.com/foo%2Abar | http---www.example.com-foo_252abar |
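This specification does not spell out the reverse transformation, so the following Python sketch is an inference from the examples, not a normative procedure. The function name `name_decode` is hypothetical. It assumes that the input was produced from an absolute URI, that the first '-' stands for the scheme separator ':', that every later '-' stands for '/', and that every '_' introduces a two-digit lowercase hexadecimal escape of one UTF-8 byte.

```python
import re


def name_decode(name: str) -> str:
    """Reverse a name-encoding (hypothetical helper, see assumptions)."""
    # ASSUMPTION: the first '-' marks the scheme separator.
    scheme, sep, rest = name.partition("-")
    if not sep:
        raise ValueError("expected a name-encoded absolute URI")

    # ASSUMPTION: every remaining '-' was a path separator '/'.
    text = scheme + ":" + rest.replace("-", "/")

    def unescape(match: re.Match) -> str:
        # Collect the bytes of a run of '_xx' escapes and decode as UTF-8.
        return bytes.fromhex(match.group(0).replace("_", "")).decode("utf-8")

    # Decode each run of consecutive escapes together so that multi-byte
    # UTF-8 sequences are reassembled into one character.
    return re.sub(r"(?:_[0-9a-f]{2})+", unescape, text)


print(name_decode("http---www.example.com-foo_252abar"))
# http://www.example.com/foo%2abar
```

The last example in the table illustrates the round-trip guarantee: decoding yields `%2a` rather than the original `%2A`, which is an equivalent URI but not an identical string.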
References
- IETF RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, 2005. (See http://www.ietf.org/rfc/rfc3986.txt.)