URI Name-Encoding

A Name-Encoding for URIs

Copyright © 2012 GlobalMentor, Inc. This specification may be freely used but only in unmodified form.

Author
Garret Wilson
Version
2012-02-12

Overview

Name-encoding is a transformation of a URI so that it includes only name-token characters, that is, characters normally used as "names" in other specifications such as XML, namely the characters '0'-'9', 'a'-'z', 'A'-'Z', '.', '-', and '_'. This transformation results in a single normalized representation of equivalent URIs. It guarantees that a round trip (encoding followed by decoding) will result in a URI equivalent to the original URI, although the result may not be identical to the original URI when the two are compared as strings.

Name-encoding is useful for representing a full URI reference in some specification that does not allow URIs. For example, a name-encoded URI can be used as the name of a Subversion property or as a key in a Java properties file.

Rules

A name-encoded URI is a transformation of a normal URI that follows these rules:

  1. Every existing percent-encoded value in the URI (using the '%' character as an escape) is normalized to use the lowercase hexadecimal form.
  2. The URI scheme separator ':' is replaced by the hyphen character '-' (U+002D).
  3. Every path separator '/' in the URI scheme-specific part is replaced by the hyphen character '-' (U+002D).
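  4. Every remaining character that is not one of '0'-'9', 'a'-'z', 'A'-'Z', or '.' is encoded by escaping each byte of the UTF-8 encoding of the character, using the '_' character (U+005F) as an escape followed by the two-character hexadecimal representation of the byte value in lowercase. This includes any '-' and '_' characters present in the original URI, which are escaped as "_2d" and "_5f" so that they cannot be confused with the replacement and escape characters introduced by this encoding.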

The character encoding in step 4 is identical to the percent-encoding of reserved characters in URIs, except that the '_' character is used in place of the '%' character and the hexadecimal representation must be in lowercase.
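
The rules above can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function name name_encode and the SAFE set are hypothetical, the input is assumed to be an absolute URI with a scheme, and the choice to pass '.' through unchanged while escaping any '-' and '_' already present is inferred from the examples in this specification.

```python
import re

# Characters copied through unchanged. Assumption inferred from the examples:
# '.' is kept, while '-' and '_' found in the input are escaped because the
# encoding reserves them as its replacement and escape characters.
SAFE = set("0123456789"
           "abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           ".")


def name_encode(uri: str) -> str:
    """Return a name-encoded form of an absolute URI (illustrative sketch)."""
    # Rule 1: normalize existing percent-escapes to lowercase hexadecimal.
    uri = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).lower(), uri)

    # Rule 2: the first ':' is the scheme separator and becomes '-'.
    scheme, separator, rest = uri.partition(":")
    if not separator:
        raise ValueError("expected an absolute URI with a scheme")

    def escape(text: str) -> str:
        out = []
        for ch in text:
            if ch in SAFE:
                out.append(ch)
            elif ch == "/":
                # Rule 3: path separators become '-'.
                # (A valid scheme never contains '/', so this only affects
                # the scheme-specific part.)
                out.append("-")
            else:
                # Rule 4: escape each UTF-8 byte as '_' plus lowercase hex.
                out.extend("_%02x" % byte for byte in ch.encode("utf-8"))
        return "".join(out)

    return escape(scheme) + "-" + escape(rest)
```

Because rule 1 runs first, an existing escape such as "%2A" is lowercased to "%2a" before its '%' is itself escaped as "_25".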

Examples

URI                                     Name-Encoding
http://www.example.com/foo/bar          http---www.example.com-foo-bar
x-foo.bar://www.example.com/foo/bar     x_2dfoo.bar---www.example.com-foo-bar
http://www.example.com/foo-bar          http---www.example.com-foo_2dbar
http://www.example.com/foo_bar          http---www.example.com-foo_5fbar
http://www.example.com/foo/bar#fooBar   http---www.example.com-foo-bar_23fooBar
http://www.example.com/foo!bar          http---www.example.com-foo_21bar
http://www.example.com/foo%2Abar        http---www.example.com-foo_252abar
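
This specification does not define a decoding procedure. The following Python sketch shows one way the name-encodings above might be reversed, to illustrate the round-trip property described in the overview. The function name name_decode is hypothetical, and the approach assumes that the first '-' always stands for the scheme separator and every later '-' stands for a path separator, which holds when any '-' in the original URI was escaped as "_2d".

```python
def name_decode(name: str) -> str:
    """Reverse a name-encoding back into a URI (illustrative sketch)."""
    out = bytearray()
    seen_scheme_separator = False
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "-":
            # The first '-' is the scheme separator; later ones are '/'.
            out += b"/" if seen_scheme_separator else b":"
            seen_scheme_separator = True
            i += 1
        elif ch == "_":
            # '_' introduces one escaped byte written as two hex digits.
            out.append(int(name[i + 1:i + 3], 16))
            i += 3
        else:
            # Every other character in a name-encoding is plain ASCII.
            out += ch.encode("ascii")
            i += 1
    # Escaped bytes are the UTF-8 encoding of the original characters.
    return out.decode("utf-8")
```

Decoding the last example yields http://www.example.com/foo%2abar rather than the original http://www.example.com/foo%2Abar: the two URIs are equivalent, but they differ as strings because the percent-escape was normalized to lowercase during encoding.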

References

IETF RFC 3986
Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, 2005. (See http://www.ietf.org/rfc/rfc3986.txt.)