A URI (Uniform Resource Identifier) is a sequence of characters that identifies a logical or physical resource. Various scheme specifications dictate how that identification occurs. A URI typically describes:

  • The mechanism used to access the resource;
  • The specific computer in which the resource is housed; and
  • The specific name of the resource -- a file name -- on the computer.

One of the best-known uses of a URI is to identify any of the internet's many points of content, whether a page of text, a video or sound clip, a still or animated image, an email address or a program. The most common form of URI is the Uniform Resource Locator (URL), also known as a web address.

The Uniform Resource Name (URN) is another form of URI. Unlike a URL, which no longer works once content is moved, a URN has institutional persistence. This means that while a resource's exact location may change from time to time, someone -- or some program -- will still be able to find it by its URN.

URI syntax

The rules of URI syntax, set forth in the Internet Engineering Task Force (IETF) Request for Comments (RFC) 1630, apply to all internet addresses. RFC 3986, published in January 2005, is considered the standard reference for URI syntax. The IETF published the content of RFC 3986 as the full standard STD 66, reflecting the establishment of the URI generic syntax as an official internet standard.

Interpretation of URIs is independent of access. URIs have a global scope and are interpreted consistently, regardless of context. However, that interpretation may be in relation to the end user's context. For example, "http://localhost/" has the same interpretation for every user of that reference, even though the network interface corresponding to "localhost" may be different for each end user.

URI components

The generic form of the URI is:

  scheme:[//authority]path[?query][#fragment]

The scheme specifies the concrete syntax and associated protocols that define each URI. It consists of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus signs, periods or hyphens, and it is followed by a colon. Schemes are case-insensitive; however, the canonical form is lowercase. Popular schemes include http, ftp, mailto, file and data. URI schemes should be registered with the Internet Assigned Numbers Authority (IANA), although nonregistered schemes are used in practice.
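The scheme rules above can be expressed as a short regular expression. The following is a minimal sketch in Python; the names SCHEME_RE, is_valid_scheme and canonical_scheme are hypothetical helpers introduced for illustration, and the sample strings are arbitrary.

```python
import re

# A letter, followed by any mix of letters, digits, "+", "." or "-".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def is_valid_scheme(scheme: str) -> bool:
    """Return True if the string follows the scheme syntax described above."""
    return SCHEME_RE.fullmatch(scheme) is not None

def canonical_scheme(scheme: str) -> str:
    """Schemes are case-insensitive; the canonical form is lowercase."""
    return scheme.lower()

print(is_valid_scheme("http"))     # True
print(is_valid_scheme("svn+ssh"))  # True
print(is_valid_scheme("3d"))       # False: must begin with a letter
print(canonical_scheme("HTTP"))    # http
```

The check covers syntax only; it says nothing about whether a scheme is registered with IANA.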

The two slashes are required by some schemes, but not all. If an authority component is absent, the path component cannot begin with two slashes.
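To illustrate the difference, the sketch below uses urlsplit from Python's standard urllib.parse module to compare a URI that has an authority component with one that does not. Both URIs use placeholder names (example.com) that are assumptions for the example.

```python
from urllib.parse import urlsplit

# Scheme followed by "//": an authority component is present.
with_authority = urlsplit("http://example.com/index.html")

# No "//" after the scheme: there is no authority, only a path.
without_authority = urlsplit("mailto:user@example.com")

print(with_authority.netloc)           # example.com
print(with_authority.path)             # /index.html
print(repr(without_authority.netloc))  # ''
print(without_authority.path)          # user@example.com
```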

The authority component is made up of an optional authentication section, a host -- consisting of either a registered name or an IP address -- and an optional port number. In the authentication section, the username and password are separated by a colon, and the password is followed by an at sign (@). The port number is separated from the host by a colon. IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed in brackets.
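The parts of the authority component can be picked apart with Python's standard urllib.parse module. This is a sketch only: the credentials, addresses and port numbers are made-up values, and the two addresses come from ranges reserved for documentation.

```python
from urllib.parse import urlsplit

# Authentication section, bracketed IPv6 host and explicit port.
v6 = urlsplit("ftp://alice:secret@[2001:db8::1]:2121/pub/readme.txt")
print(v6.username)  # alice
print(v6.password)  # secret
print(v6.hostname)  # 2001:db8::1 (brackets are removed by the parser)
print(v6.port)      # 2121

# Dot-decimal IPv4 host with a port and no authentication section.
v4 = urlsplit("http://192.0.2.10:8080/")
print(v4.username)  # None
print(v4.hostname)  # 192.0.2.10
print(v4.port)      # 8080
```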

The path, which contains data, is usually organized in hierarchical form and appears as a sequence of segments separated by slashes. If an authority component is present, the path must either be empty or begin with a single slash. If no authority is present, the path may begin with a single slash but must not begin with a double slash. The path may resemble or map exactly to a file system path; however, it does not always imply a relation to that file system path.

The optional query is separated from the preceding part by a question mark. It contains a query string of nonhierarchical data. Although the syntax is not well defined, by convention, it is most often a sequence of attribute-value pairs separated by a delimiter, such as an ampersand or a semicolon.
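As a sketch of the attribute-value convention, the example below parses a query string with Python's standard urllib.parse module. The URI and its parameter names (q, lang) are invented for illustration.

```python
from urllib.parse import urlsplit, parse_qs, parse_qsl

uri = "http://example.com/search?q=uri+syntax&lang=en&lang=fr"
query = urlsplit(uri).query
print(query)  # q=uri+syntax&lang=en&lang=fr

# Ordered list of pairs; "+" is decoded as a space.
pairs = parse_qsl(query)
print(pairs)  # [('q', 'uri syntax'), ('lang', 'en'), ('lang', 'fr')]

# Dictionary form; repeated attributes are collected into lists.
grouped = parse_qs(query)
print(grouped)  # {'q': ['uri syntax'], 'lang': ['en', 'fr']}

# The ampersand is the default delimiter; a semicolon must be requested
# explicitly through the separator keyword of recent Python 3 releases.
semicolon_pairs = parse_qsl("a=1;b=2", separator=";")
print(semicolon_pairs)  # [('a', '1'), ('b', '2')]
```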

The optional fragment is separated from the preceding part by a hash. It contains a fragment identifier that provides direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an ID attribute of a specific element. Web browsers will scroll this element into view.
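The split between the primary resource and the fragment can be shown with urldefrag from Python's standard urllib.parse module. The URI and the fragment name uri-syntax are placeholders chosen for this sketch.

```python
from urllib.parse import urldefrag

result = urldefrag("http://example.com/article.html#uri-syntax")

# The part before "#" identifies the primary resource.
print(result.url)       # http://example.com/article.html

# The part after "#" points at a secondary resource within it,
# such as an element whose ID attribute is "uri-syntax".
print(result.fragment)  # uri-syntax
```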

[Figure: Examples of URIs and their component breakdowns.]

For example, a URI that begins with "http://" and ends with the path "/Icons/WWW/w3c_main.gif" identifies a file that can be accessed using the web's application protocol, Hypertext Transfer Protocol, and that is housed on the computer named between the two -- a name that can be mapped to a unique internet address. In that computer's directory structure, the file is located at the "/Icons/WWW/w3c_main.gif" path name. Character strings that identify File Transfer Protocol (FTP) addresses and email addresses are also URIs -- and, like web addresses, are also called URLs.
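A breakdown of this kind can be reproduced with urlsplit from Python's standard urllib.parse module. In the sketch below, the host example.org is a placeholder assumed for illustration; only the scheme and the path come from the text above.

```python
from urllib.parse import urlsplit

# "example.org" is a stand-in host, not the computer named in the article.
parts = urlsplit("http://example.org/Icons/WWW/w3c_main.gif")

print(parts.scheme)    # http
print(parts.hostname)  # example.org
print(parts.path)      # /Icons/WWW/w3c_main.gif

# The last path segment is the file name.
file_name = parts.path.rsplit("/", 1)[-1]
print(file_name)       # w3c_main.gif
```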

URI references

A URI reference may take the form of a full URI, the scheme-specific portion of a full URI, a trailing component of a full URI or the empty string. If there is a fragment identifier, the part of the reference before the # indirectly identifies a resource. The fragment identifier identifies some portion of that resource.

[Figure: How URI, URL and URN differ.]

Software converts a URI reference to absolute form by merging it with a base URI according to a fixed algorithm. The system treats the URI reference as relative to the base URI. However, in the case of an absolute reference, the base has no relevance. If the base URI includes a fragment identifier, it is ignored during the merging process. If a fragment identifier is present in the URI reference, it is preserved during the merging process.
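The merging behavior described above can be observed with urljoin from Python's standard urllib.parse module, which is one implementation of such an algorithm. The base URI and the references below are invented for this sketch.

```python
from urllib.parse import urljoin

# The base carries a fragment ("#intro") that should not survive merging.
base = "http://example.com/docs/guide/index.html#intro"

relative_path = urljoin(base, "images/logo.png")
parent_path = urljoin(base, "../faq.html#top")
fragment_only = urljoin(base, "#sec2")
absolute = urljoin(base, "ftp://example.org/file.txt")

print(relative_path)  # http://example.com/docs/guide/images/logo.png
print(parent_path)    # http://example.com/docs/faq.html#top
print(fragment_only)  # http://example.com/docs/guide/index.html#sec2
print(absolute)       # ftp://example.org/file.txt (the base has no relevance)
```

In each relative case, the fragment of the base is dropped, and any fragment on the reference is kept.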

For example, in HTML, the value of the src attribute of the img element provides a URI reference, as does the value of the href attribute of the a or link element.
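As a rough sketch of how software might collect these references, the example below uses Python's standard html.parser and urllib.parse modules to gather src and href values and resolve them against a base URI. The class name ReferenceCollector, the base URI and the HTML snippet are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ReferenceCollector(HTMLParser):
    """Collect URI references from img[src], a[href] and link[href]."""

    TARGETS = {("img", "src"), ("a", "href"), ("link", "href")}

    def __init__(self, base: str):
        super().__init__()
        self.base = base
        self.resolved = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None if absent.
        for name, value in attrs:
            if (tag, name) in self.TARGETS and value is not None:
                self.resolved.append(urljoin(self.base, value))

collector = ReferenceCollector("http://example.com/blog/post.html")
collector.feed(
    '<link href="/style.css">'
    '<a href="next.html">Next</a>'
    '<img src="img/cat.png">'
)
for uri in collector.resolved:
    print(uri)
```

A real document may also declare its own base URI, which this sketch does not handle.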

URI resolution

To resolve a URI means either to convert a relative URI reference to absolute form or to dereference a URI -- or URI reference -- by attempting to obtain a representation of the resource that it identifies.

A same-document reference is a URI reference to a document containing the URI reference itself. A URI reference is defined as a same-document reference if, when resolved to absolute form, it equates exactly to the base URI in effect for the reference.
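A same-document check can be sketched with Python's standard urllib.parse module. The function name is_same_document is hypothetical, and the sketch assumes, following RFC 3986, that fragments are set aside before the resolved reference is compared with the base URI.

```python
from urllib.parse import urljoin, urldefrag

def is_same_document(base: str, reference: str) -> bool:
    """Resolve the reference against the base, then compare without fragments."""
    target = urldefrag(urljoin(base, reference)).url
    return target == urldefrag(base).url

base = "http://example.com/docs/index.html"

print(is_same_document(base, "#section-2"))  # True
print(is_same_document(base, "index.html"))  # True
print(is_same_document(base, "other.html"))  # False
```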

A URI reference is equivalent to the base URI when, although the two are not identical, they still represent the same resource.
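One common way to test for equivalence is to normalize both URIs and compare the results. The sketch below is a simplified illustration, not a complete implementation: normalize, are_equivalent and the DEFAULT_PORTS table are names introduced here, and only three normalizations are applied (lowercase scheme and host, removal of a default port, and "/" for an empty path after an authority).

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed table of well-known default ports, limited to three schemes.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def normalize(uri: str) -> str:
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""      # urlsplit lowercases the host
    if ":" in host:                  # restore brackets around IPv6 hosts
        host = f"[{host}]"
    # Keep any authentication section exactly as written.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = userinfo + at + host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += f":{parts.port}"
    # An empty path after an authority is treated as "/".
    path = parts.path or ("/" if netloc else "")
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))

def are_equivalent(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)

print(are_equivalent("HTTP://Example.COM:80/index.html",
                     "http://example.com/index.html"))  # True
print(are_equivalent("http://example.com",
                     "http://example.com/"))            # True
print(are_equivalent("http://example.com:8080/",
                     "http://example.com/"))            # False
```

Two URIs that fail this comparison may still identify the same resource; the check can only confirm equivalence, not rule it out.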

