Definition

URI (Uniform Resource Identifier)

A URI (Uniform Resource Identifier) is a sequence of characters that identifies a logical or physical resource. Uniform Resource Identifiers are specified in Internet Engineering Task Force (IETF) Request for Comments (RFC) 3986 and are summarized and extended in the W3C’s web architecture document, Architecture of the World Wide Web, Volume One. According to the specifications, a resource does not have to be accessible on the Internet. Examples of resources include electronic documents, elevator door sensors, XML namespaces, web pages and ID microchips for pets.

There are two types of URIs: Uniform Resource Locators (URLs) and Uniform Resource Names (URNs).

Uniform Resource Locator (URL) – this type of URI begins with a scheme that states how the physical or logical resource should be located and accessed on a network. If the resource is a web page, for example, the URL begins with the scheme http or https. If the resource is a file on an FTP server, the URL begins with ftp, and if the resource is an email address, the URI begins with mailto. It is important to remember that URLs are not persistent: if the resource’s location changes, the URL must also change to point to the resource’s new location.

Uniform Resource Name (URN) – this type of URI does not state which protocol should be used to locate and access the resource; it simply labels the resource with a persistent, location-independent unique identifier. A URN is intended to identify the resource throughout its lifecycle and never change. Each URN has three components separated by colons: the label “urn,” a namespace identifier and a namespace-specific string that serves as the unique identifier within that namespace, as in urn:isbn:0451450523.
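The colon-separated layout of a URN can be illustrated with a few lines of Python. This is only a minimal sketch: the helper name split_urn is hypothetical, and the splitting rule assumes the urn:<namespace identifier>:<namespace-specific string> layout from RFC 8141 without validating the characters each part may contain.

```python
def split_urn(urn: str) -> tuple[str, str]:
    """Split a URN into (namespace identifier, namespace-specific string).

    Hypothetical helper for illustration; it checks only the overall
    urn:<NID>:<NSS> shape, not the full RFC 8141 grammar.
    """
    prefix, sep, rest = urn.partition(":")
    # The leading "urn" label is case-insensitive.
    if prefix.lower() != "urn" or not sep:
        raise ValueError(f"not a URN: {urn!r}")
    nid, sep, nss = rest.partition(":")
    if not nid or not sep or not nss:
        raise ValueError(f"malformed URN: {urn!r}")
    # Namespace identifiers are case-insensitive, so normalize to lowercase.
    return nid.lower(), nss


print(split_urn("urn:isbn:0451450523"))  # ('isbn', '0451450523')
```

A string such as http://example.com is rejected by this helper because it does not start with the urn label.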

Every URL is also a URI, but not vice versa.

URI Syntax

The generic form of any URI is scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]

Scheme: The scheme lays out the concrete syntax and any associated protocols for the URI. Schemes are case-insensitive and are followed by a colon. Ideally, URI schemes should be registered with the Internet Assigned Numbers Authority (IANA), although nonregistered schemes can also be used.

The two slashes shown in the generic form above introduce the authority component, which is described below. They appear only when a URI has an authority component, so they are required by some schemes but not by all; mailto and urn URIs, for example, do not use them.

Authority component: An authority component is made up of multiple parts: an optional authentication section, a host (either a registered name or an IP address) and an optional port number. The authentication section contains the username and, optionally, a password, which are separated by a colon and followed by the at symbol (@). RFC 3986 deprecates placing a password in this section. After the @ comes the host, which may in turn be followed by a colon and a port number. It is important to note that IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed in brackets.
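To see how an authority component breaks down, the following sketch uses urlsplit from Python's standard urllib.parse module. The helper name authority_parts and the example hosts and credentials are assumptions made up for illustration.

```python
from urllib.parse import urlsplit


def authority_parts(uri: str) -> dict:
    """Return the pieces of the authority component of a URI.

    Hypothetical helper; it simply exposes what urlsplit already parses.
    """
    parts = urlsplit(uri)
    return {
        "authority": parts.netloc,   # raw text between "//" and the path
        "username": parts.username,  # None when no authentication section
        "password": parts.password,  # None when absent
        "host": parts.hostname,      # lowercased; IPv6 brackets removed
        "port": parts.port,          # int, or None when omitted
    }


for uri in (
    "https://alice:secret@example.com:8443/docs",
    "http://[2001:db8::1]:8080/",  # IPv6 host must be bracketed
    "http://192.0.2.10/",          # IPv4 host, no port
):
    print(authority_parts(uri))
```

Note that the hostname attribute returns the IPv6 address without its brackets, while netloc keeps the authority text exactly as written.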

Path: The path, which contains data, is written as a sequence of segments separated by slashes. If an authority component is present, the path must either be empty or begin with a single slash. The path may also begin with a single slash when there is no authority component, but in that case it cannot begin with a double slash. Keep in mind that while this part of the syntax may closely resemble a file system path, it does not always imply a relation to one.

Query (optional): The query contains a string of nonhierarchical data. Although the generic syntax does not define its internal structure, it is most often a sequence of attribute-value pairs separated by a delimiter, such as an ampersand or a semicolon. The query is separated from the preceding part by a question mark.
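As an illustration of the common attribute-value convention, the sketch below parses a query string with parse_qs from Python's standard urllib.parse module. The example URI and parameter names are invented; the separator argument is available in current Python releases (3.10 and later, plus patched older versions), where the ampersand is the only default delimiter.

```python
from urllib.parse import urlsplit, parse_qs

# Made-up example URI with a repeated attribute ("lang").
uri = "https://example.com/search?q=uri+syntax&lang=en&lang=de"

query = urlsplit(uri).query  # text between "?" and "#"
params = parse_qs(query)     # each attribute maps to a list of values

print(query)   # q=uri+syntax&lang=en&lang=de
print(params)  # {'q': ['uri syntax'], 'lang': ['en', 'de']}

# Semicolon-delimited queries need the delimiter spelled out explicitly.
semicolon_params = parse_qs("a=1;b=2", separator=";")
print(semicolon_params)  # {'a': ['1'], 'b': ['2']}
```

Each value is returned as a list because nothing in the URI syntax prevents an attribute from appearing more than once.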

Fragment (optional): The fragment contains a fragment identifier that provides direction to a secondary resource. For example, if the primary resource is an HTML document, the fragment is often an ID attribute of a specific element of that document. If the fragment identifies a certain section of an article identified by the rest of the URI, a Web browser will scroll this particular element into view. The fragment is separated from the preceding part by a hash (#).
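The component breakdown described in this section can be checked with urlsplit from Python's standard urllib.parse module. This is a minimal sketch; the helper name components and the example URIs are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def components(uri: str) -> dict:
    """Split a URI into the five generic components (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # normalized to lowercase by urlsplit
        "authority": parts.netloc,   # empty string when there is no "//"
        "path": parts.path,
        "query": parts.query,        # without the leading "?"
        "fragment": parts.fragment,  # without the leading "#"
    }


# A URI with all five components; the uppercase scheme is accepted
# because schemes are case-insensitive.
print(components("HTTPS://example.com:8042/over/there?name=ferret#nose"))

# A URI with no authority component: everything after the scheme is path.
print(components("mailto:jane@example.com"))
```

For the first URI the scheme comes back as https, and for the second the authority is empty and the path is jane@example.com.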

URI resolution and references

URI resolution is one of a few common operations performed on URIs that are also URLs. It involves determining the proper data access method and parameters needed to locate and retrieve the resource that the URI points to.

The term URI-reference denotes the most common way a resource identifier is used in practice. A URI-reference may take the form of a full URI, a portion of a URI or an empty string. If there is a fragment identifier, it identifies some portion of the resource referred to by the rest of the URI.

A URI-reference can be a URI, but it can also be what is known as a relative reference. A URI-reference is a relative reference if its prefix does not match the syntax of a scheme followed by its colon separator. To determine which components are present and whether the reference is relative, the reference is first split into the five URI components, and each component is then parsed for its subparts and validated.
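The scheme-prefix test and the resolution of a relative reference against a base URI can be sketched as follows. The function name is_relative_reference and the example base URI are assumptions for illustration; the regular expression follows the scheme grammar in RFC 3986 (a letter followed by letters, digits, "+", "-" or "."), and urljoin is the standard-library routine for resolving references.

```python
import re
from urllib.parse import urljoin

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
_SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def is_relative_reference(ref: str) -> bool:
    """True when the reference does not start with a scheme and a colon."""
    return _SCHEME_PREFIX.match(ref) is None


base = "http://example.com/a/b/c?x=1"  # made-up base URI

for ref in ("../d", "#top", "//other.example/x", "", "mailto:jane@example.com"):
    kind = "relative" if is_relative_reference(ref) else "full URI"
    # urljoin resolves a relative reference against the base URI;
    # a reference that already has a scheme is returned unchanged.
    print(f"{ref!r:28} {kind:9} -> {urljoin(base, ref)}")
```

In this sketch, "../d" resolves to http://example.com/a/d, a bare fragment keeps the base path and query, and the empty reference resolves to the base URI itself.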

This was last updated in November 2016

Join the conversation

What are some unique ways you put URIs to use?
Parts in URI don't really matter for programs as long as URI is correct. But words have a meaning for people. For Web Services, having a consistent and descriptive naming convention for the path certainly matters.
And I also notice that in phishing emails real URI is now masked with some shortening service, like tinyurl.
AlbertGareev, thanks for your comment. So it seems like the fact that URIs can be "masked" is a real problem security-wise. Do you think it makes sense to start considering an "alternative" method of resource location that can't be so easily masked? What would that alternative even be?
@Fred - that masking is a result of a clever human workaround intended to fool humans. The programs are not getting fooled. Instead of changing the URI standard we can simply have browsers showing a real, unmasked address, say, on mouse-over.