This document is excerpted from http://gbiv.com/protocols/uri/rfc/rfc3986.html.

Network Working Group                                     T. Berners-Lee
Request for Comments: 3986                                       W3C/MIT
Obsoletes: 2732, 2396, 1808                                  R. Fielding
STD: 66                                                     Day Software
Updates: 1738                                                L. Masinter
Category: Standards Track                                  Adobe Systems
                                                            January 2005

Uniform Resource Identifier (URI): Generic Syntax

Status of this Memo

This
document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the “Internet
Official Protocol Standards” (STD 1) for the standardization state and
status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright © The Internet Society (2005). All Rights Reserved.

Abstract

A
Uniform Resource Identifier (URI) is a compact sequence of characters
that identifies an abstract or physical resource. This specification
defines the generic URI syntax and a process for resolving URI
references that might be in relative form, along with guidelines and
security considerations for the use of URIs on the Internet. The URI
syntax defines a grammar that is a superset of all valid URIs, allowing
an implementation to parse the common components of a URI reference
without knowing the scheme-specific requirements of every possible
identifier. This specification does not define a generative grammar for
URIs; that task is performed by the individual specifications of each
URI scheme.

Table of Contents

1. Introduction
   1.1. Overview of URIs
      1.1.1. Generic Syntax
      1.1.2. Examples
      1.1.3. URI, URL, and URN
   1.2. Design Considerations
      1.2.1. Transcription
      1.2.2. Separating Identification from Interaction
      1.2.3. Hierarchical Identifiers
   1.3. Syntax Notation
2. Characters
   2.1. Percent-Encoding
   2.2. Reserved Characters
   2.3. Unreserved Characters
   2.4. When to Encode or Decode
   2.5. Identifying Data
3. Syntax Components
   3.1. Scheme
   3.2. Authority
      3.2.1. User Information
      3.2.2. Host
      3.2.3. Port
   3.3. Path
   3.4. Query
   3.5. Fragment
4. Usage
   4.1. URI Reference
   4.2. Relative Reference
   4.3. Absolute URI
   4.4. Same-Document Reference
   4.5. Suffix Reference
5. Reference Resolution
   5.1. Establishing a Base URI
      5.1.1. Base URI Embedded in Content
      5.1.2. Base URI from the Encapsulating Entity
      5.1.3. Base URI from the Retrieval URI
      5.1.4. Default Base URI
   5.2. Relative Resolution
      5.2.1. Pre-parse the Base URI
      5.2.2. Transform References
      5.2.3. Merge Paths
      5.2.4. Remove Dot Segments
   5.3. Component Recomposition
   5.4. Reference Resolution Examples
      5.4.1. Normal Examples
      5.4.2. Abnormal Examples
6. Normalization and Comparison
   6.1. Equivalence
   6.2. Comparison Ladder
      6.2.1. Simple String Comparison
      6.2.2. Syntax-Based Normalization
         6.2.2.1. Case Normalization
         6.2.2.2. Percent-Encoding Normalization
         6.2.2.3. Path Segment Normalization
      6.2.3. Scheme-Based Normalization
      6.2.4. Protocol-Based Normalization
7. Security Considerations
   7.1. Reliability and Consistency
   7.2. Malicious Construction
   7.3. Back-End Transcoding
   7.4. Rare IP Address Formats
   7.5. Sensitive Information
   7.6. Semantic Attacks
8. IANA Considerations
9. Acknowledgements
10. References
   10.1. Normative References
   10.2. Informative References
A. Collected ABNF for URI
B. Parsing a URI Reference with a Regular Expression
C. Delimiting a URI in Context
D. Changes from RFC 2396
   D.1. Additions
   D.2. Modifications
Authors' Addresses
Intellectual Property and Copyright Statements
Index

1. Introduction

A
Uniform Resource Identifier (URI) provides a simple and extensible
means for identifying a resource. This specification of URI syntax and
semantics is derived from concepts introduced by the World Wide Web
global information initiative, whose use of these identifiers dates
from 1990 and is described in "Universal Resource Identifiers in WWW" [RFC1630]. The syntax is designed to meet the recommendations laid out in "Functional Recommendations for Internet Resource Locators" [RFC1736] and "Functional Requirements for Uniform Resource Names" [RFC1737].

This document obsoletes [RFC2396], which merged "Uniform Resource Locators" [RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in order to define a single, generic syntax for all URIs. It obsoletes [RFC2732],
which introduced syntax for an IPv6 address. It excludes portions of
RFC 1738 that defined the specific syntax of individual URI schemes;
those portions will be updated as separate documents. The process for
registration of new URI schemes is defined separately by [BCP35]. Advice for designers of new URI schemes can be found in [RFC2718]. All significant changes from RFC 2396 are noted in Appendix D.

This specification uses the terms "character" and "coded character set" in accordance with the definitions provided in [BCP19], and "character encoding" in place of what [BCP19] refers to as a "charset".

1.1. Overview of URIs

URIs are characterized as follows:

Uniform

Uniformity
provides several benefits. It allows different types of resource
identifiers to be used in the same context, even when the mechanisms
used to access those resources may differ. It allows uniform semantic
interpretation of common syntactic conventions across different types
of resource identifiers. It allows introduction of new types of
resource identifiers without interfering with the way that existing
identifiers are used. It allows the identifiers to be reused in many
different contexts, thus permitting new applications or protocols to
leverage a pre-existing, large, and widely used set of resource
identifiers.

Resource

This
specification does not limit the scope of what might be a resource;
rather, the term "resource" is used in a general sense for whatever
might be identified by a URI. Familiar examples include an electronic
document, an image, a source of information with a consistent purpose
(e.g., today's weather report for Los Angeles), a service (e.g., an
HTTP-to-SMS gateway), and a collection of other resources. A resource
is not necessarily accessible via the Internet; e.g., human beings,
corporations, and bound books in a library can also be resources.
Likewise, abstract concepts can be resources, such as the operators and
operands of a mathematical equation, the types of a relationship (e.g.,
parent or employee), or numeric values (e.g., zero, one, and
infinity).

Identifier

An
identifier embodies the information required to distinguish what is
being identified from all other things within its scope of
identification. Our use of the terms "identify" and "identifying" refers
to this purpose of distinguishing one resource from all other
resources, regardless of how that purpose is accomplished (e.g., by
name, address, or context). These terms should not be mistaken as an
assumption that an identifier defines or embodies the identity of what
is referenced, though that may be the case for some identifiers. Nor
should it be assumed that a system using URIs will access the resource
identified: in many cases, URIs are used to denote resources without
any intention that they be accessed. Likewise, the "one" resource
identified might not be singular in nature (e.g., a resource might be a
named set or a mapping that varies over time).

A URI is an identifier consisting of a sequence of characters matching the syntax rule named <URI> in Section 3. It enables uniform identification of resources via a separately defined extensible set of naming schemes (Section 3.1). How that identification is accomplished, assigned, or enabled is delegated to each scheme specification.

This
specification does not place any limits on the nature of a resource,
the reasons why an application might seek to refer to a resource, or
the kinds of systems that might use URIs for the sake of identifying
resources. This specification does not require that a URI persists in
identifying the same resource over time, though that is a common goal
of all URI schemes. Nevertheless, nothing in this specification
prevents an application from limiting itself to particular types of
resources, or to a subset of URIs that maintains characteristics
desired by that application.

URIs have a
global scope and are interpreted consistently regardless of context,
though the result of that interpretation may be in relation to the
end-user's context. For example, http://localhost/ has the same
interpretation for every user of that reference, even though the
network interface corresponding to localhost may be different for
each end-user: interpretation is independent of access. However, an
action made on the basis of that reference will take place in relation
to the end-user's context, which implies that an action intended to
refer to a globally unique thing must use a URI that distinguishes that
resource from all other things. URIs that identify in relation to the
end-user's local context should only be used when the context itself is
a defining aspect of the resource, such as when an on-line help manual
refers to a file on the end-user's file system (e.g.,
file:///etc/hosts).

1.1.1. Generic Syntax

Each URI begins with a scheme name, as defined in Section 3.1,
that refers to a specification for assigning identifiers within that
scheme. As such, the URI syntax is a federated and extensible naming
system wherein each scheme's specification may further restrict the
syntax and semantics of identifiers using that scheme.

This
specification defines those elements of the URI syntax that are
required of all URI schemes or are common to many URI schemes. It thus
defines the syntax and semantics needed to implement a
scheme-independent parsing mechanism for URI references, by which the
scheme-dependent handling of a URI can be postponed until the
scheme-dependent semantics are needed. Likewise, protocols and data
formats that make use of URI references can refer to this specification
as a definition for the range of syntax allowed for all URIs, including
those schemes that have yet to be defined. This decouples the evolution
of identification schemes from the evolution of protocols, data
formats, and implementations that make use of URIs.

A
parser of the generic URI syntax can parse any URI reference into its
major components. Once the scheme is determined, further
scheme-specific parsing can be performed on the components. In other
words, the URI generic syntax is a superset of the syntax of all URI
schemes.

1.1.2. Examples

The following example URIs illustrate several URI schemes and variations in their common syntax components:

   ftp://ftp.is.co.za/rfc/rfc1808.txt

   http://www.ietf.org/rfc/rfc2396.txt

   ldap://[2001:db8::7]/c=GB?objectClass?one

   mailto:John.Doe@example.com

   news:comp.infosystems.www.servers.unix

   tel:+1-816-555-1212

   telnet://192.0.2.16:80/

   urn:oasis:names:specification:docbook:dtd:xml:4.1.2

1.1.3. URI, URL, and URN

A
URI can be further classified as a locator, a name, or both. The term
Uniform Resource Locator (URL) refers to the subset of URIs that, in
addition to identifying a resource, provide a means of locating the
resource by describing its primary access mechanism (e.g., its network
location). The term Uniform Resource Name (URN) has been used
historically to refer to both URIs under the "urn" scheme [RFC2141],
which are required to remain globally unique and persistent even when
the resource ceases to exist or becomes unavailable, and to any other
URI with the properties of a name.

An
individual scheme does not have to be classified as being just one of
"name" or "locator". Instances of URIs from any given scheme may have
the characteristics of names or locators or both, often depending on
the persistence and care in the assignment of identifiers by the naming
authority, rather than on any quality of the scheme. Future
specifications and related documentation should use the general term
"URI" rather than the more restrictive terms "URL" and "URN" [RFC3305].

1.2. Design Considerations

1.2.1. Transcription

The
URI syntax has been designed with global transcription as one of its
main considerations. A URI is a sequence of characters from a very
limited set: the letters of the basic Latin alphabet, digits, and a few
special characters. A URI may be represented in a variety of ways;
e.g., ink on paper, pixels on a screen, or a sequence of character
encoding octets. The interpretation of a URI depends only on the
characters used and not on how those characters are represented in a
network protocol.

The goal of
transcription can be described by a simple scenario. Imagine two
colleagues, Sam and Kim, sitting in a pub at an international
conference and exchanging research ideas. Sam asks Kim for a location
to get more information, so Kim writes the URI for the research site on
a napkin. Upon returning home, Sam takes out the napkin and types the
URI into a computer, which then retrieves the information to which Kim
referred.

There are several design considerations revealed by the scenario:

* A URI is a sequence of characters that is not always represented as a sequence of octets.

* A URI might be transcribed from a non-network source and thus should
  consist of characters that are most likely able to be entered into a
  computer, within the constraints imposed by keyboards (and related
  input devices) across languages and locales.

* A URI often has to be remembered by people, and it is easier for
  people to remember a URI when it consists of meaningful or familiar
  components.

These
design considerations are not always in alignment. For example, it is
often the case that the most meaningful name for a URI component would
require characters that cannot be typed into some systems. The ability
to transcribe a resource identifier from one medium to another has been
considered more important than having a URI consist of the most
meaningful of components.

In local or
regional contexts and with improving technology, users might benefit
from being able to use a wider range of characters; such use is not
defined by this specification. Percent-encoded octets (Section 2.1)
may be used within a URI to represent characters outside the range of
the US-ASCII coded character set if this representation is allowed by
the scheme or by the protocol element in which the URI is referenced.
Such a definition should specify the character encoding used to map
those characters to octets prior to being percent-encoded for the URI.

1.2.2. Separating Identification from Interaction

A
common misunderstanding of URIs is that they are only used to refer to
accessible resources. The URI itself only provides identification;
access to the resource is neither guaranteed nor implied by the
presence of a URI. Instead, any operation associated with a URI
reference is defined by the protocol element, data format attribute, or
natural language text in which it appears.

Given
a URI, a system may attempt to perform a variety of operations on the
resource, as might be characterized by words such as "access",
"update", "replace", or "find attributes". Such operations are defined
by the protocols that make use of URIs, not by this specification.
However, we do use a few general terms for describing common operations
on URIs. URI "resolution" is the process of determining an access
mechanism and the appropriate parameters necessary to dereference a
URI; this resolution may require several iterations. To use that access
mechanism to perform an action on the URI's resource is to
"dereference" the URI.

When
URIs are used within information retrieval systems to identify sources
of information, the most common form of URI dereference is "retrieval":
making use of a URI in order to retrieve a representation of its
associated resource. A "representation" is a sequence of octets, along
with representation metadata describing those octets, that constitutes
a record of the state of the resource at the time when the
representation is generated. Retrieval is achieved by a process that
might include using the URI as a cache key to check for a locally
cached representation, resolution of the URI to determine an
appropriate access mechanism (if any), and dereference of the URI for
the sake of applying a retrieval operation. Depending on the protocols
used to perform the retrieval, additional information might be supplied
about the resource (resource metadata) and its relation to other
resources.

URI
references in information retrieval systems are designed to be
late-binding: the result of an access is generally determined when it
is accessed and may vary over time or due to other aspects of the
interaction. These references are created in order to be used in the
future: what is being identified is not some specific result that was
obtained in the past, but rather some characteristic that is expected
to be true for future results. In such cases, the resource referred to
by the URI is actually a sameness of characteristics as observed over
time, perhaps elucidated by additional comments or assertions made by
the resource provider.

Although many
URI schemes are named after protocols, this does not imply that use of
these URIs will result in access to the resource via the named
protocol. URIs are often used simply for the sake of identification.
Even when a URI is used to retrieve a representation of a resource,
that access might be through gateways, proxies, caches, and name
resolution services that are independent of the protocol associated
with the scheme name. The resolution of some URIs may require the use
of more than one protocol (e.g., both DNS and HTTP are typically used
to access an http URI's origin server when a representation isn't
found in a local cache).

1.2.3. Hierarchical Identifiers

The
URI syntax is organized hierarchically, with components listed in order
of decreasing significance from left to right. For some URI schemes,
the visible hierarchy is limited to the scheme itself: everything after
the scheme component delimiter (":") is considered opaque to URI
processing. Other URI schemes make the hierarchy explicit and visible
to generic parsing algorithms.

The
generic syntax uses the slash ("/"), question mark ("?"), and number
sign ("#") characters to delimit components that are significant to the
generic parser's hierarchical interpretation of an identifier. In
addition to aiding the readability of such identifiers through the
consistent use of familiar syntax, this uniform representation of
hierarchy across naming schemes allows scheme-independent references to
be made relative to that hierarchy.

It
is often the case that a group or tree of documents has been
constructed to serve a common purpose, wherein the vast majority of URI
references in these documents point to resources within the tree rather
than outside it. Similarly, documents located at a particular site are
much more likely to refer to other resources at that site than to
resources at remote sites. Relative referencing of URIs allows document
trees to be partially independent of their location and access scheme.
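
The location independence described above can be illustrated with a short Python sketch. It is not part of the specification: it assumes the standard library's `urllib.parse.urljoin` as the resolver (the resolution algorithm itself is specified in Section 5), and the three base URIs and the reference `../b/page.html` are hypothetical values chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical locations of the same document tree under three schemes.
BASES = [
    "file:///srv/docs/a/index.html",
    "http://example.com/docs/a/index.html",
    "ftp://example.com/docs/a/index.html",
]

# A relative reference as it might appear inside index.html (assumed).
REFERENCE = "../b/page.html"


def resolve_all(bases, reference):
    """Resolve one relative reference against each base URI in turn."""
    return [urljoin(base, reference) for base in bases]


if __name__ == "__main__":
    for target in resolve_all(BASES, REFERENCE):
        print(target)
```

In each case the resolved target keeps the scheme and authority of its base and only the path changes, so the same unmodified reference works whichever scheme is used to reach the tree.
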
For instance, it is possible for a single set of hypertext documents to
be simultaneously accessible and traversable via each of the "file",
"http", and "ftp" schemes if the documents refer to each other with
relative references. Furthermore, such document trees can be moved, as
a whole, without changing any of the relative references.

A relative reference (Section 4.2)
refers to a resource by describing the difference within a hierarchical
name space between the reference context and the target URI. The
reference resolution algorithm, presented in Section 5,
defines how such a reference is transformed to the target URI. As
relative references can only be used within the context of a
hierarchical URI, designers of new URI schemes should use a syntax
consistent with the generic syntax's hierarchical components unless
there are compelling reasons to forbid relative referencing within that
scheme.

   NOTE: Previous specifications used the terms "partial URI" and
   "relative URI" to denote a relative reference to a URI. As some
   readers misunderstood those terms to mean that relative URIs are a
   subset of URIs rather than a method of referencing URIs, this
   specification simply refers to them as relative references.

All URI
references are parsed by generic syntax parsers when used. However,
because hierarchical processing has no effect on an absolute URI used
in a reference unless it contains one or more dot-segments (complete
path segments of "." or "..", as described in Section 3.3),
URI scheme specifications can define opaque identifiers by disallowing
use of slash characters, question mark characters, and the URIs
"scheme:." and "scheme:..".

1.3.  Syntax Notation

This specification uses the Augmented Backus-Naur Form (ABNF) notation of [RFC2234],
including the following core ABNF syntax rules defined by that
specification: ALPHA (letters), CR (carriage return), DIGIT (decimal
digits), DQUOTE (double quote), HEXDIG (hexadecimal digits), LF (line
feed), and SP (space). The complete URI syntax is collected in Appendix A.

2.  Characters

The
URI syntax provides a method of encoding data, presumably for the sake
of identifying a resource, as a sequence of characters. The URI
characters are, in turn, frequently encoded as octets for transport or
presentation. This specification does not mandate any particular
character encoding for mapping between URI characters and the octets
used to store or transmit those characters. When a URI appears in a
protocol element, the character encoding is defined by that protocol;
without such a definition, a URI is assumed to be in the same character
encoding as the surrounding text.

The ABNF
notation defines its terminal values to be non-negative integers
(codepoints) based on the US-ASCII coded character set [ASCII].
Because a URI is a sequence of characters, we must invert that relation
in order to understand the URI syntax. Therefore, the integer values
used by the ABNF must be mapped back to their corresponding characters
via US-ASCII in order to complete the syntax rules.

A
URI is composed from a limited set of characters consisting of digits,
letters, and a few graphic symbols. A reserved subset of those
characters may be used to delimit syntax components within a URI while
the remaining characters, including both the unreserved set and those
reserved characters not acting as delimiters, define each component's
identifying data.

2.1.  Percent-Encoding

A percent-encoding mechanism is used to represent a data octet in a
component when that octet's corresponding character is outside the
allowed set or is being used as a delimiter of, or within, the
component. A percent-encoded octet is encoded as a character triplet,
consisting of the percent character "%" followed by the two hexadecimal
digits representing that octet's numeric value. For example, "%20" is
the percent-encoding for the binary octet "00100000" (ABNF: %x20),
which in US-ASCII corresponds to the space character (SP). Section 2.4
describes when percent-encoding and decoding is applied.

   pct-encoded = "%" HEXDIG HEXDIG

The uppercase hexadecimal digits 'A' through 'F' are
equivalent to the lowercase digits 'a' through 'f', respectively. If
two URIs differ only in the case of hexadecimal digits used in
percent-encoded octets, they are equivalent. For consistency, URI
producers and normalizers should use uppercase hexadecimal digits for
all percent-encodings.

2.2.  Reserved Characters

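As a rough illustration of the pct-encoded triplet from Section 2.1, which the reserved-character rules in this section rely on, the following Python sketch turns data octets into URI characters. The names `pct_encode` and `UNRESERVED` are illustrative assumptions and not part of the specification. The default `keep` set is the unreserved set that Section 2.3 defines, so every reserved character is treated as data and encoded.

```python
# Octets of the unreserved characters (ALPHA / DIGIT / "-" / "." / "_" / "~").
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def pct_encode(data: bytes, keep: frozenset = UNRESERVED) -> str:
    """Write each octet not in `keep` as "%" plus two uppercase hex digits."""
    out = []
    for octet in data:
        if octet in keep:
            out.append(chr(octet))  # allowed character, emitted as-is
        else:
            out.append("%{:02X}".format(octet))  # pct-encoded triplet
    return "".join(out)


print(pct_encode(b"Laguna Beach"))  # Laguna%20Beach
print(pct_encode(b"a/b?c"))         # a%2Fb%3Fc
```

A producer that wants a reserved character to act as a delimiter would pass a larger `keep` set for that component. The sketch always emits uppercase hexadecimal digits, following the recommendation in Section 2.1.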
URIs include components and subcomponents that are delimited by
characters in the reserved set. These characters are called
"reserved" because they may (or may not) be defined as delimiters by
the generic syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI's dereferencing algorithm. If
data for a URI component would conflict with a reserved character's
purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.

   reserved    = gen-delims / sub-delims

   gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
               / "*" / "+" / "," / ";" / "="

The purpose of reserved characters is
to provide a set of delimiting characters that are distinguishable from
other data within a URI. URIs that differ in the replacement of a
reserved character with its corresponding percent-encoded octet are not
equivalent. Percent-encoding a reserved character, or decoding a
percent-encoded octet that corresponds to a reserved character, will
change how the URI is interpreted by most applications. Thus,
characters in the reserved set are protected from normalization and are
therefore safe to be used by scheme-specific and producer-specific
algorithms for delimiting data subcomponents within a URI.

A subset of the reserved characters (gen-delims) is used as delimiters of the generic URI components described in Section 3.
A component's ABNF syntax rule will not use the reserved or gen-delims
rule names directly; instead, each syntax rule lists the characters
allowed within that component (i.e., not delimiting it), and any of
those characters that are also in the reserved set are reserved for
use as subcomponent delimiters within the component. Only the most
common subcomponents are defined by this specification; other
subcomponents may be defined by a URI scheme's specification, or by the
implementation-specific syntax of a URI's dereferencing algorithm,
provided that such subcomponents are delimited by characters in the
reserved set allowed within that component.

URI
producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component. If a reserved character is found in a URI component and no
delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.

2.3.  Unreserved Characters

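The character classes of Section 2.2 and the unreserved set defined in this section can be written down as plain sets, which makes one normalization step easy to sketch. The function below decodes percent-encoded octets only when they correspond to unreserved characters, and uppercases the hexadecimal digits of every other triplet. The names `normalize_pct`, `UNRESERVED`, `GEN_DELIMS`, and `SUB_DELIMS` are hypothetical, and this is a minimal sketch rather than a complete URI normalizer.

```python
import re

UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS

_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_pct(uri: str) -> str:
    """Decode pct-encoded unreserved characters; uppercase all other triplets."""

    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # safe to decode at any time
        return "%" + match.group(1).upper()  # reserved or other octet stays encoded

    # A single left-to-right pass, so nothing is decoded twice.
    return _TRIPLET.sub(fix, uri)


print(normalize_pct("%7euser/%2fa%41"))  # ~user/%2FaA
```

Reserved characters such as "/" are deliberately left encoded, because decoding them would change how the URI is interpreted.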
Characters that are allowed in a URI but do not have a reserved purpose
are called unreserved. These include uppercase and lowercase letters,
decimal digits, hyphen, period, underscore, and tilde.

   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

URIs that differ in the replacement
of an unreserved character with its corresponding percent-encoded
US-ASCII octet are equivalent: they identify the same resource.
However, URI comparison implementations do not always perform
normalization prior to comparison (see Section 6).
For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A
and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore
(%5F), or tilde (%7E) should not be created by URI producers and, when
found in a URI, should be decoded to their corresponding unreserved
characters by URI normalizers.

2.4.  When to Encode or Decode

Under
normal circumstances, the only time when octets within a URI are
percent-encoded is during the process of producing the URI from its
component parts. This is when an implementation determines which of the
reserved characters are to be used as subcomponent delimiters and which
can be safely used as data. Once produced, a URI is always in its
percent-encoded form.

When a URI is
dereferenced, the components and subcomponents significant to the
scheme-specific dereferencing process (if any) must be parsed and
separated before the percent-encoded octets within those components can
be safely decoded, as otherwise the data may be mistaken for component
delimiters. The only exception is for percent-encoded octets
corresponding to characters in the unreserved set, which can be decoded
at any time. For example, the octet corresponding to the tilde ("~")
character is often encoded as "%7E" by older URI processing
implementations; the "%7E" can be replaced by "~" without changing its
interpretation.

Because the percent
("%") character serves as the indicator for percent-encoded octets, it
must be percent-encoded as "%25" for that octet to be used as data
within a URI. Implementations must not percent-encode or decode the
same string more than once, as decoding an already decoded string might
lead to misinterpreting a percent data octet as the beginning of a
percent-encoding, or vice versa in the case of percent-encoding an
already percent-encoded string.

2.5.  Identifying Data

URI
characters provide identifying data for each of the URI components,
serving as an external interface for identification between systems.
Although the presence and nature of the URI production interface is
hidden from clients that use its URIs (and is thus beyond the scope of
the interoperability requirements defined by this specification), it is
a frequent source of confusion and errors in the interpretation of URI
character issues. Implementers have to be aware that there are multiple
character encodings involved in the production and transmission of
URIs: local name and data encoding, public interface encoding, URI
character encoding, data format encoding, and protocol encoding.

Local
names, such as file system names, are stored with a local character
encoding. URI producing applications (e.g., origin servers) will
typically use the local encoding as the basis for producing meaningful
names. The URI producer will transform the local encoding to one that
is suitable for a public interface and then transform the public
interface encoding into the restricted set of URI characters (reserved,
unreserved, and percent-encodings). Those characters are, in turn,
encoded as octets to be used as a reference within a data format (e.g.,
a document charset), and such data formats are often subsequently
encoded for transmission over Internet protocols.

For
most systems, an unreserved character appearing within a URI component
is interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII. Consumers of URIs assume that the
letter "X" corresponds to the octet "01011000", and even when that
assumption is incorrect, there is no harm in making it. A system that
internally provides identifiers in the form of a different character
encoding, such as EBCDIC, will generally perform character translation
of textual identifiers to UTF-8 [STD63]
(or some other superset of the US-ASCII character encoding) at an
internal interface, thereby providing more meaningful identifiers than
those resulting from simply percent-encoding the original octets.

For
example, consider an information service that provides data, stored
locally using an EBCDIC-based file system, to clients on the Internet
through an HTTP server. When an author creates a file with the name
"Laguna Beach" on that file system, the "http" URI corresponding to
that resource is expected to contain the meaningful string
"Laguna%20Beach". If, however, that server produces URIs by using an
overly simplistic raw octet mapping, then the result would be a URI
containing "%D3%81%87%A4%95%81@%C2%85%81%83%88". An internal
transcoding interface fixes this problem by transcoding the local name
to a superset of US-ASCII prior to producing the URI. Naturally, proper
interpretation of an incoming URI on such an interface requires that
percent-encoded octets be decoded (e.g., "%20" to SP) before the
reverse transcoding is applied to obtain the local name.

In
some cases, the internal interface between a URI component and the
identifying data that it has been crafted to represent is much less
direct than a character encoding translation. For example, portions of
a URI might reflect a query on non-ASCII data, or numeric coordinates
on a map. Likewise, a URI scheme may define components with additional
encoding requirements that are applied prior to forming the component
and producing the URI.

When a new URI
scheme defines a component that represents textual data consisting of
characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63];
then only those octets that do not correspond to characters in the
unreserved set should be percent-encoded. For example, the character A
would be represented as "A", the character LATIN CAPITAL LETTER A WITH
GRAVE would be represented as "%C3%80", and the character KATAKANA
LETTER A would be represented as "%E3%82%A2".

3.  Syntax Components

The
generic URI syntax consists of a hierarchical sequence of components
referred to as the scheme, authority, path, query, and fragment.

   URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   hier-part   = "//" authority path-abempty
               / path-absolute
               / path-rootless
               / path-empty

The scheme and path components are required, though the path
may be empty (no characters). When authority is present, the path must
either be empty or begin with a slash ("/") character. When authority
is not present, the path cannot begin with two slash characters ("//").
These restrictions result in five different ABNF rules for a path (Section 3.3), only one of which will match any given URI reference.

The following are two example URIs and their component parts:

     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment
      |   _____________________|__
     / \ /                        \
     urn:example:animal:ferret:nose

3.1.  Scheme

Each
URI begins with a scheme name that refers to a specification for
assigning identifiers within that scheme. As such, the URI syntax is a
federated and extensible naming system wherein each scheme's
specification may further restrict the syntax and semantics of
identifiers using that scheme.

Scheme
names consist of a sequence of characters beginning with a letter and
followed by any combination of letters, digits, plus ("+"), period
("."), or hyphen ("-"). Although schemes are case-insensitive, the
canonical form is lowercase and documents that specify schemes must do
so with lowercase letters. An implementation should accept uppercase
letters as equivalent to lowercase in scheme names (e.g., allow "HTTP"
as well as "http") for the sake of robustness but should only produce
lowercase scheme names for consistency.

   scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

Individual schemes are not specified by this document. The process for registration of new URI schemes is defined separately by [BCP35].
The scheme registry maintains the mapping between scheme names and
their specifications. Advice for designers of new URI schemes can be
found in [RFC2718].
URI scheme specifications must define their own syntax so that all
strings matching their scheme-specific syntax will also match the
<absolute-URI> grammar, as described in Section 4.3.

When
presented with a URI that violates one or more scheme-specific
restrictions, the scheme-specific resolution process should flag the
reference as an error rather than ignore the unused parts; doing so
reduces the number of equivalent URIs and helps detect abuses of the
generic syntax, which might indicate that the URI has been constructed
to mislead the user (Section 7.6).

3.2.  Authority

Many
URI schemes include a hierarchical element for a naming authority so
that governance of the name space defined by the remainder of the URI
is delegated to that authority (which may, in turn, delegate it
further). The generic syntax provides a common means for distinguishing
an authority based on a registered name or server address, along with
optional port and user information.

The
authority component is preceded by a double slash ("//") and is
terminated by the next slash ("/"), question mark ("?"), or number sign
("#") character, or by the end of the URI.

   authority   = [ userinfo "@" ] host [ ":" port ]

URI producers and normalizers should omit the ":" delimiter
that separates host from port if the port component is empty. Some
schemes do not allow the userinfo and/or port subcomponents.

If
a URI contains an authority component, then the path component must
either be empty or begin with a slash ("/") character. Non-validating
parsers (those that merely separate a URI reference into its major
components) will often ignore the subcomponent structure of authority,
treating it as an opaque string from the double-slash to the first
terminating delimiter, until such time as the URI is dereferenced.

3.2.1.  User Information

The
userinfo subcomponent may consist of a user name and, optionally,
scheme-specific information about how to gain authorization to access
the resource. The user information, if present, is followed by a
commercial at-sign ("@") that delimits it from the host.

   userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

Use of the format
"user:password" in the userinfo field is deprecated. Applications
should not render as clear text any data after the first colon (":")
character found within a userinfo subcomponent unless the data after
the colon is the empty string (indicating no password). Applications
may choose to ignore or reject such data when it is received as part of
a reference and should reject the storage of such data in unencrypted
form. The passing of authentication information in clear text has
proven to be a security risk in almost every case where it has been
used.

Applications that render a URI
for the sake of user feedback, such as in graphical hypertext browsing,
should render userinfo in a way that is distinguished from the rest of
a URI, when feasible. Such rendering will assist the user in cases
where the userinfo has been misleadingly crafted to look like a trusted
domain name (Section 7.6).

3.2.2.  Host

The
host subcomponent of authority is identified by an IP literal
encapsulated within square brackets, an IPv4 address in dotted-decimal
form, or a registered name. The host subcomponent is case-insensitive.
The presence of a host subcomponent within a URI does not imply that
the scheme requires access to the given host on the Internet. In many
cases, the host syntax is used only for the sake of reusing the
existing registration process created and deployed for DNS, thus
obtaining a globally unique name without the cost of deploying another
registry. However, such use comes with its own costs: domain name
ownership may change over time for reasons not anticipated by the URI
producer. In other cases, the data within the host component identifies
a registered name that has nothing to do with an Internet host. We use
the name "host" for the ABNF rule because that is its most common
purpose, not its only purpose.

   host        = IP-literal / IPv4address / reg-name

The syntax rule for host is ambiguous because it does not
completely distinguish between an IPv4address and a reg-name. In order
to disambiguate the syntax, we apply the "first-match-wins" algorithm:
If host matches the rule for IPv4address, then it should be considered
an IPv4 address literal and not a reg-name. Although host is
case-insensitive, producers and normalizers should use lowercase for
registered names and hexadecimal addresses for the sake of uniformity,
while only using uppercase letters for percent-encodings.

A host identified by an Internet Protocol literal address, version 6 [RFC3513]
or later, is distinguished by enclosing the IP literal within square
brackets ("[" and "]"). This is the only place where square bracket
characters are allowed in the URI syntax. In anticipation of future,
as-yet-undefined IP literal address formats, an implementation may use
an optional version flag to indicate such a format explicitly rather
than rely on heuristic determination.

   IP-literal = "[" ( IPv6address / IPvFuture  ) "]"

   IPvFuture  = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

The version flag does not indicate the IP version; rather, it
indicates future versions of the literal format. As such,
implementations must not provide the version flag for the existing IPv4
and IPv6 literal address forms described below. If a URI containing an
IP-literal that starts with "v" (case-insensitive), indicating that the
version flag is present, is dereferenced by an application that does
not know the meaning of that version flag, then the application should
return an appropriate error for "address mechanism not supported".

A
host identified by an IPv6 literal address is represented inside the
square brackets without a preceding version flag. The ABNF provided
here is a translation of the text definition of an IPv6 literal address
provided in [RFC3513]. This syntax does not support IPv6 scoped addressing zone identifiers.

A
128-bit IPv6 address is divided into eight 16-bit pieces. Each piece is
represented numerically in case-insensitive hexadecimal, using one to
four hexadecimal digits (leading zeroes are permitted). The eight
encoded pieces are given most-significant first, separated by colon
characters. Optionally, the least-significant two pieces may instead be
represented in IPv4 address textual format. A sequence of one or more
consecutive zero-valued 16-bit pieces within the address may be elided,
omitting all their digits and leaving exactly two consecutive colons in
their place to mark the elision.

   IPv6address =                            6( h16 ":" ) ls32
               /                       "::" 5( h16 ":" ) ls32
               / [               h16 ] "::" 4( h16 ":" ) ls32
               / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
               / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
               / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
               / [ *4( h16 ":" ) h16 ] "::"              ls32
               / [ *5( h16 ":" ) h16 ] "::"              h16
               / [ *6( h16 ":" ) h16 ] "::"

   ls32        = ( h16 ":" h16 ) / IPv4address
               ; least-significant 32 bits of address

   h16         = 1*4HEXDIG
               ; 16 bits of address represented in hexadecimal

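To see the IPv6 textual forms described above in action, the following Python sketch parses a bracketed IP literal with the standard `ipaddress` module. The function name `parse_ip_literal` is a hypothetical helper. Note the assumption: `ipaddress` is a general-purpose parser, not a transcription of this ABNF, so it may accept or reject some edge cases differently. The sketch rejects the IPvFuture form and zone identifiers explicitly, since it does not understand the former and this syntax does not support the latter.

```python
import ipaddress


def parse_ip_literal(host: str) -> ipaddress.IPv6Address:
    """Parse "[...]" as an IPv6 literal; raise ValueError for anything else."""
    if not (host.startswith("[") and host.endswith("]")):
        raise ValueError("IP literal must be enclosed in square brackets")
    inner = host[1:-1]
    if inner[:1] in ("v", "V"):
        # Version flag present: a future literal format this sketch does not know.
        raise ValueError("address mechanism not supported")
    if "%" in inner:
        raise ValueError("zone identifiers are not part of this syntax")
    return ipaddress.IPv6Address(inner)


addr = parse_ip_literal("[2001:db8::1]")
print(addr.exploded)  # 2001:0db8:0000:0000:0000:0000:0000:0001

# The least-significant 32 bits may be written in IPv4 dotted-decimal form.
print(parse_ip_literal("[::ffff:192.0.2.1]").exploded)
# 0000:0000:0000:0000:0000:ffff:c000:0201
```

The `exploded` form writes out all eight 16-bit pieces, which makes the effect of the "::" elision visible.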
A host identified by an IPv4 literal address is represented in
dotted-decimal notation (a sequence of four decimal numbers in the
range 0 to 255, separated by "."), as described in [RFC1123] by reference to [RFC0952]. Note that other forms of dotted notation may be interpreted on some platforms, as described in Section 7.4, but only the dotted-decimal form of four octets is allowed by this grammar.

   IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet

   dec-octet   = DIGIT                 ; 0-9
               / %x31-39 DIGIT         ; 10-99
               / "1" 2DIGIT            ; 100-199
               / "2" %x30-34 DIGIT     ; 200-249
               / "25" %x30-35          ; 250-255

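The dec-octet rule translates almost directly into a regular expression, and the first-match-wins rule for host then becomes a simple check. The following Python sketch shows one way to write it. The names `DEC_OCTET`, `is_ipv4address`, and `classify_host` are illustrative assumptions, and `classify_host` ignores the bracketed IP-literal form for brevity.

```python
import re

# One alternative per line of the dec-octet rule, longest forms first.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4ADDRESS = re.compile(r"{0}\.{0}\.{0}\.{0}".format(DEC_OCTET))


def is_ipv4address(host: str) -> bool:
    """True if host matches the IPv4address rule exactly."""
    return IPV4ADDRESS.fullmatch(host) is not None


def classify_host(host: str) -> str:
    """First-match-wins: an IPv4address match takes priority over reg-name."""
    return "IPv4address" if is_ipv4address(host) else "reg-name"


print(classify_host("192.0.2.1"))    # IPv4address
print(classify_host("256.0.0.1"))    # reg-name
print(classify_host("example.com"))  # reg-name
```

A string such as "256.0.0.1" fails the IPv4address rule and therefore falls through to reg-name, which is one reason the rarer dotted forms covered in Section 7.4 deserve care.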
A host identified by a registered name is a sequence of characters
usually intended for lookup within a locally defined host or service
name registry, though the URI's scheme-specific semantics may require
that a specific registry (or fixed name table) be used instead. The
most common name registry mechanism is the Domain Name System (DNS). A
registered name intended for lookup in the DNS uses the syntax defined
in Section 3.5 of [RFC1034] and Section 2.1 of [RFC1123].
Such a name consists of a sequence of domain labels separated by ".",
each domain label starting and ending with an alphanumeric character
and possibly also containing "-" characters. The rightmost domain label
of a fully qualified domain name in DNS may be followed by a single "."
and should be if it is necessary to distinguish between the complete
domain name and some local domain.

   reg-name    = *( unreserved / pct-encoded / sub-delims )

If the URI scheme defines a default for host, then that
default applies when the host subcomponent is undefined or when the
registered name is empty (zero length). For example, the "file" URI
scheme is defined so that no authority, an empty host, and "localhost"
all mean the end-user's machine, whereas the "http" scheme considers a
missing authority or empty host invalid.

This
specification does not mandate a particular registered name lookup
technology and therefore does not restrict the syntax of reg-name
beyond what is necessary for interoperability. Instead, it delegates
the issue of registered name syntax conformance to the operating system
of each application performing URI resolution, and that operating
system decides what it will allow for the purpose of host
identification. A URI resolution implementation might use DNS, host
tables, yellow pages, NetInfo, WINS, or any other system for lookup of
registered names. However, a globally scoped naming system, such as DNS
fully qualified domain names, is necessary for URIs intended to have
global scope. URI producers should use names that conform to the DNS
syntax, even when use of DNS is not immediately apparent, and should
limit these names to no more than 255 characters in length.

The
reg-name syntax allows percent-encoded octets in order to represent
non-ASCII registered names in a uniform way that is independent of the
underlying name resolution technology. Non-ASCII characters must first
be encoded according to UTF-8 [STD63],
and then each octet of the corresponding UTF-8 sequence must be
percent-encoded to be represented as URI characters. URI producing
applications must not use percent-encoding in host unless it is used to
represent a UTF-8 character sequence. When a non-ASCII registered name
represents an internationalized domain name intended for resolution via
the DNS, the name must be transformed to the IDNA encoding [RFC3490]
prior to name lookup. URI producers should provide these registered
names in the IDNA encoding, rather than a percent-encoding, if they
wish to maximize interoperability with legacy URI resolvers.

3.2.3.  Port

The
port subcomponent of authority is designated by an optional port number
in decimal following the host and delimited from it by a single colon
(":") character.

   port = *DIGIT

A scheme may define a default port. For example, the "http"
scheme defines a default port of 80, corresponding to its reserved
TCP port number. The type of port designated by the port number (e.g.,
TCP, UDP, SCTP) is defined by the URI scheme. URI producers and
normalizers should omit the port component and its ":" delimiter if
port is empty or if its value would be the same as that of the scheme's
default.

3.3. Path

The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component (Section 3.4),
serves to identify a resource within the scope of the URI's scheme and
naming authority (if any). The path is terminated by the first question
mark ("?") or number sign ("#") character, or by the end of the URI.

If
a URI contains an authority component, then the path component must
either be empty or begin with a slash ("/") character. If a URI does
not contain an authority component, then the path cannot begin with two
slash characters ("//"). In addition, a URI reference (Section 4.1)
may be a relative-path reference, in which case the first path segment
cannot contain a colon (":") character. The ABNF requires five separate
rules to disambiguate these cases, only one of which will match the
path substring within a given URI reference. We use the generic term
"path component" to describe the URI substring matched by the parser to
one of these rules.

   path          = path-abempty    ; begins with "/" or is empty
                 / path-absolute   ; begins with "/" but not "//"
                 / path-noscheme   ; begins with a non-colon segment
                 / path-rootless   ; begins with a segment
                 / path-empty      ; zero characters

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-noscheme = segment-nz-nc *( "/" segment )
   path-rootless = segment-nz *( "/" segment )
   path-empty    = 0<pchar>

   segment       = *pchar
   segment-nz    = 1*pchar
   segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                 ; non-zero-length segment without any colon ":"

   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

A path consists of a sequence of path
segments separated by a slash ("/") character. A path is always defined
for a URI, though the defined path may be empty (zero length). Use of
the slash character to indicate hierarchy is only required when a URI
will be used as the context for relative references. For example, the
URI <mailto:fred@example.com> has a path of "fred@example.com",
whereas the URI <foo://info.example.com?fred> has an empty path.

The
path segments "." and "..", also known as dot-segments, are defined for
relative reference within the path name hierarchy. They are intended
for use at the beginning of a relative-path reference (Section 4.2)
to indicate relative position within the hierarchical tree of names.
This is similar to their role within some operating systems' file
directory structures to indicate the current directory and parent
directory, respectively. However, unlike in a file system, these
dot-segments are only interpreted within the URI path hierarchy and are
removed as part of the resolution process (Section 5.2).

Aside
from dot-segments in hierarchical paths, a path segment is considered
opaque by the generic syntax. URI producing applications often use the
reserved characters allowed in a segment to delimit scheme-specific or
dereference-handler-specific subcomponents. For example, the semicolon
(";") and equals ("=") reserved characters are often used to delimit
parameters and parameter values applicable to that segment. The comma
(",") reserved character is often used for similar purposes. For
example, one URI producer might use a segment such as "name;v=1.1" to
indicate a reference to version 1.1 of "name", whereas another might
use a segment such as "name,1.1" to indicate the same. Parameter types
may be defined by scheme-specific semantics, but in most cases the
syntax of a parameter is specific to the implementation of the URI's
dereferencing algorithm.

3.4. Query

The query component contains non-hierarchical data that, along with data in the path component (Section 3.3),
serves to identify a resource within the scope of the URI's scheme and
naming authority (if any). The query component is indicated by the
first question mark ("?") character and terminated by a number sign
("#") character or by the end of the URI.

   query = *( pchar / "/" / "?" )

The characters slash ("/") and
question mark ("?") may represent data within the query component.
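As an illustration only, the following Python sketch shows a query whose value is itself a URI containing "/" and "?". It assumes Python's standard urllib.parse.urlsplit as the parser; the example.com and example.org URIs are hypothetical and are not taken from this specification.

```python
from urllib.parse import urlsplit

# Hypothetical URI: the value of "next" is another URI with "/" and "?" in it.
uri = "http://example.com/search?next=http://example.org/a/b?c=d#results"

parts = urlsplit(uri)

# The query starts after the first "?" and runs up to the first "#".
# Any later "/" or "?" characters are plain data inside the query.
path = parts.path          # "/search"
query = parts.query        # "next=http://example.org/a/b?c=d"
fragment = parts.fragment  # "results"

print(path, query, fragment, sep="\n")
```

Only the first "?" acts as a delimiter here, so the embedded URI does not need percent-encoding to survive parsing.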
Beware that some older, erroneous implementations may not handle such
data correctly when it is used as the base URI for relative references (Section 5.1),
apparently because they fail to distinguish query data from path data
when looking for hierarchical separators. However, as query components
are often used to carry identifying information in the form of
"key=value" pairs and one frequently used value is a reference to
another URI, it is sometimes better for usability to avoid
percent-encoding those characters.

3.5. Fragment

The fragment identifier component of a URI allows indirect
identification of a secondary resource by reference to a primary
resource and additional identifying information. The identified
secondary resource may be some portion or subset of the primary
resource, some view on representations of the primary resource, or some
other resource defined or described by those representations. A
fragment identifier component is indicated by the presence of a number
sign ("#") character and terminated by the end of the URI.

   fragment = *( pchar / "/" / "?" )

The semantics of a fragment
identifier are defined by the set of representations that might result
from a retrieval action on the primary resource. The fragment's format
and resolution is therefore dependent on the media type [RFC2046]
of a potentially retrieved representation, even though such a retrieval
is only performed if the URI is dereferenced. If no such representation
exists, then the semantics of the fragment are considered unknown and
are effectively unconstrained. Fragment identifier semantics are
independent of the URI scheme and thus cannot be redefined by scheme
specifications.

Individual media types
may define their own restrictions on or structures within the fragment
identifier syntax for specifying different types of subsets, views, or
external references that are identifiable as secondary resources by
that media type. If the primary resource has multiple representations,
as is often the case for resources whose representation is selected
based on attributes of the retrieval request (a.k.a., content
negotiation), then whatever is identified by the fragment should be
consistent across all of those representations. Each representation
should either define the fragment so that it corresponds to the same
secondary resource, regardless of how it is represented, or should
leave the fragment undefined (i.e., not found).

As
with any URI, use of a fragment identifier component does not imply
that a retrieval action will take place. A URI with a fragment
identifier may be used to refer to the secondary resource without any
implication that the primary resource is accessible or will ever be
accessed.

Fragment identifiers have a
special role in information retrieval systems as the primary form of
client-side indirect referencing, allowing an author to specifically
identify aspects of an existing resource that are only indirectly
provided by the resource owner. As such, the fragment identifier is not
used in the scheme-specific processing of a URI; instead, the fragment
identifier is separated from the rest of the URI prior to a
dereference, and thus the identifying information within the fragment
itself is dereferenced solely by the user agent, regardless of the URI
scheme. Although this separate handling is often perceived to be a loss
of information, particularly for accurate redirection of references as
resources move over time, it also serves to prevent information
providers from denying reference authors the right to refer to
information within a resource selectively. Indirect referencing also
provides additional flexibility and extensibility to systems that use
URIs, as new media types are easier to define and deploy than new
schemes of identification.

The
characters slash ("/") and question mark ("?") are allowed to represent
data within the fragment identifier. Beware that some older, erroneous
implementations may not handle this data correctly when it is used as
the base URI for relative references (Section 5.1).

4. Usage

When
applications make reference to a URI, they do not always use the full
form of reference defined by the URI syntax rule. To save space and
take advantage of hierarchical locality, many Internet protocol
elements and media type formats allow an abbreviation of a URI, whereas
others restrict the syntax to a particular form of URI. We define the
most common forms of reference syntax in this specification because
they impact and depend upon the design of the generic syntax, requiring
a uniform parsing algorithm in order to be interpreted consistently.

4.1. URI Reference

URI-reference is used to denote the most common usage of a resource identifier.

   URI-reference = URI / relative-ref

A URI-reference is either a URI or a relative reference. If
the URI-reference's prefix does not match the syntax of a scheme
followed by its colon separator, then the URI-reference is a relative
reference.

A URI-reference is typically
parsed first into the five URI components, in order to determine what
components are present and whether the reference is relative. Then,
each component is parsed for its subparts and their validation. The
ABNF of URI-reference, along with the first-match-wins disambiguation
rule, is sufficient to define a validating parser for the generic
syntax. Readers familiar with regular expressions should see Appendix B for an example of a non-validating URI-reference parser that will take any given string and extract the URI components.

4.2. Relative Reference

A relative reference takes advantage of the hierarchical syntax (Section 1.2.3) to express a URI reference relative to the name space of another hierarchical URI.

   relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

   relative-part = "//" authority path-abempty
                 / path-absolute
                 / path-noscheme
                 / path-empty

The URI referred to by a relative reference, also known as the
target URI, is obtained by applying the reference resolution algorithm
of Section 5.

A
relative reference that begins with two slash characters is termed a
network-path reference; such references are rarely used. A relative
reference that begins with a single slash character is termed an
absolute-path reference. A relative reference that does not begin with
a slash character is termed a relative-path reference.

A
path segment that contains a colon character (e.g., "this:that") cannot
be used as the first segment of a relative-path reference, as it would
be mistaken for a scheme name. Such a segment must be preceded by a
dot-segment (e.g., "./this:that") to make a relative-path reference.

4.3. Absolute URI

Some
protocol elements allow only the absolute form of a URI without a
fragment identifier. For example, defining a base URI for later use by
relative references calls for an absolute-URI syntax rule that does not
allow a fragment.

   absolute-URI = scheme ":" hier-part [ "?" query ]

URI scheme specifications must define their own syntax so that
all strings matching their scheme-specific syntax will also match the
<absolute-URI> grammar. Scheme specifications will not define
fragment identifier syntax or usage, regardless of its applicability to
resources identifiable via that scheme, as fragment identification is
orthogonal to scheme definition. However, scheme specifications are
encouraged to include a wide range of examples, including examples that
show use of the scheme's URIs with fragment identifiers when such usage
is appropriate.

4.4. Same-Document Reference

When a URI reference refers to a URI that is, aside from its fragment component (if any), identical to the base URI (Section 5.1),
that reference is called a same-document reference. The most frequent
examples of same-document references are relative references that are
empty or include only the number sign ("#") separator followed by a
fragment identifier.

When a
same-document reference is dereferenced for a retrieval action, the
target of that reference is defined to be within the same entity
(representation, document, or message) as the reference; therefore, a
dereference should not result in a new retrieval action.

Normalization of the base and target URIs prior to their comparison, as described in Sections 6.2.2 and 6.2.3,
is allowed but rarely performed in practice. Normalization may increase
the set of same-document references, which may be of benefit to some
caching applications. As such, reference authors should not assume that
a slightly different, though equivalent, reference URI will (or will
not) be interpreted as a same-document reference by any given
application.

4.5. Suffix Reference

The
URI syntax is designed for unambiguous reference to resources and
extensibility via the URI scheme. However, as URI identification and
usage have become commonplace, traditional media (television, radio,
newspapers, billboards, etc.) have increasingly used a suffix of the
URI as a reference, consisting of only the authority and path portions
of the URI, such as

   www.w3.org/Addressing/

or simply a DNS registered name on
its own. Such references are primarily intended for human
interpretation rather than for machines, with the assumption that
context-based heuristics are sufficient to complete the URI (e.g., most
registered names beginning with "www" are likely to have a URI prefix
of "http://"). Although there is no standard set of heuristics for
disambiguating a URI suffix, many client implementations allow them to
be entered by the user and heuristically resolved.

Although
this practice of using suffix references is common, it should be
avoided whenever possible and should never be used in situations where
long-term references are expected. The heuristics noted above will
change over time, particularly when a new URI scheme becomes popular,
and are often incorrect when used out of context. Furthermore, they can
lead to security issues along the lines of those described in [RFC1535].

As
a URI suffix has the same syntax as a relative-path reference, a suffix
reference cannot be used in contexts where a relative reference is
expected. As a result, suffix references are limited to places where
there is no defined base URI, such as dialog boxes and off-line
advertisements.

5. Reference Resolution

This
section defines the process of resolving a URI reference within a
context that allows relative references so that the result is a string
matching the <URI> syntax rule of Section 3.

5.1. Establishing a Base URI

The
term "relative" implies that a base URI exists against which the
relative reference is applied. Aside from fragment-only references (Section 4.4),
relative references are only usable when a base URI is known. A base
URI must be established by the parser prior to parsing URI references
that might be relative. A base URI must conform to the
<absolute-URI> syntax rule (Section 4.3).
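A rough way to see what this requirement means in practice is sketched below in Python. This is an assumption-laden illustration, not a validator: it only checks for a scheme prefix (using the scheme grammar of Section 3.1) and the absence of a fragment delimiter, and the helper names looks_like_absolute_uri and to_base_uri are hypothetical.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":".
_SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def looks_like_absolute_uri(candidate: str) -> bool:
    """Shallow check: a scheme prefix is present and no "#" appears."""
    return _SCHEME_PREFIX.match(candidate) is not None and "#" not in candidate


def to_base_uri(absolute_reference: str) -> str:
    """Drop any fragment, then confirm the result can serve as a base URI."""
    base = absolute_reference.split("#", 1)[0]
    if not looks_like_absolute_uri(base):
        raise ValueError("reference has no scheme, so it is not absolute")
    return base


print(to_base_uri("http://a/b/c/d;p?q#s"))
```

A full check would also validate hier-part and query against the ABNF; that is left out here to keep the sketch short.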
If the base URI is obtained from a URI reference, then that reference
must be converted to absolute form and stripped of any fragment
component prior to its use as a base URI.

The
base URI of a reference can be established in one of four ways,
discussed below in order of precedence. The order of precedence can be
thought of in terms of layers, where the innermost defined base URI has
the highest precedence. This can be visualized graphically as follows:

   .----------------------------------------------------------.
   |  .----------------------------------------------------.  |
   |  |  .----------------------------------------------.  |  |
   |  |  |  .----------------------------------------.  |  |  |
   |  |  |  |  .----------------------------------.  |  |  |  |
   |  |  |  |  |       <relative-reference>       |  |  |  |  |
   |  |  |  |  `----------------------------------'  |  |  |  |
   |  |  |  | (5.1.1) Base URI embedded in content   |  |  |  |
   |  |  |  `----------------------------------------'  |  |  |
   |  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
   |  |  |         (message, representation, or none)   |  |  |
   |  |  `----------------------------------------------'  |  |
   |  | (5.1.3) URI used to retrieve the entity            |  |
   |  `----------------------------------------------------'  |
   | (5.1.4) Default Base URI (application-dependent)         |
   `----------------------------------------------------------'

5.1.1. Base URI Embedded in Content

Within
certain media types, a base URI for relative references can be embedded
within the content itself so that it can be readily obtained by a
parser. This can be useful for descriptive documents, such as tables of
contents, which may be transmitted to others through protocols other
than their usual retrieval context (e.g., email or USENET news).

It
is beyond the scope of this specification to specify how, for each
media type, a base URI can be embedded. The appropriate syntax, when
available, is described by the data format specification associated
with each media type.

5.1.2. Base URI from the Encapsulating Entity

If
no base URI is embedded, the base URI is defined by the
representation's retrieval context. For a document that is enclosed
within another entity, such as a message or archive, the retrieval
context is that entity. Thus, the default base URI of a representation
is the base URI of the entity in which the representation is
encapsulated.

A mechanism for embedding a base URI within MIME container types (e.g., the message and multipart types) is defined by MHTML [RFC2557].
Protocols that do not use the MIME message header syntax, but that do
allow some form of tagged metadata to be included within messages, may
define their own syntax for defining a base URI as part of a message.

5.1.3. Base URI from the Retrieval URI

If
no base URI is embedded and the representation is not encapsulated
within some other entity, then, if a URI was used to retrieve the
representation, that URI shall be considered the base URI. Note that if
the retrieval was the result of a redirected request, the last URI used
(i.e., the URI that resulted in the actual retrieval of the
representation) is the base URI.

5.1.4. Default Base URI

If
none of the conditions described above apply, then the base URI is
defined by the context of the application. As this definition is
necessarily application-dependent, failing to define a base URI by
using one of the other methods may result in the same content being
interpreted differently by different types of applications.

A
sender of a representation containing relative references is
responsible for ensuring that a base URI for those references can be
established. Aside from fragment-only references, relative references
can only be used reliably in situations where the base URI is well
defined.

5.2. Relative Resolution

This
section describes an algorithm for converting a URI reference that
might be relative to a given base URI into the parsed components of the
reference's target. The components can then be recomposed, as described
in Section 5.3,
to form the target URI. This algorithm provides definitive results that
can be used to test the output of other implementations. Applications
may implement relative reference resolution by using some other
algorithm, provided that the results match what would be given by this
one.

5.2.1. Pre-parse the Base URI

The base URI (Base) is established according to the procedure of Section 5.1 and parsed into the five main components described in Section 3.
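The parsing step can be sketched in Python as follows. This is a minimal, non-validating illustration built on the regular expression given in Appendix B of this specification; the Components type and the parse function name are assumptions made for the example. An undefined component is represented as None, which keeps it distinct from a component that is present but empty.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from Appendix B (non-validating component splitter).
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


class Components(NamedTuple):
    scheme: Optional[str]     # None means undefined
    authority: Optional[str]  # None means undefined
    path: str                 # never undefined, but may be ""
    query: Optional[str]      # None means undefined, "" means empty
    fragment: Optional[str]   # None means undefined, "" means empty


def parse(reference: str) -> Components:
    """Split a URI reference into its five components."""
    match = URI_REFERENCE.match(reference)
    assert match is not None  # every group is optional, so this always matches
    return Components(
        scheme=match.group(2),
        authority=match.group(4),
        path=match.group(5),
        query=match.group(7),
        fragment=match.group(9),
    )


base = parse("http://a/b/c/d;p?q")
print(base)
```

Using None for an absent delimiter matters later: "http://a/?" has an empty query, while "http://a/" has an undefined one, and recomposition has to reproduce that difference.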
Note that only the scheme component is required to be present in a base
URI; the other components may be empty or undefined. A component is
undefined if its associated delimiter does not appear in the URI
reference; the path component is never undefined, though it may be
empty.

Normalization of the base URI, as described in Sections 6.2.2 and 6.2.3, is optional. A URI reference must be transformed to its target URI before it can be normalized.

5.2.2. Transform References

For each URI reference (R), the following pseudocode describes an algorithm for transforming R into its target URI (T):

   -- The URI reference is parsed into the five URI components
   --
   (R.scheme, R.authority, R.path, R.query, R.fragment) = parse(R);

   -- A non-strict parser may ignore a scheme in the reference
   -- if it is identical to the base URI's scheme.
   --
   if ((not strict) and (R.scheme == Base.scheme)) then
      undefine(R.scheme);
   endif;

   if defined(R.scheme) then
      T.scheme    = R.scheme;
      T.authority = R.authority;
      T.path      = remove_dot_segments(R.path);
      T.query     = R.query;
   else
      if defined(R.authority) then
         T.authority = R.authority;
         T.path      = remove_dot_segments(R.path);
         T.query     = R.query;
      else
         if (R.path == "") then
            T.path = Base.path;
            if defined(R.query) then
               T.query = R.query;
            else
               T.query = Base.query;
            endif;
         else
            if (R.path starts-with "/") then
               T.path = remove_dot_segments(R.path);
            else
               T.path = merge(Base.path, R.path);
               T.path = remove_dot_segments(T.path);
            endif;
            T.query = R.query;
         endif;
         T.authority = Base.authority;
      endif;
      T.scheme = Base.scheme;
   endif;

   T.fragment = R.fragment;

5.2.3. Merge Paths

The
pseudocode above refers to a "merge" routine for merging a
relative-path reference with the path of the base URI. This is
accomplished as follows:

   o  If the base URI has a defined authority component and an empty
      path, then return a string consisting of "/" concatenated with
      the reference's path; otherwise,

   o  return a string consisting of the reference's path component
      appended to all but the last segment of the base URI's path
      (i.e., excluding any characters after the right-most "/" in the
      base URI path, or excluding the entire base URI path if it does
      not contain any "/" characters).

5.2.4. Remove Dot Segments

The
pseudocode also refers to a "remove_dot_segments" routine for
interpreting and removing the special "." and ".." complete path
segments from a referenced path. This is done after the path is
extracted from a reference, whether or not the path was relative, in
order to remove any invalid or extraneous dot-segments prior to forming
the target URI. Although there are many ways to accomplish this removal
process, we describe a simple method using two string buffers.

   1.  The input buffer is initialized with the now-appended path
       components and the output buffer is initialized to the empty
       string.

   2.  While the input buffer is not empty, loop as follows:

       A.  If the input buffer begins with a prefix of "../" or "./",
           then remove that prefix from the input buffer; otherwise,

       B.  if the input buffer begins with a prefix of "/./" or "/.",
           where "." is a complete path segment, then replace that
           prefix with "/" in the input buffer; otherwise,

       C.  if the input buffer begins with a prefix of "/../" or "/..",
           where ".." is a complete path segment, then replace that
           prefix with "/" in the input buffer and remove the last
           segment and its preceding "/" (if any) from the output
           buffer; otherwise,

       D.  if the input buffer consists only of "." or "..", then remove
           that from the input buffer; otherwise,

       E.  move the first path segment in the input buffer to the end of
           the output buffer, including the initial "/" character (if
           any) and any subsequent characters up to, but not including,
           the next "/" character or the end of the input buffer.

   3.  Finally, the output buffer is returned as the result of
       remove_dot_segments.

Note
that dot-segments are intended for use in URI references to express an
identifier relative to the hierarchy of names in the base URI. The
remove_dot_segments algorithm respects that hierarchy by removing extra
dot-segments rather than treating them as an error or leaving them to be
misinterpreted by dereference implementations.

The
following illustrates how the above steps are applied for two examples
of merged paths, showing the state of the two buffers after each step.

   STEP   OUTPUT BUFFER         INPUT BUFFER

    1 :                         /a/b/c/./../../g
    2E:   /a                    /b/c/./../../g
    2E:   /a/b                  /c/./../../g
    2E:   /a/b/c                /./../../g
    2B:   /a/b/c                /../../g
    2C:   /a/b                  /../g
    2C:   /a                    /g
    2E:   /a/g

   STEP   OUTPUT BUFFER         INPUT BUFFER

    1 :                         mid/content=5/../6
    2E:   mid                   /content=5/../6
    2E:   mid/content=5         /../6
    2C:   mid                   /6
    2E:   mid/6

Some applications may find it more efficient to implement the
remove_dot_segments algorithm by using two segment stacks rather than
strings./pp id=rfc.section.5.2.4.p.4 /pdldd style=margin-top: 0.5em;Note:
Beware that some older, erroneous implementations will fail to separate
a reference's query component from its path component prior to merging
the base and reference paths, resulting in an interoperability failure
if the query component contains the strings "/../" or "/./".

5.3. Component Recomposition

Parsed URI components can be recomposed to obtain the corresponding URI reference string. Using pseudocode, this would be:

   result = ""

   if defined(scheme) then
      append scheme to result;
      append ":" to result;
   endif;

   if defined(authority) then
      append "//" to result;
      append authority to result;
   endif;

   append path to result;

   if defined(query) then
      append "?" to result;
      append query to result;
   endif;

   if defined(fragment) then
      append "#" to result;
      append fragment to result;
   endif;

   return result;

Note that we are careful to preserve the distinction between a
component that is undefined, meaning that its separator was not present
in the reference, and a component that is empty, meaning that the
separator was present and was immediately followed by the next
component separator or the end of the reference.

5.4. Reference Resolution Examples

Within a representation with a well defined base URI of

   http://a/b/c/d;p?q

a relative reference is transformed to its target URI as follows.

5.4.1. Normal Examples

   "g:h"           =  "g:h"
   "g"             =  "http://a/b/c/g"
   "./g"           =  "http://a/b/c/g"
   "g/"            =  "http://a/b/c/g/"
   "/g"            =  "http://a/g"
   "//g"           =  "http://g"
   "?y"            =  "http://a/b/c/d;p?y"
   "g?y"           =  "http://a/b/c/g?y"
   "#s"            =  "http://a/b/c/d;p?q#s"
   "g#s"           =  "http://a/b/c/g#s"
   "g?y#s"         =  "http://a/b/c/g?y#s"
   ";x"            =  "http://a/b/c/;x"
   "g;x"           =  "http://a/b/c/g;x"
   "g;x?y#s"       =  "http://a/b/c/g;x?y#s"
   ""              =  "http://a/b/c/d;p?q"
   "."             =  "http://a/b/c/"
   "./"            =  "http://a/b/c/"
   ".."            =  "http://a/b/"
   "../"           =  "http://a/b/"
   "../g"          =  "http://a/b/g"
   "../.."         =  "http://a/"
   "../../"        =  "http://a/"
   "../../g"       =  "http://a/g"

5.4.2. Abnormal Examples

Although
the following abnormal examples are unlikely to occur in normal
practice, all URI parsers should be capable of resolving them
consistently. Each example uses the same base as that above.

Parsers
must be careful in handling cases where there are more ".." segments in
a relative-path reference than there are hierarchical levels in the
base URI's path. Note that the ".." syntax cannot be used to change the
authority component of a URI.

   "../../../g"    =  "http://a/g"
   "../../../../g" =  "http://a/g"

Similarly, parsers must remove the
dot-segments "." and ".." when they are complete components of a path,
but not when they are only part of a segment.

   "/./g"          =  "http://a/g"
   "/../g"         =  "http://a/g"
   "g."            =  "http://a/b/c/g."
   ".g"            =  "http://a/b/c/.g"
   "g.."           =  "http://a/b/c/g.."
   "..g"           =  "http://a/b/c/..g"

Less likely are cases where the
relative reference uses unnecessary or nonsensical forms of the "." and
".." complete path segments.

   "./../g"        =  "http://a/b/g"
   "./g/."         =  "http://a/b/c/g/"
   "g/./h"         =  "http://a/b/c/g/h"
   "g/../h"        =  "http://a/b/c/h"
   "g;x=1/./y"     =  "http://a/b/c/g;x=1/y"
   "g;x=1/../y"    =  "http://a/b/c/y"

Some applications fail to separate
the reference's query and/or fragment components from the path
component before merging it with the base path and removing
dot-segments. This error is rarely noticed, as typical usage of a
fragment never includes the hierarchy ("/") character and the query
component is not normally used within relative references.

   "g?y/./x"       =  "http://a/b/c/g?y/./x"
   "g?y/../x"      =  "http://a/b/c/g?y/../x"
   "g#s/./x"       =  "http://a/b/c/g#s/./x"
   "g#s/../x"      =  "http://a/b/c/g#s/../x"

Some parsers allow the scheme name
to be present in a relative reference if it is the same as the base URI
scheme. This is considered to be a loophole in prior specifications of
partial URI [RFC1630]. Its use should be avoided but is allowed for backward compatibility.

   "http:g"        =  "http:g"         ; for strict parsers
                   /  "http://a/b/c/g" ; for backward compatibility

6. Normalization and Comparison

One
of the most common operations on URIs is simple comparison: determining
whether two URIs are equivalent without using the URIs to access their
respective resource(s). A comparison is performed every time a response
cache is accessed, a browser checks its history to color a link, or an
XML parser processes tags within a namespace. Extensive normalization
prior to comparison of URIs is often used by spiders and indexing
engines to prune a search space or to reduce duplication of request
actions and response storage.

URI
comparison is performed for some particular purpose. Protocols or
implementations that compare URIs for different purposes will often be
subject to differing design trade-offs in regards to how much effort
should be spent in reducing aliased identifiers. This section describes
various methods that may be used to compare URIs, the trade-offs
between them, and the types of applications that might use them.

6.1. Equivalence

Because
URIs exist to identify resources, presumably they should be considered
equivalent when they identify the same resource. However, this
definition of equivalence is not of much practical use, as there is no
way for an implementation to compare two resources unless it has full
knowledge or control of them. For this reason, determination of
equivalence or difference of URIs is based on string comparison,
perhaps augmented by reference to additional rules provided by URI
scheme definitions. We use the terms "different" and "equivalent" to
describe the possible outcomes of such comparisons, but there are many
application-dependent versions of equivalence.

Even
though it is possible to determine that two URIs are equivalent, URI
comparison is not sufficient to determine whether two URIs identify
different resources. For example, an owner of two different domain
names could decide to serve the same resource from both, resulting in
two different URIs. Therefore, comparison methods are designed to
minimize false negatives while strictly avoiding false positives.

In
testing for equivalence, applications should not directly compare
relative references; the references should be converted to their
respective target URIs before comparison. When URIs are compared to
select (or avoid) a network action, such as retrieval of a
representation, fragment components (if any) should be excluded from
the comparison.

6.2. Comparison Ladder

A
variety of methods are used in practice to test URI equivalence. These
methods fall into a range, distinguished by the amount of processing
required and the degree to which the probability of false negatives is
reduced. As noted above, false negatives cannot be eliminated. In
practice, their probability can be reduced, but this reduction requires
more processing and is not cost-effective for all applications.

If
this range of comparison practices is considered as a ladder, the
following discussion will climb the ladder, starting with practices
that are cheap but have a relatively higher chance of producing false
negatives, and proceeding to those that have higher computational cost
and lower risk of false negatives.

6.2.1. Simple String Comparison

If
two URIs, when considered as character strings, are identical, then it
is safe to conclude that they are equivalent. This type of equivalence
test has very low computational cost and is in wide use in a variety of
applications, particularly in the domain of parsing.

Testing
strings for equivalence requires some basic precautions. This procedure
is often referred to as "bit-for-bit" or "byte-for-byte" comparison,
which is potentially misleading. Testing strings for equality is
normally based on pair comparison of the characters that make up the
strings, starting from the first and proceeding until both strings are
exhausted and all characters are found to be equal, until a pair of
characters compares unequal, or until one of the strings is exhausted
before the other.

This character
comparison requires that each pair of characters be put in comparable
form. For example, should one URI be stored in a byte array in EBCDIC
encoding and the second in a Java String object (UTF-16), bit-for-bit
comparisons applied naively will produce errors. It is better to speak
of equality on a character-for-character basis rather than on a
byte-for-byte or bit-for-bit basis. In practical terms,
character-by-character comparisons should be done
codepoint-by-codepoint after conversion to a common character encoding.

False
negatives are caused by the production and use of URI aliases.
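
As an illustration of how such aliases arise, the following Python sketch reduces two differently spelled URIs to one string. It is a minimal sketch under stated assumptions, not a reference implementation: `remove_dot_segments` follows the two-buffer procedure of Section 5.2.4, while `normalize_percent` and `normalize` are hypothetical helper names introduced here, the authority is assumed to be a bare host with no userinfo, and splitting relies on Python's `urllib.parse`.

```python
import re
from urllib.parse import urlsplit, urlunsplit

UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def remove_dot_segments(path):
    """Two-buffer procedure of Section 5.2.4 (steps 2A to 2E)."""
    inp, out = path, ""
    while inp:
        if inp.startswith("../"):                # 2A
            inp = inp[3:]
        elif inp.startswith("./"):               # 2A
            inp = inp[2:]
        elif inp.startswith("/./"):              # 2B: "/./" becomes "/"
            inp = inp[2:]
        elif inp == "/.":                        # 2B: "." is the last segment
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":   # 2C
            inp = "/" + inp[4:]
            # Drop the last output segment and its preceding "/", if any.
            out = out[:out.rfind("/")] if "/" in out else ""
        elif inp in (".", ".."):                 # 2D
            inp = ""
        else:                                    # 2E: move one segment across
            cut = inp.find("/", 1)
            if cut == -1:
                cut = len(inp)
            out += inp[:cut]
            inp = inp[cut:]
    return out


def normalize_percent(text):
    """Uppercase hex digits and decode triplets for unreserved characters."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def normalize(uri):
    """Hypothetical helper: lowercase scheme and host, then clean the path."""
    parts = urlsplit(uri)
    path = remove_dot_segments(normalize_percent(parts.path))
    # Assumption: the authority is a bare host, so lowercasing it is safe.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, parts.fragment))
```

Under these assumptions, `normalize("eXAMPLE://a/./b/../b/%63/%7bfoo%7d")` and `normalize("example://a/b/c/%7Bfoo%7D")` produce the same string, although the two inputs differ character for character.
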
Unnecessary aliases can be reduced, regardless of the comparison
method, by consistently providing URI references in an
already-normalized form (i.e., a form identical to what would be
produced after normalization is applied, as described below).

Protocols
and data formats often limit some URI comparisons to simple string
comparison, based on the theory that people and implementations will,
in their own best interest, be consistent in providing URI references,
or at least consistent enough to negate any efficiency that might be
obtained from further normalization.

6.2.2. Syntax-Based Normalization

Implementations
may use logic based on the definitions provided by this specification
to reduce the probability of false negatives. This processing is
moderately higher in cost than character-for-character string
comparison. For example, an application using this approach could
reasonably consider the following two URIs equivalent:

   example://a/b/c/%7Bfoo%7D
   eXAMPLE://a/./b/../b/%63/%7bfoo%7d

Web user agents, such as browsers, typically apply this type
of URI normalization when determining whether a cached response is
available. Syntax-based normalization includes such techniques as case
normalization, percent-encoding normalization, and removal of
dot-segments.

6.2.2.1. Case Normalization

For
all URIs, the hexadecimal digits within a percent-encoding triplet
(e.g., "%3a" versus "%3A") are case-insensitive and therefore should be
normalized to use uppercase letters for the digits A-F.

When
a URI uses components of the generic syntax, the component syntax
equivalence rules always apply; namely, that the scheme and host are
case-insensitive and therefore should be normalized to lowercase. For
example, the URI <HTTP://www.EXAMPLE.com/> is equivalent to
<http://www.example.com/>. The other generic syntax components
are assumed to be case-sensitive unless specifically defined otherwise
by the scheme (see Section 6.2.3).

6.2.2.2. Percent-Encoding Normalization

The percent-encoding mechanism (Section 2.1)
is a frequent source of variance among otherwise identical URIs. In
addition to the case normalization issue noted above, some URI
producers percent-encode octets that do not require percent-encoding,
resulting in URIs that are equivalent to their non-encoded
counterparts. These URIs should be normalized by decoding any
percent-encoded octet that corresponds to an unreserved character, as
described in Section 2.3.

6.2.2.3. Path Segment Normalization

The complete path segments "." and ".." are intended only for use within relative references (Section 4.1) and are removed as part of the reference resolution process (Section 5.2).
However, some deployed implementations incorrectly assume that
reference resolution is not necessary when the reference is already a
URI and thus fail to remove dot-segments when they occur in
non-relative paths. URI normalizers should remove dot-segments by
applying the remove_dot_segments algorithm to the path, as described in
Section 5.2.4.

6.2.3. Scheme-Based Normalization

The
syntax and semantics of URIs vary from scheme to scheme, as described
by the defining specification for each scheme. Implementations may use
scheme-specific rules, at further processing cost, to reduce the
probability of false negatives. For example, because the "http" scheme
makes use of an authority component, has a default port of "80", and
defines an empty path to be equivalent to "/", the following four URIs
are equivalent:

   http://example.com
   http://example.com/
   http://example.com:/
   http://example.com:80/

In general, a URI that uses the generic syntax for authority
with an empty path should be normalized to a path of "/". Likewise, an
explicit ":port", for which the port is empty or the default for the
scheme, is equivalent to one where the port and its ":" delimiter are
elided and thus should be removed by scheme-based normalization. For
example, the second URI above is the normal form for the "http" scheme.

Another
case where normalization varies by scheme is in the handling of an
empty authority component or empty host subcomponent. For many scheme
specifications, an empty authority or host is considered an error; for
others, it is considered equivalent to "localhost" or the end-user's
host. When a scheme defines a default for authority and a URI reference
to that default is desired, the reference should be normalized to an
empty authority for the sake of uniformity, brevity, and
internationalization. If, however, either the userinfo or port
subcomponents are non-empty, then the host should be given explicitly
even if it matches the default.

Normalization
should not remove delimiters when their associated component is empty
unless licensed to do so by the scheme specification. For example, the
URI "http://example.com/?" cannot be assumed to be equivalent to any of
the examples above. Likewise, the presence or absence of delimiters
within a userinfo subcomponent is usually significant to its
interpretation. The fragment component is not subject to any
scheme-based normalization; thus, two URIs that differ only by the
suffix "#" are considered different regardless of the scheme.

Some
schemes define additional subcomponents that consist of
case-insensitive data, giving an implicit license to normalizers to
convert this data to a common case (e.g., all lowercase). For example,
URI schemes that define a subcomponent of path to contain an Internet
hostname, such as the "mailto" URI scheme, cause that subcomponent to
be case-insensitive and thus subject to case normalization (e.g.,
"mailto:Joe@Example.COM" is equivalent to "mailto:Joe@example.com",
even though the generic syntax considers the path component to be
case-sensitive).

Other scheme-specific normalizations are possible.

6.2.4. Protocol-Based Normalization

Substantial
effort to reduce the incidence of false negatives is often
cost-effective for web spiders. Therefore, they implement even more
aggressive techniques in URI comparison. For example, if they observe
that a URI such as

   http://example.com/data

redirects to a URI differing only in the trailing slash

   http://example.com/data/

they will likely regard the two as equivalent in the future.
This kind of technique is only appropriate when equivalence is clearly
indicated by both the result of accessing the resources and the common
conventions of their scheme's dereference algorithm (in this case, use
of redirection by HTTP origin servers to avoid problems with relative
references).

7. Security Considerations

A
URI does not in itself pose a security threat. However, as URIs are
often used to provide a compact set of instructions for access to
network resources, care must be taken to properly interpret the data
within a URI, to prevent that data from causing unintended access, and
to avoid including data that should not be revealed in plain text.

7.1. Reliability and Consistency

There
is no guarantee that once a URI has been used to retrieve information,
the same information will be retrievable by that URI in the future. Nor
is there any guarantee that the information retrievable via that URI in
the future will be observably similar to that retrieved in the past.
The URI syntax does not constrain how a given scheme or authority
apportions its namespace or maintains it over time. Such guarantees can
only be obtained from the person(s) controlling that namespace and the
resource in question. A specific URI scheme may define additional
semantics, such as name persistence, if those semantics are required of
all naming authorities for that scheme.

7.2. Malicious Construction

It
is sometimes possible to construct a URI so that an attempt to perform
a seemingly harmless, idempotent operation, such as the retrieval of a
representation, will in fact cause a possibly damaging remote
operation. The unsafe URI is typically constructed by specifying a port
number other than that reserved for the network protocol in question.
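
To make this construction concrete, the following Python sketch reports when a URI names an explicit port inside the well-known range (0 - 1023) that differs from the default of its scheme. It is only a sketch under stated assumptions: the `DEFAULT_PORTS` table and the function name `has_mismatched_well_known_port` are hypothetical, the table is deliberately tiny, and "differs from the scheme default" is a rough stand-in for a real protocol compatibility check.

```python
from urllib.parse import urlsplit

# Assumed defaults for illustration only; not an exhaustive or authoritative table.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}
WELL_KNOWN_MAX = 1023  # upper bound of the well-known port range (0 - 1023)


def has_mismatched_well_known_port(uri):
    """Return True if the URI names a well-known port other than its scheme default."""
    parts = urlsplit(uri)
    port = parts.port  # None when the URI carries no port
    if port is None:
        return False
    return port <= WELL_KNOWN_MAX and port != DEFAULT_PORTS.get(parts.scheme)
```

With this table, `http://example.com:25/` is reported, while `http://example.com:80/` and `http://example.com:8080/` are not. A deployed check would need a fuller table and a user-configurable policy.
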
The client unwittingly contacts a site running a different protocol
service, and data within the URI contains instructions that, when
interpreted according to this other protocol, cause an unexpected
operation. A frequent example of such abuse has been the use of a
protocol-based scheme with a port component of "25", thereby fooling
user agent software into sending an unintended or impersonating message
via an SMTP server.

Applications should
prevent dereference of a URI that specifies a TCP port number within
the well-known port range (0 - 1023) unless the protocol being used
to dereference that URI is compatible with the protocol expected on
that well-known port. Although IANA maintains a registry of well-known
ports, applications should make such restrictions user-configurable to
avoid preventing the deployment of new services.

When
a URI contains percent-encoded octets that match the delimiters for a
given resolution or dereference protocol (for example, CR and LF
characters for the TELNET protocol), these percent-encodings must not
be decoded before transmission across that protocol. Transfer of the
percent-encoding, which might violate the protocol, is less harmful
than allowing decoded octets to be interpreted as additional operations
or parameters, perhaps triggering an unexpected and possibly harmful
remote operation.

7.3. Back-End Transcoding

When
a URI is dereferenced, the data within it is often parsed by both the
user agent and one or more servers. In HTTP, for example, a typical
user agent will parse a URI into its five major components, access the
authority's server, and send it the data within the authority, path,
and query components. A typical server will take that information,
parse the path into segments and the query into key/value pairs, and
then invoke implementation-specific handlers to respond to the request.
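
The server-side parsing just described might be sketched in Python as follows. This is a sketch rather than a prescribed method: the function name `parse_request_target` is hypothetical, and the "&" and "=" separators for key/value pairs are an assumed convention that the generic syntax does not define. The point it shows is the ordering: each component is split on its delimiters first, and percent-encoded octets are decoded only afterwards, so an encoded "/" or "&" is never mistaken for a delimiter.

```python
from urllib.parse import urlsplit, unquote


def parse_request_target(uri):
    """Split path and query on their delimiters, then decode each piece."""
    parts = urlsplit(uri)

    # Split on "/" BEFORE decoding, so "%2F" stays inside its segment.
    # The leading empty string (before the first "/") is dropped.
    segments = [unquote(seg) for seg in parts.path.split("/")[1:]]

    # Assumed convention: pairs separated by "&", key and value by "=".
    pairs = []
    for field in parts.query.split("&"):
        if field:
            key, _, value = field.partition("=")
            pairs.append((unquote(key), unquote(value)))
    return segments, pairs
```

For example, `parse_request_target("http://example.com/a%2Fb/c?x=1%262&y=%3D")` yields the segments `["a/b", "c"]` and the pairs `[("x", "1&2"), ("y", "=")]`. Decoding before splitting would instead have produced three path segments and a spurious extra query field.
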
As a result, a common security concern for server implementations that
handle a URI, either as a whole or split into separate components, is
proper interpretation of the octet data represented by the characters
and percent-encodings within that URI.

Percent-encoded
octets must be decoded at some point during the dereference process.
Applications must split the URI into its components and subcomponents
prior to decoding the octets, as otherwise the decoded octets might be
mistaken for delimiters. Security checks of the data within a URI
should be applied after decoding the octets. Note, however, that the
"%00" percent-encoding (NUL) may require special handling and should be
rejected if the application is not expecting to receive raw data within
a component.

Special care should be
taken when the URI path interpretation process involves the use of a
back-end file system or related system functions. File systems
typically assign an operational meaning to special characters, such as
the "/", "\", ":", "[", and "]" characters, and to special device names
like ".", "..", "...", "aux", "lpt", etc. In some cases, merely testing
for the existence of such a name will cause the operating system to
pause or invoke unrelated system calls, leading to significant security
concerns regarding denial of service and unintended data transfer. It
would be impossible for this specification to list all such significant
characters and device names. Implementers should research the reserved
names and characters for the types of storage device that may be
attached to their applications and restrict the use of data obtained
from URI components accordingly.

7.4. Rare IP Address Formats

Although
the URI syntax for IPv4address only allows the common dotted-decimal
form of IPv4 address literal, many implementations that process URIs
make use of platform-dependent system routines, such as gethostbyname()
and inet_aton(), to translate the string literal to an actual IP
address. Unfortunately, such system routines often allow and process a
much larger set of formats than those described in Section 3.2.2.

For
example, many implementations allow