Wednesday 15 December 2021

What is a Uniform Resource Locator (URL) ?

 

Uniform Resource Locator (URL):

 

Uniform Resource Locator (URL), also termed as a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs occur most commonly to reference web pages (http), but are also used for file transfer (ftp), email (mailto), database access (JDBC), and many other applications.

Most web browsers display the URL of a web page above the page in an address bar. A typical URL could have the form http://www.example.com/index.html, which indicates a protocol (http), a hostname (www.example.com), and a file name (index.html).

Uniform Resource Locators were defined in RFC 1738 in 1994 by Tim Berners-Lee, the inventor of the World Wide Web, and the URI working group of the Internet Engineering Task Force (IETF), as an outcome of collaboration started at the IETF Living Documents birds of a feather session in 1992.

Every HTTP URL conforms to the syntax of a generic URI. The URI generic syntax consists of a hierarchical sequence of five components:

URI = scheme:[//authority]path[?query][#fragment]

where the authority component divides into three subcomponents:

authority = [userinfo@]host[:port]

This is represented in a syntax diagram as:

URL Syntax Diagram

The URI comprises:

·         A non-empty scheme component followed by a colon (:), consisting of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (+), period (.), or hyphen (-). Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. Examples of popular schemes include httphttpsftpmailtofiledata, and irc. URI schemes should be registered with the Internet Assigned Numbers Authority (IANA), although non-registered schemes are used in practice.

·         An optional authority component preceded by two slashes (//), comprising:

o     An optional userinfo subcomponent that may consist of a user name and an optional password preceded by a colon (:), followed by an at symbol (@).

o     host subcomponent, consisting of either a registered name or an IP address. IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed in brackets ([]).

o    An optional port subcomponent preceded by a colon (:).

·         path component, consisting of a sequence of path segments separated by a slash (/). A path is always defined for a URI, though the defined path may be empty (zero length). A segment may also be empty, resulting in two consecutive slashes (//) in the path component. A path component may resemble or map exactly to a file system path, but does not always imply a relation to one.

·         An optional query component preceded by a question mark (?), containing a query string of non-hierarchical data.

·         An optional fragment component preceded by a hash (#). The fragment contains a fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. A web browser will usually dereference a URL by performing an HTTP request to the specified host, by default on port number 80. URLs using the https scheme require that requests and responses be made over a secure connection to the website.

No comments:

Post a Comment