URLs
Uniform Resource Locators, or URLs are fundamental to the way in which the web operates, and they have been formally described in RFC 3986. A URL represents a resource on a given host. How URLs map to the resources on the remote system is entirely at the discretion of the system admin. URLs can point to files on the server, or the resources may be dynamically generated when a request is received. What the URL maps to though doesn't matter as long as the URLs work when we request them.
URLs are comprised of several sections. Python uses the urllib.parse
module for working with URLs. Let's use Python to break a URL into its component parts:
>>> from urllib.parse import urlparse >>> result = urlparse('http://www.python.org/dev/peps') >>> result ParseResult(scheme='http', netloc='www.python.org', path='/dev/peps', params='', query='', fragment='')
The urllib.parse.urlparse()
function interprets our URL and recognizes http
as the
scheme, https://www.python.org...