uri4uri

This site was created for fun on April 1st 2013 by Christopher Gutteridge (University of Southampton) with a very over-the-top set of claims. Since July 2022, it is hosted and developed by IS4. Check out the news!

How does it work?

The URI resolver just parses the URI to get the parts, then uses registries and external queries to retrieve information about MIME types, top-level domains and so forth. This data is taken from IANA and Wikidata and is periodically updated.

It accepts any possible URI, including web addresses, email addresses, ISBNs as well as some more obscure schemes such as Gopher.

URIs

The resources hosted by this site use URIs in the format:

https://w3id.org/uri4uri/<type>/<identifier>

...where <type> is one of uri, scheme, host, part, mime, suffix, urn, well-known, port, protocol, service. Percent-encoding <identifier> is necessary only for %, ? and # and the usual invalid characters. Note that due to the limitations of w3id.org, you should always use https:// and never escape /. In /uri/, an escaped / (%2F) in the described URI is encoded as %252F.
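As a rough illustration of the pattern above, the following sketch builds such a URI in Python. The function name `uri4uri` and the exact set of characters left unescaped are assumptions made for this example, not the site's own code; it escapes %, ? and # along with characters that are not valid in a URI, and leaves / alone.

```python
from urllib.parse import quote

BASE = "https://w3id.org/uri4uri"

# Types listed in the text above.
TYPES = {
    "uri", "scheme", "host", "part", "mime", "suffix",
    "urn", "well-known", "port", "protocol", "service",
}

# Characters left as-is (an assumption for this sketch): "/" must never be
# escaped, and the remaining URI delimiters are harmless inside a path.
# "%", "?" and "#" are deliberately absent, so quote() escapes them.
SAFE = "/:@!$&'()*+,;=[]"


def uri4uri(kind: str, identifier: str) -> str:
    """Build the uri4uri URI for an identifier of the given type."""
    if kind not in TYPES:
        raise ValueError(f"unknown type: {kind!r}")
    return f"{BASE}/{kind}/{quote(identifier, safe=SAFE)}"


print(uri4uri("uri", "http://xkcd.com/123/"))
print(uri4uri("uri", "http://example.com/a%2Fb?x=1#top"))
```

In the second call, the %2F of the described URI becomes %252F, while ? and # become %3F and %23.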

Resolving a URI will invoke content negotiation to pick one of Turtle, RDF/XML, JSON-LD or HTML, and then issue a 303 redirect to the relevant document. Each URI has an associated set of documents: ttl, rdf, jsonld, nt, html. These have URLs in the following format:

https://w3id.org/uri4uri/<type>.<format>/<identifier>

Examples:

https://w3id.org/uri4uri/uri/http://xkcd.com/123/ - URI for "http://xkcd.com/123/"
https://w3id.org/uri4uri/host/totl.net - URI for the domain "totl.net"
https://w3id.org/uri4uri/suffix/pdf - URI for the suffix ".pdf"
https://w3id.org/uri4uri/scheme/ftp - URI for the URI scheme "ftp"
https://w3id.org/uri4uri/mime/text/plain - URI for the MIME Type "text/plain"
https://w3id.org/uri4uri/urn/uuid - URI for the "urn:uuid:" Namespace
https://w3id.org/uri4uri/well-known/void - URI for the "/.well-known/void" Service
https://w3id.org/uri4uri/port/80 - URI for the port 80
https://w3id.org/uri4uri/protocol/tcp - URI for the TCP protocol
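
The content negotiation and per-format document URLs described above might be used along these lines. This is only a sketch: the helper names are made up for the example, and the media types are the commonly registered ones for each format, which the server is assumed (not confirmed) to recognize in an Accept header.

```python
from urllib.request import Request

BASE = "https://w3id.org/uri4uri"

# Format suffix -> media type. The suffixes come from the text above; the
# media types are the standard ones and are an assumption about the server.
MEDIA_TYPES = {
    "ttl": "text/turtle",
    "rdf": "application/rdf+xml",
    "jsonld": "application/ld+json",
    "nt": "application/n-triples",
    "html": "text/html",
}


def document_url(kind: str, fmt: str, identifier: str) -> str:
    """URL of one specific document: <type>.<format>/<identifier>."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return f"{BASE}/{kind}.{fmt}/{identifier}"


def negotiated_request(kind: str, identifier: str, fmt: str = "ttl") -> Request:
    """Request for the resource URI itself, asking for a format via Accept."""
    return Request(
        f"{BASE}/{kind}/{identifier}",
        headers={"Accept": MEDIA_TYPES[fmt]},
    )


print(document_url("host", "ttl", "totl.net"))
req = negotiated_request("suffix", "pdf", "jsonld")
print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would follow the 303 redirect automatically; the sketch stops short of that so it can be run without network access.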

Each identifier is normalized before a description is generated. All identifiers except for URIs are converted to lowercase. Domain names are converted to their Unicode-based variants. Invalid characters in URIs are percent-encoded.
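
To make the normalization rules a little more concrete, here is a minimal sketch of the lowercase and Unicode-domain steps only. It is not the site's implementation: the function name is hypothetical, and Python's built-in `idna` codec (which follows the older IDNA 2003 rules) is swapped in for whatever the site actually uses. Percent-encoding of invalid characters in URIs is left out.

```python
def normalize_identifier(kind: str, identifier: str) -> str:
    """Approximate the normalization described above (hypothetical helper)."""
    if kind == "uri":
        # URIs are the one type that is not lowercased.
        return identifier
    lowered = identifier.lower()
    if kind == "host":
        try:
            # Decode punycode ("xn--...") labels into their Unicode form.
            return lowered.encode("ascii").decode("idna")
        except UnicodeError:
            # Already Unicode, or not decodable: keep the lowercased form.
            return lowered
    return lowered


print(normalize_identifier("mime", "Text/PLAIN"))
print(normalize_identifier("host", "XN--BCHER-KVA.Example"))
```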

How big is it?

Virtually infinite, and still growing! However, since most results are generated on the fly, the size can be compressed pretty efficiently to almost 0. The remainder consists of the registries from IANA, which are cached and take about 2.5 MiB of space in total.

What is included

URIs, Internet Domains, MIME Types, File Suffixes, URI Schemes, URN Namespaces, Well-Known URIs, Ports, Protocols.

How do I find the URI4URI which identifies the URL of a page I'm viewing?

Simple, just drag this handy "bookmarklet" into your browser toolbar: URI4URI. When you click it, it runs a teeny tiny bit of JavaScript which takes you to a page telling you about the URL and its uri4uri.

What parts of a URI are supported?

The majority of the effort has gone into calculating the components of http and https URIs. An example showing off all the parts of a URI would be http://foo:bar@bbc.co.uk:80/index.html?a=1&b=2#fragment. Other URI schemes are supported as well, e.g. tel: or secondlife:.
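
For readers who want to see the parts of that example spelled out, the sketch below splits it with Python's standard `urllib.parse`. This only illustrates the component breakdown; the site uses its own parser, and the dictionary keys here are names chosen for the example.

```python
from urllib.parse import urlsplit, parse_qsl


def uri_parts(uri: str) -> dict:
    """Split a URI into its components (illustrative key names)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        # Individual query fields as a name -> value mapping.
        "query": dict(parse_qsl(parts.query)),
        "fragment": parts.fragment,
    }


for key, value in uri_parts(
    "http://foo:bar@bbc.co.uk:80/index.html?a=1&b=2#fragment"
).items():
    print(f"{key}: {value!r}")
```

A URI without an authority part, such as one using tel:, yields only a scheme and a path here; user, password, host and port come back as None.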

Can I see the source code?

Sure, you can find the uri4uri source on github.com.

Changelog

This section summarizes the changes to this website since July 2022:

General
  • Individual pages have semantic <meta> tags!
  • JSON-LD was added as a supported format of serialization, and is also emitted as part of the HTML documents. Some issues in the other serializers were fixed.
  • A lot of new entity types and records were added, for use by URIs: ports, well-known URI suffixes, URN namespaces, protocols and named services.
  • Percent-encoding of most characters is optional; only %?# have to be encoded (in addition to the usual forbidden characters). Identifiers are always normalized.
  • All databases are generated on-the-fly and frequently updated (from sources such as IANA).
  • Under the hood, Wikidata is used to provide additional and rich info, beyond the stored databases. Thanks to this, even proprietary MIME types or file extensions can be resolved.
  • Provenance information, term statuses and vocabulary annotations added wherever possible.
  • Several pieces of vocabulary were deprecated and replaced with commonly used properties, such as linking to ports, file extensions or MIME types.
  • All datasets are published and described using VoID, which can be searched through as Triple Pattern Fragments.
  • Many general fixes in various parts of the code.
URIs
  • URI parsing improved (fixes PHP's edge cases).
  • Relative URIs are properly recognized and supported. Other types of URIs are also recognized (well-known URIs, URNs, PURLs). The port may be implied by the scheme.
  • Query and fragment parsing was unified and reworked. Individual query fields are represented as properties. XPointer and Media Fragments are recognized.
  • Conversion to ASCII/IRI is included as a notation.
Domains
  • IDN support for domain names added. Domains are normalized to their Unicode form and the ASCII form is added as a notation.
  • Different types of hosts are supported ‒ domains (special-use etc.), IPv4, IPv6, IPvFuture. Conversion back and forth using DNS is supported.
  • The WHOIS mechanism now uses RDAP to determine the WHOIS server, and can derive several properties from the RDAP record, for both domains and IP addresses.
Formats
  • Structural MIME types (with +xml and similar at the end) and MIME types with parameters (like ;charset=) are properly parsed and link to the base MIME type.
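
The last item can be illustrated with a small sketch that separates a MIME type into its type, subtype, structured suffix and parameters. The function name and the returned keys are assumptions for this example, and the naive split on ";" does not handle quoted parameter values that themselves contain a semicolon.

```python
def parse_mime(value: str) -> dict:
    """Split a MIME type into its pieces (simplified, hypothetical helper)."""
    essence, *raw_params = [piece.strip() for piece in value.split(";")]
    essence = essence.lower()
    type_, _, subtype = essence.partition("/")
    # A structured suffix is whatever follows the last "+" in the subtype.
    _, plus, suffix = subtype.rpartition("+")
    parameters = {}
    for raw in raw_params:
        if not raw:
            continue
        name, _, param_value = raw.partition("=")
        parameters[name.strip().lower()] = param_value.strip().strip('"')
    return {
        "essence": essence,  # the type without parameters
        "type": type_,
        "subtype": subtype,
        "suffix": suffix if plus else None,
        "parameters": parameters,
    }


print(parse_mime("Image/SVG+XML; charset=UTF-8"))
print(parse_mime("text/plain"))
```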