

Niall Beard edited this page Mar 30, 2016 · 10 revisions

Rule 2: Design identifiers for use by others

Permalink URI: https://w3id.org/id-rules/2

Pre-existing identifiers should be referenced without modification (see Rule 10). However, when new local identifiers are necessary, a few design decisions make them easier to use in diverse contexts (spreadsheets, other databases, web applications, publications, etc.).

We use the term Local Resource Identifier (LRI) to mean a publicly available identifier that is unique within a single dataset.

Characteristics of Local Resource Identifiers

  • Must comprise only printable ASCII characters without whitespace. This guards against corruption and mistranscription in many contexts.
  • Should contain both letters and numbers. This avoids misinterpretation as numeric data (e.g. truncation of leading zeros in spreadsheets).
  • Should avoid problem patterns that can be misinterpreted as dates (e.g. SEPT2 becoming 2-Sep in a spreadsheet), as numbers in exponent notation (e.g. 2E10), or as unintended words.
  • Should adhere to a fixed, documented case convention, preferably one in which no two identifiers differ only by case, so that they can safely be compared case-insensitively; this avoids accidental collisions.
  • Must adhere to a formal pattern (regular expression); this facilitates, but does not guarantee, validation and retrieval from scientific text. Consider a fixed length of 8–16 characters, chosen according to the anticipated number of LRIs required. A pattern may be extended once all available identifiers have been issued, but existing identifiers must not be changed. To minimize collisions with LRIs from other datasets, specify your pattern tightly (e.g. by starting with two or more fixed letters).
  • Should ideally not contain a period (.), except to denote a version where appropriate (see Rule 7).
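The characteristics above lend themselves to mechanical checking. The following Python sketch assumes a hypothetical scheme of two fixed letters "AB" followed by eight zero-padded digits (10 characters, room for 100 million LRIs). The prefix, the length, the helper names mint_lri, normalize_lri and check_lri, and the date and exponent heuristics are illustrative assumptions, not part of the rule.

```python
import re
import string

# Hypothetical scheme (an illustrative assumption, not part of the rule):
# two fixed upper-case letters "AB" followed by exactly eight digits,
# e.g. "AB00000042". Ten characters, room for 100 million LRIs.
LRI_PREFIX = "AB"
LRI_DIGITS = 8
LRI_PATTERN = re.compile(rf"{re.escape(LRI_PREFIX)}[0-9]{{{LRI_DIGITS}}}")

# Printable ASCII without whitespace: the 94 characters '!' through '~'.
ALLOWED_CHARS = set(string.ascii_letters + string.digits + string.punctuation)

# Rough heuristics for values that spreadsheets may silently convert.
SCI_NOTATION = re.compile(r"[0-9]+(\.[0-9]+)?[eE][+-]?[0-9]+")  # e.g. "2E10"
DATE_LIKE = re.compile(
    r"(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[A-Z]*-?[0-9]{1,2}",
    re.IGNORECASE,
)  # e.g. "SEPT2", "MAR1"


def mint_lri(n: int) -> str:
    """Format the n-th LRI of the scheme as a fixed-length, zero-padded string."""
    if not 0 <= n < 10 ** LRI_DIGITS:
        # Issued LRIs must never change: extend the documented pattern instead.
        raise ValueError("identifier space exhausted; extend the pattern")
    return f"{LRI_PREFIX}{n:0{LRI_DIGITS}d}"


def normalize_lri(lri: str) -> str:
    """Fold case and trim stray whitespace before lookup or validation.

    Safe here because the scheme only issues upper-case LRIs, so no two
    issued identifiers differ only by case.
    """
    return lri.strip().upper()


def check_lri(lri: str) -> list[str]:
    """Return the problems found in lri; an empty list means it passed."""
    problems = []
    if not lri or any(c not in ALLOWED_CHARS for c in lri):
        problems.append("not printable ASCII without whitespace")
    if not (any(c in string.ascii_letters for c in lri)
            and any(c in string.digits for c in lri)):
        problems.append("lacks both letters and digits")
    if SCI_NOTATION.fullmatch(lri) or DATE_LIKE.fullmatch(lri):
        problems.append("resembles a number or date a spreadsheet may convert")
    if "." in lri:
        problems.append("contains '.', which is reserved for versions")
    if not LRI_PATTERN.fullmatch(lri):
        problems.append(f"does not match {LRI_PATTERN.pattern}")
    return problems


print(mint_lri(42))                             # AB00000042
print(check_lri("AB00000042"))                  # []
print(check_lri(normalize_lri(" ab00000042")))  # []
for candidate in ["00012345", "SEPT2", "2E10", "AB 0001", "AB00000042.2"]:
    print(candidate, check_lri(candidate))
```

Raising an error when the digits run out, rather than reusing or renumbering identifiers, mirrors the requirement that issued LRIs never change; the documented pattern would be extended instead (for example, to allow nine digits) while existing LRIs stay exactly as issued.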

Two further considerations make LRIs well suited for use by others in user-friendly compact notation (CURIEs) and on the Semantic Web. We therefore recommend that LRIs:

  • Should not contain a colon (:), which CURIEs reserve to separate the prefix from the local part.
  • Should use a hyphen (-) if a delimiter other than : or . is needed; this guards against some CURIE parsers splitting the identifier inappropriately.
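To illustrate why the colon matters, here is a minimal Python sketch of CURIE compaction and expansion. The prefix "ab", the base URI https://example.org/id/ and the helpers to_curie and expand_curie are hypothetical; real CURIE processors differ in detail, and this one simply treats the first colon as the prefix separator.

```python
# Hypothetical prefix map: "ab" and its base URI are illustrative
# assumptions, not a registered namespace.
PREFIXES = {"ab": "https://example.org/id/"}


def to_curie(prefix: str, lri: str) -> str:
    """Compact an LRI into prefix:reference notation (hypothetical helper)."""
    if ":" in lri:
        # A colon inside the LRI would make the prefix boundary ambiguous.
        raise ValueError(f"LRI {lri!r} contains ':'")
    return f"{prefix}:{lri}"


def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand a CURIE, treating the first ':' as the prefix separator."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"not a CURIE with a known prefix: {curie!r}")
    return prefixes[prefix] + reference


# A hyphen-delimited LRI round-trips cleanly.
print(expand_curie(to_curie("ab", "AB-0042"), PREFIXES))
# https://example.org/id/AB-0042

# A bare LRI containing ':' is indistinguishable from a CURIE whose
# prefix is "AB" and whose local part is "0042".
print("AB:0042".partition(":"))
# ('AB', ':', '0042')
```

Refusing to compact an LRI that contains a colon keeps the prefix boundary unambiguous, whereas a hyphen passes through compaction and expansion unchanged.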

This section edited from the original to add sub-headlines.