Common Provenance Model RO-Crate profile

RO-Crate profiles to capture W3C PROV provenance that follows the Common Provenance Model.

Research objects, such as data, experimental results, computational models, or biological samples, are exchanged between organizations, so each organization can provide provenance information about only part of the research object’s life cycle. As a result, a complete provenance description of the object is spread across different, heterogeneous organizations.

The Common Provenance Model (CPM) provides a baseline for such distributed provenance chains. It defines how to interconnect distributed provenance parts encapsulated in PROV bundles, how to express standardized derivation paths between the inputs and outputs of a process in a single bundle (the so-called provenance backbone), and how to attach domain-specific information to the chain in a harmonized way.

This document specifies how to identify and handle CPM-compliant provenance files and CPM-compliant meta-provenance files in an RO-Crate.

General Requirements

Type/Property Required? Description
CPMProvenanceFile
extends MediaObject (@id is resolvable), dataEntity
@type MUST Type that identifies the CPM provenance file.

Array MUST include "File". Array MUST include "CPMProvenanceFile".

@id MUST Identifier of the CPM provenance file.

SHOULD be a relative URI to a data entity in the crate (e.g. "provenance/prov-training.provn") but MAY be an absolute URI. Resolving this identifier MUST return this provenance file in the given format.

identifier SHOULD Identifier of a provenance bundle present in the CPM provenance file.

MUST be an absolute URI. MUST match the expanded bundle identifier. MAY be equal to @id if absolute.

Note: PROV formats that support identified bundles SHOULD ensure their internally defined identifier also matches this identifier.

dateModified SHOULD The time this CPM provenance file was last modified/written (not necessarily when the bundle included was finalized or the file was added to the RO-Crate).

MUST be a string in ISO 8601 date format.

encodingFormat MUST Encoding of the CPM provenance file.

Array MUST contain a string indicating the IANA media type of the file, e.g. text/turtle or text/provenance-notation or application/ld+json.

Array MUST also contain a reference to a CreativeWork that indicates the PROV format used in the serialization, whose @id SHOULD be one of:

about SHOULD Array of identifiers of the entities that are documented by the CPM provenance file.

SHOULD contain at least one identifier.
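
The MUST-level rules for a CPMProvenanceFile entity can be checked mechanically. The following is a minimal sketch, not part of the profile: the function names (`as_list`, `is_absolute_uri`, `check_cpm_provenance_file`) are hypothetical, the bundle URI under `example.org` is a placeholder, and the absolute-URI test is a simplification that only looks for a URI scheme.

```python
from urllib.parse import urlparse


def as_list(value):
    """Wrap a JSON-LD value in a list unless it already is one."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_absolute_uri(value):
    """Simplified check: any string with a URI scheme counts as absolute."""
    return isinstance(value, str) and bool(urlparse(value).scheme)


def check_cpm_provenance_file(entity):
    """Return the MUST rules violated by one CPMProvenanceFile entity."""
    problems = []
    types = as_list(entity.get("@type"))
    for required in ("File", "CPMProvenanceFile"):
        if required not in types:
            problems.append(f'@type must include "{required}"')
    if not entity.get("@id"):
        problems.append("@id is missing")
    formats = as_list(entity.get("encodingFormat"))
    if not any(isinstance(f, str) for f in formats):
        problems.append("encodingFormat needs a media type string")
    if not any(isinstance(f, dict) and "@id" in f for f in formats):
        problems.append("encodingFormat needs a PROV format reference")
    # identifier is only a SHOULD, but when present it MUST be absolute.
    for ident in as_list(entity.get("identifier")):
        uri = ident.get("@id") if isinstance(ident, dict) else ident
        if not is_absolute_uri(uri):
            problems.append(f"identifier is not an absolute URI: {uri!r}")
    return problems


entity = {
    "@id": "provenance/prov-training.provn",
    "@type": ["File", "CPMProvenanceFile"],
    "encodingFormat": [
        "text/provenance-notation",
        {"@id": "http://www.w3.org/TR/2013/REC-prov-n-20130430/"},
    ],
    # Placeholder bundle identifier, not taken from the profile.
    "identifier": "https://example.org/bundles/training",
}
print(check_cpm_provenance_file(entity))  # prints []
```

The sketch does not verify that `identifier` matches the bundle identifier inside the provenance file, since that requires parsing the PROV serialization itself.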

CPMMetaProvenanceFile
extends CPMProvenanceFile
@type MUST Type that identifies the CPM meta provenance file.

Array MUST include "File". Array MUST include "CPMMetaProvenanceFile".

@id MUST Identifier of the CPM meta provenance file.

SHOULD be an absolute URI, but MAY be a relative URI to a data entity in the crate (e.g. "provenance/prov-meta.jsonld").

dateModified SHOULD The time this CPM meta provenance file was last modified/written (not necessarily when the bundle included was finalized or the file was added to the RO-Crate).

MUST be a string in ISO 8601 date format.

encodingFormat MUST Encoding of the CPM meta provenance file.

Array MUST contain a string indicating the IANA media type of the file, e.g. text/turtle or text/provenance-notation or application/ld+json.

Array MUST also contain a reference to a CreativeWork that indicates the PROV format used in the serialization, whose @id SHOULD be one of:

hasPart MUST Identifiers of meta provenance bundles present in the CPM meta provenance file.

Array MUST contain absolute URIs. URIs MUST match the expanded bundle identifiers as used internally in the CPM provenance files.
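
As an illustration of the CPMMetaProvenanceFile requirements, the sketch below assembles such an entity and rejects bundle identifiers that are not absolute URIs. It rests on several assumptions that are not stated in the profile: the helper name `make_meta_provenance_entity` is hypothetical, the `example.org` URIs are placeholders (including the PROV format reference), and `hasPart` entries are written as `{"@id": ...}` references in the usual RO-Crate style.

```python
import json
from urllib.parse import urlparse


def make_meta_provenance_entity(file_id, media_type, prov_format_id, bundle_uris):
    """Build a CPMMetaProvenanceFile entity with the MUST-level properties."""
    for uri in bundle_uris:
        # Simplified absolute-URI test: the string must carry a scheme.
        if not urlparse(uri).scheme:
            raise ValueError(f"hasPart entries must be absolute URIs: {uri!r}")
    return {
        "@id": file_id,
        "@type": ["File", "CPMMetaProvenanceFile"],
        # One media type string plus one reference to the PROV format.
        "encodingFormat": [media_type, {"@id": prov_format_id}],
        # Assumption: bundles are listed as JSON-LD references.
        "hasPart": [{"@id": uri} for uri in bundle_uris],
    }


meta_entity = make_meta_provenance_entity(
    "provenance/prov-meta.jsonld",
    "application/ld+json",
    "https://example.org/prov-format",      # placeholder PROV format @id
    ["https://example.org/meta/bundle-1"],  # placeholder meta bundle URI
)
print(json.dumps(meta_entity, indent=2))
```

Validating at construction time keeps relative or prefixed bundle names out of the crate metadata, where they could no longer be matched against the expanded identifiers used inside the provenance files.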

Example

The example RO-Crate documents a single-step computation implemented as a Python script, which takes a file as input and generates another file as output. In addition, the computation generated an arbitrary log file, which was transformed into a CPM-compliant provenance bundle using a standalone Python script. The resulting CPM provenance file documents the computation execution.

{ "@context": [
    "https://w3id.org/ro/crate/1.1/context",
    { "CPMProvenanceFile": "https://w3id.org/ro/terms/cpm#CPMProvenanceFile",
      "CPMMetaProvenanceFile": "https://w3id.org/ro/terms/cpm#CPMMetaProvenanceFile"
    }
  ],
  "@graph": [

 {
    "@type": "CreativeWork",
    "@id": "ro-crate-metadata.json",
    "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
    "about": {"@id": "./"}
 },
 {
    "@id": "./",
    "@type": "Dataset",
    "datePublished": "2022",
    "conformsTo": [
       {"@id": "https://w3id.org/ro/wfrun/0.1/process"},
       {"@id": "https://w3id.org/cpm/crate/0.1"}
    ],
    "name": "...",
    "description": "",
    "hasPart": [
        { "@id": "INPUT_DATASET_PATH" },
        { "@id": "OUTPUT_DATASET_PATH" },
        { "@id": "COMPUTATION_LOG_FILE" },
        { "@id": "CPM_COMPLIANT_PROVENANCE" },
        { "@id": "COMPUTATION_SCRIPT" },
        { "@id": "CPM_PROVENANCE_GENERATION_SCRIPT" }
      ],
    "mentions": [
      { "@id": "#Exec-computation"},
      { "@id": "#Exec-CPM-provgen"}
    ]
 },
 {
   "@id": "INPUT_DATASET_PATH",
   "@type": "File",
   "description": "",
   "name": ""
 },
 {
   "@id": "OUTPUT_DATASET_PATH",
   "@type": "File",
   "description": "",
   "name": ""
 },
 {
   "@id": "COMPUTATION_SCRIPT",
   "@type": ["File","SoftwareSourceCode"],
   "description": "",
   "name": ""
 },
 {
   "@id": "COMPUTATION_LOG_FILE",
   "@type": "File",
   "description": "A log file generated by the computation execution.",
   "name": ""
 },
 {
   "@id": "CPM_PROVENANCE_GENERATION_SCRIPT",
   "@type": ["File","SoftwareSourceCode"],
   "description": "A Python script that translates the computation log files into a CPM-compliant provenance file.",
   "name": ""
 },
 {
   "@id": "CPM_COMPLIANT_PROVENANCE",
   "@type": ["File", "CPMProvenanceFile"],
   "description": "CPM-compliant provenance file generated based on the computation log file.",
   "encodingFormat": [
      "text/provenance-notation",
      { "@id": "http://www.w3.org/TR/2013/REC-prov-n-20130430/"}
   ],
   "name": "",
   "about": [{"@id": "#Exec-computation"}]

 },
 {
   "@id": "#Exec-computation",
   "@type": ["CreateAction"],
   "description": "Computation execution.",
   "name": "",
    "instrument": {
          "@id": "COMPUTATION_SCRIPT"
    },
    "object": [
          {"@id": "INPUT_DATASET_PATH"}
    ],
    "result": [
          {"@id": "OUTPUT_DATASET_PATH"},
          {"@id": "COMPUTATION_LOG_FILE"}
        ]
 },
 {
   "@id": "#Exec-CPM-provgen",
   "@type": ["CreateAction"],
   "description": "CPM compliant provenance generation.",
   "name": "",
   "instrument": {
        "@id": "CPM_PROVENANCE_GENERATION_SCRIPT"
      },
    "object": [
        {"@id": "COMPUTATION_LOG_FILE"}
    ],
    "result": [
          {"@id": "CPM_COMPLIANT_PROVENANCE"}
        ]
 }

]
}
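
A consumer of such a crate might want to locate the CPM provenance files and see which entities they document and which action produced them. The sketch below does this over a trimmed copy of the example graph. It is only an illustration: `index_provenance` and `as_list` are hypothetical helper names, and a real crate would be loaded from `ro-crate-metadata.json` with `json.load` rather than written inline.

```python
import json


def as_list(value):
    """Wrap a JSON-LD value in a list unless it already is one."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def index_provenance(crate):
    """Map each CPMProvenanceFile @id to what it documents and what made it."""
    graph = crate["@graph"]
    index = {}
    for entity in graph:
        if "CPMProvenanceFile" in as_list(entity.get("@type")):
            index[entity["@id"]] = {
                "about": [ref["@id"] for ref in as_list(entity.get("about"))],
                "generated_by": [],
            }
    # A CreateAction whose result is a provenance file generated that file.
    for entity in graph:
        if "CreateAction" in as_list(entity.get("@type")):
            for ref in as_list(entity.get("result")):
                if ref["@id"] in index:
                    index[ref["@id"]]["generated_by"].append(entity["@id"])
    return index


# Trimmed copy of the example graph, using the same placeholder identifiers.
crate = {
    "@graph": [
        {
            "@id": "CPM_COMPLIANT_PROVENANCE",
            "@type": ["File", "CPMProvenanceFile"],
            "about": [{"@id": "#Exec-computation"}],
        },
        {
            "@id": "#Exec-computation",
            "@type": ["CreateAction"],
            "result": [
                {"@id": "OUTPUT_DATASET_PATH"},
                {"@id": "COMPUTATION_LOG_FILE"},
            ],
        },
        {
            "@id": "#Exec-CPM-provgen",
            "@type": ["CreateAction"],
            "object": [{"@id": "COMPUTATION_LOG_FILE"}],
            "result": [{"@id": "CPM_COMPLIANT_PROVENANCE"}],
        },
    ]
}
provenance_index = index_provenance(crate)
print(json.dumps(provenance_index, indent=2))
```

For the example crate, the index links the provenance file to the computation it documents (`#Exec-computation`) and to the action that generated it (`#Exec-CPM-provgen`).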

Notes