
Workflow Run RO-Crate

RO-Crate profiles to capture the provenance of workflow runs

Workflow Run Crate

This profile uses terminology from the RO-Crate 1.1 specification.

Overview

This profile is used to describe the execution of a computational tool that has orchestrated the execution of other tools. Such a tool is represented as a workflow that can be executed using a Workflow Management System (WMS), also known as a workflow engine (e.g., cwltool).

Workflow Run Crate is a combination of Process Run Crate and Workflow RO-Crate. In particular, the RO-Crate MUST have a ComputationalWorkflow mainEntity described according to the Workflow RO-Crate specification (main workflow), and CreateAction instances corresponding to its execution (thus having the main workflow as instrument) MUST be described as specified in Process Run Crate and this profile. Details regarding the execution of individual workflow steps can be described with the Provenance Run Crate profile.
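The following Python sketch illustrates these two MUSTs. It locates the main workflow and the CreateAction instances that executed it. It assumes that the crate has already been parsed into a dict with a flat "@graph" and that entity references use the compact {"@id": ...} form. The names find_workflow_runs and as_list are hypothetical helpers, not part of any RO-Crate library.

```python
from typing import Any

Entity = dict[str, Any]


def as_list(value: Any) -> list[Any]:
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_workflow_runs(crate: Entity) -> tuple[Entity, list[Entity]]:
    """Return the main workflow and the CreateActions that have it as instrument."""
    graph = {e["@id"]: e for e in crate["@graph"]}
    # The root data entity is whatever the metadata descriptor is "about".
    root = graph[graph["ro-crate-metadata.json"]["about"]["@id"]]
    main = graph[root["mainEntity"]["@id"]]
    if "ComputationalWorkflow" not in as_list(main.get("@type")):
        raise ValueError("mainEntity is not a ComputationalWorkflow")
    # Assumes instrument values are {"@id": ...} references, not plain strings.
    runs = [
        e
        for e in crate["@graph"]
        if "CreateAction" in as_list(e.get("@type"))
        and any(ref["@id"] == main["@id"] for ref in as_list(e.get("instrument")))
    ]
    return main, runs
```

Applied to the example metadata file later in this section, this returns Galaxy-Workflow-Hello_World.ga as the main workflow and #wfrun-5a5970ab-4375-444d-9a87-a764a66e3a47 as its only run.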

Workflows can have multiple input and output parameter slots that have to be mapped to actual files, directories or other values (e.g., a string or a number) before they can be executed. It is OPTIONAL to define such entities for a ComputationalWorkflow. If included, parameter definitions MUST be provided as FormalParameter entities and referenced from the ComputationalWorkflow via input and output (see the Bioschemas ComputationalWorkflow profile).
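A consumer can check this rule mechanically. The Python sketch below lists the parameter slot names that a workflow declares and rejects any input or output that is not typed as a FormalParameter. It assumes a parsed flat "@graph" with {"@id": ...} references. The name parameter_slots is hypothetical.

```python
from typing import Any


def as_list(value: Any) -> list[Any]:
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def parameter_slots(crate: dict[str, Any], workflow_id: str) -> dict[str, list[str]]:
    """Map "input" and "output" to the names of the workflow's FormalParameters.

    Parameter definitions are optional, so a workflow without input/output
    yields empty lists. Any referenced entity that is not typed as a
    FormalParameter raises ValueError.
    """
    graph = {e["@id"]: e for e in crate["@graph"]}
    workflow = graph[workflow_id]
    slots: dict[str, list[str]] = {}
    for prop in ("input", "output"):
        names = []
        for ref in as_list(workflow.get(prop)):
            param = graph.get(ref["@id"], {})
            if "FormalParameter" not in as_list(param.get("@type")):
                raise ValueError(f"{prop} {ref['@id']!r} is not a FormalParameter")
            # Fall back to the @id if the (recommended) name is missing.
            names.append(param.get("name", ref["@id"]))
        slots[prop] = names
    return slots
```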

A data entity or PropertyValue that realizes a FormalParameter definition SHOULD refer to it via exampleOfWork. Additionally, if the data entity or PropertyValue is an illustrative example of the parameter, the FormalParameter MAY refer back to it using the reverse property workExample. These links connect the input of a ComputationalWorkflow to the object of a CreateAction, and the output of a ComputationalWorkflow to the result of a CreateAction. An object item that does not match a slot in the workflow’s input interface (e.g., a configuration file read from a predefined path) MUST NOT refer to a FormalParameter of the ComputationalWorkflow via exampleOfWork. A FormalParameter that maps to a PropertyValue SHOULD have as its additionalType a subclass of DataType (e.g., Integer), or PropertyValue in the case of dictionary-like structured types. See CWL parameter mapping for an example. To support reproducibility, the name field of a FormalParameter instance SHOULD match the name of the corresponding workflow parameter slot.
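The linking rules in this paragraph can be sketched as a check over one CreateAction. Every object item that declares exampleOfWork must point at one of the workflow's input parameters, and every result item must point at one of its output parameters. Items without exampleOfWork, such as a configuration file read from a fixed path, are left alone. This is an illustrative Python sketch that assumes a parsed flat "@graph" with {"@id": ...} references. The name check_parameter_links is hypothetical.

```python
from typing import Any


def as_list(value: Any) -> list[Any]:
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_parameter_links(crate: dict[str, Any], action_id: str) -> list[str]:
    """Return human-readable problems with object/result -> FormalParameter links."""
    graph = {e["@id"]: e for e in crate["@graph"]}
    action = graph[action_id]
    workflow = graph[as_list(action["instrument"])[0]["@id"]]
    problems = []
    # object items realize input slots; result items realize output slots.
    for action_prop, wf_prop in (("object", "input"), ("result", "output")):
        allowed = {ref["@id"] for ref in as_list(workflow.get(wf_prop))}
        for ref in as_list(action.get(action_prop)):
            item = graph.get(ref["@id"], {})
            for param in as_list(item.get("exampleOfWork")):
                if param["@id"] not in allowed:
                    problems.append(
                        f"{ref['@id']}: exampleOfWork {param['@id']} is not a workflow {wf_prop}"
                    )
    return problems
```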

Additional properties described in the Bioschemas FormalParameter profile (e.g., defaultValue) MAY be used to provide additional information, but strict conformance is not required. A FormalParameter definition that strictly conforms to the Bioschemas profile SHOULD reference the relevant versioned URL via conformsTo.

The following diagram shows the relationships between provenance-related entities. Note the distinction between prospective provenance (plans for activities, e.g. a workflow) and retrospective provenance (what actually happened, e.g. the execution of a workflow).

Entity-relationship diagram

Example Metadata File (ro-crate-metadata.json)

{ "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "about": {"@id": "./"},
        "conformsTo": [
            {"@id": "https://w3id.org/ro/crate/1.1"},
            {"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"}
        ]
    },
    {
        "@id": "./",
        "@type": "Dataset",
        "conformsTo": [
            {"@id": "https://w3id.org/ro/wfrun/process/0.1"},
            {"@id": "https://w3id.org/ro/wfrun/workflow/0.1"},
            {"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"}
        ],
        "hasPart": [
            {"@id": "Galaxy-Workflow-Hello_World.ga"},
            {"@id": "inputs/abcdef.txt"},
            {"@id": "outputs/Select_first_on_data_1_2.txt"},
            {"@id": "outputs/tac_on_data_360_1.txt"}
        ],
        "license": {"@id": "http://spdx.org/licenses/CC0-1.0"},
        "mainEntity": {"@id": "Galaxy-Workflow-Hello_World.ga"},
        "mentions": {"@id": "#wfrun-5a5970ab-4375-444d-9a87-a764a66e3a47"}
    },
    {   "@id": "https://w3id.org/ro/wfrun/process/0.1",
        "@type": "CreativeWork",
        "name": "Process Run Crate",
        "version": "0.1"
    },
    {   "@id": "https://w3id.org/ro/wfrun/workflow/0.1",
        "@type": "CreativeWork",
        "name": "Workflow Run Crate",
        "version": "0.1"
    },
    {   "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0",
        "@type": "CreativeWork",
        "name": "Workflow RO-Crate",
        "version": "1.0"
    },
    {
        "@id": "Galaxy-Workflow-Hello_World.ga",
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        "name": "Hello World (Galaxy Workflow)",
        "author": {"@id": "https://orcid.org/0000-0001-9842-9718"},
        "creator": {"@id": "https://orcid.org/0000-0001-9842-9718"},
        "programmingLanguage": {"@id": "https://w3id.org/workflowhub/workflow-ro-crate#galaxy"},
        "input": [
            {"@id": "#simple_input"},
            {"@id": "#verbose-param"}
        ],
        "output": [
            {"@id": "#reversed"},
            {"@id": "#last_lines"}
        ]
    },
    {
        "@id": "#simple_input",
        "@type": "FormalParameter",
        "additionalType": "File",
        "conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/1.0-RELEASE"},
        "description": "A simple set of lines in a text file",
        "encodingFormat": [
            "text/plain",
            {"@id": "http://edamontology.org/format_2330"}
        ],
        "workExample": {"@id": "inputs/abcdef.txt"},
        "name": "simple_input",
        "valueRequired": "True"
    },
    {
        "@id": "#verbose-param",
        "@type": "FormalParameter",
        "additionalType": "Boolean",
        "conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/1.0-RELEASE"},
        "description": "Increase logging output",
        "workExample": {"@id": "#verbose-pv"},
        "name": "verbose",
        "valueRequired": "False"
    },
    {
        "@id": "#reversed",
        "@type": "FormalParameter",
        "additionalType": "File",
        "conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/1.0-RELEASE"},
        "description": "All the lines, reversed",
        "encodingFormat": [
            "text/plain",
            {"@id": "http://edamontology.org/format_2330"}
        ],
        "name": "reversed",
        "workExample": {"@id": "outputs/tac_on_data_360_1.txt"}
    },
    {
        "@id": "#last_lines",
        "@type": "FormalParameter",
        "additionalType": "File",
        "conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/1.0-RELEASE"},
        "description": "The last lines of workflow input are the first lines of the reversed input",
        "encodingFormat": [
            "text/plain",
            {"@id": "http://edamontology.org/format_2330"}
        ],
        "name": "last_lines",
        "workExample": {"@id": "outputs/Select_first_on_data_1_2.txt"}
    },
    {
        "@id": "https://orcid.org/0000-0001-9842-9718",
        "@type": "Person",
        "name": "Stian Soiland-Reyes"
    },
    {
        "@id": "https://w3id.org/workflowhub/workflow-ro-crate#galaxy",
        "@type": "ComputerLanguage",
        "identifier": "https://galaxyproject.org/",
        "name": "Galaxy",
        "url": "https://galaxyproject.org/"
    },
    {
        "@id": "#wfrun-5a5970ab-4375-444d-9a87-a764a66e3a47",
        "@type": "CreateAction",
        "name": "Galaxy workflow run 5a5970ab-4375-444d-9a87-a764a66e3a47",
        "endTime": "2018-09-19T17:01:07+10:00",
        "instrument": {"@id": "Galaxy-Workflow-Hello_World.ga"},
        "subjectOf": {"@id": "https://usegalaxy.eu/u/5dbf7f05329e49c98b31243b5f35045c/p/invocation-report-a3a1d27edb703e5c"},
        "object": [
            {"@id": "inputs/abcdef.txt"},
            {"@id": "#verbose-pv"}
        ],
        "result": [
            {"@id": "outputs/Select_first_on_data_1_2.txt"},
            {"@id": "outputs/tac_on_data_360_1.txt"}
        ]
    },
    {
        "@id": "inputs/abcdef.txt",
        "@type": "File",
        "description": "Example input, a simple text file",
        "encodingFormat": "text/plain",
        "exampleOfWork": {"@id": "#simple_input"}
    },
    {
        "@id": "#verbose-pv",
        "@type": "PropertyValue",
        "exampleOfWork": {"@id": "#verbose-param"},
        "name": "verbose",
        "value": "True"
    },
    {
        "@id": "outputs/Select_first_on_data_1_2.txt",
        "@type": "File",
        "name": "Select_first_on_data_1_2 (output)",
        "description": "Example output of the last (aka first of reversed) lines",
        "encodingFormat": "text/plain",
        "exampleOfWork": {"@id": "#last_lines"}
    },
    {
        "@id": "outputs/tac_on_data_360_1.txt",
        "@type": "File",
        "name": "tac_on_data_360_1 (output)",
        "description": "Example output of the reversed lines",
        "encodingFormat": "text/plain",
        "exampleOfWork": {"@id": "#reversed"}
    },
    {
        "@id": "https://usegalaxy.eu/u/5dbf7f05329e49c98b31243b5f35045c/p/invocation-report-a3a1d27edb703e5c",
        "@type": "CreativeWork",
        "encodingFormat": "text/html",
        "datePublished": "2021-11-18T02:02:00Z",
        "name": "Workflow Execution Summary of Hello World"
    }
]
}
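Following exampleOfWork from an action's object and result items is enough to recover which file or value a given run bound to each parameter slot. The Python sketch below does this for a parsed crate such as the example above, assuming compact {"@id": ...} references. PropertyValue items contribute their value, and data entities contribute their @id. The name run_bindings is hypothetical.

```python
from typing import Any


def as_list(value: Any) -> list[Any]:
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def run_bindings(crate: dict[str, Any], action_id: str) -> dict[str, Any]:
    """Map each FormalParameter name to what the given action bound to it."""
    graph = {e["@id"]: e for e in crate["@graph"]}
    action = graph[action_id]
    bindings: dict[str, Any] = {}
    for ref in as_list(action.get("object")) + as_list(action.get("result")):
        item = graph[ref["@id"]]
        # Items without exampleOfWork (e.g. fixed-path config files) are skipped.
        for param_ref in as_list(item.get("exampleOfWork")):
            param = graph[param_ref["@id"]]
            if "PropertyValue" in as_list(item.get("@type")):
                bindings[param["name"]] = item["value"]
            else:
                bindings[param["name"]] = item["@id"]
    return bindings
```

Applied to the example, this maps simple_input to inputs/abcdef.txt, verbose to "True", last_lines to outputs/Select_first_on_data_1_2.txt and reversed to outputs/tac_on_data_360_1.txt. Note that the verbose value is recorded as the string "True". A consumer would need to interpret it according to the parameter's additionalType (Boolean).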

Adding engine-specific traces

Some engines are able to generate contextual information about workflow runs in the form of logs, reports, etc. These are not workflow outputs, but rather additional files automatically generated by the engine, either by default or when activated via a configuration parameter or command line flag. It is RECOMMENDED to add any such files to the RO-Crate; the corresponding entities SHOULD refer to the relevant Action instance via about:

{
    "@id": "#action-1",
    "@type": "CreateAction",
    ...
},
{
    "@id": "trace-20230120-40360336.txt",
    "@type": "File",
    "name": "Nextflow trace for action-1",
    "conformsTo": {"@id": "https://www.nextflow.io/docs/latest/tracing.html#trace-report"},
    "encodingFormat": "text/tab-separated-values",
    "about": {"@id": "#action-1"}
},
{
    "@id": "https://www.nextflow.io/docs/latest/tracing.html#trace-report",
    "@type": "CreativeWork",
    "name": "Nextflow trace report TSV profile"
}
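A tool that packages a run could attach such a trace with a few graph edits. The Python sketch below illustrates this under stated assumptions and is not part of any RO-Crate library. The hypothetical helper add_engine_trace performs three steps:

- It appends the File entity, with about pointing at the action.
- It links the file from the root data entity's hasPart, because RO-Crate 1.1 requires data entities to be reachable from the root via hasPart.
- It adds the profile CreativeWork, but only if it is not already present.

```python
from typing import Any


def add_engine_trace(
    crate: dict[str, Any],
    path: str,
    action_id: str,
    name: str,
    profile_id: str,
    profile_name: str,
    encoding_format: str,
) -> None:
    """Describe an engine-generated trace file and link it to the action it documents."""
    graph = crate["@graph"]
    by_id = {e["@id"]: e for e in graph}
    if action_id not in by_id:
        raise KeyError(f"unknown action {action_id!r}")
    graph.append({
        "@id": path,
        "@type": "File",
        "name": name,
        "conformsTo": {"@id": profile_id},
        "encodingFormat": encoding_format,
        "about": {"@id": action_id},
    })
    # Make the new data entity reachable from the root data entity.
    root = by_id[by_id["ro-crate-metadata.json"]["about"]["@id"]]
    parts = root.get("hasPart", [])
    root["hasPart"] = (parts if isinstance(parts, list) else [parts]) + [{"@id": path}]
    # Describe the trace format once, however many traces point at it.
    if profile_id not in by_id:
        graph.append({"@id": profile_id, "@type": "CreativeWork", "name": profile_name})
```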

Environment variables as formal parameters

The Process Run Crate profile specifies how to use the environment property to represent environment variable settings that affected the execution of a particular action. A workflow, in turn, MAY indicate that it is affected by a certain environment variable by using the same environment property and having it point to a FormalParameter whose name is equal to the variable’s name. If an action corresponding to an execution of the workflow sets that variable, the PropertyValue SHOULD point to the FormalParameter via exampleOfWork:

{
    "@id": "run_blast.cwl",
    "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
    ...
    "environment": [
        {"@id": "run_blast.cwl#batch_size"}
    ]
},
{
    "@id": "run_blast.cwl#batch_size",
    "@type": "FormalParameter",
    "additionalType": "Integer",
    "name": "BATCH_SIZE"
},
{
    "@id": "#cb04c897-eb92-4c53-8a38-bcc1a16fd650",
    "@type": "CreateAction",
    "instrument": {"@id": "run_blast.cwl"},
    ...
    "environment": [
        {"@id": "#batch_size-pv"}
    ]
},
{
    "@id": "#batch_size-pv",
    "@type": "PropertyValue",
    "exampleOfWork": {"@id": "run_blast.cwl#batch_size"},
    "name": "BATCH_SIZE",
    "value": "100"
}
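The Python sketch below checks this convention for one action. Each PropertyValue in the action's environment that declares exampleOfWork should point at a FormalParameter listed in the workflow's own environment, and that parameter's name should equal the variable name. The sketch assumes a parsed flat "@graph" with {"@id": ...} references. The name check_environment is hypothetical.

```python
from typing import Any


def as_list(value: Any) -> list[Any]:
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_environment(crate: dict[str, Any], action_id: str) -> list[str]:
    """Return problems linking an action's environment settings to workflow parameters."""
    graph = {e["@id"]: e for e in crate["@graph"]}
    action = graph[action_id]
    workflow = graph[as_list(action["instrument"])[0]["@id"]]
    # FormalParameters the workflow declares as environment variables.
    declared = {ref["@id"] for ref in as_list(workflow.get("environment"))}
    problems = []
    for ref in as_list(action.get("environment")):
        setting = graph[ref["@id"]]
        for param_ref in as_list(setting.get("exampleOfWork")):
            if param_ref["@id"] not in declared:
                problems.append(
                    f"{setting['name']}: {param_ref['@id']} is not in the workflow's environment"
                )
            elif graph[param_ref["@id"]].get("name") != setting["name"]:
                problems.append(
                    f"{setting['name']}: FormalParameter name does not match the variable name"
                )
    return problems
```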

Requirements

This profile inherits the requirements of Process Run Crate and Workflow RO-Crate. Additional specifications are listed below.

Property | Required? | Description
Dataset (the root data entity, e.g. "@id": "./")
conformsTo | MUST | Array MUST reference a CreativeWork entity with an @id URI that is consistent with the versioned Permalink of this document, and SHOULD also reference versioned permalinks for Process Run Crate and Workflow RO-Crate.
PropertyValue or data entity that realizes a FormalParameter
exampleOfWork | SHOULD | Identifier of the FormalParameter instance realized by this entity.
FormalParameter
name | SHOULD | SHOULD match the name of the corresponding workflow parameter slot, e.g. n_lines
description | MAY | A description of the parameter's purpose, e.g. Number of lines
workExample | MAY | Identifier of the data entity or PropertyValue instance that realizes this parameter. The data entity or PropertyValue instance SHOULD refer to this parameter via exampleOfWork.
additionalType | MUST | SHOULD include: File, Dataset or Collection if it maps to a file, directory or multi-file dataset, respectively; PropertyValue if it maps to a dictionary-like structured value (e.g. a CWL record); DataType or one of its subtypes (e.g. Integer) if it maps to a non-structured value.
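Some of the requirements for the root data entity and FormalParameter can be checked mechanically. This Python sketch assumes that the versioned permalinks share the prefixes used in the example metadata file: https://w3id.org/ro/wfrun/workflow/, https://w3id.org/ro/wfrun/process/ and https://w3id.org/workflowhub/workflow-ro-crate/. It matches on these prefixes only and does not verify the version segment. The name check_requirements is hypothetical. The function reports MUST violations as errors and SHOULD violations as warnings.

```python
from typing import Any

# Assumed permalink prefixes, taken from the example metadata file.
WORKFLOW_RUN = "https://w3id.org/ro/wfrun/workflow/"
PROCESS_RUN = "https://w3id.org/ro/wfrun/process/"
WORKFLOW_RO_CRATE = "https://w3id.org/workflowhub/workflow-ro-crate/"


def as_list(value: Any) -> list[Any]:
    """Normalize a JSON-LD property value (single item or array) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_requirements(crate: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a subset of this profile's requirements."""
    graph = {e["@id"]: e for e in crate["@graph"]}
    root = graph[graph["ro-crate-metadata.json"]["about"]["@id"]]
    errors, warnings = [], []
    conforms = root.get("conformsTo")
    if not isinstance(conforms, list):
        errors.append("root conformsTo MUST be an array")
    ids = [ref["@id"] for ref in as_list(conforms)]
    if not any(i.startswith(WORKFLOW_RUN) for i in ids):
        errors.append("root conformsTo MUST reference Workflow Run Crate")
    for prefix, label in ((PROCESS_RUN, "Process Run Crate"),
                          (WORKFLOW_RO_CRATE, "Workflow RO-Crate")):
        if not any(i.startswith(prefix) for i in ids):
            warnings.append(f"root conformsTo SHOULD reference {label}")
    for entity in graph.values():
        if "FormalParameter" in as_list(entity.get("@type")):
            if "additionalType" not in entity:
                errors.append(f"{entity['@id']}: additionalType is required")
            if "name" not in entity:
                warnings.append(f"{entity['@id']}: name SHOULD be set")
    return errors, warnings
```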