There is a newer version of the record available.

Published March 27, 2018 | Version v1
Conference paper Open

CWLProv - Interoperable Retrospective Provenance capture and its challenges

  • 1. The University of Melbourne; Common Workflow Language project
  • 2. The University of Manchester; Common Workflow Language project
  • 3. Common Workflow Language project
  • 4. The University of Melbourne

Description

The automation of data analysis in the form of scientific workflows is now a widely adopted practice in many fields of research. Computationally driven, data-intensive experiments using workflows enable Automation, Scaling, Adaption and Provenance support (ASAP).

However, several challenges remain for the effective sharing, publication, understandability and reproducibility of such workflows, owing to incomplete capture of provenance and dependence on particular technical (software) platforms. This paper presents CWLProv, an approach for retrospective provenance capture that uses open source, community-driven standards and applies and customizes workflow-centric Research Objects (ROs).

The ROs are produced as an output of a workflow enactment defined in the Common Workflow Language (CWL) using the CWL reference implementation and its data structures. The approach aggregates and annotates all the resources involved in the scientific investigation including inputs, outputs, workflow specification, command line tool specifications and input parameter settings. The resources are linked within the RO to enable re-enactment of an analysis without depending on external resources.
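As a rough illustration of the aggregation step described above, the following Python sketch lists each resource with a role, a size and a checksum in a single manifest. The function name `build_manifest`, the role labels and the manifest layout are assumptions made for this example only; they are not the format that CWLProv or the CWL reference implementation writes.

```python
import hashlib
import json


def build_manifest(resources):
    """Return a manifest dict describing every aggregated resource.

    `resources` maps a relative path to a (role, content_bytes) pair.
    The layout is hypothetical, not the CWLProv on-disk format.
    """
    aggregates = []
    # Sort paths so the manifest is deterministic across runs.
    for path in sorted(resources):
        role, content = resources[path]
        aggregates.append({
            "path": path,
            "role": role,
            "sha256": hashlib.sha256(content).hexdigest(),
            "size": len(content),
        })
    return {"aggregates": aggregates}


if __name__ == "__main__":
    # Made-up resources standing in for a workflow, its inputs and outputs.
    example = {
        "workflow/main.cwl": ("workflow specification", b"cwlVersion: v1.0\n"),
        "inputs/job.yml": ("input parameter settings", b"threads: 4\n"),
        "outputs/result.txt": ("output", b"done\n"),
    }
    print(json.dumps(build_manifest(example), indent=2))
```

Recording a checksum next to every path is one way to let a later re-enactment confirm that the bundled resources are the ones the original run used, without consulting anything outside the bundle.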

The workflow provenance profile is represented in the W3C recommended standard PROV-N and PROV-JSON formats to capture retrospective provenance of the workflow enactment. The workflow-centric RO produced as an output of a CWL workflow enactment is expected to be interoperable, reusable, shareable and portable across different platforms.
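To give a sense of what retrospective provenance in PROV-JSON can look like, the sketch below builds a minimal document for one executed step: an activity that used one input entity and generated one output entity. The helper `prov_json_for_step`, the `ex:` prefix, the `http://example.org/` namespace and all identifiers are assumptions for illustration; the provenance profile that CWLProv emits is richer and is not reproduced here.

```python
import json


def prov_json_for_step(step_id, input_id, output_id, started, ended):
    """Build a small PROV-JSON document for one executed step.

    All identifiers are supplied by the caller; the 'ex' prefix is a
    placeholder namespace, not one defined by CWLProv.
    """
    return {
        "prefix": {"ex": "http://example.org/"},
        "entity": {input_id: {}, output_id: {}},
        "activity": {
            step_id: {"prov:startTime": started, "prov:endTime": ended}
        },
        # The step consumed the input ...
        "used": {
            "_:u1": {"prov:activity": step_id, "prov:entity": input_id}
        },
        # ... and produced the output.
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": output_id, "prov:activity": step_id}
        },
    }


if __name__ == "__main__":
    doc = prov_json_for_step(
        "ex:run1", "ex:input.fastq", "ex:output.bam",
        "2018-03-27T10:00:00", "2018-03-27T10:05:00",
    )
    print(json.dumps(doc, indent=2))
```

Because the document is plain JSON, a consumer can follow the `used` and `wasGeneratedBy` relations to trace an output back to the step and inputs that produced it.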

This paper describes the need and motivation for CWLProv and the lessons learned in applying it for ROs using CWL in the bioinformatics domain.

Notes

Preprint submitted to IPAW 2018.

Files

cwltool-5dd64adccb6350a67a127802e1bf212af01c5f00.zip

Files (4.6 MB)

Checksum Size
md5:c6de6baedb712d344dbc94b0bdcbfbe7 3.5 MB
md5:85d382fc0d8e8a2613d579736a7dd4ab 584.8 kB
md5:edee8762b7c0a735350e204c294ffe8e 556.5 kB

Additional details

Funding

BioExcel – Centre of Excellence for Biomolecular Research 675728
European Commission