Enabling the reuse, extension, scaling, and reproducibility of scientific workflows (2018-cwl)

The Marine Metagenomics Community has adopted the Common Workflow Language (CWL) as an interoperable way to describe its analysis pipelines. One of the most complex and fully developed CWL workflows implements the EBI metagenomics analysis pipeline.

In coordination with MG-RAST, a US-based metagenomics analysis pipeline, there are now two different large-scale metagenomics CWL workflows. Each uses a different CWL execution framework (namely Toil and AWE) and is run on a different compute infrastructure. During the course of the coming year, the Marine Use Case expects META-pipe (the ELIXIR-NO marine-specific metagenomics pipeline) and other metagenomics-related tools (e.g. ITS1 analysis from ELIXIR-IT) to adopt CWL. These additional tools can be used as alternatives to pre-existing tools or to extend the functionality of the current workflows.

This Implementation Study aims to:

  1. demonstrate the benefits of using CWL by combining different workflow components to make new workflows;
  2. extend the current CWL workflows to enable greater reuse;
  3. enhance the execution frameworks to improve both deployment and scalability;
  4. deploy a single CWL workflow on different ELIXIR cloud environments to enable parallel processing and reproducibility.

To provide an exemplar to both the ELIXIR and the broader scientific communities, we will work through a community case study and ensure that the data, analysis and results conform to a bona fide Research Object (RO), so that they comply with the FAIR principles. We will develop appropriate training materials for two key target audiences: producers (of workflows and ROs) and consumers.

This study is closely linked with the work of the Bioschemas Community.