WP3 will extend existing infrastructure components to implement the best practices and indicators for research software and workflows developed in WP2.
To ensure adoption of the best practices and aid recognition of the indicators, WP3 will integrate relevant features in community-adopted and recommended registries, and expose them in platforms used for assessing research outputs.
We will further disseminate these practices by developing a Software Management Planning tool with integrated guidelines.
To stimulate reproducible analysis methodologies and raise awareness of the environmental impact of computational data analysis, we will focus on benchmarking common analyses of the ELIXIR Communities. To make optimisations readily available to researchers, we will implement new features in workflow management systems. By leveraging commonly used technologies and platforms, we will contribute to a more sustainable ecosystem for computational analysis.
Objectives
O3.1 | Make software, tools, and workflows first-class citizens in the assessment of researchers | Task 3.1 |
O3.2 | Stimulate the usage of Software Management Plans to support sustainable research software | Task 3.2 |
O3.3 | Benchmark resource usage for computational tasks in life sciences | Task 3.3 |
O3.4 | Implement resource optimisation features in workflow management systems | Task 3.4 |
Tasks
Task 3.1 Enable crediting scientists for research assets
In this task, we will build upon existing platforms and services (e.g., APICURON, ORCID, BIP! Scholar) that empower scientists to be credited for research assets and activities beyond publications.
Our focus will be on delivering new or improved features related to software, tools, and workflows, to expose the credit information in relevant software and workflow registries (e.g. bio.tools, WorkflowHub).
We will achieve this by enabling interoperability between these platforms and services and open Science Knowledge Graphs (e.g. the OpenAIRE Research Graph), using schema.org (Bioschemas) markup.
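As an illustration of the kind of markup involved, the following minimal sketch builds a schema.org JSON-LD record that credits a person, identified by an ORCID iD, as author of a tool. It is not the Bioschemas profile itself nor the interface of any registry named above: the function name `tool_credit_markup`, the tool name, the URL, and the ORCID iD are hypothetical placeholders, and only the generic schema.org types `SoftwareApplication` and `Person` are assumed.

```python
import json


def tool_credit_markup(name, url, contributors):
    """Build schema.org JSON-LD crediting people for a software tool.

    contributors is a list of (display_name, orcid_id) tuples.
    All values passed in this example are hypothetical placeholders.
    """
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "url": url,
        "author": [
            {
                "@type": "Person",
                # The ORCID URL acts as a shared identifier across platforms.
                "@id": f"https://orcid.org/{orcid}",
                "name": person,
            }
            for person, orcid in contributors
        ],
    }


markup = tool_credit_markup(
    "example-aligner",                            # hypothetical tool name
    "https://example.org/tools/example-aligner",  # hypothetical URL
    [("Jane Doe", "0000-0000-0000-0000")],        # placeholder ORCID iD
)
print(json.dumps(markup, indent=2))
```

Because the author is referenced by a persistent identifier rather than a free-text name, a registry and a knowledge graph that both consume such a record could attribute the tool to the same person without manual matching.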
Leadership: Thanasis Vergoulis (ATHENA RC), Damiano Piovesan (University of Padua), Veit Schwämmle (University of Southern Denmark), Lars Juhl Jensen (University of Copenhagen), Olivier Sallou (University of Rennes), Henning Hermjakob (EMBL-EBI), Balazs Gyorffy (HUN-REN Research Centre for Natural Sciences), Carole Goble (University of Manchester), Salvador Capella-Gutierrez and Laura Portell-Silva (BSC).
Task 3.2 Contributing towards sustainable research software through Software Management Plans
We will develop new features in the Data Stewardship Wizard (DSW) to enable the generation of Software Management Plans (SMPs), based on input from WP2.
We will align with T3.1 and build on the work of ELIXIR and RDA to implement the guidelines and best practices for software development in research and infrastructure projects. As part of the process, engagement with relevant industry users of ELIXIR will provide additional insights into both the content and interface of the SMP.
For ML-based software, we will build on the work of the ELIXIR Machine Learning Focus Group, e.g. the DOME recommendations and their integration with bio.tools. We will also create new integration services and resources to support researchers in composing and using an SMP, e.g. to retrieve information from code repositories.
We will disseminate the best practices, integrated in DSW, to both academia and industry, in collaboration with WP5.
Leadership: Marek Suchánek and Jan Slifka (UOCHB), Mark Ibberson and Vassilios Ioannidis (SIB), Frederik Coppens (VIB), Salvador Capella-Gutierrez and Laura Portell-Silva (BSC), Yvonne Kallberg (Stockholm University), Veit Schwämmle (University of Southern Denmark).
Task 3.3 Identify fit-for-purpose reproducible workflows through technical and scientific benchmarking
In this task, we will extend the OpenEBench community-led evaluation feature, enabling the benchmarking of workflows used for common analyses in the life sciences. The workflows will be identified in collaboration with WP2 and the ELIXIR Communities.
The evaluation will be based on community-agreed indicators (defined in T2.2) covering both technical and scientific aspects, with a specific focus on enabling the assessment of the impact and usage of software, e.g. energy consumption and physical resources needed, algorithmic and data effectiveness, as well as the recognition received by its developers and users.
Together with WP5, we will engage with industry, e.g. to explore (energy-)optimised infrastructure for specific computational jobs.
We will also consider relevant practices and techniques from the TIER2 project (HORIZON-WIDERA-2022-ERA-01-41), such as its badging approaches. The results will be exposed as FAIR Digital Objects represented as Workflow-RO-Crates using Bioschemas workflow markup, as developed in the EOSC-Life Cluster project and the ESG project (HORIZON-INFRA-2021-EOSC-01-04). These will be integrated in WorkflowHub for reproducibility and dissemination to end-users, both in academia and industry.
Leadership: Laura Portell-Silva and Salvador Capella-Gutierrez (BSC), Carole Goble (University of Manchester), Thanasis Vergoulis (ATHENA RC), Wei Gu (PNED), Marco Tangaro (CNR), Mark Ibberson and Vassilios Ioannidis (SIB), Wolmar Nyberg Åkerström (Uppsala University), Ana Portugal Melo (biodata.pt), Veit Schwämmle (University of Southern Denmark), Dan Ben-Avraham (Weizmann Institute of Science), Kjell Peterson (University of Bergen), Espen Robertsen (UiT The Arctic University of Norway), Brane Leskošek (University of Ljubljana).
Task 3.4 Integrating optimisation criteria for environmental impact in commonly used workflow management systems
Building upon commonly used technologies and workflow management systems (WMSs, e.g. Galaxy, Nextflow), we will deliver features that make researchers aware of the environmental impact of an analysis and enable them to take it into account.
Our approach will be three-fold:
- We will avoid unnecessary computations in WMSs by implementing a “job cache” that looks at provenance information from previous tool executions to reuse results if the relevant parameters match.
- We will reduce the computation that tools and workflows require by studying their efficiency (T3.3) and optimising them accordingly.
- In cooperation with the ESG project, we will enhance job scheduling in WMSs by adding environmental impact minimisation to the criteria used when assigning jobs to specific compute infrastructures.
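The first and third points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the design of any existing WMS: `job_key`, `JobCache`, `pick_site`, the site dictionaries, and all numeric values are hypothetical. The cache assumes that provenance is reduced to tool identifier, version, parameters, and input checksums; the scheduler assumes that emissions can be estimated as runtime times power draw times grid carbon intensity.

```python
import hashlib
import json


def job_key(tool_id, tool_version, parameters, input_checksums):
    """Derive a deterministic key from the provenance of a tool execution."""
    record = {
        "tool": tool_id,
        "version": tool_version,
        "parameters": parameters,
        "inputs": input_checksums,  # mapping of input name -> checksum
    }
    # sort_keys makes the serialisation independent of dict ordering.
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class JobCache:
    """In-memory stand-in for a provenance store (hypothetical)."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    def run(self, tool_id, tool_version, parameters, input_checksums, execute):
        key = job_key(tool_id, tool_version, parameters, input_checksums)
        if key in self._results:
            self.hits += 1  # matching provenance: reuse the earlier result
            return self._results[key]
        result = execute()  # no match: run the job and remember its result
        self._results[key] = result
        return result


def pick_site(sites, max_runtime_hours):
    """Choose the eligible site with the lowest estimated emissions."""
    eligible = [s for s in sites if s["est_runtime_hours"] <= max_runtime_hours]
    if not eligible:
        raise ValueError("no site satisfies the runtime limit")
    # Estimated gCO2e = runtime (h) * power (kW) * grid intensity (gCO2e/kWh).
    return min(
        eligible,
        key=lambda s: s["est_runtime_hours"] * s["power_kw"] * s["grid_gco2_per_kwh"],
    )


# Usage: the second call differs only in parameter order, so it is a cache hit.
calls = []

def align():
    calls.append(1)
    return "alignment.bam"

cache = JobCache()
inputs = {"reads": "sha256:aaa", "reference": "sha256:bbb"}
first = cache.run("aligner", "1.0", {"threads": 4, "mode": "sensitive"}, inputs, align)
second = cache.run("aligner", "1.0", {"mode": "sensitive", "threads": 4}, inputs, align)

sites = [
    {"name": "A", "est_runtime_hours": 2, "power_kw": 0.5, "grid_gco2_per_kwh": 400},
    {"name": "B", "est_runtime_hours": 3, "power_kw": 0.5, "grid_gco2_per_kwh": 50},
    {"name": "C", "est_runtime_hours": 10, "power_kw": 0.3, "grid_gco2_per_kwh": 20},
]
chosen = pick_site(sites, max_runtime_hours=4)
```

In this sketch every parameter contributes to the cache key. A real implementation would need to decide which parameters are relevant to the result (a thread count, for instance, often is not), and would combine the emissions estimate with the other scheduling criteria rather than use it alone.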
Leadership: Nicola Soranzo (Earlham), Björn Grüning (University of Freiburg), Thanasis Vergoulis (ATHENA RC), Lukas Hejtmanek (UOCHB), Alexander Kanitz (SIB), Juha Törnroos (CSC), Marco Tangaro (CNR), Frederik Coppens (VIB), Anthony Bretaudeau (INRAE), Hedi Peterson (University of Tartu).