ELIXIR CONVERGE WP5: Demonstrator Projects

ELIXIR Nodes face increasing demand from their user communities for support in developing data management plans (DMPs). These communities carry out very diverse types of research in the life sciences, producing even more diverse types of data (e.g. sequences, Biotechnologies) with different needs in terms of analysis and data cycles.

As an infrastructure, ELIXIR has developed different tools and standards that can be used in DMPs across these fields, but its capacity needs to be boosted to jointly address users’ needs in transnational projects. This WP supports the implementation of the necessary processes by testing them on real cases.

The goal of this WP is to assess, on real cases, the capacity of ELIXIR and its national Nodes to help users implement data management plans in their projects at an EU scale. A set of six very diverse demonstrator projects will be addressed, some of which also require the management of sensitive data.

WP5 will implement data management plans, as machine actionable as possible, for two ready-to-go demonstrators using the toolkits and approaches developed in WP3. In a second stage, the method developed on these two demonstrators will be applied to the four remaining ones.

The work performed in WP5 will be supported by training and capacity building in collaboration with WP2. From these experiences, WP5 will propose a mechanism to assign demonstrator projects to different categories based on the type of resources needed to implement their DMPs. This effort will be used as input for WP1 on developing a sustainable and scalable operating model for ELIXIR Nodes supporting their users’ needs on project DMPs. Indicators for the evaluation of the developed DMPs will also be proposed as part of the effort in WP5.

Objectives

O5.1	Develop a categorisation for projects based on resources needed for DM	Task 5.1
O5.2	Support Demonstrator projects with coordinated DMPs across nodes	Task 5.2
O5.3	Define and monitor KPIs for DMPs including their implementation	Task 5.3
O5.4	Inform and deliver capacity building to strengthen Node-Node coordination	Task 5.4

Prioritised demonstrators

1. Harmonised FAIR plant genotype & phenotype data management toolkit for Europe

Data types and challenge: phenotyping data (tabular text, ISA-tab), genotyping data (SNP genotyping matrices in VCF either from genotyping arrays or GBS experiments, SSR type identification markers); includes data with restricted access

Key deposition databases, standards and interoperability resources: ENA, EVA, BioSamples, ISA-toolkit, MIAPPE, BrAPI, CropOntology

2. Reproducible, comparable and FAIR Epitranscriptomics

Data types and challenge: available high-throughput sequencing data sets, e.g. deep RNA-seq of 25k samples; deep RNA-seq (MeRIP, miCLIP, ...) and Ribo-seq of 2.3k samples.

Key deposition databases, standards and interoperability resources: Harmonised/benchmarked workflows, MIAME, BioSamples

3. Common Data management plans for the marine metagenomics Community

Data types and challenge: large metagenomics datasets: whole-metagenome shotgun data, metagenome-assembled genome data, metatranscriptome data, metabarcoding data

Key Deposition databases, standards and interoperability resources: Drive implementation of community standards across ELIXIR Nodes: https://doi.org/10.1093/gigascience/gix047

4. Federated access to human genomics data: GDPR

Data types and challenge: A typical ELIXIR Community member would be a research institute or medical centre that is willing to share its research data, but only under more restrictive conditions than general research reuse. Reusability of sensitive data in particular requires special effort from data users as well as data providers in the context of GDPR and its country-specific implementation, and in the context of the 1M Genomes Declaration.

Key deposition databases, standards and interoperability resources: Local EGA functionalities, Beacon, ELIXIR Beacon Network, Data Access Management tools, GDPR tools, ELIXIR AAI, GA4GH standards

5. FAIR encoding and access to Toxicology data

Data types and challenge: Chemical structures, pharmacological data, toxicological data, data from clinical trials, information on drug side-effects, pharmacovigilance data, etc.

Key Deposition databases, standards and Interoperability resources: ChEMBL, OLS, Transnational access to complex data, terminology mappings (OxO)

6. FAIR organisation of biomolecular simulation information

Data types and challenge: long coordinate files containing the evolution of the 3D positions of the system’s atoms over time; output of a wide variety of analysis tools applied to raw data; metadata describing system setup and simulation parameters. Simulation data is typically stored locally in an undocumented manner, without any external curation, and lacks the associated metadata required for its reusability.

Key Deposition databases, standards and interoperability resources: Some initial attempts in the generation of usable ontologies.

Tasks

Task 5.1 Categorization of projects based on the type of resources needed to implement a DMP

This task will analyse the demonstrators in the light of the process necessary to implement their DMPs and propose a categorization based on users’ needs. This effort will provide input to WP1 on the experts and resources needed across ELIXIR Nodes (Task 1.3, Business plans).

It is expected that some of the categories will be generic whereas others will be very specific to projects, which is an important clue for establishing a strategy towards sustainability of such diverse services.

Leadership: INRAE (Anne-Françoise Adam-Blondon, ELIXIR France)

Participants: BSC (ELIXIR Spain), BSRC (ELIXIR Greece), CNR (ELIXIR Italy), DTU (ELIXIR Netherlands), Heidelberg Institute for Theoretical Studies (ELIXIR Germany), Hungarian Academy of Sciences (ELIXIR Hungary), UCD (ELIXIR Ireland), University of Bergen (ELIXIR Norway), University of Cambridge (ELIXIR UK), University of Cyprus (ELIXIR Cyprus), University of Luxembourg (ELIXIR Luxembourg), University of Manchester (ELIXIR UK), University of Tartu (ELIXIR Estonia), ÚOCHB (ELIXIR Czech Republic), Uppsala University (ELIXIR Sweden), VIB (ELIXIR Belgium), Weizmann Institute of Science (ELIXIR Israel).

Task 5.2 Implementation of pilot projects data management plans

Task 5.2 will start with two ready-to-go pilot projects (numbers 1 and 2 in the list of prioritised demonstrators above), i.e. projects with already well-identified resources and problem statements about their management according to the FAIR data principles.

In collaboration with WP3, a toolkit will be assembled to develop and implement a DMP for each of those projects. These activities will provide feedback to WP3 contributing to the development of processes to enrich and extend the DM toolkit (Task 3.2).

A second wave of four very different projects will then be taken on. This will make it possible to apply and improve the processes, expert network and training approaches developed in ELIXIR-CONVERGE, and to assess how the general process scales up when applied to other communities served by ELIXIR.

Leadership: BSC (Salvador Capella-Gutierrez, ELIXIR Spain)

Participants: CNR (ELIXIR Italy), CNRS (ELIXIR France), Heidelberg Institute for Theoretical Studies (ELIXIR Germany), Hungarian Academy of Sciences (ELIXIR Hungary), IGC (ELIXIR Portugal), INESC-ID (ELIXIR Portugal), INRAE (ELIXIR France), SIB (ELIXIR Switzerland), University of Bergen (ELIXIR Norway), University of Ljubljana (ELIXIR Slovenia), University of Luxembourg (ELIXIR Luxembourg), Uppsala University (ELIXIR Sweden).

Task 5.3 Development, implementation and refinement of key performance indicators to monitor the demonstrator projects’ implementation of data management plans

Task 5.3 will propose, implement and refine key performance indicators (KPIs) to measure and monitor demonstrator projects’ implementation of DMPs. These KPIs will contribute to identifying potential gaps when implementing DMPs. Gaps will be addressed in collaboration with WP1, WP2 and WP3.

The use and adoption of the KPIs will also be evaluated during the project to produce relevant indicators which can be adopted beyond the proposal itself. The defined KPIs will feed into WP1 (Best practice, Operating models) and WP4 (Impact assessment).

Leadership: University of Bergen (Inge Jonassen, ELIXIR Norway)

Participants: BSC (ELIXIR Spain), CNRS (ELIXIR France), Heidelberg Institute for Theoretical Studies (ELIXIR Germany), INRAE (ELIXIR France), University of Luxembourg (ELIXIR Luxembourg).

Task 5.4 Capacity building actions based on demonstrator projects outcomes

Task 5.4 will use the outcomes of the demonstrator projects to propose specific Capacity Building actions to WP2. Despite the specificities of each demonstrator, this task aims to identify common topics, which can be developed and implemented in Capacity Building activities. These actions can be personalised for specific scenarios, e.g. controlled access to sensitive human data.

Based on the Capacity Building activities, and in coordination with WP2, a “train the trainers” programme will be proposed on the selected demonstrator projects, tailored to different user profiles. The aim is then to identify which aspects can be transferred first to other demonstrators and later applied more generally.

Leadership: University of Ljubljana (Brane Leskosek, ELIXIR Slovenia)

Participants: CNRS (ELIXIR France), Heidelberg Institute for Theoretical Studies (ELIXIR Germany), IGC (ELIXIR Portugal), INRAE (ELIXIR France), University of Bergen (ELIXIR Norway), University of Luxembourg (ELIXIR Luxembourg), Weizmann Institute of Science (ELIXIR Israel).

Deliverables

D5.1	Categorization of the pilot projects. Description of the needs in terms of DMPs of the demonstrator projects and development of a project categorisation from that point of view.	January 2021
D5.2	Report on the first two DMP processes. Description of the first two DMPs co-constructed with WP3, including the gap analysis.	July 2021
D5.3	Report on the dedicated training and capacity building activities. Report on the training and capacity building activities co-constructed with WP2.	September 2022
D5.4	Report on KPI. Report on KPIs developed for WP5 based on demonstrator projects.	November 2022
D5.5	Report on the remaining four DMP processes. Description of the remaining DMPs constructed taking into consideration the outcomes of the previous experiences.	January 2023

WP leaders

**Anne-Françoise Adam-Blondon**
INRAE
(ELIXIR France)

**Salvador Capella-Gutierrez**
BSC
(ELIXIR Spain)