ELIXIR-CONVERGE WP7: Federated European Genome-phenome Archives for transnational access of COVID-19 host data

This work package will collaborate with the ELIXIR expert data managers network (WP1) to collate the national COVID-19 data management plans (DMPs) and metadata requirements. The results will feed into the impact assessment and long-term sustainability strategy (WP4) to demonstrate the added value of pan-European collaboration on COVID-19 data management.

The WP will also build on expertise within ELIXIR-CONVERGE (WP1, WP3 and WP4) to facilitate the development of the Federated EGA node maturity model, by examining how this model aligns with national or regional plans for COVID-19 host data sharing initiatives.


O7.1 Federation architecture, interfaces, and compliance tests for the Federated EGA network (Task 7.1).
O7.2 Coordination of the development of a reference implementation sufficient to create a functional Federated EGA node (Task 7.2).
O7.3 Development of a phenotype metadata model to enable mapping and linkage of COVID-19 host clinical measures across European national nodes, and robust linkage between host and viral datasets (Task 7.3).
O7.4 Development of documentation and guidelines for the operational practices of Federated EGA nodes (Task 7.4).


Task 7.1 Architecture, interfaces, and compliance to support EGA federated network on COVID-19 host data management

This task will define the architecture and the interfaces required for an interoperable, federated EGA network that supports COVID-19 data management processes. This will include establishing communication with the European COVID-19 platform to enable data sharing.

The work will start by collecting the requirements and use cases specific to COVID-19 data management, and by defining the requirements for data management processes coordinated with the European COVID-19 data platform. More specifically, this task will focus on metadata exchange, user identification, and authorised data dissemination processes. It will define the interfaces that enable shared management of permissions, data access, researcher identity, phenotype submission, and data exchange between the nodes.
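
To make the scope of these interfaces more concrete, the following is a minimal Python sketch of what a node-level interface covering metadata exchange, permission checks, and authorised data access might look like. All names (FederatedNode, DatasetMetadata, InMemoryNode, discover) are hypothetical illustrations, not part of any EGA, GA4GH, or ELIXIR specification, and the in-memory node exists only to show that the interface can be satisfied.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DatasetMetadata:
    """Minimal, hypothetical metadata record exchanged between nodes."""
    dataset_id: str
    node_id: str
    title: str


class FederatedNode(Protocol):
    """Hypothetical node interface covering the areas named in Task 7.1."""

    def list_metadata(self) -> list[DatasetMetadata]:
        """Metadata exchange: publish dataset descriptions to the network."""
        ...

    def has_permission(self, researcher_id: str, dataset_id: str) -> bool:
        """Shared permission management keyed on a researcher identity."""
        ...

    def request_access(self, researcher_id: str, dataset_id: str) -> str:
        """Authorised data dissemination: return a handle to the data."""
        ...


class InMemoryNode:
    """Toy node that satisfies FederatedNode structurally (illustration only)."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self._datasets: dict[str, DatasetMetadata] = {}
        self._grants: set[tuple[str, str]] = set()

    def add_dataset(self, dataset_id: str, title: str) -> None:
        self._datasets[dataset_id] = DatasetMetadata(dataset_id, self.node_id, title)

    def grant(self, researcher_id: str, dataset_id: str) -> None:
        self._grants.add((researcher_id, dataset_id))

    def list_metadata(self) -> list[DatasetMetadata]:
        return sorted(self._datasets.values(), key=lambda d: d.dataset_id)

    def has_permission(self, researcher_id: str, dataset_id: str) -> bool:
        return (researcher_id, dataset_id) in self._grants

    def request_access(self, researcher_id: str, dataset_id: str) -> str:
        # Data is only released when a permission has been recorded.
        if not self.has_permission(researcher_id, dataset_id):
            raise PermissionError(f"{researcher_id} may not access {dataset_id}")
        return f"{self.node_id}/{dataset_id}"


def discover(nodes: list[FederatedNode]) -> list[DatasetMetadata]:
    """Aggregate metadata from all nodes, as a central platform might."""
    return [record for node in nodes for record in node.list_metadata()]
```

Expressing the contract as a structural Protocol means each partner can keep its own deployment, provided it exposes the same operations.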

Annotations reflecting data use conditions and GDPR-relevant specifications will be taken into account, as they are essential for filtering the appropriate subsets of data in response to access requests. The task will also design all interfaces necessary to manage communication and data transfer from the Federated EGA nodes to the European COVID-19 platform. The technical interfaces and their priorities will be coordinated with the ELIXIR Federated Human Data WP2 to exploit potential synergies and to avoid duplication of work across related projects.
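
The filtering step could work roughly as follows. This sketch assumes that each dataset carries permitted-use labels, loosely modelled on GA4GH Data Use Ontology categories such as general research use (GRU) and health/medical/biomedical research (HMB), plus a single hypothetical GDPR-related flag stating whether the data may leave the hosting node. The BROADER_USES table and all other names are illustrative assumptions; real consent matching is considerably richer.

```python
from dataclasses import dataclass

# Hypothetical rule: a dataset consented for a broader use also accepts
# requests with a narrower purpose (e.g. GRU data may serve an HMB request).
BROADER_USES: dict[str, set[str]] = {
    "HMB": {"GRU"},
    "DS:COVID-19": {"HMB", "GRU"},
}


@dataclass(frozen=True)
class Dataset:
    dataset_id: str
    permitted_uses: frozenset[str]  # hypothetical data-use labels
    transfer_allowed: bool          # hypothetical GDPR flag: may data leave the node?


@dataclass(frozen=True)
class AccessRequest:
    researcher_id: str
    purpose: str          # one label from the same vocabulary
    needs_transfer: bool  # True if a copy outside the hosting node is requested


def eligible(datasets: list[Dataset], request: AccessRequest) -> list[str]:
    """Return IDs of datasets whose annotations are compatible with the request."""
    acceptable = {request.purpose} | BROADER_USES.get(request.purpose, set())
    return [
        d.dataset_id
        for d in datasets
        # The consent labels must cover the stated purpose ...
        if d.permitted_uses & acceptable
        # ... and the transfer restriction must not be violated.
        and (d.transfer_allowed or not request.needs_transfer)
    ]
```

For a COVID-19-specific request that needs no transfer, datasets labelled GRU or HMB both qualify; if the same request asks for an off-node copy, datasets flagged as non-transferable drop out.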

The work will be aligned with emerging international standards (where possible) to promote interoperability between the Federated EGA nodes and similar international initiatives. For example, we will align:

  • data discovery with the ELIXIR Beacon Network and the GA4GH Discovery Work Stream;
  • data dissemination with the GA4GH Large Scale Genomics Work Stream; and
  • access and authorisation with ELIXIR AAI, which is aligned with the standards of the GA4GH Data Use and Researcher Identity (DURI) Work Stream.

The overall goal is to achieve a unified user experience and ensure security across the ecosystem, while maintaining the flexibility to accommodate local requirements. An initial review of federation use cases and requirements will be carried out to ensure that the essential requirements for a functional network are fulfilled.


Participants: CSC (ELIXIR Finland), Uppsala University (ELIXIR Sweden), CRG (ELIXIR Spain), FCG-IGC and INESC-ID (ELIXIR Portugal), BSC (ELIXIR Spain).

Task 7.2 Technical implementation of a federated EGA network

This task will develop the technical capability that allows federated EGA nodes to interoperate, based on the interfaces defined in Task 7.1.

For nodes with less mature infrastructure, we will provide a packaged implementation, based on the existing Local EGA technology, that allows rapid deployment. This solution will enable the core functionalities of a node: data submission, secure archiving, metadata storage, permissions management, and secure data distribution. We will continue to develop this technology based on the use cases identified in Task 7.1.

All EGA node services originate from the same codebase, but infrastructural differences mean that the actual service deployments may differ. This task will therefore coordinate the development work of each partner and ensure its compatibility with the shared interfaces. The task will also produce a reference implementation of these interfaces, giving new nodes a fast route to building their technical capability. All software developed will be open source and shared publicly. We will also provide the necessary documentation and other training material for Task 7.5.

This task will provide:

  • A deployable solution using modern containerisation technologies;
  • A definition of internal interfaces for the pluggable software modules, and a number of implemented modules;
  • The relevant documentation;
  • Coordination of the development and maintenance of the interfaces defined in Task 7.1;
  • Establishment, administration, and maintenance of a shared code repository;
  • Support to Task 7.5 in producing the documentation describing the software and its deployment processes, to assist capacity building in other nodes.
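
As an illustration of what internal interfaces for pluggable software modules could mean in practice, the sketch below defines a hypothetical storage-backend interface and a name-based registry, so that a node selects its backend through configuration rather than code changes. StorageBackend, register, load_backend, and the storage_backend configuration key are assumptions made for this example, not names taken from the Local EGA codebase.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical internal interface for an archive storage module."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


# Maps a configuration name to the class implementing that backend.
_REGISTRY: dict[str, type[StorageBackend]] = {}


def register(name: str):
    """Class decorator that makes a backend selectable by name in configuration."""
    def decorator(cls: type[StorageBackend]) -> type[StorageBackend]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("memory")
class MemoryBackend(StorageBackend):
    """Illustrative backend keeping objects in a dictionary."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def load_backend(config: dict[str, str]) -> StorageBackend:
    """Instantiate the backend named in a (hypothetical) node configuration."""
    name = config["storage_backend"]
    if name not in _REGISTRY:
        raise ValueError(f"unknown storage backend: {name!r}")
    return _REGISTRY[name]()
```

A node archiving to, for example, an object store would register its own subclass under a different name; the rest of the pipeline depends only on put and get.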

Leadership: CRG (ELIXIR Spain)

Participants: EMBL-EBI, CSC (ELIXIR Finland), Uppsala University (ELIXIR Sweden), BSC (ELIXIR Spain), University of Oslo (ELIXIR Norway), FCG-IGC and INESC-ID (ELIXIR Portugal), DKFZ/GHGA (ELIXIR Germany), University of Tübingen/GHGA (ELIXIR Germany), EMBL/GHGA (ELIXIR Germany), University of Pécs (ELIXIR Hungary), SIB (ELIXIR Switzerland).

Task 7.3 Coordination of metadata standards for phenotype submission and access of COVID-19 data

The wide variety of phenotypic manifestations in COVID-19 patients poses a challenge to efforts to understand the etiology of the disease. In the absence of a universal vaccine, we need to identify which genetic and environmental factors are directly linked to disease severity. This will allow us to build predictive models to inform pandemic management strategies.

The clinical genetics community across Europe is rapidly collecting metadata on disease severity, comorbidity, gender and age profiles, ancestry, lifestyle risk factors, and hospitalisation information from COVID-19 patients. This information is collected at a local level using a variety of collection methods, ontological terms, and coding schemes. The challenge is to standardise and harmonise disease metadata across the Federated EGA network to enable downstream genetic association and host risk factor identification studies.

To address this challenge, we will construct a common minimal metadata model that maps across COVID-19 studies in the Federated EGA network. The model will facilitate data flow among interested groups, most importantly from hospital systems (e.g. HL7 FHIR) to research settings (e.g. using the Human Phenotype Ontology). We will base the model on emerging COVID-19-specific data dictionaries from initiatives such as the COVID-19 Host Genetics Initiative and the World Health Organization (WHO) case report form. This will allow COVID-19 studies in the Federated EGA to be co-analysed with other large cohorts.
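
A minimal sketch of the mapping idea, assuming two hypothetical sites that record sex and disease severity with different local codes. CommonRecord, SITE_MAPPINGS, and the severity categories are invented for illustration; the actual common model would be derived from the data dictionaries named above and would use ontology terms rather than plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical common minimal model; field names are illustrative."""
    sample_id: str
    age: int
    sex: str       # "female", "male" or "unknown"
    severity: str  # "mild", "moderate" or "severe"


# Hypothetical per-site code tables translating local values to the common vocabulary.
SITE_MAPPINGS: dict[str, dict[str, dict]] = {
    "site_a": {
        "sex": {"F": "female", "M": "male"},
        "severity": {1: "mild", 2: "moderate", 3: "severe"},
    },
    "site_b": {
        "sex": {"female": "female", "male": "male"},
        "severity": {"outpatient": "mild", "ward": "moderate", "icu": "severe"},
    },
}


def to_common(site: str, local: dict) -> CommonRecord:
    """Translate one locally coded record into the common model."""
    mapping = SITE_MAPPINGS[site]
    code = local["severity"]
    if code not in mapping["severity"]:
        # Unmapped clinical codes are rejected rather than silently guessed.
        raise ValueError(f"{site}: unmapped severity code {code!r}")
    return CommonRecord(
        sample_id=local["id"],
        age=int(local["age"]),
        # Missing or unrecognised sex codes become "unknown".
        sex=mapping["sex"].get(local.get("sex"), "unknown"),
        severity=mapping["severity"][code],
    )
```

Keeping the per-site tables separate from the common model means a new cohort joins by supplying a mapping table, without changing the shared model or downstream queries.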

We will work with the VODAN GO FAIR Implementation Network (ELIXIR Netherlands) to align metadata standards with actual data collection initiatives. The metadata standard will also be communicated across the ELIXIR nodes via our Data Management Network (WP1), and to other relevant international projects through the ELIXIR Federated Human Data project.

Leadership: CSC (ELIXIR Finland)

Participants: EMBL-EBI, CRG (ELIXIR Spain), Uppsala University (ELIXIR Sweden), BSC (ELIXIR Spain), FCG-IGC and INESC-ID (ELIXIR Portugal), University of Pécs (ELIXIR Hungary), DTL (ELIXIR Netherlands), SIB (ELIXIR Switzerland)

Task 7.4 Operational support and maturity model for Federated EGA nodes

This task provides the roadmap for national nodes to engage with, and become operational in, the EGA federated network. We will develop a maturity model that allows us to coordinate the network, validate each node against its responsibilities, ensure the use of agreed international standards, and, most importantly, provide peer support, workshops, training, and documentation at each step of becoming a fully operational EGA node.
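
One way to make a maturity model checkable is to express each level as a set of required capabilities, placing a node at the highest level whose requirements, and those of all lower levels, are met. The level names and capability labels below are hypothetical placeholders, not the levels this task will define.

```python
from typing import Optional

# Hypothetical levels, ordered from least to most mature; each lists the
# capabilities a node must demonstrate to reach that level.
MATURITY_LEVELS: list[tuple[str, set[str]]] = [
    ("1-initial", {"data_submission", "secure_archive"}),
    ("2-connected", {"metadata_exchange", "aai_login"}),
    ("3-operational", {"permissions_management", "data_distribution"}),
    ("4-federated", {"compliance_tests_passed", "incident_reporting_sop"}),
]


def assess(capabilities: set[str]) -> tuple[Optional[str], set[str]]:
    """Return the highest level reached and the capabilities missing for the next one.

    Levels are cumulative: a node cannot skip a level, so the first level
    with missing capabilities ends the assessment.
    """
    reached: Optional[str] = None
    for level, required in MATURITY_LEVELS:
        missing = required - capabilities
        if missing:
            return reached, missing
        reached = level
    return reached, set()
```

Returning the missing capabilities alongside the level gives a node a concrete list of next steps, which is where peer support and training would be targeted.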

These activities will be coordinated through Task 7.5 within the ELIXIR-CONVERGE project. We will also actively engage, through the ELIXIR Federated Human Data project, with other projects that have similar requirements. Importantly, each node is responsible for its own national funding, infrastructure, security, and operational and legal framework.

The objectives of this task are to:

  • Publish a maturity model that supports expanding the EGA network interoperability across Europe to support COVID-19 host data management;
  • Develop, coordinate, and organise operational agreements, standard operating procedures, and best practices (for example on security incident reporting; aligned with the GA4GH Data Security work stream recommendations);
  • Develop a suite of compliance tests to verify that federated nodes correctly deploy the EGA federated interfaces implemented as part of Task 7.2;
  • Coordinate relevant training material for new staff members within the nodes, together with Task 7.5;
  • Coordinate workshops for interested nodes on how to become an operational partner of the federated EGA services;
  • Define and monitor relevant KPIs for successful service delivery across the EGA network, coordinate actions with data access committees, and work with the European COVID-19 platform. This work will be done in close collaboration with the ELIXIR Beacon project and the ELIXIR Compute platform.
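
The compliance-test objective could take the shape of a small suite of named checks run against a node, each reporting pass or fail. This is a sketch under the assumption that a node exposes list_metadata and request_access operations; those method names, the required metadata fields, and the two checks shown are hypothetical.

```python
from typing import Callable, Protocol


class NodeUnderTest(Protocol):
    def list_metadata(self) -> list[dict]: ...
    def request_access(self, researcher_id: str, dataset_id: str) -> str: ...


class ComplianceFailure(Exception):
    """Raised by a check when a node does not behave as the interface requires."""


REQUIRED_METADATA_FIELDS = {"dataset_id", "title"}  # hypothetical minimum


def check_metadata_fields(node: NodeUnderTest) -> None:
    for record in node.list_metadata():
        missing = REQUIRED_METADATA_FIELDS - set(record)
        if missing:
            raise ComplianceFailure(f"record missing {sorted(missing)}")


def check_unauthorised_access_refused(node: NodeUnderTest) -> None:
    try:
        node.request_access("unknown-researcher", "any-dataset")
    except PermissionError:
        return  # expected: access must be refused
    raise ComplianceFailure("access granted to an unauthorised researcher")


COMPLIANCE_SUITE: list[Callable[[NodeUnderTest], None]] = [
    check_metadata_fields,
    check_unauthorised_access_refused,
]


def run_suite(node: NodeUnderTest) -> dict[str, str]:
    """Run every check and report PASS, or FAIL with the reason."""
    report: dict[str, str] = {}
    for check in COMPLIANCE_SUITE:
        try:
            check(node)
            report[check.__name__] = "PASS"
        except ComplianceFailure as exc:
            report[check.__name__] = f"FAIL: {exc}"
    return report
```

Because each check is a plain function in a list, new interface requirements agreed in Task 7.1 can be added to the suite without changing how nodes run it.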

Leadership: BSC (ELIXIR Spain)

Participants: EMBL-EBI, ELIXIR Hub, CRG (ELIXIR Spain), CSC (ELIXIR Finland), Uppsala University (ELIXIR Sweden), University of Oslo (ELIXIR Norway), BSRC (ELIXIR Greece), University of Luxembourg (ELIXIR Luxembourg), University of Tübingen/GHGA (ELIXIR Germany), EMBL/GHGA (ELIXIR Germany), University of Pécs (ELIXIR Hungary), University of Ljubljana (ELIXIR Slovenia), SIB (ELIXIR Switzerland), FCG-IGC and INESC-ID (ELIXIR Portugal)


D7.1 Report of use-cases and architecture for the EGA federation
A description of the target use-cases that will be fulfilled by Federated EGA and how these are met by the proposed network architecture.
January 2021
D7.2 COVID-19 metadata mapping model across COVID-19 studies in Federated EGA
This report will describe the metadata mapping across COVID-19 submissions to Federated EGA nodes that enables cross-cohort discovery queries and joint analysis.
January 2021
D7.3 Implementation and documentation to create an operational EGA node
The report will describe the implementation of the Local EGA node, outlining its main areas of functionality, and will include links to the public documentation needed to create, deploy, and operate an instance.
January 2021
D7.4 Report on operational processes across the federated EGA network and data transfer with the European COVID-19 platform
This report will provide details and links to the documentation, best practices, and SOPs for operating EGA nodes, and details of how EGA COVID-19 studies can be published in the European COVID-19 platform.
July 2021

WP leaders

Jordi Rambla (ELIXIR Spain)
Thomas Keane (EMBL-EBI)