ELIXIR Federated Human Data (2019-21)

Over the last forty years, we have seen the emergence of large cohorts of human samples from research and national healthcare initiatives. Many countries in Europe now have nascent personalised medicine programmes, meaning that human genomics is undergoing a step change from being a predominantly research-driven activity to one funded through healthcare. This is evidenced by the recent Declaration of 19 European countries to sequence and share transnationally at least 1M human genomes by 2022. This initiative will catalyse the transition of genomics from the bench to the bedside in Europe.

We envisage that a significant subset of these data will be made available for secondary research. However, genetic data generated through healthcare is not likely to be shared as widely as research data. Healthcare is subject to national laws, and it is often unacceptable for health data from one country to be exported outside regional or national jurisdictions. Our vision for the ELIXIR Federated Human Data Community is to create a federated ecosystem of interoperable services that enables population-scale genomic and biomolecular data to be accessible across international borders, accelerating research and improving the health of individuals resident across Europe.

This project will coordinate the delivery of FAIR-compliant metadata standards, interfaces, and a reference implementation to support the federated ELIXIR network of human data resources. The overall goal is to provide secure, standardised, documented and interoperable services under the framework of the European Genome-phenome Archive (EGA). This three-year plan includes a structured roadmap for ELIXIR Nodes to join the EGA federated network by providing the necessary technical, logistical, and training coordination across the network.

This project builds on earlier work in the ELIXIR-EXCELERATE, CORBEL and Tryggve projects. It will be led by the European Genome-phenome Archive (EGA) to ensure work described in this proposal is aligned with the policies, legal agreements, and governance model for establishing the Federated EGA. WP3 will build on work in EXCELERATE WP9 to create a reference software implementation, the Local EGA, that Nodes can use to operate their federated node.

The work is divided between five Work Packages (WPs):

WP1: Coordination of metadata standards for sensitive human data

Lead: Ilkka Lappalainen (ELIXIR FI)

The overall objective of this work package is to coordinate a clearly defined, documented and maintained set of metadata standards for data collected from human samples. The data and metadata collected by disparate cohorts vary widely, from the type of biomaterials collected from participants (e.g. blood, DNA, tissue samples) to lifestyle information gathered through participant questionnaires and the molecular measurements used to record phenotypes.

The cohorts are also assembled for different purposes, for example longitudinal population studies versus studies of disease progression. Cohort variables are considered both individually and in groups that together represent a diagnosis, e.g. the measured variables related to metabolic syndrome or risk factors for stroke. Standardisation and interoperability of these data are critical for this project, and application of the FAIR principles brings benefits to cohort owners and the wider community.

The work package partners are actively involved in both global and European standards bodies (e.g. Global Alliance for Genomics and Health - GA4GH, and the International Nucleotide Sequence Database Collaboration - INSDC), infrastructure coordination projects (e.g. CORBEL, EXCELERATE, dbGaP/DATS, bioCADDIE, CINECA), and disease-specific resources (e.g. ICGC, rare diseases RD-Connect, Solve-RD).

This work package coordinates input from existing forums as the basis for deciding which metadata standards will be adopted by the ELIXIR federated human data network, and provides documentation on how to apply these standards across the Federated EGA nodes. The work package will not create new standards, nor will it coordinate the harmonisation of, for example, national patient registries or the clinical use of ontologies and vocabularies.

WP2: Architecture and interfaces to support ELIXIR federated human data infrastructure

Lead: Thomas Keane (EMBL-EBI)

One of the fundamental requirements for creating an interoperable federated network is to determine what information needs to be communicated between the nodes, and to translate these requirements into a technical specification and protocol for sending/receiving messages. This work package will deliver the overall architecture and define interfaces for the federated human data services.

Maintaining architectural integrity is essential for developing the federation in a controlled fashion and ensuring interoperability between the locally run services. This work package will focus on interfaces for data discovery, metadata exchange, data dissemination, access and authorisation. It will align with emerging international standards (where possible) to promote interoperability between the Federated EGA nodes and similar international initiatives.

For discovery, we will align with the ELIXIR Beacon and the GA4GH Discovery Work Stream; data dissemination will align with the GA4GH Large Scale Genomics Work Stream (e.g. htsget, GA4GH file formats, encryption container format, and reference retrieval API); and access and authorisation will use the ELIXIR AAI, which is aligned with the DURI Work Stream of GA4GH. The overall goal is to achieve a unified user experience and ensure security across the ecosystem while maintaining flexibility to cater for local requirements.
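The federated discovery idea can be illustrated with a minimal sketch. It is not the Beacon API or any Federated EGA implementation: the `Variant`, `FederatedNode` and `federated_query` names, the node names and the example variants are all hypothetical. It only shows the principle that each node keeps its records locally and returns a yes/no answer to a discovery question.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    """A variant described by chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


@dataclass
class FederatedNode:
    """Hypothetical node: its records stay local, only a boolean answer leaves it."""
    name: str
    variants: set = field(default_factory=set)

    def has_variant(self, variant: Variant) -> bool:
        # Answer the discovery question without exporting any records.
        return variant in self.variants


def federated_query(nodes, variant):
    """Ask every node the same question and collect the per-node answers."""
    return {node.name: node.has_variant(variant) for node in nodes}


# Two illustrative nodes with made-up content.
nodes = [
    FederatedNode("node-a", {Variant("1", 1000, "A", "T")}),
    FederatedNode("node-b", {Variant("2", 2000, "G", "C")}),
]
answers = federated_query(nodes, Variant("1", 1000, "A", "T"))
print(answers)
```

In a real deployment each call would be a network request to a node's discovery interface, subject to that node's own access and authorisation rules.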

We envisage two levels of engagement from federation partners:

  1. Nodes that have an existing operational human data sharing platform with deep investment in existing IT infrastructure and data security infrastructure solutions. These nodes would join the federated network by implementing the APIs recommended by WP2 as a layer over their existing infrastructure.
  2. Nodes with less mature human data sharing infrastructure (but with resources and competence) that would want to contribute via WP3 to the joint development effort and deploy the result locally for their Node.

We will provide a foundation for data analysis by providing suitable access (e.g. streaming) to data. However, implementation of the analysis protocols themselves is out of scope for this work.

WP3: Coordination of interface implementation

Lead: Jordi Rambla (ELIXIR ES)

This work package will coordinate the software development work that provides an implementation of the WP2-defined interfaces to support the federated services for human data. This task will leverage the Local EGA work done in the context of EXCELERATE, extending it wherever necessary.

WP3 will support Nodes by:

  1. Hosting periodic coordination calls and facilitating the interface implementation work.
  2. Using a shared code repository and coordinating relevant documentation.
  3. Coordinating maintenance and development of the WP2-defined interfaces.
  4. Providing relevant documentation together with WP5 describing software and deployment processes to assist capacity building in other Nodes.

While the priorities of the implemented interfaces will be decided during the project, we have already identified the following important development tasks, based on the preceding EXCELERATE work on sensitive human data: (1) a unified data deposition interface that enables submission in a uniform fashion to any of the federated archives, (2) support for a shared user authentication and authorisation framework (e.g. ELIXIR AAI and EGA AAI) to ease submitter authentication and the re-use of archived data by authorised researchers, (3) quality control processes for submitted data files and (4) validation processes for submitted phenotypic attributes of the samples.
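As an illustration of task (4), the following is a minimal sketch of what validating submitted sample attributes could look like. The field names, the allowed values and the CURIE-style pattern for ontology terms are assumptions made for this example; the actual metadata standards are to be selected in WP1 and are not defined here.

```python
import re

# Hypothetical schema: these field names and allowed values are illustrative only.
REQUIRED_FIELDS = {"sample_id", "sex", "phenotype"}
ALLOWED_SEX = {"female", "male", "unknown"}
# Assumed CURIE-style ontology term, e.g. "HP:0001250" (prefix, colon, digits).
ONTOLOGY_TERM = re.compile(r"^[A-Za-z]+:\d+$")


def validate_sample(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passed."""
    errors = []
    # Report every missing required field, in a stable order.
    for name in sorted(REQUIRED_FIELDS - set(record)):
        errors.append(f"missing field: {name}")
    sex = record.get("sex")
    if sex is not None and sex not in ALLOWED_SEX:
        errors.append(f"invalid sex: {sex!r}")
    phenotype = record.get("phenotype")
    if phenotype is not None and not ONTOLOGY_TERM.match(str(phenotype)):
        errors.append(f"phenotype is not an ontology term: {phenotype!r}")
    return errors


good = {"sample_id": "S1", "sex": "female", "phenotype": "HP:0001250"}
bad = {"sample_id": "S2", "sex": "F", "phenotype": "seizures"}
print(validate_sample(good))
print(validate_sample(bad))
```

Returning all problems at once, rather than stopping at the first one, lets a submitter correct a record in a single pass.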

As the majority of the funding for implementation will need to be provided by the participating Nodes, the main risk to this work package is alignment of delivery times with other relevant projects that also fund the development of this infrastructure. During the project it may also become necessary to implement interfaces to services that are not part of the federation and may need to be adapted to support required information exchange.

WP4: Coordination of operational nodes as part of the federated services

Lead: Ilkka Lappalainen (ELIXIR FI)

This work package provides a mechanism for ELIXIR Node services to become part of the federated interoperable services for human data. The process of a Node transitioning from an expression of interest through to becoming a fully operational EGA federated instance requires coordination and support in the form of training, documentation, and advice on best practices.

It is designed to offer support especially to those Nodes that have signed the Declaration to share one million human genomes cross-border by 2022. Importantly, each Node is responsible for its own infrastructure, security and operational legal framework.

The specific objectives of WP4 are to:

  1. Coordinate relevant training material for new staff members within the Nodes together with WP5.
  2. Coordinate workshops for interested Nodes on how to become an operational partner of the federated EGA services.
  3. Develop, coordinate, and organise best practices for handling security incidents (aligned with the GA4GH Data Security work stream recommendations).
  4. Define and monitor relevant KPIs together with WP2 and WP3 for monitoring successful service delivery. This work will be done in close collaboration with the ELIXIR Beacon project and the ELIXIR Compute platform.

This work package will not fund the local infrastructure, software deployment processes or operational staff members to manage the local services. Furthermore, it will not provide the legal framework for the Federated EGA. The risks include a lack of resources to solve infrastructure-dependent issues in software deployment or in the management of local dependencies, such as the availability of staff members for training.

WP5: Project overall coordination, outreach and training

This work package will provide the project management and coordination for this project.

The specific objectives of WP5 are to:

  1. Establish a project management team that ensures effective communication and timely delivery throughout the project lifetime.
  2. Coordinate calls, workshops and meetings.
  3. Develop and implement a contingency and risk management plan.
  4. Develop and implement an outreach strategy.
  5. Coordinate and deliver training material for WP1-WP4 in collaboration with the ELIXIR Training Platform, ensuring that materials are accessible via ELIXIR’s Training Portal.