Human data and translational research is a high priority for ELIXIR and builds on the progress made in the previous programmes by the Human Data Communities. Within the Science Tier of the ELIXIR 2024–2028 Programme, advances will be focussed on enabling researchers (including research clinicians) to use ELIXIR’s infrastructure for human genomic, phenotypic, imaging and demographic data to support discovery, analysis, innovation and the integration of research findings into the clinic and healthcare. More specifically, through these projects we will ensure that millions of human genomes are discoverable and exploited in a biomedical setting through ELIXIR-supported infrastructure and community-endorsed standards, software, workflows and analysis environments across ELIXIR Nodes.
On Data Deposition:
- FAIR-FEGA: Accelerating high quality FAIR data deposition in Federated EGA
- FHDportal: Open National Submission and Access Portal for Federated Human Data
On Federated Data Analysis:
- Leveraging federated learning and RO-Crates for human genomic data analysis and provenance tracking
- Empowering Users: Orchestrating Sensitive Data Access for Interactive Federated Analysis in Virtual Research Environments
On Linking Data:
- FEGA-Connect: Linking European human multi-omic data deposition databases, biobanks and derived knowledge resources
Theme: Data Deposition
The Federated European Genome-Phenome Archive (FEGA) network is an ELIXIR-supported infrastructure for making human genomic data discoverable and accessible across ELIXIR Nodes. This project seeks to accelerate data depositions into FEGA, which will significantly increase the data flow in and from FEGA nodes.
In alignment with the goals of the Human data and translational research Tier of the ELIXIR 2024–2028 programme, this project will promote seamless data integration and increase global researchers’ confidence in the data stored within FEGA, strengthening the network's position as a trusted resource for genomic data. It will also build capacity within the FEGA Nodes and raise awareness among a wide range of stakeholders, with the ultimate goal of enhancing data reuse.
The project will be carried out by a strategic consortium comprising seven ELIXIR Nodes and two ELIXIR Communities. Partners represent four FEGA nodes at different levels of maturity, a member of the Cancer Data Community and both institutions managing Central EGA. The proposal is formulated around five coordinated tasks to which all partners contribute their expertise. These tasks converge on depositing several datasets to different nodes, testing the new tools and metadata model, and producing a blueprint for the future deposition of high-quality FAIR data.
Nodes involved: ELIXIR Switzerland, ELIXIR Spain, ELIXIR France, ELIXIR Norway, ELIXIR Portugal, EMBL-EBI
Communities: Cancer Data, Federated Human Data
Theme: Data Deposition
Human data, especially genomic data, is increasingly being federated across borders and institutions, with many stakeholders participating in multinational and global biomedical and health data networks, fostering collaborations and partnerships. While such international efforts are essential for the compilation and reuse of data, regulatory constraints often hinder the movement of certain data beyond organisational or national boundaries. Centralised approaches such as the Central European Genome-Phenome Archive (CEGA) are valuable, but not all data can be centralised.
The Federated European Genome-Phenome Archive (FEGA) network addresses this, with early work concentrated on local collection of data with central archiving of metadata. FHDportal aims to support both federated and central submission of metadata. It will do this by providing a reusable portal for gathering and storing metadata at a national level, and for submitting the required metadata centrally to enable discovery of datasets via the CEGA. FHDportal complements the existing system by providing a way to explore richer metadata (for example, detailed information on specific datasets or local funding information), while enabling a core set of metadata to be queried centrally.
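As a rough illustration of the split described above, the sketch below separates a locally held metadata record into a core subset for central submission and a richer remainder kept at the national level. The field names, the `CORE_FIELDS` set and the `split_metadata` helper are hypothetical and are not taken from the FEGA or FHDportal metadata model.

```python
# Hypothetical core fields; the real FEGA/FHDportal metadata model is not shown here.
CORE_FIELDS = {"dataset_id", "title", "data_types", "access_policy"}


def split_metadata(record):
    """Split a local record into (core, local_only) dictionaries.

    `core` holds the fields assumed to be required centrally for discovery;
    `local_only` holds everything else, which stays in the national portal.
    """
    core = {k: v for k, v in record.items() if k in CORE_FIELDS}
    local_only = {k: v for k, v in record.items() if k not in CORE_FIELDS}
    missing = CORE_FIELDS - core.keys()
    if missing:
        # Refuse to submit centrally if a required core field is absent.
        raise ValueError(f"missing core fields: {sorted(missing)}")
    return core, local_only


# Made-up example record held by a national portal.
record = {
    "dataset_id": "EXAMPLE-0001",
    "title": "Example cohort genomes",
    "data_types": ["WGS"],
    "access_policy": "controlled",
    "local_funder": "Example national funder",  # richer, local-only detail
    "cohort_notes": "Recruited 2019-2021",      # richer, local-only detail
}

core, local_only = split_metadata(record)
print(sorted(core))        # fields that would be sent centrally
print(sorted(local_only))  # fields that remain local
```

Keeping the core set small and explicit is one way to let each node extend its local record freely without affecting what is queried centrally.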
FHDportal will be deployed and tested on FEGA nodes, and should be of interest to the many other countries seeking to join FEGA. The need for FHDportal is based on experience gained during onboarding and in moving to production nodes. It will offer a common solution for local mobilisation of data and metadata, which can be adapted to local situations. During development, it will be tested on both new and well-established nodes using different technical platforms and infrastructures. The resulting software will be provided to the whole community, and will hopefully become part of the emerging toolkit for new FEGA nodes wishing to establish themselves and to ensure that they meet local needs while delivering European-scale benefits.
Nodes involved: ELIXIR Switzerland, ELIXIR Finland, ELIXIR Luxembourg, ELIXIR UK
Communities: Federated Human Data, Human Copy Number Variation
Theme: Federated Data Analysis
Federated analysis (FA) revolutionises genomics research by enabling collaborative analysis across distributed datasets, while safeguarding data privacy and facilitating comprehensive insights into genetic diseases. Federated access and analysis of human datasets is part of the ELIXIR scientific programme. ELIXIR is also involved in the EUCAIM (European Cancer Imaging Initiative) project, and coordinates the European Genomic Data Infrastructure (GDI) project, which aims to provide federated access to 1+M whole genome sequences (WGS). While the GDI project explores federated solutions to analyse its data, it does not foresee deploying FA solutions for evaluation.
This project seeks to implement FA across four ELIXIR Nodes, using synthetic and real, publicly accessible Genome-Wide Association Studies (GWAS) data. To maximise the impact of this proposal, we plan to leverage the developments already made in the context of the EUCAIM project, specifically the orchestration solution built around the Flower Framework, as well as the ongoing FA developments in the context of the Staff Exchange BRIDGE between ELIXIR and DCEG/NIH, where the Yjs framework is the chosen solution.
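To make the idea of federated analysis concrete, here is a minimal sketch of one round of sample-weighted parameter averaging, the core step of federated averaging, in plain Python. Each node fits a model on its own data and shares only a fitted parameter and a sample count, which a coordinator then combines. This is a generic illustration and not the project's pipeline: it does not use the Flower Framework or Yjs APIs, and the node names, data values and helper functions are invented for the example.

```python
def local_update(xs, ys):
    """Fit y ~ w * x by least squares on one node's private data.

    Only the fitted weight and the sample count leave the node.
    """
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs)
    return num / den, len(xs)


def federated_average(updates):
    """Combine per-node (weight, n_samples) pairs, weighting by sample count."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Synthetic per-node data (invented); raw values never leave their node.
node_data = {
    "node_a": ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    "node_b": ([1.0, 2.0], [3.0, 6.0]),
}

# Each node computes its update locally; the coordinator sees only the results.
updates = [local_update(xs, ys) for xs, ys in node_data.values()]
global_weight = federated_average(updates)
print(global_weight)
```

In a real deployment this step would be repeated over several rounds and handled by an orchestration framework; the sketch only shows what is exchanged between the nodes and the coordinator.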
We also aim to represent the analysis using RO-Crates to track the provenance of the analysis, following the Five Safes Framework. The proposal is built around ongoing collaborations on deploying and testing FA solutions for analysing sensitive data across different projects like GDI, EUCAIM, BY-COVID and TRE-FX.
This project aims not only to boost this interaction using the Flower Framework for FA, but also to strengthen the connection to NIH/DCEG through dataset sharing and comparing different FA frameworks. All Nodes involved in this project are active members of the ELIXIR Human Data Communities, especially the Federated Human Data and Cancer Data ones. The outcomes derived from this project will be disseminated not only to these Communities but also to all ELIXIR projects where this topic is relevant.
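For the provenance tracking mentioned above, an RO-Crate describes an analysis in a `ro-crate-metadata.json` file written in JSON-LD. The sketch below builds a minimal crate in which a single `CreateAction` links a workflow, its input and its output, and then checks that every internal reference resolves. The file names, identifiers, dates and licence are placeholders, the `dangling_references` helper is hypothetical, and the additional terms that a Five Safes profile would require are not shown.

```python
import json

# Placeholder crate describing one analysis run (all names are illustrative).
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # Metadata descriptor: points at the root dataset of the crate.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root dataset: lists the files and mentions the run.
            "@id": "./",
            "@type": "Dataset",
            "name": "Example federated GWAS run",
            "description": "Placeholder crate for a run on synthetic data.",
            "datePublished": "2024-01-01",
            "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
            "hasPart": [
                {"@id": "workflow.cwl"},
                {"@id": "synthetic_genotypes.tsv"},
                {"@id": "summary_stats.tsv"},
            ],
            "mentions": [{"@id": "#run-1"}],
        },
        {"@id": "workflow.cwl", "@type": ["File", "SoftwareSourceCode"],
         "name": "Example analysis workflow"},
        {"@id": "synthetic_genotypes.tsv", "@type": "File",
         "name": "Synthetic input data"},
        {"@id": "summary_stats.tsv", "@type": "File",
         "name": "Summary statistics"},
        {   # Provenance: which tool turned which input into which output.
            "@id": "#run-1",
            "@type": "CreateAction",
            "name": "Example workflow execution",
            "instrument": {"@id": "workflow.cwl"},
            "object": [{"@id": "synthetic_genotypes.tsv"}],
            "result": [{"@id": "summary_stats.tsv"}],
        },
    ],
}


def referenced_ids(value):
    """Yield every "@id" referenced from within a property value."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            yield value["@id"]
    elif isinstance(value, list):
        for item in value:
            yield from referenced_ids(item)


def dangling_references(crate):
    """Return internal references that no entity in the graph defines."""
    defined = {entity["@id"] for entity in crate["@graph"]}
    dangling = []
    for entity in crate["@graph"]:
        for key, value in entity.items():
            if key == "@id":
                continue
            for ref in referenced_ids(value):
                # External URLs (licence, specification) are not expected in the graph.
                if ref not in defined and not ref.startswith("http"):
                    dangling.append(ref)
    return dangling


metadata_json = json.dumps(crate, indent=2)
print(dangling_references(crate))
```

The serialised `metadata_json` string is what would be written to `ro-crate-metadata.json` alongside the listed files.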
Nodes involved: ELIXIR Belgium, ELIXIR Spain, ELIXIR France, ELIXIR Portugal, ELIXIR UK
Communities: Cancer Data, Federated Human Data
Theme: Federated Data Analysis
Through the 1+Million Genomes (1+MG) initiative, Europe is scaling up efforts to build a shared framework and infrastructure to safely access and integrate clinical human data across borders, following regulatory efforts like the General Data Protection Regulation (GDPR) and the European Health Data Space (EHDS). These are pivotal in safeguarding sensitive information, while enabling authorised access for researchers, healthcare professionals and other actors.
Integral to biomedical data security considerations is the European Genome-Phenome Archive (EGA), in both its Central and Federated forms, recognised as the predominant European repository for the secure storage of pheno-clinical and genomics data. Mobilising data for secure analysis in Virtual Research Environments (VREs) remains challenging. Indeed, it is an active focus in ongoing projects like the European Genomic Data Infrastructure (GDI), EOSC-ENTRUST and EOSC4Cancer.
Galaxy is a popular open-source, community-driven VRE for bioinformatics analysis that represents a unique platform for developing and testing novel strategies for data analysis. A prototype strategy for the access and processing of sensitive data was demonstrated in a previous ELIXIR implementation study (2021–2023). By adopting features of the GA4GH Crypt4GH encryption standard, we enabled Galaxy users within Trusted Research Environments (TREs) to decrypt sensitive data for workflow execution without sharing private encryption keys.
We propose expanding this prototype into a comprehensive solution for secure data analysis in Galaxy, facilitating encrypted data access and transfer from FEGA/EGA repositories to designated TREs, all interactively orchestrated by the users on a public Galaxy server. The proposed solution offers flexibility with different levels of enforced restrictions ranging from scenarios with no limitations on encrypted data transfer and storage, to fully federated analysis scenarios, where analysis occurs near the data. Most of the required infrastructure can also be deployed independent of Galaxy, simplifying the potential implementation of these concepts in other VREs.
Nodes involved: ELIXIR Belgium, ELIXIR Germany, ELIXIR Spain, ELIXIR Norway
Communities: Cancer Data, Federated Human Data, Galaxy
Theme: Linking Data
Today, research generates more data than ever, and a multitude of experimental data types. Such data types are often connected at source: perhaps generated from the same samples or as part of the same study. It is important that different data types are made available for re-use in a linked and coordinated manner, enabling full reuse of all the data in integrated analysis. Experimental data types are often siloed in varied specialised repositories, using different metadata models, so linking them is not straightforward. Also, data obtained from living humans is sensitive and shared under a controlled access model, adding an extra layer of complexity.
In this project, partners will establish a strong foundation for developing solutions to integrate multi-omic sensitive data effectively among FEGA nodes, biobanks and ELIXIR Core Data Resources such as PRIDE and GWAS Catalog. Five ELIXIR Nodes and the Polish FEGA node (in-kind contribution) will be involved, together with two ELIXIR Communities (Federated Human Data and Proteomics), spanning three diverse data use cases to address the challenges of this open call.
The project will start by developing a comprehensive landscape analysis of current human data linkage challenges and solutions (Task 1). Based on this, concrete models and prototypes will be proposed to link sensitive proteomics data (Task 2), cohorts and biobank data (Task 3), and population cohort-derived data (Task 4) to genomics data. Results from Tasks 2 to 4 will be used to improve the FEGA metadata model. The project will result in more coherent data deposition, discoverability and retrieval of multi-omics datasets, providing FAIRer data and accelerating research. To reach a broad audience, the project will engage the ELIXIR Communities through dedicated online and in-person events, where both interim and final results of the project will be disseminated.
Nodes involved: ELIXIR Finland, ELIXIR Germany, ELIXIR Spain, ELIXIR Sweden, EMBL-EBI
Communities: Federated Human Data, Proteomics