The European Commission amended the ELIXIR-CONVERGE grant on 26 November 2020 to provide additional funding for ELIXIR to implement the Federated EGA (European Genome-phenome Archive). The extra €1.3 million for this task will focus on COVID-19 data, ensuring that the Federated EGA forms an arm of the European COVID-19 Data Platform and is accessible through the COVID-19 Data Portal.
Thomas Keane (EMBL-EBI), Jordi Rambla (Centre for Genomic Regulation, ELIXIR Spain), Ilkka Lappalainen (CSC, ELIXIR Finland) and Salvador Capella-Gutierrez (Barcelona Supercomputing Center, ELIXIR Spain) will lead the new Federated EGA tasks.
The value of sharing COVID-19 biomedical data
COVID-19 ranges from asymptomatic to life-threatening. Host factors such as age, weight and genetics account for most of this variability. Thus, by connecting genetic and other biomedical data to COVID-19 disease phenotypes, scientists can identify the host factors that determine susceptibility to the disease and its severity.
These insights are best achieved by connecting large cohorts across countries, which provides the volume of data needed to identify trends. This, however, is hindered by the difficulty of sharing sensitive data.
ELIXIR aims to help resolve this conundrum by making biomedical data accessible to researchers across borders, sharing common metadata and harmonising data standards.
Federated EGA for accessing COVID-19 host data
Most biomedical host data come from a clinical setting and are considered extremely sensitive because individuals can be identified from them. These data are stored nationally and are subject to privacy laws and regulations that prevent them from being shared outside the country.
Given the value of biomedical data, especially during a pandemic, these challenges need to be overcome. The Federated EGA is being implemented to do exactly that.
The EGA, an ELIXIR Core Data Resource run jointly by EMBL-EBI and CRG (ELIXIR Spain), is a resource for securely archiving and accessing all types of potentially identifiable genetic and phenotypic data resulting from biomedical research projects.
To better connect biomedical data across Europe, the EGA is transitioning from a centralised resource to a federated one. Users will deposit and access data through nodes across Europe. The Federated EGA will allow researchers to discover biomedical datasets across Europe and analyse them remotely, while the data remain within their country of origin.
The ultimate goal is to facilitate the reuse of healthcare data for research to accelerate the development of new therapies for COVID-19 and other diseases.
Many ELIXIR Nodes, such as ELIXIR Norway, have already set up a local instance of the EGA. All ELIXIR Nodes, along with other organisations and projects, are invited to join and link their datasets to the federated network.
GA4GH standards for harmonising metadata
Beyond providing the infrastructure needed to share data across borders, there remains the challenge of harmonising metadata standards across countries and research communities.
Research communities across Europe use different data standards, which makes it difficult to connect diverse datasets and data types. Common standards are therefore needed to combine data from a range of research settings, fields and countries. Data become significantly more valuable when they can be connected.
The Federated EGA will use research models based on emerging COVID-19-specific data dictionaries developed by international projects, such as the COVID-19 Host Genetics Initiative.
It will also apply emerging international standards driven by the Global Alliance for Genomics and Health (GA4GH).
This strategy will ensure that data from COVID-19 biomedical studies are hosted locally at Federated EGA Nodes and can still be combined.
The ELIXIR-CONVERGE project started in February 2020 to help standardise life science data management across Europe. It is funded by the EU's Horizon 2020 programme and involves 29 institutes in 22 ELIXIR Nodes. In addition to the uplift of €1.3 million for the Federated EGA activities, an uplift of €1.5 million was provided for the implementation of the COVID-19 Data Platform.