Human Copy Number Variation (hCNV) services

Name Description ELIXIR Node

The initial 2019-2021 hCNV community implementation study used a set of perceived needs to a) deliver first community standards and procedures; b) identify intersections with other ELIXIR communities and stakeholders in ELIXIR-connected organizations, such as GA4GH; and c) streamline priorities for relevant, achievable deliveries of hCNV community projects. This hCNV implementation study focuses on those potential high-value targets for data access and delivery, using reference resources and community stakeholder engagement to directly implement and test hCNV resources aligned with ELIXIR ecosystems.
The main target here will be the empowerment of the Beacon protocol, to act as a standard for federated hCNV discovery and data delivery, in conjunction with additional GA4GH-derived standards.
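As a rough illustration of what Beacon-based CNV discovery could look like from a client's side, the sketch below assembles a request body for a "bracket" query, where the start and end of a CNV may each fall within a range. The field names (`meta.apiVersion`, `query.requestParameters`, `query.filters`) reflect one reading of the Beacon v2 request format and should be checked against the specification; the function name, coordinates and filter term are illustrative assumptions, not part of the study.

```python
import json


def build_cnv_bracket_query(reference_name, start_range, end_range,
                            variant_type, filter_ids=()):
    """Assemble a Beacon-v2-style request body for a CNV bracket query.

    Hypothetical helper: the layout follows one reading of the Beacon v2
    request format and is not taken from the implementation study itself.
    """
    if len(start_range) != 2 or len(end_range) != 2:
        raise ValueError("start_range and end_range need two positions each")
    if start_range[0] > start_range[1] or end_range[0] > end_range[1]:
        raise ValueError("each range must be given as (lower, upper)")
    return {
        "meta": {"apiVersion": "2.0"},
        "query": {
            "requestParameters": {
                "referenceName": reference_name,
                # The CNV must start somewhere inside start_range ...
                "start": list(start_range),
                # ... and end somewhere inside end_range.
                "end": list(end_range),
                "variantType": variant_type,
            },
            # Optional ontology term filters (IDs only).
            "filters": [{"id": term_id} for term_id in filter_ids],
        },
    }


if __name__ == "__main__":
    # Illustrative values only: a deletion on chromosome 9 (GRCh38 RefSeq ID).
    body = build_cnv_bracket_query(
        "NC_000009.12",
        (21000001, 21975098),
        (21967753, 23000000),
        "DEL",
        filter_ids=["HP:0000118"],
    )
    print(json.dumps(body, indent=2))
```

The body would typically be sent as JSON to a Beacon's genomic variation endpoint; the endpoint path and the accepted `variantType` values depend on the individual Beacon instance.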

ELIXIR Switzerland, ELIXIR France, EMBL-EBI, ELIXIR Spain, ELIXIR UK, ELIXIR Germany

Cellular and molecular biology are fundamental to ELIXIR's mission. As part of our 2024–28 Programme, we are committed to advancing data services and software for research on nucleic acids, proteins and other biomolecules. This initiative will address new demands for multi-omics and multi-modal analyses, including imaging, by developing methods and partnerships. We will also expand expertise in reusable data and software to incorporate FAIR models, ensuring robust solutions for modelling at all scales. 

The following projects are key to connecting the latest developments with established data resources, unlocking the potential of cellular and molecular biology:

ELIXIR Belgium, ELIXIR Czech Republic, ELIXIR France, ELIXIR Greece, ELIXIR Hungary, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR UK, EMBL-EBI
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Ireland, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Israel, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI

Theme: Data Deposition

Project objectives

Human data, especially genomic data, is increasingly being federated across borders and institutions, with many stakeholders participating in multinational and global biomedical and health data networks, fostering collaborations and partnerships. While such international efforts are essential for the compilation and reuse of data, regulatory constraints often hinder the movement of certain data beyond organisational or national boundaries. Centralised approaches such as the Central European Genome-Phenome Archive (CEGA) are valuable, but not all data can be centralised. 

The Federated European Genome-phenome Archive network (FEGA) addresses this, with early work concentrated on local collection of data with central archiving of metadata. FHDportal aims to support both federated and central submission of metadata. It will do this by providing a reusable portal for gathering and storing metadata at a national level, and submitting required metadata centrally to enable discovery of datasets via the CEGA. FHDportal complements the existing system by providing a way to explore richer metadata (for example, including detailed information on specific datasets or local funding information), while enabling a core set of metadata to be queried centrally. 
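The split between richer local metadata and a centrally queryable core set can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the field names, the contents of the core set and the function name are hypothetical and do not describe the actual FHDportal or EGA schemas.

```python
# Hypothetical core field names; the real core set would be defined by the
# central archive, not by this sketch.
CORE_FIELDS = frozenset({"dataset_id", "title", "description"})


def split_metadata(record, core_fields=CORE_FIELDS):
    """Split one dataset record into (core, local) metadata.

    core  -> the subset that would be submitted centrally for discovery
    local -> richer, node-specific metadata that stays at the national portal
    """
    missing = core_fields.difference(record)
    if missing:
        raise ValueError(f"record lacks core fields: {sorted(missing)}")
    core = {key: value for key, value in record.items() if key in core_fields}
    local = {key: value for key, value in record.items() if key not in core_fields}
    return core, local


if __name__ == "__main__":
    record = {
        "dataset_id": "EXAMPLE-0001",           # made-up identifier
        "title": "Example cohort",
        "description": "Illustrative record only",
        "local_funder": "Example national funder",  # stays local
    }
    core, local = split_metadata(record)
    print("submitted centrally:", core)
    print("kept locally:", local)
```

Rejecting records that lack a core field reflects the idea that the central submission is mandatory for discovery, while local fields can vary freely between nodes.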

FHDportal will be deployed and tested on FEGA nodes, and should be of interest to the many other countries seeking to join FEGA. The need for FHDportal is based on experience during onboarding and in moving to production nodes. It will offer a common solution for local mobilisation of data and metadata, which can be adapted to local situations. During development, it will be tested on both new and well-established nodes using different technical platforms and infrastructures. The resulting software will be provided to the whole community, and will hopefully become part of the emerging toolkit for new FEGA nodes wishing to establish themselves and to ensure their nodes meet local needs while bringing European-scale benefits.

People

SIB leads Swiss FEGA (onboarding in progress). Mark Ibberson and Owen Appleton bring expertise in human data and service development as partners of the Swiss TRE which will host the FEGA service. Patrick Ruch will contribute to query and metadata mining of FHDportal as an established expert in the field. Michael Baudis (co-lead of Beacon protocol development and GA4GH Discovery work stream) will provide Beacon implementation and alignment to GA4GH data standards.

CSC hosts an established FEGA node, providing extensive technical expertise in sensitive data service design and architecture. Riku Riski and Jaakko Leinonen will design and deliver testing results for the alpha version of the portal with the testing partners.

Venkata Satagopam (UNILU) brings extensive experience in clinical and translational data curation, FAIRification, data integration, knowledge management and ML/AI analysis. UNILU will test the alpha version of the portal with metadata from different health use cases.

Tim Beck (UNOTT) is part of the Health Data Research UK (HDR UK) Federated Analytics infrastructure programme and lead for human data activities at the ELIXIR-UK Node. He will lead the testing and feedback of the portal with HDR UK use cases. 

ELIXIR Finland, ELIXIR Luxembourg, ELIXIR UK, ELIXIR Switzerland

This Study's work will address the following themes:

  1. Optimal CNV detection pipelines for research and diagnostics: to release a set of sensitive and reliable pipelines, optimized and validated to detect CNVs from various high-throughput datasets. These pipelines will be available through the ELIXIR compute nodes and/or as stand-alone solutions.
  2. Definition of reference datasets: to provide open reference datasets of fully validated somatic and germline CNVs representing a wide range of sample types and experimental technologies.
  3. Improvement of community formats for CNV exchange: to improve the VCF format and identify other nomenclatures and widely used formats in other communities (in alignment with GA4GH and the ELIXIR Interoperability Platform).
  4. Enabling CNV data discovery in diagnostic and phenotypic context: The hCNV Community will work towards enabling the ELIXIR Beacon Project to support the envisioned patient discovery, through support for extended clinical descriptions, including enabling and testing relevant annotation standards (e.g. HPO, NCIt and additional ontologies).
  5. Functional annotation of CNVs
  6. Combinatorial approaches to CNV interpretation
  7. Identification of landmark genes in regions of interest
  8. FAIRification of hCNV databases and datasets: The FAIR principles (Findable, Accessible, Interoperable, Reusable) will be applied to those systems to demonstrate the feasibility and utility of distributed CNV databases in order to allow interoperability (including resource and data discovery).
  9. Dissemination:
    1. Training materials
    2. Train actors, patients and the general public
    3. Capacity Building training events across certain ELIXIR Nodes
ELIXIR France, ELIXIR Switzerland, ELIXIR Germany, EMBL-EBI, ELIXIR Spain, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Hungary, ELIXIR Slovenia, ELIXIR UK

Human data and translational research is a high priority for ELIXIR and builds on the progress made in the previous programmes by the Human Data Communities. Within the Science tier of the ELIXIR 2024–2028 Programme, advances will be focussed on enabling researchers (including research clinicians) to use ELIXIR’s infrastructure for human genomic, phenotypic, imaging and demographic data to support discovery, analysis, innovation and integration of research findings into the clinic and healthcare. More specifically, through these projects we will ensure that millions of human genomes are discoverable and exploited in a biomedical setting through ELIXIR-supported infrastructure and community-endorsed standards, software, workflows and analysis environments across ELIXIR Nodes.

On Data Deposition:

On Federated Data Analysis:

On Linking Data:

ELIXIR Belgium, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Luxembourg, ELIXIR Norway, ELIXIR Portugal, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR France, ELIXIR Germany, ELIXIR Hungary, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI

The ELIXIR human Copy Number Variation Community (hCNV) was created in December 2018. In two years, contributions to the field have been numerous (ELIXIR IS, Rare Diseases, Federated Human Data, Beacons, GA4GH, EJP-RD and Beyond 1 Million Genomes - B1MG).

The Community now aims to address the major challenge of NGS data interpretation in the era of whole genome sequencing: Copy Number Variation. During the first commissioned service, offered as a starting grant, the Community identified various gaps that must be filled before CNV tool benchmarking can proceed, in particular for exome and targeted sequencing, which are by far the most widely used technologies in diagnostic laboratories and in research.

Within this implementation study we want to provide solutions and bioinformatic infrastructure to fill the identified gaps, and to make these biomedical reference materials available (i.e. via Open Science) to the various communities and platforms.

ELIXIR France, ELIXIR UK, ELIXIR Switzerland, ELIXIR Spain, ELIXIR Germany

Spatial transcriptomics (ST) was named ‘Method of the Year 2020’ by Nature Methods and was more recently featured in Nature’s Seven technologies to watch in 2024. ST is now a prerequisite for researching transcriptional pathology at the cellular and molecular levels. It is applied across multiple areas of pathology, including neurodegenerative disease, cancer, cardiomyopathy and nephrology. There is also an emerging application of ST in plant and microbiome research. While there is a plethora of spatial analysis applications, these are not unified or easily manageable by research scientists, and they lack any hope of delivering FAIR and reproducible results.

To address this challenge, we will implement Spatial2Galaxy (S2G) – a self-contained, reproducible, scalable FAIR spatial transcriptomics analysis platform for researchers and bioinformaticians alike. We will develop S2G based on our success with developing Galaxy workflows, training materials and ST and single-cell analysis pipelines.

S2G will provide state-of-the-art ST tools and workflows with proven high performance in benchmarking studies, ensuring the uptake of best practices. These tools will be demonstrated on datasets that connect various ST databases. This will consolidate community guidelines for integrative multi-modal single-cell omics and imaging analysis. For non-spatial single-cell sequencing, named the Nature Methods ‘Method of the Year 2013’, it took six years until practical training and workflows for its analysis were FAIRified and available in Galaxy, in 2019. In contrast, S2G aims to reduce this gap for ST between a technology becoming relevant and the provision of FAIR resources to the life science community.

ELIXIR Germany, ELIXIR France, ELIXIR Netherlands, ELIXIR UK
ELIXIR Switzerland, ELIXIR UK