Cancer Data services
Name | Description | ELIXIR Node |
---|---|---|
Cellular and molecular biology are fundamental to ELIXIR's mission. As part of our 2024–28 Programme, we are committed to advancing data services and software for research on nucleic acids, proteins and other biomolecules. This initiative will address new demands for multi-omics and multi-modal analyses, including imaging, by developing methods and partnerships. We will also expand expertise in reusable data and software to incorporate FAIR models, ensuring robust solutions for modelling at all scales. The following projects are key to connecting the latest developments with established data resources, unlocking the potential of cellular and molecular biology:
|
ELIXIR Belgium, ELIXIR Czech Republic, ELIXIR France, ELIXIR Greece, ELIXIR Hungary, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR UK, EMBL-EBI | |
Theme: Federated Data Analysis
Through the 1+Million Genomes (1+MG) initiative, Europe is scaling up efforts to build a shared framework and infrastructure to safely access and integrate clinical human data across borders, following regulatory efforts like the General Data Protection Regulation (GDPR) and the European Health Data Space (EHDS). These are pivotal in safeguarding sensitive information while enabling authorised access for researchers, healthcare professionals and other actors. Integral to biomedical data security is the European Genome-Phenome Archive (EGA), in both its Central and Federated forms, recognised as the predominant European repository for the secure storage of pheno-clinical and genomics data. Mobilising data for secure analysis in Virtual Research Environments (VREs) remains challenging, and it is an active focus of ongoing projects like the European Genomic Data Infrastructure (GDI), EOSC-ENTRUST and EOSC4Cancer. Galaxy is a popular open-source, community-driven VRE for bioinformatics analysis that represents a unique platform for developing and testing novel strategies for data analysis. A prototype strategy for accessing and processing sensitive data was demonstrated in a previous ELIXIR implementation study (2021–2023). By adopting features of the GA4GH Crypt4GH encryption standard, we enabled Galaxy users within Trusted Research Environments (TREs) to decrypt sensitive data for workflow execution without sharing private encryption keys. We propose expanding this prototype into a comprehensive solution for secure data analysis in Galaxy, facilitating encrypted data access and transfer from FEGA/EGA repositories to designated TREs, all interactively orchestrated by users on a public Galaxy server.
The proposed solution offers flexibility through different levels of enforced restrictions, ranging from scenarios with no limitations on encrypted data transfer and storage to fully federated analysis scenarios, where analysis occurs near the data. Most of the required infrastructure can also be deployed independently of Galaxy, simplifying the potential implementation of these concepts in other VREs. |
ELIXIR Belgium, ELIXIR Germany, ELIXIR Norway, ELIXIR Spain | |
Theme: Data Deposition
The Federated European Genome-Phenome Archive (FEGA) network is an ELIXIR-supported infrastructure for making human genomic data discoverable and accessible across ELIXIR Nodes. This project seeks to accelerate data depositions into FEGA, which will significantly increase the data flow into and out of FEGA nodes. In alignment with the goals of the Human data and translational research Tier of the ELIXIR 2024–2028 Programme, this project will promote seamless data integration and increase global researchers’ confidence in the data stored within FEGA, thus strengthening the network's position as a trusted resource for genomic data. It will build capacity within the FEGA Nodes and increase awareness among a wide range of stakeholders, with the ultimate goal of enhancing data reuse. The project will be carried out by a strategic consortium comprising seven ELIXIR Nodes and two ELIXIR Communities. Partners represent four FEGA nodes at different levels of maturity, a member of the Cancer Data Community and both institutions managing Central EGA. The proposal is formulated around five coordinated tasks in which all partners contribute their expertise to the final outcomes: depositing several datasets to different nodes, testing the new tools and metadata model, and producing a blueprint for the future deposition of high-quality FAIR data. |
ELIXIR France, ELIXIR Norway, ELIXIR Portugal, ELIXIR Spain, ELIXIR Switzerland | |
Human data and translational research is a high priority for ELIXIR and builds on the progress made in the previous programmes by the Human Data Communities. Within the Science tier of the ELIXIR 2024–2028 Programme, advances will be focussed on enabling researchers (including research clinicians) to use ELIXIR’s infrastructure for human genomic, phenotypic, imaging and demographic data to support discovery, analysis, innovation and integration of research findings into the clinic and healthcare. More specifically, through these projects we will ensure that millions of human genomes are discoverable and exploited in a biomedical setting through ELIXIR-supported infrastructure and community-endorsed standards, software, workflows and analysis environments across ELIXIR Nodes.
On Data Deposition:
On Federated Data Analysis:
On Linking Data: |
ELIXIR Belgium, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Luxembourg, ELIXIR Norway, ELIXIR Portugal, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI | |
Theme: Federated Data Analysis
Federated analysis (FA) is transforming genomics research by enabling collaborative computation across distributed datasets, all while preserving data privacy. It supports comprehensive insight generation without centralising sensitive data – a crucial advancement in genomic medicine. Federated access and analysis of human datasets is a key component of the ELIXIR Scientific Programme. ELIXIR is actively involved in several major initiatives, including the European Cancer Imaging Initiative (EUCAIM) and coordination of the European Genomic Data Infrastructure (GDI) project, which aims to facilitate federated access to over one million whole genome sequences (WGS). While GDI is exploring federated solutions for data analysis, it does not plan to deploy full FA systems for evaluation at this stage. In genomics, many well-established algorithms assume centralised data environments and are not directly compatible with federated architectures. From an infrastructure perspective, FA introduces technical and organisational challenges. These include deploying and maintaining new software stacks, ensuring data interoperability across sites, and securely isolating compute environments capable of executing remote algorithms.
Project objectives and implementation
This project will implement federated analysis across four ELIXIR Nodes, using both synthetic and publicly accessible Genome-Wide Association Studies (GWAS) datasets. It will leverage the EUCAIM orchestration framework, Flower, to conduct real federated computations on synthetic genomic data. Through this work, we aim to:
Improving provenance and compliance
Beyond infrastructure, the project contributes to improving provenance tracking in FA settings. We will adapt the RO-Crate standard to generate provenance packages at each participating site. These will capture both machine- and human-readable metadata about local data, tools, and processes. RO-Crates will then be aggregated, either automatically or via secure manual upload, into a central provenance dashboard. This ensures traceability and compliance with regulatory frameworks such as GDPR. The RO-Crate packages will reflect the Five Safes framework (safe data, projects, people, settings, outputs) within federated settings that may include varied infrastructure configurations. This approach also supports the FAIR principles by documenting each computational step in a standardised, shareable format. These efforts lay essential groundwork for building scalable, trustworthy federated infrastructures across national and international networks.
Broader context and collaborations
This project builds on existing collaborations to deploy and test FA solutions across sensitive data projects including GDI, EUCAIM, BY-COVID and TRE-FX. It also aligns with the BRIDGE staff exchange between ELIXIR and the NIH’s Division of Cancer Epidemiology and Genetics (DCEG), where the Yjs framework has been selected for collaborative work. We aim to reinforce this partnership by using Flower to evaluate and compare FA frameworks and by sharing relevant datasets. All participating Nodes are active members of the ELIXIR Human Data Communities, particularly the Federated Human Data and Cancer Data communities. Results and insights from this project will be shared broadly across these communities and other ELIXIR initiatives where federated analysis is relevant.
Co-leads: Dilza Campos, Carles Hernandez |
ELIXIR Belgium, ELIXIR France, ELIXIR Portugal, ELIXIR Spain, ELIXIR UK | |
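As an illustration of the provenance approach described for this project, the sketch below builds a minimal RO-Crate 1.1 metadata document for one local run at each site and then collects a few fields from every crate into a flat index, roughly what a central dashboard might display. It is not the project's implementation: the site names, tool name, dataset labels, the `#run`, `#tool` and `#local-data` identifiers, and the functions `build_site_crate` and `summarise` are all hypothetical, and the crate is not validated against any Five Safes profile.

```python
import json

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"


def build_site_crate(site, dataset_name, tool_name, tool_version, started, ended):
    """Return a minimal RO-Crate metadata document (a dict) for one site.

    All argument values are illustrative; only metadata is recorded,
    the data itself never leaves the site.
    """
    return {
        "@context": RO_CRATE_CONTEXT,
        "@graph": [
            {   # metadata file descriptor required by RO-Crate 1.1
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # root dataset describing this provenance package
                "@id": "./",
                "@type": "Dataset",
                "name": f"Federated analysis run at {site}",
                "description": "Provenance of one local computation step.",
                "datePublished": ended,
                "mentions": [{"@id": "#run"}],
            },
            {   # the tool that was executed locally
                "@id": "#tool",
                "@type": "SoftwareApplication",
                "name": tool_name,
                "version": tool_version,
            },
            {   # reference to the local input data (hypothetical identifier)
                "@id": "#local-data",
                "@type": "Dataset",
                "name": dataset_name,
            },
            {   # the local computation, linking tool and input
                "@id": "#run",
                "@type": "CreateAction",
                "name": f"Local computation at {site}",
                "instrument": {"@id": "#tool"},
                "object": [{"@id": "#local-data"}],
                "startTime": started,
                "endTime": ended,
            },
        ],
    }


def summarise(crates):
    """Collect one summary row per crate, as a central index might."""
    rows = []
    for crate in crates:
        by_id = {entity["@id"]: entity for entity in crate["@graph"]}
        run = by_id["#run"]
        rows.append({
            "package": by_id["./"]["name"],
            "tool": by_id[run["instrument"]["@id"]]["name"],
            "inputs": [by_id[ref["@id"]]["name"] for ref in run["object"]],
            "ended": run["endTime"],
        })
    return rows


crates = [
    build_site_crate("site-a", "synthetic GWAS cohort A", "example-gwas-tool",
                     "0.1", "2025-01-01T10:00:00Z", "2025-01-01T10:05:00Z"),
    build_site_crate("site-b", "synthetic GWAS cohort B", "example-gwas-tool",
                     "0.1", "2025-01-01T10:00:00Z", "2025-01-01T10:07:00Z"),
]
index = summarise(crates)
print(json.dumps(index, indent=2))
```

Each site would write its own dict to a file named `ro-crate-metadata.json` at the root of its package; only these metadata files, not the underlying data, would need to reach the central index.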
Spatial transcriptomics (ST) was named ‘Method of the Year 2020’ by Nature Methods and was more recently featured in Nature’s Seven technologies to watch in 2024. ST is now a prerequisite for researching transcriptional pathology at the cellular and molecular levels. It is applied across multiple pathologies, including neurodegenerative disease, cancer, cardiomyopathy and nephrology, and there is an emerging application of ST in plant and microbiome research. While there is a plethora of spatial analysis applications, they are not unified or easily manageable by research scientists, and as they stand they cannot be expected to deliver FAIR and reproducible results. To address this challenge, we will implement Spatial2Galaxy (S2G) – a self-contained, reproducible, scalable FAIR spatial transcriptomics analysis platform for researchers and bioinformaticians alike. We will develop S2G based on our success with developing Galaxy workflows, training materials and ST and single-cell analysis pipelines. S2G will provide state-of-the-art ST tools and workflows with proven high performance in benchmarking studies, ensuring the uptake of best practices. These tools will be demonstrated on datasets that connect various ST databases. This will consolidate community guidelines for integrative multi-modal single-cell omics and imaging analysis. For non-spatial single-cell sequencing, named ‘Method of the Year 2013’ by Nature Methods, it took six years until practical training and workflows for its analysis were FAIRified and available in Galaxy, in 2019. S2G aims to shorten this gap between a technology becoming relevant and FAIR resources for it being available to the life science community. |
ELIXIR Germany, ELIXIR France, ELIXIR Netherlands, ELIXIR UK |