Cellular and molecular biology are fundamental to ELIXIR's mission. As part of our 2024–28 Programme, we are committed to advancing data services and software for research on nucleic acids, proteins and other biomolecules. This initiative will address new demands for multi-omics and multi-modal analyses, including imaging, by developing new methods and partnerships. We will also extend our expertise in reusable data and software to FAIR models, supporting robust modelling at all scales.
The following projects are key to connecting the latest developments with established data resources, unlocking the potential of cellular and molecular biology:
- Advancing structural and functional ontologies of disordered proteins
- DBTLHub: Towards a one-stop shop for connecting databases, datasets and tools for the Design-Build-Test-Learn cycle in biotechnology
- Spatial2Galaxy: There is no Galaxy without Space
- Next level of reproducible, comparable and integrable Metabolomics
This project addresses the limitations of current ontologies in capturing the dynamic nature of disordered protein regions. Its first objective is to develop novel structural and functional ontologies that accurately represent the structural heterogeneity and dynamic functional annotations of proteins. These ontologies will incorporate timescales, annotating the kinetics of structural transformations to help elucidate the molecular mechanisms and regulatory pathways that govern protein dynamics.
Collaboration with existing databases and consortia will ensure that ontological resources and experimental data are integrated, fostering interoperability and accelerating discoveries. In addition, a standardised file format specification for encoding structural state transitions within disordered protein regions will be developed together with the Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI). This specification will improve data interoperability and exchange among research groups and databases by providing a common language for describing structural transitions, advancing our understanding of the functional implications of protein dynamics in biological systems.
Nodes involved: ELIXIR Belgium, ELIXIR Hungary, ELIXIR Italy, EMBL-EBI
Communities: 3D BioInfo, Intrinsically Disordered Proteins
This project aims to lay the foundations for a one-stop shop connecting databases, datasets and tools for deploying the engineering Design-Build-Test-Learn (DBTL) framework in biotechnology. It will do so by surveying the tools and data landscape, pinpointing gaps and opportunities, and establishing design patterns for task-specific workflows for the analysis, integration and sharing of multi-modal data.
The resulting resource will help users navigate the complex landscape of biotechnology tools and data, and build solutions that fit their specific DBTL requirements. Use cases from ongoing programmes in several communities will be used to assess and demonstrate the practical value of these solutions.
The work will be carried out through hands-on activities, dedicated workshops and hackathons, which will provide training and resources and foster industrial engagement. The experience of the participating communities and platforms in systems biology, industrial biotechnology, metabolic modelling, metabolomics, enzymes, bioprospecting and data management, together with their industrial connections, will be particularly valuable here. Accordingly, the project brings together participants from seven ELIXIR Nodes and connects researchers and their activities across six communities.
The project outcomes will advance ELIXIR's ambition of connecting the latest developments with established data resources to realise the potential of cellular and molecular biology, particularly in industrial biotechnology and biomanufacturing.
Nodes involved: ELIXIR Spain, ELIXIR Greece, ELIXIR France, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR UK
Communities: Biodiversity, Microbiome, Metabolomics, Microbial Biotechnology, Research Data Management, Systems Biology
Spatial transcriptomics (ST) was named ‘Method of the Year 2020’ by Nature Methods and was more recently featured in Nature’s ‘Seven technologies to watch in 2024’. ST is now a prerequisite for researching transcriptional pathology at the cellular and molecular levels. It is widely applied to many pathologies, including neurodegenerative disease, cancer, cardiomyopathy and kidney disease, and is also emerging in plant and microbiome research. Although a plethora of spatial analysis applications exist, they are fragmented, hard for research scientists to manage and, in their current form, poorly suited to delivering FAIR and reproducible results.
To address this challenge, we will implement Spatial2Galaxy (S2G), a self-contained, reproducible and scalable FAIR spatial transcriptomics analysis platform for researchers and bioinformaticians alike. S2G will build on our track record in developing Galaxy workflows, training materials, and ST and single-cell analysis pipelines.
S2G will provide state-of-the-art ST tools and workflows that have performed well in benchmarking studies, promoting the uptake of best practices. These tools will be demonstrated on datasets drawn from and connecting various ST databases, which will help consolidate community guidelines for integrative multi-modal single-cell omics and imaging analysis. For comparison, non-spatial single-cell sequencing was named Nature Methods’ ‘Method of the Year 2013’, yet it took six years, until 2019, for practical training and FAIR analysis workflows to become available in Galaxy. S2G aims to shorten this gap between a technology becoming relevant and the provision of FAIR resources to the life science community, in this case for ST.
Nodes involved: ELIXIR Germany, ELIXIR France, ELIXIR Netherlands, ELIXIR UK
Communities: Cancer Data, Galaxy, Human Copy Number Variation, Single-Cell Omics
The ELIXIR Metabolomics Community relies on the development and adoption of standards, formats and data-processing solutions. However, it remains challenging to ensure high-quality reported metadata and sufficiently contextualised results, to make datasets interoperable and reusable, and to integrate metabolomics data with other omics data or studies.
This project is designed to address these issues. It aims to connect key international standards with ELIXIR resources and to create associated community guidelines and training materials.
Based on the FAIRification framework, activities in the project will: i) increase the interoperability and reuse of public metabolomics datasets and workflows through enhanced and extended open data standards, resources and new semantic annotations; ii) define and establish quality control for study baselines in metabolomics and exposomics; and iii) facilitate the interpretation and meta-analysis of metabolomics data and their integration with multi-omics and systems biology studies.
As a necessary first step, the project will create a Semantic Metabolomics Data Model to standardise metadata, enabling unambiguous reuse of data from metabolomics projects. This model will focus on integrating key ontologies, providing open training initiatives and enhancing the interoperability of metabolomics data through open guidelines for annotation steps. By linking with ELIXIR’s Deposition Databases, the ISA framework and other services, the project seeks to strengthen connections with ELIXIR Platforms, other ELIXIR communities (Systems Biology, Food and Nutrition, Galaxy, Proteomics, Toxicology, Research Data Alliance Focus Group ...), and the FAIR Cookbook and Bioschemas communities. The project outcomes are expected to foster ambitious and innovative semantics-based solutions for comparing studies across the healthcare, clinical and plant domains.
Nodes involved: ELIXIR Czech Republic, ELIXIR Germany, ELIXIR Italy, ELIXIR Spain, ELIXIR France, ELIXIR Netherlands, ELIXIR Sweden, ELIXIR UK, EMBL-EBI
Communities: Food and Nutrition, Galaxy, Metabolomics, Proteomics, Research Data Management, Single-Cell Omics, Systems Biology, Toxicology