Plant Sciences services

Name Description ELIXIR Node

Apple is one of the most famous fruits globally and occupies a central position in folklore, culture, and art. Apple cultivars have retained high genetic and phenotypic diversity, evidenced by the high number of apple varieties cultivated today. The economic and cultural importance of apple has driven efforts to catalogue and exploit this genetic diversity, but few of these data are currently integrated into ELIXIR resources. We propose a data implementation study to integrate the high-quality apple reference genome and its associated catalogue of genetic diversity, representing the most widely cultivated apple varieties around the world. We will use apple as a case study for managing the growing number of ‘multi-genome’ fruit projects, testing and, where necessary, improving tools to streamline data import and exchange between ELIXIR-supported resources, specifically BioSamples, ENA, EVA, ORCAE and Ensembl Plants.

ELIXIR Italy, ELIXIR Belgium, EMBL-EBI

Over the coming decade, Europe will face critical challenges in maintaining biodiversity, ensuring food security and combating pathogens. Our 2024–28 Programme will address these issues by mobilising and integrating molecular data, using successful coordination models from human genomics. Through strategic investments and collaboration in externally-funded projects, ELIXIR will enhance scientific services and support transnational research in these essential areas.

The following projects have been selected as part of the ELIXIR 2024–28 Programme’s Biodiversity, food security and pathogens Science tier:

ELIXIR Belgium, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR UK, EMBL-EBI, ELIXIR Italy

The ELIXIR Plant Sciences Community has been implementing a distributed infrastructure for FAIR plant genotype-phenotype data publication and access, which aims to support agronomic research and industrial development.

This infrastructure is based on a central search service, FAIDARE, which sits atop a federation of distributed data repositories across several ELIXIR Nodes, all of which implement a common web service specification, the Breeding API (BrAPI, https://brapi.org/). BrAPI ensures accessibility, and also interoperability and reusability because it implements the MIAPPE metadata standard (https://www.miappe.org/), whereas FAIDARE ensures findability.
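As a rough illustration of why a shared web service specification helps a central search service, the sketch below reads one page of a BrAPI-style list response and flattens it into index entries. The envelope layout (`metadata.pagination` with `currentPage` and `totalPages`, records under `result.data`) follows the BrAPI specification; the sample payload, the node name and the helper functions are invented for illustration, no network call is made, and this is not how FAIDARE itself is implemented.

```python
import json
from typing import Any

# Invented sample payload shaped like a BrAPI list response.
SAMPLE_RESPONSE = json.dumps({
    "metadata": {
        "pagination": {"currentPage": 0, "pageSize": 2,
                       "totalCount": 5, "totalPages": 3},
        "status": [],
        "datafiles": [],
    },
    "result": {
        "data": [
            {"studyDbId": "s1", "studyName": "Example trial A"},
            {"studyDbId": "s2", "studyName": "Example trial B"},
        ]
    },
})


def parse_page(raw: str) -> tuple[list[dict[str, Any]], bool]:
    """Return the records on this page and whether further pages remain."""
    body = json.loads(raw)
    pagination = body["metadata"]["pagination"]
    records = body["result"]["data"]
    has_more = pagination["currentPage"] + 1 < pagination["totalPages"]
    return records, has_more


def index_records(node: str, records: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Flatten records into the minimal entries a central index might store."""
    return [
        {"node": node, "id": r["studyDbId"], "name": r["studyName"]}
        for r in records
    ]


if __name__ == "__main__":
    page, more = parse_page(SAMPLE_RESPONSE)
    for entry in index_records("example-node", page):
        print(entry)
    print("more pages:", more)
```

Because every repository returns the same envelope, the same two functions could in principle be reused for each node in the federation; only the base URL would differ.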

While the need for data FAIRness and the solutions of the ELIXIR Plant Sciences Community to enable it have been gaining traction in academia, their uptake in industry has been negligible. Our goal is that industry stakeholders not only make use of the publicly available data in ELIXIR’s infrastructure, but can also deposit their (meta)data in that infrastructure or even implement their own BrAPI endpoints. Realizing this goal requires outreach activities to publicise ELIXIR’s plant data infrastructure and FAIR-enabling standards, and training activities on how to use them.

The Navigator Company is a leading force in the international pulp and paper market and one of Portugal's strongest brands on the world stage. Its production structure is based on three major industrial sites in Cacia, Figueira da Foz and Setúbal, where the facilities set international standards for the pulp and paper industry. In addition to its industrial activities, it carries out, mostly through RAIZ Forest and Paper Institute, extensive research on Eucalyptus breeding and genetics, generating genotypic and phenotypic data on over 300,000 specimens across a range of sites and covering up to 4 generations of pedigree. This wealth of data makes The Navigator Company a prime candidate for a pilot knowledge-transfer project to enable it to draw value from and contribute to ELIXIR’s plant data infrastructure.

The goals of this project are (1) to transfer knowledge on standards for FAIR plant data access and publication (particularly BrAPI) from the ELIXIR Plant Sciences Community to The Navigator Company; (2) to collaborate with The Navigator Company in organizing its data on Eucalyptus breeding according to the MIAPPE and BrAPI standards; and (3) to establish an access protocol for The Navigator Company to submit its datasets to the ELIXIR-PT BrAPI end-point in bulk.

The accomplishment of these goals will lead to the expansion of the plant datasets provided by the ELIXIR-PT BrAPI end-point to the Plant Sciences community, and more importantly, will bring a key industrial partner into the fold of FAIR plant data publication. Furthermore, due to the prominent role of The Navigator Company in Europe, we expect this project to play a key outreach role and pave the way to further collaborations with other partners in the industry.

ELIXIR Portugal

The aim of this Implementation Study is to determine the requirements for validation with ELIXIR partners, to build prototype open validation services for archetype archival databases and knowledge bases, in particular:

  • Content validation according to minimum information checklists.
  • Syntactic format validation according to a standard format in conjunction with the GA4GH file formats team as part of the Large Scale Genomics Workstream.
  • Syntactic format validation for Phenotyping data.
  • Semantic validation according to a publicly available ontology.

ELIXIR Belgium, ELIXIR France, EMBL-EBI, ELIXIR UK
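To make the kinds of validation listed in this study more concrete, here is a minimal sketch combining content validation against a checklist with a very simple semantic check. Everything in it is an assumption for illustration: the checklist fields, the `EX:` term identifiers and the in-memory term table stand in for a real minimum information checklist and a real ontology lookup service.

```python
import re
from dataclasses import dataclass, field

# Hypothetical checklist: field name -> whether the field is mandatory.
CHECKLIST = {"study_id": True, "organism": True, "trait": True, "comment": False}

# Hypothetical term table standing in for an ontology lookup.
KNOWN_TERMS = {"EX:0000001": "plant height", "EX:0000002": "grain yield"}

# CURIE-like shape: alphabetic prefix, colon, digits.
TERM_PATTERN = re.compile(r"^[A-Za-z]+:\d+$")


@dataclass
class Report:
    errors: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate(record: dict[str, str]) -> Report:
    """Check mandatory fields, then check the trait term if one is given."""
    report = Report()
    # Content validation: every mandatory field must be non-empty.
    for name, mandatory in CHECKLIST.items():
        if mandatory and not record.get(name, "").strip():
            report.errors.append(f"missing mandatory field: {name}")
    # Semantic validation: the trait must be a known ontology term.
    trait = record.get("trait", "").strip()
    if trait:
        if not TERM_PATTERN.match(trait):
            report.errors.append(f"malformed ontology term: {trait}")
        elif trait not in KNOWN_TERMS:
            report.errors.append(f"unknown ontology term: {trait}")
    return report


if __name__ == "__main__":
    good = {"study_id": "S1", "organism": "Malus domestica", "trait": "EX:0000001"}
    bad = {"study_id": "S1", "organism": "", "trait": "height"}
    print(validate(good).ok)
    print(validate(bad).errors)
```

Collecting all errors in a report, rather than stopping at the first one, lets a submitter fix a record in a single pass.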

With the declining cost of genome sequencing, the focus of plant researchers is shifting towards characterising the wide genomic diversity present within a species. Crop pan-genomes consist of the sequencing, comparison and integration of multiple different genomes from the same agriculturally important species such as wheat, rice and potatoes. Exploiting the information encoded within these pan-genomes can lead to the development of new cultivars more resilient to upcoming challenges like increased drought and heat stress. 

Multiple consortia are independently generating and integrating these pan-genomes, but there is currently little progress in streamlining and homogenising these efforts. While sequence quality is no longer a major issue, the completeness of both the assembly and the subsequent gene annotation is much harder to quantify correctly, even though both are major drivers in explaining the adaptive differences between genotypes. Where there are efforts to visualise and browse pan-genomes, for example by using graph representations, easy retrieval of gene Presence Absence Variation information or structural rearrangements is currently lacking, which hampers knowledge extraction.

E-PAN aims to streamline the efforts of different research groups within the ELIXIR Plant Science Community. This encompasses the development of effective standards, computational pipelines and tutorials to assess the quality of pan-genomes and provide solutions to identified problems. We will also evaluate and integrate different approaches for data visualisation and browsing, which will be used by different partners sharing pan-genomics results. A one-day meeting and an online workshop will be organised to disseminate results and initiate new collaborative projects. These concerted efforts will lead to a standardised approach to be used in future pan-genome projects, a reduction in duplication efforts across consortia, and a set of tools to visualise and mine pan-genomics results.

Project objectives

The adaptive differences between genotypes of the same species can only be explained by exploring genomic diversity through pan-genomics. The sequencing and assembly of pan-genomes no longer pose significant scientific challenges, but the subsequent data integration and exploration is hampered by a lack of standards and tools. For example, a simple comparison of the sequences of 44 potato genomes would not lead to any significant insights. However, the investigation of gene Presence Absence Variation (PAV) and structural rearrangements highlighted loci of interest to link genotypic with phenotypic diversity. 

The global objective of the E-PAN project is to accelerate the use of pan-genome data through advances in data quality control, curation, integration, and visualization. These objectives can be placed on different axes, which can be progressed in parallel: 

Methods for ensuring high-quality pan-genome data. In order to ascertain that PAVs are biological and not technical artifacts, the need for quality control (QC) and standardization is clear. Solely using BUSCO for estimating genome and gene space completeness of pan-genomes will miss nearly all intra-species differentiation. Therefore new standards and methodologies, benchmarked against real data from a broad phylogenetic species selection, are required. This will result in new guidelines and tutorials for pan-genome QC.

Computation and standardised reporting of gene-based PAVs and structural variants. Apart from implementing efficient and scalable algorithms to quantify different types of structural variants, the obtained results will be shared in a standardised manner between the different plant resources, resulting in better access to pan-genomics results for diverse plant species.
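As a small worked example of what gene-based PAV reporting involves, the sketch below builds a presence/absence matrix from per-genome gene family sets and labels each family as core, dispensable or private. The genome names, family identifiers and the three-way labelling convention are assumptions made for illustration; they are not the algorithms or the reporting standard that E-PAN will produce.

```python
from collections import Counter


def pav_matrix(genomes: dict[str, set[str]]) -> dict[str, dict[str, bool]]:
    """Map each gene family to its presence (True/False) in every genome."""
    families = sorted(set().union(*genomes.values()))
    return {
        fam: {name: fam in members for name, members in genomes.items()}
        for fam in families
    }


def classify(matrix: dict[str, dict[str, bool]]) -> dict[str, str]:
    """Label families: core (all genomes), private (one), else dispensable."""
    labels = {}
    for fam, row in matrix.items():
        n_present = sum(row.values())
        if n_present == len(row):
            labels[fam] = "core"
        elif n_present == 1:
            labels[fam] = "private"
        else:
            labels[fam] = "dispensable"
    return labels


if __name__ == "__main__":
    # Invented toy input: three genomes and four gene families.
    genomes = {
        "genome_A": {"fam1", "fam2", "fam3"},
        "genome_B": {"fam1", "fam2"},
        "genome_C": {"fam1", "fam4"},
    }
    labels = classify(pav_matrix(genomes))
    print(labels)
    print(Counter(labels.values()))
```

In this toy input a family absent from a genome is indistinguishable from a family missed by assembly or annotation, which is the reason the quality control axis above matters for interpreting PAVs.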

Visualisation and integration of pan-genomes and derived results. Current approaches to visualising pan-genomes do not automatically lead to easily-interpretable results using the current set of data integration tools. Therefore, we aim to evaluate and prototype a set of visualisation modules that can be (re)used in multiple plant databases. These will connect complementary resources and enhance the extraction of biological knowledge for key agricultural species.

Based on the expertise of all partners involved and the numerous interactions with different stakeholders involved in data generation, tool development or adaptation, as well as biological interpretation of plant genome information, these objectives will be addressed in multiple work packages.

Project outcomes

Quality control standardisation

The E-PAN project will create standards for the Quality Control (QC) of pan-genomes, and provide reference implementations that adhere to these standards. 
Additional standardisation proposals for working with pan-genomes will also be provided, such

Pan-genomics PAV characterisation and visualisation

The E-PAN project will develop and implement algorithms for the characterisation of pan-genome dynamics at the gene level. As a direct follow-up to PAV detection, visualisation templates will be provided so that researchers can inspect PAVs further.

Sharing pan-genome information

Development of standards for data exchange, and prototyping of said data formats, will lead to improvements in pan-genome research, as results and data can now more easily be shared and interpreted by multiple platforms.

Co-leads

Klaas Vandepoele, Sebastian Beier, Uwe Scholz, Keywan Hassani-Pak

ELIXIR Belgium, ELIXIR Germany, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR UK
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Ireland, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Israel, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Belgium, ELIXIR Netherlands

ELIXIR is about integration of diverse resources including tools, training materials and technical services. Within EXCELERATE, ELIXIR is building portals to collate information on tools and data services (bio.tools), training events and material (TeSS, WP11 e-learning environment), compute resources (WP4 technical service registry) and cross-linked policy, standards and databases (FAIRsharing, WP4). A focus of EXCELERATE is to set up these portals such that they can interoperate.

Currently, a scientist can use TeSS to find training events and materials and then, in a separate search, use bio.tools to find relevant tools, and FAIRsharing to find standards and databases. At the moment these ELIXIR portals provide a useful, but fragmented service. Ideally, linking TeSS and bio.tools to ELIXIR’s compute resources via common workflow diagrams would enable end-users to discover and learn about the prevalent bioinformatics workflows. In this implementation study, we want to achieve the first step and link TeSS and bio.tools via the most prevalent bioinformatics workflows and lay the foundation to later incorporate other ELIXIR platforms, such as the compute resources, to provide an even more useful service for the researcher.

The goal of this implementation study is to provide the life-scientist end-user with a powerful tool to find and use ELIXIR resources - across the spectrum - based on intuitive graphical diagrams of the most prevalent scientific workflows.

ELIXIR UK, ELIXIR Estonia, ELIXIR Belgium, ELIXIR Denmark, ELIXIR Switzerland, EMBL-EBI, ELIXIR Norway, ELIXIR France

Over the past four years, the ELIXIR Plant Sciences Community has been making large strides towards enabling FAIR plant phenotyping data: the ELIXIR plant data search service FAIDARE (https://urgi.versailles.inra.fr/faidare/) addresses findability by integrating the various BrAPI (https://brapi.org) end-points of ELIXIR Nodes, which address accessibility and ensure compliance with the MIAPPE metadata standard (https://www.miappe.org/) and therefore interoperability.

Combined, these resources represent a fully FAIR-compliant data management framework. However, there is one final critical hurdle impeding its broad adoption by plant scientists: there is no standardized user-friendly way to submit a dataset to a BrAPI end-point (or more precisely the database underlying it).

The goal of this project is to develop a web interface for MIAPPE-compliant data submission that can be deployed by any plant phenotyping database.

This interface will be modular, including an interactive web form for metadata entry mirroring the organization of MIAPPE, web services for key functionalities such as ontology lookup and validation of MIAPPE compliance, and modules for database entry that upload the data to a database. For the latter, we will develop a module that makes use of BrAPI PUT calls to upload data directly through a BrAPI endpoint, but also a module for uploading the data through FAIRDOM’s SEEK platform, which is already being deployed by some partners.
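The modular design described above can be sketched as a validation step followed by an interchangeable upload module. This is only an illustration under stated assumptions: the `Uploader` protocol, the two in-memory classes and the required field names are hypothetical, and the stand-in uploaders record data locally instead of calling BrAPI or SEEK.

```python
from typing import Protocol

# Hypothetical mandatory fields, standing in for a MIAPPE compliance check.
REQUIRED_FIELDS = ("investigation_title", "study_title")


class Uploader(Protocol):
    """Anything that can store a validated record and return a receipt."""

    def upload(self, record: dict[str, str]) -> str: ...


class InMemoryBrapiUploader:
    """Stand-in for a module that would send records through a BrAPI endpoint."""

    def __init__(self) -> None:
        self.sent: list[dict[str, str]] = []

    def upload(self, record: dict[str, str]) -> str:
        self.sent.append(record)
        return f"brapi:{len(self.sent)}"


class InMemorySeekUploader:
    """Stand-in for a module that would send records through a SEEK instance."""

    def __init__(self) -> None:
        self.sent: list[dict[str, str]] = []

    def upload(self, record: dict[str, str]) -> str:
        self.sent.append(record)
        return f"seek:{len(self.sent)}"


def missing_fields(record: dict[str, str]) -> list[str]:
    """List required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def submit(record: dict[str, str], uploader: Uploader) -> str:
    """Validate first, then hand the record to whichever module is plugged in."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return uploader.upload(record)


if __name__ == "__main__":
    record = {"investigation_title": "Example", "study_title": "Field trial 1"}
    print(submit(record, InMemoryBrapiUploader()))
    print(submit(record, InMemorySeekUploader()))
```

Keeping validation outside the upload modules means a database can swap its back end without changing the form or the compliance checks.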

The project is expected to substantially benefit the sustainability and increase the adoption of the data management framework put together by the ELIXIR Plant Sciences Community, which includes key ELIXIR services such as FAIDARE. Furthermore, the project will both build capacity and increase collaboration between ELIXIR Nodes on data management, and enhance interactions between Node experts contributing to the Plant Sciences Community.

ELIXIR Portugal, ELIXIR Netherlands, ELIXIR France, ELIXIR Belgium

Recent progress in sequencing technologies has produced several large-scale genotyping data sets for crops. The insights afforded by these data have been published in high-profile scientific articles, but the underlying raw genotype data and the associated sample and population metadata have not been routinely submitted to appropriate archives.

The aim of this implementation study, led by the ELIXIR Plant Community and in coordination with the ELIXIR Interoperability Platform and Data Platform, is to provide this wealth of data according to FAIR principles. It will ensure an interoperable link with the phenotypic data that is stored in distributed institutional repositories, which is crucial for accelerated crop breeding.

We propose to create a sustainable toolbox to submit data to the ELIXIR Deposition Database “European Variation Archive” (EVA) and enrich the data with interoperable metadata regarding plant data standards like “Multi-Crop Passport Descriptor” (MCPD) and “Minimum Information About a Plant Phenotyping Experiment” (MIAPPE).
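One check that such a toolbox might plausibly include is confirming that every sample in a genotype file has a matching metadata record before submission. The sketch below assumes the genotype data are in VCF, where sample names follow the nine fixed columns of the `#CHROM` header line, and that metadata are keyed by sample name. The sample names and metadata values are invented, and only the field names `ACCENUMB`, `GENUS` and `SPECIES` are taken from MCPD.

```python
def vcf_samples(vcf_text: str) -> list[str]:
    """Return the sample names from the #CHROM header line of a VCF."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            # Columns 0-8 are CHROM..FORMAT; samples start at column 9.
            return line.split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def unmatched(samples: list[str],
              metadata: dict[str, dict[str, str]]) -> tuple[list[str], list[str]]:
    """Return (samples lacking metadata, metadata records with no sample)."""
    missing_metadata = sorted(set(samples) - set(metadata))
    unused_metadata = sorted(set(metadata) - set(samples))
    return missing_metadata, unused_metadata


if __name__ == "__main__":
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
              "FILTER", "INFO", "FORMAT", "acc1", "acc2"]
    vcf_text = "##fileformat=VCFv4.3\n" + "\t".join(header) + "\n"

    # Invented metadata records using a few MCPD field names.
    metadata = {
        "acc1": {"ACCENUMB": "EX 0001", "GENUS": "Hordeum", "SPECIES": "vulgare"},
        "acc3": {"ACCENUMB": "EX 0003", "GENUS": "Hordeum", "SPECIES": "vulgare"},
    }

    no_metadata, no_sample = unmatched(vcf_samples(vcf_text), metadata)
    print("samples without metadata:", no_metadata)
    print("metadata without samples:", no_sample)
```

Reporting both directions of the mismatch helps catch renamed samples, which show up once as a sample without metadata and once as metadata without a sample.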

ELIXIR France, ELIXIR Germany, ELIXIR Belgium, ELIXIR Netherlands, EMBL-EBI
ELIXIR France

The standardisation and accessibility of plant data is a major challenge for agricultural research. MIAPPE, which was developed as part of the transPLANT and ELIXIR-EXCELERATE projects, has made a decisive contribution to unifying data capturing. Also, the FONDUE Implementation Study facilitated the integration of phenotypic and genotypic data. 

Nevertheless, challenges persist in achieving full FAIRness of plant data. The development of guidelines and best practice documents within the Commissioned Service INCREASING has improved this. However, further enhancements are required, such as providing additional documentation and reference datasets. 

To address these needs, it is important to assess the practical effort required to FAIRify datasets using MIAPPE, ISA, ARC and RO-Crate standards. The idea is to provide biologist-friendly data documentation and at the same time introduce machine-actionable formats for bioinformaticians to use. A further challenge arises from the scattered nature of the information, as there is no single resource in which all the information is collated.

In HARVEST, we aim to address these challenges by FAIRifying datasets (DROPS, AGENT) using the latest version of MIAPPE as a basis, which now covers more diverse and complex use cases. This process will include enriching the MIAPPE documentation in particular with example datasets, updating training material and refining mappings to other interoperable formats such as BrAPI, Bioschemas and ISA-Tab/JSON. We will also establish links using FAIDARE to repositories such as EMBL-EBI EVA, e!DAL-PGP, recherche.data.gouv and Zenodo, to enhance data sharing and reuse opportunities. An extension of the RDMkit Plant Sciences pages will be implemented to serve as a primary hub for information on FAIRification of plant data. Furthermore, we will be consolidating resources and improving accessibility through direct linking to the original web resources and recipes, also adding Jupyter notebooks to the FAIR Cookbook where possible.

Co-lead

Sebastian Beier

ELIXIR Germany, ELIXIR France, ELIXIR Netherlands, ELIXIR UK, EMBL-EBI

The Plant Sciences Community has already implemented some critical elements of its roadmap, but needs some funding to coordinate the next steps.

The first point is about disseminating ELIXIR results through reusable training material and service bundles. The target audience will be biologists, agronomists and bioinformaticians involved in data production and analysis.

The second point is about improving data findability. We propose to specify a European one-stop portal giving access to plant data and tools in collaboration with the Data Platform. It will leverage existing ELIXIR resources: aggregation databases dedicated to plants (FAIDARE, Ensembl Plants, InterMine) and tools and standards collections (FAIRsharing, bio.tools).

Lastly, the gathering and formatting of data and metadata increasingly relies on community-driven toolboxes such as FAIRDOM-SEEK, COPO and ISA-Tools. There is an opportunity to improve their interoperability through aligned validation profiles and API use, hence easing submission to ELIXIR Databases.

ELIXIR France, ELIXIR Belgium, ELIXIR Germany, EMBL-EBI, ELIXIR Greece, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR UK
ELIXIR France