ELIXIR Data Platform (2024-26)

Work plan overview

The ELIXIR Data Platform has been successful in creating a vibrant group of people interested in all aspects of life science data, from generation and curation to storage, archiving, use and reuse.

It has achieved its first mission, defining a sustainable and well-funded collection of Core Data Resources (CDRs) that represent the gold standard on the world stage. More recently, it has begun developing and defining tools to boost data integration and curation capacities, and has helped grow a coordinated ecosystem of Node data resources within ELIXIR.

As the demand for data, and for its storage and interpretation, continues to grow and change, it is appropriate for the Data Platform to adapt collectively and look to the next chapter, addressing the technical and societal challenges faced by users and life science researchers in Europe.

Looking to the future, the Data Platform will focus on three key themes:

  1. Strengthening data connectivity and accessibility through brokering;
  2. Recognition and credit attribution for contributors; and
  3. FAIRification of existing data.

It will also work to support a greater diversity of data resources by defining their context, quality and unique challenges within the ELIXIR framework.

Key to achieving this is the Platform's ability to leverage the robust network of ELIXIR members, reachable through Platforms, Communities and Focus Groups, to connect data users, contributors and resources across Nodes, while fostering collaborative discussions and exploiting technological advancements.

The Platform will deliver the services to support data resources in the life sciences through five complementary Work Packages (WPs):

Lead Partners: SIB (Switzerland), HITS (Germany), University of Padua (Italy)

The objective of the ELIXIR Data Platform Executive Committee (ExCo) is to manage and coordinate activities throughout the 2024-2026 work plan, ensuring smooth operation and collaboration within the Platform.

  • Monthly Coordination Calls will be organized by the ExCo to promote effective communication and collaboration among members, facilitating timely information exchange and progress updates.
  • An annual face-to-face meeting will also be arranged by the ExCo, offering a forum for detailed discussions, strategic planning and decision-making to advance the Platform's objectives.
  • Joint Platform Activities will be facilitated by the ExCo to encourage collaboration and knowledge sharing among members, potentially held in conjunction with the face-to-face meetings.
  • Oversight and Responsibility for implementing the work plan, monitoring progress and ensuring that objectives are completed on time will rest with the ExCo, which will hold dedicated monthly meetings, restricted to its members, for this purpose.
  • Regular reporting and progress updates will be provided by the ExCo to ELIXIR governance bodies and relevant stakeholders, promoting transparency and accountability.
  • The ExCo will actively engage with external partners and organisations for collaboration and resource sharing, to enhance the Platform's capabilities and impact.
  • The ExCo will evaluate the outcomes of Platform activities, identifying areas for improvement and proposing adjustments to enhance operational efficiency and effectiveness.

Activity 1: Face-to-face meetings, alternating between hosts and held jointly with other Platforms

This activity involves organising an annual face-to-face meeting for ELIXIR Data Platform members, held in loose collaboration with one of the other ELIXIR Platforms. Hosting will rotate among the ExCo members and their member states.

The purpose of these meetings is to provide a dedicated forum for in-depth discussions, strategic planning and decision-making. The ExCo will collaborate with the ELIXIR Hub to plan and run the meeting. The agenda will address key topics related to the Platform's goals, initiatives, progress, challenges and opportunities.

The meeting will also provide a unique opportunity for Platform members to interact in person, fostering stronger working relationships and promoting a sense of unity within the Platform and across ELIXIR. The ExCo will ensure that the meeting outcomes are documented and disseminated.

Activity 2: Monthly virtual meetings

The monthly teleconferences (TCs), or virtual meetings, are a crucial element of the coordination and collaboration strategy of the ELIXIR Data Platform. They will provide a forum for communication, information exchange and coordination of activities among Platform members. The ExCo will oversee the scheduling of the TCs and prepare an agenda for each.

Lead Partners: EMBL-EBI (international organisation), VIB (Belgium)

Activity 1: Landscape of data brokering

Data brokering is the act of submitting data to one or more databases on behalf of another person or institute. It involves coordination between data producers at an institution within an ELIXIR Node (which provides data management support) and deposition databases such as the ELIXIR Deposition Databases (EDDs).

Moreover, data producers might use platforms or technical services (e.g. brokerage platforms) provided by the Node to collect and prepare (meta)data for publication in databases. Multi-omics studies often also require references among several datasets published in different databases, to preserve the relations among the data (data linkage). In all of these use cases, the connections rely on data exchange formats and on mappings between metadata schemas.
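The two ingredients named above, mapping between metadata schemas and linking datasets split across databases, can be illustrated with a minimal Python sketch. It assumes a flat key-value metadata record and a one-to-one field mapping; the field names (`sample_name`, `scientific_name`, etc.), database labels and accessions are hypothetical placeholders, not the schema of any real ELIXIR Deposition Database.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a Node's internal metadata field names to the
# field names expected by one target deposition database. In practice the
# target names come from that database's own submission schema.
FIELD_MAP: Dict[str, str] = {
    "sample_name": "title",
    "organism": "scientific_name",
    "collection_date": "collection_date",
}


def map_record(record: Dict[str, str],
               field_map: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Rename the fields of one metadata record according to field_map.

    Returns the mapped record plus the source fields that have no mapping,
    so the broker can flag them for manual curation instead of dropping them.
    """
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for key, value in record.items():
        if key in field_map:
            mapped[field_map[key]] = value
        else:
            unmapped.append(key)
    return mapped, sorted(unmapped)


def link_datasets(accessions: Dict[str, str]) -> Dict[str, List[str]]:
    """Given {database: accession} for the parts of one study, return for each
    accession the accessions held in the other databases (all-to-all linkage)."""
    return {
        acc: sorted(other for other_db, other in accessions.items() if other_db != db)
        for db, acc in accessions.items()
    }
```

Returning the unmapped fields, rather than silently discarding them, reflects the fact that a broker usually needs a human decision whenever the source and target schemas do not line up.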

This task will build on and continue the work from the ELIXIR-CONVERGE project T1.2 on “Models for brokering data to ELIXIR Deposition Databases”, and in particular on multi-omics studies. The focus will be on the definition of “brokering” and landscape analysis of ELIXIR data resources and their existing requirements for data linkage and submission of multi-omics studies.

Activity 2: Defining best practices for data brokering

Data brokering can operate at different levels of operational complexity, using different data exchange formats and metadata standards. In this task, best practices and guidelines for data brokering will be defined to help repositories develop or improve data submission and data linkage.

Activity 3: Implementation use cases of best practices

This activity will build on the work of Activity 2 (WP2.2) and on the three data brokering scenarios analysed in the ELIXIR-CONVERGE project (WP1 T1.2), including the results of BioHackathon 2022 project 27.

The work will initially explore and further develop solutions based on the ISA abstract model and its implementation as ISA-JSON, which were used in ELIXIR-CONVERGE for brokering multi-omics studies. Then, the feasibility of extending the same approach to the other scenarios will be investigated.
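To show why the ISA model suits multi-omics brokering, the following Python sketch builds a minimal ISA-JSON-style skeleton: one investigation containing one study with two assays (transcriptomics and metabolomics) that would each be brokered to a different deposition database while remaining linked through the shared study. It is illustrative only and is not validated against the ISA-JSON schemas; the identifiers, filenames and term values are hypothetical, and a real submission would also need sources, samples, protocols, processes and ontology term sources.

```python
import json


def ontology_annotation(value: str) -> dict:
    # Minimal ISA OntologyAnnotation; termSource and termAccession are left
    # empty here, whereas a real submission would cite an ontology term.
    return {"annotationValue": value, "termSource": "", "termAccession": ""}


def multiomics_investigation() -> dict:
    """Hypothetical multi-omics investigation: one study, two assays."""
    assays = [
        {
            "filename": "a_transcriptomics.txt",
            "measurementType": ontology_annotation("transcription profiling"),
            "technologyType": ontology_annotation("nucleotide sequencing"),
        },
        {
            "filename": "a_metabolomics.txt",
            "measurementType": ontology_annotation("metabolite profiling"),
            "technologyType": ontology_annotation("mass spectrometry"),
        },
    ]
    study = {
        "identifier": "S1",
        "title": "Example multi-omics study",
        "filename": "s_study.txt",
        "assays": assays,
    }
    return {"identifier": "I1", "title": "Example investigation", "studies": [study]}


# Serialise the skeleton as JSON, as a broker would before splitting it
# into per-database submissions.
doc = json.dumps(multiomics_investigation(), indent=2)
```

Because both assays hang off the same study record, the relation between the transcriptomics and metabolomics datasets is preserved even after each part is deposited in a different database.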

Following the work of activity 1 (WP2.1) and activity 2 (WP2.2), additional use cases will be selected to explore how the ISA-based solution can be expanded to support the proposed best practices for a wider range of domains/techniques and corresponding ELIXIR deposition databases. The activity will also evaluate a selection of complementary technologies (including RO-Crate) to develop a high-level technical proof of concept showcasing the feasibility and functionality of at least one of the representative use cases.

Lead Partners: University of Padua (Italy)

This WP will support the recognition and credit attribution of activities that curate, annotate or otherwise contribute data to relevant resources such as knowledge bases. Further, it will build on the work of APICURON to provide an infrastructural component that can be used in contexts beyond traditional biocuration (e.g. of knowledge bases), ranging from ELIXIR registries (e.g. bio.tools, RDMkit, FAIR Cookbook, FAIRsharing) to code contributions (e.g. on GitHub) and data management/stewardship (e.g. data brokering).

The resulting service will link the mapped data citations and credits for primary contributions to citation networks (e.g. OpenAIRE KG). It will also explore the possibility of generalising the approach to other contexts, such as crediting trainers in Training Platform activities. In addition to the technological component, it will explore the “sociological” implications for the “person infrastructure”, including career pathways, fostering adoption and user engagement (e.g. through gamification techniques).

Activity 1: Technical developments for recognition 

This task will continue the technical developments for implementing recognition and credit attribution mechanisms. The APICURON platform, currently restricted to curated databases, will be expanded to cover a wider range of activities, based on user input. Onboarding additional ELIXIR resources will require technical changes and generate requests for additional features, both of which this task will support.

A major step will be the establishment of a prototype mechanism for harvesting GitHub contributions on a small subset of selected profiles, e.g. training materials developed using the GitHub template established by the Training Platform. The visualisation of additional statistics for curators, both on the APICURON website and via widgets for third-party websites, will be supported based on the outputs of WP3.2. In addition, work will be carried out to create appropriate Bioschemas entities for the information in APICURON, in order to connect it to the OpenAIRE Knowledge Graph.

Activity 2: Engagement and gamification for recognition

This task focuses on the sociological aspects related to recognition and credit attribution. It will define strategies for implementing gamification effectively, through the evaluation of use cases and establishment of guidelines.

The work will span two orthogonal views: that of the resources implementing recognition and credit attribution mechanisms (resource view, e.g. PomBase) and that of the individual contributors (people view, e.g. curators).

The resource view will help define best practices for the granularity of events to be captured in recognition and how to attribute credit for them. The people view will instead focus on how to promote engagement via gamification, i.e. how to build meaningful statistics based on the captured events.

Activity 3: Outreach and engagement with other initiatives

This task focuses on promoting alignment with, and uptake of, the recognition and credit infrastructure by other initiatives, both within and outside ELIXIR (e.g. Bionomia, LifeWatch). It will create opportunities for Community engagement by presenting the recognition work at ELIXIR meetings (e.g. All Hands Meeting, BioHackathon) and within Platforms, e.g. by aligning with related, concurrent activities of the Training Platform.

It will also provide the necessary coordination with relevant stakeholders for alternative career assessment, including EU projects (such as STEERS, EVERSE and GraspOS) and the EOSC task force on research careers.

Lead Partners: SIB (Switzerland)

The backlog of supplementary data attached to published scientific reports, together with content deposited in generalist repositories (the so-called long tail of data), is a potential goldmine for research. Unfortunately, these data are buried in millions of semantically poor files, and this long tail needs to be FAIRified using automatic methods.

Activity 1: Coordination of literature curation challenges and practices

The primary channel of communication in life science is and remains the literature. However, practices and standards for literature publication are evolving and a growing set of publications contains structured dataset descriptions (e.g., DOME) and links to specialised data repositories or general ones (e.g., Zenodo).

Further, although CDRs and Community Databases may cover very heterogeneous biological domains, we aim to define a minimal data typology shared by all data resources, covering: paper- vs. passage-centric curation; continuous vs. session-driven curation efforts; the role of supplementary data and generalist repositories in curation guidelines; and named entities, as in Europe PMC/SIBiLS annotations.

Activity 2: Turning the long tail of literature and supplementary data into FAIR digital objects

This subtask aims to complement transformational efforts focusing on the FAIRification of literature using semantic web technologies. The idea is to leverage ongoing efforts (e.g. PDF2JATS, RO-Crate, DOME, Zenodo) both to improve FAIR archiving standards and to explore how objects in such formats can be discovered through an index or a Knowledge Graph.

First, we plan to establish a communication channel with related European initiatives. We will leverage EU research infrastructure projects, such as BiCIKL or FAIRClinical (ELIXIR-LU, CH, FR, UK), to coordinate efforts in and outside ELIXIR with lead stakeholders (e.g., CERN, LifeWatch, GBIF, GBC).

Second, the task will explore how these digital objects can be made available for discovery.

Activity 3: Accessing traceable author statements from curated databases

ELIXIR Core Data Resources and Community Databases tend to cross-reference articles to give their end users access to the source of their knowledge. Unfortunately, the unit of evidence is generally the whole article, which can amount to 20-30 pages of PDF.

Such granularity is often too coarse to give efficient access to an explicit, traceable author statement. However, some databases and communities (e.g. systems biology, rare diseases, biodiversity) record evidence at a finer granularity. In particular, GeneRIFs (Gene References Into Function) and MINT/IntAct can track evidence at the level of a single sentence. The same applies to biotic interactions.

Based on a sample of GBC resources, CDRs and Community Databases (e.g. DisProt, Cellosaurus, MINT/IntAct, OLIDA), and together with WP3.1, we propose to explore how published evidence could be better captured, cross-referenced and displayed. Such a curation model will leverage methods to uniquely identify both sentences and sections in articles (e.g. Europe PMC SciLite, Biocuration Toolkit), thereby evolving article and supplementary data representation standards such as JATS and BioC.
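As an illustration of what sentence-level evidence identifiers could look like, the Python sketch below splits the text of one article section into sentences and assigns each a positional identifier plus a short content fingerprint. The identifier scheme (`article#section.sN`), the article ID and the naive regular-expression sentence splitter are all hypothetical assumptions for illustration; they are not the scheme used by Europe PMC SciLite, the Biocuration Toolkit, JATS or BioC.

```python
import hashlib
import re
from typing import List, Tuple

# Naive sentence boundary: ".", "!" or "?" followed by whitespace and an
# uppercase letter. A real pipeline needs a proper sentence splitter, since
# abbreviations such as "e.g." or "Fig." would break this rule.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def sentence_ids(article_id: str, section: str,
                 text: str) -> List[Tuple[str, str, str]]:
    """Return (sentence_id, fingerprint, sentence) triples for one section.

    sentence_id is positional (article#section.sN); the fingerprint is the
    first 8 hex digits of the sentence's SHA-1, which lets a database detect
    whether the sentence a stored identifier points to has changed in a later
    version of the article.
    """
    sentences = [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]
    triples = []
    for n, sentence in enumerate(sentences, start=1):
        fingerprint = hashlib.sha1(sentence.encode("utf-8")).hexdigest()[:8]
        triples.append((f"{article_id}#{section}.s{n}", fingerprint, sentence))
    return triples
```

Pairing a positional identifier with a content fingerprint is one possible design: the position makes the reference human-readable, while the fingerprint guards against silent drift when an article is revised.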

Lead Partners: SIB (Switzerland), University of Padua (Italy)

Activity 1: Leverage the interactions with the Global Biodata Coalition to support the CDR and outreach

The aim of this task is to coordinate the interactions between the ELIXIR Data Platform and the Global Biodata Coalition (GBC). It also covers the established assessment processes for granting CDR/EDD status and the Periodic Review of existing resources.

Activity 2: Establish guidelines for Community Database identification and monitoring

This task aims to establish a robust and simple procedure for identifying and badging ELIXIR Community Databases. This entails a landscape analysis covering all ELIXIR databases, based on the Nodes’ service delivery plans (SDPs), to establish simple minimum quality criteria for Community Databases. The criteria should form a checklist that can be applied without further discussion or complex assessment. At the end of the process, a first iteration of the ELIXIR Community Database badge should be awarded to all data resources meeting the criteria.

Activity 3: Apply and adapt monitoring methodology for indicators

The current criteria defining ELIXIR CDRs include various bibliometric indicators, such as counts of mentions, citations and database accession numbers (e.g. via identifiers.org). This activity will produce recommendations for an improved list of Key Performance Indicators (KPIs) to support the evaluation of ELIXIR databases.
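To make the counting of accession-number mentions concrete, the Python sketch below counts compact identifiers of the form `prefix:accession` (the syntax resolved by identifiers.org) in a collection of texts. It reports both total mentions and the number of documents mentioning each database prefix. The regular expression is a simplified assumption rather than the registry's own validation rules, and the prefix whitelist and example texts are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Set, Tuple

# Simplified compact-identifier pattern "prefix:accession". The accession may
# contain internal dots but not a trailing one, so sentence punctuation is
# not captured. This is an assumption, not the identifiers.org registry rule.
CURIE = re.compile(r"\b([a-z][a-z0-9._]*):([A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*)")


def count_mentions(texts: Iterable[str],
                   known_prefixes: Set[str]) -> Tuple[Counter, Counter]:
    """Count compact-identifier mentions per database prefix.

    Returns (mentions, documents): total mentions per prefix, and the number
    of documents mentioning each prefix at least once. Only prefixes in
    known_prefixes are counted, filtering out false positives like "ratio:1".
    """
    mentions: Counter = Counter()
    documents: Counter = Counter()
    for text in texts:
        seen = set()
        for prefix, _accession in CURIE.findall(text):
            if prefix in known_prefixes:
                mentions[prefix] += 1
                seen.add(prefix)
        documents.update(seen)
    return mentions, documents
```

Reporting document counts alongside raw mentions is one way to avoid a single heavily cross-referencing paper inflating an indicator, which is the kind of refinement an improved KPI list might need to consider.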