ELIXIR-CONVERGE WP8 will provide the technical and operational components of the European COVID-19 Data Platform. These components will serve to strengthen national operations around COVID-19 data - on both viral and human data sides - through the ELIXIR Nodes.
The work will include assembling existing elements of informatics infrastructure (drawn from core EMBL-EBI infrastructure and from pathogen-focused tools built under the COMPARE project), extending and enhancing these elements, and operationalising them in both the COVID-19 Data Portal and the SARS-CoV-2 Data Hubs.
|O8.1||Data management support for EU projects||Task 8.1|
|O8.2||Mobilisation of analysis upon SARS-CoV-2 sequence data||Task 8.2|
|O8.3||Enhanced access to data, tools and support||Task 8.3|
Task 8.1 Data management support
We will provide support for data management for EU projects.
Sub-task 8.1.1: This sub-task will ensure appropriate availability, through the COVID-19 Data Portal, of data and tools from EU research projects in which EMBL-EBI is a partner (other than VEO and ReCoDID, as these are covered in their respective projects). The data and tools covered include the following (aligned EU projects shown in parentheses):
- deeply mined COVID-19 related literature, from Europe PMC. Development work on this will include COVID-19 specific text mining as an EMBL-EBI contribution to the OpenAIRE initiative (OpenAIRE);
- compound screening and assay data relating to ongoing COVID-19 related work (EUbOPEN and eTRANSAFE);
- chemoinformatics tools that will assist with the integration of compound-related COVID-19 data (EU-ToxRisk, EUbOPEN and TransQST);
- access to tools and interfaces (such as metadata validation and discovery tools) relating to clinical and epidemiological data (CINECA).
Sub-task 8.1.2: This sub-task will develop and implement data management plans for the other EC-funded Horizon 2020 COVID-19 projects, namely those awarded under SC1-PHE-CORONAVIRUS-2020 that collect primary clinical-epidemiological or omics data types or biospecimens. We will offer direct support for the implementation of these data management plans, for example through the curation of study metadata and omics data to internationally accepted standards for the given data types, and through the provision of tools and utilities for local data transformation and validation.
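As an illustration of the kind of local validation utility mentioned above, the following minimal Python sketch checks a study metadata record against a small checklist before submission. The field names and the `REQUIRED_FIELDS` checklist are hypothetical and chosen only for illustration; a real utility would instead encode the internationally accepted standard for the data type in question.

```python
from datetime import date


def _is_iso_date(value):
    """True when value is a string in ISO 8601 calendar-date form (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def _is_nonempty_text(value):
    """True when value is a string containing something other than whitespace."""
    return isinstance(value, str) and value.strip() != ""


# Hypothetical checklist: field name -> function that accepts or rejects a value.
REQUIRED_FIELDS = {
    "sample_id": _is_nonempty_text,
    "collection_date": _is_iso_date,
    "geographic_location": _is_nonempty_text,
    "host": _is_nonempty_text,
}


def validate_record(record):
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    for field, is_valid in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not is_valid(record[field]):
            problems.append(f"invalid value for {field}: {record[field]!r}")
    return problems
```

Returning a list of problems rather than stopping at the first one lets a data manager correct a whole record in a single pass.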
Sub-task 8.1.3: This sub-task will extend support to further COVID-19-related EU projects as these emerge. We will gather, develop and steer best practice in COVID-19 data management, and feed this into the Data Managers Network and the overall data management best practice developed within ELIXIR-CONVERGE WP1.
Task 8.2 Mobilise analysis
We will deploy the data processing and visualisation components of the SARS-CoV-2 Data Hubs system for the purposes of mobilising viral sequence data analysis at scale. The SARS-CoV-2 Data Hubs form one of the three components of the European COVID-19 Data Platform, and are built upon the foundations of the existing EMBL-EBI infrastructure. This infrastructure includes the European Nucleotide Archive (ENA), which is the open sequence database of record and the European node of the International Nucleotide Sequence Database Collaboration (INSDC), together with its services, as well as the extensions to this infrastructure put in place under the EU COMPARE project, known as “COMPARE Data Hubs”. Under other funding (EOSC-Life), multiple SARS-CoV-2 Data Hubs will be established for institutions, collaborating groups and nations. These will support the sharing of sequence data. We will:
- provide access to computational analysis workflows as required. These will provide processing according to sequencing library type, for example calling variants, producing assembled sequences, annotating coding features or performing phylogenetic analysis.
- source appropriate workflows from the open software community, adapting and extending them for the SARS-CoV-2 cloud computational environment as required, or writing them where suitable workflows do not exist. Choice of workflows will be guided by community requirements from EU, EMBL and ELIXIR Member States - in particular via the ELIXIR Nodes - and will include workflows such as those of the ARTIC network (https://artic.network/).
- provide intuitive web data exploration and visualisation environments.
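The idea of processing "according to sequencing library type" can be sketched as a simple lookup that maps a library type to an ordered list of processing steps. This is a minimal sketch only: the library types, the step names and the `select_workflow` function are hypothetical and do not describe the workflows actually deployed in the Data Hubs.

```python
# Hypothetical mapping from sequencing library type to ordered processing steps.
WORKFLOW_STEPS = {
    "amplicon": ["trim_primers", "map_reads", "call_variants", "build_consensus"],
    "metagenomic": ["remove_host_reads", "assemble", "annotate_coding_features"],
}


def select_workflow(library_type, with_phylogenetics=False):
    """Return the list of steps to run for a given sequencing library type."""
    key = library_type.strip().lower()
    if key not in WORKFLOW_STEPS:
        raise ValueError(f"no workflow registered for library type: {library_type!r}")
    steps = list(WORKFLOW_STEPS[key])  # copy so callers cannot alter the registry
    if with_phylogenetics:
        steps.append("place_on_phylogeny")
    return steps
```

Keeping the mapping in one registry means that a workflow sourced from the community can be added or swapped without touching the dispatch logic.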
The VEO project, which has a defined and bounded set of SARS-CoV-2 analyses, will have its own analysis mobilisation task. While available to users alongside ELIXIR-CONVERGE analyses, development work relating to VEO SARS-CoV-2 analyses will be funded exclusively by the VEO project.
Task 8.3 Enhance access
We will extend and enhance the access points for data in the system, providing tools and support for, as an example, automated synchronisation of all data in the platform into external computational facilities.
We will work with our many networks to enable data flow from the system and the connection of new third-party tools and interfaces. Drawing from and extending the tools and services offered via ELIXIR’s Tools, Interoperability and Compute Platforms, we will, for example, enable data and workflow access from within Galaxy, present data in compliance with BioSchemas.org and synchronise data into cloud instances. Under the ELIXIR Data Platform, work on the COVID-19 pandemic will allow us to present and develop ELIXIR Core Data Resources as key components of future infectious disease research.
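To make the BioSchemas.org point concrete, the sketch below builds schema.org `Dataset` markup as JSON-LD and wraps it in the script tag that a web page would embed. It is a minimal illustration under stated assumptions: the helper names are hypothetical, the example values in the usage note are placeholders, and the set of properties shown should be checked against the current Bioschemas Dataset profile rather than taken as complete.

```python
import json


def dataset_jsonld(name, description, identifier, url, keywords):
    """Build schema.org Dataset markup as a plain dictionary (JSON-LD)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "url": url,
        "keywords": list(keywords),
    }


def to_script_tag(markup):
    """Serialise the markup into the script element used to embed JSON-LD in HTML."""
    return '<script type="application/ld+json">' + json.dumps(markup, indent=2) + "</script>"
```

For example, `to_script_tag(dataset_jsonld("Example SARS-CoV-2 sequences", "Placeholder description", "EXAMPLE-001", "https://example.org/dataset/EXAMPLE-001", ["SARS-CoV-2", "sequence"]))` yields a block that can be placed in a dataset landing page so that crawlers and registries can discover the record.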
|D8.1||Access to EU tools and project data||Systems to capture and maintain user tools and datasets internal and external to EMBL-EBI’s molecular data resources using the ELIXIR bio.tools registry and the EMBL-EBI BioStudies database.|
|D8.2||Workshop to explore and plan the inclusion of tools and data from projects that have diverted towards COVID-19 research||Workshop to explore and plan inclusion of tools and data from projects that have diverted towards COVID-19 research.|
|D8.3||Raw sequence data processing workflow in operation||Systematic and autonomous processing and analysis of incoming data with preliminary results presentation.|
|D8.4||Phylogenetic tools and enhanced results visualisation||Improved navigation and visualisation tools for systematic processing and analysis, including phylogenetic trees.|
|D8.5||Data access points and tools||New and enhanced data access points to SARS-CoV-2 data supporting such operations as full data synchronisation, accessibility from external cloud compute and embedding in third party tools.|