New tool simplifies the submission of SARS-CoV-2 data to open databases

ENA page for submitting data with coronavirus icon

ELIXIR Belgium and ELIXIR Germany (de.NBI) help researchers share FAIR COVID-19 data

ELIXIR Belgium, in collaboration with ELIXIR Germany and the European COVID-19 Data Platform, has developed a tool to simplify the submission of viral sequencing data to the European Nucleotide Archive (ENA), an ELIXIR Core Data Resource providing open access to nucleotide sequences. The new submission tool offers an easy-to-use interface, guides researchers through the submission process and verifies the data format and description.

Why submit data to ENA?

The research community has undertaken an unprecedented effort to study the novel coronavirus and its disease, COVID-19. Thanks to this coordinated effort, there is now an abundance of SARS-CoV-2 genomes freely accessible through multiple international databases.

However, some databases, such as GISAID, overlook the underlying raw data and only provide controlled access to consensus pre-assembled genomes. ENA offers a clear advantage over this restriction: it gives access to both assemblies and raw data. The latter is essential not only to reproduce findings but also to repurpose data to address new scientific questions.

For instance, when viruses such as SARS-CoV-2 replicate their genome, mutations can occur, giving rise to genome variants within an individual host. These mutations are rare, so although they can be identified in the raw data, they are not represented in the consensus pre-assembled genomes. This process is important to understand as it is a major driving force behind the evolution of the virus and its disease.
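To make this concrete, the toy sketch below shows how a rare within-host variant can be visible in raw reads yet vanish from a consensus sequence. It assumes equal-length, already aligned reads and a simple majority vote per position. The function names, the 5% threshold and the read data are hypothetical illustrations, not part of ENA, Galaxy or the submission tool.

```python
from collections import Counter


def consensus(reads):
    """Return the majority base at each position of equal-length aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def minor_variants(reads, min_fraction=0.05):
    """List (position, base, fraction) for non-consensus bases at or above min_fraction."""
    cons = consensus(reads)
    found = []
    for pos, column in enumerate(zip(*reads)):
        counts = Counter(column)
        for base, n in sorted(counts.items()):
            fraction = n / len(column)
            if base != cons[pos] and fraction >= min_fraction:
                found.append((pos, base, fraction))
    return found


# Hypothetical toy data: ten aligned reads, one of which carries a G at position 3.
reads = ["ACGTAC"] * 9 + ["ACGGAC"]

print(consensus(reads))       # ACGTAC
print(minor_variants(reads))  # [(3, 'G', 0.1)]
```

The consensus reports only the majority base, so the G carried by one read in ten is recoverable only from the raw reads.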

What challenges does the new submission tool overcome?

ENA is a truly open system that stores detailed information about each deposited sequence. But uploading data to ENA can be a challenge for two reasons: a lack of bioinformatics skills and privacy requirements.

Firstly, submitting data may require researchers to learn bioinformatics skills, something they may lack the time to do during a pandemic. Frederik Coppens, ELIXIR Belgium Head of Node and Lead of the ENA submission, explains:

'Uploading data to ENA requires a high level of computational ability and can be time-consuming, especially for wet-lab scientists who have little or no bioinformatics background. Given these challenges, we wanted to develop a tool that would enable all researchers to easily share data. We joined forces with the Galaxy Europe team and made use of the graphical user interface in Galaxy; this simplifies and streamlines the submission process for the end-user.'

Secondly, all virus data uploaded to ENA must be free of human sequences, to protect the privacy of sensitive patient data. Running on the Galaxy platform, the ENA submission tool removes human sequences using a pre-processing workflow developed by Rob Finn’s team at the European Bioinformatics Institute. This enables and encourages more researchers to submit their data to ENA without fear of exposing sensitive human data.

Who can use the new tool?

The ENA submission tool is available to all researchers interested in submitting their data to ENA.

ELIXIR Belgium also offers scientists in Belgium personalised support. Frederik Coppens adds:

'Prior to developing the ENA submission tool, there were over 1200 SARS-CoV-2 genome submissions from Belgium, but none were deposited to ENA. We are currently reaching out to researchers within Belgium that have already deposited SARS-CoV-2 genomes to databases to offer our support in submitting their data to ENA.'

What does the future hold?

Moving forward, ELIXIR Belgium plans to expand the tool to multiple databases and improve the user experience. The current submission tool and future iterations will support the ELIXIR open data infrastructure and facilitate sharing and exchange of life science data globally.

This work has been developed by ELIXIR Belgium (VIB-UGent Center for Plant Systems Biology), Galaxy Europe (funded by de.NBI) and EBI, and has been supported by:

  • Research Foundation — Flanders (FWO) for ELIXIR Belgium (I002819N)
  • The European Union’s Horizon 2020 research and innovation programme under grant agreement No 824087 (EOSC-Life)
  • The European Union’s Horizon 2020 research and innovation programme under grant agreement No 871075 (ELIXIR-CONVERGE)
  • German Federal Ministry of Education and Research (BMBF) for de.NBI (031 L0101C)
Tue 17 November 2020