A beacon in the ocean of SARS-CoV-2 data

An ELIXIR Spain platform allows researchers to browse SARS-CoV-2 variability at the genome, amino acid, structural and motif level.

When responding to a health crisis, data play a critical role in identifying drug targets, vaccines or disease-related symptoms, and understanding individual variations in response to the infection. Yet the tools, standards and computational workflows to analyse these data play an even more prominent role. Today, data analysis is often multi-faceted, combining multi-omics data with clinical and even environmental data. However, many studies require a more holistic approach; for instance, looking at small amino acid variations while navigating the changes at the genomic level. 

Bioinformaticians from the Center for Genomic Regulation (CRG), based in Barcelona and part of ELIXIR Spain, have developed a platform that fills gaps in the ocean of SARS-CoV-2 data: the COVID-19 Viral Beacon. It gives scientists the means for in-depth analysis of raw and consensus COVID-19 data drawn from several sources: European Nucleotide Archive (ENA), Oxford Nanopore, Illumina, NCBI/SRA and GISAID.

What can the COVID-19 Viral Beacon offer?

The platform is a one-stop shop. It allows users to search for specific genetic variants and explore the associated metadata, for instance by filtering viral strains from a particular geographic region.

Additionally, the platform offers a user-friendly interface. Researchers can focus on testing their research hypotheses instead of immersing themselves in the time-consuming tasks of downloading, analysing and curating data or building software. Jordi Rambla, Team Leader at CRG, explained: ‘We have developed the CRG Viral Beacon platform to facilitate knowledge flow among genomic researchers, epidemiologists and amateurs. The platform is readily accessible to quickly search SARS-CoV-2 genomic information even via mobile phones.’

The platform supports five types of query:

  • SNP query — search for single nucleotide polymorphisms, insertions or deletions.
  • Region query — search for all variants within a given position range.
  • Feature query — search for all variants in the genomic annotation.
  • Motif query — search for short motifs on genomic regions.
  • Amino Acid query — search for amino acid changes on protein regions.
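
To make the query types above more concrete, here is a minimal in-memory sketch in Python. It is not the Viral Beacon's implementation or API: the `Variant` class, the function names and the tiny variant list are all hypothetical, and the feature coordinates are the commonly cited positions on the NC_045512.2 reference genome, included only for illustration. The amino acid query is left out because it would need a translation step.

```python
from dataclasses import dataclass

# Commonly cited feature coordinates on the SARS-CoV-2 reference genome
# (NC_045512.2), 1-based and inclusive; included for illustration only.
FEATURES = {
    "5'UTR": (1, 265),
    "ORF1ab": (266, 21555),
    "S": (21563, 25384),
}


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one variant (1-based genomic position)."""
    position: int
    ref: str
    alt: str


# Toy variant set for the sketch; this is not Viral Beacon output.
VARIANTS = [
    Variant(241, "C", "T"),
    Variant(3037, "C", "T"),
    Variant(14408, "C", "T"),
    Variant(23403, "A", "G"),
]


def snp_query(variants, position, ref, alt):
    """SNP query: is this exact change present in the set?"""
    return Variant(position, ref, alt) in variants


def region_query(variants, start, end):
    """Region query: all variants within an inclusive position range."""
    return [v for v in variants if start <= v.position <= end]


def feature_query(variants, feature):
    """Feature query: all variants inside one annotated feature."""
    start, end = FEATURES[feature]
    return region_query(variants, start, end)


def motif_query(sequence, motif):
    """Motif query: 1-based start positions of every motif occurrence."""
    hits = []
    idx = sequence.find(motif)
    while idx != -1:
        hits.append(idx + 1)
        idx = sequence.find(motif, idx + 1)  # allow overlapping matches
    return hits
```

In this sketch a feature query is simply a region query whose range is looked up from the annotation, which is one plausible way to relate the two.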

A theoretical scenario

A COVID-19 patient develops severe symptoms and presents no other risk factors, such as obesity or diabetes. Researchers hypothesise that a distinct genetic variant of the virus could confer higher virulence.

However, most available databases, such as GISAID, provide only consensus sequences (the most representative sequence for each sample). Consensus data are crucial for epidemiological studies, but they mask variants that are present at low frequency within each individual's viral population.
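
A small Python sketch can show why a consensus sequence hides such variants. It assumes a simple majority-vote consensus at a single genome position; the read counts and the 5% reporting threshold are invented for illustration and do not describe how any of the databases above build their consensus sequences.

```python
from collections import Counter


def allele_frequencies(bases):
    """Fraction of reads supporting each base at one position."""
    counts = Counter(bases)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


def consensus_base(bases):
    """Majority-vote consensus: the single most frequent base."""
    return Counter(bases).most_common(1)[0][0]


def minor_alleles(bases, min_freq=0.05):
    """Non-consensus bases at or above an assumed reporting threshold."""
    major = consensus_base(bases)
    return {
        base: freq
        for base, freq in allele_frequencies(bases).items()
        if base != major and freq >= min_freq
    }


# Hypothetical reads covering one position in one sample:
# 90 reads support A and 10 reads support G.
pileup = "A" * 90 + "G" * 10

print(consensus_base(pileup))  # the consensus keeps only the majority base
print(minor_alleles(pileup))   # the raw reads still reveal the minority base
```

Here the consensus reports only `A`, while the raw reads show that one read in ten carries `G`. That within-sample minority is the information a consensus-only database cannot provide.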

The Viral Beacon can shed some light on this matter. Thanks to well-defined queries, it can help challenge hypotheses and unveil insights hidden in the vast amounts of COVID-19 data. Researchers can quickly analyse both raw and consensus data to discover variants that, for instance, could account for individual differences in cellular tropism or immune response that may lead to more severe clinical phenotypes.

The team seeks to make this scenario a reality. Babita Singh, part of the CRG team, said, ‘Now we seek additional collaborations with experts in the field to help us extend the functionalities of this tool to make Beacon a quick go-to genomic variants search tool for COVID and other infectious diseases. In future, our goal is also to include human genomic variants information to study the interplay between human and virus genomics.’

The architecture behind the platform — a GA4GH product

CRG’s Viral Beacon uses the Beacon API, a GA4GH product whose prime goal is to share genomic data in biomedical research without compromising privacy. ELIXIR has been a long-time investor in the development and implementation of the Beacon API; the Viral Beacon provides an exciting extension of its use.

Initially, the GA4GH Beacon API facilitated the discovery of human single nucleotide polymorphisms (SNPs), but since its inception the protocol has evolved towards more complex applications with increased functionality.
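
The original Beacon idea can be sketched in a few lines of Python: a client asks whether a given allele has been observed, and the service answers only yes or no, without exposing the underlying records. The parameter names below follow the GA4GH Beacon v1 allele request as commonly documented, including its 0-based `start`. The base URL is a placeholder, the assembly identifier is a stand-in, and `answer_allele_query` is a hypothetical helper, so none of this should be read as the Viral Beacon's actual endpoint or response format.

```python
from urllib.parse import urlencode

# Placeholder base URL; this is NOT the real Viral Beacon endpoint.
BASE_URL = "https://beacon.example.org/query"


def build_allele_query(reference_name, start, reference_bases,
                       alternate_bases, assembly_id):
    """Build a Beacon v1-style allele query URL (start is 0-based)."""
    params = {
        "referenceName": reference_name,
        "start": start,
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
        "assemblyId": assembly_id,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def answer_allele_query(known_alleles, start, reference_bases, alternate_bases):
    """Hypothetical server side: reveal only whether the allele exists."""
    key = (start, reference_bases, alternate_bases)
    return {"exists": key in known_alleles}


# Toy allele store for the sketch: the A>G change at 1-based position 23403
# is stored with the 0-based start 23402.
known = {(23402, "A", "G")}

url = build_allele_query("NC_045512.2", 23402, "A", "G", "NC_045512.2")
response = answer_allele_query(known, 23402, "A", "G")
```

Returning a bare `exists` flag is what lets a beacon advertise the presence of a variant while keeping individual-level data private.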

A few months ago, CRG modified and extended the architecture and functionality of this tool for SARS-CoV-2 data discovery. The ELIXIR Spain Node started compiling public sequence data through dedicated pipelines (read the Galaxy paper for more details).

The resulting product, the Viral Beacon, allows the efficient and rapid discovery and analysis of SARS-CoV-2 genetic variability and associated COVID-19 data, a critical development in the fight against this global pandemic.

‘The sudden blow of the pandemic demanded an “act first and think later” approach, to develop something useful that we can readily provide to the genomics community working on COVID-19 with what we had in hand. We are still adding new functionalities to Beacon platform and, in fact, this is the right time for experts in the field to jump in and guide us to design pathogen-specific Beacon platforms for infectious diseases’, said Rambla.

Tue 8 September 2020