The Galaxy Community and ELIXIR organised a webinar series to demonstrate how open software and public research infrastructures can be used to analyse and publish SARS-CoV-2 data.
In a series of five webinar sessions, experts from ELIXIR and the Galaxy community in the US and Europe demonstrated how open access and open science are fundamental to a fast and efficient response to public health crises. The focus was on research reproducibility and transparency, using only open source tools and the Galaxy platform.
The goal of the series was to demonstrate publicly accessible infrastructure and workflows for SARS-CoV-2 data analysis. The webinar sessions guided participants step by step through setting up and executing the SARS-CoV-2 data analysis workflows developed by the global Galaxy community. Participants who completed the series should be able to reproduce the workflows in full and conduct their own analyses of SARS-CoV-2 data.
The webinar series took place between 30 April and 28 May 2020.
More information about Galaxy analyses of COVID-19 data: covid19.galaxyproject.org
Programme
Session 1: Introduction to Galaxy and the Galaxy workflows for SARS-CoV-2 data analysis
30 April 2020, 17.00-18.00 CEST (starts at 16.00 BST, 11.00 EDT, 8.00 PDT)
The first session introduced the Galaxy platform and the other public research infrastructure used throughout the webinar series. It also explained the motivation behind the Galaxy COVID-19 projects and the benefits of open, reproducible research and of transparent, interoperable analytics.
Speakers:
- Anton Nekrutenko, Professor of Biochemistry and Molecular Biology, Penn State University, USA, and co-founder of the Galaxy Project
- Sergei Pond, Professor of Biology, Institute for Genomics and Evolutionary Medicine, Temple University, USA
- Frederik Coppens, Head of Node of ELIXIR Belgium and Co-Lead of the ELIXIR Galaxy Community
- Björn Grüning, Technical Coordinator of ELIXIR Germany and Co-Lead of the ELIXIR Galaxy Community
Session 2: Genomics/Variant Calling
7 May 2020, 17.00-18.00 CEST (starts at 16.00 BST, 11.00 EDT, 8.00 PDT)
The second session presented the initial analysis of the SARS-CoV-2 genome, published as a preprint on bioRxiv. It guided participants through accessing and collecting the available datasets, assembling the genome, and analysing within-sample sequence variants. It also explained how to deploy all the tools and workflows needed to reproduce the analysis on a Galaxy instance.
Speakers:
- Anton Nekrutenko, Professor of Biochemistry and Molecular Biology, Penn State University, USA, and co-founder of the Galaxy Project
- Wolfgang Maier, Postdoc at the University of Freiburg and member of the European Galaxy team
- Marius van den Beek, Institut Curie, Paris, France
Session 3: Cheminformatics: Screening of the main protease
14 May 2020, 17.00-18.00 CEST (starts at 16.00 BST, 11.00 EDT, 8.00 PDT)
This session presented the Galaxy workflow for identifying candidate molecules for COVID-19 drug treatment using molecular docking simulations against the SARS-CoV-2 main protease. These simulations predict the binding poses of the candidate molecules in the protease binding site, score the quality of each pose, and compare the results with experimental crystallographic data.
The computationally intensive workflow was executed on a distributed compute network available via the Galaxy Europe platform. Special emphasis was given to the complex methods applied, which consumed more than 25 years of CPU and GPU time.
Speakers:
- Tim Dudgeon, Founder and CEO of Informatics Matters
- Simon Bray, PhD at the University of Freiburg and member of the European Galaxy team
Session 4: Evolution of the Virus
20 May 2020, 17.00-18.00 CEST (starts at 16.00 BST, 11.00 EDT, 8.00 PDT)
Speakers:
- Sergei Pond, Professor of Biology, Institute for Genomics and Evolutionary Medicine, Temple University, USA
Session 5: Behind the scenes: Global Open Infrastructures at work
28 May 2020, 17.30-18.30 CEST (starts at 16.30 BST, 11.30 EDT, 8.30 PDT)
This session presented the Pulsar network, which connects data centres and high-performance computing (HPC) clusters so that they can share their compute power with Galaxy users. It also gave examples of how to submit an analysis job from the user's perspective.
Speakers:
- Gianmauro Cuccuru, University of Freiburg, Germany, member of the European Galaxy team
- Marco Antonio Tangaro, The Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
- Simon Gladman, University of Melbourne, Australia
- Nate Coraor, Pennsylvania State University, USA
- Frederik Coppens, Head of Node of ELIXIR Belgium and Co-Lead of the ELIXIR Galaxy Community
- Björn Grüning, Technical Coordinator of ELIXIR Germany and Co-Lead of the ELIXIR Galaxy Community
Acknowledgement
The analyses were performed using the Galaxy platform and open source tools from Bioconda. Tools were run using XSEDE resources maintained by the Texas Advanced Computing Center (TACC), the Pittsburgh Supercomputing Center (PSC), and Indiana University in the USA; de.NBI and VSC cloud resources and IFB cluster resources in Europe; STFC-IRIS at the Diamond Light Source in the UK; and ARDC cloud resources in Australia.