Galaxy is an open, web-based platform for computational biomedical research. It allows researchers without programming experience to run data analysis workflows on their data, share their analyses with others, and enable others to repeat the same analysis. This makes science reproducible, facilitates the sharing of data and results, and removes the need for users to compile and install software tools.
ELIXIR's Galaxy Community evolved from its Galaxy Working Group, which was established in 2015 to monitor and foster the use of Galaxy in ELIXIR. The overall goal of the Community is to build on the achievements of the Working Group and to strengthen the Galaxy community, to make it easier to import data into Galaxy instances, to help develop and share Galaxy tools and workflows, and to increase the provision of Galaxy training. The Community was developed through an Implementation Study (starting September 2018, for 12 months).
Goals of the Community
The ELIXIR Galaxy Community was created in 2018. It builds on the work done by the ELIXIR Galaxy Working Group (2015-18), and its goals are:
To build a European network of Galaxy communities
- An increasing number of sub-communities have grown up to address specific tasks in Galaxy. The collaborative portal Workflow4Metabolomics (W4M) and PhenoMeNal, for example, are dedicated to handling metabolomic data, and feature in the work of the ELIXIR Metabolomics community.
- The Galaxy community's goal is to foster interaction between the data-specific Galaxy communities, to set up common analysis workflows and standards, and to provide training in these.
To extend Galaxy training provision
- Galaxy Training was established as a result of ELIXIR-organised workshops. It is a training material repository that contains over 70 tutorials and provides material for using and developing Galaxy, and we will continue to add to it. The repository is open for everyone to use and contribute to. We endeavour to keep it up to date and to expand it to cover all areas of the life sciences.
- Galaxy training material is already integrated and listed in TeSS, and we will add support for BioSchemas in the next year.
- We will promote the use of these resources and the accompanying workflows, and provide more information about where the training can be run: suitable instances, trainers, required tools and virtual Galaxy images. This will be done in collaboration with the Training, Tools and Compute Platforms.
To create a Galaxy cloud infrastructure across Europe
- The growing amount of data generated in the life sciences, and the growing number of Galaxy communities, require increasing amounts of computing power. Our aim is to facilitate access to a broad portfolio of analysis workflows for European researchers.
- Some nodes already offer a centralised instance of Galaxy (IFB, France; de.NBI, Germany). In 2018, in collaboration with the Galaxy Core Team, we launched usegalaxy.eu, hosted in Freiburg (de.NBI). We aim to expand this into a network of Galaxy instances worldwide, guaranteeing a base level of compatibility and enabling all training materials of the Galaxy Training Network. See the flyer for usegalaxy.eu.
- We also want to facilitate the usage of Galaxy across the different ELIXIR clouds, e.g. by using CloudLaunch as a single entry point for users.
- We aim to enable easy authentication to Galaxy instances by using the ELIXIR AAI. This will require a close collaboration with the Compute Platform in ELIXIR.
To make it easier to access and transfer data
- Getting data from public databases into a Galaxy instance is the first step for most analyses. However, identifying files and their URLs and uploading them to a computational environment is not easy for users with limited technical skills.
- We aim to facilitate uploading data into Galaxy instances from the Core Data Resources such as ENA, ArrayExpress, PRIDE and UniProt, and also from more specialised databases such as Brenda, Silva and RNACentral. To optimise data access and integration in Galaxy we need to standardise and automate data transfer. This will require close collaboration with the data providers. ELIXIR is ideally placed to achieve this, since life-science data providers across Europe are part of its Data Platform.
- We are working closely with the Galaxy community to create and maintain shared storage of common reference data for genomes across Galaxy instances, based on CVMFS. This will facilitate adding new genomes, including indices and annotations, to any Galaxy instance.
To improve tools and data integration
- Currently, a data-to-tools approach is prevalent in data analysis. This involves copying large volumes of data to a compute environment for analysis. To avoid this, we propose a tools-to-data approach based on container technologies such as Docker, Singularity or rkt.
- Galaxy already supports BioContainers, meaning that tools and workflows in Galaxy can run in isolated BioContainers. We aim to maintain, update and extend BioContainers integration to keep the resource relevant and up-to-date.
- We also aim to improve the accessibility of tools and data, allowing users to easily combine public and private storage and compute cloud services.
To promote FAIR principles in Galaxy
- The ELIXIR Galaxy community will promote the use of Galaxy projects that enhance the FAIRness of Galaxy. These include Galaxy ToolShed (a repository of Galaxy tools and utilities) and GalaxyCat (an online catalogue of the tools available on various Galaxy instances).
- We will promote the use of the ELIXIR Tools registry bio.tools as a place to find Galaxy utilities. Tools in a Galaxy instance can already be registered in bio.tools quickly via ReGaTE. The community will also work with bio.tools developers to integrate BioConda, BioContainers and the Galaxy ToolShed more tightly into the registry.
- The Galaxy community aims to work with the Interoperability Platform to annotate Galaxy objects (histories, workflows, etc.) as standardised ResearchObjects to facilitate sharing.
- We will extend the myFAIR analysis framework into a cloud-based service using a scalable INDIGO-DataCloud service with built-in data security features.
The Galaxy Community has been involved in a number of short-term, technical projects called Implementation Studies. The current studies it is involved in are:
- Expanding the Galaxy: meeting (the needs of) ELIXIR Communities
- Reuse, extension, scaling of scientific workflows (2018-cwl)
- A Scalable approach to Personal FAIR Data (2018-Human-myFAIR)
For completed studies see the Implementation Studies page.
Find out more
- For Galaxy activities from 2015-18 see the Galaxy Working Group page.
- Doppelt-Azeroual, O., Mareuil, F., Deveaud, E., Kalaš, M., Soranzo, N., van den Beek, M., Grüning, B., Ison, J. and Ménager, H. (2017). ReGaTE: Registration of Galaxy Tools in Elixir. GigaScience, doi:10.1093/gigascience/gix022
- Contact galaxy-wg[at]elixir-europe[dot]org if you'd like to know more about the Community's work.