Extending open proteomics data analysis pipelines in the cloud: Additional tools and focus on scalability, supporting the dramatic growth of public proteomics data

An ELIXIR implementation study started in February 2017 as a collaboration between EMBL-EBI and ELIXIR-DE. Its main objectives are to develop open, robust, scalable and reproducible proteomics data analysis workflows based on OpenMS and directly connected to the PRIDE database (an ELIXIR core data resource), and to deploy these pipelines in the EMBL-EBI "Embassy Cloud" as a proof of concept.

Building on this work, we propose a follow-up project with three objectives:

  1. The inclusion of additional open tools developed by other ELIXIR nodes.
  2. The improvement of the overall infrastructure supporting the implementation of proteomics data analysis pipelines.
  3. The inclusion of quality control pipelines.

The overarching goal is for these tools to be deployable in other cloud infrastructures and easily reusable by anyone in the community, bringing users closer to the tools and the tools closer to the data.

Impact of the study

The outcome will be a broader range of open proteomics tools available on a wider range of cloud infrastructures, including new quality control features based on OpenMS. The expected impact is easier proteomics analysis across multiple cloud platforms, with a higher degree of quality control throughout.

Duration
1st August 2018 - 31st July 2019