Increasing the translational value of public proteomics datasets: Automatic metadata-driven reanalysis in cloud infrastructures

As a result of previous and ongoing ELIXIR implementation studies (IS) led by the ELIXIR Proteomics Community, public proteomics datasets in PRIDE annotated with metadata in the Sample and Data Relationship Format (SDRF), as well as some open proteomics data analysis pipelines, are starting to become available.

In this proposed follow-on IS, previous results and outputs will serve as the basis for developing a set of open and user-friendly analysis pipelines. These pipelines will be used to assess how far re-analyses of public datasets can be automated using their SDRF-encoded metadata annotations, not only at the identification level but also at the quantitative and statistical levels, for both data-dependent acquisition (DDA) and data-independent acquisition (DIA) datasets. Additionally, common ideas in this context and in other overlapping topics of interest will be explored, e.g. in a joint gap analysis between the ELIXIR Proteomics Community and the IDP and 3D-BioInfo ELIXIR Communities, to further serve the overall ELIXIR goals.

This study will provide a well-documented use case intended to motivate users to annotate public datasets with SDRF. By developing data processing and analysis pipelines and providing them to the community, and by helping to standardize data management and annotation, the study addresses two goals of the proteomics community.