Crowd-sourcing the annotation of public proteomics datasets to improve data reusability

This project aims to dramatically improve the reusability of public proteomics datasets by substantially increasing the amount and quality of technical and biological annotation for datasets stored in the PRIDE database.

First, we will develop an a posteriori annotation system for PRIDE, covering both technical and biological metadata, which will leverage the synergies between existing tools and pipelines developed by different ELIXIR nodes. Second, we will create data structures that can capture the most frequently used experimental designs in proteomics studies. Third, we will build an API that allows annotation tools to be developed easily. Fourth, we will reach out to the whole proteomics community and actively involve it in the annotation process.

The outcome will be that a wider range of open proteomics tools is included in a wider range of cloud infrastructures, including new quality control features based on OpenMS. Impact: greater ease of proteomics analysis across multiple cloud platforms, all with an increased degree of quality control.