BioSchemas supports the discovery of public datasets

Guidelines recently published by Google for the discovery of science datasets help data providers describe their datasets in a structured way using schema.org, so that internet search engines can find and index rich metadata and present scientific datasets more effectively.
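As a rough illustration of what such a structured description can look like, the following minimal Python sketch builds a schema.org `Dataset` record as JSON-LD and wraps it in the script tag commonly used to embed it in a web page. The dataset name, description, URL and keywords are hypothetical placeholders, and the helper functions `dataset_jsonld` and `to_script_tag` are assumptions made for this example; they are not taken from the Google guidelines or the BioSchemas specifications, which may require additional properties.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
    }


def to_script_tag(record):
    """Serialise the record and wrap it in a JSON-LD script tag for a web page."""
    body = json.dumps(record, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical dataset, used only to illustrate the markup.
example = dataset_jsonld(
    name="Example proteomics dataset",
    description="Hypothetical dataset used only to illustrate the markup.",
    url="https://example.org/datasets/EX000001",
    keywords=["proteomics", "example"],
)

print(to_script_tag(example))
```

A data provider would place the resulting script tag in the landing page of each dataset, where a search engine crawler can read it alongside the human-readable content.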

The published guidelines draw on the metadata specifications for life-science datasets developed by BioSchemas. One of the early adopters of the specifications is the Omics Discovery Index (OmicsDI), which was presented as a good-practice example in a recent Google Research Blog post. OmicsDI was developed by EMBL-EBI with support from BD2K, and is an active member of the BioSchemas community. It provides a dataset discovery service across a heterogeneous, distributed group of -omics data from eight repositories across the world.

BioSchemas is an open community initiative driven by ELIXIR to improve the interoperability of life-science data. Building on and extending the schema.org markup, BioSchemas develops a collection of specifications that provide guidelines for describing metadata about life-science information. Besides life-science datasets, BioSchemas is working on specifications for samples, phenotypes, data repositories and protein sequences.

To support the work of BioSchemas, ELIXIR has recently launched the BioSchemas Implementation study. The main partners in the study are BBMRI, BD2K and FORCE11, and it has the support of over 40 stakeholders. The BioSchemas group for life-science datasets includes representatives from PDBe, UniProt, Pfam, DataMed and DATS, Repositive, OmicsDI, Intermine and Google.

Carole Goble, the Head of ELIXIR UK and one of the leaders of the Implementation study, said: “Improved discoverability of data will encourage data re-use and sharing, and I am delighted to see the growing momentum among so many institutions. Our goal is to bring together data providers, data users, domain experts and developers; BioSchemas, as an open community of life science organisations, plays an important role in this effort.”

The BioSchemas Implementation study is led by Carole Goble, Alasdair Gray (ELIXIR UK) and Rafael Jimenez (ELIXIR Hub). Besides ELIXIR UK, the project also involves ELIXIR Nodes at EMBL-EBI and in the Netherlands, Denmark, Sweden, Germany and Finland. The kick-off meeting will take place on 6-8 March 2017 in Hinxton, UK.

The work of the BioSchemas community will also feed into the new Horizon 2020 project EOSCpilot (European Open Science Cloud pilot). One of the priorities of the project’s interoperability activities will be the findability of data; the goal is to build on BioSchemas results in the life-science domain and extend them to general scientific data types like datasets and samples.

Posted: Mon 6 February 2017
