ELIXIR Guest Webinar: Two universes, one world - Community standards vs. formal ISO standards in the life sciences


This ELIXIR Webinar was given by Martin Golebiewski, a guest speaker from The Heidelberg Institute for Theoretical Studies.

Given the increasing flood and complexity of data in the life sciences, standardization of these data and their documentation is crucial. This comprises the description of methods, biological material and workflows for data processing, analysis, exchange and integration (e.g. into computational models), as well as the setup, handling and simulation of models. Hence, standards for formatting and describing data, workflows and computer models have become important, especially for data integration across biological scales in multiscale approaches.

To this end, many grassroots standards for data, models and their metadata have been defined by the scientific communities, driven by standardization initiatives such as COMBINE (http://co.mbine.org) and others. To give potential users an overview of such standards and comparable information about them, web-based information resources have been developed and made publicly available, such as the NormSys registry for modelling standards (http://normsys.h-its.org).

To facilitate the integration of data and models, standards have to be harmonized so that they become interoperable and allow interfacing between the often heterogeneous datasets. To support this, novel standards are being defined by the International Organization for Standardization (ISO) in its technical committee ISO/TC 276 – Biotechnology (https://www.iso.org/committee/4514241.html). One example is the emerging standard ISO 20691 “Requirements for data formatting and description in the life sciences for downstream data processing and integration workflows”, which defines a framework and guideline for community data standards in the life sciences and their application. Such standards aim to enhance the interoperability of standards for life science data and models and thereby facilitate complex and multiscale data integration and model building with heterogeneous data gathered across domains.