Machine Learning (ML) has emerged as a discipline that enables computers to assist humans in making sense of large and complex data sets.
With the drop in the cost of high-throughput technologies, large amounts of omics data are being generated and made accessible to researchers. Analyzing these complex, high-volume data is not trivial, and classical statistics alone cannot exploit their full potential.
Machine Learning can thus be very useful in mining large omics datasets to uncover new insights that can consequently lead to the advancement of Life Sciences.
The ELIXIR Machine Learning Focus Group was initiated in October 2019 to capture the emerging need for Machine Learning expertise across the network.
Goals of the group
Standards for Machine Learning
This includes aspects such as controlled terminology/ontology and services for ML model description and sharing, alignment to the ELIXIR Tools and Interoperability Platforms, as well as defining best practices for Machine Learning-related reviewing.
Machine Learning and reproducibility
This area focuses on defining best practices for developing, sharing and reusing Machine Learning approaches (including, but not limited to, Machine Learning models, algorithms, frameworks and protocols, as well as the DOME recommendations), while at the same time involving the existing approaches in the ELIXIR Tools Platform.
Benchmarking of Machine Learning tools
To facilitate a clear and objective comparison of ML-based tools, it is important to establish a benchmarking protocol; this may include datasets, protocols and services offered by the ELIXIR Tools Platform.
Training for Machine Learning
Machine Learning has been identified by the ELIXIR Training Platform gap analysis task as an existing need. As such, a particular area of focus for this group will be to design and produce training resources for supporting the ELIXIR community, based on the standards and approaches established by the ELIXIR Training Platform.
Integration across ELIXIR Communities
Machine Learning is a key competency that is relevant to a large number of activities, as well as being clearly aligned to several funding opportunities. As such, a persistent activity of this group is to align and coordinate these efforts across all relevant ELIXIR groups, such as the Federated Human Data Community and the Data Platform.
Outputs of the Focus Group
The ML Focus Group published the DOME recommendations in Nature Methods (July 2021). DOME is a set of community-wide recommendations for reporting supervised machine learning–based analyses applied to biological studies. Broad adoption of these recommendations will help improve machine learning assessment and reproducibility.
Going beyond a standard, the DOME recommendations can facilitate reproducibility in Machine Learning through the clear definition of the steps involved. As such, they can also be used in a training capacity, assisting the implementation and overall design of ML studies in Life Sciences. More information can be found on the DOME-ML website.
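As an illustration of how the recommendations might guide the design of a study, the sketch below models a report with one free-text entry for each of the four areas that give DOME its name (Data, Optimization, Model, Evaluation) and flags the areas left empty. The class name `DomeReport`, the function `missing_sections` and the example entries are hypothetical and chosen for illustration only; they are not part of the DOME recommendations or of any ELIXIR service.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DomeReport:
    """Hypothetical container: one free-text entry per DOME area."""
    data: str = ""          # illustrative: provenance, dataset splits
    optimization: str = ""  # illustrative: algorithm, parameter tuning
    model: str = ""         # illustrative: interpretability, availability
    evaluation: str = ""    # illustrative: metrics, comparison to baselines


def missing_sections(report: DomeReport) -> List[str]:
    """Return the names of the DOME areas that were left empty."""
    return [f.name for f in fields(report)
            if not getattr(report, f.name).strip()]


# Example with made-up entries: two areas described, two still missing.
report = DomeReport(
    data="Public expression dataset; 80/20 train/test split.",
    model="Random forest; code shared in a public repository.",
)
print(missing_sections(report))  # ['optimization', 'evaluation']
```

A check of this kind could be run before submitting a manuscript or sharing a model, so that gaps in the description are noticed early.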
ML_focus_group [at] elixir-europe.org