ELIXIR at the ISMB/ECCB, 21-25 July 2019

ELIXIR is proud to be a sponsor of this year’s ISMB/ECCB Conference, 21-25 July 2019. ELIXIR will be present at booths 23 and 24.

There will be a number of opportunities to engage with ELIXIR throughout the conference. The information below summarises our contributions to the programme and outlines the Tracks and a Special Session.

Should you need additional details then please contact Melissa Balzano, ELIXIR Events Officer: melissa.balzano@elixir-europe.org, +44 (0)1223 494305.

Title	The ELIXIR::GA4GH Strategic Partnership
Timing	Monday 22 July 2019, 10:15 am - 12:40 pm
Room	Osaka / Samarkand (3rd Floor)
Speakers	Gary Saunders (ELIXIR Hub) Jonathan Tedds (ELIXIR Hub)
Track	This session describes the complementary activities carried out by both organisations to establish and implement a suite of interoperable standards and tools that overcome the technical and regulatory hurdles to genomic data sharing (GA4GH on an international scale, ELIXIR predominantly on a European scale).
Description	Large cohorts, with potentially millions of participants, are needed to understand the genetic and molecular signatures of diseases, and they provide a cornerstone for the creation of personalised treatments. The ELIXIR::GA4GH Strategic Partnership will facilitate the responsible sharing of these sensitive data across Europe in order to help create virtual cohorts with tens of millions of participants. In this session we will describe the complementary activities carried out by both organisations, with a focus on secure data archival, federated data discoverability, authorised data access, and distributed data analysis.

Title	ELIXIR Beacons: Federating Data Discoverability
Timing	Monday 22 July 2019, 2:00 pm - 4:00 pm
Room	Osaka / Samarkand (3rd Floor)
Speakers	Gary Saunders (ELIXIR Hub) Frederic Haziza (ELIXIR-ES)
Track	This workshop will demonstrate the Beacon API, a data discovery protocol that allows users to determine the presence or absence of a particular allele in a dataset, without disclosing any further data differentiating the individuals it contains.
Description	The ELIXIR Beacons project is a Driver Project of the Global Alliance for Genomics and Health that provides a simple way to federate data discoverability. The ELIXIR Beacons protocol allows any variation data back end to be connected to the ELIXIR Beacons network, which can then be queried with questions like 'does this dataset have any information about allele X at position Y in the genome?'. This workshop will show how recent extensions to the Beacon API have broadened its functionality by adding support for additional types of genomic variants and improved metadata. We will also demonstrate the accompanying ELIXIR Beacon reference implementation, which applies risk mitigation strategies by integrating the ELIXIR Authentication and Authorisation Infrastructure (AAI), showing data owners how to light Beacons at different tiers of data access: open, registered, or controlled.
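
As a rough illustration of the kind of question a Beacon answers, the sketch below builds an allele query URL and interprets a yes/no answer. It assumes a Beacon v1-style `/query` endpoint with `referenceName`, `start`, `referenceBases`, `alternateBases` and `assemblyId` parameters and an `exists` field in the response; the base URL, the helper names and the example coordinates are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of a real Beacon.
BEACON_BASE_URL = "https://beacon.example.org"


def build_beacon_query(reference_name, start, reference_bases,
                       alternate_bases, assembly_id="GRCh38"):
    """Build a query URL asking whether a given allele is present.

    Parameter names follow the Beacon v1-style convention (an assumption
    of this sketch, not taken from the session description).
    """
    params = {
        "referenceName": reference_name,     # chromosome, e.g. "1"
        "start": start,                      # position of the allele
        "referenceBases": reference_bases,   # reference allele
        "alternateBases": alternate_bases,   # allele being asked about
        "assemblyId": assembly_id,           # genome assembly
    }
    return f"{BEACON_BASE_URL}/query?{urlencode(params)}"


def allele_exists(response):
    """Reduce a decoded Beacon response to a plain yes/no answer."""
    return bool(response.get("exists"))


# Illustrative coordinates only.
print(build_beacon_query("1", 13272, "G", "C"))
```

The response carries only presence or absence, which is what lets a Beacon advertise a dataset without exposing individual-level data.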

Title	Bioschemas and 4OSS: recommendations on metadata for tools
Timing	Monday 22 July 2019, 4:40 pm - 6:00 pm
Room	Osaka / Samarkand (3rd Floor)
Speakers	Kenneth McLeod (ELIXIR UK) Leyla Garcia (ELIXIR Hub) Mateusz Kuzak, Dutch Techcentre for Life Sciences, (ELIXIR-Netherlands)
Track	Learn what Bioschemas and 4 simple recommendations for open source software (4OSS) are about and how to use them to add metadata to your research software and get more visibility.
Description	This tutorial introduces Bioschemas and 4OSS so that research software becomes more findable (mainly), but also more interoperable and reusable. Bioschemas is a community initiative based on schema.org that aims to facilitate the adoption of structured metadata in Life Science web pages in order to improve findability and interoperability. 4OSS comprises 4 key recommendations to improve the quality of open source research software. Both support elements aligned with the Findable, Accessible, Interoperable and Reusable (FAIR) principles. During this session, participants will become familiar with Bioschemas and 4OSS and will learn how to use them to annotate (open) research software with metadata.
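
To give a flavour of what annotating software with structured metadata can look like, the sketch below produces a schema.org JSON-LD description of a tool and wraps it for embedding in a web page. It assumes the generic schema.org `SoftwareApplication` type and a handful of common properties; the exact properties expected by the Bioschemas tool profile are not stated here, and the tool name, URLs and helper functions are hypothetical.

```python
import json


def tool_metadata(name, description, url, license_url):
    """Describe a research software tool with schema.org terms.

    The choice of type and properties is an assumption of this sketch,
    not the definitive Bioschemas profile.
    """
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
    }


def as_jsonld_script(metadata):
    """Wrap the metadata in a script tag for inclusion in an HTML page."""
    body = json.dumps(metadata, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical tool used purely for illustration.
snippet = as_jsonld_script(tool_metadata(
    name="example-aligner",
    description="A hypothetical sequence alignment tool.",
    url="https://example.org/example-aligner",
    license_url="https://spdx.org/licenses/MIT",
))
print(snippet)
```

Publishing an explicit licence alongside the description is also in the spirit of making open source research software easier to find and reuse.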

Title	Scalable Plant Research in Cloud Environments (Special Session)
Timing	Wednesday 24 July 2019, 10:15 am - 6:00 pm
Room	Shanghai 3/4 (Ground Floor)

Schedule Overview

Time	Title	Authors
10:30am - 11:00am	Introduction to the ELIXIR plant community	Frederik Coppens, VIB, Belgium
11:00am - 11:30am	Plant Phenotyping infrastructure: Breeding API & MIAPPE	Cyril Pommier, INRA, France
11:30am - 12:00pm	(Plant) Data Resources at Ensembl	Erin Haskell, EMBL-EBI, United Kingdom
12:00pm - 12:30pm	ELIXIR Cloud and interservice authorization	Alexander Kanitz, University of Basel, Switzerland
2:00pm - 2:30pm	Overview of CyVerse tools and services: introduction to data/metadata management and sharing with CyVerse	Jason Williams, CyVerse, United States
2:30pm - 3:00pm	Introduction to Galaxy and the European Galaxy community	Anika Erxleben, University of Freiburg, Germany
3:00pm - 3:30pm	A PhenoMeNal Workflow to Study the Metabolites Variation in Bryophytes across Seasons	Kristian Peters, Leibniz Institute of Plant Biochemistry, Germany
3:30pm - 4:00pm	Apollo and Galaxy: Scaling Genome Annotation for the Masses	Helena Rasche, University of Freiburg, Germany
4:30pm - 5:00pm	Overview of tools and container infrastructure	Hervé Ménager, Institut Pasteur, Paris, France
5:00pm - 5:30pm	FAIRly maintain and publish research data with e!DAL (electronic Data Archive Library)	Björn Grüning, Albert-Ludwigs-Universitaet Freiburg, Germany

Presentation Overview

This session will give participants an introduction to what is needed to use and create cloud-enabled bioinformatics pipelines. Speakers from several projects that are already using cloud computing to solve plant related research questions will be featured. Based on their hands-on experience, speakers will showcase usage of cloud computing in their projects, including bottlenecks and learned best practices. We will introduce participants to usage of established as well as emerging data repositories and standards. The focus will be on accessing and using these resources for FAIR data management strategies and integrative analysis leveraging the power and scalability of cloud computing, with a particular emphasis on resources created by the ELIXIR Galaxy working group, and by the larger ELIXIR community.

At the end of the session participants will be able to leverage cloud computing and data resources for their research questions according to best practices, using established production platforms.

Motivation

Plant research needs to cope with the major challenges of population growth and climate change adaptation. Sequencing of the DNA and RNA of crop and forest plants, as well as their pathogens and pests, has generated enormous quantities of data. High-throughput “omics” technologies are widely used and increasingly important to support plant biology research and breeding of diverse plant species for production of food, feed, fibre and other biomaterials, and bio-energy. Much of this data is found in well established repositories and data resources. However, large-scale automated phenotyping is now possible under controlled and field conditions, and there is classical phenotyping data available in literature and in dispersed databases. This data is heterogeneous, described in diverse ways, and difficult to find and re-use.

Significant advances in plant science can be obtained by integrating available genomic and genotyping data with diverse types of phenotyping data, including field and greenhouse experimental data, molecular, -omics and image data. Although most -omics data, and especially phenomic data, are being generated at an increasing scale by public and private research organizations, the dispersion of datasets and metadata among multiple repositories, together with their often poor description and annotation, makes their use and exploitation challenging or even impractical.

To help unlock the full potential of a multi-omics approach to plant science, it is essential to make plant data interoperable in accordance with the FAIR principles (i.e. Findable, Accessible, Interoperable and Reusable). Several standards have been built for the annotation of data sets.

For phenotyping data, ELIXIR has built an architecture based on the Breeding API (BrAPI, www.brapi.org), an API developed by the international plant community for accessing data relevant to plant breeding. Implementing BrAPI endpoints results in a distributed infrastructure for plant phenotyping data that allows access to diverse datasets at different sites. To make these datasets interoperable, the MIAPPE (www.miappe.org) standard for plant phenotypic data has been further developed, and its integration into BrAPI is ongoing. These technologies form the basis for scalable analysis of plant phenotyping data and its integration with data in well-established data archives (www.elixir-europe.org/platforms/data/elixir-deposition-databases) such as

ArrayExpress (functional data, www.ebi.ac.uk/arrayexpress/ ),
PRIDE (proteomics, www.ebi.ac.uk/pride/archive/ ),
MetaboLights (metabolomics, www.ebi.ac.uk/metabolights/ ),
European Variant Archive (EVA, variant data, www.ebi.ac.uk/eva/ ) and
the European Nucleotide Archive (ENA, sequencing data, www.ebi.ac.uk/ena/ ).

All this creates a huge demand for compute resources that are easily accessible, scalable, and ideally equipped with a workbench that can handle large datasets and is easily deployable. On the compute side, cloud computing has gone from cutting edge to standard practice and is no longer solely the domain of computer science professionals. Many cloud providers (both scientific and commercial) exist, and many universities and research institutions run private clouds. The basic functionality of running cloud-native workloads can be performed on any of them, which avoids lock-in. Users are often not even aware that an analysis is cloud-powered.

Projects like Bioconda (bioconda.github.io) and Biocontainers (biocontainers.pro) provide thousands of bioinformatics tools conveniently packaged for use in cloud environments, paving the way for taking bioinformatics data analysis to the cloud. At the same time, workflow systems such as Galaxy (galaxyproject.org) and Snakemake (snakemake.readthedocs.io) have been adapted to run in the cloud. As a result, cloud environments are now used in many life science research projects, and given their scalability, reproducibility and reduced costs, more and more research projects are expected to be conducted in this way.

More information

https://www.iscb.org/ismbeccb2019-program/special-sessions#sst02
