Metagenomics is the study of genetic material recovered directly from environmental samples. It studies microorganisms that share a specific living space, such as the sea, the human gut or soil. Its applications span many research fields, from microbial proteins in biotechnology to the environmental restoration of polluted soil.
The whole metagenomics field is growing exponentially, as the recent advances in sequencing technologies have enabled more research groups to do metagenomics and perform larger studies. However, there is a danger that data is produced faster than users are able to share, analyse and interpret it.
Major demand for training
In order to address this issue, CSC (ELIXIR Finland) organised an international metagenomics data analysis course in April 2017 as part of the ELIXIR-EXCELERATE and PRACE projects. The teachers included specialists from the Norwegian ELIXIR Node, the European Bioinformatics Institute, Finnish universities and CSC.
"We are really happy that EXCELERATE and PRACE enabled us to organise this course, and so many experts were willing to come to teach. There is clearly a big demand for metagenomics training, as we have 50 participants from 11 countries, and some applicants were left out due to space limitations. In order to enable a larger number of people to benefit from the course, we record the lectures and make the videos and training material available", explains Eija Korpelainen, the ELIXIR Finland Training Coordinator.
Right tools for simpler analysis
Jenni Hultman from the University of Helsinki knows from experience that a need for this kind of course truly exists in Finland. She studies arctic microbial communities and gave a lecture on assembling genomes from metagenomics data:
"When I first got interested in metagenomics data analysis and wanted to know more, there was practically no one in Finland who could have helped me. So I had to go abroad. Some researchers have had access to datasets that nobody in their research group could analyse. Now they've seen it is actually not so hard."
"Metagenomics data analysis is actually quite simple if you have the right tools", confirms Nils Willassen, one of the leaders of the ELIXIR Marine Metagenomics Use Case. He and his team from ELIXIR Norway presented the META-pipe analysis pipeline that they have developed, and the participants got hands-on training in using it. "One thing I try to remind the researchers of: contact us before your project starts, so we can help you to design the experiment", adds Willassen.
See the course programme, training materials and lecture videos
Need for computing resources will grow immensely
When asked about the challenges that metagenomics researchers encounter, the lecturers mention the number of samples. There is either a huge number of samples, which makes data analysis a bottleneck, or too few samples and therefore not enough replicates for statistical analysis. "Even though sequencing is getting cheaper, the data analysis requires a lot of computing resources", says Hultman.
"While our META-pipe analysis pipeline is publicly available, one major challenge is that we cannot offer computing resources for all the researchers in the world. We need to find sustainable solutions for covering the computing costs", explains Willassen.
The ELIXIR Marine Metagenomics Use Case and the ELIXIR Compute Platform are working together to address this issue through distributed cloud and computing resources. "Underlying infrastructure services relying on ELIXIR single-sign-on enable researchers to seamlessly use cloud and compute resources in ELIXIR Nodes from within the META-pipe analysis portal", says Tommi Nyrönen, Head of ELIXIR Finland and one of the leaders of the ELIXIR Compute Platform. "By decoupling the data analysis software pipelines and the compute resources, we can create a distributed research infrastructure accessible from any ELIXIR Node", concludes Nyrönen.
Authors: Eija Korpelainen (ELIXIR Finland) and Premysl Velek. Originally published on the CSC Finland (ELIXIR Finland) website.