Single-Cell Omics Community

Single-cell and spatial omics (SCO) has revolutionised how life science research is conducted and the level of resolution at which it is carried out. It has not only impacted our understanding of fundamental cell biology, but has also provided novel solutions in cutting-edge medical research.

The rapid development of single-cell and spatial technologies has been accompanied by the active development of data analysis methods, resulting in a plethora of new analysis tools and strategies every year. This pace of development poses several challenges in standardisation, benchmarking, computational resources and training.

These challenges are in line with the activities of ELIXIR. The ELIXIR Single-Cell Omics Community aims to identify the main challenges in single-cell and spatial omics research and coordinate an international effort to best serve the needs of researchers. The Community will build on top of national experiences, and pave the way towards integrated long-term solutions for SCO research.

Graphs showing increased use of SCO technologies
The SCO field is rapidly expanding in terms of methods, data and tools. (a) Current count of articles using SCO technologies and cumulative number of cells sequenced and deposited in public databases. (b) Number of tools developed specifically to work with SCO. (c) Most common SCO molecular profiling technologies mentioned in publications. (d) Top 15 most targeted categories for software development in SCO. (e) Number of tools developed for SCO, split by the scripting languages used. Data were taken from the scRNA-tools database and the single-cell studies database, and surveyed up until January 2022. Figure from Czarnewski P, Mahfouz A, Calogero RA et al. Community-driven ELIXIR activities in single-cell omics, F1000Research 2022, 11(ELIXIR):869.

Goals of the Community

The SCO Community has outlined key focus areas in its white paper on F1000Research. In the initial phase, the Community aims to address issues in:

Training

This is the core goal of the Community.

  • We will connect existing materials to compile a comprehensive collection of training materials, datasets and teaching guidelines for trainers and learners.
  • We will adopt both top-down and bottom-up approaches via:
    1. Train-the-Trainer (TtT) workshops for exchanging ideas about best practices, methods and datasets, and
    2. offering advanced training courses and compiling a video catalogue of training material to enable self-study and asynchronous learning for researchers.
  • In the long term we aim to utilise ELIXIR TeSS to establish a well-curated ELIXIR SCO training portal, listing national and international bodies, web resources and upcoming training events.

Standardisation and interoperability

Several standards exist in the field of single-cell omics (e.g. FASTQ, FAST5, BAM, CRAM), while the field is starting to converge on a few processed data formats (e.g. tab-separated files, AnnData, HDF5, loom, SingleCellExperiment, Seurat, scverse). However, many of these have had to change in order to adapt to new technological advances that rendered previous formats inadequate.

  • We will focus on best practices for FAIRification in the SCO Community, building on the establishment and promotion of data and metadata standards. This will be strongly coordinated with the Human Cell Atlas Data Coordination Platform.
  • To effectively address shortfalls in current data/metadata standard paradigms, we will monitor emerging technologies and bodies and communicate these guidelines to them. We will also help incorporate new experimental concepts into these existing guidelines. Particular focus will be on spatial gene expression, as this is already showing signs of widespread adoption.
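As a rough illustration of what checking records against a metadata standard can look like in practice, the sketch below flags sample records that lack required fields. The field names in REQUIRED_FIELDS and the helper names are hypothetical and chosen only for illustration; they are not taken from the Human Cell Atlas metadata schema or from any ELIXIR guideline.

```python
# Minimal sketch of a metadata completeness check.
# REQUIRED_FIELDS is a hypothetical list, not an official standard.
REQUIRED_FIELDS = ("sample_id", "organism", "tissue", "assay")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in one record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def find_incomplete(records, required=REQUIRED_FIELDS):
    """Map the index of each incomplete record to its missing fields."""
    report = {}
    for index, record in enumerate(records):
        missing = missing_fields(record, required)
        if missing:
            report[index] = missing
    return report


if __name__ == "__main__":
    # Made-up example records, for demonstration only.
    example = [
        {"sample_id": "S1", "organism": "Homo sapiens",
         "tissue": "lung", "assay": "scRNA-seq"},
        {"sample_id": "S2", "organism": "", "tissue": "liver"},
    ]
    for index, missing in find_incomplete(example).items():
        print(f"record {index} is missing: {', '.join(missing)}")
```

A real validator would more likely rely on a formal schema with controlled vocabularies and ontology terms; the point here is only that agreed required fields make such checks straightforward to automate.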

Identifying the most appropriate and performant analysis tools

A plethora of SCO tools exists, yet standards on how to benchmark or evaluate the accuracy of each tool are lacking. Furthermore, most benchmarking efforts focus on certain cell types or tissues.

  • The first goal is to identify a few scRNA-seq datasets from which to develop the rules that define which datasets are suitable as benchmarking cases. We aim to create a central pipeline to benchmark SCO tools and define standard datasets for such benchmarks. The outcomes of benchmarking and software challenges will allow data scientists to make informed decisions on the software to use in their analytical workflows.
  • To facilitate findability and usability, we aim to contribute to tool registries and provide portable software environments of the most commonly used tools/workflows.
  • In the long term, we aim to provide cloud-deployable analysis pipelines that utilise the Galaxy and Chipster platforms, and to provide curated datasets for user-driven benchmarking on the OpenEBench infrastructure.
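To make the idea of benchmarking against a standard dataset concrete, here is a minimal sketch that ranks clustering tools by how well their cell labels agree with reference annotations, using the Adjusted Rand Index. The choice of metric, the function names and the tool names are assumptions made for illustration; they do not describe the Community's pipeline or the OpenEBench interface.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand Index between two labelings of the same cells."""
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no cell pairs to compare, treated as perfect agreement

    # Count pairs of cells grouped together in both, or in each, labeling.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = pairs_truth * pairs_pred / comb(n, 2)
    maximum = (pairs_truth + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (pairs_both - expected) / (maximum - expected)


def rank_tools(truth, tool_outputs):
    """Return (tool name, score) pairs sorted from best to worst agreement."""
    scores = {name: adjusted_rand_index(truth, labels)
              for name, labels in tool_outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy reference annotation and outputs of two hypothetical tools.
    reference = ["T", "T", "T", "B", "B", "B"]
    outputs = {
        "tool_a": [0, 0, 0, 1, 1, 1],
        "tool_b": [0, 0, 1, 1, 2, 2],
    }
    for name, score in rank_tools(reference, outputs):
        print(f"{name}: {score:.3f}")
```

A single agreement score on one dataset is a deliberately narrow view. A fuller benchmark would typically cover several datasets, tissues and metrics, which is the gap the goals above are meant to address.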

Leadership

Paulo Czarnewski
(ELIXIR Sweden)
Naveed Ishaque
(ELIXIR Germany)
Eija Korpelainen
(ELIXIR Finland)
Katharina Heil
(Communities Coordinator, ELIXIR Hub)

Find out more

  • Contact details:
    • Paulo Czarnewski: paulo.czarnewski [at] scilifelab.se
    • Naveed Ishaque: naveed.ishaque [at] bih-charite.de
    • Eija Korpelainen: eija.korpelainen [at] csc.fi
  • Czarnewski P, Mahfouz A, Calogero RA et al. Community-driven ELIXIR activities in single-cell omics, F1000Research 2022, 11(ELIXIR):869