Most human cells divide by mitosis, where one parent cell divides to produce two daughter cells. Just before a cell divides it makes a copy of its own DNA, but this copy may not have the same gene sequence as the parent DNA. One gene, for example, may be copied twice into the new DNA, or not copied at all. This phenomenon is called Copy Number Variation (CNV).
Copy Number Variations are thought to be vital for evolution, but they also play important roles in disease. Although they are the most prevalent type of genetic mutation, identifying and interpreting them is still a major challenge. The ELIXIR human Copy Number Variation (hCNV) Community aims to implement processes that make the detection, annotation and interpretation of these variations easier.
Goals of the Community
To define optimal CNV detection pipelines
- The Community will use reference datasets to benchmark the existing tools for detecting hCNVs. The most sensitive, specific, reliable and rapid tool for each dataset will be identified. If no system is effective enough for some conditions, the hCNV partners will develop new tools.
- The Community will also optimize the selected tool pipelines to increase their performance on ELIXIR compute facilities, and develop guidelines to help end-users use these tools to detect CNVs. These guidelines will be available through the ELIXIR Training Platform.
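As an illustration of what benchmarking a detection tool against a reference dataset can involve, here is a minimal sketch in Python. It is not the Community's pipeline. The `Cnv` record, the `benchmark` function and the 50% reciprocal-overlap matching rule are all assumptions made for this example. It reports precision rather than specificity, because true negatives are not well defined for interval calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cnv:
    """Hypothetical CNV record used only for this illustration."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Return the smaller of the two overlap fractions (0.0 if no match)."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def benchmark(calls, truth, min_overlap=0.5):
    """Compute (sensitivity, precision) of `calls` against `truth`.

    Assumption: a call matches a reference CNV when their reciprocal
    overlap is at least `min_overlap`.
    """
    found = sum(
        any(reciprocal_overlap(t, c) >= min_overlap for c in calls)
        for t in truth
    )
    correct = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
        for c in calls
    )
    sensitivity = found / len(truth) if truth else 0.0
    precision = correct / len(calls) if calls else 0.0
    return sensitivity, precision


# Toy data, invented for the example.
truth = [
    Cnv("chr1", 1_000, 5_000, "DEL"),
    Cnv("chr2", 10_000, 20_000, "DUP"),
]
calls = [
    Cnv("chr1", 1_200, 5_100, "DEL"),  # overlaps the first reference CNV
    Cnv("chr3", 500, 900, "DUP"),      # has no reference counterpart
]
sensitivity, precision = benchmark(calls, truth)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

Running the same function over the output of several tools on the same reference dataset gives directly comparable figures. A real benchmark would also need to handle breakpoint uncertainty, fragmented calls and runtime.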
To identify reference datasets for hCNV
- The Community will produce reference datasets of fully validated somatic and germline CNVs representing a wide range of sample types and experimental technologies. These reference materials will enable the Community to evaluate and compare pipelines and/or Next-Generation Sequencing (NGS) technologies, and can be used for quality assurance.
- NGS technologies are evolving rapidly, so the reference datasets will need to be updated regularly.
To establish and define data exchange formats
- International collaborative projects require common standards to describe their results. This ensures efficient data aggregation and comparison. Although various international initiatives are currently addressing this issue, no robust and exhaustive standard CNV annotation format has emerged so far.
- To address this, the hCNV Community will establish a list of existing formats used to describe CNVs, and develop recommended formats. If a few alternative formats are frequently used, the Community will provide bioinformatics resources to convert data into the common data exchange format. This work will align with existing efforts, notably GA4GH Work Streams and the ELIXIR Interoperability Platform.
To create a process to facilitate the identification of patients with similar genotypes and phenotypes
- Because each rare disease affects fewer than 1 in 2,000 people, and because most mutations are private (i.e. restricted to a single family or small group), finding similar cases at the clinical level is a challenge. Yet finding similar cases is essential for clinical diagnosis and for identifying disease-causing genes.
- The hCNV Community will recommend standard ways of describing rare diseases and provide ways to map various ontologies, medical terminologies, vocabularies and nomenclatures. This will enable cross queries and the identification of similar patients.
To develop innovative tools to detect, annotate and interpret CNV
- CNVs can involve large genomic regions and encompass multiple genes. In addition, in recessive diseases a CNV can alter one allele of a specific gene while the second allele is altered by a Single Nucleotide Variation (SNV). In many situations, it is thus difficult to identify the single or multiple genes whose alteration is directly associated with the patient's phenotype.
- Here, the hCNV community will develop innovative tools to annotate CNVs, facilitate their interpretation through a combinatorial approach, and help to pinpoint key genes in regions of interest.
To help make hCNV services and datasets FAIR (Findable, Accessible, Interoperable, Reusable)
- Various national CNV databases, ELIXIR Core Data Resources, and ELIXIR Deposition Databases are currently being developed by ELIXIR hCNV partners. To allow interoperability (including discovery), the FAIR principles will be applied to those systems to demonstrate the feasibility and utility of distributed CNV databases.
- The Community will use the BANCCO and the CIBERER databases to demonstrate the benefits of using the FAIR data principles for CNV in diagnostic and research contexts.
To disseminate knowledge and train researchers
- The Community will organise meetings to collect experts' views and feedback on its work and to disseminate knowledge. It will also promote the ELIXIR hCNV Community through participation in international consortia, such as GA4GH.
The hCNV Community is leading an Implementation Study:
Find out more
- Contact gary.saunders[at]elixir-europe[dot]org if you'd like to know more about the community's work.