Services: Core Data Resources

Name of service Tag Related links* Key Collection
BacDive

BacDive is a collection of organism-linked information covering the many aspects of bacterial and archaeal biodiversity. Its content includes information on taxonomy, morphology, physiology, sampling and environmental conditions, as well as molecular biology.

CDD
BioStudies

The BioStudies database holds descriptions of biological studies, links to data from these studies in other databases at EMBL-EBI or outside, as well as data that do not fit in the structured archives at EMBL-EBI.

BRENDA

A comprehensive enzyme information system. It is the world’s largest and most widely used information system on all aspects of enzymes, including function, structure, mutants, properties such as stability, and purification. Data can be downloaded as an integrated text file or via SOAP.

CDD
CATH/Gene3D

A classification of protein structures and sequences that groups protein domains into superfamilies. 

CDD
Cellosaurus

A knowledge resource on cell lines.

CDD
ChEBI

ChEBI (Chemical Entities of Biological Interest) is a dictionary of small molecular entities. It is manually annotated and provides a chemical ontology to describe small molecules, including their biological and chemical roles.

CDD
ChEMBL

ChEMBL is a database of bioactive compounds that focuses on interactions between small molecules and their macromolecular targets, including medicinal chemistry, clinical development and therapeutics data.

CDD
EMDB

The Electron Microscopy Data Bank (EMDB) is a public repository for electron microscopy density maps of macromolecular complexes and subcellular structures. It covers a variety of techniques, including single-particle analysis, electron tomography, and electron (2D) crystallography.

Ensembl

Produces and maintains automatic and manually curated annotation on eukaryotic genomes. It is integrated with important molecular resources, for example UniProt, and can be accessed programmatically or through a web browser.

CDD
Ensembl Genomes

Provides access to genome-scale data from bacteria, protists, fungi, plants and invertebrate metazoa, through a unified set of interactive and programmatic interfaces based on the Ensembl software platform.

CDD
European Genome-phenome Archive (EGA)

The European Genome-phenome Archive (EGA) allows users to explore datasets from numerous genotype experiments, including case-control, population and family studies, that are supplied by a range of data providers. 

EuropePMC

Europe PubMedCentral (EuropePMC) contains over 3 million full-text life science research articles, of which over 900 000 are open access, and combines these with 30 million abstracts from PubMed and other sources.

CDD
GWAS Catalog

The NHGRI-EBI GWAS Catalog is a quality-controlled, manually curated, literature-derived collection of all published genome-wide association studies.
 

HGNC

HUGO Gene Nomenclature Committee, responsible for approving unique symbols and names for human loci, including protein coding genes, ncRNA genes and pseudogenes, to allow unambiguous scientific communication.

CDD
Human Protein Atlas (HPA)

A database with millions of high-resolution images showing the spatial distribution of proteins in human tissues and cells.

CDD
IntAct

IntAct provides a freely available, open source database system and analysis tools for molecular interaction data.

InterPro

InterPro classifies proteins into families and predicts the presence of important domains and sites.

CDD
LIPID Maps

Provides access to lipid nomenclature, databases, tools, protocols, standards, tutorials, meetings, publications, and other resources serving the international lipid research community.

MGnify

Formerly called EBI Metagenomics, MGnify is an automated pipeline for the analysis and archiving of metagenomic data.

CDD
Orphadata

Orphadata provides the scientific community with comprehensive, quality data sets related to rare diseases and orphan drugs from the Orphanet knowledge base, in reusable formats.

CDD
PomBase

PomBase is a model organism database that provides organization of and access to scientific data for the fission yeast Schizosaccharomyces pombe. PomBase supports genomic sequence and features, genome-wide datasets and manual literature curation.

PomBase also serves as a community hub for researchers, offering genome statistics, a community curation interface, news, events, documentation and mailing lists.

CDD
PRIDE

PRIDE (The Proteomics Identifications Database) is a standards-compliant, public repository for proteomics data. It contains protein and peptide identifications and their associated supporting evidence.

Protein Data Bank in Europe (PDBe)

The Protein Data Bank in Europe (PDBe) is the European part of the wwPDB for the collection, organisation and dissemination of data on biological macromolecular structures.

Reactome

An open-source, curated and peer-reviewed pathway database. Its goal is to provide tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling and systems biology.

CDD
Rhea

Rhea is a comprehensive and non-redundant resource of expert-curated biochemical reactions described using species from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. 

CDD
SILVA

SILVA provides comprehensive, quality-checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).

CDD
STRING

STRING is a database of known and predicted protein-protein interactions. The database contains information from numerous sources, including experimental repositories, computational prediction methods and public text collections. 

CDD
SWISS-MODEL Repository

SWISS-MODEL Repository is a continuously updated database of annotated protein structure homology models generated by the fully automated SWISS-MODEL modelling pipeline. 

CDD
The European Nucleotide Archive (ENA)

The European Nucleotide Archive (ENA) contains all the nucleotide sequences in the public domain and consolidates data from EMBL-Bank, the European Trace Archive and the Sequence Read Archive.

UniProtKB

UniProt produces and maintains automatic and manually curated annotation of all publicly available protein sequences and serves these to users through various interfaces. 

CDD

For information about what Core Data Resources are and how they are selected, see the Data Platform pages.