Annotation of external cross references
The identifiers, names and descriptions of the genes, transcripts and translations in Ensembl Genomes are typically imported from or created in collaboration with the relevant communities for a given species. In addition, external cross references to these objects are automatically created from various other databases as part of the standard release process, as described below.
External cross references are useful for several interrelated purposes:
Automatically creating and importing external cross references
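There are two types of external cross reference (XRef): direct and dependent. A direct XRef links an Ensembl Genomes object straight to a record in an external database, while a dependent XRef is inherited from the annotation attached to a direct XRef.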
For example, the translation for the Arabidopsis thaliana gene AT1G58030 in Ensembl Plants is identical to the UniProtKB/Swiss-Prot protein CAAT2_ARATH, giving us a direct XRef by synonymy. CAAT2_ARATH is in turn annotated within UniProtKB with XRefs to more than 20 other databases. A subset of these XRefs (e.g. BT046174 in the European Nucleotide Archive) is also imported as 'dependent' XRefs based on the original 'direct' XRef.
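The relationship between the two XRef types can be sketched as a small data model. This is only an illustration of the idea, not the Ensembl Genomes pipeline code: the `XRef` class, the `expand_dependents` function and the `EXAMPLE_DB`/`X0001` entry are hypothetical names introduced here, and the allow-list of databases is an assumption standing in for "a subset of these XRefs".

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class XRef:
    """Hypothetical record for one external cross reference."""
    db: str
    accession: str
    kind: str                        # "direct" or "dependent"
    parent: Optional["XRef"] = None  # the direct XRef a dependent one hangs off


def expand_dependents(direct: XRef,
                      annotated: List[Tuple[str, str]],
                      allowed_dbs: Set[str]) -> List[XRef]:
    """Return the direct XRef plus dependent XRefs from allowed databases only."""
    xrefs = [direct]
    for db, accession in annotated:
        if db in allowed_dbs:
            xrefs.append(XRef(db, accession, "dependent", parent=direct))
    return xrefs


# Direct XRef by synonymy, using the example from the text.
direct = XRef("UniProtKB/Swiss-Prot", "CAAT2_ARATH", "direct")

# XRefs annotated on the UniProtKB record; the second entry is made up
# to show an annotation that is NOT imported.
annotated = [("ENA", "BT046174"), ("EXAMPLE_DB", "X0001")]

xrefs = expand_dependents(direct, annotated, allowed_dbs={"ENA"})
for x in xrefs:
    print(x.kind, x.db, x.accession)
```

Keeping a `parent` pointer on each dependent XRef makes it possible to trace any imported accession back to the direct mapping that justified it.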
The process of importing XRefs for a given species consists of loading direct and dependent XRefs from a pre-defined set of sources. Each source is configured to map either directly (by synonymy) or by sequence alignment using exonerate. A source may be generic (applying equally to all species), taxon-specific, or specific to a single species.
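As a rough sketch of how such a source configuration might be modelled, the snippet below selects the sources that apply to one species. The `Source` class, the `sources_for` function, the source names and the lineage values are all hypothetical; the real pipeline's configuration format is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Source:
    """Hypothetical description of one XRef source."""
    name: str
    method: str                    # "direct" (synonymy) or "alignment" (exonerate)
    taxon: Optional[str] = None    # set for taxon-specific sources
    species: Optional[str] = None  # set for single-species sources


def sources_for(species: str,
                lineage: Sequence[str],
                sources: Sequence[Source]) -> List[Source]:
    """Pick generic sources plus those matching the species or its lineage."""
    selected = []
    for src in sources:
        if src.species is not None:
            if src.species == species:
                selected.append(src)
        elif src.taxon is not None:
            if src.taxon in lineage:
                selected.append(src)
        else:
            selected.append(src)  # generic: applies to every species
    return selected


# Made-up configuration covering the three scopes.
CONFIG = [
    Source("generic_protein_db", "direct"),
    Source("generic_transcript_db", "alignment"),
    Source("plant_only_db", "direct", taxon="Viridiplantae"),
    Source("fungi_only_db", "alignment", taxon="Fungi"),
    Source("single_species_db", "direct", species="arabidopsis_thaliana"),
]

chosen = sources_for("arabidopsis_thaliana", ["Eukaryota", "Viridiplantae"], CONFIG)
for src in chosen:
    print(src.name, src.method)
```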
Note that species imported directly from the INSDC archives are processed differently, having direct XRefs for the primary INSDC feature and dependent XRefs from UniProtKB. For more details, see INSDC annotation import.
Cross references are also used as a mechanism for adding ontology annotations to genes, transcripts and translations in Ensembl Genomes. Ontology terms, typically but not exclusively from the Gene Ontology, are imported from four distinct sources: two within the XRef pipeline described above and two separate sources:
Common XRef sources
We import XRefs from this list of sources for all our species, unless otherwise specified on the species homepage:
and where data is available from this list of sources:
and an additional list of more than 100 species or taxon specific sources (see individual species pages for details).
Gene, transcript and translation XRefs are tabulated on the gene and transcript pages under the External references section of the left-hand menu. For example, here or here. XRefs can also be retrieved through BioMart, or queried via the Perl or REST APIs.
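A minimal sketch of a REST query for XRefs is shown below, using only the Python standard library. It assumes the Ensembl REST `xrefs/id/:id` endpoint is served from `https://rest.ensembl.org` and that each returned record carries `dbname` and `primary_id` fields; check the REST documentation for the server and fields that apply to your release. The helper names `xrefs_url`, `fetch_xrefs` and `group_by_db` are introduced here for illustration.

```python
import json
from collections import defaultdict
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

REST_SERVER = "https://rest.ensembl.org"  # assumption: adjust to your server


def xrefs_url(stable_id, external_db=None, all_levels=False):
    """Build the URL for the xrefs/id endpoint for one stable ID."""
    params = {}
    if external_db:
        params["external_db"] = external_db  # restrict to one external database
    if all_levels:
        params["all_levels"] = 1             # also include transcripts/translations
    url = f"{REST_SERVER}/xrefs/id/{quote(stable_id)}"
    return f"{url}?{urlencode(params)}" if params else url


def fetch_xrefs(stable_id, **kwargs):
    """Fetch XRef records as a list of dicts (performs a network request)."""
    request = Request(xrefs_url(stable_id, **kwargs),
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def group_by_db(records):
    """Group primary accessions by external database name."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.get("dbname", "unknown")].append(rec.get("primary_id"))
    return dict(grouped)


if __name__ == "__main__":
    # Gene ID taken from the example in the text.
    for db, ids in sorted(group_by_db(fetch_xrefs("AT1G58030")).items()):
        print(db, ids)
```

Separating URL construction from the network call keeps the request logic easy to check offline, and `group_by_db` works equally well on records loaded from a saved JSON file.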