Genome annotation

The genomes provided by Ensembl Genomes carry annotation of genes and gene function, obtained either by importing external data or by running predictive algorithms. This document outlines the steps involved in adding annotation to a genome assembly:

  1. Import protein-coding gene models. Ensembl Genomes does not carry out primary annotation of protein-coding gene models. Instead, gene models are imported either from annotation in INSDC sequence archive records or from other public sources; for the latter, GFF is the preferred import format. Ensembl Genomes also imports manual annotation produced through its collaborations.
  2. Annotate non-coding gene models
  3. Annotate repeat features
  4. Annotate protein features
  5. Add cross-references to external data sources
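
Step 1 names GFF as the preferred import format for gene models from public sources. As a rough illustration of what such a file holds, the Python sketch below reads GFF3 feature lines and lists the genes that have at least one mRNA carrying a CDS. It is not the Ensembl Genomes import pipeline: the function names (parse_gff3_line, protein_coding_gene_ids) and the sample records are hypothetical, and it assumes the common gene -> mRNA -> CDS hierarchy linked through the ID and Parent attributes.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments and blank lines."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, raw_attrs = cols
    attributes = {}
    if raw_attrs != ".":
        for pair in raw_attrs.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            # Split multi-valued attributes on commas before URL-decoding each value.
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # GFF3 coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


def protein_coding_gene_ids(lines):
    """Return IDs of genes with at least one mRNA child that has a CDS child (file order)."""
    features = [f for f in map(parse_gff3_line, lines) if f is not None]

    # Map each mRNA ID to the gene ID(s) named in its Parent attribute.
    transcript_parents = {}
    for f in features:
        if f["type"] == "mRNA":
            for tid in f["attributes"].get("ID", []):
                transcript_parents[tid] = f["attributes"].get("Parent", [])

    # A gene counts as protein-coding here if any of its mRNAs is the parent of a CDS.
    coding = set()
    for f in features:
        if f["type"] == "CDS":
            for tid in f["attributes"].get("Parent", []):
                coding.update(transcript_parents.get(tid, []))

    gene_ids = [
        gid
        for f in features
        if f["type"] == "gene"
        for gid in f["attributes"].get("ID", [])
    ]
    return [gid for gid in gene_ids if gid in coding]


# Hypothetical sample records: one protein-coding gene and one non-coding gene.
SAMPLE_GFF3 = "\n".join(
    [
        "##gff-version 3",
        "\t".join(["chr1", ".", "gene", "1000", "2000", ".", "+", ".", "ID=gene1;Name=abc"]),
        "\t".join(["chr1", ".", "mRNA", "1000", "2000", ".", "+", ".", "ID=mrna1;Parent=gene1"]),
        "\t".join(["chr1", ".", "CDS", "1100", "1900", ".", "+", "0", "ID=cds1;Parent=mrna1"]),
        "\t".join(["chr1", ".", "gene", "3000", "3500", ".", "-", ".", "ID=gene2"]),
        "\t".join(["chr1", ".", "ncRNA", "3000", "3500", ".", "-", ".", "ID=nc1;Parent=gene2"]),
    ]
)

print(protein_coding_gene_ids(SAMPLE_GFF3.splitlines()))  # ['gene1']
```

Real GFF files vary in the feature types and attribute conventions they use, so a production import would need validation and source-specific handling that this sketch leaves out.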