Repeat feature annotation

If repeat data are present in INSDC when a genome is loaded, those features are imported into Ensembl Genomes. For bacterial genomes, this is currently the only source of repeat data. For other divisions, a computational pipeline is also run to annotate three types of repeat:

  • Low-complexity regions (Dust [1])
  • Tandem repeats (TRF [2])
  • Complex repeats (RepeatMasker [3])
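To illustrate what "low complexity" means in practice, here is a minimal Python sketch of a triplet-counting score in the spirit of Dust. It is not the Dust program run by the pipeline. The 64-base window, the threshold of 2.0 and the simple merging of overlapping windows are illustrative assumptions, and the function names are hypothetical.

```python
from collections import Counter


def dust_score(window: str) -> float:
    """Triplet-based low-complexity score for a single window.

    Counts every overlapping 3-mer (c_t) and returns
    sum(c_t * (c_t - 1) / 2) / (l - 1), where l is the number of 3-mers.
    Repetitive sequence reuses the same triplets, so it scores high.
    """
    seq = window.upper()
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    n = len(triplets)
    if n < 2:
        return 0.0
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n - 1)


def low_complexity_intervals(seq: str, window: int = 64, threshold: float = 2.0):
    """Slide a fixed-size window along seq and merge high-scoring windows.

    Returns 0-based, half-open (start, end) intervals. window and threshold
    are illustrative values, not the settings used by the Ensembl pipeline.
    Sequences shorter than window yield no intervals.
    """
    intervals = []
    for start in range(len(seq) - window + 1):
        if dust_score(seq[start:start + window]) > threshold:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                # Overlaps the previous interval; windows advance by one base,
                # so the new end is always further right: extend the interval.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals


if __name__ == "__main__":
    print(dust_score("A" * 64))                 # 31.0
    print(low_complexity_intervals("A" * 100))  # [(0, 100)]
```

A homopolymer such as `"A" * 64` reuses a single triplet and gets the maximum score, whereas a short sequence in which every triplet is distinct scores 0. Tandem repeats and complex repeats need different methods (TRF and RepeatMasker), which this sketch does not attempt to reproduce.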

Annotating repeats with RepeatMasker requires a repeat library. A species-specific library is usually not available, so in most cases the RepBase [4] database of eukaryotic repetitive elements is used. Where possible, species-specific repeat libraries from the following sources are used:

Viewing and accessing repeat features

Repeat features are not displayed in the genome browser by default. To display them, use the "Configure this page" option. You can view all repeats, or only a subset of repeats selected by type.

Repeat annotations can be accessed programmatically using the Ensembl API. See the RepeatFeature and RepeatFeatureAdaptor documentation for further details.
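The RepeatFeature and RepeatFeatureAdaptor classes belong to the Perl API. As a lighter-weight alternative, the following Python sketch requests repeat features from the Ensembl REST `overlap/region` endpoint with `feature=repeat`, using only the standard library. Several details are assumptions to check against the REST documentation: that the species of interest is served from `https://rest.ensembl.org`, and that each returned feature carries a `description` field suitable for grouping repeats by type. The helper names are hypothetical.

```python
import json
import urllib.request
from collections import Counter

# Assumption: the species of interest is served from this REST host.
DEFAULT_SERVER = "https://rest.ensembl.org"


def repeat_features_url(species: str, region: str, server: str = DEFAULT_SERVER) -> str:
    """Build an overlap/region request restricted to repeat features.

    region uses the "seq_region:start-end" form, e.g. "1:1-10000".
    """
    return f"{server}/overlap/region/{species}/{region}?feature=repeat"


def fetch_repeat_features(species: str, region: str, server: str = DEFAULT_SERVER) -> list:
    """Download repeat features overlapping region as a list of JSON objects."""
    request = urllib.request.Request(
        repeat_features_url(species, region, server),
        headers={"Content-Type": "application/json"},  # ask for JSON output
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def count_by_description(features: list) -> Counter:
    """Tally features by their "description" field (an assumed field name)."""
    return Counter(f.get("description", "unknown") for f in features)
```

For example, `count_by_description(fetch_repeat_features("arabidopsis_thaliana", "1:1-10000"))` would tally the repeat features reported in the first 10 kb of chromosome 1, roughly mirroring the by-type filtering offered in the genome browser.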


