Repeat feature annotation

If repeat features are present in the INSDC record when a genome is loaded, they are imported into Ensembl Genomes. For bacterial genomes, this is currently the only source of repeat data. For other divisions, a computational pipeline is also run to annotate three types of repeat:

  • Low-complexity regions (Dust [1])
  • Tandem repeats (TRF [2])
  • Complex repeats (RepeatMasker [3])
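
To make the notion of a tandem repeat concrete, the following Python sketch finds exact, back-to-back copies of a short unit in a DNA string. It is a toy illustration only and is not the algorithm used by TRF, which also detects approximate repeats; the function name `find_tandem_repeats` and its parameters are hypothetical and chosen for this example.

```python
import re


def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Find exact tandem repeats in a DNA sequence (toy example, not TRF).

    Returns a list of (start, end, unit, copies) tuples, using 0-based,
    end-exclusive coordinates.
    """
    # A lazily matched unit of min_unit..max_unit bases, followed by at
    # least (min_copies - 1) further exact copies of that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits


if __name__ == "__main__":
    # Five copies of "CA" embedded in unrelated flanking sequence.
    print(find_tandem_repeats("GGAT" + "CA" * 5 + "TTG"))
```

The production pipeline relies on the dedicated tools listed above; a regular expression like this one misses repeats that contain mismatches or indels.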

Annotating repeats with RepeatMasker requires a repeat library. In most cases, a species-specific library is not available, so the Repbase [4] database of eukaryotic repetitive elements is used. Species-specific repeat libraries from the following sources are used where possible:

Viewing and accessing repeat features

By default, repeat features are not displayed in the genome browser. To display them, use the "Configure this page" option. You can view all repeats, or a subset of repeats based on type.

The repeat annotations can be accessed programmatically using the Ensembl API. See the RepeatFeature and RepeatFeatureAdaptor documentation for further details.
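
The RepeatFeature and RepeatFeatureAdaptor classes belong to the Perl API. As an alternative for readers working in Python, the sketch below requests repeat features for a region over HTTP. It assumes the public Ensembl REST service at `https://rest.ensembl.org` and its `overlap/region` endpoint with `feature=repeat`, and that the species of interest is served there; the helper names `repeat_overlap_url` and `fetch_repeats` are hypothetical. Check the REST documentation for the fields returned before relying on them.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: the public Ensembl REST server; adjust if you use a mirror.
REST_SERVER = "https://rest.ensembl.org"


def repeat_overlap_url(species, region, server=REST_SERVER):
    """Build the URL that asks for repeat features overlapping a region.

    `region` uses the form "name:start-end", for example "7:1000-2000".
    """
    path = "/overlap/region/{}/{}".format(
        quote(species, safe=""), quote(region, safe=":-")
    )
    query = urlencode({"feature": "repeat"})
    return "{}{}?{}".format(server, path, query)


def fetch_repeats(species, region, server=REST_SERVER, timeout=30):
    """Fetch repeat features as a list of dictionaries (needs network access)."""
    request = Request(
        repeat_overlap_url(species, region, server),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the request URL only; call fetch_repeats() to retrieve the data.
    print(repeat_overlap_url("homo_sapiens", "7:140424943-140624564"))
```

Calling `fetch_repeats("homo_sapiens", "7:140424943-140624564")` would return the decoded JSON response, which can then be filtered by repeat type in ordinary Python.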

References

  1. Morgulis A et al. (2006) A fast and symmetric DUST implementation to mask low-complexity DNA sequences. J Comput Biol. 13:1028-1040
  2. Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573-580
  3. Smit AFA, Hubley R, Green P (1996-2010) RepeatMasker Open-3.0 http://www.repeatmasker.org
  4. Jurka J et al. (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 110:462-467