Non-coding RNA annotation

Data Sources

If the group or consortium in charge of a sequencing project has annotated ncRNA genes alongside its protein-coding gene predictions, these annotations are imported, either through GFF import or through INSDC import.

If no ncRNA annotations are available, we run our own ncRNA prediction pipelines.

Ensembl Genomes prediction pipelines

For all ncRNA genes except tRNA and rRNA genes, models are predicted by aligning the genomic sequence against Rfam sequences using BLASTN. The BLAST hits are then used to seed Infernal searches of the aligned regions with the corresponding Rfam covariance models. This reduces the search space, because scanning the entire genome with every Rfam covariance model would be extremely CPU-intensive.

See Burge S.W. et al. (2013) Rfam 11.0: 10 years of RNA families. Nucl. Acids Res. 41, D226-D232.
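The seeding step described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code. It assumes BLASTN was run with the genomic sequence as query and produced standard 12-column tabular output, that each Rfam subject identifier begins with the family accession followed by a semicolon (a hypothetical naming convention), and that hits are padded by an arbitrary 100 bp before merging. The function names and the padding value are illustrative only.

```python
from collections import defaultdict


def parse_hits(lines):
    """Parse 12-column BLAST tabular lines into (family, region, start, end)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        region = cols[0]                  # query ID: the genomic sequence
        family = cols[1].split(";")[0]    # assumed ID format: "RFxxxxx;..."
        start, end = sorted((int(cols[6]), int(cols[7])))  # query coordinates
        hits.append((family, region, start, end))
    return hits


def seed_regions(hits, padding=100):
    """Pad each hit and merge overlapping hits per (family, region)."""
    grouped = defaultdict(list)
    for family, region, start, end in hits:
        grouped[(family, region)].append((max(1, start - padding), end + padding))

    merged = {}
    for key, intervals in grouped.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:       # overlaps the previous window
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[key] = [tuple(window) for window in out]
    return merged
```

Each resulting window would then be searched only with the covariance model of its own family, rather than scanning the whole genome with every model.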

tRNA genes are predicted using tRNAscan-SE, version 1.23, configured for the appropriate superregnum.

See Lowe T.M. and Eddy S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucl. Acids Res. 25, 955-964.

rRNA genes are predicted using RNAmmer, version 1.2, configured for the appropriate superregnum.

See Lagesen K. et al. (2007) RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucl. Acids Res. 35, 3100-3108.
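As a rough illustration of what configuring both tools for a superregnum might look like, the sketch below builds the command lines without running them. The mapping from superregnum to options is an assumption, not taken from the pipeline: it relies on tRNAscan-SE 1.x defaulting to eukaryotic mode with -B and -A selecting bacterial and archaeal modes, and on RNAmmer's -S option accepting euk, bac or arc. The function names and file paths are hypothetical.

```python
# Assumed mapping from superregnum to each tool's domain option.
TRNASCAN_MODE = {"Eukaryota": None, "Bacteria": "-B", "Archaea": "-A"}
RNAMMER_KINGDOM = {"Eukaryota": "euk", "Bacteria": "bac", "Archaea": "arc"}


def trnascan_command(fasta, output, superregnum):
    """Build a tRNAscan-SE argument list for the given superregnum."""
    cmd = ["tRNAscan-SE"]
    mode = TRNASCAN_MODE[superregnum]
    if mode is not None:          # eukaryotic mode is the default, no flag
        cmd.append(mode)
    cmd += ["-o", output, fasta]
    return cmd


def rnammer_command(fasta, gff, superregnum):
    """Build an RNAmmer argument list for the given superregnum."""
    return [
        "rnammer",
        "-S", RNAMMER_KINGDOM[superregnum],
        "-m", "tsu,ssu,lsu",      # search all three rRNA molecule types
        "-gff", gff,
        fasta,
    ]
```

The returned lists could be passed to subprocess.run; an unknown superregnum raises KeyError rather than silently falling back to a default mode.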

Non-coding RNA biotype

The following non-coding RNA gene types are annotated, along with pseudogenes.

tRNA: transfer RNA
rRNA: ribosomal RNA
scRNA: small cytoplasmic RNA
snRNA: small nuclear RNA
snoRNA: small nucleolar RNA
miRNA: microRNA precursors
tmRNA: transfer-messenger RNA
MRP_RNA: ribonuclease MRP (RNase MRP)
P_RNA: ribonuclease P (RNase P)
SRP_RNA: signal recognition particle (SRP) RNA
antisense: antisense RNA
ribozyme: ribozyme
telo_RNA: ribonucleoprotein reverse transcriptase (telomerase RNA)
v_RNA: vault RNA
class_I_RNA: class I RNA
class_II_RNA: class II RNA
misc_RNA: miscellaneous other RNA

Note that Rfam contains many more families, but those classified as motifs (e.g. the SECIS element, RF00031) are filtered out by our ncRNA gene prediction pipeline.
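The filtering described above might look like the following sketch. It assumes, without confirmation from the pipeline itself, that "motif" families can be recognised by an Rfam type annotation that does not start with "Gene" (for example "Cis-reg;" for the SECIS element). The hit dictionaries, the rfam_acc key and the function name are hypothetical.

```python
def keep_gene_families(hits, family_types):
    """Return only the hits whose Rfam family type starts with "Gene".

    hits: list of dicts, each with an "rfam_acc" key (hypothetical layout).
    family_types: dict mapping Rfam accession to its type string.
    Families with an unknown type are dropped.
    """
    kept = []
    for hit in hits:
        rfam_type = family_types.get(hit["rfam_acc"], "")
        if rfam_type.startswith("Gene"):
            kept.append(hit)
    return kept


# Example: a U2 spliceosomal RNA hit is kept, a SECIS element hit is dropped.
example_types = {"RF00004": "Gene; snRNA; splicing;", "RF00031": "Cis-reg;"}
example_hits = [
    {"rfam_acc": "RF00004", "region": "chr1", "start": 900, "end": 1100},
    {"rfam_acc": "RF00031", "region": "chr1", "start": 5000, "end": 5070},
]
example_kept = keep_gene_families(example_hits, example_types)
```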