EST collections in Ensembl Genomes

Comprehensive sets of EST and full-length cDNA sequences have been aligned to the genomes of most species in Ensembl Genomes. When a genome is added, all of the EST sequences for the species are retrieved from dbEST and aligned. Additional EST collections are obtained from species-specific EST databases or are contributed directly by external research groups. The sets of aligned ESTs are updated intermittently, or on community request. Details of the source of an EST set, along with a link to the source database where available, can be found by clicking on the track's "more information" icon in the Ensembl Genomes browser.

Alignment generation

EST alignments in Ensembl Genomes are generated with Exonerate [1]. Typically, the genome sequence is soft-masked to improve alignment quality without sacrificing sensitivity. An alignment is discarded if it is shorter than 50 base pairs, contains an intron longer than 25 kb, or has an Exonerate score below 300. EST sequences from separate sources are aligned in distinct batches and are presented in separate browser tracks.
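
The three discard criteria can be illustrated with a minimal Python sketch. This is not the Ensembl Genomes pipeline code: the EstAlignment record, its field names, and the passes_filters function are hypothetical and exist only for illustration, and the sketch assumes that 25 kb means 25,000 base pairs. Only the three thresholds come from the description above.

```python
from dataclasses import dataclass, field
from typing import List

# Thresholds taken from the text above.
MIN_ALIGNMENT_LENGTH = 50    # base pairs
MAX_INTRON_LENGTH = 25_000   # base pairs (assumption: 25 kb = 25,000 bp)
MIN_EXONERATE_SCORE = 300


@dataclass
class EstAlignment:
    """Hypothetical record for one EST-to-genome alignment."""
    est_id: str
    aligned_length: int                 # aligned bases, in bp
    score: int                          # Exonerate raw score
    intron_lengths: List[int] = field(default_factory=list)


def passes_filters(aln: EstAlignment) -> bool:
    """Return True if the alignment would be kept under the three criteria."""
    if aln.aligned_length < MIN_ALIGNMENT_LENGTH:
        return False
    if aln.score < MIN_EXONERATE_SCORE:
        return False
    # A single over-long intron is enough to discard the alignment.
    if any(length > MAX_INTRON_LENGTH for length in aln.intron_lengths):
        return False
    return True


# Made-up example alignments, one per rejection reason plus one that is kept.
examples = [
    EstAlignment("est_kept", aligned_length=420, score=1800, intron_lengths=[95, 3100]),
    EstAlignment("est_too_short", aligned_length=45, score=350),
    EstAlignment("est_long_intron", aligned_length=400, score=1500, intron_lengths=[30_000]),
    EstAlignment("est_low_score", aligned_length=120, score=250),
]
kept_ids = [aln.est_id for aln in examples if passes_filters(aln)]
```

In this sketch, values exactly at a threshold (a 50 bp alignment, a 25,000 bp intron, a score of 300) are kept, which follows the wording "shorter than", "longer than" and "below".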

Data access

EST alignments can be accessed programmatically using the Ensembl Core API, where they are stored as Bio::EnsEMBL::DnaDnaAlignFeature objects. More details on how to do this, including sample code, are available in the "Alignment Features" section of the Ensembl Core API Tutorial.


[1] Slater and Birney, Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 2005, 6:31