Ensembl Perl API

Ensembl Genomes uses MySQL relational databases to store its information. A comprehensive set of Application Programme Interfaces (APIs) serves as a middle layer between the underlying database schemas and more specific application programmes. The APIs aim to encapsulate the database layout by providing efficient high-level access to data tables, and to isolate applications from changes to that layout. The API uses an object-oriented approach to model real biological objects such as genes, transcripts and 'slices' of DNA sequence, making it straightforward for you to write scripts that retrieve and analyse data e.g.

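# fetch the genes corresponding to the symbol 'Tyr'
# ($dbCore is assumed to be a core database adaptor obtained earlier in the script;
# each gene's stable id is then available via $gene->stable_id)
my $gene_adaptor = $dbCore->get_GeneAdaptor();
my @genes = @{ $gene_adaptor->fetch_all_by_external_name('Tyr') };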

The database schema and API used by Ensembl Genomes have been developed in the context of the Ensembl project. All Ensembl Genomes databases are completely compatible with the Ensembl API and tools.

Tutorials

All Ensembl Genomes databases can be accessed using the Ensembl Perl API, for which full documentation is available on the Ensembl website. This includes a range of extensive tutorials.

Working with Ensembl Genomes data

Configuring the Registry

To work with Ensembl Genomes data, the Registry should be configured to use the public MySQL server provided by Ensembl Genomes, or an alternative local mirror:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'mysql-eg-publicsql.ebi.ac.uk',
    -port => 4157);

This will load the data from the current version of Ensembl Genomes.

To use both Ensembl and Ensembl Genomes data in parallel, multiple servers can be specified e.g.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;
Bio::EnsEMBL::Registry->load_registry_from_multiple_dbs(
    {-host => 'mysql-eg-publicsql.ebi.ac.uk',
     -port => 4157,
     -user => 'anonymous'
    },
    {-host => 'ensembldb.ensembl.org',
     -port => 5306,
     -user => 'anonymous'
    }
);

Note: In contrast to the vertebrate Ensembl data set where each genome is stored in a single MySQL database which can be addressed individually, genomes from Ensembl Genomes are often stored in collection databases, with one MySQL database containing up to 250 genomes. When using these genomes, the easiest approach is to load the whole registry as above and select an individual species, or use the Ensembl Genomes Perl API (discussed further below).
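As a minimal sketch of the "load the whole registry and select an individual species" approach described in the note, the script below loads the public server and then asks the Registry for a gene adaptor for one species. The species name 'arabidopsis_thaliana' and the gene symbol 'PHYB' are illustrative assumptions only and should be replaced with the genome and symbol you are interested in. The script needs the Ensembl core API on your Perl library path and network access to the public server.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Load every database on the public Ensembl Genomes server into the Registry.
Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'mysql-eg-publicsql.ebi.ac.uk',
    -port => 4157,
    -user => 'anonymous',
);

# Assumption: this species name is only an illustration; substitute the
# name or alias of the genome you need.
my $species = 'arabidopsis_thaliana';

# The Registry resolves the species name to the database that holds it,
# whether that is a single-species or a collection database.
my $gene_adaptor =
    Bio::EnsEMBL::Registry->get_adaptor( $species, 'core', 'Gene' )
    or die "No core gene adaptor found for '$species'\n";

# Assumption: 'PHYB' is a placeholder gene symbol.
my @genes = @{ $gene_adaptor->fetch_all_by_external_name('PHYB') };

# Print the stable id and description of each matching gene.
foreach my $gene (@genes) {
    printf "%s\t%s\n", $gene->stable_id, $gene->description // '';
}
```

Because the script only ever refers to the species by name, it does not need to know which MySQL database the genome lives in.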

Compara

In contrast to Ensembl, Ensembl Genomes provides six different Ensembl Compara databases: one for each of the five divisions, plus the pan-taxonomic compara. These can be selected from the Registry by using the division name ("metazoa", "plants", "fungi", "protists", "bacteria"; or "pan_homology" for the pan-taxonomic compara) as the "species" name e.g.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'mysql-eg-publicsql.ebi.ac.uk',
    -port => 4157);

my $genome_db_adaptor = Bio::EnsEMBL::Registry->get_adaptor(
    'metazoa', 'compara', 'GenomeDB');
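
Once a GenomeDB adaptor has been obtained in this way, it can be used like any other Compara adaptor. The following is a rough sketch, assuming the Ensembl Compara API is installed alongside the core API, that lists the genomes present in the metazoa compara database. The choice of 'metazoa' and the fields printed are illustrative, not a recommended workflow.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'mysql-eg-publicsql.ebi.ac.uk',
    -port => 4157,
    -user => 'anonymous',
);

# Select the division-specific compara database by using the division
# name as the "species".
my $genome_db_adaptor =
    Bio::EnsEMBL::Registry->get_adaptor( 'metazoa', 'compara', 'GenomeDB' )
    or die "No GenomeDB adaptor found for the metazoa compara database\n";

# Each GenomeDB object describes one genome known to this compara database.
foreach my $genome_db ( @{ $genome_db_adaptor->fetch_all } ) {
    printf "%s\t%s\n", $genome_db->name, $genome_db->assembly;
}
```

Swapping 'metazoa' for another division name, or for 'pan_homology', should select the corresponding compara database in the same way.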

Ensembl Genomes API

When working with larger numbers of genomes, e.g. in Ensembl Bacteria, Fungi and Protists, an auxiliary Ensembl Genomes Perl API makes it easier to select genomes of interest.

Ensembl Software Support

Ensembl and Ensembl Genomes are open projects and we would like to encourage correspondence and discussion on any aspect of these resources. Please see the Ensembl Contacts page for suitable options for getting in touch with us.