FAQs for Ensembl Genomes

Q1) Can I download complete proteomes in Ensembl Genomes?

A1) Yes, protein sequence files (in FASTA format) are available from the Ensembl Genomes FTP server for all domains of Ensembl Genomes, i.e. Metazoa, Protists, Bacteria, Plants and Fungi. See an example for Oryza sativa from release 13 here.
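Once a proteome file has been downloaded, it can be read with a few lines of code. The sketch below is a minimal hand-written FASTA reader in Python, assuming the downloaded file is gzip-compressed; the function name read_fasta_gz and the file name in the usage comment are hypothetical, and a dedicated library such as Biopython would serve equally well.

```python
import gzip
from typing import Iterator, List, Optional, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header: Optional[str] = None
    chunks: List[str] = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


# Hypothetical usage: count the proteins in a downloaded proteome.
# n_proteins = sum(1 for _ in read_fasta_gz("proteome.pep.all.fa.gz"))
```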

Q2) What do Target % ID and Query % ID mean in the Comparative Genomics views of the Ensembl browser?

A2) Query % ID and Target % ID are reported on Comparative Genomics views of the Ensembl Genomes browser, such as the 'Orthologues' page (see an example here). If you start from a gene in arabidopsis and look at its homologue in another species such as maize, Query % ID is the percentage of the query sequence (the arabidopsis protein) that is identical to the homologue (the maize protein). Target % ID is the percentage of the target sequence (the maize protein) that is identical to the query sequence (the arabidopsis protein).
It can help to think of a query sequence (arabidopsis protein) of 100 amino acids and a target sequence (maize protein) of 50 amino acids. Assume that all 50 amino acids of the maize protein align to, and are identical with, 50 amino acids of the arabidopsis protein. In this case, Query % ID is 50% (50 of 100) and Target % ID is 100% (50 of 50).
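The arithmetic in this example can be sketched in Python. This is a simplified illustration, assuming the pairwise alignment is given as two equal-length strings with '-' marking gaps; the function name percent_identities is hypothetical, and it does not reproduce the exact alignment procedure used by Ensembl Genomes.

```python
from typing import Tuple


def percent_identities(aln_query: str, aln_target: str) -> Tuple[float, float]:
    """Return (Query % ID, Target % ID) for two aligned sequences."""
    if len(aln_query) != len(aln_target):
        raise ValueError("aligned sequences must have equal length")
    # Identical positions: same residue in both rows, ignoring gap columns.
    identical = sum(1 for q, t in zip(aln_query, aln_target) if q == t and q != "-")
    # Each protein's length is its number of residues, i.e. non-gap characters.
    query_len = len(aln_query.replace("-", ""))
    target_len = len(aln_target.replace("-", ""))
    return 100.0 * identical / query_len, 100.0 * identical / target_len


# The 100 aa query vs. 50 aa target example from the text:
print(percent_identities("A" * 100, "A" * 50 + "-" * 50))  # (50.0, 100.0)
```

Both values share the same numerator (the identical positions) and differ only in the length they divide by, which is why they diverge whenever the two proteins differ in length.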

Q2.1) What is the difference between the '% Identity' and '[Species Name] % Identity' homologue attributes in BioMart?

A2.1) BioMart allows homology information to be exported as attributes, including '% Identity' and '[Species Name] % Identity'. '% Identity' is the percentage of amino acids in the currently selected sequence that are identical in its homologue, and '[Species Name] % Identity' is the percentage of amino acids in the homologue that are identical in the currently selected sequence. For example, if the selected species is arabidopsis and the homologue is in maize, the query sequence is the arabidopsis protein and the target is the maize protein. '% Identity' is then the percentage of the arabidopsis protein that is identical to the maize protein, and 'maize % Identity' is the percentage of the maize protein that is identical to the arabidopsis protein. Because both values count the same identical positions but divide by different protein lengths, they are equal only when the arabidopsis and maize proteins have the same length (number of amino acids).

Q3) How can I get access to the Ensembl Genomes archive?

A3) Sequences and MySQL databases from previous releases can be downloaded from here.

Q4) When are newly sequenced genomes available in Ensembl Genomes?

A4) Newly sequenced genomes are prioritised for inclusion in Ensembl Genomes according to a variety of factors. We want to hear from new user communities interested in using Ensembl, so please contact the helpdesk if there are particular species or data sets you'd like to see included. Please note that we consider it very important that primary sequence data and assemblies are lodged in the ENA/GenBank/DDBJ nucleotide archive, and we will only display genomes that have been submitted there. Meanwhile, you can see genomes scheduled for inclusion in the next release of Ensembl Genomes on our website.

Q5) Can modENCODE data be added for Drosophila species?

A5) modENCODE data can be uploaded through the BAM plugin, which allows Ensembl users to view the content of BAM files in the context of a reference genome simply by placing the relevant files on a local HTTP server and performing a simple configuration step in Ensembl. Data in BAM files, or in files of other common formats, can be uploaded to the browser in a similar way. Information on uploading data can be found here.

Q6) Do you have viral genomes in Ensembl Genomes?

A6) Currently, we focus on cellular organisms, but the inclusion of viral genomes at some point in the future is not ruled out.

Q7) How can I view syntenic regions in Ensembl Genomes?

A7) Syntenic regions are calculated from pairwise (between two species) whole-genome alignments. In the 'Location' tab, click on the 'Synteny' link under 'Comparative Genomics' in the left-hand menu to view conserved blocks of sequence. See an example here from Ensembl Plants for syntenic regions between rice and maize. Synteny data can also be viewed in the 'Region overview' panel of the 'Location' tab: click on 'Region overview' in the left-hand menu, click on 'Configure this page', then on 'Synteny', and choose the species. See an example here from Ensembl Plants for syntenic regions between rice and three other species.

Q8) How can I download sequences and their corresponding genomic coordinates using BioMart?

A8) BioMart is provided for Ensembl Plants, Ensembl Bacteria, Ensembl Metazoa, Ensembl Protists and Ensembl Fungi. You can download gene sequences and flanking regions in FASTA format from BioMart by choosing 'sequences' under 'attributes' and selecting one of the following: unspliced (Transcript or Gene), flank (Transcript or Gene), flank-coding region (Transcript or Gene), 5' UTR, 3' UTR, exon sequences, cDNA sequences, coding sequence or protein. For flanking sequence, you can choose either the upstream or the downstream flank of the gene. To include the genomic coordinates, expand the 'Header Information' option under 'attributes' and select gene start (bp) and gene end (bp). More details on how to use BioMart can be found here. For non-gene-centric downloads, please use the 'Export data' option in the Ensembl Genomes browser (see here), the region report tool (under the Tools link, for example in Ensembl Bacteria: http://bacteria.ensembl.org/tools.html) or the Ensembl Genomes Perl API (for example in Ensembl Fungi: http://fungi.ensembl.org/info/docs/api/index.html).
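BioMart queries can also be expressed as XML and sent to a BioMart web service instead of being run through the web interface. The Python sketch below only builds such a query and its URL; it assumes a martservice endpoint at the address shown, and the dataset, filter and attribute names are hypothetical placeholders. Check the real names against the BioMart help pages for the division you use before relying on them. For sequence attributes, the formatter would presumably be 'FASTA' rather than 'TSV' (also an assumption).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

# Assumed endpoint for Ensembl Plants; verify before use.
MARTSERVICE_URL = "http://plants.ensembl.org/biomart/martservice"


def build_biomart_xml(dataset: str, attributes: List[str],
                      filters: Dict[str, str], formatter: str = "TSV") -> str:
    """Build a BioMart-style XML query string (no header row in the output)."""
    query = ET.Element("Query", virtualSchemaName="default",
                       formatter=formatter, header="0", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", name=name, value=value)
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    return ('<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
            + ET.tostring(query, encoding="unicode"))


def build_biomart_url(xml_query: str) -> str:
    """Encode the XML query as the 'query' parameter of a GET request."""
    return MARTSERVICE_URL + "?" + urlencode({"query": xml_query})


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    xml_query = build_biomart_xml(
        dataset="osativa_eg_gene",
        attributes=["ensembl_gene_id", "start_position", "end_position"],
        filters={"chromosome_name": "1"},
    )
    print(build_biomart_url(xml_query))
    # The URL could then be fetched, e.g. with urllib.request.urlopen().
```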

Q9) How are names assigned to genes in Ensembl Genomes?

A9) Names are taken from external databases such as UniProt, Gramene, the European Nucleotide Archive, GeneDB, WormBase and SGD, according to which resource provides the most authoritative and informative name for each gene.
You can search for a gene by name or identifier using the search box. You can also use a name or an identifier to link directly to the appropriate page on the site. For example, you can construct a link such as this:
http://www.ensemblgenomes.org/id/EBMYCT00000036688
which will resolve to the appropriate page in Ensembl Genomes (if the search term matches more than one entity, the link will resolve to a page of search results instead).
Once you are on a gene page in Ensembl Genomes, you can also switch to another gene by replacing the gene name at the end of the URL (the value of the g= parameter). See a few examples here:
http://plants.ensembl.org/Sorghum_bicolor/Location/View?db=core;g=Sb01g042000 (gene Sb01g042000 in Sorghum bicolor, Ensembl Plants),
http://plants.ensembl.org/Sorghum_bicolor/Location/View?db=core;g=Sb01g041960 (gene Sb01g041960 in Sorghum bicolor, Ensembl Plants),
http://bacteria.ensembl.org/e_coli_o7_k1/Gene/Summary?g=ftsZ (gene ftsZ in Escherichia coli, Ensembl Bacteria),
http://bacteria.ensembl.org/e_coli_o7_k1/Gene/Summary?g=murG (gene murG in Escherichia coli, Ensembl Bacteria),
http://protists.ensembl.org/Plasmodium_falciparum/Gene/Summary?g=CLAG3.1 (gene CLAG3.1 in Plasmodium falciparum, Ensembl Protists),
http://protists.ensembl.org/Plasmodium_falciparum/Gene/Summary?g=PFC0125w (gene PFC0125w in Plasmodium falciparum, Ensembl Protists).
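If you generate many such links in a script, swapping the gene name can be automated. The Python sketch below replaces the value of the g= parameter while leaving the rest of the URL untouched; the function name swap_gene is hypothetical. It splits the query string on both ';' and '&' by hand because some of the URLs above use ';' as the separator, which recent versions of Python's urllib.parse.parse_qsl no longer treat as a separator by default.

```python
import re
from urllib.parse import quote, urlsplit, urlunsplit


def swap_gene(url: str, new_gene: str) -> str:
    """Return a copy of an Ensembl Genomes URL with its g= value replaced."""
    parts = urlsplit(url)
    # The capturing group keeps the ';' / '&' separators in the token list.
    tokens = re.split(r"([;&])", parts.query)
    found = False
    for i, token in enumerate(tokens):
        if token.startswith("g="):
            tokens[i] = "g=" + quote(new_gene, safe="")
            found = True
    if not found:
        raise ValueError("URL has no g= parameter")
    return urlunsplit(parts._replace(query="".join(tokens)))


print(swap_gene(
    "http://bacteria.ensembl.org/e_coli_o7_k1/Gene/Summary?g=ftsZ", "murG"))
# http://bacteria.ensembl.org/e_coli_o7_k1/Gene/Summary?g=murG
```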

Q10) How can I adjust the width of images in Ensembl Genomes to fit the size of the screen?

A10) The width of images in both the 'Location' and 'Gene' tabs can be easily configured. Click the 'Configure this page' button at the bottom of the left-hand menu, then click on 'Display options' (at the bottom of the configuration menu) and set 'Width of image' either to 'best fit' or to the number of pixels of your choice.

Q11) What type of data is available under the 'Location' tab of Ensembl Genomes and how can I change it?

A11) In the 'Location' tab, you can browse genes, variations, sequence conservation and other types of annotation along the genome. 'Region in detail' is a highly configurable and scalable view: click on the 'Configure this page' button at the bottom of the left-hand menu and select the types of data you want included in the display. Data from the following categories, among others, can be easily added to or removed from this 'Location' tab view: 'Sequence and assembly', 'Genes and transcripts', 'mRNA and protein alignments', 'Other DNA alignments', 'Germline variation' and 'Comparative genomics'. You can also change display options such as the image width, and a further option resets the configuration to the default settings. You can also upload your own data to this view as a BAM file, a VCF file or a file in another supported format.