EvoPrinter - Introduction
Introduction for students of molecular evolution

The availability of genomic sequences for over a dozen vertebrate and Drosophila species opens up the possibility of examining in detail the changes that have occurred during the process of molecular evolution. Multiple vertebrate species have evolved over the course of hundreds of millions of years, while the time frame for the divergence of various Drosophila species ranges from a few million to more than 50 million years. EvoPrinter is a comparative genetic tool for assessing evolutionary divergence of protein-encoding genes as well as the non-coding sequences that regulate gene expression.

EvoPrinter is a tool for phylogenetic footprinting -- that is, the use of multi-species comparisons to discover unusually well-conserved regions in a set of orthologous DNA sequences. In Drosophila, the combined mutational histories of five or more species render near base-pair resolution of conserved transcription factor DNA-binding sites, and essential amino acids are revealed by the nucleotide flexibility of the third base of their encoding codons. EvoPrinter is easy and fun to use, and affords a view of the evolution of any of your favorite genes.

Getting started

To start, first acquaint yourself with one of several genomic resources:

From these sources you will be able to recover the genomic sequence of any gene of interest. For example, if a Drosophila gene is of interest, you can go to FlyBase Genes and search for the gene 'even skipped'. On the results page, select eve to arrive at the 'Synopsis' page for even skipped. On the eve synopsis page, look for the buttons 'get', 'gene region' and 'FastA'. Pressing the 'get' button recovers the genomic sequence for the even skipped gene. The protein-encoding sequences are shown in red, and the untranslated leader and terminal sequences (5' and 3' UTR sequences, respectively) are in blue. The single intron is shown in black letters. To see sequences farther upstream and downstream, use the 'Expand/reduce' button marked '0kb' at the top of the page and select 2 or 10 kb. If you do not know which gene to pick, go to 'The Interactive Fly' and choose a gene from any of the lists available. There are links from The Interactive Fly to the appropriate FlyBase gene site.

Now that you have recovered the eve genomic sequence, you are ready to copy and paste it into the EvoPrinterHD search engine. The steps to generate an EvoPrint are found at the EvoPrinter Instructions and Tutorial pages.

Interpreting your EvoPrint

Unlike most multi-species alignment programs, which display multi-species conserved sequences (MCSs) as consecutive columns of invariant nucleotides interspersed with alignment gaps, the EvoPrinter readout displays only the reference DNA with no alignment gaps, giving a species-centric representation of the conserved sequences. To facilitate the comparative analysis of evolutionary changes between test species, a second algorithm, the EvoDifferences profile, enables one to identify MCSs that are common to all but one of the test genomes.
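
The idea behind the two readouts can be illustrated with a minimal Python sketch. This is not EvoPrinter's own implementation: it assumes each test-species sequence has already been mapped onto the reference coordinates as a string of the same length, and the function names (conserved_mask, evoprint, all_but_one), the species labels, and the use of uppercase letters to mark conserved bases are all assumptions made for illustration.

```python
def conserved_mask(reference, species_seq):
    """True at each reference position where the test species has the same base."""
    return [r.upper() == s.upper() for r, s in zip(reference, species_seq)]


def evoprint(reference, species):
    """Show only the reference: uppercase if conserved in ALL test species."""
    masks = [conserved_mask(reference, seq) for seq in species.values()]
    return "".join(
        base.upper() if all(m[i] for m in masks) else base.lower()
        for i, base in enumerate(reference)
    )


def all_but_one(reference, species, excluded):
    """Uppercase bases conserved in every test species except `excluded`."""
    other_masks = [
        conserved_mask(reference, seq)
        for name, seq in species.items()
        if name != excluded
    ]
    excluded_mask = conserved_mask(reference, species[excluded])
    return "".join(
        base.upper()
        if all(m[i] for m in other_masks) and not excluded_mask[i]
        else base.lower()
        for i, base in enumerate(reference)
    )


# Toy data (hypothetical): three test species already placed on reference coordinates.
reference = "ATTAGCCA"
species = {"sp1": "ATTAGTCA", "sp2": "ATTACCCA", "sp3": "ATTAGCCA"}

print(evoprint(reference, species))            # ATTAgcCA
print(all_but_one(reference, species, "sp1"))  # attagCca
```

In the second readout, the only uppercase base is the one that sp2 and sp3 share with the reference but sp1 has changed, which is the kind of species-specific difference the text describes.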

We call the conserved sequences detected by EvoPrinter 'diamonds', referring to the fact that they are evolutionarily hardened and functionally important. When looking at the EvoPrint representation of the protein-encoding region, notice the presence of conserved pairs of nucleotides interspersed with single non-conserved bases. These non-conserved nucleotides are the third bases of the triplet codons of the genetic code. Francis Crick, one of the discoverers of the structure of DNA, found that the code was degenerate, allowing for variation in the third base of a codon, often without changing the encoded amino acid. The diamonds in the 3' UTRs of genes often represent conserved binding sites for microRNAs, which serve to regulate translation or messenger RNA stability. Finally, the diamonds in non-coding regions are very likely to be binding sites for transcriptional regulators. For example, the base combination 'ATTA', or in reverse 'TAAT', can serve as a binding site for HOX transcription factors, and the base combination 'CANNTG', where N can be any of the four nucleotides, serves as the binding site for bHLH transcription factors. We invite you to discover for yourself how genes evolve, the stability of both coding regions and the non-coding regions involved in gene regulation, and the changes over time wrought by selective forces and genetic drift.
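
Both observations above can be explored with a short Python sketch. It is an illustration only and not part of EvoPrinter: the function names and the example sequences are hypothetical, the coding sequence is assumed to start in frame at its first base, and conserved bases are assumed to be written in uppercase and non-conserved bases in lowercase. Because 'TAAT' is the reverse complement of 'ATTA', and 'CANNTG' matches its own reverse complement, scanning a single strand is enough for these two motifs.

```python
import re

# Lookahead patterns so that overlapping matches are also reported.
MOTIFS = {
    "HOX": re.compile(r"(?=(ATTA|TAAT))"),
    "bHLH": re.compile(r"(?=(CA[ACGT]{2}TG))"),  # CANNTG, N = any base
}


def find_motifs(sequence):
    """Return sorted (position, matched bases, motif name) for each hit."""
    seq = sequence.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.group(1), name))
    return sorted(hits)


def variable_sites_by_codon_position(evoprint_cds):
    """Count lowercase (non-conserved) bases at codon positions 1, 2 and 3.

    Assumes the string begins at the first base of a codon.
    """
    counts = [0, 0, 0]
    for i, base in enumerate(evoprint_cds):
        if base.islower():
            counts[i % 3] += 1
    return counts


# Hypothetical non-coding fragment containing one motif of each kind.
print(find_motifs("GGATTACACGTGTT"))
# [(2, 'ATTA', 'HOX'), (6, 'CACGTG', 'bHLH')]

# Hypothetical coding fragment: every non-conserved base is a third position.
print(variable_sites_by_codon_position("ATGGCaCTgAAg"))
# [0, 0, 3]
```

A motif match alone does not show that a site is functional; restricting the scan to conserved (uppercase) stretches of an EvoPrint is one way to narrow the candidates.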

Contact information

Please feel free to contact Dr. Thomas Brody at brodyt@ninds.nih.gov or Dr. Ward Odenwald at odenwaldw@ninds.nih.gov if you have any questions about EvoPrinter or its applications.
