Introduction to EvoPrinter

EvoPrinter is a simple multigenomic comparative tool for rapidly identifying multi-species conserved DNA sequences (MCSs) in the context of a single species of interest (Odenwald et al., 2005. PNAS 102: 14700-14705). The EvoPrinter algorithm superimposes multiple BLAST-Like Alignment Tool (BLAT) readouts (Kent, 2002) of individual reference-DNA/test-species pairwise alignments to generate an evolutionary gene print (EvoP) of invariant DNA sequences as they appear in the reference-DNA. Because EvoPrinter uses only the reference-DNA as the template for the comparisons, no alignment gaps are introduced into the EvoP, which ensures a species-centric representation of the conserved DNA sequences. In addition, EvoPrinter shows the contribution that an individual species makes to the EvoP by highlighting those sequences that are conserved in all of the other species but not in the selected species. Two factors speed up MCS discovery: the BLAT alignment algorithm is significantly faster than other alignment programs (discussed in Kent, 2002), and only a single curated reference-DNA sequence is required to initiate the analysis of pre-BLAT formatted genomes (currently there are 13 vertebrate and 7 Drosophila species available). EvoPrinter thereby facilitates the analysis of both gene function and the molecular evolution of species differentiation.

Procedure

EvoPrinter is a tool for discovering MCSs shared among three or more orthologous DNA sequences. It is a JavaScript program that runs on the user's computer. Its algorithm creates an ordered array of strings from each BLAT output and then looks for sequence conservation by looping through the strings one letter at a time, outputting a capital letter only for those reference-DNA nucleotides that are aligned in all test species. The program requires an up-to-date web browser with JavaScript enabled. There is no arbitrary limit on sequence capacity. For example, a 50 kb EvoP can be generated by splicing together two 25 kb BLAT outputs. A second algorithm, EvoDifference (EvoDif), reveals how any one species differs from the EvoP of all species tested (described below).
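The letter-by-letter comparison described above can be sketched as follows. This is a minimal Python illustration, not the EvoPrinter JavaScript source. It assumes that each readout has already been reduced to a plain string of the reference-DNA in which aligned bases are uppercase and unaligned bases are lowercase; the function name evoprint is hypothetical.

```python
def evoprint(readouts):
    """Superimpose case-coded readouts of one reference sequence.

    Assumption: every readout spells the same reference-DNA, with
    uppercase marking the bases that aligned in that test species.
    """
    if not readouts:
        raise ValueError("at least one readout is required")
    reference = readouts[0].lower()
    for readout in readouts:
        if readout.lower() != reference:
            raise ValueError("readouts must cover the same reference-DNA")

    # Walk the readouts one column (one reference base) at a time.
    result = []
    for column in zip(*readouts):
        conserved = all(base.isupper() for base in column)
        result.append(column[0].upper() if conserved else column[0].lower())
    return "".join(result)


# Three made-up readouts of the same 8-base reference.
print(evoprint(["ACGTacgt", "ACgtACgt", "AcGTAcGT"]))
```

Only the first base is uppercase in all three sample readouts, so it is the only capital letter in the result. Because every readout is indexed by reference position, no gap characters are ever needed.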

The first step in generating an EvoP is the curation of a single FASTA sequence (up to 25 kb per alignment) from the UCSC genome browser, Ensembl or FlyBase. This sequence (the reference-DNA) is copy-pasted into one or more BLAT input windows at BLAT (maintained by UCSC Genome Bioinformatics). The BLAT alignment between the reference-DNA and a selected test species is then started by selecting the 'submit' box. The highest-scoring readout of each alignment is then selected, and the sequence labeled 'YourSeq' (showing the reference-DNA) is copy-pasted into one of the EvoPrinter input windows at EvoPrinter without removing numbering or spaces. The test-species identity of the BLAT output is then recorded in the flanking box. This procedure is repeated for each reference-DNA/test-species pair to be compared. After all of the desired readouts have been loaded into the different EvoPrinter windows, select the 'Generate EvoPrint' button to produce the EvoP. EvoPrinter can also be used to generate a protein-EvoP from BLAT alignments of amino acid sequences.
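The input handling implied by this procedure can be sketched in Python under stated assumptions. The helper names are hypothetical, the 25 kb figure is the per-alignment limit given above, and the pasted text is assumed to contain only sequence lines (letters, spaces and position numbers), not the 'YourSeq' label itself.

```python
import re

MAX_ALIGNMENT_LENGTH = 25_000  # per-alignment limit stated in the text


def split_reference(sequence, size=MAX_ALIGNMENT_LENGTH):
    """Cut a long reference-DNA into pieces small enough for one alignment."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def clean_readout(pasted_text):
    """Drop numbering, spaces and line breaks, keeping the case of each base."""
    return re.sub(r"[^A-Za-z]", "", pasted_text)


def splice_readouts(pasted_texts):
    """Join consecutive cleaned readouts into one longer readout."""
    return "".join(clean_readout(text) for text in pasted_texts)


print(clean_readout("ACGTacgtAC GTacgtACGT  20\nacgt  24"))
```

Keeping the letter case intact matters here, because the case is the only record of which reference bases aligned in the test species.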

One important feature of the EvoPrinter program is its ability to generate an EvoP from subsets of the selected BLAT readouts by un-checking the species or groups of species to be excluded. This flexibility is particularly useful when MCSs are lost in one or more BLAT alignments because of (1) loss of co-linearity caused by chromosome rearrangements or large insertions and/or deletions; (2) overall sequence divergence great enough that short homologies fail to align; or (3) loss of accurate alignment caused by sequence gaps and/or errors introduced during the sequencing and genome assembly of one of the test species.
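The effect of un-checking a species can be illustrated with a short Python sketch. It assumes readouts stored as case-coded strings keyed by species name (uppercase meaning aligned); the function name and the sample data are hypothetical.

```python
def evoprint_subset(readouts_by_species, excluded=()):
    """Build an EvoP from every readout whose species is not excluded."""
    included = [
        readout
        for species, readout in readouts_by_species.items()
        if species not in excluded
    ]
    if not included:
        raise ValueError("at least one species must remain selected")
    # A base stays uppercase only if it is uppercase in every included readout.
    return "".join(
        column[0].upper() if all(base.isupper() for base in column)
        else column[0].lower()
        for column in zip(*included)
    )


# Made-up readouts: the third species fails to align over the first four bases.
readouts = {"mouse": "ACGTACGT", "chicken": "ACGTACgt", "fugu": "acgtACgt"}
print(evoprint_subset(readouts))
print(evoprint_subset(readouts, excluded={"fugu"}))
```

In this sample, excluding the species with the failed alignment restores the first four bases to the EvoP, which is the situation the three cases above describe.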

To identify MCSs that are shared by all but one of the test species, deselect all of the species buttons except for the species in question and then select the 'Highlight Species Differences' button to generate the EvoDif readout. The lowercase, red-colored nucleotides are those that are lost from the final EvoP when that species is included in the comparison. The attached PowerPoint slides illustrate key steps in the generation of EvoP and EvoDif readouts. Color formatting of the EvoP and EvoDif readouts can be maintained by copy-pasting the HTML outputs into Microsoft Word.
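The EvoDif comparison can be sketched in the same way. The Python below is an illustration under the same assumptions as before (case-coded readouts keyed by species name, uppercase meaning aligned); the function name and sample data are hypothetical, and the sketch returns positions rather than colored HTML.

```python
def evodif_positions(readouts_by_species, species):
    """Return reference positions conserved in every other species
    but not aligned in the selected species."""
    selected = readouts_by_species[species]
    others = [
        readout
        for name, readout in readouts_by_species.items()
        if name != species
    ]
    if not others:
        raise ValueError("at least one other species is required")

    lost = []
    for index, column in enumerate(zip(*others)):
        conserved_elsewhere = all(base.isupper() for base in column)
        if conserved_elsewhere and not selected[index].isupper():
            lost.append(index)
    return lost


readouts = {"mouse": "ACGTACGT", "chicken": "ACGTACgt", "fugu": "acgtACgt"}
print(evodif_positions(readouts, "fugu"))
```

The returned positions correspond to the nucleotides that the readout shows in lowercase red: they would appear in the EvoP of the other species but drop out once the selected species is added.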

References

Odenwald, W. F., Rasband, W., Kuzin, A. and Brody, T. (2005). EvoPrinter: A multigenomic comparative tool for rapid identification of functionally important DNA. Proc. Natl. Acad. Sci. 102: 14700-14705.

Kent, W. J. (2002). BLAT--the BLAST-like alignment tool. Genome Res. 12: 656-664.
