EvoPrinterHD for bacteria
Introduction

EvoPrinter is a comparative genomics tool for discovering conserved DNA sequences that are shared among three or more orthologous DNAs (Odenwald et al., 2005). Only a single curated DNA sequence is required to initiate the rapid comparative analysis. Generated from multiple pairwise BLAT alignments (Kent, 2002), an EvoPrint presents an ordered, uninterrupted representation of evolutionarily resilient sequences within the user's DNA of interest. EvoPrinterHD is a second-generation comparative tool that automatically superimposes higher-resolution alignments, obtained using an enhanced BLAT (eBLAT) protocol, to give an enhanced view of sequence conservation between evolutionarily distant species (Yavatkar et al., 2008). EvoPrinter is currently available for 17 Staphylococcus, 22 enteric bacteria and 17 Streptococcus genomes.
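To make the EvoPrint idea concrete, here is a minimal Python sketch, not EvoPrinter's actual code. It assumes each pairwise alignment has already been reduced to the set of reference positions at which that test genome carries an identical base; a base is printed in uppercase only if every test genome conserves it, and in lowercase otherwise. The function name `evoprint` and its input format are hypothetical.

```python
from __future__ import annotations


def evoprint(reference: str, matches: dict[str, set[int]]) -> str:
    """Render an EvoPrint-style string from per-genome match positions.

    reference: the input (reference) DNA sequence.
    matches:   hypothetical input; for each test genome, the set of 0-based
               reference positions whose base is identical in that genome's
               pairwise alignment.
    """
    out = []
    for i, base in enumerate(reference):
        # Uppercase only when every test genome conserves this base.
        if all(i in positions for positions in matches.values()):
            out.append(base.upper())
        else:
            out.append(base.lower())
    return "".join(out)


# Hypothetical alignment results for three test genomes.
example = {
    "genome_A": set(range(10)),
    "genome_B": set(range(8)),
    "genome_C": set(range(2, 10)),
}
print(evoprint("ACGTACGTAC", example))  # acGTACGTac
```

Only positions 2 to 7 are matched in all three hypothetical genomes, so only those bases come out in uppercase.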

The following algorithms were developed to help identify DNA sequences acquired by horizontal gene transfer: (1) an EvoUnique profile highlights sequences that are unique to the reference, or uniquely shared among a subset of genomes, and that are absent from the other genomes included in the analysis; (2) a repeat finder detects putative mobile genetic element sequences based on their repeated presence within bacterial chromosomes; (3) an EvoDifferences profile portrays, in a single view, those sequences that are detected in all but one of the genomes included in the analysis; and (4) input reference DNA exchange allows the comparative analysis to be re-initiated using the aligning region of another genome, thus facilitating the search for unique differences among the genomes included in the analysis. EvoPrinterHD also includes algorithms that identify sequence rearrangements in the aligning regions of the test genomes.
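The repeat finder's exact method is not described here. As one possible approach, the hypothetical sketch below flags query positions whose k-mer occurs at least twice in a chromosome on either strand. It uses canonical k-mers (the smaller of a k-mer and its reverse complement) so that a copy on the opposite strand is still counted once per site. The default k = 12 is an arbitrary choice, and all function names and parameters are illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Strand-independent key: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def repeated_positions(query: str, chromosome: str, k: int = 12,
                       min_copies: int = 2) -> list[int]:
    """Return 0-based query positions whose k-mer occurs at least
    `min_copies` times in the chromosome, counting both strands."""
    query, chromosome = query.upper(), chromosome.upper()
    counts = Counter(canonical(chromosome[i:i + k])
                     for i in range(len(chromosome) - k + 1))
    return [i for i in range(len(query) - k + 1)
            if counts[canonical(query[i:i + k])] >= min_copies]


# Toy chromosome carrying GATTACA once on each strand (TGTAATC = revcomp).
chrom = "GATTACACCCTGTAATC"
print(repeated_positions("GATTACA", chrom, k=4))  # [0, 1, 2, 3]
print(repeated_positions("CCCTG", chrom, k=4))    # []
```

A real repeat search over whole chromosomes would use longer k-mers and would tolerate mismatches between copies; exact k-mer counting is simply the easiest way to illustrate the principle.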

Described below are the steps to follow for EvoPrinter analysis of bacterial DNA:

  • Go to the EvoPrinterHD website and select the bacterial group you want to analyze: Staphylococcus, Streptococcus or enteric bacteria.
  • Go to the input page and enter a sequence name.
  • From the pull-down menu, select your reference genome (if you are using a genome sequence that is not included in the list, select 'other').
  • Copy and paste the sequence to be analyzed into the input window (you can analyze up to 40,000 base pairs in one run).
  • Select the "Launch eBLAT" alignments button.
  • At the genome selection page (for enteric bacteria only), select the genomes you want to include in the analysis (for Staphylococcus and Streptococcus, all species are automatically selected).
  • Select the "Launch eBLAT" alignments button again.
  • Select the "Check input DNA for Repetitive Sequence" button to reveal the presence of repetitive sequences in your query sequence.
  • Select the "View Alignment Results" button to generate a scorecard for your input reference DNA and the test species results.
  • A guide to reading the scorecard can be found at the Introduction and Tutorial site. Keep in mind that the "score" values are essential for understanding the degree of sequence identity between the reference and test sequences, and they indicate which genomes to include in the EvoPrinter analysis.
  • The scorecard page contains links to the eBLAT alignments for each of the sequences used in the alignment and links to the composite eBLAT generated for each line. Examining these alignments is useful for interpreting the "score" values and for deciding which alignments to include in the final EvoPrint. Note that the sequence of each test species can also be obtained from these links.
  • For your analysis, if the second and third alignment scores are significantly lower than the first alignment score, select the highest scoring eBLAT alignment for analysis. If the second and/or the third alignment scores are high (indicating rearranged sequences) select the composite eBLAT alignment option for the analysis.
  • After selecting the test genomes, select the 'Generate EvoPrint, EvoUnique and EvoDifference Readout' at the bottom of the page.
  • In the EvoPrint readout, black uppercase letters represent bases conserved in all test sequences. Lowercase letters represent bases that are not conserved in at least one of the test genomes.
  • In the EvoUnique profile readout, red uppercase letters represent bases that are present only in the reference sequence, green uppercase letters represent bases that are shared with only one of the test species, and blue uppercase letters represent bases that are shared with only two of the test species. Gray lowercase letters represent bases that are common to the aligning regions of three or more test species.
  • In the EvoDifferences profile readout, black uppercase letters represent bases conserved in all species, and colored bases represent bases present in all test sequences but one. The color of each such base identifies the species that lacks it. Gray lowercase letters represent bases absent from more than one test genome.
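The color rules of the EvoUnique and EvoDifferences readouts can be expressed as per-base classifications. The sketch below assumes, hypothetically, that each test genome's alignment has been summarized as a set of reference positions where the base is present. It returns (base, label) pairs rather than rendered colors. In the EvoDifferences case, the label names the one genome lacking the base. Function names and the input format are illustrative, not EvoPrinterHD's internals.

```python
from __future__ import annotations


def evounique(reference: str, present: dict[str, set[int]]) -> list[tuple[str, str]]:
    """Label each reference base by how many test genomes share it."""
    labels = []
    for i, base in enumerate(reference):
        n = sum(i in positions for positions in present.values())
        if n == 0:
            labels.append((base.upper(), "red"))    # reference only
        elif n == 1:
            labels.append((base.upper(), "green"))  # one test species
        elif n == 2:
            labels.append((base.upper(), "blue"))   # two test species
        else:
            labels.append((base.lower(), "gray"))   # three or more
    return labels


def evodifferences(reference: str, present: dict[str, set[int]]) -> list[tuple[str, str]]:
    """Label bases conserved in all genomes, or missing from exactly one."""
    labels = []
    for i, base in enumerate(reference):
        missing = [g for g, positions in present.items() if i not in positions]
        if not missing:
            labels.append((base.upper(), "black"))
        elif len(missing) == 1:
            # The label names the single genome lacking this base;
            # a viewer would map it to that genome's color.
            labels.append((base.upper(), missing[0]))
        else:
            labels.append((base.lower(), "gray"))
    return labels


# Hypothetical presence sets for four test genomes.
present = {"g1": set(range(6)), "g2": set(range(5)),
           "g3": set(range(4)), "g4": set(range(3))}
print(evounique("ACGTACG", present))
print(evodifferences("ACGTACG", present))
```

In this toy input, the fourth base (T) is present in every test genome except g4, so it is the only base that EvoDifferences reports in a genome-specific color.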
Sample EvoPrint

A region of the E. coli EDL933 genome spanning bases 376466-386461, which includes the choline transport protein BetT gene, was subjected to EvoPrint analysis. The scorecard reveals the following: 1) As shown by the first score in each column, the only two genomes with complete homology to the ~10,000-base input sequence were E. coli EDL933 itself (the source of the reference sequence) and the closely related E. coli Sakai genome. 2) The second and third scores were low, indicating little sequence rearrangement. 3) A second level of homology was found for most of the other E. coli genomes in the analysis. 4) A third level of homology is indicated by the lower score against Shigella flexneri 5str8401. 5) Boxes were checked to include only seven species in the final EvoPrint, since no other genomes contained sequences homologous to the reference sequence.
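The scorecard reasoning in this example, together with the step-list rule for choosing between the highest scoring and the composite eBLAT alignment, might be sketched as follows. The text says only "significantly lower" and "high". The 0.25 ratio and the minimum first score used to decide which genomes to include are therefore assumptions. The scores below are invented and are not taken from the sample scorecard.

```python
from __future__ import annotations


def choose_alignment(scores: list[float], ratio: float = 0.25) -> str:
    """Pick 'best' or 'composite' for one genome's scorecard row.

    scores: first, second and third eBLAT alignment scores, highest first.
    ratio:  hypothetical cutoff; a second or third score at or above
            ratio * first score is treated as "high" (likely rearrangement).
    """
    first, secondary = scores[0], scores[1:3]
    if any(s >= ratio * first for s in secondary):
        return "composite"
    return "best"


def build_selection(scorecard: dict[str, list[float]],
                    min_first: float) -> dict[str, str]:
    """Keep genomes whose first score reaches min_first (a hypothetical
    homology cutoff) and record which alignment option to use for each."""
    return {genome: choose_alignment(scores)
            for genome, scores in scorecard.items()
            if scores and scores[0] >= min_first}


# Invented scores for illustration only (not the sample scorecard values).
card = {"genome_A": [9800, 120, 45],
        "genome_B": [5200, 3100, 60],
        "genome_C": [40, 10, 5]}
print(build_selection(card, min_first=1000))
# {'genome_A': 'best', 'genome_B': 'composite'}
```

A fixed ratio keeps the rule easy to reason about. In practice, the linked eBLAT alignments remain the better guide for borderline rows.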

A sample readout for the choline transport protein BetT region, which includes an EvoPrint, an EvoUnique print and an EvoDifferences profile, is given for the E. coli EDL933 genomic region.

The EvoPrint reveals, in uppercase black letters, bases in the E. coli EDL933 reference sequence that are conserved in the E. coli Sakai, E. coli K12MG 1655, E. coli CFT073, E. coli 536, E. coli UTI89, E. coli APEC 01 and Shigella flexneri 5str8401 orthologous DNAs.

In the EvoUnique print, uppercase red letters represent bases that are present only in the reference species, uppercase green letters represent bases that are shared with only one of the test species, and uppercase blue letters represent bases that are shared with two of the test species. Lowercase gray bases are common to the aligning regions of three or more test species.

The EvoDifferences profile (a relaxed EvoPrint) uses color coding to reveal bases that are not conserved in one of the test species. Red bases are those uniquely absent from Shigella flexneri 5str8401.

Further BLAST analysis (not shown) of the three regions revealed by the EvoPrint analysis shows the following. The upper region, conserved in all species used for the final EvoPrint, encodes the choline transport protein BetT, which is conserved in all E. coli genomes used in the analysis except E. coli K12W 3110; of all the Shigella genomes used in the analysis, the sequence is found only in Shigella flexneri 5str8401. The central portion of the EvoPrint, indicated by green uppercase letters in the EvoUnique print, encodes a putative outer membrane autotransporter (AidA-I adhesin-like protein) that is present only in E. coli EDL933, E. coli Sakai and, as revealed by BLAST, a few other E. coli genomes that were not used in the EvoPrint analysis. The blue bases of the EvoUnique print are bases conserved in the AidA gene of E. coli 53638, which is divergent with respect to the EDL933/Sakai sequence. The lower portion of the sequence, shown in uppercase as absent only from Shigella flexneri 5str8401 while present in the other genomes used in the analysis, encodes a LuxR-family transcriptional regulator/cyclic diguanylate phosphodiesterase (EAL) domain protein that is present only in the E. coli genomes and absent from Shigella clones.

In summary, EvoPrint analysis reveals three regions in the reference sequence distinguished by their presence or absence from the test species.
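Finding regions like these amounts to splitting the reference into runs of consecutive bases that share the same presence/absence pattern across the test genomes. The minimal sketch below assumes each test genome is summarized as a set of reference positions where its aligned base is present; this is a hypothetical input format. The optional `min_len` parameter, also an assumption, drops very short runs such as those caused by isolated mismatches.

```python
from __future__ import annotations

from itertools import groupby


def presence_regions(length: int, present: dict[str, set[int]],
                     min_len: int = 1) -> list[tuple[int, int, tuple[str, ...]]]:
    """Split reference positions 0..length-1 into runs with identical
    presence patterns. Returns (start, end, genomes) with end exclusive."""
    def pattern(i: int) -> frozenset[str]:
        return frozenset(g for g, positions in present.items() if i in positions)

    regions = []
    for genomes, run in groupby(range(length), key=pattern):
        idx = list(run)
        if len(idx) >= min_len:  # drop very short runs (e.g. single mismatches)
            regions.append((idx[0], idx[-1] + 1, tuple(sorted(genomes))))
    return regions


# Hypothetical presence sets giving three regions, as in the example above.
present = {"g1": set(range(9)),
           "g2": set(range(3)) | set(range(6, 9)),
           "g3": set(range(3))}
print(presence_regions(9, present))
# [(0, 3, ('g1', 'g2', 'g3')), (3, 6, ('g1',)), (6, 9, ('g1', 'g2'))]
```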

References

Odenwald WF, Rasband W, Kuzin A and Brody T. (2005). EvoPrinter, a multigenomic comparative tool for rapid identification of functionally important DNA. Proc. Natl. Acad. Sci. 102: 14700-5.

Kent WJ. (2002). BLAT--the BLAST-like alignment tool. Genome Res. 12: 656-64.

Yavatkar AS, Lin Y, Ross J, Fann Y, Brody T and Odenwald WF. (2008). Rapid detection and curation of conserved DNA via enhanced-BLAT and EvoPrinterHD analysis. BMC Genomics.

