Journal of Bacteriology, January 2004, p. 400-410, Vol. 186, No. 2
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.2.400-410.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Department of Botany, University of Toronto, Toronto, Ontario M5S3B2 Canada
Received 28 May 2003/ Accepted 8 October 2003
| ABSTRACT |
|---|
λ-like phage on the basis of its morphology. Similarity-based analyses identified 27 open reading frames with significant matches to proteins in the NCBI databases. Forty-eight percent of these were similar to Mu-like phage and prophage sequences, including proteins responsible for transposition, transcriptional regulation, virion morphogenesis, and capsid formation. The tail proteins were highly similar to prophage sequences in Escherichia coli and phage Phi12 from Staphylococcus aureus, while proteins at the right end were highly similar to proteins in Xylella fastidiosa. We performed phylogenetic analyses to understand the evolutionary relationships of D3112 with respect to Mu-like versus λ-like bacteriophages. In some instances, similarity-based and phylogenetic analyses gave different results. Overall, our findings reveal a highly mosaic structure and suggest that extensive horizontal exchange of genetic material played an important role in the evolution of D3112.
| INTRODUCTION |
|---|
One of the best-studied bacteriophages is the temperate, transposable phage Mu (51). This predator of Escherichia coli has the ability to integrate randomly into the host genome, often leading to mutations in the host, and to transduce variable amounts of heterogeneous host DNA (53). These traits, characteristic of the transposable phages, make it an extraordinary resource for genetic research. Unfortunately, very few other transposable phages have been isolated from the gram-negative Enterobacteriaceae.
In contrast, over 60 distinct temperate, transposable bacteriophages have been isolated from Pseudomonas aeruginosa, a gram-negative bacterium belonging to the family Pseudomonadaceae (2). D3112-like phages (30, 31, 45) and B3-like phages (1) represent two major groups of P. aeruginosa transposable phages, which have been shown to recombine with low frequency (35). D3112 was found to have a genomic structure and life cycle that closely approximate those of Mu (5, 10, 36, 40, 45). Like Mu, D3112 is a temperate phage: upon infection, the phage genome is integrated into the host genome via transposition, and the c protein represses the lytic cycle and maintains stable integration of the genome (45). Surprisingly, D3112 and Mu share no similarity at the DNA level (40), and the virion of D3112 is morphologically more similar to those of the λ-like phages (Fig. 1). As a result, the International Committee on Taxonomy of Viruses classifies D3112 in the λ-like genus of the Siphoviridae (phages with long, noncontractile tails), whereas bacteriophage Mu is classified in the Mu-like genus of the Myoviridae (phages with long, contractile tails).
The need to further our understanding of temperate, transposable phages has prompted the genomic sequencing of D3112. Here we report the shotgun cloning, genome sequencing, and annotation of the D3112 genome. Comparative studies with other Mu-like phages and prophages were performed. Phylogenetic analyses of D3112 were also carried out to elucidate the evolutionary history of the phage. Our findings will contribute to the establishment of a phylogenetically based taxonomic framework for bacteriophages.
| MATERIALS AND METHODS |
|---|
was grown in LB at 37°C, supplemented with ampicillin (50 µg/ml) and kanamycin (50 µg/ml).
Preparation of D3112cts lysates. PAS429 was grown overnight without antibiotics. The culture was diluted 1:50 in LB and grown for an additional 3 h or until the optical density at 600 nm reached 0.6. The culture was shifted to 42°C for roughly 2 h or until the cells lysed. MgSO4 was added to a final concentration of 2 mM, CaCl2 was added to a final concentration of 0.2 mM, and chloroform was added to a final concentration of 1% of the lysate volume. Cell debris was removed by centrifugation, and the cleared lysates were stored at 4°C.
Extraction of D3112cts DNA. Phage DNA was prepared as described previously (48). DNase and RNase were added at a final concentration of 100 µg/ml each to 1.5 ml of a high-titer cleared liquid lysate, and this mixture was incubated at 37°C for 30 min. Twenty microliters of 2 M ZnCl2 was added per milliliter of lysate, and the mixture was incubated at 37°C for 5 min and then centrifuged for 1 min. The supernatant was discarded, and the pellet containing the phage was resuspended in 500 µl of TES (0.1 M Tris-HCl, pH 8; 0.1 M EDTA; 0.3% sodium dodecyl sulfate). This was followed by lysis at 65°C for 15 min. The suspension was mixed thoroughly after the addition of 60 µl of 3 M potassium acetate, pH 4.8, and then incubated on ice for 20 min. Proteins were removed by centrifugation, and the supernatant was transferred to a new tube. An equal volume of isopropanol was added, and the mixture was incubated on ice for 5 min. The DNA was recovered by centrifugation, washed with 70% ethanol, air dried, and resuspended in a small volume of TE (10 mM Tris-HCl, pH 8.0; 1 mM EDTA, pH 8.0).
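Two additions in this protocol are given as stock volumes rather than final concentrations. As a quick check, the short Python sketch below applies C1V1 = C2V2 to those two additions; it assumes that volumes are additive and ignores the small volume contributed by the DNase and RNase.

```python
def final_concentration(stock_molar: float, added_ul: float, initial_ul: float) -> float:
    """Molar concentration after adding `added_ul` of a stock to `initial_ul` of liquid (C1*V1 = C2*V2)."""
    return stock_molar * added_ul / (initial_ul + added_ul)

# 20 µl of 2 M ZnCl2 per milliliter (1,000 µl) of lysate
zncl2_mM = 1000 * final_concentration(2.0, 20, 1000)
# 60 µl of 3 M potassium acetate added to the 500 µl TES resuspension
koac_mM = 1000 * final_concentration(3.0, 60, 500)
print(f"ZnCl2: {zncl2_mM:.1f} mM, potassium acetate: {koac_mM:.0f} mM")
```
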
D3112 infection. D3112 infection of bacterial cells was performed using a modification of the method described by Roncero et al. (44). Host bacteria were mixed with phage at a high multiplicity of infection. After addition of soft agar, the cultures were transferred to LB plates supplemented with 1 mM MgSO4. Bacterial phenotypes resistant to infection were visualized as confluent lawns, while sensitive phenotypes were distinguishable as completely lysed plates.
Shotgun cloning of D3112cts. Partial digestion of bacteriophage DNA was performed using the DNase Shotgun cleavage kit (Novagen, Madison, Wis.). A minimum of 1 µg of purified phage DNA was cleaved with 0.01 U of DNase I; DNase I concentrations were optimized according to the manufacturer's protocol. DNA fragments between 500 and 2,000 bp were size selected from a 1% (wt/vol) TAE agarose gel (21). DNA fragments were purified and concentrated using the QIAquick PCR Purification kit (Qiagen Inc., Valencia, Calif.). Staggered ends of the DNA fragments were repaired with T4 DNA polymerase, and the fragments were dephosphorylated with alkaline phosphatase (New England Biolabs, Beverly, Mass.). The resulting fragments were ligated into the pCR4Blunt-TOPO vector (Invitrogen Corp., Carlsbad, Calif.) and transformed into E. coli DH5α. Transformants were transferred to plates with selective medium containing ampicillin and kanamycin.
PCR amplification and sequencing. Direct colony-PCR amplification of the inserts was performed using modified vector primers M13F (5'-GTAAAACGACGGCCAGT-3') and M13R (5'-CAGGAAACAGCTATGACCATG-3'). The PCR conditions were denaturation at 94°C for 4 min, followed by 35 cycles of denaturation at 94°C for 30 s, annealing at 55°C for 30 s, and extension at 72°C for 2 min, with a final extension at 72°C for 5 min. Amplified products were sized by gel electrophoresis, and fragments >500 bp were selected for sequencing. The selected PCR mixtures were enzymatically cleaned for sequencing with the addition of a 1/100 reaction volume of alkaline phosphatase (10 U/µl) and exonuclease I (20 U/µl) (New England Biolabs). The reaction mixtures were incubated at 37°C for 30 min, followed by 85°C for 15 min to inactivate the enzymes. Sequencing was performed using modified T3 (5'-GCCAAGCTCAGAATTAACCCTCACTAAAGG-3') and T7 (5'-CGACGGCCAGTGAATTGTAATACGACTC-3') primers. Ten-microliter reaction mixtures were prepared with the CEQ Dye Terminator Cycle Sequencing (DTCS) Quick Start kit (Beckman Coulter Canada, Inc., Mississauga, Ontario, Canada) and run on the Beckman CEQ2000XL sequencer (6). DNA sequences were edited using the BioEdit Sequence Alignment Editor software (22). Sequence alignment and assembly of overlapping sequences into contigs (contiguous sequences) were performed with Sequencher 4.0 (Gene Codes Corp., Ann Arbor, Mich.). Gaps between contigs were bridged by designing outward-facing primers at the ends of each contig and pairing them at random in PCR amplifications with D3112cts DNA as the template. Amplified products were sequenced using the original amplification primers.
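The gap-closing step can be outlined in Python as follows. This is an illustrative sketch, not the authors' procedure: the primer length, the inset from the contig end, and the choice to test every pair of primers drawn from different contigs (because contig order and orientation are unknown) are all assumptions, and a real design would also screen melting temperature, G+C content, and self-complementarity.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def outward_primers(name: str, contig: str, length: int = 20, inset: int = 50):
    """Return (contig, end, primer) tuples whose 3' ends point off each end of the contig.

    `length` and `inset` (distance from the contig end) are hypothetical defaults.
    """
    if len(contig) < inset + length:
        raise ValueError(f"{name} is too short for the requested primer geometry")
    # Right end: top-strand sequence just inside the end; extension runs rightward into the gap.
    right = contig[len(contig) - inset - length : len(contig) - inset]
    # Left end: reverse complement of sequence just inside the start; extension runs leftward.
    left = reverse_complement(contig[inset : inset + length])
    return [(name, "right", right), (name, "left", left)]

def gap_pcr_pairs(contigs: dict, length: int = 20, inset: int = 50):
    """Pair every primer with every primer from a different contig (order/orientation unknown)."""
    primers = [p for name, seq in contigs.items() for p in outward_primers(name, seq, length, inset)]
    return [(a, b) for a, b in combinations(primers, 2) if a[0] != b[0]]
```

With three contigs this yields six primers and 12 cross-contig pairs; pairs from the same contig are skipped because they would only reamplify known sequence.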
Sequence annotation and analysis. Potential open reading frames (ORFs) were identified using GeneMark.HMM version 2.0 for prokaryotes (http://opal.biology.gatech.edu/GeneMark/heuristic_hmm2.cgi) (9). Potential ORFs were compared against the NCBI protein databases using BLASTP and PSI-BLAST (3, 4). Sequences were also scanned using PROSITE (part of the ExPASy suite) (19) and Pfam (8). Molecular masses were determined using ProtParam (part of the ExPASy suite). The DNA sequence was scanned for putative tRNA genes with tRNAscan-SE (www.genetics.wustl.edu/eddy/tRNAscan-SE/) (33) and FAStRNA (http://bioweb.pasteur.fr/seqanal/interfaces/fastrna.html) (18). Restriction enzyme analysis was performed with Vector NTI (version 6.0; InforMax, Inc., Bethesda, Md.). The moving-average G+C content was determined with the EMBOSS program GEECEE (41) by averaging the G+C content in a 1,000-bp window moved across the genome in 100-bp steps; the sliding window was implemented with the aid of the EMBOSS program SPLITTER (41).
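The moving-average G+C calculation amounts to a fixed-size window slid along the sequence. The Python sketch below reproduces the 1,000-bp window and 100-bp step described above; it is not the EMBOSS GEECEE or SPLITTER code, and excluding ambiguous bases from the denominator is an assumption.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T) in `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else float("nan")

def sliding_gc(seq: str, window: int = 1000, step: int = 100):
    """(1-based window start, G+C fraction) for each full window moved along `seq` in `step`-bp increments."""
    return [(start + 1, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

Only full-length windows are reported, so the last window starts at most 1,000 bp before the end of the sequence.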
Phylogenetic analysis. ORFs were selected for analysis if their translated sequences had more than two significant matches to proteins in the NCBI databases. The most similar sequences were downloaded on the basis of two criteria: expect (E) values below 5e-15 and sequence lengths similar to that of the query ORF. Choosing sequences of similar length helped eliminate those that had significant local alignments due to conserved domains but could not be globally aligned. Alignments were made with ClustalX (version 1.81) using default parameters (52).
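The two download criteria can be expressed as a simple filter over BLASTP hits. In this hypothetical Python sketch the `Hit` record and the 20% length tolerance are assumptions, since the text specifies only that lengths were "similar"; the 5e-15 cutoff is taken from the text.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    subject_id: str
    evalue: float
    subject_length: int  # length of the database protein, in amino acids

def select_homologues(query_length: int, hits, max_evalue: float = 5e-15, max_length_diff: float = 0.2):
    """Keep hits below the E-value cutoff whose length is within `max_length_diff` of the query's, best first."""
    kept = [
        h for h in hits
        if h.evalue < max_evalue
        and abs(h.subject_length - query_length) / query_length <= max_length_diff
    ]
    return sorted(kept, key=lambda h: h.evalue)
```

The length test is what screens out proteins that match only through a shared domain but differ greatly in overall size.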
The phylogenies were produced using two techniques: Bayesian inference and neighbor joining. For the Bayesian analysis, a MrBayes Nexus block was generated for each ORF using the MrBayes Block Form (http://darwin.zoology.gla.ac.uk/~rpage/mrbayes) and analyzed with MrBayes 2.01 (25). Each inference consisted of four Markov chains starting from random trees and running for 200,000 generations, with one tree sampled every 100 generations. The initial burn-in trees were discarded, and 50% majority-rule consensus trees were generated. The numbers at the interior branches represent the percentage of sampled trees in which a clade appears. Trees were viewed in PAUP* and TreeView (38, 50). Neighbor-joining trees were generated for each ORF using the Prodist and Neighbor modules of PHYLIP (20). Protein distance matrices were computed under the Jones-Taylor-Thornton model, and 500 bootstrap replicates were performed. All alignments and trees are available at http://www.botany.utoronto.ca/ResearchLabs/GuttmanLab/index.stm.
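The support values on the Bayesian consensus trees are clade frequencies among the post-burn-in samples. The Python sketch below shows that calculation and the 50% majority-rule cutoff, assuming each sampled tree has already been reduced to a set of clades (frozensets of taxon names). Treating clades as rooted groups rather than unrooted splits is a simplification, and this is not MrBayes code.

```python
from collections import Counter

def clade_support(sampled_trees, burnin: int):
    """Fraction of post-burn-in trees that contain each clade (a frozenset of taxon names)."""
    kept = sampled_trees[burnin:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}

def majority_rule_clades(sampled_trees, burnin: int, threshold: float = 0.5):
    """Clades found in more than `threshold` of the retained trees."""
    return {c: p for c, p in clade_support(sampled_trees, burnin).items() if p > threshold}
```

A cutoff of 50% guarantees that the retained clades fit on a single tree: two conflicting clades never occur in the same sample, so their frequencies cannot both exceed one-half.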
Nucleotide sequence accession number. The complete nucleotide sequence of bacteriophage D3112 is available under GenBank accession number AY394005.
| RESULTS |
|---|
Sequencing of the leftmost and rightmost ends of D3112 revealed heterogeneous P. aeruginosa sequences. This supports the earlier finding of Rehmat and Shapiro and indicates that the phage can integrate into different locations (16, 40). After trimming the flanking heterogeneous sequences, we determined that the sequence at the left extremity common to all of our phage clones begins with the bases 5'-TGC-3', followed by the 3' end of the P. aeruginosa outer membrane protein oprM. Except for the addition of the first three bases, our findings agree with those of Autexier et al. (5); the difference is most likely due to our having a different isolate of the phage. The right extremity has been shown to be significantly more variable in size due to imprecise excision of the phage (16). We found up to 2 kb of heterogeneous DNA attached to the right end of D3112, which was also trimmed from our final sequence.
Sequence comparisons between our sequence and the two existing GenBank entries for the left end of D3112 revealed 10 discrepancies, one of which is the three extra bases at the 5' end (5, 54). Six discrepancies occurred within ORFs; three of these could affect the resulting proteins and are discussed below. The remaining three differences are located in noncoding regions at positions 1041, 2342, and 2429. Our sequence was confirmed at these positions with at least 10-fold sequencing coverage from both strands. These discrepancies could be due to errors in earlier sequencing efforts or to differences between isolates of the phage.
Genome organization of D3112cts. The genome of bacteriophage D3112 can be divided into three functional regions: the left end or early region, which is responsible for genome integration, modulation of phage gene expression, and modulation of host response; the middle region, which is responsible for control of late gene transcription; and the right end or late region, which is responsible for virion morphogenesis and contains ORFs of largely unknown function. To determine whether any large rearrangements had occurred in our isolate, we generated a restriction map of our genome for comparison to the one previously reported by Rehmat and Shapiro (40), using the same enzymes: SalI, KpnI, HpaI, HindIII, and EcoRI (data not shown). Our map agreed with the previous map, except for a reversal of the last KpnI and HpaI sites, at bases 34025 and 34034, respectively. This reversal does not appear to correspond to an invertible segment like the invertible tail segment of phage Mu, because the region shows no similarity to any tail-like sequences and lies within ORF 51 at the right extremity of the genome. The discrepancy could be due to errors in earlier analyses or to differences in the right end of our phage isolate. Overall, there do not appear to be any major rearrangements in our isolate.
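An in silico restriction map of the kind compared above can be produced by scanning the sequence for each recognition site. The Python sketch below uses the standard recognition sequences of the five enzymes; because all five sites are palindromic, scanning one strand finds every site. Reporting the 1-based start of each recognition sequence rather than the cut position is a simplifying assumption, and this is not how Vector NTI reports results.

```python
RECOGNITION_SITES = {
    "SalI": "GTCGAC",
    "KpnI": "GGTACC",
    "HpaI": "GTTAAC",
    "HindIII": "AAGCTT",
    "EcoRI": "GAATTC",
}

def find_sites(seq: str, site: str):
    """1-based start positions of every (possibly overlapping) occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + 1)
        i = seq.find(site, i + 1)
    return positions

def restriction_map(seq: str, enzymes=RECOGNITION_SITES):
    """All sites from all enzymes, sorted by position; one strand suffices because every site is palindromic."""
    return sorted((pos, name) for name, site in enzymes.items() for pos in find_sites(seq, site))
```

Comparing the ordered (position, enzyme) lists from two maps is a direct way to spot a swap such as the KpnI/HpaI reversal described above.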
A transcriptional map of D3112 was recently constructed by Bidnenko et al. (11), suggesting that the genome comprises six independent transcriptional units corresponding to a modular organization similar to that of phage Mu. They reported that, except for the c repressor gene, transcription proceeds from left to right, in the same order as the genes appear on their genetic map. As seen in Fig. 2, our findings generally agree with the previous findings, but we identified two ORFs (ORFs 1 and 23) that are transcribed from right to left.
Analysis of D3112 gene products. Three ORFs from the left end of D3112 have been previously identified (those encoding the c repressor and A and B transposases) (45, 54). Five additional genes have been genetically characterized (cip1, kil, C, ts47, and c91) (10-12). We identified a total of 53 ORFs, 7 of which begin with a GUG initiation codon, while the remaining ORFs begin with an AUG codon. Structurally and functionally the genome of D3112 is very similar to that of bacteriophage Mu, although there is no detectable homology at the DNA level (36). A complete list of the D3112 ORFs and their putative functions determined by BLASTP analysis is shown in Table 1. Figure 2 shows a schematic of all the ORFs and their degrees of similarity to known proteins as determined by BLASTP analysis.
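The start-codon tally above (AUG versus GUG) can be illustrated with a naive ORF scan. The Python sketch below finds stop-terminated ORFs that begin with ATG or GTG in all six frames. It is not GeneMark.HMM, which scores coding potential with a hidden Markov model; this sketch simply takes the most upstream start in each open frame. The minimum length is a hypothetical parameter, and reverse-strand coordinates refer to the reverse-complemented sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
STARTS = {"ATG", "GTG"}  # AUG and GUG in the mRNA
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int = 100):
    """Naive six-frame scan returning (strand, start, end, start_codon) for each ORF.

    Coordinates are 0-based and end-exclusive on the scanned strand.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif start is not None and codon in STOPS:
                    # Codons from the start up to, but not including, the stop.
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:start + 3]))
                    start = None
    return orfs
```

Counting the last element of each tuple then gives an ATG/GTG tally analogous to the one reported above, although a real gene finder would choose starts differently.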
ORF 2 may encode the cip (for "control of interaction of phages") protein, which has been shown to be functionally similar to the Ner protein of phage Mu. As such, cip may serve as a negative regulator of c repressor transcription (11). The cip gene is expressed only during the prophage stage.
No DNA or protein similarity to database sequences was found for ORF 3 in our analysis, but our sequence contains an insertion of 5'-GGCCGCGTGGC-3' at position 1709 relative to previously published sequences. Because this 11-bp insertion is not a multiple of 3, it causes a frameshift, the functional significance of which is unknown. Our sequence is supported by ninefold sequencing coverage of the region.
The last major sequence discrepancy between our data and the previously submitted sequence is located in ORF 6: our sequence contains an insertion of 5'-GCC-3' at position 5442. This 3-bp insertion does not cause a frameshift and is verified by 10-fold sequence coverage from both strands.
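The reasoning behind the two insertions is a length-modulo-3 check: an indel whose length is not a multiple of 3 shifts the downstream reading frame. The small Python sketch below applies that rule to the ORF 3 and ORF 6 insertions; it does not consider whether an insertion creates a stop codon or where it falls relative to codon boundaries.

```python
def indel_effect(indel: str) -> str:
    """Classify an insertion or deletion within a coding region by its length modulo 3."""
    return "in-frame" if len(indel) % 3 == 0 else "frameshift"

orf3_insertion = "GGCCGCGTGGC"  # 11 bp at position 1709
orf6_insertion = "GCC"          # 3 bp at position 5442

for name, ins in (("ORF 3", orf3_insertion), ("ORF 6", orf6_insertion)):
    print(f"{name}: {len(ins)} bp, {indel_effect(ins)}")
```
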
The first half of ORF 11 is similar to the host nuclease inhibitor protein Gam of phage Mu. Conservation of this protein among pathogens suggests that it is an important factor for overcoming host defenses and establishing infection. The putative D3112 Gam is 267 amino acids (aa) long, which is on average 92 aa longer than its homologues. The latter half of ORF 11 (aa 178 to 248) does not align with Gam but displays 44% identity to a hypothetical E. coli O157:H7 protein.
ORF 16 displays 63% similarity (40% identity) to hypothetical protein NMA1186 in prophage PNM1 of Neisseria meningitidis Z2491. NMA1186 has reported similarity to Mu protein E16 (http://www.sanger.ac.uk). ORF 16 also has weak similarity to the E16 homologue in N. meningitidis MC58. This suggests that ORF 16 is a counterpart of phage Mu E16 (gemA) and may be responsible for modulation of host response (36).
ORF 22 is a likely candidate for the C gene or the ts47 locus on the basis of its location in the D3112 genome (10, 11). The C gene was mapped between 12 and 16 kb of the D3112 genome and was shown to be a positive regulator of viral late gene transcription. It neighbors the ts47 locus, a second positive regulator of late gene transcription, which was mapped between 13.5 and 21 kb of the D3112 physical map. Both gene products are required for a normal level of late gene transcription (10). ORF 22 has 74% similarity (56% identity) to a hypothetical DNA-binding protein in prophage PNM1 of N. meningitidis Z2491. This putative DNA-binding activity suggests a nonstructural role, possibly as a transcriptional regulator.
ORFs 25 to 33 are part of the late-region genes responsible for head morphogenesis. The gene order of this segment agrees with those of phage Mu, prophage FluMu, and Mu-like prophage PNM1 (36).
ORF 33 has 63% similarity (42% identity) to a hypothetical protein in prophage PNM1 and 61% similarity (40% identity) to the major head subunit gpT of phage Mu. Interestingly, the beginning and end of ORF 33 align with the PNM1 and Mu proteins, but a 12-aa stretch in the middle (aa 168 to 179) does not. The significance of this short region, which has no matches in the databases, remains to be determined.
ORFs 42 and 43 belong to the late-region genes responsible for tail morphogenesis and show similarity to lambdoid phages as opposed to Mu-like phages. ORF 42 has 42% similarity (23% identity) to a putative tail component of prophage CP-933K in E. coli O157:H7 EDL933. It is also highly similar to a tail fiber protein of phage Phi12 in Staphylococcus aureus. ORF 43 has 35% similarity (21% identity) to a putative tail component of prophage CP-933O of E. coli O157:H7 EDL933.
The ORFs at the right end of the genome have unknown functions, with the majority displaying very high amino acid similarity to hypothetical proteins in Xylella fastidiosa.
Infectivity and host range. D3112 has been shown to infect many different strains of P. aeruginosa (44). Because the D3112 plaques are small and difficult to see, host bacteria were mixed with phage at a high multiplicity of infection to ensure complete lysis in the plate assay. Using this approach, a resistant phenotype (confluent lawn) is clearly distinguishable from a sensitive phenotype (complete lysis).
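Why a high multiplicity of infection (MOI) gives complete lysis follows from the standard Poisson model of phage adsorption, in which the fraction of cells that receive at least one phage is 1 - exp(-MOI). The Python sketch below tabulates that fraction. The MOI values are illustrative only, since the text does not state the MOI used, and random adsorption is an idealizing assumption.

```python
import math

def fraction_infected(moi: float) -> float:
    """Expected fraction of cells adsorbing at least one phage, assuming Poisson-distributed adsorption."""
    return 1.0 - math.exp(-moi)

for moi in (0.1, 1, 5, 10):
    print(f"MOI {moi:>4}: {fraction_infected(moi):.4f} of cells infected")
```

At an MOI of 1 only about 63% of cells are infected, whereas at an MOI of 10 essentially every cell receives a phage, which is what makes a sensitive lawn clear completely.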
Our plate assays showed that D3112 was able to infect P. aeruginosa wild-type strains PA14 and PAK, but not our PAO1 isolate. This could be due to our isolate already harboring a related prophage belonging to either the D3112 or the B3 group, which would render this host resistant to superinfection by D3112. Alternatively, the isolate could express different surface receptors that are not recognized by the phage.
Phylogenetic and similarity analyses. Separate phylogenetic analyses were performed for each ORF which had at least three significant BLAST hits. The ORFs analyzed were ORFs 1, 4, 11, 16, 17, 19, 22, 23, 24, 25, 26, 27, 31, 33, 36, 42, and 43. Unrooted trees were built using the Bayesian and neighbor-joining methods. With the exceptions of ORFs 19 and 42, the placements of D3112 in the trees generated by both methods were identical, supported by posterior probabilities of at least 94% for Bayesian analyses and bootstrap values of at least 70% for neighbor-joining analyses (summarized in Fig. 2). The results for ORF 19 were inconsistent. The Bayesian analysis placed D3112 ORF 19 in a strongly supported monophyletic group with Salmonella enterica and Pseudomonas putida (posterior probability of 99%). The neighbor-joining analysis placed D3112 in a poorly supported clade with P. putida (bootstrap support of 58%). If the weak node is collapsed, the results agree with the Bayesian tree. A similar situation applies to ORF 42. The neighbor-joining tree placed ORF 42 in a very poorly supported clade with Pseudomonas species (bootstrap of 41%). If the unsupported node is collapsed, the tree becomes identical to the Bayesian tree.
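Collapsing a weakly supported node and checking for agreement can be sketched with clades represented as sets of taxa. In the Python outline below, the taxa, support values, and trees are hypothetical, the 70% cutoff mirrors the bootstrap level cited above, and the check tests only compatibility (no clade in one tree contradicts a clade in the other), which is weaker than the identity of topologies reported in the text.

```python
def compatible(c1: frozenset, c2: frozenset) -> bool:
    """Two clades can coexist on one tree only if they are disjoint or nested."""
    return c1.isdisjoint(c2) or c1 <= c2 or c2 <= c1

def collapse(support: dict, threshold: float) -> set:
    """Keep clades with support >= threshold; dropping the rest collapses them into polytomies."""
    return {clade for clade, value in support.items() if value >= threshold}

def conflicts(clades_a: set, clades_b: set) -> set:
    """Pairs (one clade from each tree) that cannot both be true of the same phylogeny."""
    return {(a, b) for a in clades_a for b in clades_b if not compatible(a, b)}

# Hypothetical example: taxa A-C, neighbor-joining bootstrap percentages vs. Bayesian clades.
nj_support = {frozenset("ABC"): 88, frozenset("AB"): 58}
bayes_clades = {frozenset("ABC"), frozenset("BC")}

print(len(conflicts(set(nj_support), bayes_clades)))           # the weak AB clade contradicts BC
print(len(conflicts(collapse(nj_support, 70), bayes_clades)))  # after collapsing, no contradiction remains
```
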
The degree of amino acid similarity between the D3112 ORFs and bacterial proteins with significant similarity to at least one ORF is shown in Fig. 2. The shading of the boxes indicates the degree of similarity based on BLASTP expect values, which are also reported in Table 1. Phylogenetic analysis was used to resolve these relationships further. In most cases, the phylogenetic analysis agreed with the BLASTP similarity search. Most D3112 ORFs in the early, middle, and late head regions (ORFs 1, 4, 16, 17, 22, 24, 26, and 33) were most closely related to proteins in N. meningitidis. D3112 ORF 4, which encodes the A transposase, was also equally similar to proteins in E. coli O157:H7 and Shigella flexneri.
Interspersed throughout the genome are ORFs in clades with host species other than N. meningitidis. For example, ORFs 25, 27, 31, and 33 appear to be part of a genomic segment that is more closely related to Shewanella oneidensis, a bacterium belonging to the Alteromonadaceae. The late tail segment, represented by ORFs 42 and 43, is a module distinct from the first part of the genome: neither protein displays similarity to phage Mu or N. meningitidis proteins. Of particular interest is the clustering of D3112 ORF 43 with the plant pathogen Pseudomonas syringae. Since phage tails are believed to be involved in host recognition and attachment, one can speculate that this is a host-specific adaptation in a group of transposable phages capable of infecting a wide range of Pseudomonas species.
| DISCUSSION |
|---|
(11). The majority of ORFs in the early, middle, and late head regions possess amino acid similarity to, and reflect the organization of, Mu-like prophages from N. meningitidis Z2491. The same conclusion is reflected in our phylogenetic studies. Since these regions comprise the majority of the D3112 backbone, we hypothesize that they are the most ancestral part of the genome, suggesting that D3112 may have evolved from a phage that originally infected N. meningitidis. Nucleotide sequence divergence was so high that reliable sequence alignments could be made only at the protein level. This divergence may have been due to host adaptation, an idea supported by the similar G+C contents of D3112 and its host P. aeruginosa. An exception is ORF 1, which has an unusually low G+C content of 56%, suggesting a more recent horizontal origin. BLASTP analysis shows that ORF 1 is most similar to a homologous protein in P. aeruginosa phage D3 (29), but our phylogenetic analysis of ORF 1 and its G+C content strongly support a more recent common ancestry with N. meningitidis (G+C content, 51.8%) or S. oneidensis (G+C content, 46.0%), as shown in Figs. 2 and 3A.
The proteins at the right end of the D3112 genome are strikingly similar to hypothetical proteins found in X. fastidiosa, suggesting recent acquisition of this region from X. fastidiosa. Pieces of host DNA are more easily copackaged with the right end of the phage, accounting for the up to 2 kb of heterogeneous host DNA in our sequences. Proteins encoded in packaged host DNA can sometimes provide adaptive functions that are not necessary for the phage life cycle, such as conferring species specificity, enhancing virulence, affecting phage gene expression, or modulating host responses (23, 27, 36, 49, 55). In such cases the relevant sequences can become a permanent part of the phage genome, as we hypothesize for the X. fastidiosa-like sequences at the right end of the D3112 genome.
Interspersed throughout the D3112 genome are stretches of sequence similar to those from disparate phages, further illustrating the mosaic nature of the phage. The vast majority of these segments appear to have been acquired via horizontal gene transfer between D3112 and other phages whose bacterial hosts belong to the phylum Proteobacteria. These observations suggest that the rate or likelihood of horizontal transfer of genetic material may be correlated with the relatedness of the host species. For homologous recombination, limits to horizontal gene transfer imposed by genetic distance have been observed in bacteria (42). Restriction of horizontal gene transfer by host species boundaries has also been observed in dairy phages (13).
A more thorough sampling of phage sequences will be needed before we understand the global diversity and evolution of transposable phages. One aspect of phage genomics that has received considerable attention is the development of criteria for taxonomic classification (32, 43). Recent attempts at viral taxonomy have used a range of methods, including the conservation of morphological structures and the comparison of whole viral genome structure, organization, and similarity (7, 17, 43). There is an ever-growing number of examples of incongruence between morphology-based and sequence-based taxonomic classification systems. Sequence similarity-based taxonomy, the basis of phage comparative genomics, has been seen as a powerful and promising alternative (13, 14, 17, 39). Yet our analyses clearly demonstrate that the results of local similarity-based analyses (e.g., BLASTP searches) and evolutionary (phylogenetic) analyses do not necessarily agree. Of our 17 analyzed D3112 ORFs, only 65% showed complete agreement between the similarity-based and phylogenetic approaches or at least placed the ORF of interest in a clade with multiple taxa, one of which also displayed the highest similarity. The remaining 35% (ORFs 1, 11, 26, 31, 36, and 43) showed significant inconsistencies between the most similar sequence identified by BLASTP and the closely related sequences identified by phylogenetic analyses. Figure 3 presents two representative examples of these inconsistencies.
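The percentages above follow directly from the ORF lists given in the text; the short Python check below recomputes them.

```python
analyzed = [1, 4, 11, 16, 17, 19, 22, 23, 24, 25, 26, 27, 31, 33, 36, 42, 43]
inconsistent = {1, 11, 26, 31, 36, 43}

consistent = [orf for orf in analyzed if orf not in inconsistent]
pct_consistent = 100 * len(consistent) / len(analyzed)
pct_inconsistent = 100 * len(inconsistent) / len(analyzed)
print(f"{len(consistent)}/{len(analyzed)} consistent ({pct_consistent:.0f}%), "
      f"{len(inconsistent)}/{len(analyzed)} inconsistent ({pct_inconsistent:.0f}%)")
```
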
Why do the conclusions from similarity-based analyses differ from those drawn from phylogenetic analyses? We do not believe that the inconsistencies can be attributed to failings of the phylogenetic methods. We took the conservative analytical approach of trimming regions of questionable homology from the multiple sequence alignments and performing the phylogenetic analyses by two independent methods (neighbor-joining and Bayesian analyses). The differences are most likely due to the inherent limitations of the BLAST algorithm. BLAST is a pairwise local alignment algorithm that excels at rapidly identifying sequences of local similarity in a sea of unrelated sequence. This makes it an extremely powerful method for identifying matches to a query sequence in a large database, but from the perspective of phage taxonomy its obvious drawback is that it preferentially identifies relatively short regions of high similarity, such as those seen between conserved domains, over regions of global similarity. Because BLAST uses pairwise comparisons, it is also unable to identify clusters of related sequences, such as multiple isolates of the same species. This may be a problem if one member of such a cluster is substantially more similar to the query sequence than the average of the cluster (Fig. 4). Lastly, similarity analyses provide no basis for distinguishing homology from homoplasy (identity due to parallel or convergent evolution). Consequently, evolutionary inferences drawn from similarity-based approaches must be evaluated carefully.
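The first limitation can be illustrated with a deliberately simplified toy comparison. The Python sketch below scores two hypothetical subjects against a hypothetical query using an ungapped, position-by-position alignment, comparing a maximum-scoring segment (in the spirit of, but far simpler than, BLAST's local high-scoring segments) with identity over the full length. It omits substitution matrices, gaps, seeding, and E-values, so it is not BLAST; it only shows how a short, perfectly conserved motif can outrank weaker similarity spread across a whole protein.

```python
def best_local_score(query: str, subject: str, match: int = 1, mismatch: int = -1) -> int:
    """Highest-scoring ungapped segment of a position-by-position alignment (maximum-subarray scan)."""
    best = running = 0
    for q, s in zip(query, subject):
        running = max(0, running + (match if q == s else mismatch))
        best = max(best, running)
    return best

def percent_identity(query: str, subject: str) -> float:
    """Fraction of identical positions over the full (ungapped, equal-length) alignment."""
    return sum(q == s for q, s in zip(query, subject)) / len(query)

def substitute(residue: str) -> str:
    """Return a residue guaranteed to differ from `residue` (toy substitution)."""
    return "G" if residue == "A" else "A"

query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"  # hypothetical 30-aa query
# Subject A: identical only in a central 10-residue "conserved motif", different elsewhere.
subject_a = ("".join(map(substitute, query[:10])) + query[10:20]
             + "".join(map(substitute, query[20:])))
# Subject B: identical at every other position across the whole length.
subject_b = "".join(r if i % 2 == 0 else substitute(r) for i, r in enumerate(query))

for name, subject in (("A", subject_a), ("B", subject_b)):
    print(name, best_local_score(query, subject), round(percent_identity(query, subject), 2))
```

Subject A scores 10 on the local measure versus 1 for subject B, while subject B has 50% identity over its full length versus about 33% for subject A, so the two measures rank the subjects in opposite orders.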
The current trend towards using sequence data to resolve viral taxonomic issues has raised the question of the most appropriate level of analysis. There are four levels of resolution that can be used for the analysis of genomes. The first is the comparison of whole phage genomes, as in the study of lambdoid and dairy phages (14, 17, 24, 26). The second entails the use of smaller segments of phage genomes, such as the structural segment containing the head and tail genes (17). The third is the level of an individual gene or ORF. The fourth is the level of conserved motifs within genes. Lawrence et al. (32) discuss the problems associated with mosaicism of viral genomes and how these invalidate approaches based strictly on pairwise local alignments for the first two levels of comparison. One exception to this conclusion is the work with the dairy phages, because they constitute an unusually homogeneous group of phages that display significant similarity at the nucleotide level (13). Similarity-based analyses at the level of a gene or smaller are biased by the tendency to align conserved motifs. Based on our results with D3112, we propose that phylogenetic analyses should be performed at the level of individual genes, since these represent the functional units of these highly mosaic systems. Phylogenetic approaches are clearly superior to approaches based on similarity and local alignments: they are less prone to bias from conserved motifs and use more of the evolutionary information in a sequence, thereby providing more power to disentangle homology from homoplasy.
There is ongoing debate as to whether there exists a single gene which can be used to build viral phylogenies (32, 39). We argue that the evolutionary history of a phage should be reflected in the distinct phylogenies of all of its individual genes. Thus, building a multigenic phylogenetic framework for each phage moves us away from the failings of phenetic approaches (morphological, structural, or sequence similarity) towards a cladistic basis for viral taxonomy. Lawrence et al. (32) have addressed related issues and proposed similar guidelines for a viral taxonomy. These gene-by-gene phylogenetic methods more realistically reflect the relationships between phages by taking into account the mosaic nature of these important organisms.
| ACKNOWLEDGMENTS |
|---|
We greatly appreciate the generous support of Beckman Coulter, Inc. This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Canada Foundation for Innovation.
| REFERENCES |
|---|