Journal of Bacteriology, November 2007, p. 8206-8214, Vol. 189, No. 22
0021-9193/07/$08.00+0 doi:10.1128/JB.00838-07
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
Nestlé Research Center, Nestec Ltd., P.O. Box 44, CH-1000 Lausanne 26, Switzerland
Received 30 May 2007/ Accepted 4 August 2007
There is also a medical interest in gaining a better knowledge of E. coli phages closely related to T4. E. coli is a versatile pathogen causing urinary and gastrointestinal infections. The diarrhea burden is especially large for children from developing countries (2, 4). Diarrhea represents the second most frequent cause of morbidity and mortality (32), and E. coli is responsible for one-third of cases (2). The use of oral rehydration solution has substantially reduced mortality (3), but its application does not treat or prevent infections. Vaccines against E. coli diarrhea are not yet available (29, 31), and antibiotics are of limited use (14, 23). Considering these issues, it is not surprising that the old idea of phage therapy was taken up against E. coli diarrhea (5). However, the reference T4 phage has only a narrow host range on pathogenic E. coli strains. Therefore, we had to isolate phages with a broader host range from sewage and stool samples of children hospitalized with diarrhea (10). Field studies in Bangladesh identified JS98-like phages as frequent isolates. Partial sequencing of the JS98 genome revealed this phage to be a distant relative of T4, suggesting a hitherto uncharacterized new branch of T-even phages (9). Closely related phages have also been isolated from other geographical areas (16a). To achieve optimal coverage of pathogenic E. coli strains, T4-, RB69-, JS98-, and RB49-like phages are part of our phage cocktail. Since all of these phage groups except for JS98 are represented by at least one complete genome sequence, we targeted JS98 for sequencing. Due to the presence of many host-lethal genes, we could not obtain the complete phage genome sequence of JS98 with techniques based on phage DNA cloning (9). Here we report that the newer pyrosequencing technology yielded the complete phage JS98 sequence at a fraction of the time and cost of previously used sequencing approaches. In analyzing this sequence, we asked the following two questions. First, does phage JS98 contain genes that represent a potential safety concern for oral application in humans? Second, what does comparative genomics of JS98 tell us about genetic diversification and evolution of T-even phages?
Sequencing. T4-like phage strain JS98 was isolated from the stool of a pediatric diarrhea patient in Bangladesh (10) and grown on an E. coli K-12 strain. Phage DNA was purified and sent to 454 Life Sciences (Branford, CT) for commercial sequencing. There the phage DNA was amplified by an emulsion-based method, sequenced by synthesis using a pyrosequencing protocol (22), and assembled de novo into a single contig.
Bioinformatics analysis and annotation of genome sequence data.
Genome sequence comparisons were carried out using the MUMmer package (version 3.18) (19; http://www.tigr.org/software/mummer) and the EMBOSS (The European Molecular Biology Open Software Suite) software STRETCHER, with the parameters set at default values (26). We used Artemis software as a sequence viewer and annotation tool (30). Open reading frames (ORFs) were predicted using Glimmer software (version 3.02) (12) and based on nucleotide and amino acid sequence alignment searches (BlastN, TBlastN, BlastX, and BlastP), using the T4-like genome database available through the Tulane website (http://phage.bioc.tulane.edu) and the nonredundant database from NCBI. The basic prerequisites for an ORF were the presence of one of the three potential start codons, i.e., ATG, TTG, or GTG, and a length of at least 25 encoded amino acids. A search for tRNA genes was done with the tRNAscan-SE program (version 1.23) (20). Homology assignments between genes from other T4-like phages and predicted ORFs of phage JS98 were based on amino acid sequence alignment searches (BlastP) and were accepted only if the statistical significance of the sequence similarities (E value) was ≤0.001, the bit score was ≥50, and the percent identity between the aligned sequences was ≥30%. Functional annotations were based on the homology assignments with the T4 genes, and functional classifications were performed using the COG (clusters of orthologous groups of proteins) database (34).
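The ORF prerequisites and the homology acceptance criteria described above can be sketched as two small predicates. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names and the `BlastHit` record are hypothetical, the direction of each cutoff (E value at most 0.001, bit score at least 50, identity at least 30%) is assumed, and the start codon is assumed to count toward the 25 encoded amino acids.

```python
from dataclasses import dataclass

START_CODONS = {"ATG", "TTG", "GTG"}  # start codons named in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}   # assumption: standard bacterial stop codons
MIN_ENCODED_AA = 25                   # minimum ORF length named in the text


def meets_orf_prerequisites(orf_nt: str) -> bool:
    """Check a candidate ORF given from its start codon through its stop codon."""
    seq = orf_nt.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    coding = codons[:-1]  # every codon except the terminal stop
    # Assumption: an ORF contains no internal stop codon.
    if any(codon in STOP_CODONS for codon in coding):
        return False
    return len(coding) >= MIN_ENCODED_AA


@dataclass
class BlastHit:
    """Hypothetical record holding three statistics of one BlastP hit."""
    e_value: float
    bit_score: float
    percent_identity: float


def accept_homology(hit: BlastHit) -> bool:
    """Apply the three acceptance cutoffs jointly (all must hold)."""
    return (
        hit.e_value <= 1e-3
        and hit.bit_score >= 50
        and hit.percent_identity >= 30.0
    )
```

Requiring all three cutoffs at once is the conservative reading of "accepted only if"; a hit that is statistically significant but short or weakly identical is rejected.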
Safety evaluation.
We constructed a database of undesirable genes (DUG) by searching public databases available at NCBI for a list of antibiotics and virulence terms. BlastP and BlastN searches were performed against the DUG and against the 15 E. coli genomic sequences currently available at NCBI. Only hits with E values of ≤0.01 (BlastP and BlastN) and bit scores of ≥50 (BlastP) were considered to constitute significant matches. Searches for specific protein domains and conserved motifs with known function were performed using the Conserved Domain Architecture Retrieval Tool (CDART) available at NCBI and the InterPro (http://www.ebi.ac.uk/interpro/) databases. The 266 predicted protein sequences of JS98 were screened for similarities to currently known protein food allergens by comparing them to the amino acid sequences present in the Food Allergy Research and Resource Program allergen database available at http://www.allergenonline.com. Only hits with E values of ≤0.01 and bit scores of ≥50 were considered to constitute significant matches.
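The significance filter used for the safety screen can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes the hits are available in the common 12-column tabular BLAST layout (E value in column 11, bit score in column 12), assumes the cutoffs are "E value at most 0.01" and "bit score at least 50", and uses a hypothetical function name.

```python
from typing import Iterable, List, Tuple

MAX_E_VALUE = 0.01    # cutoff from the text (BlastP and BlastN)
MIN_BIT_SCORE = 50.0  # cutoff from the text (applied to BlastP only)


def significant_hits(
    tabular_lines: Iterable[str], program: str
) -> List[Tuple[str, str, float, float]]:
    """Return (query, subject, e_value, bit_score) for hits passing the cutoffs.

    Assumes tab-separated lines with 12 columns, where column 11 is the
    E value and column 12 is the bit score.
    """
    if program not in ("blastp", "blastn"):
        raise ValueError("program must be 'blastp' or 'blastn'")
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        e_value, bit_score = float(fields[10]), float(fields[11])
        if e_value > MAX_E_VALUE:
            continue
        # The bit-score cutoff is stated for BlastP only.
        if program == "blastp" and bit_score < MIN_BIT_SCORE:
            continue
        kept.append((query, subject, e_value, bit_score))
    return kept
```

Under this sketch, an empty list for every query would correspond to a screen with no significant matches.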
Nucleotide sequence accession number. The sequence data were deposited at GenBank under accession number EF469154, which replaces accession numbers AY746495, AY746496, and AY746497.
FIG. 1. g23 tree analysis. Major head gene g23 tree analysis was performed with the sequenced PCR products from our field isolates of T4-like phages. The tree is based on nucleotide sequence alignments (corresponding to codons 346 to 1118 in T4). For the phages from our survey, origins (S, sewage water; F, fecal content), places (B, Bangladesh; S, Switzerland), and years of isolation are indicated by codes at the level of the twigs of the tree. The numbers at the nodes give the bootstrap probabilities, and the scale above gives percentages of base pair sequence identity. The tree was rooted with the Vibrio T4 phage KVP40 genome sequence.
JS98 comparison with RB69 phage. Alignment of the JS98 genome with T4-like phages in the Tulane database showed that JS98 was most closely related to phage RB69. In both DNA sequence and protein sequence dot plots, we observed a frequently interrupted but straight diagonal line between both phages (see Fig. S2 and S3 in the supplemental material). Overall, the two genomes are colinear but are frequently interrupted by replacements with unrelated genome segments of comparable lengths.
Figure 2 shows an alignment of the JS98 and RB69 genome maps. The protein sequence relatedness between them is color-coded. The right genome halves are closely related. They cover two rightward-oriented structural gene clusters and, between them, a cluster of leftward-oriented nonstructural genes, mainly encoding proteins involved in nucleotide metabolism. The degree of sequence identity varied substantially: it was highest for the head and nucleotide metabolism genes and lowest for three groups of structural genes. The degree of sequence conservation thus did not follow the structural/nonstructural gene division.
FIG. 2. Alignment of genome maps of phages JS98, T4, and RB69. The ORFs are indicated by arrows as follows: white arrows indicate ORFs which share amino acid sequence identity over >70% of the sequence length, and gray arrows indicate ORFs that are specific to the indicated phage genome in this three-phage comparison. ORFs sharing amino acid sequence identity are linked by color shading; the color scale at the bottom of the figure indicates the percentage of amino acid sequence identity between the compared predicted proteins. The JS98 map is shown at the top and bottom to allow comparisons with both T4 and RB69.
The left genome halves cover exclusively leftward-oriented nonstructural genes, including a large DNA replication module. They were substantially less well aligned than the right genome halves. Large regions of nonaligned genome segments showed mainly gene replacements. Only a few gene inversions affecting a single gene were observed.
JS98 comparison with T4. After RB69, phage T4 showed the next best dot plot alignment with JS98 (see Fig. S2 and S3 in the supplemental material) from the entries in the NCBI database. Figure 2 shows a comparison of the two genome maps. Overall, rather similar observations were made to those described above in the JS98-RB69 comparison. The major new observation was the lack of high sequence identity between the distal tail fiber genes of T4 and JS98. Furthermore, regions of gene replacement detected in the left genome halves were no longer of comparable lengths; in two cases, genes on the JS98 map lacked a complement in T4. One case of such a complex gene replacement is illustrated in Fig. 3. Over this region, JS98 shared greater similarity with RB69 than with T4. T4 and RB69 shared genes lacking in JS98, and JS98 contained genes not found in either RB69 or T4. One could explain the observed gene constellation by a combination of insertion events (e.g., ORFs 124 to 126 in JS98), gene replacements (e.g., JS98 ORFs 130 to 131 versus RB69 e.3 to e.5), and DNA rearrangements (JS98 ORF 121 versus RB69 ORF 102).
FIG. 3. Alignment and comparison of a 10-kb region showing a high degree of variability between the phages T4 (top), JS98 (middle), and RB69 (bottom). Genes are colored according to their T4 functional assignments (25). Gray indicates genes unique to JS98. Gene 126 is shown hatched, as it shows 40.5% homology to ORF 063 of phage 44RR. The T4 ORFs are annotated with their conventional gene names, JS98 ORFs are numbered or named after the corresponding T4 homologues, and RB69 genes are quoted with the annotations given to them in the GenBank entry (accession number NC_004928). Amino acid sequence identities between genes were determined using STRETCHER and are indicated by connections of red to yellow shading, according to the color key provided at the bottom right. The black ovals indicate homology between T4 and RB69. The top line provides a base pair scale and the positions of the first and last depicted JS98 genes.
RB69 comparison with T4. According to the g23 tree analysis, RB69 and T4 belong to separate branches of T4 coliphages, but these branches are more closely related to each other than any of them is to JS98 (Fig. 1). Comparative genomics confirmed this relationship: phages T4 and RB69 showed a substantially larger number of genes sharing >80% sequence identity than did the JS98-T4 or JS98-RB69 alignment (Fig. 2). The sequence identity between T4 and RB69 was especially marked over the right genome halves. The left genome halves varied more substantially, as gene replacements and insertions/deletions were also frequently observed between RB69 and T4.
JS98 genome map. (i) Plus strand. We next investigated the JS98 genome in greater detail. Annotation of the JS98 genome revealed 266 ORFs. Notably, 198 of the 266 JS98 ORFs (74% of the total) shared significant amino acid sequence identity with T4 proteins. Based on their similarity with biologically defined T4 proteins, 114 ORFs of JS98 (43% of the total) could be given a functional annotation (Fig. 4).
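The proportions quoted above follow directly from the ORF counts. A short check, with variable names chosen here purely for illustration:

```python
TOTAL_ORFS = 266   # predicted JS98 ORFs
T4_MATCHED = 198   # ORFs sharing significant identity with T4 proteins
ANNOTATED = 114    # ORFs given a functional annotation


def percent(part: int, total: int) -> int:
    """Percentage of `part` in `total`, rounded to the nearest integer."""
    return round(100 * part / total)


share_t4_matched = percent(T4_MATCHED, TOTAL_ORFS)  # 74
share_annotated = percent(ANNOTATED, TOTAL_ORFS)    # 43
orfs_without_t4_match = TOTAL_ORFS - T4_MATCHED     # 68
```

The remainder of 68 ORFs is the same figure the text uses for the JS98 ORFs that lack a T4 match.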
FIG. 4. Annotated genome map of bacteriophage JS98. Following convention, the map starts at the top left with the rIIA gene and ends at the bottom left with the rIIB gene. The genome was divided into 15-kb segments that are to be read from left to right and from top to bottom. The individual ORFs are depicted as arrows, with the orientation of the arrows indicating whether the genes are carried on the Watson or Crick strand. The color of the arrow identifies the functional category into which the homologous T4 gene was classified (25). The color code for gene function and the COG letter code are provided in the bottom center frame of the figure. The 266 predicted JS98 ORFs plus three tRNA genes are annotated above the corresponding arrows with the names of the homologous T4 genes (letter code or horizontal number code). If the JS98 ORF lacks a T4 gene complement, we attributed a number to the gene, starting from rIIA. To distinguish these JS98 ORFs from T4 genes which also carry numbers as gene identifiers (horizontal numbers), we annotated the JS98 genes lacking a T4 homologue with vertical numbers. The first such JS98 ORF without a T4 complement is found in the second line, following the T4 homologue uvsX, and is annotated as JS98 ORF 38. Three tRNA genes are indicated with dark olive arrows, located between JS98 ORFs 136 and 137. Below the arrows are boxes whose colors indicate how many phages from the Tulane database contain a protein which shares protein sequence identity with the JS98 gene. The key to the color code for the prevalence of the JS98 genes is provided in the bottom right frame.
(ii) Minus strand. The longest cluster of leftward-transcribed genes (i.e., genes carried on the minus strand) extends from a T4 g4 homologue to asiA homologues covering the first half of the genome (and a smaller part of the opposite end, a consequence of the artificial cutting of the genome between rIIA and -B). Database homologies in this region defined a large cluster of DNA replication genes. In addition, several nucleotide metabolism, translation, and transcription regulation genes were identified (see the color code for the ORFs in Fig. 4). A second large cluster of leftward-transcribed genes extends over nearly 30 kb, from a T4 homologue for the RNase H gene (rnh) to the T4 adenosyl-ribosyltransferase homologue alt gene, located between the base plate hub and the tail fiber gene clusters. This region contains several nucleotide metabolism, DNA replication, and transcription genes.
Comparison with proteins in the Tulane T4 phage database. We searched the Tulane protein database of T4-like phage genomes with the JS98 sequence, using TBlastN searches, and determined the number of T4-like phages with a homologous ORF for each JS98 ORF (Fig. 4). Significantly, 45 of the 68 ORFs from JS98 that lacked a T4 match shared protein sequence identity with another T4-like phage (see Table S1 in the supplemental material). Figure 4 depicts the degree of conservation for each JS98 ORF, expressed in a color code. Dark green indicates highly conserved genes with matches in more than 10 other T4-like phages. With few exceptions (polynucleotide kinase pseT, RNA ligase rnlA, and rIIB genes), the highly shared core genes come from the DNA replication and virion structural modules. The major structural genes are located in a tight cluster of highly conserved genes. In contrast, the DNA replication genes are less well conserved and are often separated by nonconserved phage genes. Notably, all JS98 genes located on the plus strand shared sequence identity with the predicted proteins from many T4-like phages.
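The per-ORF prevalence described above (the number of T4-like phages containing a homologue of each JS98 ORF) amounts to counting distinct phages per ORF. A minimal sketch, assuming the significant hits have already been reduced to (JS98 ORF, phage name) pairs; the function names are hypothetical, and only the "more than 10 phages" category is taken from the text because the other color bins are not specified there:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

HIGHLY_CONSERVED_THRESHOLD = 10  # "more than 10 other T4-like phages" in the text


def phage_prevalence(hits: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each JS98 ORF to the number of distinct phages with a significant hit.

    Repeated hits to the same phage are counted once. ORFs without any hit
    do not appear in the result.
    """
    phages_per_orf: Dict[str, Set[str]] = defaultdict(set)
    for orf, phage in hits:
        phages_per_orf[orf].add(phage)
    return {orf: len(phages) for orf, phages in phages_per_orf.items()}


def is_highly_conserved(n_phages: int) -> bool:
    """True for ORFs matched in more than 10 other T4-like phages."""
    return n_phages > HIGHLY_CONSERVED_THRESHOLD
```

Using a set per ORF keeps multiple high-scoring segments against one phage genome from inflating the count, which matters for TBlastN searches against whole genomes.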
The genes on the minus strand showed, on average, much less representation in the Tulane T4 phage database (http://phage.bioc.tulane.edu/) than the genes on the plus strand. For example, all genes lacking either NCBI or Tulane database matches (26 ORFs) are located on the minus strand (depicted with white boxes under white arrows in Fig. 4). These JS98-specific genes are not clustered on the genome: they occur either as single genes or as two adjacent genes (Fig. 2).
Hypothetical and undesired proteins. Table S1 in the supplemental material provides a list of the 68 ORFs from JS98 that lacked any sequence homology with T4. Some clustering of these non-T4-related genes was observed on the JS98 genome. The hypothetical and no-hit proteins from JS98, which lacked matches with the Tulane database, were screened using InterproScan. A PROSITE motif (TonB-dependent receptor protein signature 1) was found for ORF253, and a PFAM domain (anticodon nuclease activator protein) was found for ORF257. Nine of the hypothetical proteins contained a signal peptide motif, and among these, six also showed one or two transmembrane domains. This small number of additional links demonstrates the seclusion of T4 phages from the entries in the database. We also screened all JS98 genes against our DUG and obtained no matches. Likewise, matches with protein food allergens in the FARRP Food Allergen Database (http://www.allergenonline.com) were not found.
Links to E. coli genes.
Next, we screened for possible horizontal gene transfer by comparing the JS98 genome to all available E. coli genome sequences. Eighteen predicted JS98 proteins shared sequence identity with E. coli proteins (Table 1). The first 12 proteins on this list belong to the category of DNA replication and DNA transaction genes. Notable are the NrdA and NrdB proteins, which are the α and β subunits of ribonucleotide reductase, and the thymidylate synthetase Td, which shared 54 and 47% amino acid identity, respectively, with their E. coli homologues. In phage T4, these adjacent genes are interrupted or flanked by mobile DNA elements (introns and intron-homing endonucleases) (17). An amino acid identity of 51% was also found for the cellular and viral thymidine kinases. Interestingly, in T4 the tk gene is also followed by mobile DNA elements. However, no DNA sequence identity was detected between these viral and E. coli proteins, arguing against a recent horizontal gene transfer event as an explanation.
TABLE 1. Phage JS98 ORFs showing homology to E. coli ORFs
A crucial aspect of any safety analysis of phage genomes concerns the degree of genetic exchange with their bacterial hosts. A total of 18 JS98 ORFs had sequence relatedness with E. coli proteins (Table 1). All showed, at the same time, sequence matches to T4-like phage proteins. The highest degree of sequence identity with E. coli proteins was 51 to 54% amino acid identity (tk and nrdA/B). In fact, orthologous genes for these functions are widely distributed, and a previous phylogenetic tree analysis suggested that, for example, the T4 thymidylate synthetase branched off before the split between the eukaryotic and bacterial orthologues (25). Obviously, such genes do not constitute evidence of horizontal gene transfer. The genetic isolation of JS98 from its E. coli host is further underlined by the drastically lower GC content of the phage DNA. Only two regions in the JS98 genome (and those of many other T4-like phages) showed a significantly higher GC content than the average. One region covers the major phage capsid gene and is therefore an unlikely candidate for lateral gene transfer from the bacterial host. Since Gp23 is one of the most abundantly expressed T4 proteins, the higher GC content may represent an adaptation to the codon usage of E. coli to optimize gene expression. The second locus is the distal part of the tail fiber gene cluster. This region is known to undergo gene shuffling for host range changes (35). As long as no known virulence genes are carried with the phage into a new species, this observation does not represent a safety concern, either.
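Regions of elevated GC content such as the two loci discussed above are commonly located with a sliding-window scan. The sketch below is an assumption about how such a scan might look, not the method used in the study: the window size, step, and `margin` above the genome-wide mean are hypothetical parameters, since the text does not state how "significantly higher" was defined.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(
    genome: str, window: int = 1000, step: int = 500, margin: float = 0.10
) -> List[Tuple[int, int, float]]:
    """Return (start, end, gc) for windows whose GC exceeds the mean by `margin`.

    Coordinates are 0-based and half-open. `margin` is a hypothetical
    cutoff expressed as an absolute difference in GC fraction.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    mean_gc = gc_fraction(genome)
    flagged = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = genome[start:start + window]
        gc = gc_fraction(chunk)
        if gc > mean_gc + margin:
            flagged.append((start, start + len(chunk), gc))
    return flagged
```

Comparing each window with the phage's own mean, rather than with the host's GC content, matches the framing in the text, where the two loci stand out against the rest of the JS98 genome.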
Furthermore, we did not observe genes in JS98 whose best hits were with phages outside the T4 group. This observation suggests that gene exchange of T4 phages with other coliphages is rare. This lack of genetic exchange with both the bacterial host genome and other phage genomes may be explained partially by the rapid and complete degradation of the bacterial genome at an early infection stage (37). T4 phages may simply lack a utilizable source of foreign genes for gene transfers to occur to a reasonable extent.
JS98 and related phages were tested in mice, and no adverse events were observed (A. Bruttin et al., unpublished data). On the basis of the genome analysis and these animal safety tests, JS98-like phages were then introduced into our phage cocktail for further safety evaluations.
What can be learned from the JS98 sequence analysis with respect to the evolution of the T4 genome? Based on the strikingly different GC content from that of its host, T4 has not evolved in E. coli. Recent structural analysis of the T4 capsid protein Gp24 (and, to a lesser extent, Gp23) revealed close conformational similarity with the major head protein from the lambdoid coliphage HK97 (16). This observation suggests a common ancestor for T4- and lambda-like tailed phages with respect to the assembly and structure of double-stranded DNA phage heads. However, the lack of any sequence similarity between these structurally related phage proteins suggests an ancestor deep in the evolutionary past, long before the evolution of Enterobacteriaceae. Some glimpses into the distant past of T4-like phages can be derived from comparisons of T4 with distantly related phages from cyanobacteria (15, 21, 33).
Information on genetic mechanisms, which are the motor for short-term T4 evolution, is better derived from comparisons of more closely related T4 phages. A detailed analysis of the pseudo-T-even coliphage RB49 with the reference T4 phage was published recently (11). The authors distinguished four segments of a conserved core genome, with two carrying virion genes and two carrying DNA replication genes. Replication and virion gene clusters were separated by hyperplastic regions containing mostly novel genes of unknown function and origin. Moving to more distant relatives of T4, such as the Aeromonas phage Aeh1, a similar pattern of conservation was observed. Differences were a lower degree of DNA sequence identity between Aeh1 and T4 than that between RB49 and T4, the splitting of the DNA replication genes over three gene clusters, and the larger sizes of the hyperplastic regions in Aeh1, which also contains a substantially larger genome (233 kb) than T4 (169 kb). The authors noted a gradient of decreasing DNA sequence identity in comparing T4 with RB69 (another T-even phage) (27), RB49 (a pseudo-T-even E. coli phage), 44RR2.8t (a pseudo-T-even Aeromonas phage), and Aeh1 (a schizo-T-even Aeromonas phage). This relationship became even closer when comparing T4-like phages belonging to the same branch (e.g., the pseudo-T-even E. coli phages RB49 and φ1; the Aeromonas phages 44RR2.8t, 25, and 31; and the schizo-T-even Aeromonas phages Aeh1 and 65). However, none of these analyses have been published in greater detail. The focus of our analysis is comparisons between phages belonging to different branches of the T-even group of E. coli phages.
Some genetic exchanges between T4-like phages were observed and can be selected in the laboratory (1), but they do not resemble lambda-type modular exchanges. An instructive case is provided by the tail fiber genes. T4 and RB69, despite being closely related in the structural genes, differ in their distal tail fibers, a common mechanism of host range extensions (35). An exchange point apparently exists between proximal and distal tail fibers. In contrast, JS98 and RB69 share related distal tail fiber genes, while the proximal tail fiber genes differ substantially in sequence. Interestingly, this difference comes in parallel with differences in base plate wedge and base plate hub genes that likely interact with the proximal tail fiber genes. T4 phages can thus make coordinated exchanges over three separate genome regions when the proteins are adjacent in the virion structure and therefore have to interact directly. Despite this flexibility, T4 phages display clear constraints in the choice of genes fulfilling similar functions. While genes lacking sequence relatedness can serve the same structural functions in lambdoid phages, T-even phages have to use genes derived from the same sequence family. Far fewer constraints exist for the left genome halves of T-even phages. A patchwork of conservation and diversity was revealed by the alignment of the phage genomes. Only a few genes shared high sequence identity across all T-even phages (e.g., the g39 topoisomerase gene), most showed a variable degree of sequence conservation, and a few regions lacked any sequence relatedness. The alignment identified four such regions; two regions showed a group of genes which were unrelated in T4, RB69, and JS98, while two other regions showed distinct genes in two phages and no genes in the third phage. 
One of the hypervariable regions carries an intron endonuclease gene (segA) and two DNA modification genes; the other carries a tRNA gene cluster, again accompanied by an intron endonuclease gene (segB) (17, 25). Upstream of the soluble lysozyme gene e, T4 lacks genes where both JS98 and RB69 show a large cluster of unrelated genes. Where T4 carries the ribonucleotide reductase subunit gene nrdD, flanked on both sides by intron endonuclease genes, RB69 showed nrdD genes without endonuclease genes and JS98 showed no genes. Genetic hypervariability within T-even phages thus seems to be associated with mobile DNA. Since T4 has been invaded heavily by intron endonucleases, further phage comparisons are needed to assess whether this diagnosis can be generalized.
The T4 phage genome shows a peculiar strand-specific distribution of conserved and variable genes. The Watson strand carries the structural genes nearly exclusively and is highly conserved. The Crick strand carries the nonstructural genes. Over the left genome half, the Crick strand does not show a marked clustering of conserved gene function in adjacent genes. Notably, all JS98-specific genes are found exclusively on the Crick strand. The dispersed location of the JS98-specific genes within conserved phage gene functions (e.g., DNA replication) suggests that the strain-specific genes were inserted between the conserved nonstructural genes. In their majority, the inserted genes did not arrive as functional clusters but as individual genes or, at most, two adjacent genes. Novelty in T4-like phages comes primarily with these new genes and secondarily by the duplication of existing phage genes. In addition to the previously described g24 duplication, we detected other adjacent JS98 genes with significant amino acid sequence identity (alt, frd.2, and modB genes). Since it was argued that the T4 gene 24 is already a duplication of gene 23 (16), duplication followed by sequence diversification might be a mode of T4 evolution, which became possible with the larger T4 genome.
Currently, we are comparing the genetic variability within the individual branches of the T-even phage group to gain further insight into the processes introducing genetic variability in T4 phages over even shorter periods.
Published ahead of print on 10 August 2007.
Supplemental material for this article may be found at http://jb.asm.org/.