Journal of Bacteriology, May 2002, p. 2789-2804, Vol. 184, No. 10
0021-9193/02/$04.00+0 DOI: 10.1128/JB.184.10.2789-2804.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
Laboratoire de Microbiologie et Génétique Moléculaire du CNRS, UMR 5100, 31062 Toulouse Cedex, France
Received 26 December 2001/ Accepted 20 February 2002
| ABSTRACT |
|---|
170-kb genome. Most of these nucleotide sequences lacked sufficient homology to T4 to be detected in an NCBI BlastN analysis. However, when translated, about 70% of them encoded proteins with homology to T4 proteins. Among these sequences were the numerous components of the virion and the phage DNA replication apparatus. Mapping the RB49 genes revealed that many of them had the same relative order found in the T4 genome. The complete nucleotide sequence was determined for the two regions of the RB49 genome that contain most of the genes involved in DNA replication. This sequencing revealed that RB49 has homologues of all the essential T4 replication genes, but, as expected, their sequences diverged considerably from their T4 homologues. Many of the nonessential T4 genes are absent from RB49 and have been replaced by unknown sequences. The intergenic sequences of RB49 are less conserved than the coding sequences, and in at least some cases, RB49 has evolved alternative regulatory strategies. For example, an analysis of transcription in RB49 revealed a simpler pattern of regulation than in T4, with only two, rather than three, classes of temporally controlled promoters. These results indicate that RB49 and T4 have diverged substantially from their last common ancestor. The different T4-type phages appear to contain a set of common genes that can be exploited differently, by means of plasticity in the regulatory sequences and the precise choice of a large group of facultative genes.
| INTRODUCTION |
|---|
Genomic hybridization and PCR analysis revealed that the T4-type phages vary considerably in their distance from T4 (54, 62, 72). Based on such data and limited sequence analysis, we can distinguish subgroups of the T4 type (54, 62, 72). The T-even subgroup shares considerable nucleic acid sequence homology with T4; for example, quantitative hybridization between the genomes of T2, T4, and T6 indicates that more than 90% of their sequences are nearly identical (18). As a result of their close relationship to T4, the T-even genomes can usually be analyzed and sequenced using PCR primers based on the T4 sequence (62, 71, 72). Although such comparisons confirmed that most of the T4-type phages were very close to T4, a few of them, such as RB69 (81) and SV14 (54), were clearly chimeras.
The genomes of these phages have blocks of sequence that diverge significantly from T4. The origin of these sequences became obvious with the characterization of the pseudo-T-even phages (54). The members of this subgroup of the T4-type phages (e.g., RB49 and 44rr2.8t) are more diverse in their host range than the T-even phages, and their genomes are phylogenetically distant from T4 (54, 62). Only a few genes of phage RB49, for example, still retain sufficient homology to hybridize with T4 DNA under stringent conditions (54).
The sequencing of the most conserved segments of the RB49 genome revealed that they encoded the structural proteins of the head (gp23 and gp24), the collar (gp20), and the contractile tail (gp18 and gp19) (54). Homologues of most of the structural components of the T4 virion were thought to be present in the pseudo-T-even phages (54). A small number of plasmids containing randomly cloned DNA of RB49 have been previously analyzed (54), and these had two sorts of sequences. A minority contained sequences that lacked homology to any entry in the NCBI database. The majority contained distant homologues to both nonstructural and structural genes of T4.
Aside from their significant divergence in nucleotide sequence, the T-even and the pseudo-T-even phages also differ in the modifications of their DNA. The T-even genomes contain hydroxymethylcytosine in place of cytosine, and these residues are generally glucosylated, which provides additional protection against host restriction systems (12, 29). The DNA of the pseudo-T-even phages does not appear to have these nucleotide modifications (54). Consequently, the pseudo-T-even phages must have evolved a transcription and replication apparatus that was adapted to this difference in the DNA template they use. Furthermore, Southern analysis of several different pseudo-T-even phages revealed that they are as distant from each other as they are from T4 (54). Their phylogenetic distance from T4, and the genome plasticity that this implies, motivated us to further investigate the pseudo-T-even subgroup of phages.
In this communication we have used an efficient partial sequencing strategy to compare the genomes of the T-even phage T4 and the pseudo-T-even phage RB49. Our snapshot of the RB49 genome indicates that it has diverged substantially and relatively uniformly from T4, and thus it is the first phage of this type to be analyzed. In addition to extensive random genomic sequencing, this study involved the targeted sequencing and characterization of two regions that contain the DNA replication genes. Although the genomes of T4 and RB49 share many features, they have important differences. A functional analysis of the RB49 promoters revealed a fundamental difference in the regulation of the gene expression in T4 and RB49. RB49 employs only two classes of temporally regulated promoters, rather than the three in T4. The evolutionary processes capable of generating such diversity within a phage family are discussed.
| MATERIALS AND METHODS |
|---|
(Life Technologies) was used for electrotransformation of the RB49 genomic library. The wild-type E. coli strain AC21 and the temperature-sensitive RNase E mutant (rne-3071) strain AC22 (14) were used as hosts to test the involvement of RNase E in the processing of the gene 32 transcripts by reverse transcriptase 5'-end mapping.
Primers. Primers used for the reverse transcription experiments and to obtain the gene 43 and gene 32 PCR fragments are shown in Table 1. A large number of additional primers, not listed here, were used to sequence the gp43 and gp32 regions. The sequences of these primers and of others used to investigate the genomic organization of RB49 by PCR are available on request from the authors.
2,500 × g. The supernatant was collected, and one-half volume of phenol and one-half volume of chloroform were added. The samples were immediately centrifuged as before and the supernatant was collected. The phenol extraction was repeated three times. The DNA was precipitated with 0.75 volume of isopropanol (room temperature) and centrifuged for 30 min at 4°C at 6,000 × g. The pellet was dried and resuspended in 500 µl of Milli-Q water. Then 1 µg of this DNA was digested with the restriction enzyme Sau3A (according to New England Biolabs) (4 U/µg of DNA) for 10 min in a final volume of 30 µl. This digestion yielded heterogeneous fragments with a mean size of less than 1 kb.
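The digestion step can be illustrated in silico. The following is a minimal sketch, assuming only that Sau3A cuts immediately 5' of its GATC recognition site; the function name `digest_sau3a` and the toy sequence are hypothetical, only one strand is represented, and a complete digest is modeled, which may not correspond to the brief 10-min reaction described above.

```python
def digest_sau3a(seq: str) -> list[str]:
    """Cut `seq` immediately 5' of every GATC site (complete digest, one strand)."""
    site = "GATC"
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        if pos > start:
            # Close the current fragment just before the recognition site.
            fragments.append(seq[start:pos])
            start = pos
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # trailing fragment after the last cut
    return fragments


def mean_fragment_size(fragments: list[str]) -> float:
    """Average fragment length in bases."""
    return sum(len(f) for f in fragments) / len(fragments)


# Toy sequence (hypothetical, not RB49 DNA).
toy = "AAGATCTTTGATCC"
print(digest_sau3a(toy), mean_fragment_size(digest_sau3a(toy)))
```

Every internal fragment begins with GATC, which is what makes Sau3A fragments compatible with a BamHI-linearized vector.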
A 1/10 fraction of this digestion was then incubated in the presence of 40 U of T4 DNA ligase, its appropriate buffer (New England Biolabs), and 100 ng of pUC19 vector (New England Biolabs) linearized at the BamHI site (according to New England Biolabs) in a final volume of 20 µl. The DNA was then precipitated by adding 1/10 volume of sodium acetate (3.3 M, pH 5) and 2.5 volumes of cold ethanol. The sample was centrifuged for 30 min at 4°C at 6,000 × g. The pellet was dried and resuspended in 20 µl of Milli-Q water. E. coli DH5α was electrotransformed with 2 µl of the sample by use of the Gene Pulser device (Bio-Rad) according to the manufacturer's recommendations.
Sequencing of the library.
The chimeric plasmids carrying the phage inserts were extracted using miniprep spin columns (Qiagen) according to the manufacturer's protocol. A 1.5-ml volume of an overnight culture was extracted and the plasmid DNA was resuspended in 50 µl of Milli-Q water. The DNA sequencing was performed on 5 µl of this material. The universal and reverse primers were both end-labeled with 25 µCi of [γ-33P]dATP using T4 polynucleotide kinase (New England Biolabs) as recommended by the manufacturer. The plasmid inserts were sequenced from both primers using the USB Thermosequenase cycle sequencing kit (US78500) and the three-deoxynucleoside triphosphate (dNTP) internal label cycle sequencing protocol recommended by the supplier.
Large PCR fragments. PCR products larger than 3 kb were obtained with the Expand Long Template PCR System kit (Boehringer Mannheim). The reactions were prepared according to the manufacturer's recommendation with 5 µl of an RB49 stock (10⁹ phage/ml) as the template and enzyme buffer number 3 of the kit. The reactions were carried out in 0.2-ml microcentrifuge tubes. The procedure involved 30 cycles in the Perkin-Elmer GeneAmp PCR System 2400. The first 10 cycles had a 30-s step at 94°C for denaturation, 10 s at 54°C for annealing, and 10 min at 68°C for extension. This was followed by 20 cycles with 30 s at 94°C, 10 s at 54°C, and 15 min at 68°C.
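The two-stage cycling program above can be written down as a small data structure, which also makes the run time easy to check. This is only a sketch: the names `PROGRAM` and `total_hold_seconds` are hypothetical, and the calculation counts programmed hold times only, ignoring the ramping between temperatures.

```python
# (number of cycles, [(step label, hold time in seconds), ...]) as described in the text.
PROGRAM = [
    (10, [("denature 94C", 30), ("anneal 54C", 10), ("extend 68C", 10 * 60)]),
    (20, [("denature 94C", 30), ("anneal 54C", 10), ("extend 68C", 15 * 60)]),
]


def total_hold_seconds(program):
    """Sum the programmed hold times over all cycles (ramp times are ignored)."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)


seconds = total_hold_seconds(PROGRAM)
print(f"{seconds} s = {seconds / 3600:.1f} h of programmed hold time")
```

With the values given, the holds add up to 10 × 640 s + 20 × 940 s = 25,200 s, or 7 h, almost all of it spent in the long extension steps.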
Direct DNA sequencing of PCR fragments. The protocol we used was developed in this laboratory and is described in detail by Bouet et al. (6).
Bacterial growth and phage infection. Bacterial growth and phage infection were done by a modification of the protocol of Belin et al. (5). Bacterial strain BE was grown in Luria-Bertani (LB) medium at 30°C with aeration to a cell density of 10⁸/ml, centrifuged, and concentrated fourfold in the same medium. This culture was then mixed with phage to give a final multiplicity of infection of 10 and was incubated at 12°C for 10 min without agitation to allow phage adsorption to the host. Infection was initiated by shifting the culture from 12°C to 30°C and simultaneously providing vigorous aeration. At 2 min after infection the percentage of surviving bacteria was 10 to 20%.
Bacterial strains AC21 and AC22 were grown in LB medium at 30°C with aeration to a cell density of 10⁸/ml and put on ice for 5 min. They were then centrifuged and resuspended in one-quarter volume of the same medium and incubated for 10 min at 43.5°C with aeration. The culture was then placed on ice for 5 min before being mixed with phage (T4 or RB49) to give a final multiplicity of infection of 10. The phages were adsorbed to the host cells by a 10-min incubation at 12°C, and infection was initiated by shifting the culture to 43.5°C.
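The arithmetic behind a multiplicity of infection of 10 can be sketched as follows. The cell density and fourfold concentration are taken from the text (assuming no cell loss during centrifugation); the culture volume and the phage stock titer used in the example are hypothetical placeholders, as are the function names.

```python
def phage_needed(cells_per_ml: float, culture_ml: float, moi: float) -> float:
    """Number of phage particles required for the target multiplicity of infection."""
    return cells_per_ml * culture_ml * moi


def stock_volume_ml(phage_required: float, stock_titer_per_ml: float) -> float:
    """Volume of phage stock that contains `phage_required` particles."""
    return phage_required / stock_titer_per_ml


# 1e8 cells/ml concentrated fourfold, as in the text (no cell loss assumed).
concentrated_density = 1e8 * 4
# Hypothetical values: 1 ml of concentrated culture, stock at 1e11 phage/ml.
particles = phage_needed(concentrated_density, culture_ml=1.0, moi=10)
print(particles, stock_volume_ml(particles, stock_titer_per_ml=1e11))
```

Under these assumptions each milliliter of concentrated culture requires 4 × 10⁹ phage particles.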
RNA isolation. Isolation of total RNA from RB49- and T4-infected cells was performed essentially as described by Hagen and Young (31). At the indicated times after infection, a 4-ml portion of the infected cells was removed for cell lysis and nucleic acid extraction. About 200 to 300 µg of RNA was recovered and resuspended in 200 µl of diethylpyrocarbonate (DEPC)-treated water.
5'-End mapping of transcripts by primer extension.
5'-End mapping of transcripts by primer extension was done by a modification of the protocol of Gutierrez et al. (30). The appropriate primer (5 pmol) was end-labeled with 25 µCi of [γ-32P]dATP using T4 polynucleotide kinase (New England Biolabs) as recommended by the manufacturer. Then 10 µg of RNA isolated from phage-infected cells was incubated for 5 min at 80°C with 0.5 pmol of the labeled primer. The samples were then frozen in dry ice and defrosted gently at 4°C. The primer was extended at an appropriate temperature (42°C for all the reactions except for the gene 32 experiments, where the best results were obtained at 48°C) for 15 min in the presence of 10 U of avian myeloblastosis virus (AMV) reverse transcriptase (Promega), AMV reverse transcriptase buffer (Promega), and 0.5 mM each dNTP. Then an additional 5 U of the enzyme was added to each sample, and the extension reaction was continued for an additional 15 min. The samples were dried in a Speedvac for 30 min and resuspended in 4 µl of Milli-Q water and 4 µl of the sequencing stop solution of the USB Thermosequenase cycle sequencing kit. The samples were then incubated at 95°C for 3 min and loaded on a denaturing polyacrylamide gel (6.5%).
Nucleotide sequence accession numbers. The nucleotide sequences obtained in this study have all been deposited in the NCBI database. The accession numbers are given in the appropriate figure legends and Table 2.
| RESULTS |
|---|
Genomic organization of RB49. The random sequencing indicates that much of the T4 and RB49 genomes evolved from a common precursor. During their descent from this ancestor, the two genomes could either have remained relatively fixed in their organization or they could have undergone rearrangements. To investigate this question we have examined the linkage of various RB49 genes by a PCR technique. To do this, we used pairs of RB49 primers whose homologous sequences are located within 10 kb of each other on the T4 genome. As shown in Fig. 1, only two of these reactions failed to give a PCR product (genes cd and 30; genes 47 and sunY). The simplest explanation for this is that these genes are not located relative to each other as they are in T4. In the remainder of the reactions, the PCR fragments obtained have roughly the same size as would be predicted on the basis of the T4 genome. Thus, the global organization of the RB49 genome is very similar to that of T4. Nevertheless, the numerous small variations from the expected size of the PCR fragments indicate that these genomes have frequently undergone insertions and/or deletions of gene-sized sequences.
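The reasoning used to interpret each primer pair can be summarized in a few lines. This is a minimal sketch, not the authors' analysis: the function name `classify_linkage`, the 10% size tolerance, and all fragment sizes in the example are hypothetical placeholders chosen for illustration.

```python
from typing import Optional


def classify_linkage(expected_bp: int, observed_bp: Optional[int],
                     tolerance: float = 0.10) -> str:
    """Compare an observed PCR product with the size predicted from the T4 map."""
    if observed_bp is None:
        # No amplification: the two genes may not be adjacent as they are in T4.
        return "no product"
    deviation = (observed_bp - expected_bp) / expected_bp
    if abs(deviation) <= tolerance:
        return "T4-like spacing"
    return "insertion" if deviation > 0 else "deletion"


# Hypothetical sizes in bp; None means the reaction gave no product.
for expected, observed in [(5000, 5200), (5000, 6500), (5000, 3000), (5000, None)]:
    print(expected, observed, classify_linkage(expected, observed))
```

A missing product is treated separately from a size shift because it points to a rearrangement rather than to an insertion or deletion between two genes that remain linked.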
Sequencing of the DNA polymerase region of the RB49 genome. To examine the RB49 genome in greater detail, we focused on two genomic regions and completely sequenced them. First, we sequenced the region that encodes the DNA polymerase (gp43) and its associated proteins in the replication complex: gp41 (primase); gp44 and gp62 (the clamp loader); gp45 (sliding clamp); and gp46 (exonuclease subunit) (Fig. 1 and Materials and Methods). The RB49 "gene 43" segment (Fig. 1 and 2A) is significantly smaller (11.9 kb) than its T4 counterpart (18.5 kb). Although the NCBI BlastN analysis detected no extended homologies to the T4 DNA sequence, the ClustalX program aligns the T4 and RB49 sequences to reveal numerous small blocks of nucleotide sequence homology (data not shown).
RB49 replisome proteins. Only a few replication genes of phages with T4-like morphology have been sequenced (52, 64, 76, 81). Among the closely related T-even phages the amino acid sequences of the replication homologues typically diverge from T4 proteins by less than 5%. However, in the chimeric T4-like phage RB69 (64, 81), the sequences can diverge significantly more than this (e.g., 20%). The replication genes of RB49 are clearly the most divergent T4-type sequences thus far analyzed. Although the RB49 gp43 is the most conserved replication protein, it still has 44% divergence from the T4 sequence. All the RB49 polymerase accessory proteins differ by 50% or more from their T4 homologues (Fig. 2A).
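Divergence figures of this kind are derived from pairwise alignments. The sketch below shows one common way to compute percent divergence from two pre-aligned sequences; it is an assumption that the values quoted above were obtained in a comparable way, since the exact treatment of gaps is not stated here. The function name and the toy sequences are hypothetical.

```python
def percent_divergence(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns that differ; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both rows carries no information
        compared += 1
        if a == b:
            identical += 1  # a residue facing a gap counts as a difference
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * (1 - identical / compared)


# Toy aligned peptides (hypothetical, not gp43).
print(percent_divergence("ACDE", "ACDF"))
print(percent_divergence("MKT-LV", "MRTALV"))
```

Counting a residue opposite a gap as a difference is a design choice; excluding gapped columns instead would give lower divergence values for proteins with insertions.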
When the alignments (Fig. 3A) of the T4 and the RB49 protein gp43 are made, it is evident that there are numerous differences in this key component of the replisome. Highly divergent segments are interspersed among much more conserved sequence blocks. In T4, this protein has both DNA polymerase and proofreading activities and also interacts with the accessory proteins (58, 61). A previous comparison of the T4 and RB69 gp43 protein sequences suggested that these two enzyme functions were separated into a two-domain protein structure (46, 61, 76). More recent x-ray crystallographic studies suggest a much more complex five domain organization of the protein (40, 77). Our compilation of the three known gp43 variants (T4, RB69, and RB49) confirms the conservation of the diagnostic motifs of B family polymerases (21, 36) (Fig. 3A). In particular, the residues believed to be involved in the active sites (75, 76) are well conserved in all three phages. This analysis also confirms the existence of a nonconserved block of 70 amino acids (residues 482 to 552 in the T4 sequence) previously suggested by the T4 and RB69 comparison (76). Domain deletion and swapping experiments have demonstrated that this segment plays an important role in the replication activity of the protein (76). Our expanded sequence comparison reveals additional segments of the gene whose sequence can vary substantially; for example, the 30-amino-acid segment (152 to 182) located in the N-terminal portion of the protein. No residue in this segment is conserved in all of the gp43 sequences. Eventual comparison of the RB49 gp43 with the RB69 structure and the construction of gp43 chimeras by swapping such nonconserved segments between the different versions of gp43 should define the function of the variable segments.
Sequencing of the RB49 gene 32 region. We used similar methods (see Materials and Methods) to identify and isolate the gene 32 region of the RB49 genome (Fig. 1 and Fig. 2B). The single-stranded-DNA-binding protein gp32 plays a central role in T4 DNA replication, recombination, repair, and late transcription (41, 55, 80). Its amino acid sequence is extremely well conserved (>95%) among the numerous T-even phages thus far examined (50, 73). This 3.5-kb genomic segment of RB49 extends from the RNase H gene to gene 32 and is somewhat larger than the analogous 3.4-kb region of the T4 genome. As with the other RB49 DNA sequences, the gene 32 segment has little homology to the T4 nucleotide sequence, and the relation to the T4 genome becomes obvious only after translation. In addition to gene 32, this segment contains gene 59, which plays an important role in DNA replication, and gene 33, a transcription factor required for late gene expression (34, 80). As shown in Fig. 3B, RB49 lacks the T4 gene 32.1 but contains a novel large ORF of unknown function inserted between the RNase H and dsbA genes.
The alignment of the RB49 and T4 gp32 sequences reveals a 60% amino acid identity (Fig. 2B and Fig. 3B). The T4 protein is organized in a three-domain structure. The amino terminal domain of the protein is composed of basic residues and contains a LAST motif (Lys/Arg/Lys/Ser/Thr) (15). This element is believed to be involved in the protein's cooperative binding to DNA (25, 48, 69, 73, 79). The carboxyl-terminal portion of the protein is very acidic (16) and is involved both in the protein's ability to denature double-stranded DNA and its interaction with the various replication and recombination proteins (9, 10). The central domain of the T4 gp32 contains both a zinc-finger motif and a fairly regularly spaced series of 6 tyrosine residues, features that are believed to be important to its nucleic acid binding activity (23, 24). A second LAST motif is also present here and would be able to interact with either the acidic carboxyl terminal part of the same monomer when the protein is not bound to DNA, or with the nucleic acid backbone when the protein is bound to DNA (15).
As shown in Fig. 3B, except for the amino-terminal LAST motif all of the elements believed to be important for DNA binding activity of the T4 gp32 are conserved in RB49. These include the zinc finger motif, the internal LAST motif, and the series of aromatic residues. Most of the differences between the RB49 and T4 gp32 sequences are located in the carboxyl terminus, where only 50% of the last 48 residues are identical. In addition, RB49 has an 18-amino-acid insertion within one of the three α-helices in the carboxyl-terminal domain of the protein. Nevertheless, the number of negatively charged residues in this domain is similar to that in T4 gp32 and thus, in spite of the sequence divergence, the acidic character of the domain is preserved. To summarize, the C-terminal domain of gp32 that mediates its protein-protein interactions manifests more sequence plasticity than the remainder of the gene.
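The comparison of acidic character amounts to counting aspartate (D) and glutamate (E) residues in a terminal segment. A minimal sketch, assuming one-letter amino acid codes; the function name, the segment length passed in, and the toy peptide are hypothetical and are not the gp32 sequences.

```python
def acidic_residues(protein: str, last_n: int) -> int:
    """Count Asp (D) and Glu (E) in the last `last_n` residues of a protein."""
    tail = protein.upper()[-last_n:]
    return sum(tail.count(residue) for residue in "DE")


# Toy peptide (hypothetical); the last six residues are D, E, A, D, E, E.
print(acidic_residues("MKKRSTDEADEE", last_n=6))
```

Two domains can have the same count of acidic residues while sharing few identical positions, which is the situation described above for the carboxyl terminus.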
Regulatory sequences.
In T4, many of the regulatory signals are located in the intergenic spacers. The analysis of the gp43 and gp32 replication regions of RB49 indicates that their intergenic sequences have diverged even more from those of T4 than the coding sequences. In particular, the presumed promoter sequences in RB49 differ strikingly from their T4 counterparts (Table 3). In T4 the gp43 and gp32 regions contain early and middle mode T4 promoters (35, 39), but in RB49 no T4 consensus promoter sequences of either type are found. However, upstream of many RB49 replication genes, there are motifs identical to the consensus sequence recognized by the E. coli σ70 (Table 3 and Fig. 2). Interestingly, we also found sequences identical to the T4 late promoter consensus located upstream of the RB49 replication genes 44 and regA (Table 3 and Fig. 2); promoter sequences that are absent from the corresponding region in T4. Many of the putative promoter sequences of RB49 have upstream AT-rich sequences, as do many strong T4 promoters.
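A search for exact consensus matches of this kind can be sketched with regular expressions. The patterns below are the textbook consensus sequences (σ70: TTGACA, a 16- to 18-bp spacer, TATAAT; T4 late: TATAAATA) and are an assumption here, since Table 3 is not reproduced in this section. The function name and the test sequences are hypothetical, and only the given strand is scanned.

```python
import re

# Lookahead patterns so that overlapping candidate sites are all reported.
SIGMA70 = re.compile(r"(?=(TTGACA[ACGT]{16,18}TATAAT))")
T4_LATE = re.compile(r"(?=(TATAAATA))")


def find_motifs(seq: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for each exact consensus match."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical test sequences, not RB49 DNA.
early_like = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "CC"
late_like = "CCTATAAATAGG"
print(find_motifs(early_like, SIGMA70))
print(find_motifs(late_like, T4_LATE))
```

Exact matching is deliberately strict; real promoters often deviate from the consensus at one or more positions, so a position weight matrix would be the more usual tool for a genome-wide search.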
σ70 and is necessary for a correct recognition of the middle promoters (17, 59). Again there was no obvious homologue of this gene in RB49.
Posttranscriptional control. In T4 infection, there exist posttranscriptional control mechanisms that fine-tune phage gene expression by means of mRNA processing or translational repression (53). The substantial differences that we found in the transcriptional regulation of RB49 motivated a comparison of mRNA processing in cells infected with RB49 and T4.
One of the more striking examples of mRNA processing influencing gene expression involves T4 gene 32. There are four different primary transcripts produced in T4-infected cells that contain the gene 32 mRNA sequence (5, 13). These transcripts can be cleaved by E. coli RNase E at several sites, including a prominent one located 71 nucleotides upstream of the gene 32 initiation codon (5, 27, 57). This cleavage stabilizes the 3' portion of the transcript containing the gene 32 sequence and initiates the degradation of the 5' portion of the polycistronic transcripts containing the upstream genes (57).
The presence of a functional early promoter in the RNase H gene and the absence of an obvious transcription terminator downstream of this sequence led us to believe that, in addition to the proximal early and late monocistronic transcripts, a polycistronic gene 32 species also exists (Fig. 2B). This would explain the series of minor 5' ends upstream of the gene 32 late transcript that were detected by primer extension. We have examined the possibility that RNase E processes either this large gene 32 transcript or the monocistronic ones as it does in T4 (5, 57). To do this we have compared the results of reverse transcription experiments using phage mRNA prepared from isogenic host strains that were either wild-type or mutant for RNase E. As shown in Fig. 6, the pattern of primer extension was essentially identical when both of these mRNA preparations were used as templates. This indicates that for RB49 infection, unlike that of T4, the host RNase E has no major role in processing the gene 32 transcription unit.
| DISCUSSION |
|---|
We have demonstrated that phages T4 and RB49 clearly derived from a common ancestor, but that their subsequent evolution has caused these genomes to diverge considerably. These two distantly related genomes share a common set of essential genes that are distributed throughout the chromosome. Nevertheless, some of the T4-type sequences, such as the nonstructural genes or the regulatory signals, have changed considerably more than others, and nearly 30% of the RB49 genes are without homologues in the database. Many of these ORFs of unknown function are in sites that are occupied by nonessential genes in the T4 genome. These novel ORFs may have been transferred horizontally into the RB49 genome.
Comparison of the sequences of the replication proteins that have substantially diverged between RB49 and T4 should help to define the sequence motifs that are critical to protein activity, especially when such an analysis can be combined with structural information. Previous attempts to do this for the sequences of T-even proteins were largely unsuccessful, because the replication proteins were too conserved within this subgroup of the T4 phage. Gene 32 illustrates this problem; although the sequence of gene 32 has been determined for a number of T-even phages (44, 50, 73), only in the pseudo-T-even phage RB49 gene is there any substantial variation from the T4 sequence. In spite of these divergences, the gp32 motifs involved in the DNA-binding activity of the protein are well conserved. Thus, it is reasonable to suppose that the T4 and the RB49 gp32 ensure essentially the same function.
Nevertheless, some of the sequence variations may be related to subtle differences in the way the T4 and RB49 proteins function. For example, the absence of base modifications in the RB49 DNA could necessitate changes in the sequence motifs involved in binding to single-stranded DNA. Moreover, the RB49 gp32 sequence may be adapted to specific interactions within the RB49 replication complex that differ from those of its T4 counterpart. This could explain why the segments of gp32 involved in protein-protein interactions with the other replication proteins are more plastic than the sequences involved in its DNA-binding activity. Similarly, the differences between T4 and RB49 gene 43 proteins are nonrandomly distributed in the sequence. Both the T4 and RB49 DNA polymerases contain numerous motifs that are widely conserved in B-family polymerases.
Nevertheless the two phages' enzymes can substitute only poorly for each other (J. Karam et al., personal communication). This may be a consequence of the considerable divergence of the various RB49 accessory proteins from their T4 homologues. It has been shown that the T4 and RB69 versions of proteins gp44 and gp62, which are far more homologous than the T4 and RB49 versions, cannot substitute for each other to form an active gp44/gp62 heteromer (81). Such results suggest that the various components of the replication machinery of RB49 and T4 may not be as interchangeable as one might have imagined based on the fact that they fulfill similar functions in the replication complex.
The most striking difference between phages RB49 and T4 is that the regulation of the transcription of RB49 relies on only two classes of promoters instead of three, as in T4 (56). In T4, the early promoter consensus sequence is apparently optimized to ensure its preferential recognition by the RNA polymerase compared to the endogenous E. coli promoters (78). However, the RB49 early promoter sequence is identical to the consensus of the E. coli σ70-dependent promoter. Furthermore, since the RB49 DNA is not modified, it is difficult to imagine how, early in infection, there could be efficient discrimination between the phage and host promoters. This may explain why the titers of RB49 stocks produced on E. coli are typically a factor of 10 lower than those produced by T4.
However, the use of an E. coli promoter consensus sequence for early phage transcripts may be advantageous in other hosts because this promoter sequence functions well in numerous bacterial species, including distant ones (4, 22, 28, 51, 60). Thus, RB49's utilization of this early promoter sequence may expand its host range compared to that of T4. Cryptic promoters are found in front of some T4 genes that appear to be recognized only when the phage DNA is not modified (13). This suggests that a mechanism of early transcription initiation similar to that in RB49 was also present in an ancestral version of T4. These two phages may have adopted very different survival strategies: T4 is very efficient but can produce numerous progeny in only a limited set of hosts, E. coli and its closest relatives, whereas RB49 might be able to infect a wider range of hosts but has a significantly lower progeny yield in E. coli.
The distribution of the promoters in RB49 is different from that in T4, probably in part to compensate for the lack of middle promoters. In the gene 43 region of the RB49 genome, a number of early promoters are arranged so as to ensure adequate synthesis of all the replication genes for the early phase of infection. The presence of two late promoters in the same region could significantly extend the period of the expression of these genes. Similarly, the RB49 gene 32 has proximal early and late promoter sequences that replace the intermediate and late promoters in the T4 gene. Since the early and middle periods of transcription in a T4 infection partially overlap, this switch in promoters may have only modest effects on the kinetics of gp32 synthesis, especially if, as in T4, the gene 32 mRNA is not rapidly degraded.
The observations reported here indicate that there is a high level of plasticity in phage regulatory sequences. This phenomenon is particularly well illustrated by the comparison of gene 32 transcription among the various T4-type phages. In the T-even subgroup, the regulatory region just upstream of gene 32 is dimorphic (47, 62). The T2 and T4 versions of this locus differ from each other by the presence in T4 of the nonessential gene 32.1 (47). In T2 the gene 59-gene 32 intergenic sequence contains an intercalated middle and late promoter that both transcribe monocistronic gene 32 mRNAs (47). The same classes of promoters transcribe gene 32 in T4, but in this case these promoters are embedded within the inserted gene 32.1 sequence (5, 47, 49, 74). Since the same classes of promoters transcribe gene 32 in both of the T-even versions of this locus, we had assumed that some strong regulatory constraint imposed this conservation. The novel organization of gene 32 transcription in RB49 shows that this is not the case.
Our results also demonstrate that RB49 has, in some cases, evolved posttranscriptional regulatory strategies different from those of T4 (e.g., RNase E processing of the 5' leader of gene 32 mRNA). A detailed analysis of the self-regulation of gene 43 in RB49 will be reported in a separate publication (Petrov et al., unpublished data), and similar studies are now being performed on the posttranscriptional regulation of gp32 synthesis (Desplats et al., unpublished data). Some variations in regulatory strategies have been previously observed (47, 81). For example, T4 and RB69 control the synthesis of the polymerase accessory proteins gp45, gp44, and gp62 differently (81). The apparent absence of intermediate promoter sequences in this part of the RB69 replication region has been noted (81). A simple explanation for this could be that RB69 has swapped this segment of its genome for an analogous segment derived from a pseudo-T-even phage related to RB49. The results reported here suggest that the regulation of homologous genes in the T4-type phages may vary much more than would have been imagined previously. The evolutionary significance of such variation needs to be further investigated.
Swapping of analogous modular sequences by homologous recombination may occur frequently among the T4-type phages (62). Regions of sequence homology between these phages, either in regulatory or in coding sequences, could occasionally mediate these genetic exchanges. For example, alignments of the RB49 and T4 nucleotide gene sequences have revealed small "islands" (approximately 20 bp) of strong conservation (up to 80% of nucleotide identity) in spite of the high global level of nucleotide sequence divergence. Such motifs might act as preferred sites of recombination as do the His-box or the glycine island motifs previously characterized in the long tail fiber locus of these phages (70, 71). Such conserved sequences might delimit protein domains that could be exchanged to generate functional chimeric proteins. In such a scenario, even the distant pseudo-T-even phages could provide the T-even phages with a highly diverse but nevertheless generally compatible source of modules. Modular exchange of segments of the viral genome could generate enormous diversity within the context of a fairly standard T4-type genome. In addition, since the sequences involved in the regulation of the phage expression are not as rigidly constrained as had been previously imagined, the same module in different phage genomes may have different patterns of expression. Such context-dependent modulation of gene expression could be yet another important mechanism by which modularity generates phage diversity.
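Short conserved islands of this kind can be located by sliding a window along a pairwise nucleotide alignment. The sketch below uses the 20-bp window and 80% identity figures mentioned above as default thresholds; it is an illustration under those assumptions rather than the procedure actually used, and the function name and toy alignment are hypothetical.

```python
def conserved_islands(aligned_a: str, aligned_b: str,
                      window: int = 20, min_identity: float = 0.8) -> list[int]:
    """Return 0-based start columns of windows with identity >= min_identity."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(aligned_a) - window + 1):
        pairs = zip(aligned_a[start:start + window], aligned_b[start:start + window])
        # Gap columns never count as matches.
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        if matches / window >= min_identity:
            hits.append(start)
    return hits


# Toy alignment (hypothetical): 20 identical columns followed by 20 mismatches.
seq_a = "ACGT" * 5 + "A" * 20
seq_b = "ACGT" * 5 + "C" * 20
print(conserved_islands(seq_a, seq_b))
```

Overlapping windows that pass the threshold would normally be merged into a single island before any interpretation.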
Recent comparative studies indicate that different phage families can have major differences in the characteristics of the genomic plasticity that they manifest (8, 33, 38). For example, the existence of a common set of essential genes distinguishes the T4-type genomes from those of the well-known lambdoid phages. The lambdoid genome is an ordered array of modular elements that are essentially independent of each other (11). The genes encoded by functionally equivalent lambdoid modules need not share a common phylogenetic origin and have evolved independently from each other. Thus, in the lambdoid genome each of the various functional elements has its own evolutionary history, whereas in the T4-type genomes an ensemble of essential genes appears to have coevolved. This difference in evolutionary strategy is probably necessitated by the greater size of the T4-type genome and the more intricate pattern of interactions between its diverse products.
Recently, two new subgroups of T4-type phage have been identified that are more distant from T4 than the pseudo-T-even phages. These subgroups are called the schizo-T-even phages (phages with a generally T4-like morphology but having elongated heads) and the exo-T-even phages (phages with only a vaguely T4-like morphology, having isometric heads and longer contractile tails) (32, 72). Since these phages infect bacterial species that are very distant from enterobacteria (vibrios, aeromonads, and cyanobacteria), this difference in the ecological niche they occupy could partly explain their massive divergence from T4 (1, 2, 32). At this time, very little else is known about these most distant members of the T4 phage family. Without doubt the lessons learned from this study of the RB49 genome will facilitate the analysis of the genomes of these new subgroups of T4-type phages. Such studies should help us to understand the fundamental evolutionary processes that created all of the diverse T4-type phages.
| ACKNOWLEDGMENTS |
|---|
This research was supported by the CNRS and by grants from the Ministère de la Recherche (PRFMMIP) and the GIP HMR and the Toulouse Genopole for DNA sequencing facilities.
| REFERENCES |
|---|