Journal of Bacteriology, November 2004, p. 7032-7068, Vol. 186, No. 21
0021-9193/04/$08.00+0     DOI: 10.1128/JB.186.21.7032-7068.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.

Genome of Bacteriophage P1†

Malgorzata B. Lobocka,1* Debra J. Rose,2 Guy Plunkett III,2 Marek Rusin,3 Arkadiusz Samojedny,3 Hansjörg Lehnherr,4 Michael B. Yarmolinsky,5 and Frederick R. Blattner2

Department of Microbial Biochemistry, Institute of Biochemistry and Biophysics of the Polish Academy of Sciences, Warsaw,1 Department of Tumor Biology, Centre of Oncology, M. Sklodowska-Curie Memorial Institute, Gliwice, Poland,3 Laboratory of Genetics, University of Wisconsin, Madison, Wisconsin,2 Department of Genetics and Biochemistry, Institute of Microbiology, Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany,4 Laboratory of Biochemistry, National Cancer Institute, National Institutes of Health, Bethesda, Maryland5

Received 11 March 2004/ Accepted 9 July 2004


    ABSTRACT
 
P1 is a bacteriophage of Escherichia coli and other enteric bacteria. It lysogenizes its hosts as a circular, low-copy-number plasmid. We have determined the complete nucleotide sequences of two strains of a P1 thermoinducible mutant, P1 c1-100. The P1 genome (93,601 bp) contains at least 117 genes, of which almost two-thirds had not been sequenced previously and 49 have no homologs in other organisms. Protein-coding genes occupy 92% of the genome and are organized in 45 operons, of which four are decisive for the choice between lysis and lysogeny. Four others ensure plasmid maintenance. The majority of the remaining 37 operons are involved in lytic development. Seventeen operons are transcribed from σ70 promoters directly controlled by the master phage repressor C1. Late operons are transcribed from promoters recognized by the E. coli RNA polymerase holoenzyme in the presence of the Lpa protein, the product of a C1-controlled P1 gene. Three species of P1-encoded tRNAs provide differential controls of translation, and a P1-encoded DNA methyltransferase with putative bifunctionality influences transcription, replication, and DNA packaging. The genome is particularly rich in Chi recombinogenic sites. The base content and distribution in P1 DNA indicate that replication of P1 from its plasmid origin had more impact on the base compositional asymmetries of the P1 genome than replication from the lytic origin of replication.


    INTRODUCTION
 
Three bacteriophages of Escherichia coli with different lifestyles have been particularly instrumental in the development of the concepts and tools of molecular biology. They are the virulent phage T4 and the two temperate phages, λ and P1. The two temperate phages were first described in close succession in the middle of the last century, namely in 1951 (phage P1 [31]) and 1953 (λ [185] and informally even earlier, in the first issue of Evelyn Witkin's Microbial Genetics Bulletin, 1950 [184a]). Whereas λ prophage leads an essentially passive existence within the chromosome of its host, P1 prophage exists as an autonomous plasmid that is maintained at low copy number. The complete genome sequences of λ, T4, and Escherichia coli have been reported. We present here the complete nucleotide sequence of P1 and summarize its salient features.

P1 infects and lysogenizes E. coli and several other enteric bacteria. Its virion consists of an icosahedral head attached at one vertex to a tail that bears six kinked tail fibers (see cover figure). As in T4, the tail consists of a tail tube and a contractile sheath. A variable part of the tail fibers determines the specificity of P1 adsorption on different hosts. The variable part is encoded by an invertible segment of P1 DNA, similar to that of phage Mu (reviewed in reference 268). Infective particles of P1 contain cyclically permuted, linear, double-stranded molecules with a terminal redundancy of 10 to 15 kb of DNA (293). Prior to this work, about 70 genes had been identified in P1 by genetic and physiological studies (346).

After injection into a host cell, viral DNA circularizes by recombination between redundant sequences to enter a lytic or lysogenic path. The choice between the two paths is dictated by the interplay of a number of environmental factors with the complex immunity circuitry that controls synthesis and activity of a master repressor, C1 (reviewed in reference 126). As a prophage, P1 is a stable plasmid maintained at about one copy per bacterial chromosome. Several P1 genes scattered throughout the genome are expressed in the lysogenic state. Those of primary importance are involved in plasmid maintenance functions and in inhibition of lytic development. P1 prophage replicates as a circular plasmid from an origin (oriR) resembling that of several other plasmids and, to a lesser extent, oriC of E. coli.

P1 genes expressed in the lytic pathway are those involved in the timing of phage development, in replication (from a "lytic" origin different from oriR), in the formation of phage particles, including headful packaging, and finally in cell lysis to release the phage progeny. Genes of the lytic pathway have been divided into early and late. Transcription of the late genes requires, in addition to bacterial RNA polymerase, a P1-encoded activator protein, Lpa (188), and an E. coli RNA polymerase-associated stringent starvation protein, SspA (111).

P1, like λ, made its mark early in molecular biology. The significant capacity of P1 for mediating generalized transduction (196) led promptly to P1 becoming a workhorse of genetic exchange among strains of E. coli, a role it is still playing today. Moreover, because P1 can package slightly more than twice as much DNA as can λ, and packaging can be efficiently carried out in vitro, P1-based vectors are now in common use for cloning and in vitro packaging of eukaryotic DNA (246, 293). P1 also made its mark early because of the restriction-modification genes that it carries, the analysis of which in the 1960s heralded the age of genetic engineering (16). The recognition that P1 is maintained as a plasmid prophage led to the identification of its plasmid maintenance functions. A site-specific recombinase, Cre, which appears promptly after infection, is involved in phage DNA cyclization (137), assists plasmid maintenance by resolving plasmid multimers (21), and perhaps, by a separate mechanism, may stabilize plasmid copy number (10). Ever since the introduction of the enzyme and its recombination site, lox, into Saccharomyces cerevisiae (271), the Cre-lox system has been a major tool of genetic engineering in eukaryotic cells.

The active partitioning of P1 plasmids to daughter cells requires two P1-encoded proteins and a cis-acting site (the centromere) (20). A surprising homology between the structural genes of plasmid partitioning and the soj and spo0J sporulation genes of Bacillus subtilis provided the motivation for examining the effect of soj and spo0J mutations on the production of anucleate progeny. Consequently, a connection was established between plasmid and nucleoid partitioning (157, 284). The finding that a gene of P1 encodes a toxin that is activated on P1 loss (192), among other findings of such "plasmid addiction" genes, helped to establish the concept that programmed cell death has its place in the life of bacterial populations, as in metazoan development (348). Finally, the analysis of P1 immunity maintenance and establishment (reviewed in reference 126) enlarges our view of controls on phage development. The master repressor, C1, unlike that of lambda, is monomeric, and the operators to which it binds are many and dispersed. It is modulated by at least two antirepressors (Ant1/2 and Coi) and a corepressor protein (Lxc), the latter capable of inducing certain C1-bound operators to loop. A trans-acting RNA (the c4 product) regulates Ant1/2 synthesis in a novel way, being excised from a transcript on which lies its site of action. Regrettably, the study of P1 immunity halted prematurely with the death of its prime mover, Heinz Schuster.

The most recent effort to compile a P1 map (made when no more than a total of 40% of the genome had been sequenced) was reported solely as a computer file (346). A recent short review of P1 biology has been prepared (186). Reviews of the older P1 literature may be found in references 29, 301, 345, and 347 and successive editions of Genetic Maps, one of which (344) was reprinted in reference 225. A thoughtful perspective on early studies of lysogeny by one of the pioneers in this field and the discoverer of P1 has recently appeared (30).

The overview of P1 genome organization and developmental regulation that we present here emerged from results of previous studies combined with the analysis of the entire genomic sequences of two strains of P1 described below.


    MATERIALS AND METHODS
 
Sources of DNA and sequencing strategy. DNA sequences described here were derived from two sources. One was DNA of P1 phage induced from N1706, an E. coli K-12 P1 c1-100 Tn9 lysogen preserved frozen since the early 1970s (kindly provided by J. L. Rosner). The other was DNA of P1 prophage carried by an MG1655-derived strain that was used in the E. coli genome sequencing project (35). The prophage had been acquired in the course of a P1-mediated transduction with the aid of P1 c1-100 Tn9 r m dam (now dmt) ΔMB rev-6, a high-frequency transducing derivative of the P1 c1-100 Tn9 lysogen (303).

Prophage induction in N1706 and isolation of phage particles were performed as described previously (112). DNA was isolated from phage particles by using the QIAGEN Lambda Midi kit (QIAGEN, Inc.) according to the supplier's protocol. DNA of the prophage was isolated from a gel as described in reference 35.

The shotgun sequencing strategy was used for acquiring most of the sequencing data. Fragments of P1 phage or prophage DNA obtained by sonication were cloned in the pUC18 vector (266) or in the M13 Janus vector (45) and served as templates for sequencing. The ABI PRISM Dye Terminator Cycle Sequencing Ready Reaction kit with AmpliTaq DNA polymerase FS (Perkin-Elmer) was used to perform sequencing reactions. Separation of products of sequencing reactions and reading of results were done with ABI 377 automated sequencers (Perkin-Elmer). After collection of initial sets of sequencing data at a four- to fivefold redundancy from readings in one direction, selected templates were resequenced in the opposite direction. The remaining gaps in the sequences were filled in by primer walking using oligonucleotides complementary to the ends of the sequenced regions and cloned P1 fragments or the entire DNA of the respective P1 strain as a template. Sequences were assembled using the program Seqman II (DNAStar Inc.).

Sequence analysis and annotation. The majority of analyses were carried out using programs from either the GCG package version 9.0 (Genetics Computer Group, 1996) or the DNAStar package (DNAStar Inc.).

Open reading frame (ORF) identification was performed using programs based on a Markov model: the Internet version of GeneMark (http://genemark.biology.gatech.edu/GeneMark) (36, 208) and TIGR Glimmer 2.1 (75, 264). GeneMark was trained on templates for genes of E. coli, and its temperate bacteriophages and transposons, and bacteriophage T4. The model for Glimmer 2.1 was prepared with the use of Build-icm trained on P1 ORFs longer than 600 nucleotides. Additionally, the entire sequence was divided into 400-bp overlapping fragments and the predicted products of their translation were searched against the database for possible similarities with known protein sequences. Identification of a few genes, whose predicted codon usage pattern did not match that of any model organism, was based on homologies of their putative products to known proteins and on the presence in their upstream regions of sequences resembling promoters or likely to encode ribosome binding sites.
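The fragment-based similarity search described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the text gives only the 400-bp fragment size, so the 200-bp step (and hence the overlap), the function names, and the use of the standard genetic code with six-frame translation are all assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def windows(seq, size=400, step=200):
    """Yield (start, fragment) for overlapping fragments of a linear sequence.

    The 200-bp step is an assumed value; a final window is added so that
    the end of the sequence is always covered.
    """
    last = max(len(seq) - size, 0)
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)
    for start in starts:
        yield start, seq[start:start + size]


def six_frame(fragment):
    """Translate a fragment in all six reading frames ('X' marks unknown codons)."""
    frames = {}
    for strand, s in (("+", fragment), ("-", revcomp(fragment))):
        for offset in range(3):
            codons = (s[i:i + 3] for i in range(offset, len(s) - 2, 3))
            frames[f"{strand}{offset + 1}"] = "".join(
                CODON_TABLE.get(c, "X") for c in codons
            )
    return frames
```

Each translated frame would then be submitted to a protein similarity search; that step depends on the database and search program and is not shown.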

The assignment of previously identified genetic loci to newly sequenced ORFs was assisted by alignments to existing physical and genetic maps (230, 252, 280, 281, 294, 323, 346). Predicted sizes of putative P1 structural proteins were compared with sizes of head and tail components of P1 determined by polyacrylamide gel electrophoresis (PAGE) (325) to verify the identity of certain structural genes.

Searching for similarities between P1 proteins and known proteins in databases was performed using the Internet versions of programs PSI-BLAST and PHI-BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) (12, 354). Macaw (278) served to create multiple protein alignments. The putative helix-turn-helix motifs in protein sequences were identified by the HTH program at its website (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_server.html) (78). Putative signal peptides in proteins were predicted by using the SignalP program (235) at its website (http://www.cbs.dtu.dk/services/SignalP/#submission). Putative transmembrane helices in proteins were predicted by using the program TMHMM (175) at its website (http://www.cbs.dtu.dk/services/TMHMM).

Identification of putative σ70 promoters was carried out using Targsearch (118). Putative Rho-independent transcriptional terminators were identified with the GCG program Terminator (40) and TIGR program TransTerm (85). To avoid false positives, only terminators that met the following additional criteria (69, 198) were taken into consideration: a 4- to 18-bp stem and a 3- to 10-nucleotide (nt) loop of the terminator hairpin; a thymidine-rich region downstream of the terminator hairpin and separated by less than 3 nt; more than three GC/CG or GT/TG bp in the hairpin stem; at least three T residues, no more than one G, and no 5'-TVVTT stretches (V is A, C, or G) in the 5-nt-long proximal part of the thymidine-rich region; absence of four purine or cytosine residues in the 4-nt-long distal part of the thymidine-rich region and at least four T residues together in the proximal and distal parts of the thymidine-rich region. Putative integration host factor (IHF)-binding sites were identified with MacTargsearch at SEQSCAN (http://www.bmb.psu.edu/seqscan/seqform1.htm).
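The additional terminator criteria listed above can be expressed as a simple filter. The sketch below is one possible reading of those criteria, not the procedure actually used: the `Candidate` record, its field names, and the representation of the stem as a list of base pairs are hypothetical, and the distal-part rule is interpreted as "the four distal residues are not all purines or cytosines", that is, at least one of them is a T.

```python
import re
from dataclasses import dataclass

# Base pairs counted toward the "more than three GC/CG or GT/TG bp" rule.
COUNTED_PAIRS = {"GC", "CG", "GT", "TG"}


@dataclass
class Candidate:
    """Hypothetical description of a predicted terminator (DNA alphabet)."""
    stem_pairs: list   # one two-letter string per stem base pair, e.g. "GC"
    loop: str          # loop nucleotides
    spacer: int        # nt between the hairpin and the T-rich region
    proximal: str      # first 5 nt of the T-rich region
    distal: str        # following 4 nt of the T-rich region


def passes_filters(c):
    """Return True if the candidate meets every listed criterion."""
    checks = [
        4 <= len(c.stem_pairs) <= 18,                      # stem length
        3 <= len(c.loop) <= 10,                            # loop length
        c.spacer < 3,                                      # hairpin to T-rich gap
        sum(p in COUNTED_PAIRS for p in c.stem_pairs) > 3, # strong/wobble pairs
        len(c.proximal) == 5 and len(c.distal) == 4,
        c.proximal.count("T") >= 3,
        c.proximal.count("G") <= 1,
        re.fullmatch(r"T[ACG]{2}TT", c.proximal) is None,  # no 5'-TVVTT
        "T" in c.distal,                                   # not four R/C residues
        c.proximal.count("T") + c.distal.count("T") >= 4,
    ]
    return all(checks)
```

Candidates reported by a terminator-prediction program would first have to be parsed into such records; that parsing depends on the program's output format and is not attempted here.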

Oligonucleotide frequencies were determined with the program OCTAMER (L. Lobocka, unpublished). The DNA was scanned for putative tRNA genes by DNA homology searches and by using tRNAscan-SE (81, 204) at its website (http://www.genetics.wustl.edu/eddy/tRNAscan-SE/).

Nucleotide sequence accession numbers and strain availability. The nucleotide sequences described here have been deposited in GenBank under accession numbers AF234172 (phage P1 mod749::IS5 c1-100) and AF234173 (prophage P1 mod1902::IS5 c1-100 rev-6 dmtΔMB). A lysogen (N1706) from which the first of these phages was induced has been deposited with the American Type Culture Collection (Manassas, Va.) as ATCC BAA-1001.


    RESULTS AND DISCUSSION
 
Comparison of the sequenced P1 isolates. Current P1 phages in circulation are derived from P1kc, a mutant selected in two steps by Lennox (196) as able to plate on E. coli K-12 with plaques of increased clarity. The induction of lysogens was facilitated by the introduction of mutations that render P1 thermoinducible (279), the most widely used being the c1-100 mutation, originally designated clr100 (261). It is present in both of the P1 isolates that we used for sequencing. For simplicity in selecting P1 lysogens as drug-resistant bacteria, Rosner (261) made use of P1 CM (171), a strain that harbors the unstable Tn9 transposon (encoding chloramphenicol acetyltransferase) acquired from an R factor. However, P1 c1-100 Tn9 has been widely used and hence was chosen for sequencing. The phage induced from the supposed P1 c1-100 Tn9 lysogen was found to have lost Tn9 and to have acquired IS5 inserted in the mod gene, which renders it r m. The sequence of this strain, P1 c1-100 mod749::IS5 (94,800 bp), was determined with an average 7.2-fold redundancy. The assembly was verified by comparison of the restriction map obtained by digestion with EcoRI, HindIII, HincII, BamHI, SacI, and PstI with that deduced from the entire P1 sequence (without the IS5) (346).

The other strain of P1 used here, P1 c1-100 mod1902::IS5 dmtΔMB rev-6 (94,481 bp), is a prophage carried by an MG1655-derived strain of E. coli K-12 that served in sequencing the E. coli genome (35). Its nucleotide sequence was determined with an average 7.8-fold redundancy. The prophage appears to have been acquired in the course of a P1-mediated transduction. In addition to a 319-bp deletion in dmt that impairs the function of the viral DNA methyltransferase gene, it carries a mutation, rev-6, that improves growth of the mutant in a dam mutant host and augments transduction (303). In the course of its history, the phage lost Tn9, acquired IS5, and accumulated other mutations such that, relative to the first strain, there are 13 additional mutations (four of them silent and one in a noncoding region). Five of those mutations are scattered in the variable part of the S gene (Sv), which is essential for P1 adsorption to E. coli. Which alleles are parental remains uncertain. The rev-6 mutation has not been correlated with any of the sequence differences between the two strains.

In the two sequenced strains, the IS5 insertions occurred at different TTAG sequences, known to belong to the most commonly used targets of IS5 (C/TTAA/G; reviewed in reference 212). In each case the IS5 insert, accompanied by a 4-bp duplication of its target, appears responsible for the r– m– phenotype since, in other respects, the res mod sequences are identical to those of r+ m+ P1 (145).

Genomewide features. The genome of P1, without IS5 and its duplicated target, consists of 93,601 bp of double-stranded DNA, which in the prophage can be represented as a circle or line with the center of the site-specific recombination site lox assigned as the zero point such that the first nucleotide to its right on the strand written 5' to 3' in the direction of cre is assigned position 1 and numbering proceeds rightwards (346) (Fig. 1). The genome contains one insertion sequence, IS1, as an integral part of P1 (151). In both sequenced strains, IS1 has a base substitution mutation (IS1 G757T) in its right inverted repeat (IRR).
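The sizes quoted for the two sequenced strains and for the genome proper can be reconciled with a few lines of arithmetic, and the coordinate convention (position 1 immediately to the right of the lox center, numbering rightwards on a circle) can be made explicit. The sketch below is illustrative only: the IS5 length is inferred from the reported totals rather than stated in the text, and the `wrap` helper is a hypothetical name.

```python
GENOME_BP = 93_601        # P1 genome without IS5 and its duplicated target
TARGET_DUP_BP = 4         # target duplication accompanying each IS5 insert
STRAIN1_BP = 94_800       # P1 c1-100 mod749::IS5
STRAIN2_BP = 94_481       # P1 c1-100 mod1902::IS5 dmt deletion rev-6
DMT_DELETION_BP = 319     # deletion in dmt carried by the second strain

# Inferred length of the IS5 insert (not given explicitly in the text).
is5_bp = STRAIN1_BP - GENOME_BP - TARGET_DUP_BP

# The second strain is shorter than the first by exactly the dmt deletion.
assert STRAIN1_BP - DMT_DELETION_BP == STRAIN2_BP


def wrap(position, length=GENOME_BP):
    """Map any integer onto the 1-based circular coordinates 1..length."""
    return (position - 1) % length + 1
```

With this convention, `wrap(GENOME_BP + 1)` returns 1 and `wrap(0)` returns the last position of the circle, which is convenient when a feature spans the lox zero point.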



 
FIG. 1. Genetic and physical organization of the P1 genome. Boxes with internal triangles show positions and orientations of genes, color-coded by function: yellow, plasmid maintenance; red, repression of early functions; pink, immunity control, not c1 itself; magenta, source of tRNAs; brown, DNA methylation; deep blue, transcriptional activation of late genes; grey, defective IS1; green, all other. Black boxes are intergenic regions of defined function: recombination sites, iterons to which RepA binds, plasmid centromere, and origin of DNA packaging, the direction of packaging being indicated by an arrowhead at the pac site. Bidirectional replication determined at the phage (lytic) origin, oriL, and at the plasmid origin, oriR, are indicated by black arrowheads above the genome map. C1 operator sites are marked with red flags pointing to the left or right (see also Table 6). Thin lines with terminal deep blue half arrowheads indicate the start sites and directions of the transcripts from particular late promoters. GATC sequences that overlap transcriptional promoters and clustered 5'-GATC sequences (two or more sites with pairwise separation of not more than 50 bp), substrates for Dmt or Dam methylation, are marked above the gene map by brown lollipops that are filled in the case of sites shown to alter function upon methylation. Hooks indicate Rho-independent transcriptional terminators. They face the starts of transcripts that they terminate. The map refers to the genome of P1 c1-100 mod749::IS5 without its nonintegral part, IS5.

 
(i) Base content and distribution. The GC content of P1 DNA is 47.3%, slightly less than that of its E. coli host (50.8%) (35). The distribution of purines and pyrimidines between the two complementary strands is similar, with the upper strand (+; coding strand of genes transcribed clockwise) containing 49.5% (26.0% adenosine and 23.5% guanosine), and the lower strand (–) containing 50.5% (26.7% adenosine and 23.8% guanosine) purine nucleotides.
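The strand composition figures above follow directly from single-strand base counts, because the adenosine and guanosine content of the lower strand equals the thymidine and cytidine content of the upper strand. A minimal sketch, with a hypothetical function name, is given below together with a consistency check on the reported percentages.

```python
from collections import Counter


def composition(seq):
    """Return GC content and purine fractions of both strands.

    `seq` is the upper (+) strand; the lower-strand purine fraction is
    obtained from the complementary bases T and C of the upper strand.
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    frac = {b: counts[b] / total for b in "ACGT"}
    return {
        "gc": frac["G"] + frac["C"],
        "purine_upper": frac["A"] + frac["G"],
        "purine_lower": frac["T"] + frac["C"],
    }


# Reported upper-strand percentages; T and C are taken from the
# lower-strand adenosine and guanosine values, respectively.
reported = {"A": 26.0, "G": 23.5, "T": 26.7, "C": 23.8}
gc_percent = round(reported["G"] + reported["C"], 1)   # matches the stated 47.3%
total_percent = round(sum(reported.values()), 1)
```

The reported values are internally consistent: the four percentages sum to 100, and G plus C reproduces the stated GC content.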

The location of extensive AT-rich regions along the genome map confirms the denaturation mapping evidence (222, 347) for the relatively recent acquisition by P1 of its restriction-modification genes (res, mod) and suggests that genes of the sim and rlf operons, as well as hot and isaB, were also incorporated in the genome of P1 late in evolution. Sharp borders of long AT-rich regions within certain genes (e.g., the 3'-moiety of parB) may be indicative of mosaic structure.

Unique restriction sites in P1 DNA are located preferentially in AT-rich regions or within the IS1 sequence (Fig. 2), suggesting their recent acquisition by P1 and selection against these sites in a P1 ancestor. A few 6-bp palindromes, the targets of known restriction enzymes, ApaI (GGGCCC), NarI (GGCGCC), NaeI (GCCGGC), SacI (GAGCTC), SalI (GTCGAC), and AvrII (CCTAGG), are absent from the P1 genome. Short DNA palindromes rare in or absent from P1 presumably identify sequences that could confer vulnerability in hosts that P1 otherwise finds particularly congenial.
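A search for absent restriction sites of the kind reported above is straightforward to reproduce on any sequence. The sketch below is an illustration, not the procedure used for the analysis: the function names are hypothetical, and the sequence is treated as circular by default, which is an assumption appropriate to the prophage form. Because each listed site is its own reverse complement, scanning one strand is sufficient.

```python
SITES = {
    "ApaI": "GGGCCC",
    "NarI": "GGCGCC",
    "NaeI": "GCCGGC",
    "SacI": "GAGCTC",
    "SalI": "GTCGAC",
    "AvrII": "CCTAGG",
}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_palindrome(site):
    """True if the site equals its own reverse complement."""
    return site == revcomp(site)


def count_sites(seq, site, circular=True):
    """Count (possibly overlapping) occurrences of a site on one strand."""
    seq = seq.upper()
    if circular:
        seq += seq[:len(site) - 1]   # let matches span the junction
    count, pos = 0, seq.find(site)
    while pos != -1:
        count += 1
        pos = seq.find(site, pos + 1)
    return count


def absent_sites(seq, sites=SITES):
    """Names of enzymes whose recognition site does not occur in seq."""
    return sorted(n for n, s in sites.items() if count_sites(seq, s) == 0)
```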



 
FIG. 2. Compositional organization of the P1 genome. The G+C content and (C–G)/(C+G) ratio (C-G skew) for one strand of P1 DNA are plotted as deviations from the mean. The plots were made with the GeneQuest program of DNAStar with a window size of 1,000 bp and a shift increment of 1 bp. For orientation, a simplified genome map is shown centrally. Upper-row genes are transcribed clockwise (left to right), and lower-row genes are transcribed counterclockwise (right to left). Two vertical black lines through the plots show the locations of the lytic (oriL) and plasmid (oriR) origins of replication. Two vertical grey bars show regions of polarity switch in the plot of the C-G skew. Unique recognition sites for restriction enzymes are indicated directly above the plot of G+C content; those below the line are for 8-bp cutters. See the text for absent 6-bp restriction sites. Chi sites and RAG sites (64) in the upper and lower strands are indicated by hash marks above and below the horizontal lines at the top of the figure. Where two hash marks are so close as to appear superimposed, a line at an angle to the vertical is added. The map refers to the genome of P1 c1-100 mod749::IS5 without its nonintegral parts, IS5, and the associated 4-bp duplication.

 
According to the pattern of C-G skew (the relative excess of cytosines versus guanosines in the top strand), the genome of P1 can be divided into two polarized arms with the average C-G skew below the mean predominating in one part and above the mean in the other part (Fig. 2). The region where the average sign of C-G skew below the mean predominates coincides with the region where transcription in the clockwise direction predominates, and vice versa. The C-G skew switches polarity in two regions: around kilobase 64 of the genome, close to the origin of plasmid replication (oriR), and around kilobase 3 of the genome. This pattern is consistent with the skew observed for bacterial and double-stranded DNA viral genomes whose replication proceeds bidirectionally from an origin (reviewed in references 101 and 216). Typically, the regions of polarity switch in the C-G skew are close to an origin and to the terminus of replication. P1 possesses two separate origins, oriR, and the origin of lytic replication, oriL, which are about 9 kb apart. Replication from both origins can be bidirectional (61, 241). As seen in Fig. 2, P1 prophage replication appears to have had more impact on the base compositional asymmetries and transcriptional organization of the genome than the theta (θ) or rolling-circle (σ) replication that occurs during lytic development. Although oriL is further away than oriR from the region of the C-G skew polarity switch at 64 kb, it is located diametrically opposite to the region of the second switch in the C-G skew polarity, which may be indicative of the terminus of P1 replication. We did not find in this region (or anywhere else in the P1 DNA) the TGTTGTAACTA sequence that in E. coli, Salmonella enterica serovar Typhimurium, and several plasmids constitutes a conserved core of Ter sites (136) at which replication terminates. However, this region is within 3 kb of the P1 site-specific recombination site lox, which is involved in resolution of plasmid multimers into monomers, playing a role analogous to that of the dif site in the E. coli chromosome. The lox site of P1 was shown to suppress the phenotype of an E. coli dif deletion strain in the presence of the lox-specific recombinase, Cre, when lox was inserted in the region of the E. coli terminus of replication (197). This suppression requires the DNA translocase activity of the essential host protein FtsK (47), as does dimer resolution at dif (292).
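The C-G skew profile discussed above, (C–G)/(C+G) in a 1,000-bp window moved in 1-bp increments and plotted as deviations from the mean, can be computed as sketched below. The published plots were made with GeneQuest; this is an independent illustration with hypothetical function names, and treating the sequence as circular is an assumption suited to the plasmid form.

```python
def cg_skew(seq, window=1000, step=1, circular=True):
    """Sliding-window (C - G) / (C + G) for one strand of a sequence."""
    seq = seq.upper()
    n = len(seq)
    if window > n:
        raise ValueError("window must not exceed the sequence length")
    ext = seq + seq[:window - 1] if circular else seq

    # Prefix sums give each window's C and G counts in constant time.
    c_pre, g_pre = [0], [0]
    for base in ext:
        c_pre.append(c_pre[-1] + (base == "C"))
        g_pre.append(g_pre[-1] + (base == "G"))

    n_starts = n if circular else n - window + 1
    values = []
    for start in range(0, n_starts, step):
        c = c_pre[start + window] - c_pre[start]
        g = g_pre[start + window] - g_pre[start]
        values.append((c - g) / (c + g) if c + g else 0.0)
    return values


def deviations_from_mean(values):
    """Center a profile on its mean, as in the plotted skew."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]
```

Regions where the centered profile changes sign correspond to the polarity switches described in the text; locating them reliably on real data usually requires smoothing or a cumulative skew, which is not shown.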

The major family of strongly skewed sequences in E. coli with the motif 5'-RGNAGGGS (R = purine, N = any base, S = G or C) (265) represents putative sites of binding to FtsK, a protein involved in positioning of bacterial DNA within the cell (64). The strongly biased distribution of these so-called RAG sites in P1 DNA (Fig. 2) possibly argues for a comparable role for FtsK in the life of P1 prophage, although a connection between RAG sites and FtsK is still uncertain.

The most abundant octamer in the genome of P1 corresponds to the recombinational hot spot Chi (5'-GCTGGTGG; 31 and 19 sites in the upper and lower strands [31/19]). Three other octamers in descending order of abundance (5'-TGCTGGTG, 18/14; 5'-CTGGTGGA, 14/8; and 5'-CTGGTGGC, 14/6) contain heptamers (boldface) identical to a Chi heptamer. The canonical Chi sites are two and one-half times more frequent in the P1 genome (one site per 1,872 bp) than in the genome of its E. coli host (one site per 4,598 bp) (35, 291), where they are the third most abundant octamers and sevenfold more abundant than a random octamer would be. P1 could benefit in several ways from the accumulation of Chi sites in its genome. P1 DNA encapsidated in a virion is a terminally redundant linear molecule, which upon entering cells at infection, can serve as a substrate for loading of RecBCD nuclease at its ends and for RecBCD-mediated degradation. Each encounter of RecBCD with Chi modifies the enzyme from its destructive 3'-to-5' nucleolytic form to its recombination-promoting form (reviewed in reference 173) and thus should both protect incoming P1 DNA from further degradation and facilitate, by homologous recombination, its cyclization. The involvement of RecBCD in P1 cyclization could explain the 10-fold-reduced frequency of lysogenization by P1 of E. coli recB mutant cells observed by Rosner (261). Presumably, as in E. coli (181), Chi sites in P1, by their interaction with the RecBCD enzyme, could also facilitate the reassembly of collapsed replication forks. The burst size of P1 is reduced 20-fold in a recBC mutant of E. coli as compared to that in the recB+C+ strain (352).
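The Chi site density quoted above can be checked from the reported counts, and the same counting can be applied to any sequence. In the sketch below the function names are hypothetical, and counting on a circular molecule is an assumption; the numerical check uses only figures given in the text.

```python
CHI = "GCTGGTGG"


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_overlapping(seq, motif, circular=True):
    """Count occurrences of motif on the given strand, overlaps included."""
    seq = seq.upper()
    if circular:
        seq += seq[:len(motif) - 1]
    count, pos = 0, seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


def chi_counts(seq, circular=True):
    """Return (upper, lower) strand Chi counts for an upper-strand sequence."""
    upper = count_overlapping(seq, CHI, circular)
    lower = count_overlapping(seq, revcomp(CHI), circular)
    return upper, lower


# Check of the reported density: 31 + 19 sites in 93,601 bp.
p1_bp_per_site = 93_601 / (31 + 19)       # about 1,872 bp per site
fold_vs_ecoli = 4_598 / p1_bp_per_site    # about 2.5-fold more frequent
```

A Chi site on the lower strand appears in the upper-strand sequence as the reverse complement, 5'-CCACCAGC, which is why `chi_counts` searches for both strings.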

The distribution of Chi sites in the P1 genome is unequal; the bias is away from the region that includes the invertible segment encoding tail fiber genes and at the right end favors the upper strand, where seven sites are within a 5-kb region around the pac site (Fig. 2). Of these, five are on the lpa side of pac. Their orientation is such that they offer protection against the nucleolytic activity of RecBCD loading onto DNA cut at the pac site and unprotected by packaging.

The two octamers that are the most frequent in the genome of E. coli (5'-CGCTGGCG and 5'-GGCGCTGG) (35) are not among the most frequent octamers in P1. However, their frequencies in both genomes are similar (approximately 1 per 6,000 nucleotides), suggesting that they have related functions. The majority of octamers that are frequent in P1 DNA, other than Chi sites and its variants, correspond to either sequences that are specific for P1 and represent fragments of 17-mer C1 binding sites or 19-mer RepA-binding iterons (Table 1) or fragments of AT-rich regions containing A or T tracts. The most infrequent tetramer in P1 DNA (5'-CTAG) is the same as the most infrequent tetramer in E. coli (35). Seven of its 27 copies are in intergenic regions. Of the remaining 20 that are within 15 protein-coding genes, 9 are within five genes of especially low GC content (three in res, two in mod, one in isaB, two in rlfA, and one in pmgU), which may represent the most recent acquisitions in the genome of P1.
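Oligonucleotide frequency tables of the kind summarized above can be generated with a short k-mer counter. The program OCTAMER used for the analysis is unpublished, so the sketch below is an independent illustration with hypothetical function names; circular counting is an assumption. The final lines check the CTAG tally given in the text.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k, circular=True):
    """Count every k-mer on one strand; wrap around if circular."""
    seq = seq.upper()
    ext = seq + seq[:k - 1] if circular else seq
    return Counter(ext[i:i + k] for i in range(len(ext) - k + 1))


def most_frequent(seq, k, n=5):
    """The n most abundant k-mers with their counts."""
    return kmer_counts(seq, k).most_common(n)


def rarest_kmer(seq, k):
    """Least frequent k-mer, including k-mers that never occur.

    Ties are broken alphabetically. Enumerating all 4**k k-mers is
    practical for small k such as tetramers.
    """
    counts = kmer_counts(seq, k)
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return min(all_kmers, key=lambda km: (counts[km], km))


# Tally of the 27 CTAG copies as reported in the text.
ctag_total, ctag_intergenic = 27, 7
ctag_in_genes = ctag_total - ctag_intergenic                 # 20
ctag_low_gc = sum({"res": 3, "mod": 2, "isaB": 1, "rlfA": 2, "pmgU": 1}.values())
```

To obtain separate upper- and lower-strand counts, as in the Chi site figures, the same function can be applied to the sequence and to its reverse complement.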


 
TABLE 1. Binding sites in P1 DNA of P1 proteins other than C1 and Lpa

 
(ii) Coding sequences. Analysis of the complete sequence of P1 has revealed 112 protein-coding genes and five genes encoding untranslated RNA (Tables 2 and 3; see also Table S1 in the supplemental material). Of these, complete sequences of 42 protein-coding genes and two RNA genes had been determined previously (149, 187, 190, 346). The remaining 73 genes were identified as described in Materials and Methods. Of the previously published sequences of P1 genes, some differ from those presented here by single-nucleotide changes or their overall length, due either to mutations or to previous sequencing errors.


 
TABLE 2. Alphabetical index of P1 genes

 

 
TABLE 3. Gene products (listed by gene position)a

 
Protein-coding genes account for 92% of the P1 genome. Translation of most of the ORFs is initiated independently from their own ribosome binding and translation start sites (AUG[86]>>GUG[16]>UUG[10]). Translation stops predominantly at UAA and less frequently at UGA and UAG.
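The initiation codon usage quoted above (AUG, 86; GUG, 16; UUG, 10) accounts for all 112 protein-coding genes. A tally of this kind can be produced from a list of coding sequences as sketched below; the function name and the input format (DNA strings running from start codon through stop codon) are assumptions made for illustration.

```python
from collections import Counter


def start_stop_usage(cds_list):
    """Tally initiation and termination codons, reported in RNA notation."""
    def as_rna(codon):
        return codon.upper().replace("T", "U")

    starts = Counter(as_rna(cds[:3]) for cds in cds_list)
    stops = Counter(as_rna(cds[-3:]) for cds in cds_list)
    return starts, stops


# Reported initiation codon counts and their shares of all genes.
reported_starts = {"AUG": 86, "GUG": 16, "UUG": 10}
n_genes = sum(reported_starts.values())
shares = {codon: round(100 * n / n_genes, 1) for codon, n in reported_starts.items()}
```

On these figures, roughly 77% of P1 genes start with AUG, 14% with GUG, and 9% with UUG.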

Utilization of UUG as the translation initiation codon appears to be more frequent in P1 than in other coliphages (224) and in E. coli itself (35). It was confirmed experimentally that translation of the P1 lxc gene, which encodes a modulator of C1-mediated repression, initiates at UUG (272).

Whereas the translation initiation codons of P1 are generally 6 to 10 nt downstream of typical E. coli ribosome binding sites (RBS), genes plp, pmgO, and pdcA lack nearby RBSs. They are probably translationally coupled to genes immediately upstream (plp) or overlapping (pmgO and pdcA). The gene ref (recombination enhancement function) in a monocistronic operon is more unusual. The sequence that most closely resembles a canonical RBS is separated by 16 bp, including a run of eight T's, from a presumptive ref initiation codon (207, 338). What purpose the extra RNA might serve or how it is accommodated remains to be determined.

Promoter-proximal regions of four genes, pro, hrdC, pmfA, and pmgL, contain, in addition to ATG codons preceded by sequences with a good match to RBSs in mRNA, TTG codons similarly preceded by canonical RBS sequences and further upstream. Possibly, each of these genes encodes two proteins that differ in the length of their N-terminal domains. Alternative translational starts in mRNA at UUG or AUG could control expression of these genes by modulating the intracellular concentration or activity of their protein products.

Although genes of P1 are tightly packed within the genome, overlaps are rare and occur mostly at coding termini. Spaces between genes of adjacent operons are in most cases limited to short regions containing promoter and regulatory sequences. Of those that exceed 250 bp, one contains oriR and another incA, both involved in P1 prophage replication (Table 1). A 272-bp region between the first gene of the sim operon, simA, and c4, was shown to encode the 5' region of the c4 transcript, which is cleaved to yield C4 RNA (115). Two regions within the ban operon (531 bp preceding the trnA gene and 359 bp between the trnI gene and ban, described in detail later in this paper) may also encode species of RNA that are processed.
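
Intergenic spaces of this kind can be listed directly from an annotation table. The following is a sketch only: the gene names and coordinates are invented for illustration and do not correspond to P1 map positions, and the map is treated as linear.

```python
def intergenic_gaps(genes, min_len=250):
    """genes: list of (name, start, end) with 1-based inclusive coordinates.
    Return [(left_gene, right_gene, gap_length)] for every pair of
    neighbouring genes separated by more than min_len bp.
    Negative gap values (overlaps) are never reported."""
    ordered = sorted(genes, key=lambda g: g[1])
    gaps = []
    for (left, _, left_end), (right, right_start, _) in zip(ordered, ordered[1:]):
        gap = right_start - left_end - 1
        if gap > min_len:
            gaps.append((left, right, gap))
    return gaps


# Hypothetical coordinates: one 272-bp space, one overlap, one short space.
example_genes = [
    ("geneA", 100, 400),
    ("geneB", 673, 900),
    ("geneC", 890, 1200),
    ("geneD", 1250, 2000),
]
print(intergenic_gaps(example_genes))
```

The sketch compares each gene only with its immediate neighbour by start position, so a gene nested entirely within another would need additional handling.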

Of previously identified or predicted products of P1 genes, 65 exhibit significant homology to known proteins of other organisms (Table 3; see also Table S1 in the supplemental material). Only 29 are homologous to proteins encoded exclusively by other phages. Although they include 11 homologs of proteins of T4 or T4-like phages, there is no clear preference in these homologies for proteins of a particular phage, suggesting that P1 has had a long, separate history. Homologs of known genes of different provenance are scattered throughout the P1 DNA. Whether their distribution results from the horizontal transfer of modules during the evolution of the P1 genome or from convergent evolution remains to be determined.

(iii) Organization of P1 genes and transcription units. P1 genes can be grouped into 45 operons, of which 15 appear to be monocistronic. Their organization resembles that of T4 (224) in that only some genes of related function are in clusters.

Regulatory regions of 38 operons contain one or more sequences that resemble strong σ70 promoters (Table 4). Of 26 σ70 promoters identified previously by transcriptional fusion or primer extension experiments, only 9 had a match to the consensus sequence below 50% and required relaxing the stringency of the prediction program for their detection. All nine promoters are associated with genes expressed during lysogeny: three with cre, three with res or mod (including the P1lxc promoter, which can also drive transcription of lxc), one with parB, one with repA, and one with c1. A relatively high predicted strength of known and putative promoters of genes expressed during P1 lytic development implies that as soon as these promoters become accessible to the host RNA polymerase, they can effectively compete with E. coli promoters for the enzyme. Multiple promoters within regulatory regions of several operons indicate a requirement for different controls on the expression of genes at different stages of phage development or plasmid maintenance or a requirement to adapt to alternative hosts or environmental conditions.


TABLE 4. Known and putative σ70 promoters of P1 genes

 
Ten promoters, including a promoter of the essential phage replication gene repL, contain the so-called "extended –10 region" characterized by the presence of T and G at positions –15 and –14, respectively. In E. coli, transcription from extended –10 promoters can initiate in the absence of typical –35 hexamers (178).
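
The positional definition of the extended –10 region translates into a simple check. The sketch below assumes the usual numbering in which the –10 hexamer occupies positions –12 to –7, so that –15 and –14 lie three and two bases upstream of the hexamer; the function name and both test sequences are hypothetical.

```python
def is_extended_minus10(seq, hexamer_start):
    """seq: promoter DNA, nontemplate strand, written 5' to 3'.
    hexamer_start: 0-based index of the first base of the -10 hexamer
    (taken to be promoter position -12; this numbering is an assumption).
    Returns True when T and G occupy positions -15 and -14."""
    if hexamer_start < 3:
        return False  # not enough upstream sequence to decide
    seq = seq.upper()
    return seq[hexamer_start - 3] == "T" and seq[hexamer_start - 2] == "G"


# Synthetic promoters built around a consensus TATAAT hexamer.
extended = "GCATGCTATAATGC"
plain = "GCACCCTATAATGC"

print(is_extended_minus10(extended, extended.find("TATAAT")))
print(is_extended_minus10(plain, plain.find("TATAAT")))
```

In practice the –10 hexamer would itself be located by a promoter-prediction program rather than by an exact string search.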

Sixteen operons end in sequences characteristic of typical Rho-independent transcriptional terminators (Fig. 1 and Table 5). Either termination of some operons of P1 occurs by a Rho-dependent mechanism or certain Rho-independent transcription termination sites are too weak to be detected. Perhaps in the latter case a terminator further downstream is alternatively used. In the case of certain P1 operons expressed during lysogeny (for instance, res and phd doc pdcA pdcB), C1 repressor-binding sites of downstream operons may possibly be used as roadblocks to transcript elongation.
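
The features that mark a typical Rho-independent terminator, a GC-rich inverted repeat followed by a run of T residues, can be searched for with a deliberately crude scan. This sketch is not the prediction program used for Table 5: it accepts only perfect stems of fixed length, ignores folding energy, and all thresholds and test sequences are assumptions chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of an uppercase DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def find_terminator_candidates(seq, stem=6, min_loop=3, max_loop=8,
                               tail=8, min_t=5):
    """Return [(start, stem_seq, loop_len)] for perfect inverted repeats of
    length `stem`, separated by a loop of min_loop..max_loop nt, whose 3' arm
    is followed by a `tail`-nt window containing at least `min_t` T residues."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop                  # start of the 3' arm
            right = seq[j:j + stem]
            tail_seq = seq[j + stem:j + stem + tail]
            if len(right) < stem or len(tail_seq) < tail:
                break                            # ran off the end of seq
            if right == revcomp(left) and tail_seq.count("T") >= min_t:
                hits.append((i, left, loop))
    return hits


# Synthetic hairpins: identical stems, with and without a T-rich tail.
with_tail = "ACAC" + "GCCGCC" + "TTCG" + "GGCGGC" + "TTTTTTTT" + "AA"
without_tail = "ACAC" + "GCCGCC" + "TTCG" + "GGCGGC" + "ACGACGAC" + "AA"

print(find_terminator_candidates(with_tail))
print(find_terminator_candidates(without_tail))
```

A scan this strict would miss weak terminators with imperfect stems or short T runs, which is one reason some termination sites may escape detection.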


TABLE 5. Known and putative Rho-independent transcriptional terminators in P1 DNA

 
Five predicted Rho-independent transcription terminators (Tc8, Tref, TddrA, TplydC, and Tdmt) follow closely upon apparently functional promoters. The locations of those terminators (in a leader sequence or early in a gene) suggest control by antitermination. Indeed, the Tc8 terminator was shown to participate in the regulation of transcription of a gene downstream of c8, ref, by prematurely terminating transcription from one of the ref promoters (338). Of the five terminators, four are positioned in such a way that, in addition to premature termination of ref, mat, ddrB, and dmt genes, they can function to terminate transcription of operons immediately upstream. It is likely that P1 encodes an antitermination mechanism. Whereas no P1 protein has been implicated in the process, antitermination might not require such a protein (330).

As many as 12 predicted Rho-independent terminators (Tasref, Taspro, Taslyz, TasddrB, TasddrA, TaskilA, Tas23, TastciA, Tas7, TaspmgM, TaspmgT, and Tasdoc) are at the beginning of operons but opposite in orientation to the operon. This arrangement suggests that these terminators could terminate transcription of regulatory antisense RNAs. Antisense RNA has been shown previously to regulate the expression of icd ant1/2 and kilA repL (125, 126). The location of the predicted TaskilA terminator is consistent with that expected from the length of the antisense RNA transcribed from the PaskilA promoter (125). The antisense RNA terminated at Tas23 may regulate translation of the 23 gene from an alternative, upstream start site.

Genes expressed during establishment or maintenance of P1 prophage. The alternative life-styles of P1 require two sets of genes with little functional overlap. In general, the genes associated with lysogeny can be distinguished by the absence of regulatory regions resembling those known to control P1 lytic functions, the C1-controlled operators or the Lpa-activated promoters, which will be described later in this paper.

Of 45 P1 operons, 14 are not directly controlled by C1 or Lpa (Fig. 1). Their expression is probably not up-regulated during lytic growth, other than by gene dosage, an effect that is damped in those operons that are subject to autoregulation (ImmC and ImmI genes, repA, parAB, phd doc pdcAB, cin, and probably others).

As many as 20 genes of operons that appear to be independent of C1 and Lpa are either known or predicted to be associated with lysogeny (Fig. 1 and Table 3; see also Table S1 in the supplemental material). We are unable to predict the function of 12 other genes in this class (isaB, upfA, mlp, ppfA, upfB, upfC, plp, upl, upfM, upfN, upfO, and upfQ). However, three of them, mlp, ppfA and plp, encode products that have putative periplasmic transport or lipoprotein attachment signals likely to contribute to lysogenic conversion.

Only six structural genes are essential for stable inheritance of P1 as a prophage. Other genes associated with lysogeny may protect the lysogen from entry of foreign DNA (res and mod; reviewed in references 32 and 38), from infection by another P1 or homologous phages (sim, superinfection exclusion) (170), and from DNA damage (humD, homolog of umuD) (217). Possibly for the sim product(s) to be effective, the gene(s) must be amplified, as superimmunity was observed only with high-copy-number clones of sim. The C-segment inversion recombinase cin is expressed by the prophage, but the known benefit it confers is to produce phages that differ in host specificity from their parent.

(i) Immunity. Whether P1 is maintained in cells as a plasmid or enters the lytic pathway is dictated by the interplay of environmental factors with the components of the immunity circuit encoded at loci designated ImmC, ImmI, and ImmT (Fig. 1 of the accompanying guest commentary [343a]; reviewed in reference 126). Genes of the immunity loci were sequenced and characterized prior to the present work. Analysis of the entire P1 genome allows us to extend this characterization.

(a) ImmC. The key repressor of P1 lytic functions is the C1 protein, encoded by c1 within ImmC (84, 238). Inactivation of the C1 protein or a decrease in C1 synthesis triggers lytic growth of the phage. A commonly used allele of c1, c1-100, encodes a thermosensitive C1 protein (261). In both sequenced strains, c1-100 contained two mutations, T569C and G577T, as compared to what has been taken as wild-type c1 (238). The second of these mutations, which causes the amino acid substitution G193C in C1, probably confers thermosensitivity to the C1-100 protein. The thermolabile C1-100 protein of P1 and the almost identical thermostable C1 protein of P7 do not differ at the locus of the T569C substitution (238).
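
The correspondence between the nucleotide change G577T and the amino acid substitution G193C follows from simple codon arithmetic. The sketch below assumes that nucleotide 1 is the first base of the c1 initiation codon; the function name is hypothetical.

```python
def codon_of(nt_pos):
    """Map a 1-based nucleotide position within a coding sequence to
    (1-based codon number, 1-based position within that codon)."""
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


# The two c1-100 mutations named in the text.
for label, pos in (("T569C", 569), ("G577T", 577)):
    codon, offset = codon_of(pos)
    print(f"{label}: codon {codon}, base {offset} of the codon")
```

Position 577 is the first base of codon 193. A G-to-T change at the first base of a glycine codon (GGN) gives TGN, which encodes cysteine when the third base is a pyrimidine, in agreement with G193C. Position 569 is the second base of codon 190.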

C1 protein exists as a monomer in solution and binds to a score of widely dispersed operators (27, 59, 84, 193). The 17-bp operator consensus sequence is asymmetric and hence has directionality. Monovalent operators have a single repressor-binding site, whereas bivalent operators consist of two overlapping repressor-binding sites oriented in opposite directions and forming an incomplete palindrome (129) (Table 6).
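
Because the operator consensus is asymmetric, a search must report the orientation of each match, and a bivalent operator appears as two overlapping matches in opposite orientations. The sketch below illustrates this under stated assumptions: the 17-bp consensus string is supplied as an assumption (it is not given in this section and should be checked against Table 6), the mismatch tolerance is arbitrary, and the test sequences are synthetic.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed 17-bp operator consensus; verify against Table 6 before real use.
ASSUMED_CONSENSUS = "ATTGCTCTAATAAATTT"


def revcomp(s):
    """Reverse complement of an uppercase DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_operators(seq, consensus=ASSUMED_CONSENSUS, max_mm=2):
    """Return [(start, strand, n_mismatches)] for windows that match the
    consensus ('+') or its reverse complement ('-') within max_mm."""
    seq = seq.upper()
    n = len(consensus)
    targets = (("+", consensus), ("-", revcomp(consensus)))
    hits = []
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        for strand, target in targets:
            mm = mismatches(window, target)
            if mm <= max_mm:
                hits.append((i, strand, mm))
    return hits


def bivalent_pairs(hits, site_len=len(ASSUMED_CONSENSUS)):
    """Pairs of hits in opposite orientations whose windows overlap."""
    return [(a, b) for a in hits for b in hits
            if a[1] == "+" and b[1] == "-" and abs(a[0] - b[0]) < site_len]


# Synthetic monovalent site, and a synthetic pair overlapping by 2 bp.
monovalent = "GG" + ASSUMED_CONSENSUS + "CC"
bivalent = ASSUMED_CONSENSUS + revcomp(ASSUMED_CONSENSUS)[2:]

mono_hits = scan_operators(monovalent)
bi_hits = scan_operators(bivalent)
print(mono_hits, bivalent_pairs(mono_hits))
print(bi_hits, bivalent_pairs(bi_hits))
```

In the synthetic bivalent example the two sites cannot both match perfectly where they overlap, which mirrors the incomplete palindrome described above.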


TABLE 6. C1-controlled operators of P1

 
Transcription of the c1 gene itself is autorepressed at operators that precede c1 and are part of ImmC (317, 128). One of these operators, Ocoi, additionally represses transcription of three small ORFs preceding c1 (27, 127). The product of the longest ORF, the 7.7-kDa Coi protein, forms a 1:1 complex with C1 and blocks its ability to bind to operators (25, 127, 130). A combination of negative effects of C1 on the synthesis of Coi and of Coi on the activity of C1 creates a sensitive regulatory system that is crucial for a choice between lysis and lysogeny. If C1 synthesis prevails in P1-infected cells, Coi synthesis is shut down, leading to the establishment of lysogeny. If Coi synthesis prevails, C1 becomes inactivated, leading to lytic growth.

The E. coli SOS response regulator LexA may be a major factor in this epigenetic switch. A LexA binding site identified previously in the P1 genome in the region upstream of the imcAB and coi genes (200) overlaps a predicted strong promoter that we designated P2coi (Tables 4 and 7; reference 343a, Fig. 1). Most likely, this promoter can drive transcription of imcAB and coi following inactivation of LexA. In support of this view, inducibility of P1 lysogens by UV light has been repeatedly reported, although not always reproduced (reviewed in reference 347).


TABLE 7. Known and potential binding sites in P1 DNA for E. coli proteins IHF, DnaA, and LexA

 
An additional component of ImmC or a separate immunity locus may be located downstream of cre. This gene, tentatively named c8, is probably the locus (c8) of a clear plaque mutation (280, 281). Temperature shift experiments with a thermosensitive allele of c8 suggested that the gene is probably not expressed in a lysogen and hence is most likely involved uniquely in the establishment of lysogeny.

(b) ImmI. The ImmI region (antipodal to ImmC in the circular representation of the plasmid genome) contains a C1-controlled operon of three genes: c4, icd, and ant1/2, which confer a separate specificity to immunity. It permits P1 to plate on lysogens of the closely related phage P7, and vice versa, despite functional interchangeability of their C1 repressors (54, 131, 280, 326). The ant1/2 and icd genes determine an antirepressor protein and an inhibitor of cell division, respectively, whereas the c4 gene determines an antisense RNA that acts as a secondary repressor of Icd and antirepressor synthesis (58, 131, 257). Maintenance of lysogeny requires, in addition to the expression of c1, the expression of c4 to prevent synthesis of the antirepressor protein (reviewed in reference 126). A 77-nt antisense RNA encoded by c4 regulates expression of downstream genes in its own message (57; diagrammed in Fig. 1 of reference 343a). Two short single-stranded regions, b' and a', exposed in its cloverleaf-like structure, block translation of icd (and of translationally coupled ant1) by interaction with the complementary regions, a2 and b2, in the c4 icd ant1/2 mRNA. The interaction occludes the RBS in front of the icd gene (58). The resulting translational block additionally permits premature termination of ant transcription via a Rho-dependent terminator (33). Slight differences between P1 and P7 in these short regions are responsible for the heteroimmunity of the two phages. An additional specificity determinant, sas (site of Ant specificity), is involved in determining the capacity of Ant proteins to induce prophage on heteroimmune superinfection. The site resides centrally within the ant genes of P1 and P7 (J. Heinrich and H. Schuster, unpublished results). It was suggested that Ant proteins must normally load onto their phage-specific sas sites to perform their antirepressor function (294).

The action of C4 appears limited to the mRNA of its own operon, as we did not find any pair of sequences complementary to b' and a' in the P1 genome, other than those designated a1 and b1, located upstream of c4, and presumed to participate in unmasking of a2 and b2 by competitive binding of b' and a'.

(c) ImmT. The ImmT region, distant from ImmC and ImmI, contains a small gene, lxc, whose product modulates the function of C1 by formation of a ternary complex with C1 and operator DNA (272, 317, 318). Participation of Lxc in the C1-operator complexes that control c1 expression increases the affinity of the repressor for the operator and down-regulates the synthesis of C1 itself (318). Lxc can thus relieve repression at weak operators by decreasing the intracellular concentration of C1 below that critical for their binding and simultaneously enhance repression at strong operators by increasing their affinity for C1. Consequently, lxc mutations can have divergent effects on transcription from different C1-controlled promoters (272, 273, 311). Part of the modulating effect of Lxc may be mediated by looping between nearby operators complexed with C1, as looping is dramatically increased by Lxc (128, 343a). As Lxc has been reported to strongly inhibit the ability of Coi to dissociate operator-C1 complexes (317), it may assist in establishing P1 as a prophage.

The initiation codon of lxc appears to overlap with the termination codon of the preceding ulx gene of unknown function, indicating transcriptional and translational coupling of the two genes (Table 3). The coupling of lxc and ulx implies that lxc, in addition to being transcribed from its own promoters, may be transcribed from PdarB, which presumably drives transcription of ulx and the preceding gene, darB (Table 4). None of the promoters that might drive transcription of lxc is under the control of C1, suggesting that Lxc-mediated modulation of C1 function is independent of ImmC and ImmI.

(ii) Plasmid maintenance. Despite its low copy number per bacterial origin (0.7 to 1.4) (248), the P1 plasmid is lost with a frequency of only about 10⁻⁵ per cell per generation (261). Like other plasmids that have accepted the tradeoff of large size for low copy number, P1 must counteract the increased risk of loss at cell division. Four functions have been identified that allow it to accomplish this task: a regulated plasmid replicon, a partitioning mechanism, a site-specific recombination system, and a plasmid addiction (postsegregational killing or growth inhibition) mechanism. These are discussed below. Analysis of the genome of P1 indicates that whereas the lytic origin may contribute to plasmid replication (349), there is no second addiction or partition module in P1 as there are in certain other plasmids (80, 140).
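
The scale of the problem can be illustrated with an idealized calculation. The sketch below is not taken from the cited studies: it assumes that each plasmid copy present at division goes to either daughter independently with probability 1/2, and it ignores replication control, multimerization, and addiction. The function names are hypothetical.

```python
import math


def random_loss_per_division(copies):
    """Probability that a dividing cell with `copies` plasmids produces one
    plasmid-free daughter when copies segregate independently at random."""
    return 2 * 0.5 ** copies


def copies_needed(target_loss):
    """Smallest copy number at division for which random segregation alone
    keeps the loss probability at or below target_loss."""
    return math.ceil(1 - math.log2(target_loss))


for n in (2, 4, 18):
    print(n, random_loss_per_division(n))
print(copies_needed(1e-5))
```

Under these assumptions, a plasmid relying on random segregation alone would need about 18 copies at division to reach a loss frequency near the value reported for P1, whereas with two copies at division half of all divisions would yield a plasmid-free daughter. This gap is what the four maintenance functions listed above must close.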

(a) Plasmid replication and partition. The P1 plasmid replication and partition genes are encoded within one region of P1 DNA, where they form two operons transcribed in the same direction. The P1 replicon has been intensively studied (reviewed in references 49 and 51). It belongs to the large family of plasmid replicons that encode an initiator protein, called RepA in P1, and contain multiple binding sites for that initiator (19-bp iterons in P1). One set of these sites, incC, precedes the repA gene and is part of the replication origin (oriR). A second set, incA, follows repA and is a regulatory locus (Table 1). Replication from the P1 plasmid origin, oriR, proceeds in both directions (241). P1 plasmid and E. coli chromosome replication have much in common, including sequestration of the origin when its 5'-GATC sequences are hemimethylated (1) and a requirement for DnaA (Table 7) (113). One difference is that the ADP form of DnaA suffices for P1 replication, whereas the ATP form is essential for bacterial replication (333).

The P1 repA promoter is nested among the incC iterons and consequently is subject to autorepression by RepA (5). Replication control is exerted by incC and the additional iterons of incA in more than one way. By sequestering RepA, the incA iterons can prevent the concentration of RepA at the origin from attaining the threshold value required for firing (reviewed in reference 49). By becoming "handcuffed" to the iterons of incC via bound RepA, the additional iterons can interfere with both replisome assembly (239) and the release of autorepression that would otherwise replenish sequestered RepA (50). The availability of RepA for replication is limited not only by sequestration and handcuffing but also by the formation of inactive RepA dimers which require bacterial chaperones for conversion to active protein monomers (334). We suggest an additional role of incA in the control of repA expression. The putative transcriptional terminator of repA is located within the incA region (Table 5), and thus occupation of incA iterons by RepA may act as a roadblock to the completion of repA transcripts. Abbreviated repA transcripts might undergo more rapid degradation than intact transcripts.

The partition module of P1, which is situated downstream of the repA gene and the incA iterons, is a major contributor to stability of the plasmid. It ensures partition of plasmid molecules to daughter cells and has been, with its homologs, the subject of numerous detailed studies and reviews (reviewed most recently in references 97 and 306). The module consists of an operon of two genes, parA and parB, which ends at a site, parS, a centromere analog (4). ParA is an ATPase whose activity is stimulated by ParB (73). Both ATP and ADP forms of ParA act as autorepressors, the ADP form being the more effective (70, 71). A stimulation of repression by ParB (87) is assisted by parS (114). The parS site contains two kinds of recognition sequences for ParB, four heptameric and two hexameric boxes, and a binding site for the host protein IHF (Tables 1 and 7) (249). ParB and IHF form a complex at parS (72, 89) to which ParA, in the ATP form, can bind via ParB (37, 90). Binding of ParB to parS can permit parS sites to pair (83) and can nucleate spreading of ParB to the flanking DNA and silence transcription of genes as far as several kilobases away (258). Whether ParA and ParB translocate plasmid molecules to their target positions within a cell during partitioning by themselves or attach them to unknown host components of partition machinery is unclear.

We did not find in P1 DNA any additional sequences similar to parS that could serve as potential binding sites for ParB. This suggests that the function of ParB is limited to partitioning in P1, whereas in certain other plasmids, e.g., RK2 and N15, the homologous proteins appear to have additional functions (102, 251, 335). Two putative σ70 promoters of P1 gene 23, immediately downstream of parB, of which one overlaps the IHF binding region of parS, appeared weak when tested in vivo in the absence of ParB (259; A. Dobruk and M. Lobocka, unpublished results). The possibility that conditions might exist leading to their significant expression and regulation by ParB has not been studied.

(b) DNA cyclization and multimer resolution. Faithful partitioning of newly replicated plasmid molecules is assisted by prior resolution to monomers of plasmid multimers that are inevitably formed in recA+ bacteria. The resolution is usually accomplished by a site-specific recombination. In P1 it is ensured by the recombinase Cre, encoded near its site of action, lox (8, 299, 300), far from the genes of replication and partition (Fig. 1). The resolution of plasmid multimers into monomers increases the number of partitionable P1 molecules and hence plasmid stability (21). The same accessory proteins that are required for the resolution of ColE1 dimers by the bacterial recombinase XerCD are also required for the stable maintenance of P1 prophage (244). These proteins (ArgR, for which a putative binding site to the left of lox has been located, and PepA) are proposed to constrain the directionality of the recombination in vivo. The possibility that the lox-cre module might assist plasmid stability in another way has also been suggested (10). Interplasmid recombination involving a replicating plasmid could generate a concatemeric replication substrate that, following amplification as a rolling circle, could be resolved into plasmid monomers by Cre-mediated recombination. The plasmid yield per replication initiation event would be increased, providing protection against RepA insufficiency.

The lox site is the unique target for Cre action in P1 DNA, as we did not find any other similar sites. In the P1 genome, the cre gene is downstream of cra, a gene of unknown function. In the prophage, cre is weakly expressed, either from promoters P1cre and P2cre, which lie within cra, or together with cra from a third promoter, P3cre (302). The –35 regions of P3cre and P2cre promoters overlap GATC sequences (Table 4), and P3cre was shown to be down-regulated in E. coli by Dam methylation, as P2cre probably also is. Whether methylation of one strand is sufficient for the repression is unknown. If it is not, the expression of cre in a prophage would increase immediately following each round of replication and be attenuated later as a result of methylation of the newly replicated strand in the region of P3cre and P2cre promoters.
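
Overlap between a Dam methylation site and a promoter element is easy to test once the coordinates of the element are known. In the sketch below the promoter fragment and the position of its –35 hexamer are invented for illustration; they are not the P2cre or P3cre sequences.

```python
def gatc_sites(seq):
    """0-based start indices of every GATC tetranucleotide in seq.
    GATC is its own reverse complement, so one strand suffices."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "GATC"]


def overlaps(site_start, region_start, region_end, site_len=4):
    """True if the site [site_start, site_start + site_len) shares at least
    one base with the half-open region [region_start, region_end)."""
    return site_start < region_end and site_start + site_len > region_start


# Hypothetical fragment whose -35 hexamer is taken to span indices 3..8.
fragment = "ACGTTGATCATTT"
minus35 = (3, 9)

sites = gatc_sites(fragment)
print(sites, [overlaps(s, *minus35) for s in sites])
```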

A putative fourth cre promoter, P4cre, predicted to be the strongest, lies almost 500 bp upstream of cre and on the opposite side of lox (Table 4). Its –35 region is partially overlapped by the C1 operator, Ocre (Table 6), indicating that the promoter is inactive in the P1 prophage. Apparently, the cre gene is expressed most efficiently at the time of infection and during P1 lytic development. Following infection, the Cre recombinase can cyclize the DNA of phages that carry a lox site on each redundant end (302). The first phages to be packaged belong to this privileged minority and hence are to be found in every productive burst. The other phage DNAs must rely on homologous recombination to cyclize.

(c) Plasmid addiction. A fail-safe P1 stabilization mechanism is encoded by the addiction operon, phd doc, located in a region far from other plasmid maintenance genes (Fig. 1). The phd doc operon programs the inhibition of growth (and the eventual death) of any daughter cells emerging plasmid free. Two small genes of this operon encode a stable protein toxin, Doc, and an unstable cognate antidote, Phd (95, 96, 192, 211). Doc appears to block protein synthesis reversibly (R. Magnuson and M. Yarmolinsky, unpublished results). Phd and Doc can form a complex. This interaction prevents Doc from killing plasmid-containing cells and enhances the Phd-mediated repression of the autoregulated phd doc promoter (210, 211). Phd is slowly degraded by the host protease ClpXP (194), allowing for the release of the toxin protein upon plasmid loss. Recently documented connections among prokaryotic toxin-antitoxin systems, Phd-Doc among them, and their relationship to the eukaryotic nonsense-mediated RNA decay system places Phd-Doc in a grander evolutionary context than anticipated (13).

The nucleotide sequence downstream of doc contains two ORFs that could encode proteins of 66 and 347 residues, designated here pdcA and pdcB (post-doc). The 5' end of pdcA overlaps 17 terminal nucleotides of the doc gene. The phd-doc transcript appears to include the transcripts of pdcA and pdcB, as suggested by the lack of predicted Rho-independent terminators of the phd doc operon within pdcA or pdcB, and the lack of a separate recognizable σ70 promoter sequence that could drive their transcription. Whether the functions of pdcA and pdcB are accessory or unrelated to the function of phd doc remains to be seen.

(iii) Restriction-modification. P1 mod and res are tandem genes that encode the subunits of a type III restriction-modification enzyme, EcoPI. This bifunctional enzyme has been the subject of extensive studies (reviewed in references 32 and 38; see also reference 161).

The P1 Mod subunit recognizes the DNA sequence 5'-AGACC and catalyzes methylation of the central adenine residue at the N-6 position (23), using S-adenosylmethionine as the methyl donor. The Res subunit catalyzes the double-strand cleavage of DNA about 25 bp to the 3' side of a recognition sequence, but it does so only when bound to the Mod subunit in a Res₂Mod₂ complex (161).

Whereas methylation by Mod can occur at any recognition sequence, each double-strand scission by Res-Mod requires a pair of unmethylated recognition sequences, in head-to-head configuration (32, 219). The cut appears to be triggered by collision between two converging Res-Mod enzymes that, powered by ATP hydrolysis, have been translocated along the DNA (220). The requirement for collision, in combination with the asymmetry of enzyme recognition sites, provides an efficient protection from cleavage of newly replicated DNA in which only one strand is unmethylated, because the unmodified sites in one orientation are always paired with modified sites in the opposite orientation.
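
The orientation requirement can be illustrated by locating recognition sequences on both strands and pairing those that converge. The sketch below adopts one reading of "head-to-head", namely 5'-AGACC ... GGTCT-3' on the strand written; this convention, the function names, and the test sequences are assumptions, and methylation state is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ECOPI_SITE = "AGACC"  # recognition sequence given in the text


def revcomp(s):
    """Reverse complement of an uppercase DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def oriented_sites(seq, site=ECOPI_SITE):
    """Return [(start, strand)]: '+' where the site reads 5'->3' on the
    strand given, '-' where it lies on the complementary strand."""
    seq = seq.upper()
    rc = revcomp(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


def head_to_head_pairs(hits):
    """Pairs (plus_start, minus_start) in which a '+' site lies upstream of
    a '-' site, so that the two sites point toward each other."""
    return [(p, m) for p, s1 in hits if s1 == "+"
            for m, s2 in hits if s2 == "-" and m > p]


converging = "TT" + "AGACC" + "ACGTACGT" + "GGTCT" + "AA"
diverging = "TT" + "GGTCT" + "ACGTACGT" + "AGACC" + "AA"

print(oriented_sites(converging), head_to_head_pairs(oriented_sites(converging)))
print(oriented_sites(diverging), head_to_head_pairs(oriented_sites(diverging)))
```

Both synthetic sequences contain two inversely oriented sites, but only the first yields a pair under the convention assumed here.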

Host killing by restriction is avoided during the process of establishing a P1 prophage in a bacterial cell with an unmodified chromosome. This is achieved by complex regulatory mechanisms that delay restriction long enough to allow complete methylation of host DNA (253). In P1 lysogens, transcripts containing either mod or res messages are detected. This indicates that mod and res form separate operons, although the beginning of res immediately follows the end of mod (285). The σ70 promoters for res, separate from those for mod, have been identified previously or are predicted here (Table 4). Additional regulation occurs at the translational and posttranslational levels (253).

(iv) Superinfection exclusion. During attempts to clone the c1 gene of P1, Devlin et al. isolated a fragment of DNA that, although not carrying c1, protected cells from superinfection by wild-type P1, its c1 and virs mutants, and the heteroimmune P7 phage (77). This extended immunity phenotype, designated superimmunity and subsequently attributed to superinfection exclusion, was associated with a gene upstream of the c4 icd ant operon and transcribed in the opposite direction (170, 213). We find that this gene, designated previously sim and here simC, is the last of three genes of the simABC operon. The sim genes appear to encode precursors of periplasmic proteins, suggesting that their functions are related. The location of putative promoter sequences in the region preceding simC indicates that simC is cotranscribed with simB, or with simA and simB (Table 4). A plasmid carrying the sim operon was found by minicell analysis to specify a processed, as well as full-length, form of the SimC protein (213). Two additional proteins apparently expressed from this clone were seen following sodium dodecyl sulfate (SDS)-PAGE (170). One is probably SimB (12.0 kDa). The other migrated as a very small protein and, although not mentioned by the authors, is probably SimA (predicted molecular weight, 4.8 kDa).

The superimmunity (or superinfection exclusion) phenotype was observed in the presence of SimC alone (213). The SimC protein appears to be an analog of superinfection exclusion proteins that act to prevent injection of superinfecting phage DNA into the cytoplasm of the infected cell. Many phages, both temperate and virulent, encode such proteins. SimC appears to act in the periplasmic space, where the processed form is found, or in the cytoplasmic membrane, similarly to the SieA protein of P22 (139) and the Imm protein of T4 (205), either by helping to destroy the injected DNA or by preventing its entry into the cytoplasm (139, 170, 206, 213). The processing of SimC requires SecA (213), as probably does the processing of SimA and SimB, which, like SimC, have putative signal peptides.

Products of the sim operon, like the product of sieA of P22 (139, 307), exclude both phage and transducing particle DNA. However, exclusion of the latter by Sim is much less efficient, indicating that the Sim system can discriminate between P1 and foreign DNA. Conceivably, SimC or another Sim protein can interact with a P1 protein that is bound specifically with phage DNA as it enters a cell during infection.

Cells carrying low-copy-number sim+ plasmids display a Sim– phenotype (77, 170), which appears inconsistent with a role of the sim genes in lysogeny. It is possible that in P1 lysogens, the sim functions are induced only under certain circumstances or that the main role of Sim proteins is to protect from superinfection those cells in which lytic development has been initiated and the sim gene dosage is high. The physiological role of sim functions could be analogous to that of superinfection exclusion functions encoded by the immT and sp genes of the lytic phage T4. It has been proposed that superinfection exclusion mechanisms may preserve genetic diversity among phages, protecting phage populations from dominance by a phage that would otherwise outcompete any related phage when given a possibility to propagate in a superinfected cell (205).

(v) HumD and Lxr. The monocistronic operon immediately upstream of the P1 addiction module encodes HumD protein, a homolog of E. coli UmuD' (200). The P1 humD gene, like E. coli umuD, is transcribed from a LexA-regulated promoter (Tables 4 and 7). P1 lacks a homolog of umuC, which in E. coli is functionally associated and cotranscribed with umuD.

Aside from recA, umuD and umuC of E. coli are the only LexA-regulated genes of the SOS response that are required for DNA damage-induced mutagenesis (290). They encode a low-fidelity and low-processivity DNA polymerase, PolV, whose main function is to bypass DNA lesions (309). A RecA-mediated autocleavage of the umuD gene product (215) is required to form active PolV, which is a complex of UmuD' and UmuC in a 2:1 ratio (43). P1 HumD corresponds to the processed form of UmuD, UmuD', and can functionally replace it (217).

The regulatory region of humD overlaps the regulatory region of a divergently transcribed gene of unknown function. This gene could encode a 190-residue, slightly acidic protein that has no homologs in databases. We surmise that its transcription, like that of humD, is negatively regulated by LexA. One of two putative σ70 promoters of this gene overlaps the LexA binding site that controls humD (Tables 4 and 7), and thus we name the gene lxr (LexA regulated).

Genes expressed in lytic development. Expression of the majority of P1 genes during lytic growth follows a strict temporal pattern that can lead to the production of mature phage particles in less than 1 h. For many bacteriophages such as T4 (180), Mu (18, 55), P2 (339), and P4 (98), the regulatory cascade has been subdivided into early, middle, and late stages. In P1, the cascade appears to be simpler; early transcription switches directly to late transcription, without a well-defined intermediate stage (188, 193). C1 operator sequences, which bind the primary phage repressor C1, act as a switch for early genes. Lpa (late promoter activator) binding sequences, which are in the –22 region of late promoters and enable RNA polymerase to initiate transcription upon binding of Lpa, act as a switch for late genes.

(i) C1-controlled operons. Analysis of the entire P1 genome has revealed 17 operons that have C1 operators in the region of their σ70 promoter sequences or, in some cases, between promoters and proximal genes (Table 6). One of these operons (pmgR) is preceded by two nonoverlapping operators, and two others (ban and c1) are preceded by operators that are bivalent, bringing the total number of monovalent C1-binding sites to 20, all of which were identified previously. Although we did not find any additional sequences with strong similarity to known C1-binding sites, sequences that resemble the C1-binding sites but lack the highly conserved C at position 7 are present in the parS site and in the promoter region close to the darB gene. The sequence in parS did not interact with C1 in vitro (N. Sternberg, personal communication). Whether these sequences can act cooperatively with other C1 operator sequences to bind C1 in vivo, as does the Oc1b operator, which also has the C at position 7 replaced by another base, remains to be seen. Looping between C1-bound operators located close to each other has been demonstrated (128), but looping between well-separated operator sequences has not.
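The count of 20 monovalent C1-binding sites can be checked with a short tally. The sketch below is only an illustration: it assumes that each of the 14 operons not singled out in the text carries exactly one monovalent site (which the stated total implies), and the variable names are hypothetical.

```python
# Tally of monovalent C1-binding sites across the 17 C1-controlled operons.
# Assumption: operons not named below carry exactly one monovalent site each.
TOTAL_OPERONS = 17

# Operons with more than one monovalent site, as described in the text:
# pmgR has two nonoverlapping operators; ban and c1 each have a bivalent operator.
multi_site_operons = {"pmgR": 2, "ban": 2, "c1": 2}

single_site_operons = TOTAL_OPERONS - len(multi_site_operons)
total_sites = single_site_operons + sum(multi_site_operons.values())
print(total_sites)
```

Under that assumption, the tally is 14 single sites plus 6 sites from pmgR, ban, and c1.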

The 17 C1-controlled operons, transcribed from σ70 promoters, contain 49 genes that are derepressed early in lytic development. Transcription does not start simultaneously, since the operators exhibit different affinities for C1 and are differently influenced by the Lxc protein, which modulates the C1-operator interaction. Lytic replication genes start to be transcribed within 5 min of infection, allowing a prompt initiation of phage DNA replication. Increased amounts of P1 DNA can be detected about 15 min after infection, at about the same time as cleavage of the packaging site, pac, in P1 DNA (297). Detection of DNA synthesis within 5 min has also been reported (282). Expression of other early functions was reported to start 10 to 15 min after induction of P1 lysogens (187). The intense transcription of P1 early functions is attenuated later in lytic development, prior to transcriptional activation of late genes (111). The attenuation depends on the E. coli RNA polymerase-associated protein SspA. Whether SspA acts directly on complexes of RNA polymerase with P1 σ70 promoter sequences or requires the action of a P1 protein is unknown. We did not find any conserved sequences, other than promoter sequences and C1-binding sites, in the regulatory regions of P1 early genes.

(a) P1-encoded tRNAs. Several phages encode their own tRNAs (e.g., see references 88, 177, and 247); the T4-like vibriophage KVP40 encodes at least 25 tRNAs at a single locus (223). Some of these tRNAs supplement the host pool of rare tRNAs to facilitate efficient expression of selected phage genes during lysogeny or at certain stages of lytic development (179, 247). Additionally, it has been proposed that phage tRNAs can provide selective pressure to keep base composition and codon usage of phage DNA significantly different from those of its host (summarized in reference 227). We find that the P1 genome contains three sequences characteristic of tRNA genes. All three are located between the ban gene and its promoter, in the apparently untranslated regions of the ban operon. Two are adjacent; one is further downstream, separated from the two by three protein-coding genes. Control of the P1 tRNA genes from a promoter repressed by C1 implies that they are not expressed during lysogeny.

The proposed cloverleaf structures of all predicted P1 tRNAs are shown in Fig. 3. The first tRNA, tRNA1, has the anticodon GUU, characteristic of E. coli tRNAAsn. It is 90% identical to the predicted tRNAAsn of the Salmonella enterica serovar Typhi plasmid pHCM2 (242) (GenBank accession no. AL513384) and contains all bases essential for the aminoacylation of tRNAAsn in E. coli: the anticodon bases G34, U35, and U36 and the discriminator base G73 (201).



FIG. 3. P1-encoded tRNAs and codons presumably recognized by them. The identity determinants of tRNAs are marked by asterisks. The putative modification of tRNA3 by lysinylation at the 2 position of cytosine alters the recognition specificity of the anticodon. The bar graphs compare the usage frequencies of codons recognized by P1 tRNAs in protein-coding genes of P1 (black bars) and E. coli K-12 (grey bars) per 1,000 codons. The usage frequency of a particular codon in P1 is its relative abundance among codons in genes that encode proteins. The usage frequency of a particular codon in E. coli K-12 is according to the March 2004 edition of the Codon Usage Database found at its website (http://www.kazusa.or.jp/codon/).

 
The second tRNA, tRNA2, has the anticodon UGU, characteristic of E. coli tRNAThr. Although it is highly homologous to alanine tRNA-UGC of a cyanobacterium, Synechocystis sp. (SYCSLRB; 83% identity), and to tRNAAla of numerous other organisms, it contains neither the G3-U70 wobble pair, which is the primary determinant of the acceptor identity of the E. coli tRNAAla, nor the discriminator base G20, which is also an important characteristic of this tRNA (250, 308). Instead, in addition to the anticodon bases G35 and U36, which are identity determinants of all E. coli tRNAThrs, it contains base pairs G1-C72 and C2-G71, which, in E. coli tRNAThr, are crucial for threonine charging activity (116).

The third tRNA, tRNA3, has the anticodon CAU, which should correspond to the Met codon AUG. However, in its sequence this tRNA appears to be 89% identical to the E. coli tRNAIle-2, encoded by the ileX gene. None of the differences (outside the anticodon) between tRNA3 and tRNAIle-2 affect residues known to be essential for the function of E. coli tRNAIle-2. In the tRNAIle-2 of E. coli the wobble position C34 is modified by lysinylation to lysidine (i.e., 2-lysylcytidine). The modified anticodon acquires the AUA (Ile)-decoding capacity and is required for the recognition of this tRNA by isoleucyl-tRNA synthetase (231, 232, 283). The other identifying characteristics of E. coli tRNAIle are the anticodon loop bases A37 and A38, the discriminator base A73, and the base pairs C4-G69, U12-A23, and C29-G41 (236, 240). All of them are present in P1 tRNA3. Certain other phages, including 933W (247), T4 (224), and the T4-like phages RB69 (see http://phage.bioc.tulane.edu/) and KVP40 (223) also encode homologs of E. coli tRNAIle-2, predicted to contain the lysylcytidine modification at C34. The T4 homolog of E. coli tRNAIle-2 was confirmed to have the capacity to decode ATA (274).
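The anticodon-to-codon relationships described for the three P1 tRNAs can be illustrated with a minimal sketch. It models only strict antiparallel Watson-Crick pairing and does not attempt general wobble rules. The symbol "L" is a stand-in chosen for this sketch to represent lysidine at position 34, which is treated as pairing with A; the function and variable names are hypothetical.

```python
# Pairing table for anticodon bases (RNA alphabet).
# "L" is an illustrative symbol for lysidine (modified C34), assumed here
# to pair with A rather than G, as described for tRNAIle-2.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G", "L": "A"}

def codon_for_anticodon(anticodon):
    """Return the 5'->3' codon that pairs antiparallel with a 5'->3' anticodon."""
    return "".join(PAIRS[base] for base in reversed(anticodon))

# Anticodons of the predicted P1 tRNAs, as given in the text.
P1_ANTICODONS = {"tRNA1": "GUU", "tRNA2": "UGU", "tRNA3": "CAU"}
codons = {name: codon_for_anticodon(ac) for name, ac in P1_ANTICODONS.items()}

# tRNA3 with the putative lysidine modification at the wobble position.
modified_trna3_codon = codon_for_anticodon("LAU")

print(codons)                # unmodified anticodons
print(modified_trna3_codon)  # modified tRNA3
```

With the unmodified anticodons the sketch gives AAC (Asn), ACA (Thr), and AUG (Met). Replacing C34 of tRNA3 with the lysidine stand-in gives AUA (Ile), which is the change in decoding specificity discussed above.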

Of the codons that are apparently recognized by P1 tRNAs, two, ACA and ATA, are rare in E. coli genes but overrepresented in certain P1 genes (Fig. 3). The ACA codon is the rarest threonine codon in E. coli, but its effect on translation has not been studied. The ATA codon, which is the fifth rarest codon in E. coli, is known to dramatically decrease translation of those E. coli mRNAs that contain it, especially when present in multiple copies, or in tandem, or in a single copy in the anterior part of a gene (99, 166, 353). The ATA codons are overrepresented more than twofold in 59 of 113 protein-coding genes of P1 as compared to their representation in E. coli genes. Four of these 59 genes (isaB, rlfA, 7, and pacB) contain both single ATAs and tandem ATAs. In eight, ATAs are among the first five codons of the proximal region (at position 5 of lydC, 4 of 21, 2 of ppfA, 3 of tciB, 2 of ppp, 4 of doc, 4 of pdcB, and 2 of c1). At least two of these genes, doc and c1, are known to be expressed, albeit not efficiently, during lysogeny, indicating that the low levels of ileX tRNA that are normally present in E. coli cells are sufficient for their translation, since tRNA3 is not expressed. However, the known translation-limiting role of ATA codons in E. coli implies that the efficiency of translation of doc, c1, and other genes rich in the ATA codons increases under conditions of abundance of P1 tRNA3. In the case of doc and c1, the physiological significance of such an increase, if it does occur, is unclear. In the case of certain other genes, the requirement for abundant tRNA3 to permit their efficient expression may provide an additional control on the timing of P1 lytic development at the translational level. All P1 tRNA genes appear to be transcribed from the C1-controlled promoter of the ban operon and so become available early during lytic development.
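The kind of codon survey described above (usage per 1,000 codons, tandem ATAs, and ATAs among the first five codons) can be sketched as follows. This is not the procedure used for the analysis; it is a minimal illustration with stated assumptions: the start codon is counted as position 1, stop codons are included in the total, the host frequency is supplied by the caller, and the toy sequence is invented for demonstration and is not a P1 gene. All function names are hypothetical.

```python
from collections import Counter

def split_codons(cds):
    """Split an in-frame DNA coding sequence into codons, dropping a partial tail."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]

def usage_per_thousand(cds_list, codon):
    """Frequency of one codon per 1,000 codons over a set of coding sequences."""
    counts = Counter()
    for cds in cds_list:
        counts.update(split_codons(cds))
    total = sum(counts.values())
    return 1000.0 * counts[codon] / total if total else 0.0

def codon_features(cds, codon="ATA", window=5):
    """Report 1-based positions of a codon, tandem occurrence, and early positions."""
    positions = [i for i, c in enumerate(split_codons(cds), start=1) if c == codon]
    tandem = any(b - a == 1 for a, b in zip(positions, positions[1:]))
    early = [p for p in positions if p <= window]
    return {"positions": positions, "tandem": tandem, "early": early}

def is_overrepresented(phage_freq, host_freq, fold=2.0):
    """True if the phage frequency exceeds the host frequency by more than `fold`."""
    return host_freq > 0 and phage_freq > fold * host_freq

# Invented toy sequence (not P1 data): ATG ATA AAA GCT ATA ATA TAA
TOY_CDS = "ATGATAAAAGCTATAATATAA"
features = codon_features(TOY_CDS)
toy_freq = usage_per_thousand([TOY_CDS], "ATA")
print(features, round(toy_freq, 1))
```

In real use, `usage_per_thousand` would be run over the annotated protein-coding genes of the phage and compared with a host value taken from a codon usage table, such as the one cited in the legend to Fig. 3.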

It is noteworthy that lpa, whose product activates transcription of P1 late genes, has in its proximal region three rare E. coli codons, recognized by P1 tRNAs, namely ATA at position 13 and an ATA ACA pair at positions 18 and 19. Whether the abundance of P1 tRNA3, and perhaps tRNA2, early in P1 lytic development normally contributes to efficient translation of lpa mRNA, which in turn ensures efficient transcription of late genes, remains to be tested.

Insertions of numerous phages into bacterial chromosomes occur at or near tRNA genes (reviewed in reference 46). Conceivably, tRNA genes could have been acquired by a P1 ancestor either through its integration and then imperfect excision from the DNA of its host or via recombination from other phages whose prophages can integrate into a bacterial chromosome.

(b) C1-controlled replication functions. P1 DNA synthesis has been reported to start as early as 5 min after infection of E. coli (282). Lytic replication initiates bidirectionally in the theta mode from an origin, oriL, distinct from the plasmid origin and located within the essential replication gene, repL (61, 112, 295). Later, lytic replication switches into the rolling-circle mode (60). The shift is accompanied by a gradual decrease in accumulation of θ-shaped replication intermediates as