ABSTRACT
A DNA fragment carrying the genes coding for EcoO109I endonuclease and EcoO109I methylase, which recognize the nucleotide sequence 5′-(A/G)GGNCC(C/T)-3′, was cloned from the chromosomal DNA of Escherichia coli H709c. The EcoO109I restriction-modification (R-M) system was found to be inserted between the int and psu genes from satellite bacteriophage P4, which were lysogenized in the chromosome at the P4 phage attachment site of the corresponding leuX gene observed in E. coli K-12 chromosomal DNA. The sid gene of the prophage was inactivated by insertion of one copy of IS21. These findings may shed light on the horizontal transfer and stable maintenance of the R-M system.
Escherichia coli H709c has been widely used as an antigenic tester strain of the E. coli O109 group, and the serotype formula of this strain was established as O109:K(−):H19 by Ørskov et al. (23). A type II restriction endonuclease, R.EcoO109I, which recognizes and cleaves the nucleotide sequence of 5′-(A/G)G↓GNCC(C/T)-3′, has been isolated from E. coli H709c (21). It has been reported that R.EcoO109I cleavage is inhibited by the modification of the outer cytosine in the recognition sequence (29). However, neither the position nor the products of methylation by the cognate methyltransferase, M.EcoO109I, have been determined yet.
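The recognition sequence quoted above can be written as a pattern and searched for in any DNA string. The following is a minimal sketch, not a method used in this study; the function names and the example sequence are hypothetical. Because 5′-(A/G)GGNCC(C/T)-3′ is its own reverse complement, scanning one strand is sufficient.

```python
import re

# 5'-(A/G)GGNCC(C/T)-3' expanded to a regular expression.
# The lookahead lets overlapping sites be reported as well.
ECOO109I_SITE = re.compile(r"(?=([AG]GG[ACGT]CC[CT]))")


def find_sites(seq: str) -> list[int]:
    """Return the 0-based start index of every EcoO109I recognition site."""
    return [m.start() for m in ECOO109I_SITE.finditer(seq.upper())]


def cut_positions(seq: str) -> list[int]:
    """Return the 0-based index of the first base 3' of each top-strand cut.

    The cut (shown as the arrow in RG|GNCCY) falls after the second base
    of the site, so the offset from the site start is 2.
    """
    return [start + 2 for start in find_sites(seq)]


# Hypothetical test sequence containing two sites (AGGACCC and GGGGCCT).
example = "TTAGGACCCTTGGGGCCTAA"
sites = find_sites(example)
cuts = cut_positions(example)
```

An empty result for a genome, as reported later for P4 DNA, would mean that the endonuclease has no target in that molecule.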
To date, about 150 type II restriction-modification (R-M) genes have been cloned and their nucleotide sequences have been analyzed (27). The genes coding for endonuclease and methyltransferase are closely linked on either chromosomal DNA or plasmid DNA. Of the 167 type II R-M enzymes isolated from E. coli, 11 genes have been cloned and their nucleotide sequences have been analyzed. The EcoRI (22), EcoRV (3), EcoRII (16), and Eco29kI (40) systems are encoded by plasmid DNA, whereas the EcoHK31I (17) system is encoded by chromosomal DNA. The characterization of type II R-M systems has shown that some systems contain other components in addition to the requisite endonuclease and methyltransferase. One of these is the C element, which is known to activate R expression in the BamHI and PvuII R-M systems (12, 33). Genes encoding proteins involved in DNA mobility, such as transposases, integrases, and invertases, are sometimes found in the vicinity of R-M systems located on chromosomal DNA (1, 5, 13, 17, 31, 35). These proteins might facilitate the transfer of R-M genes among different bacterial strains.
In this study, we report the cloning and characterization of the EcoO109I R-M system and the location of the system on the chromosome. The nucleotide sequence adjacent to the R-M system has led to interesting speculation about the evolutionary history of EcoO109I.
Purification of R.EcoO109I and M.EcoO109I. R.EcoO109I and M.EcoO109I were partially purified from the cell extracts by combined chromatography on DEAE-Sephacel, phosphocellulose, hydroxylapatite, and heparin-Sepharose. When the peak fractions were electrophoresed on a sodium dodecyl sulfate (SDS)-polyacrylamide gel, R.EcoO109I and M.EcoO109I produced major bands of 32.5 and 45 kDa, respectively. The corresponding bands were blotted onto a polyvinylidene difluoride membrane (20) and then subjected to N-terminal amino acid sequence analysis. The first 20 amino acids of R.EcoO109I and M.EcoO109I obtained on Edman degradation were Met-Asn-Lys-Gln-Glu-Val-Ile-Leu-Lys-Val-Gln-Glu-Xxx-Ala-Ala-Trp-Trp-Ile-Leu-Glu and Ser-Ser-Lys-Lys-Phe-Ile-Ser-Leu-Phe-Ser-Gly-Ala-Met-Gly-Leu-Xxx-Leu-Gly-Leu-Gln (Xxx, not identified), respectively.
Isolation of EcoO109I R-M genes. To isolate the two genes, oligonucleotide N1 (Table 1) was synthesized from the N-terminal amino acid sequence of R.EcoO109I and used as a probe for Southern hybridization with E. coli H709c chromosomal DNA digested with various restriction endonucleases. The 4.8-kb BglII fragment was cloned into the BamHI site of the pUC118 vector to obtain pUC-B1. The purified plasmid DNA from the clone was digested with R.EcoO109I, and the plating efficiency of λ virulent phage for the cells carrying pUC-B1 was the same as that for control cells carrying no plasmid. These results suggested that a partial, i.e., not the complete, EcoO109I R-M gene was located on the 4.8-kb BglII fragment. In order to find longer DNA fragments carrying genes encoding complete EcoO109I R-M enzymes within the E. coli H709c chromosomal DNA, the 9-kb BamHI fragment was cloned into the BamHI site of the λEMBL3 vector to obtain EMBL3-25. DNA was purified from the phage, and the 5.8-kb EcoRV-BamHI fragment was analyzed in detail (Fig. 1). The 3.1-kb EcoT22I fragment was inserted into the PstI site of pKF3 and the resulting recombinant plasmid, pKF3-1, was transferred to E. coli TH2. R.EcoO109I activity in the cell extract was assayed at 37°C by adding 2 μl of enzyme solution to 15 μl of reaction mixture (10 mM Tris-HCl [pH 7.5], 10 mM MgCl2, 1 mM dithiothreitol, and 0.5 μg of T4 cytosine-containing DNA [dC DNA]). M.EcoO109I activity was assayed as the susceptibility of pKF3-1 to R.EcoO109I. The colonies carrying the plasmid expressed both endonuclease and methyltransferase activities. These results indicated that the 3.1-kb region is essential for encoding both the restriction endonuclease and the methyltransferase.
Bacterial strains, plasmids, phages, and oligonucleotides
Restriction map of the 9-kb BamHI fragment. The positions and orientation of the R.EcoO109I (ecoO109IR) and M.EcoO109I (ecoO109IM) genes, as well as those of ORF1, ORF2, and ORF3, are indicated by arrows.
Nucleotide and deduced amino acid sequences. The DNA sequence of the 3.1-kb EcoT22I fragment that covers the entire EcoO109I R-M gene is shown in Fig. 2. The two open reading frames (ORFs) were aligned tail to tail, and a 38-bp spacer region was found between them. A putative palindromic sequence, which is found in the StsI R-M system (15), was seen within the spacer region; this could be the transcriptional termination site for both genes. In the ORF assigned to the endonuclease gene, an ATG codon appeared at nucleotide position 762 and a termination codon at nucleotide position 1578. In addition, an appropriate ribosome-binding sequence, GGA, was present 11 bp upstream of the ATG codon. The ORF consisted of 816 bp and encoded a 272-amino-acid-residue polypeptide. The predicted mass, 31,435 Da, was close enough to the value estimated on SDS-polyacrylamide gel electrophoresis. R.EcoO109I exhibits identity with R.SinI (13) and R.Eco47I (31), which recognize G ↓ G(A/T)CC, and R.Sau96I (32) and R.Eco47II (31), which recognize G ↓ GNCC.
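The endonuclease ORF figures quoted above are internally consistent, which can be checked with a few lines of arithmetic. This sketch assumes that position 762 is the first base of the ATG codon and that position 1578 is the first base of the termination codon, so that the coding length is the simple difference; that convention is an inference from the reported 816 bp, not something the text states. The function name is hypothetical.

```python
def orf_summary(start: int, stop: int, mass_da: float) -> dict:
    """Summarize an ORF from the positions of its start and stop codons.

    Assumes `start` is the first base of the initiation codon and `stop`
    is the first base of the termination codon (stop codon not counted).
    """
    length_bp = stop - start
    if length_bp <= 0 or length_bp % 3 != 0:
        raise ValueError("coordinates do not describe a whole number of codons")
    residues = length_bp // 3
    return {
        "length_bp": length_bp,
        "residues": residues,
        # Average mass per residue, a quick sanity check on the predicted mass.
        "mean_residue_mass_da": mass_da / residues,
    }


# Values reported for the R.EcoO109I ORF.
ecoO109IR = orf_summary(start=762, stop=1578, mass_da=31435)
```

The result gives 816 bp and 272 residues, matching the text, with a mean residue mass near 116 Da, which is in the range typical for proteins.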
Nucleotide sequence of the 3,142-bp EcoT22I fragment. The amino acid sequences assigned to ecoO109IR and ORF1 are given below the nucleotide sequence, and the sequence assigned to ecoO109IM is given above the nucleotide sequence. The nucleotide sequence is numbered from the leftmost end, and the amino acid sequences of ecoO109IR and ecoO109IM are numbered from the initiation codon of each gene. The potential ribosome-binding sequences are dotted. A pair of arrows indicates palindromic sequences characteristic of the termination signal.
In the ORF assigned to the methylase gene, a TTG codon appeared at nucleotide position 2863, an ATG codon appeared at nucleotide position 2823, and a termination codon appeared at nucleotide position 1620. The ORF consisted of 1,242 bp and encoded a 414-amino-acid-residue polypeptide. The predicted mass, 45,701 Da, was close enough to the value estimated by SDS-polyacrylamide gel electrophoresis. An appropriate ribosome-binding sequence, AGG, was present 8 bp upstream of the TTG codon. M.EcoO109I exhibits identity with M.SinI (13), M.HgiBI (6), M.HgiCII (7), and M.HgiEI (8), which recognize GG(A/T)CC, and M.Sau96I (32), M.Eco47II (31), and M.PspI (26), which recognize GGNCC. All these methylases catalyze the formation of 5-methylcytosine at the inner cytosine (24).
Sequences flanking the R-M system. Two additional ORFs (ORF1 and ORF2) and one ORF (ORF3) were discovered upstream of the R.EcoO109I and the M.EcoO109I genes, respectively. ORF1 (303 bp) was discovered upstream of the R.EcoO109I gene and partially overlaps the gene. A molecular mass of 11,455 Da is in good agreement with the predicted sizes of other C proteins, which associate with several type II R-M systems and regulate the expression of R-M genes (1, 37). ORF1 appears to be distantly related to known C proteins but shows homology to the DNA-binding domains of various regulatory proteins (18, 36, 38). The role of ORF1 is under investigation.
One ORF (ORF2; 1,284 bp), which showed significant homology to the integrase from the P4 phage (9), was identified upstream of ORF1. Furthermore, BLAST searches of the GenBank database revealed that a DNA sequence similar to that of the P4 phage attachment site followed by the E. coli K-12 MG1655 leuX gene (2) was found upstream of the int gene. The 190-amino-acid polypeptide (ORF3) encoded upstream of the M.EcoO109I gene also exhibited similarity with the psu gene product from the P4 phage. The DNA upstream of M.EcoO109I also contains sequences that exhibit identity with the cos sequence and δ genes from the P4 phage. From these results, we assumed that the DNA of the hybrid P4 phage, in which the cII, β, and gop genes were replaced by R-M genes (including ORF1), was inserted into E. coli H709c chromosomal DNA through site-directed recombination catalyzed by P4 integrase.
In order to confirm that the complete P4 genes, except for the cII, β, and gop genes, were inserted into the E. coli H709c chromosome, we synthesized several oligonucleotides based on the DNA sequence of P4 phage DNA and used them for PCR (Table 1). PCR was carried out with ExTaq DNA polymerase and an LA-PCR kit (Takara Shuzo Co. Ltd., Kyoto, Japan) as recommended by the manufacturer. The lengths of the fragments amplified from E. coli H709c DNA with the δ-N, δ-C, sid-N bottom, and Att oligonucleotides were the same as expected from the nucleotide sequence. The restriction profiles of these fragments were the same as expected. However, the fragments amplified with the δ-C and Att oligonucleotides were 2 kb longer than expected. These results suggested the possibility of rearrangement or insertion of a new sequence into the sid gene. The nucleotide sequence of the 3-kb DNA fragment amplified with the sid-C and sid N-top oligonucleotides was analyzed, and it was shown that one copy of IS21 (25) was inserted between nucleotides 312 and 313 in the top strand and between nucleotides 308 and 309 in the bottom strand of the sid gene, which generated a frameshift mutation of the sid protein. The results are summarized in Fig. 3.
Schematic diagram of the EcoO109I R-M genes on the E. coli H709c chromosome. (A) Gene order of the P4 phage integrated into the E. coli chromosome. (B) Gene order of the E. coli H709c chromosome adjacent to EcoO109I R-M genes. The genes, their directions of transcription, and cos and phage attachment sites (att) are shown. Genes that are necessary and unnecessary for lytic growth of the P4 phage are shown by black and dotted arrows, respectively. The positions where the R-M system is integrated into P4 are indicated by open triangles. R and M represent ecoO109IR and ecoO109IM, respectively. The locations of the PCR primers used for amplification of the genes are also indicated.
Absence of type I R-M genes and a methylation-specific restriction system in E. coli H709c. E. coli has been shown to contain type I R-M systems, such as EcoK, EcoB, and EcoIC, as well as restriction systems requiring modification, such as Mcr and Mrr. In order to determine whether or not E. coli H709c possesses one of those genes for type I R-M systems and systems requiring modification, in addition to the type II EcoO109I R-M system, we amplified these genes by PCR. Primers based on the nucleotide sequence of E. coli K-12 MG1655 were designed to amplify these genes (Table 1). DNA fragments of the expected sizes were amplified from E. coli W3110 DNA but not from E. coli H709c or E. coli XL-1 Blue MRF′ DNA. These results suggested that E. coli H709c did not contain the EcoK, mcrABC, and mrr genes, which are present in E. coli K-12 derivatives.
There have been several reports of the close association between enzymes involved in DNA mobility and R-M systems. A partial P4 phage int gene occurs next to the R.SinI (13), M.EcoHK31I, and M.EaeI genes (17); a partial integrase gene of retronphage φR73 also occurs next to the R.EaeI gene (11, 17); and a gene for the Int family of recombinases occurs 3′ to the M.AccI gene (5). A gene for a DNA invertase-like enzyme is found near the M.PaeR7I (35) gene, and the transposon resolvase gene is located upstream of the R.BglII gene (1). A putative transposase-encoding gene is found in the intergenic area between the R.Eco47I and M.Eco47II genes (31). These enzymes are supposed to facilitate the transfer of the R-M systems among bacterial species.
We found that complete P4 phage genes, other than the cII, β, and gop genes, were lined up in a sequential order adjacent to the EcoO109I R-M system. Furthermore, a DNA sequence similar to that of the P4 phage attachment site, followed by E. coli K-12 MG1655 chromosomal DNA, was found upstream of the int gene. Comparison of the G+C contents of the sequenced regions of the EcoO109I R-M system, including ORF1 and the surrounding system, revealed an interesting feature: although the overall G+C content of P4 phage DNA was 49%, the genes in the EcoO109I R-M system had an average G+C content of 36%. This indicates that the P4 phage was lysogenized in the E. coli H709c chromosome at the P4 attachment site observed in E. coli K-12, in which genes nonessential for lytic growth, i.e., the cII, β, and gop genes, were replaced by EcoO109I R-M genes, including ORF1.
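The G+C comparison above rests on a simple base count. A minimal sketch of that calculation follows; the function name and the short test strings are hypothetical and are not the sequences analyzed in this study. Ambiguous bases such as N are ignored, which is one common convention among several.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is skipped rather than treated as a mismatch.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Hypothetical short sequences, for illustration only.
balanced = gc_content("GGCCAATT")
at_rich = gc_content("ATATATGC")
```

Applied to the deposited sequence and to P4 phage DNA, a function of this kind would be expected to reproduce the roughly 36% versus 49% contrast reported above, although the exact regions compared are those defined by the authors.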
P4 can complete its life cycle only if it infects a cell that already has a helper phage, such as P2, within it or if a helper phage is supplied later. This means that P4 acts as a satellite phage or a parasite. We have found that one copy of IS21 was inserted in the sid gene, which generates a frameshift mutation of the sid protein. The sid gene product is supposed to be responsible for determining the precise size and symmetry of the structure into which the helper P2 gene products will assemble. Inactivation of the P4 sid gene does not necessarily prevent the formation of plaques, and the mutant produces P4 PFU with large P2-sized capsids which contain two or three copies of the mutant P4 genome (30).
P4 can inject its own DNA into E. coli and other gram-negative bacteria, such as Salmonella and Klebsiella (19). When P4 infects a sensitive E. coli host harboring the genome of helper phage P2, it may enter either the lysogenic or lytic pathway, being dependent on all the morphopoietic and lytic functions encoded by the helper to accomplish the latter mode of replication. In the absence of the helper phage, infection of E. coli by P4 may lead to either an immune-integrated condition, analogous to the lysogenic state, or the establishment of the multicopy plasmid mode of maintenance. This property, as well as P4’s genetic organization, suggests that P4 may be considered an episomal element that evolved the ability to exploit a helper bacteriophage for horizontal propagation through a novel specialized transduction mechanism (19). These data are consistent with the following hypothesis for the transfer of the EcoO109I R-M system to the chromosome DNA. First, a hybrid P4, in which the cII, β, and gop genes were replaced by EcoO109I R-M genes, was produced through bacterial recombination. Second, the bacteria carrying the hybrid P4 were infected by a helper phage, such as P2, and thus the P4 phage particles were released. Third, E. coli H709c was infected by the hybrid P4 phage, and the P4 DNA was integrated into the chromosome. Finally, the sid gene was inactivated by the insertion of IS21, and the prophage carrying the EcoO109I R-M system was maintained stably on the chromosome. It is conceivable that migration of the R.EcoO109I gene alone is not unfavorable for the P4 phage because of the lack of a target site of R.EcoO109I in its DNA (9) but is lethal for the host cell. It is quite interesting that the int genes found in four of seven R-M systems were similar to those of P4 and related φR73 phages. Extensive analysis of the nucleotide sequences adjacent to other R-M systems will provide clues to the evolution and migration of the R-M systems in bacteria.
Nucleotide sequence accession number. The GenBank accession number for the DNA sequence of the gene encoding the EcoO109I R-M system is AF157599.
ACKNOWLEDGMENTS
E. coli H709c was obtained from M. Miyahara (Institute of Public Health, Tokyo, Japan).
This work was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas (no. 296) from the Ministry of Education, Science, Sports and Culture, Japan.
FOOTNOTES
- Received 17 June 1999.
- Accepted 13 August 1999.
- Copyright © 1999 American Society for Microbiology