Agmenellum quadruplicatum M.AquI, a novel modification methylase

The complete type II modification methylase of Agmenellum quadruplicatum was cloned in Escherichia coli as an R.Sau3A fragment of approximately 4.5 kilobases. The coding sequence was contained in a stretch of 1,156 base pairs which was organized into two parallel, partly overlapping open reading frames of 248 and 139 codons. In vivo complementation experiments showed that the synthesis of both predicted peptides was required for full methylase activity. The amino acid sequences were considerably similar to regions of other deoxycytidylate methylases.

In procaryotic cells, two fundamentally different restriction-modification (R-M) systems have been found. They can be distinguished by their need for cofactors, as well as by their genetic organization. The R-M complex of the so-called type I systems requires S-adenosylmethionine, ATP, and Mg2+ for its functioning and contains the products of three genes: the first codes for the S unit, which recognizes the specific site; the second directs the synthesis of the R unit, which is responsible for cutting unmethylated DNA; and the last is translated into the M unit, which methylates non- or hemimethylated DNA. The absence of both of the protecting methyl groups at the recognition sites primes the R-M complex for attacking the DNA; the actual cleavage, however, does not occur at the recognition site itself, but somewhere else in the DNA molecule. The type II systems, on the other hand, code for two proteins: an endonuclease needing only Mg2+ for cleavage (which takes place at the recognition site itself) and a modification methylase requiring only S-adenosylmethionine for the protection of DNA. All type II systems characterized at the genetic level so far have been found to be encoded by just two genes: one for the endonuclease and another for the methylase. Only in the case of DpnII have two different methylase genes been discovered (3); both code for a protein that methylates independently of the other.
In the cyanobacterium Agmenellum quadruplicatum, an R-M system that consists of isoschizomers of AvaI (10,13) has been identified. On the basis of the cofactor requirements of both the restriction and the modification components, it was classified as type II and was therefore expected to consist of one methylase gene and one endonuclease gene. In this paper, however, we present data indicating that the methylase requires the expression of two open reading frames (ORFs), designated α and β.

MATERIALS AND METHODS
Strains and media. Since most Escherichia coli strains carry the mcr restriction system, which destroys foreign methylated DNA, only E. coli MC1061 (mcrB mcrA) (4,20) was used for cloning the M.AquI genes. MC1061 was grown in brain heart infusion medium or in Luria-Bertani medium (Oxoid Ltd., Basingstoke, United Kingdom). The antibiotics ampicillin and kanamycin were used at concentrations of 70 and 50 µg/ml, respectively. A wild-type strain of A. quadruplicatum PR-6 (PCC 7002; donated by N. Tandeau de Marsac, Institut Pasteur, Paris, France) was the source of chromosomal DNA; it was grown in medium ASN III (21) buffered with 2 mM N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid (HEPES) (pH 7.8) and supplemented with 100 µg of vitamin B12 per liter. The culture was illuminated with cool-white fluorescent light at 1,000 lx and aerated by bubbling with a stream of 95% air-5% CO2.
Enzymes and chemicals. Restriction endonucleases were purchased from New England BioLabs, Inc. (Beverly, Mass.), Promega Biotec (Leiden, The Netherlands), and Pharmacia (Uppsala, Sweden). R.AquI was isolated as described by Lau and Doolittle (13). Calf intestinal phosphatase was from Boehringer GmbH (Mannheim, Federal Republic of Germany). Phage T7 modified polymerase (Sequenase) was obtained from U.S. Biochemicals (Cleveland, Ohio). All enzymes were used in accordance with the manufacturers' specifications. Oligonucleotides were synthesized on a Cyclone DNA synthesizer (Biosearch Inc., San Rafael, Calif.).
Construction of the colony bank. DNA from A. quadruplicatum (isolated by method 3 of Lambert and Carr [12]) was digested partially with R.Sau3A (15), and the various incubation mixtures were pooled and fractionated on low-melting-point agarose gels. Fragments ranging in size from 2 to 9 kilobases were extracted from the gels and ligated into plasmid pACYC177 (6), which had been cut with R.BamHI and dephosphorylated with calf intestinal phosphatase. A bank of 8,000 colonies was constructed by transforming E. coli MC1061 with the resulting recombinant plasmids. The transformants were pooled, transferred from plates into 750 ml of brain heart infusion medium, and grown for 6 h at 37°C before being harvested by centrifugation. Plasmids were isolated as described previously (15).
Selection. Since the product of a successfully cloned M.AquI gene should render the two AquI sites of pACYC177 resistant to cleavage by R.AquI, a 5-µl sample of the plasmid bank (representing approximately 2.5% of the total yield) was incubated with 50 U of endonuclease R.AquI for 8 h, extracted with phenol, and precipitated with ethanol. The cleavage selection procedure was repeated overnight. The resulting DNA was purified and introduced into E. coli MC1061. The transformants were plated onto agar containing ampicillin and kanamycin and gave rise to approximately 50 colonies. Twelve randomly chosen colonies were investigated further, of which four contained plasmids that were resistant to R.AquI cleavage in vitro. To ensure that we had isolated genuine clones containing the M.AquI gene rather than mutants lacking both AquI sites in pACYC177, we removed the total insert by excising an R.DraII-R.BstEII fragment from all four R.AquI-resistant recombinants. Upon replication in E. coli, the resulting deletion clones were again sensitive to R.AquI, proving that a reversible modification rather than some mutation had caused the original clones to become R.AquI resistant. Judging from their restriction patterns, the R.AquI-resistant clones all had the same insert. One of them, pAQ6, was the basis for all subsequent experiments. Subclones containing just the R.ClaI-R.NsiI fragment inserted into pUC19 or pUC18 (30) (yielding pMAQU and pMAQUANTI, respectively; see Fig. 1) still gave rise to the expression of M.AquI, and the latter clone was investigated further.
Plasmids. All plasmids used in this study were derivatives of pAQ6 (Fig. 1). The vectors used were pUC18 and pUC19 (30) and pACYC177 (6). A more detailed description of the construction of all the plasmids shown in Fig. 1 is given in the Appendix.
DNA sequencing. The DNA sequence of pMAQU was largely determined with the dideoxynucleotide protocol of Sanger et al. (23) on denatured double-stranded plasmid DNA with specific oligonucleotide primers. To confirm the sequence in the area of the two overlapping ORFs (see Results), we sequenced tracts by the chemical degradation method of Maxam and Gilbert (16).
Southern hybridizations. For Southern blotting, 2 µg of chromosomal DNA was digested with the appropriate restriction enzymes, fractionated by electrophoresis, and transferred to nitrocellulose (26). The probe was obtained by purifying the total insert of pMAQUANTI by electrophoresis [...]
DNA amplification by the polymerase chain reaction. Amplification of a DNA fragment consisting of a ribosome-binding site and the start of the α ORF was done by the procedure of Saiki et al. (22). Linearized plasmid pUCα (5 ng) was incubated for 24 cycles with Taq polymerase (Cetus Corp., Norwalk, Conn.), deoxynucleoside triphosphates, and two oligonucleotides of 20 and 42 nucleotides (nt) with the following sequences: 5'-CACGTAGAAACTCAAGTACC-3' and 5'-GGGTACGTAAGGAGGTTGTGATGGAAAAAAAACTGATAAGCC-3'. The 20-mer was complementary to nt 697 to 716 in the α frame (Fig. 2), and the 42-mer consisted of the following elements: nt 1 to 3 (GGG) for stability, nt 4 to 20 (ribosome-binding sequence of the R.SinI gene), and nt 21 to 42 (first codons of the proposed α ORF).
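The composition of the two primers can be checked with a few lines of Python. The sketch below is illustrative only: the sequences are copied from the text, while the function and dictionary key names are hypothetical. It splits the 42-mer at the stated positions and confirms that the last element begins with an ATG initiation codon.

```python
# Primer sequences as printed in the text.
PRIMER_20 = "CACGTAGAAACTCAAGTACC"
PRIMER_42 = "GGGTACGTAAGGAGGTTGTGATGGAAAAAAAACTGATAAGCC"


def split_42mer(primer):
    """Split the 42-mer into the three elements named in the text.

    Positions in the text are 1-based and inclusive; Python slices are
    0-based and end-exclusive, hence the shifted indices.
    """
    return {
        "stability_clamp": primer[0:3],    # nt 1 to 3
        "ribosome_binding": primer[3:20],  # nt 4 to 20
        "orf_start": primer[20:42],        # nt 21 to 42
    }


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


parts = split_42mer(PRIMER_42)
for name, segment in parts.items():
    print(name, segment, len(segment))

# The template strand segment that the 20-mer anneals to.
print("20-mer target:", reverse_complement(PRIMER_20))
```

Under these definitions the three elements have lengths 3, 17, and 22 nt, which add up to the stated 42 nt.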
Isolation of recombinant-encoded M.AquI. E. coli cells harboring plasmid pSDαβ were grown in brain heart infusion medium. Cells were harvested by centrifugation and broken by treatment with lysozyme (10 min at 20°C), followed by sonication (three times for 1 min each time at 4°C). From this crude lysate, M.AquI was isolated as described earlier for a lysate from A. quadruplicatum cells (10).
In vitro transcription-translation. 35S-labeled proteins were obtained with a procaryotic transcription-translation kit (Amersham International, Amersham, United Kingdom), separated on a 17.5% sodium dodecyl sulfate-polyacrylamide gel, and visualized by autoradiography. Proteins to be used for in vitro methylation assays were made with the same kit, but with nonradioactive methionine; 30% of the total yield was used for the reactions as described earlier (10).
Computer programs. For the analysis of sequence data, we made use of the GCG software package distributed by the University of Wisconsin (7). Protein files generated with the aid of the GCG programs from published DNA sequences were compared on an Atari ST microcomputer. To this end, a program was developed to score for similarity with a preset window size and a maximum number of mismatches. Although scoring tables for comparing proteins suggested by the GCG programs and by Schwartz and Dayhoff (24) were also used, the best pictures, i.e., those with the lowest background noise, were obtained by comparing the original amino acid sequences with straight "one-amino-acid-is-one-score" counting.
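The original program is not available, so the following Python sketch only illustrates the kind of comparison described: every window of one sequence is compared with every window of the other, each identical residue counts as one, and a window pair is reported when the number of mismatches does not exceed a preset maximum. The function name, its parameters, and the toy sequences are assumptions made for illustration.

```python
def window_matches(seq_a, seq_b, window, max_mismatches):
    """Return (i, j) start positions of similar windows.

    A pair of windows of length `window`, starting at position i in seq_a
    and j in seq_b (0-based), is reported when it contains no more than
    `max_mismatches` non-identical residues. Identity is the only
    criterion; no substitution table is used.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            mismatches = sum(
                1 for k in range(window) if seq_a[i + k] != seq_b[j + k]
            )
            if mismatches <= max_mismatches:
                hits.append((i, j))
    return hits


# Toy peptides, not taken from any real methylase.
exact = window_matches("ACDEFG", "XXDEFG", window=4, max_mismatches=0)
relaxed = window_matches("ACDEFG", "XXDEFG", window=4, max_mismatches=1)
print(exact)
print(relaxed)
```

Plotting the reported (i, j) pairs as points gives a similarity plot in which conserved regions appear as diagonal runs. Raising the window size or lowering the mismatch limit reduces background noise at the cost of sensitivity.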

RESULTS
Cloning of the genes. A library consisting of A. quadruplicatum DNA in pACYC177 was constructed, and clones with fully functional M.AquI genes were selected (see Materials and Methods). One of the resulting positive clones (pAQ6) was used to investigate the M.AquI function further. A subclone (pMAQUANTI) containing the ClaI-NsiI fragment of pAQ6 (Fig. 1) still had M.AquI activity, as the isolated plasmid proved to be resistant to cleavage by R.AquI. When the insert was trimmed further, neither the plasmid from which the leftmost 1.2 kilobases of the insert had been deleted nor the one lacking the rightmost 1.1 kilobases of the insert produced any detectable M.AquI activity. The complete sequence of the insert in pMAQUANTI was determined (GenBank accession number M28051). This revealed, to our surprise, not one but two parallel, partly overlapping ORFs (Fig. 2). These ORFs were termed α and β. The primary structure of the region in which the first and second ORFs overlapped was confirmed by repeated sequencing by both the dideoxy (with both the Klenow fragment of polymerase I and Sequenase) and chemical degradation methods.
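How two parallel, partly overlapping ORFs show up in a sequence can be illustrated with a minimal frame scan in Python. This is a sketch under simplifying assumptions, not the analysis used in this study: only ATG is treated as a start codon, only the given strand is scanned, and the toy sequence is invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Scan the three forward frames for ATG...stop ORFs.

    Returns tuples (frame, start, end, n_codons): `start` is the 0-based
    index of the ATG, `end` is the index just past the stop codon, and
    `n_codons` counts the sense codons (stop codon excluded).
    """
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    orfs.append((frame, start, pos + 3, n_codons))
                start = None
    return orfs


def overlapping_pairs(orfs):
    """Return pairs of ORFs in different frames whose ranges overlap."""
    pairs = []
    for a in range(len(orfs)):
        for b in range(a + 1, len(orfs)):
            frame_a, start_a, end_a, _ = orfs[a]
            frame_b, start_b, end_b, _ = orfs[b]
            if frame_a != frame_b and start_a < end_b and start_b < end_a:
                pairs.append((orfs[a], orfs[b]))
    return pairs


# Invented 23-nt sequence with one ORF in frame 0 and one in frame 2.
TOY_SEQUENCE = "ATGAAATGCCCGGGTTAACCTGA"
toy_orfs = find_orfs(TOY_SEQUENCE)
print(toy_orfs)
print(overlapping_pairs(toy_orfs))
```

In the toy sequence the second ORF starts inside the first and ends downstream of it, which is the same qualitative arrangement as described for the α and β ORFs.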
To rule out the possibility that part of the ORFs was a fragment incorporated during cloning, we compared the organization of pMAQUANTI with that of chromosomal A. quadruplicatum DNA by Southern blot analysis. The NsiI-ClaI insert of pMAQU was used as a probe against chromosomal DNA cut with various enzymes. The resulting autoradiographs featured the expected bands, showing the organization of pMAQUANTI and that of the A. quadruplicatum chromosome to be identical (data not shown).
When we were satisfied that the unexpected appearance of two ORFs (α and β) was indeed not due to a sequencing or cloning error, we set out to investigate whether the modification function really consists of two polypeptides or whether the product of the two ORFs is a single protein chain, because of, e.g., a ribosome shift or RNA processing.
Complementation experiments. Both the α and β ORFs were cloned separately into two compatible vectors, pACYC177 and pUC18, yielding pα and pβ, respectively. Transformation of E. coli with either the α or the β plasmid alone did not lead to the synthesis of functional M.AquI.
The presence of pα and pβ in the same cell might conceivably lead, via homologous recombination, to the reconstitution of the wild-type sequence. Plasmids arising in this manner would produce active M.AquI but could be distinguished from the input plasmids by their altered sizes. However, Fig. 3 shows no evidence of such recombinational events.
In pα, the expression of the α ORF is ensured, as it is still downstream of the sequence that apparently is active as a promoter in E. coli. In pβ, the β ORF is expressed because of a DNA segment comprising the promoter and 5' codons of the E. coli lacZ gene which are in frame with the residual 3' α ORF codons (the 5' part of α having been deleted from pβ; see Appendix). If in the wild type the two overlapping ORFs encode a single long protein, e.g., because of the ribosome slipping a frame, this process might also act on pβ transcripts and yield a peptide consisting of the normal C-terminal sequence (encoded by the β ORF) preceded by a few foreign residues (encoded by lacZ and the α ORF). If, however, two peptides are synthesized in the wild type, plasmid pβ would be expected to yield the normal β peptide plus a small lac-α fusion peptide.
When pβ was introduced into E. coli together with pα, the resulting transformed cells produced functional M.AquI.
This result implies that the peptides encoded in the α and β ORFs are complementary in the reconstitution of an active enzyme. However, this experiment does not yet provide conclusive evidence for the two-protein hypothesis, since detached N and C termini of proteins are known to complement in some instances (e.g., the lacZ gene [17]).
To address this question, we constructed plasmid p+4β, in which the lac-α fusion frame is interrupted by 4 bases but the β frame is left intact (see Appendix). As an additional control, we constructed pβΔATG, a plasmid lacking the initiation codon of the proposed β gene. In both p+4β and pβΔATG, the ORFs are preceded by the lac promoter. The results of complementation experiments with these plasmids are summarized in Fig. 3. Figure 3A shows the DNA-modifying potential of the primary plasmids, pα and pβ, grown either separately or together. Plasmid p+4β (which carries an intact β ORF) also complemented pα (Fig. 3B). In contrast, plasmid pβΔATG (which lacks the putative initiation codon of the β ORF) could not complement pα. Since in the case of p+4β translation initiated at the lacZ signals should terminate at a stop codon 80 nt upstream from the β start codon, we can conclude that the β ORF is translated independently from its own ribosome-binding sequence. The triplet GGA at position 1119 (Fig. 2) might fulfill this function.
FIG. 4. Proteins obtained by coupled in vitro transcription-translation of several plasmids. Lanes: A, pUC19 (control); B, pβ (β ORF); C, pI19CN (α and β ORFs downstream of their own promoters and of the lac promoter); D, pSDαβ (α and β ORFs downstream of the lac promoter and the ribosome-binding sequence of R.SinI); E, pANTIβ (β ORF antiparallel to other genes of pUC18). The sizes of marker proteins run alongside lanes A to E are given in kilodaltons. The proteins indicated are β-lactamase (Ap), the product of the α ORF (α), and the product of the β ORF (β).
To investigate whether an independent promoter regulates the β ORF, we constructed a pUC18 derivative containing the β ORF with approximately 150 bases 5' to the ATG and in which the transcription of β was antiparallel to that of the other genes of pUC18 (pANTIβ). The results of cotransformation with pANTIβ and pα are shown in Fig. 3. The β ORF was apparently expressed in E. coli to an extent sufficient to complement the α ORF. This result is also shown in Fig. 4, lane E, which represents the fractionation of the products resulting from in vitro transcription-translation of plasmid pANTIβ (see below).
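The logic of the complementation experiments can be summarized in a small Python sketch. It encodes, as a deliberate simplification, which plasmids carry a complete α ORF and which carry a β ORF with its own initiation codon, and compares two hypothetical decision rules with the outcomes reported in the text. The dictionary layout and function names are assumptions made for illustration; the single-chain rule is evaluated only for pβ and p+4β, the two plasmids for which the text states the condition of the lac-α fusion frame.

```python
# Features of each plasmid as described in the text.
# alpha: complete α ORF present; beta_atg: β ORF retains its own ATG.
FEATURES = {
    "pα": {"alpha": True, "beta_atg": False},
    "pβ": {"alpha": False, "beta_atg": True},
    "p+4β": {"alpha": False, "beta_atg": True},
    "pβΔATG": {"alpha": False, "beta_atg": False},
    "pANTIβ": {"alpha": False, "beta_atg": True},
}

# Reported M.AquI activity for each plasmid combination.
OBSERVED = {
    ("pα",): False,
    ("pβ",): False,
    ("pα", "pβ"): True,
    ("pα", "p+4β"): True,
    ("pα", "pβΔATG"): False,
    ("pα", "pANTIβ"): True,
}

# Whether the lac-α fusion frame reads through to the β region,
# listed only where the text states it.
LAC_ALPHA_FRAME_INTACT = {"pβ": True, "p+4β": False}


def two_peptide_prediction(combo):
    """Active if α is supplied and β is initiated from its own ATG."""
    has_alpha = any(FEATURES[p]["alpha"] for p in combo)
    has_beta = any(FEATURES[p]["beta_atg"] for p in combo)
    return has_alpha and has_beta


def single_chain_prediction(combo):
    """Active if α is supplied and the β region is reached only by
    ribosomes continuing from an intact upstream fusion frame."""
    has_alpha = any(FEATURES[p]["alpha"] for p in combo)
    has_beta = any(LAC_ALPHA_FRAME_INTACT.get(p, False) for p in combo)
    return has_alpha and has_beta


for combo, observed in OBSERVED.items():
    print(combo, "observed:", observed,
          "two-peptide:", two_peptide_prediction(combo))

for combo in [("pα", "pβ"), ("pα", "p+4β")]:
    print(combo, "single-chain:", single_chain_prediction(combo))
```

Under these assumptions the two-peptide rule agrees with every reported outcome, whereas the single-chain rule fails for the combination of pα with p+4β, which was active although the fusion frame is interrupted.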
Proteins. To prove that the predicted proteins were actually synthesized, we subjected various plasmids to an in vitro transcription-translation system. Although the α unit apparently was expressed in vivo, as demonstrated in the complementation experiments, no corresponding protein was detected in the first in vitro assay (Fig. 4, lane C). A polypeptide with the expected molecular mass of the β unit, however, was abundantly synthesized when plasmids with the β ORF were used in this in vitro system (Fig. 4, lane D).
Even when the α ORF was cloned downstream of the lac (pMAQU) or ampicillin (pα; not shown) promoter, no protein of the expected size was observed. Furthermore, when a mixture of proteins synthesized in vitro was incubated with lambda DNA, the DNA was not protected against R.AquI restriction, suggesting that the amount of functional protein synthesized in vitro was at best very low. As our inability to demonstrate the α gene product might be ascribed to a low rate of initiation of protein synthesis, rather than to an insufficient amount of mRNA, we set out to put the ORFs in their native configuration under the control of an active promoter and an optimized ribosome-binding sequence.
To this end, plasmid pSDαβ was constructed (see Appendix). This plasmid contains the lac promoter, an optimized ribosome-binding sequence (of the R.SinI gene, because of its identity to the consensus E. coli sequence [9]), and the complete overlapping α and β frames in their wild-type configuration. In vivo this construct rendered DNA resistant to R.AquI, as assayed with DNA extracted from cells harboring the plasmid, indicating that active M.AquI was synthesized. When transcribed and translated in vitro, the plasmid yielded not only the β protein but also a protein with the molecular mass expected for the α protein (Fig. 4, lane D). No protein representing the combined α and β ORFs was observed. From cells harboring pSDαβ, M.AquI was isolated and assayed in vitro. The yield was approximately 750 U of M.AquI per g of cell material, on the same order as that in A. quadruplicatum cells (unpublished observations). One unit was defined as the amount of enzyme required to render 1 µg of lambda DNA resistant to cleavage by R.AquI after 1 h of incubation.
Similarity between M.AquI and other deoxycytidylate methylases. The predicted amino acid sequences of the M.AquI peptides were compared with those of some other deoxycytidylate methylases: M.BspRI (GGCC) (19) [...] similarities to the α- as well as the β-encoded polypeptides. The similarity between M.AquI and M.SinI was striking even at the DNA level (Fig. 5, part IX). In combination with the protein similarity plots, the DNA-DNA comparison clearly showed that the conserved protein sequences are in fact encoded by conserved DNA sequences.
The α and β ORFs overlap in the middle of the normally poorly conserved region in which, in other modification methylases, the specific recognition site of the protein is thought to be encoded (1). This could mean either that the whole function occurs on just one of the proteins or that the α- and β-encoded proteins fit together perfectly when active. Either of these possibilities would throw new light on the mechanisms of recognition and binding of methylases.
The similarity between the two subunits of M AquI and all other dCMP methylases indicates that they are evolutionarily related even if M AquI is encoded by two ORFs. Since the group of cyanobacteria is very old, an alternative explanation might be that M AquI represents an ancient link between type I and type II R-M systems. It is tempting to speculate that, in analogy with the type I systems, one of the encoded proteins is also functional as an S unit in the restriction reaction.
The original clone, pAQ6, resulted in no restriction activity when protein fractions were assayed in vitro. The region 5' to the α unit contained, as far as sequenced, no large ORFs. However, 3' to the β unit there were two ORFs that were, because of their close proximity to the M.AquI genes, both interesting candidates for encoding the restriction enzyme. Unfortunately, restriction endonucleases in general are so dissimilar at the protein level that sequence comparisons with other endonucleases do not provide a clue.

APPENDIX
Plasmids pMAQUANTI and pMAQU, both expressing M.AquI, were made by inserting the ClaI-NsiI fragment of pAQ6 into pUC18 and pUC19, respectively. As described in Results, these plasmids contained two ORFs (α and β), which were both found to be required for M.AquI activity. To prove the functionality of both ORFs, we prepared a number of derivatives in which either ORF was removed. From pMAQUANTI a further clone (pUCα, in which only the α ORF remained intact) was made by deleting the internal XbaI fragment. A plasmid containing just the β ORF (pβ) was made by inserting the ClaI-BglII fragment of pAQ6 into pUC18. The β ORF in pβ was positioned in such a way that it was downstream of the lac promoter in pUC18. The β ORF was overlapped at its 5' end by a hybrid frame consisting of the N-terminal 12 codons of lac (as in the polylinker region of pUC18) fused in frame to the C-terminal 46 codons of the proposed α ORF. From pβ two further constructs were derived: one (p+4β) in which the remaining 3' codons of the α ORF were perturbed by filling in the Asp718I site, leaving β intact, and the other (pβΔATG) in which the first ATG of the β region was deleted (deletion of a KpnI-XbaI fragment). A plasmid containing the β gene antiparallel to all other genes present (pANTIβ) was made by deleting the BglII-HindIII fragment from pMAQUANTI.
The α unit was cloned separately by inserting the PvuII-XbaI fragment of pMAQUANTI into the ScaI site of pACYC177, yielding plasmid pα. To improve the efficiency of in vitro expression, the α ORF was placed behind a strong ribosome-binding sequence by amplifying (see Materials and Methods) a segment of plasmid pUCα by the polymerase chain reaction technique. The fragment so generated was purified by gel electrophoresis, cut with R.BsmI, and ligated into pUCα opened at the blunted SphI site and the BsmI site. This newly constructed clone (pSDα) had the α ORF behind a consensus ribosome-binding site but still lacked a promoter sequence. From pSDα the HindIII-AccIII fragment was cut, isolated, and exchanged for the HindIII-AccIII fragment of pMAQU, thus yielding pSDαβ. In the resulting construct, both genes were under the control of the lac promoter, and the α ORF had a good ribosome-binding site 4 base pairs upstream of its initiator ATG.