Journal of Bacteriology, August 2003, p. 4997-5002, Vol. 185, No. 16
0021-9193/03/$08.00+0     DOI: 10.1128/JB.185.16.4997-5002.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.

Identification of Long Intergenic Repeat Sequences Associated with DNA Methylation Sites in Caulobacter crescentus and Other α-Proteobacteria

Swaine L. Chen and Lucy Shapiro*

Department of Developmental Biology, Stanford University School of Medicine, Stanford, California 94304-5329

Received 18 March 2003 / Accepted 14 May 2003


    ABSTRACT
 
A systematic search for motifs associated with CcrM DNA methylation sites revealed four long (>100-bp) motifs (CIR sequences) present in up to 21 copies in Caulobacter crescentus. The CIR1 and CIR2 motifs exhibit a conserved inverted repeat organization, with a CcrM site in the center of one of the repeats.


    TEXT
 
Methylation of DNA performs key functions in eukaryotic and prokaryotic cells. Bacterial adenine DNA methylation usually occurs in restriction-modification systems, which differentiate between self and non-self DNA (26). Two prominent bacterial methyltransferases, however, are not part of restriction-modification systems: Dam in Escherichia coli and other γ-proteobacteria (4, 6) and CcrM in Caulobacter crescentus and other α-proteobacteria (22). Dam and CcrM regulate gene transcription and the timing of DNA replication initiation and can be important for virulence (7, 8, 20).

Dam is not essential in E. coli. Regulation of transcription by Dam methylation in E. coli requires sequences in addition to the GATC methylation site. Two well-studied examples are phase variation in the pyelonephritis-associated pili (pap) operon (9, 25) and the outer membrane protein antigen 43 promoter (9). In both cases, regulation depends on specific Dam methylation sites, which are distinguished by their surrounding sequence.

In contrast, CcrM is an essential gene in the α-proteobacteria C. crescentus (22), Brucella abortus (20), Sinorhizobium meliloti, and Agrobacterium tumefaciens (12). DNA methylation in C. crescentus regulates transcription in the promoter for ccrM itself (23) and the P1 promoter of ctrA, a global transcriptional regulator (19). Therefore, we sought to determine whether the CcrM recognition site, GANTC, is associated with conserved motifs. We identified four large (>100-bp) intergenic motifs in C. crescentus that contain conserved CcrM sites. Two of these motifs and several other motifs in other α-proteobacteria share three features: (i) they are composed of two inverted repeats; (ii) a CcrM site is in the center of one of the inverted repeats; and (iii) a conserved central linker joins the two inverted repeats. These novel motifs in α-proteobacteria may mediate regulatory functions of CcrM.

Genome sequences were downloaded from GenBank (ftp://ftp.ncbi.nih.gov/GenBank/genomes/Bacteria) and processed with the Genome-Tools package (http://genome-tools.sourceforge.net) (13). Sequence alignments were done with the CLUSTALW 1.82 software program (24) and BLAST (1). Consensus RNA secondary structures were predicted by using ConStruct 2.0 (14, 15), which uses the RNAfold 1.4 algorithm (10, 16, 28). Default settings were used for CLUSTALW and ConStruct.

We examined 15 bp of sequence centered on each CcrM site (5 bp upstream and downstream of each GANTC) in C. crescentus. Excluding those which were associated with known transposases or insertion elements (17), four 15-mers occurred more than four times in intergenic sequences (Table 1; also shown are results for other α-proteobacteria). Sequence conservation around each of these 15-mers extended to over 100 bp (for alignments, see supplementary materials at http://caulobacter.stanford.edu/CIR). Using BLASTN to identify matches to each long motif, we found that only one or two matches do not contain CcrM sites. These long conserved motifs are therefore called Caulobacter CcrM-associated intergenic repeat 1 (CIR1) to CIR4. Two of these motifs, Caulobacter CIR1 and CIR2 (present in 21 and 16 copies, respectively) (Fig. 1A and B), appear to be conserved in other bacteria; only these two motifs in C. crescentus and related motifs in other α-proteobacteria are discussed below.
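The 15-mer search described above can be illustrated with a minimal Python sketch. This is not the Genome-Tools code used in the study; the function names (`ccrm_15mers`, `repeated_15mers`) and the input format (a list of intergenic sequences as plain strings) are assumptions made for illustration. The sketch scans one strand only and does not filter out transposase- or insertion element-associated sequences.

```python
import re
from collections import Counter

# GANTC, wrapped in a lookahead so that overlapping sites are all reported.
CCRM_SITE = re.compile(r"(?=GA[ACGT]TC)")


def ccrm_15mers(seq, flank=5):
    """Yield each window of flank + 5 + flank bases centered on a GANTC site.

    Sites closer than `flank` bases to either end of the sequence are skipped
    because a full window cannot be extracted there.
    """
    seq = seq.upper()
    for match in CCRM_SITE.finditer(seq):
        start = match.start() - flank
        end = match.start() + 5 + flank
        if start >= 0 and end <= len(seq):
            yield seq[start:end]


def repeated_15mers(intergenic_seqs, min_count=5):
    """Count site-centered 15-mers and keep those seen at least min_count times.

    min_count=5 mirrors "more than four times" in the text.
    """
    counts = Counter()
    for seq in intergenic_seqs:
        counts.update(ccrm_15mers(seq))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}
```

Because GANTC is its own reverse complement, every site also yields a 15-mer on the opposite strand; a fuller version would probably merge each 15-mer with its reverse complement before counting.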


TABLE 1. List of repeated 15-mers centered on CcrM sites in α-proteobacteria

 


FIG. 1. DNA sequence alignments for Caulobacter and Brucella CIR1 and CIR2 sequences. Sequences were identified by BLASTN on the entire genome sequence, and full (not truncated) matches were identified manually. Alignments are shown for Caulobacter CIR1 (A), Caulobacter CIR2 (B), Brucella CIR1 (C), and Brucella CIR2 (D). Nucleotides are color coded, with A in red, C in blue, G in black, and T in green. Sequences are annotated on the left with the chromosomal coordinate of the first (leftmost) base shown and on the right with the length of sequence shown. Negative coordinates indicate sequences that have been reversed and complemented. In panels C and D, an "I" indicates the sequence is from chromosome I, and an "II" indicates the sequence is from chromosome II. Asterisks above the sequences indicate strictly conserved bases. The gray bars at the bottom of the alignments indicate the level of conservation, with the tallest bars meaning strict conservation in all sequences and no bar meaning no conservation. The location of the conserved CcrM site is highlighted with a black box. Arrows in panel A highlight a hybrid CIR1/CIR2 sequence.

 
The local gene organization around each Caulobacter CIR1 and CIR2 sequence is shown in Table 2. CIR1 and CIR2 are often shortly downstream of flanking open reading frames (ORFs) (Fig. 2); the stop codon is often the beginning of the CIR1 or CIR2 consensus sequence (a distance of 1 in Table 2). In these cases, CIR1 and CIR2 have not truncated the flanking ORFs (based on BLASTP [1] compared to the GenBank [5] nonredundant database, only 1 of 26 ORFs whose stop codon is supplied by a CIR1 or CIR2 sequence is truncated). The identities of the flanking ORFs suggest no function for the CIR sequences.


TABLE 2. List of ORFs flanking CIR1 and CIR2 sequences in Caulobacter

 


FIG. 2. Local genomic organization around repeated intergenic sequences. The orientation of genes flanking IRU/ERIC sequences in E. coli and CIR1 and CIR2 sequences in C. crescentus ("C.") and B. melitensis ("B.") is summarized. The vertical bar indicates the position of the intergenic repeat sequence. The term "overlap with ORF" means that the intergenic repeat sequence extends into the coding sequence of at least one of the flanking ORFs.

 
Because Caulobacter CIR1 and CIR2 are close to flanking genes, we expect them to be at least partially transcribed. Both motifs are composed of two 52-bp inverted repeat sequences (arms) separated by a 12-bp linker and thus, if transcribed, are predicted to form two long stem-loops joined by the linker (Fig. 3A). Of 38 differences between CIR1 and CIR2, 20 are compensatory changes preserving potential base pairing. The linkers are conserved and nonpalindromic, allowing CIR1 and CIR2 to be oriented. The CcrM site is in the middle of one of the arms (blue circles in Fig. 3A). The presence of exactly one CcrM site seems important: only 2 of 37 CIR1 and CIR2 sequences have CcrM sites in both arms. Additionally, the arms (each individually inverted repeats) are nearly inverted repeats of each other, but one arm contains a single difference which destroys what would otherwise be a complementary GANTC site.
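The observation that only one arm carries a CcrM site can be checked with a short Python sketch. It assumes a motif copy with exactly the arm-linker-arm layout given above (52 + 12 + 52 bp) and no insertions or deletions, which will not hold for every genomic copy; the helper names are hypothetical and are not taken from the study.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# GANTC, wrapped in a lookahead so that overlapping sites are all counted.
CCRM_SITE = re.compile(r"(?=GA[ACGT]TC)")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_motif(seq, arm=52, linker=12):
    """Split a motif into (left arm, linker, right arm) by fixed lengths."""
    if len(seq) != 2 * arm + linker:
        raise ValueError("sequence length does not match arm + linker + arm")
    return seq[:arm], seq[arm:arm + linker], seq[arm + linker:]


def ccrm_sites_per_arm(seq, arm=52, linker=12):
    """Return the number of GANTC sites in the left and right arms."""
    left, _, right = split_motif(seq.upper(), arm, linker)
    return len(CCRM_SITE.findall(left)), len(CCRM_SITE.findall(right))
```

Since the reverse complement of any GANTC is again a GANTC (for example, `reverse_complement("GACTC")` gives `"GAGTC"`), two perfectly complementary arms would both contain a site; a single substitution within the five bases is enough to remove it from one arm.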



FIG. 3. Predicted consensus RNA secondary structures for putatively transcribed Caulobacter (A) and Brucella (B) CIR1 and CIR2 motifs. Structures were predicted based on combined alignments of CIR1 and CIR2 motifs in each bacterium. Colored lines connecting paired bases indicate the probabilities of base pairing as follows: red, high probability; magenta, intermediate probability; and blue, low probability. The location of the potentially transcribed GANTC site is circled in light blue. The sequence shown in panel A corresponds to the one labeled 539421 in Fig. 1A; the sequence in panel B corresponds to the one labeled 68730-I in Fig. 1C.

 
Two 110-bp motifs in Brucella melitensis (Brucella CIR1 and CIR2, present in 39 and 35 copies, respectively) are strikingly similar to the Caulobacter CIR1 and CIR2 motifs (Fig. 1C and D). The Brucella CIR1 and CIR2 motifs (i) are composed of two inverted repeat arms joined by a central linker (Fig. 3B), (ii) have a CcrM site in the center of one of the inverted repeats, (iii) have a conserved central linker, and (iv) sometimes provide stop codons for flanking ORFs (data not shown). The Brucella CIR1 motif is also often downstream of flanking ORFs (Fig. 2). These ORFs are not related to the flanking ORFs in Caulobacter (finding the best BLAST hit in C. crescentus of the 137 ORFs flanking Brucella CIR1 and CIR2 sequences results in only two ORFs flanking Caulobacter CIR1 or CIR2 sequences; by random chance, one would expect to find three). Thus, flanking ORFs again provide no suggestions for CIR functions.

Potentially related CIR motifs in other α-proteobacteria are diagrammed in Fig. 4 (for full sequences and alignments, see supplementary materials). The Mesorhizobium CIR1 motif is shorter than those in Caulobacter and Brucella, and the central linker is different. However, it is also composed of two inverted repeats (arms) with a conserved CcrM site in the center of one arm. The Sinorhizobium CIR1 is composed of two inverted repeats, but the conserved CcrM site is within the central linker, whose sequence differs from the Caulobacter and Brucella linkers. However, two motifs previously identified in S. meliloti, RIME1 and RIME2 (for Rhizobium-specific intergenic mosaic elements 1 and 2) (18), also have two inverted repeat arms joined by a central linker. The linker sequence in RIME1 is similar to the Caulobacter and Brucella CIR1 and CIR2 linker, but RIME1 has no conserved CcrM site in its arms. The lack of conserved CcrM sites in RIME1 and RIME2 explains why these sequences were not found by our searches. We found only a previously identified 440-bp motif associated with CcrM sites in Rickettsia prowazekii, with no resemblance to other CIR sequences. Notably, R. prowazekii lacks a CcrM homolog.



FIG. 4. Schematic diagram of CIR and related sequences in α-proteobacteria. Boxes with the same color and arrow markings represent sequences conserved between different CIRs. Half arrows pointing in opposite directions indicate complementary sequences that may form stem-loop secondary structures if transcribed. Conserved CcrM sites are indicated by a light blue circle. The central linker in red (orientation indicated by the full arrow is arbitrary) is conserved between Caulobacter CIR1 and CIR2, Brucella CIR1 and CIR2, and Sinorhizobium RIME1, but not the other sequences. CcrM sites are conserved at the loop within arms in the Caulobacter CIR1 and CIR2 motifs, the Brucella CIR1 and CIR2 motifs, and the Mesorhizobium CIR1 motif.

 
A similar search in E. coli for Dam-associated motifs yielded only three 14-mers (the Dam recognition site is 4 bp instead of 5 bp). These were associated with the IS5 transposase, the 23S rRNA gene cluster, and an Rhs element (for "rearrangement hot spot," a large, protein-coding repeat element) (27). Accordingly, no previously identified repeated intergenic sequence in E. coli K-12 is associated with Dam sites. The Caulobacter and Brucella CIR1 and CIR2 motifs resemble IRU/ERIC sequences in E. coli (11, 21). IRU/ERIC sequences are ~120 bp long, highly conserved, palindromic, and present in similar numbers. IRU/ERIC sequences were also found by sequence analysis; they are transcribed and have detectable transcriptional termination activity. However, gene regulation is probably not their primary role because this does not explain their extensive conservation (2, 11, 21). By a similar argument, then, gene regulation is likely not the primary function of the CIR sequences.

The IRU/ERIC sequences differ from CIR1 and CIR2 in important ways, however. IRU/ERIC sequences have no consensus methylation sites, appear usually between genes in an operon (Fig. 2), and have a single conserved stem-loop in their predicted RNA secondary structure (11, 21). No other previously identified repeated intergenic sequences outside of α-proteobacteria are analogous to the Caulobacter and Brucella CIR1 and CIR2 motifs; these CIR motifs are thus a new class of repeated intergenic sequences.

Like repeated intergenic sequences in other bacteria, the function of the CIR motifs is unknown. The association with methylation sites is novel, suggesting that understanding them may shed light on the functions of CcrM methylation. Their predilection for the end of genes suggests involvement in gene regulation, but they are not similar to known transcriptional terminators, and this would not explain their conservation. Their high conservation suggests a maintenance process, such as gene conversion (as has been postulated for the IRU/ERIC sequences). The GC content of the Caulobacter CIR1 and CIR2 sequences is 44.8% ± 6.3% (all other intergenic sequences are 64.8% ± 11.5%), which suggests a foreign origin. However, they are not similar to known transposases or insertion elements. Furthermore, these sequences may be modular, since there is one hybrid Caulobacter CIR1/CIR2 sequence (arrows in Fig. 1A; Fig. 4), and several other CIR sequences seem to have variants based on different arm sequences (see supplementary materials). Since repeated sequences seem to be found ubiquitously in intergenic sequences in all organisms (3), further characterization of CIR motifs and other intergenic sequences, both upstream and downstream of genes, is essential for understanding genome function and evolution.
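A GC-content comparison of the kind quoted above can be sketched in Python as follows. The text does not state how the ± values were computed; the sketch assumes they are the mean and sample standard deviation of per-sequence GC percentages, and the function names are hypothetical.

```python
from statistics import mean, stdev


def gc_percent(seq):
    """Return the GC content of one sequence as a percentage of its A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_summary(seqs):
    """Return (mean, sample standard deviation) of per-sequence GC percentages.

    At least two sequences are required for the standard deviation.
    """
    values = [gc_percent(s) for s in seqs]
    return mean(values), stdev(values)
```

Calling `gc_summary` once on the CIR1 and CIR2 copies and once on the remaining intergenic sequences would give two (mean, SD) pairs to compare, under the stated assumption about what the ± denotes.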


    ACKNOWLEDGMENTS
 
This work was supported by National Institutes of Health grant GM51426 and NIH grant 2T32GM07365 to the Medical Scientist Training Program (S.L.C.).


    FOOTNOTES
 
* Corresponding author. Mailing address: Department of Developmental Biology, Stanford University School of Medicine, Beckman Center, B300, 279 Campus Dr., Stanford, CA 94304-5329. Phone: (650) 725-7678. Fax: (650) 725-7739. E-mail: shapiro@cmgm.stanford.edu.


    REFERENCES
 

  1. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.
  2. Bachellier, S., E. Gilson, M. Hofnung, and C. W. Hill. 1996. Repeated sequences, p. 2012-2040. In F. C. Neidhardt, R. Curtiss III, J. L. Ingraham, E. C. C. Lin, K. B. Low, B. Magasanik, W. S. Reznikoff, M. Riley, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella: cellular and molecular biology, 2nd ed., vol. 2. ASM Press, Washington, D.C.
  3. Bao, Z., and S. R. Eddy. 2002. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12:1269-1276.
  4. Barbeyron, T., K. Kean, and P. Forterre. 1984. DNA adenine methylation of GATC sequences appeared recently in the Escherichia coli lineage. J. Bacteriol. 160:586-590.
  5. Benson, D. A., I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, B. A. Rapp, and D. L. Wheeler. 2000. GenBank. Nucleic Acids Res. 28:15-18.
  6. Brooks, J. E., R. M. Blumenthal, and T. R. Gingeras. 1983. The isolation and characterization of the Escherichia coli DNA adenine methylase (dam) gene. Nucleic Acids Res. 11:837-851.
  7. Garcia-Del Portillo, F., M. G. Pucciarelli, and J. Casadesus. 1999. DNA adenine methylase mutants of Salmonella typhimurium show defects in protein secretion, cell invasion, and M cell cytotoxicity. Proc. Natl. Acad. Sci. USA 96:11578-11583.
  8. Heithoff, D. M., R. L. Sinsheimer, D. A. Low, and M. J. Mahan. 1999. An essential role for DNA adenine methylation in bacterial virulence. Science 284:967-970.
  9. Henderson, I. R., P. Owen, and J. P. Nataro. 1999. Molecular switches: the ON and OFF of bacterial phase variation. Mol. Microbiol. 33:919-932.
  10. Hofacker, I. L., W. Fontana, P. F. Stadler, S. Bonhoeffer, M. Tacker, and P. Schuster. 1994. Fast folding and comparison of RNA secondary structures. Monatsh Chem. 125:167-188.
  11. Hulton, C. S., C. F. Higgins, and P. M. Sharp. 1991. ERIC sequences: a novel family of repetitive elements in the genomes of Escherichia coli, Salmonella typhimurium, and other enterobacteria. Mol. Microbiol. 5:825-834.
  12. Kahng, L. S., and L. Shapiro. 2001. The CcrM DNA methyltransferase of Agrobacterium tumefaciens is essential, and its activity is cell cycle regulated. J. Bacteriol. 183:3065-3075.
  13. Lee, W., and S. L. Chen. 2002. Genome-Tools: a flexible package for genome sequence analysis. BioTechniques 33:1334-1341.
  14. Luck, R., S. Graf, and G. Steger. 1999. ConStruct: a tool for thermodynamic controlled prediction of conserved secondary structure. Nucleic Acids Res. 27:4208-4217.
  15. Luck, R., G. Steger, and D. Riesner. 1996. Thermodynamic prediction of conserved secondary structure: application to the RRE element of HIV, the tRNA-like element of CMV and the mRNA of prion protein. J. Mol. Biol. 258:813-826.
  16. McCaskill, J. S. 1990. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29:1105-1119.
  17. Nierman, W. C., T. V. Feldblyum, M. T. Laub, I. T. Paulsen, K. E. Nelson, J. A. Eisen, J. F. Heidelberg, M. R. Alley, N. Ohta, J. R. Maddock, I. Potocka, W. C. Nelson, A. Newton, C. Stephens, N. D. Phadke, B. Ely, R. T. DeBoy, R. J. Dodson, A. S. Durkin, M. L. Gwinn, D. H. Haft, J. F. Kolonay, J. Smit, M. B. Craven, H. Khouri, J. Shetty, K. Berry, T. Utterback, K. Tran, A. Wolf, J. Vamathevan, M. Ermolaeva, O. White, S. L. Salzberg, J. C. Venter, L. Shapiro, C. M. Fraser, and J. Eisen. 2001. Complete genome sequence of Caulobacter crescentus. Proc. Natl. Acad. Sci. USA 98:4136-4141.
  18. Osteras, M., J. Stanley, and T. M. Finan. 1995. Identification of Rhizobium-specific intergenic mosaic elements within an essential two-component regulatory system of Rhizobium species. J. Bacteriol. 177:5485-5494.
  19. Reisenauer, A., and L. Shapiro. 2002. DNA methylation affects the cell cycle transcription of the CtrA global regulator in Caulobacter. EMBO J. 21:4969-4977.
  20. Robertson, G. T., A. Reisenauer, R. Wright, R. B. Jensen, A. Jensen, L. Shapiro, and R. M. Roop II. 2000. The Brucella abortus CcrM DNA methyltransferase is essential for viability, and its overexpression attenuates intracellular replication in murine macrophages. J. Bacteriol. 182:3482-3489.
  21. Sharples, G. J., and R. G. Lloyd. 1990. A novel repeated DNA sequence located in the intergenic regions of bacterial chromosomes. Nucleic Acids Res. 18:6503-6508.
  22. Stephens, C., A. Reisenauer, R. Wright, and L. Shapiro. 1996. A cell cycle-regulated bacterial DNA methyltransferase is essential for viability. Proc. Natl. Acad. Sci. USA 93:1210-1214.
  23. Stephens, C. M., G. Zweiger, and L. Shapiro. 1995. Coordinate cell cycle control of a Caulobacter DNA methyltransferase and the flagellar genetic hierarchy. J. Bacteriol. 177:1662-1669.
  24. Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
  25. van der Woude, M., B. Braaten, and D. Low. 1996. Epigenetic phase variation of the pap operon in Escherichia coli. Trends Microbiol. 4:5-9.
  26. Wilson, G. G. 1988. Cloned restriction-modification systems: a review. Gene 74:281-289.
  27. Zhao, S., C. H. Sandt, G. Feulner, D. A. Vlazny, J. A. Gray, and C. W. Hill. 1993. Rhs elements of Escherichia coli K-12: complex composites of shared and unique components that have different evolutionary histories. J. Bacteriol. 175:2799-2808.
  28. Zuker, M., and P. Stiegler. 1981. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9:133-148.

