Journal of Bacteriology, August 2003, p. 4997-5002, Vol. 185, No. 16
0021-9193/03/$08.00+0 DOI: 10.1128/JB.185.16.4997-5002.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
α-Proteobacteria
Department of Developmental Biology, Stanford University School of Medicine, Stanford, California 94304-5329
Received 18 March 2003/ Accepted 14 May 2003
γ-proteobacteria (4, 6) and CcrM in Caulobacter crescentus and other α-proteobacteria (22). Dam and CcrM regulate gene transcription and the timing of DNA replication initiation and can be important for virulence (7, 8, 20). Dam is not essential in E. coli. Regulation of transcription by Dam methylation in E. coli requires sequences in addition to the GATC methylation site. Two well-studied examples are phase variation in the pyelonephritis-associated pili (pap) operon (9, 25) and the outer membrane protein antigen 43 promoter (9). In both cases, regulation depends on specific Dam methylation sites, which are distinguished by their surrounding sequence.
In contrast, CcrM is an essential gene in the α-proteobacteria C. crescentus (22), Brucella abortus (20), Sinorhizobium meliloti, and Agrobacterium tumefaciens (12). DNA methylation in C. crescentus regulates transcription in the promoter for ccrM itself (23) and the P1 promoter of ctrA, a global transcriptional regulator (19). Therefore, we sought to determine whether the CcrM recognition site, GANTC, is associated with conserved motifs. We identified four large (>100-bp) intergenic motifs in C. crescentus that contain conserved CcrM sites. Two of these motifs and several other motifs in other α-proteobacteria share three features: (i) they are composed of two inverted repeats; (ii) a CcrM site is in the center of one of the inverted repeats; and (iii) a conserved central linker joins the two inverted repeats. These novel motifs in α-proteobacteria may mediate regulatory functions of CcrM.
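Features (i) and (ii) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the procedure used in this study: the function names are hypothetical, an "arm" is treated as a single inverted repeat whose two halves can base pair, and the score is simply the fraction of mirrored positions that form Watson-Crick pairs.

```python
import re

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# CcrM recognition site GANTC, where N is any base.
CCRM_SITE = re.compile(r"GA[ACGT]TC")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def paired_fraction(arm):
    """Fraction of mirrored positions in `arm` that are Watson-Crick pairs.

    Position i is compared with position len(arm) - 1 - i. A value of 1.0
    means the arm is a perfect inverted repeat (hypothetical score).
    """
    arm = arm.upper()
    half = len(arm) // 2
    if half == 0:
        raise ValueError("arm must contain at least two bases")
    rc = reverse_complement(arm)
    # arm[i] == rc[i] exactly when arm[i] pairs with arm[len(arm) - 1 - i].
    matches = sum(1 for i in range(half) if arm[i] == rc[i])
    return matches / half


def has_central_ccrm_site(arm):
    """True if the 5 bases at the center of `arm` match GANTC."""
    arm = arm.upper()
    mid = len(arm) // 2
    return CCRM_SITE.fullmatch(arm[mid - 2:mid + 3]) is not None
```

Because GANTC is its own reverse complement apart from the central N, a CcrM site at the center of an arm is compatible with the arm being an inverted repeat.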
Genome sequences were downloaded from GenBank (ftp://ftp.ncbi.nih.gov/GenBank/genomes/Bacteria) and processed with the Genome-Tools package (http://genome-tools.sourceforge.net) (13). Sequence alignments were done with the CLUSTALW 1.82 software program (24) and BLAST (1). Consensus RNA secondary structures were predicted by using ConStruct 2.0 (14, 15), which uses the RNAfold 1.4 algorithm (10, 16, 28). Default settings were used for CLUSTALW and ConStruct.
We examined 15 bp of sequence centered on each CcrM site (5 bp upstream and downstream of each GANTC) in C. crescentus. Excluding those which were associated with known transposases or insertion elements (17), four 15-mers occurred more than four times in intergenic sequences (Table 1; also shown are results for other α-proteobacteria). Sequence conservation around each of these 15-mers extended to over 100 bp (for alignments, see supplementary materials at http://caulobacter.stanford.edu/CIR). Using BLASTN to identify matches to each long motif, we found that only one or two matches do not contain CcrM sites. These long conserved motifs are therefore called Caulobacter CcrM-associated intergenic repeat 1 (CIR1) to CIR4. Two of these motifs, Caulobacter CIR1 and CIR2 (present in 21 and 16 copies, respectively) (Fig. 1A and B), appear to be conserved in other bacteria; only these two motifs in C. crescentus and related motifs in other α-proteobacteria are discussed below.
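The 15-mer screen described above can be sketched in Python. This is a minimal illustration and not the pipeline used in this study, which relied on the Genome-Tools package. The function names and the `min_count` parameter are hypothetical, counting each 15-mer together with its reverse complement is an assumption made here, and the exclusion of sites associated with transposases or insertion elements is not modeled.

```python
import re
from collections import Counter

# CcrM recognition site GANTC, where N is any base.
CCRM_SITE = re.compile(r"GA[ACGT]TC")

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def centered_15mers(seq, flank=5):
    """Yield (site_start, 15-mer) for each GANTC with complete flanks.

    With flank=5 the window is 5 bp + GANTC + 5 bp = 15 bp.
    Sites too close to either end of `seq` are skipped.
    """
    seq = seq.upper()
    for match in CCRM_SITE.finditer(seq):
        start = match.start() - flank
        end = match.end() + flank
        if start >= 0 and end <= len(seq):
            yield match.start(), seq[start:end]


def repeated_15mers(intergenic_seqs, min_count=5):
    """Count centered 15-mers across intergenic sequences.

    Each 15-mer is merged with its reverse complement (assumption), and
    only those seen at least `min_count` times are returned. The default
    of 5 corresponds to "more than four times".
    """
    counts = Counter()
    for seq in intergenic_seqs:
        for _, kmer in centered_15mers(seq):
            # Use the lexicographically smaller strand as the key.
            counts[min(kmer, reverse_complement(kmer))] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}
```

Calling `repeated_15mers` on a list of intergenic sequence strings returns a dictionary that maps each repeated 15-mer to its copy number, which is the kind of summary reported in Table 1.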
TABLE 1. List of repeated 15-mers centered on CcrM sites in α-proteobacteria
FIG. 1. DNA sequence alignments for Caulobacter and Brucella CIR1 and CIR2 sequences. Sequences were identified by BLASTN on the entire genome sequence, and full (not truncated) matches were identified manually. Alignments are shown for Caulobacter CIR1 (A), Caulobacter CIR2 (B), Brucella CIR1 (C), and Brucella CIR2 (D). Nucleotides are color coded, with A in red, C in blue, G in black, and T in green. Sequences are annotated on the left with the chromosomal coordinate of the first (leftmost) base shown and on the right with the length of sequence shown. Negative coordinates indicate sequences that have been reversed and complemented. In panels C and D, an "I" indicates the sequence is from chromosome I, and an "II" indicates the sequence is from chromosome II. Asterisks above the sequences indicate strictly conserved bases. The gray bars at the bottom of the alignments indicate the level of conservation, with the tallest bars meaning strict conservation in all sequences and no bar meaning no conservation. The location of the conserved CcrM site is highlighted with a black box. Arrows in panel A highlight a hybrid CIR1/CIR2 sequence.
TABLE 2. List of ORFs flanking CIR1 and CIR2 sequences in Caulobacter
FIG. 2. Local genomic organization around repeated intergenic sequences. The orientation of genes flanking IRU/ERIC sequences in E. coli and CIR1 and CIR2 sequences in C. crescentus ("C.") and B. melitensis ("B.") is summarized. The vertical bar indicates the position of the intergenic repeat sequence. The term "overlap with ORF" means that the intergenic repeat sequence extends into the coding sequence of at least one of the flanking ORFs.
FIG. 3. Predicted consensus RNA secondary structures for putatively transcribed Caulobacter (A) and Brucella (B) CIR1 and CIR2 motifs. Structures were predicted based on combined alignments of CIR1 and CIR2 motifs in each bacterium. Colored lines connecting paired bases indicate the probabilities of base pairing as follows: red, high probability; magenta, intermediate probability; and blue, low probability. The location of the potentially transcribed GANTC site is circled in light blue. The sequence shown in panel A corresponds to the one labeled 539421 in Fig. 1A; the sequence in panel B corresponds to the one labeled 68730-I in Fig. 1C.
Potentially related CIR motifs in other α-proteobacteria are diagrammed in Fig. 4 (for full sequences and alignments, see supplementary materials). The Mesorhizobium CIR1 motif is shorter than those in Caulobacter and Brucella, and the central linker is different. However, it is also composed of two inverted repeats (arms) with a conserved CcrM site in the center of one arm. The Sinorhizobium CIR1 is composed of two inverted repeats, but the conserved CcrM site is within the central linker, whose sequence differs from the Caulobacter and Brucella linkers. However, two motifs previously identified in S. meliloti, RIME1 and RIME2 (for Rhizobium-specific intergenic mosaic elements 1 and 2) (18), also have two inverted repeat arms joined by a central linker. The linker sequence in RIME1 is similar to the Caulobacter and Brucella CIR1 and CIR2 linker, but RIME1 has no conserved CcrM site in its arms. The lack of conserved CcrM sites in RIME1 and RIME2 explains why these sequences were not found by our searches. We found only a previously identified 440-bp motif associated with CcrM sites in Rickettsia prowazekii, with no resemblance to other CIR sequences. Notably, R. prowazekii lacks a CcrM homolog.
FIG. 4. Schematic diagram of CIR and related sequences in α-proteobacteria. Boxes with the same color and arrow markings represent sequences conserved between different CIRs. Half arrows pointing in opposite directions indicate complementary sequences that may form stem-loop secondary structures if transcribed. Conserved CcrM sites are indicated by a light blue circle. The central linker in red (the orientation indicated by the full arrow is arbitrary) is conserved between Caulobacter CIR1 and CIR2, Brucella CIR1 and CIR2, and Sinorhizobium RIME1, but not the other sequences. CcrM sites are conserved at the loop within arms in the Caulobacter CIR1 and CIR2 motifs, the Brucella CIR1 and CIR2 motifs, and the Mesorhizobium CIR1 motif.
120 bp long, highly conserved, palindromic, and present in similar numbers. IRU/ERIC sequences were also found by sequence analysis; they are transcribed and have detectable transcriptional termination activity. However, gene regulation is probably not their primary role because this does not explain their extensive conservation (2, 11, 21). By a similar argument, then, gene regulation is likely not the primary function of the CIR sequences.
The IRU/ERIC sequences differ from CIR1 and CIR2 in important ways, however. IRU/ERIC sequences have no consensus methylation sites, appear usually between genes in an operon (Fig. 2), and have a single conserved stem-loop in their predicted RNA secondary structure (11, 21). No other previously identified repeated intergenic sequences outside of α-proteobacteria are analogous to the Caulobacter and Brucella CIR1 and CIR2 motifs; these CIR motifs are thus a new class of repeated intergenic sequences.
As with repeated intergenic sequences in other bacteria, the function of the CIR motifs is unknown. Their association with methylation sites is novel, so understanding them may shed light on the functions of CcrM methylation. Their predilection for the ends of genes suggests involvement in gene regulation, but they are not similar to known transcriptional terminators, and a regulatory role would not explain their conservation. Their high conservation suggests a maintenance process, such as gene conversion (as has been postulated for the IRU/ERIC sequences). The GC content of the Caulobacter CIR1 and CIR2 sequences is 44.8% ± 6.3% (all other intergenic sequences are 64.8% ± 11.5%), which suggests a foreign origin. However, they are not similar to known transposases or insertion elements. Furthermore, these sequences may be modular, since there is one hybrid Caulobacter CIR1/CIR2 sequence (arrows in Fig. 1A; Fig. 4), and several other CIR sequences seem to have variants based on different arm sequences (see supplementary materials). Since repeated sequences seem to be ubiquitous in the intergenic regions of all organisms (3), further characterization of CIR motifs and other intergenic sequences, both upstream and downstream of genes, is essential for understanding genome function and evolution.
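A GC-content comparison of the kind quoted above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and reading the "±" values as sample standard deviations is an assumption, since the text does not state how the spread was calculated.

```python
from statistics import mean, stdev


def gc_percent(seq):
    """Percent G+C among the unambiguous (A/C/G/T) bases of `seq`."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / total


def summarize_gc(seqs):
    """Return (mean, sample standard deviation) of GC percent.

    Requires at least two sequences; the choice of the sample standard
    deviation is an assumption about what "±" means in the text.
    """
    values = [gc_percent(s) for s in seqs]
    return mean(values), stdev(values)
```

Applying `summarize_gc` separately to the CIR1 and CIR2 copies and to the remaining intergenic sequences would give two (mean, spread) pairs to compare.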