Journal of Bacteriology, February 2003, p. 1266-1272, Vol. 185, No. 4
0021-9193/03/$08.00+0 DOI: 10.1128/JB.185.4.1266-1272.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Department of Biological Sciences, University of Cincinnati, Cincinnati, Ohio 45221
Received 20 September 2002/ Accepted 18 November 2002
Data from genomes of hyperthermophilic archaea suggest a similar interplay among DNA acquisition, DNA loss, and selection for function. For example, Pyrococcus species can acquire large blocks of DNA by lateral transfer (9), yet they have also maintained relatively constant genome sizes during speciation (35). However, these and other archaea from geothermal biotopes have extreme evolutionary divergence from the organisms in which most DNA transactions have been analyzed at the molecular level, and they grow optimally at temperatures which destabilize the primary and secondary structure of DNA. Thus, the frequency with which deletions occur in nonessential sequences, the average size of individual deletions, and the influence of DNA sequences on the positioning of endpoints cannot be assumed to correspond to those of the bacterial or eukaryotic systems in which the process of spontaneous deletion has been analyzed. Since these parameters should influence the rate and course of DNA removal, determining them in hyperthermophilic archaea represents an important step in understanding the evolutionary dynamics of their genomes.
In this study, we combined genetic selection and sequence analysis to determine the frequency and molecular nature of spontaneous deletions in the chromosome of Sulfolobus acidocaldarius, a crenarchaeote that populates acidic hot springs and grows optimally at about 80°C and pH 3. To our knowledge, this is the first quantitative molecular analysis of deletion formation in any archaeon or hyperthermophile. In contrast to the bacterial and eukaryotic target systems that have been similarly examined, spontaneous deletion events in the S. acidocaldarius pyrE gene were infrequent and not directed by repeated sequences at their ends.
The frequency of deletion formation in the wild-type pyrE gene was calculated from the isolations that yielded strains MR123, MR311, and JDS22. The calculations had to account for the fact that about 60% of all spontaneous Foaʳ mutants have a leaky phenotype. This class of mutants was included in a set of 79 strains that yielded JDS22 (15) but had been purged during assembly of the remaining strains (284 strains). Therefore, deletion mutants MR123, MR311, and JDS22 represent the results of screening a total of 79 + (284/0.40), or approximately 790, independent Foaʳ mutants initially isolated.
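The screening arithmetic above can be checked with a short sketch. The variable names are illustrative, and the two derived fractions are our own back-of-envelope ratios under the stated 40% nonleaky assumption, not values quoted from the paper's tables.

```python
# Screening totals as described in the text above.
leaky_included = 79       # strains screened with leaky mutants still present
leaky_purged = 284        # strains screened after leaky mutants were removed
nonleaky_fraction = 0.40  # about 60% of FOA-resistant mutants are leaky
deletions_found = 3       # MR123, MR311, and JDS22

# Scale the purged set back up to the number of mutants initially isolated.
equivalent_total = leaky_included + leaky_purged / nonleaky_fraction

# Assumed derivation: nonleaky mutants actually represented in both sets.
nonleaky_total = leaky_included * nonleaky_fraction + leaky_purged

print(round(equivalent_total))             # close to the "approximately 790" in the text
print(deletions_found / equivalent_total)  # deletions as a fraction of all mutants
print(deletions_found / nonleaky_total)    # deletions as a fraction of nonleaky mutants
```

Under these assumptions the last ratio comes out near 1%, which is in line with the figure quoted for nonleaky mutants in the discussion.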
Genetic assays. The frequency of Foaʳ cells was determined by plating aliquots of 113 small liquid cultures (less than 10⁸ cells each) on selective medium and by plating appropriate dilutions of the cultures on nonselective (xylose-tryptone-uracil) medium to enumerate viable cells. The average frequency of mutants under these conditions was 2.5 × 10⁻⁶ per viable cell. Deletion of tandemly duplicated DNA was similarly measured by three independent determinations of phenotypic reversion under conditions previously described (34).
Sequence analysis. Deletion endpoints were defined by aligning their pyrE sequences with the wild-type pyrE (orotate phosphoribosyltransferase) coding sequence of 594 nucleotides (nt) (GenBank accession number Y12822). This sequence and, for some analyses, a sequence of 2,251 nt encompassing pyrE and the adjacent aspartate carbamyltransferase (pyrB) and orotidine 5'-monophosphate decarboxylase (pyrF) coding regions were analyzed using programs of the GCG package supported by the SeqWeb interface (Accelrys Inc., Madison, Wis.). Direct repeats (DRs) and inverted repeats (IRs) were found by COMPARE, and consensus sequences were found by PILEUP, using penalties of 5 for gap creation and 1 for gap extension. Sequences surrounding the deletion endpoints were randomized by SHUFFLE, and six potential stem-loop structures stable at 65°C were identified by M-FOLD.
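For readers without access to the GCG programs, the idea of scanning a sequence for exact DRs and IRs can be illustrated with a minimal sketch. This is not the COMPARE algorithm, which the text does not describe in detail; it is a plain exact-match k-mer scan, and the function names and the toy sequence are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def direct_repeats(seq, k):
    """Map each k-mer occurring more than once to its 0-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


def inverted_repeats(seq, k):
    """List (i, j) start pairs of non-overlapping k-mers that are reverse complements."""
    pairs = []
    for i in range(len(seq) - k + 1):
        target = reverse_complement(seq[i:i + k])
        for j in range(i + k, len(seq) - k + 1):
            if seq[j:j + k] == target:
                pairs.append((i, j))
    return pairs


toy = "AAGCTCCCCAGCTTT"  # hypothetical toy sequence, not pyrE
print(direct_repeats(toy, 4))
print(inverted_repeats(toy, 4))
```

Applying the same scan to the 594-nt coding sequence (GenBank Y12822) with k = 7 to 10 would be one way to tally longer repeats, with the caveat that an exact-match scan may count repeats differently from the program used in the study.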
Probabilities. The expected numbers of short DRs or IRs near deletion endpoints were calculated from the number of N-mers that fit within a 20-nt window and the probability of matching any given N-mer at both ends of the deletion. Thus, for N = 3, there are 18 overlapping trimers in each 20-nt region and a 1/64 probability that a given trimer in the 5' region will define a DR or complementary IR compared with a given trimer in the 3' region. When the trimers at the ends are not paired in phase (i.e., when position relative to the deletion endpoint is ignored), all unordered combinations of trimers, i.e., 1/2(18)(19) = 171, are considered. This leads to an expected value per deletion of 171/64 for N = 3. Corresponding calculations yield values of 153/256 for N = 4 and 136/1,024 for N = 5. These expected yields were multiplied by the number of deletions available (five) to give the overall expected number of observed DRs or IRs.
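The expected counts described above follow from a few lines of arithmetic. The sketch below reproduces them under the same assumptions as the text (a 20-nt window at each end, equal base frequencies, unordered pairing of N-mers); the function and constant names are ours.

```python
WINDOW = 20      # nt examined at each deletion end
N_DELETIONS = 5  # deletions available for analysis


def expected_pairs(n):
    """Unordered N-mer combinations, counted as in the text: k(k + 1)/2."""
    kmers = WINDOW - n + 1  # overlapping N-mers in one 20-nt window
    return kmers * (kmers + 1) // 2


def expected_per_deletion(n):
    """Expected chance matches per deletion, assuming equal base frequencies."""
    return expected_pairs(n) / 4 ** n


for n in (3, 4, 5):
    total = N_DELETIONS * expected_per_deletion(n)
    print(f"N = {n}: {expected_pairs(n)}/{4 ** n} per deletion, "
          f"{total:.2f} expected in {N_DELETIONS} deletions")
```

The per-deletion ratios match the 171/64, 153/256, and 136/1,024 given above, and the five-deletion totals round to the expected yields cited later in the text.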
The probability that n of 10 deletion endpoints would fall within duplication-prone regions comprising 104 bp (17.5%) of the pyrE gene is slightly greater than the Poisson term $P_n = (x^n/n!)\,e^{-x}$ for 10 endpoints distributed randomly among six bins. In the latter case, $x = 10/6$, and the probability that three endpoints would fall into any one bin is $P_3 = [(10/6)^3/3!]\,e^{-10/6} = 0.146$.
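The Poisson term quoted above can be evaluated directly. This is a minimal sketch using only the standard library; `poisson_term` is a name we introduce for illustration.

```python
import math


def poisson_term(n, x):
    """Poisson probability of exactly n events when the mean is x."""
    return x ** n / math.factorial(n) * math.exp(-x)


x = 10 / 6  # 10 endpoints spread over six equal bins
p3 = poisson_term(3, x)
print(round(p3, 3))
```

Because the duplication-prone regions cover 17.5% of the gene, slightly more than one-sixth (16.7%), the corresponding probability for those regions is somewhat larger than this value, as the text notes.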
To our knowledge, S. acidocaldarius strains MR123 and MR311, together with strains MR31, MR103, and a strain which we provisionally designate JDS22 (Materials and Methods), comprise the only collection of mutants of a hyperthermophilic archaeon having spontaneous deletions. For simplicity, the alleles are here designated Δ123, Δ311, Δ31, Δ103, and Δ22, respectively. Figure 1 shows the relative sizes and positions of the deletions. Only two (Δ31 and Δ123) preserve the translational reading frame. Although overlaps occur among four of the deletions, the endpoints do not coincide, and the endpoints do not appear to be tightly clustered (Fig. 1).
FIG. 1. Deletions and other sequence features of the S. acidocaldarius pyrE gene. Horizontal open bars on the genetic map depict coding sequences of the pyrB, pyrE, and pyrF genes of S. acidocaldarius; arrows show direction of transcription. Open bars above the map show the size and location of spontaneous deletions (designated in italics). All other numbers identify nucleotide positions in the pyrE coding sequence (GenBank accession number Y12822). Positions of the first and last nucleotides removed are given above the genetic map on either side of the open bars. Regions of potential stem-loop structures, numbered in order of decreasing thermodynamic stability, are indicated above the map. The first and last nucleotides of these regions (in roman numerals) are as follows: I, 200 to 217; II, 133 to 153; III, 67 to 88; IV, 528 to 541; V, 481 to 508; VI, 340 to 350. Regions which have been duplicated by various spontaneous mutations are represented by the trapezoidal symbols below the genetic map. Sites of frequent mutation are shown as partial sequences in parentheses below the map; numbers indicate the position of the first nucleotide in each region. More-detailed information on the gene sequence and mutations is provided in reference 15.
Δ22 could be considered to be located as close as 2 bp from the first region, but this result is not significant at the 95% confidence level. (A total of 16 bp, or 2.7%, of the pyrE gene lies within 2 bp of one of these four mutation-prone regions, so the probability that at least 1 of 10 deletion endpoints randomly distributed within pyrE would fall within 2 bp of a hot spot is about 0.27.) Similarly, deletion endpoints were not significantly associated with sites promoting duplication. Duplications of pyrE regions that total 104 bp have been found by sequencing spontaneous mutants, but none of the duplication endpoints coincide with deletion endpoints (Fig. 1). Deletion Δ22 lies entirely within a duplication-prone region of 43 bp at the 5' end of the coding region, and one endpoint of Δ103 falls within a region near the 3' end of pyrE. However, the probability that 3 of 10 randomly distributed endpoints would fall within regions of pyrE totaling 104 bp is greater than 0.146 (see Materials and Methods), so this result is not considered significant.
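The hot-spot estimate in the parenthetical above can be reproduced as follows. The sketch shows both the simple sum n × p, which matches the quoted value of about 0.27, and the exact complement 1 − (1 − p)ⁿ under the added assumption that the 10 endpoints fall independently; variable names are ours.

```python
gene_length = 594     # nt in the pyrE coding sequence
near_hotspot_bp = 16  # bp lying within 2 bp of a mutation-prone region
n_endpoints = 10      # two endpoints for each of five deletions

p = near_hotspot_bp / gene_length          # chance for a single endpoint
summed = n_endpoints * p                   # simple sum over endpoints
at_least_one = 1 - (1 - p) ** n_endpoints  # exact if endpoints are independent

print(round(p, 3), round(summed, 2), round(at_least_one, 2))
```

Either way the probability is far above 0.05, so the conclusion that the proximity is not significant at the 95% confidence level does not depend on which form is used.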
Sequence analysis of deletion endpoints. (i) DRs. In other genetic systems, most deletions occur between DRs of less than 15 nt and remove one of the repeats. Mechanisms proposed to explain this pattern include misalignment of a growing 3' end between DRs, such that a stretch of template is bypassed during replication, or resection of the 5' ends of a double-strand break to generate single-stranded 3' ends which then anneal at DRs and support gap filling (3). In a previous study, Δ22 was found to have occurred between two GCT triplets in the 5' region of pyrE, leaving one triplet (15). This fits the pattern seen in other systems, but the DR is shorter than most of the other examples (8, 13, 25, 30). Only one other spontaneous deletion in the set, Δ123, has endpoints defined by a repeat, and this repeat is only 1 nt (Table 1).
TABLE 1. Sequence contexts of deletion endpoints
Δ22 DR in two ways. The first was based on the observation that Δ22 could have arisen by removing 20 consecutive bp starting at any of four positions in the nontranscribed strand (Table 1): the first G, the first C, the first T, or the next nucleotide (C). This reflects the general property that a deletion involving two DRs N nucleotides long has N + 1 possible pairs of internucleotide breakpoints. As a result, a given deletion event will meet the criterion of involving a DR of N nucleotides if any of N + 1 pairs of windows N nucleotides wide coincides with a DR when scanned in phase across the actual breakpoints. Thus, any given deletion has a 4/64 probability of involving DRs of 3 bp, and the probability that at least one of the five deletions would exhibit this property by chance is 20/64, or 0.31. Our second assessment of significance was based on the proposed role of DRs as sites of strand annealing. Since the stability of annealing increases dramatically with longer complementary sequences, we searched the region for longer DRs that would anneal with correspondingly higher probability. The pyrE sequence itself contains one DR of 10 nt, nine DRs of 8 nt, and 13 DRs of 7 nt, and spacings between these DRs do not differ markedly from the size distribution of the pyrE deletions recovered (Table 2). Thus, the failure to use longer DRs for deletion does not reflect their scarcity in this region or a constraint on the size of deletions recovered in the selection. These considerations argue that the coincidence of deletion endpoints with the GCT repeat is largely fortuitous.
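The N + 1 property described above can be demonstrated on a toy case. In the sketch below, the sequence is hypothetical (it is not pyrE) and is built so that two GCT triplets start 20 nt apart with a C following the first triplet, mirroring the description in the text. The exact "at least one" probability at the end assumes independent deletions and is shown only for comparison with the summed value of 20/64.

```python
def delete(seq, start, length):
    """Remove `length` nucleotides beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def equivalent_starts(seq, start, length):
    """All start positions whose deletion of `length` nt gives the same product."""
    product = delete(seq, start, length)
    return [s for s in range(len(seq) - length + 1)
            if delete(seq, s, length) == product]


# Hypothetical sequence: flank + GCT + 17-nt spacer (starting with C) + GCT + flank.
seq = "TTGG" + "GCT" + "C" + "A" * 16 + "GCT" + "AATT"
starts = equivalent_starts(seq, seq.index("GCT"), 20)
print(starts)  # start positions that all yield the same deletion product

n = 3                                       # DR length in nt
per_deletion = (n + 1) / 4 ** n             # 4/64 for a trimer DR
summed = 5 * per_deletion                   # 20/64, as quoted in the text
at_least_one = 1 - (1 - per_deletion) ** 5  # exact under independence
print(per_deletion, summed, round(at_least_one, 2))
```

The four equivalent start positions correspond to the first G, the first C, the first T, and the following C, which is why a 3-nt DR gives four chances rather than one for a chance match.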
TABLE 2. DRs in the wild-type pyrE sequence
TABLE 3. Reversion of tandem duplications
Δ31 is defined by a self-complementary IR, removing 5'AT... .TA3' from the nontranscribed strand. This minimal IR was not considered significant; the annealing product has negligible stability, and a 2-nt IR is expected at least once in five deletions by chance with a probability of 5/16, or 0.31. We reexamined the endpoint regions using a relaxed criterion in which self-complementary IRs of 3 nt or greater could occur anywhere within 10 bp of the deletion endpoint. This revealed eight trimeric IRs, two tetrameric IRs, and one pentameric IR. For comparison, the yields expected by chance for five deletions are 13.4 trimeric, 3.0 tetrameric, and 0.66 pentameric IRs (see Materials and Methods). We also confirmed that the region is not deficient in IRs of higher quality; pyrE contains one IR of 9 nt, one IR of 8 nt, and 10 IRs of 7 nt dispersed throughout the gene (not shown).
To evaluate possible placement of secondary structure at one end of a deleted sequence, we identified the six most stable stem-loop structures predicted in single-stranded DNA of the pyrE region (Fig. 1). These structures encompass a total of 124 nt, or 21% of the coding sequence, but no deletion endpoints fall within any of them. One deletion endpoint is adjacent to one of the stem-loop structures (the 3' end of Δ31 and stem-loop II) (Fig. 1). We also searched for sequences in pyrE and the two adjacent genes that could anneal across both ends of any of the five deletions. The two best examples of such potentially "templating" sequences are shown in Fig. 2. Each represents an octanucleotide, the first half of which matches the 5' flank of a deletion and the last half of which matches the 3' flank. Annealing of these segments would have minimal base pairing for stabilization (4 bp for each segment) and would entail formation of large single-stranded regions (Fig. 2). Nevertheless, three observations make us hesitant to exclude a mechanistic role for these sequences in the formation of Δ22 and Δ103. (i) The two deletions have different sizes and positions, yet their potential templates occur at nearly the same location. (ii) The two potentially templating octanucleotides do not occur in the flanking pyrB or pyrF genes and are predicted to occur by chance once in every 20 to 50 kb of S. acidocaldarius DNA. (iii) In E. coli, one lacI deletion, S24, is known whose proposed intermediate closely resembles the structures depicted in Fig. 2 (12).
FIG. 2. Octanucleotides corresponding to novel joints. As a convenient way to show the relative positions of the sequences, the upper strand of the wild-type pyrE gene is drawn annealed to the octanucleotide in the lower strand that represents the novel joint of the corresponding deletion. This method of depiction is that used in other deletion studies (11, 12, 16, 30) and is not a proposal of a specific mechanism. Sequences remaining after deletion are shown in bold type, as for Table 1. Numbers with vertical lines indicate the nucleotide position in each strand with respect to the pyrE coding sequence; the remaining numbers indicate the sizes of the loops shown.
Δ31 and Δ22 share an octanucleotide sequence (TTTTGATA) at their 3' ends (Table 1). In order to test the generality of shared sequences, we aligned the 5'-end regions of the deletions with each other, the 3'-end regions with each other, and the 5'-end regions with the reverse complements of the 3'-end regions. Alignments involving all five deletions revealed no base conserved at a level greater than 60% at any position fixed with respect to the endpoint. To test for a sequence whose position may vary, we aligned the 5'-end regions with each other and the 3'-end regions with each other, without constraint with respect to the position of the deletion endpoint. This yielded 100% consensus sequences TNNNT (where N is any nucleotide) (5' ends) and A (3' ends), with the deletion endpoints falling over a 10-nt range and a 9-nt range, respectively. To confirm that these consensus sequences are not significant, we separately randomized each of the end region sequences and repeated the alignments. This yielded 100% consensus sequences of ANT and ANA for the 5'- and 3'-end regions, respectively. Thus, the consensus sequences derived from randomized end regions were of quality equal to or higher than that of the sequences found near the actual deletion endpoints.
TABLE 4. Properties of spontaneous deletion in diverse target genes
For example, the frequency of deletions in S. acidocaldarius pyrE would resemble that of E. coli lacI if about 90% of pyrE deletions went undetected (Table 4). Potential reasons for inefficient detection of a deletion include the following: (i) a leaky phenotype, (ii) inviability, or (iii) failure to be recognized as a deletion during screening. Interference from a leaky phenotype (possibility i) is excluded by the fact that FOA selects single-nucleotide frameshifts, small in-frame deletions, and small in-frame insertions throughout pyrE (15). Inviability of a large proportion of pyrE deletions (possibility ii) is inconsistent with recovery of deletions removing most of the gene and deletions that sharply decrease expression of the cotranscribed pyrF gene (15, 28). With regard to the failure to be recognized as a deletion (possibility iii), our screening of PCR products on agarose gels successfully identified an 18-bp deletion, and our screening probably has a detection threshold of 12 to 15 bp. Thus, a significant underestimation of the deletion frequency under our conditions would require the majority of pyrE deletions to be very short (i.e., less than about 15 bp) or to occur in pyrF. Neither of these alternatives agrees with the spectrum of spontaneous mutation in the pyrE and pyrF genes, in which only one deletion was observed among 108 independent, randomly chosen Foaʳ mutants. This deletion (Δ22) removes 20 bp and lies in pyrE, which was the location of about 95% of all spontaneous mutations conferring the Foaʳ phenotype (15).
Another property, the relatively low proportion of deletions for the S. acidocaldarius pyrE gene compared to the other genes in Table 4, could, in principle, result from the following: (i) production of other classes of mutation at a much higher rate than those in the other genetic systems or (ii) selection of leaky mutants at a much higher rate than those in the other systems. Possibility i is excluded by two sets of independent mutation rate measurements, which show that pyrE has a forward mutation rate very close to those of E. coli lacI and other bacterial genes (15, 18). Possibility ii is more difficult to exclude, as about 60% of mutants selected by 50 µg of FOA per ml exhibit a leaky phenotype, and it is not known whether the other systems of Table 4 yield a significant proportion of leaky mutants. Therefore, the frequency of deletions among nonleaky S. acidocaldarius pyrE mutants (about 1%) may provide a more conservative value for comparison to the other systems, but this would still be only one-ninth of the closest value in Table 4.
Our failure to recover a deletion extending beyond the boundaries of the pyrE gene also raises the question of whether the FOA selection can recover such deletions in S. acidocaldarius. These would be precluded if DNA sequences flanking pyrE were essential, for example. The transcript beginning with pyrE includes pyrF and at least one other ORF, designated orf8 (32). However, two observations argue against essential roles for either pyrF or orf8: (i) frameshift mutations in pyrE which exert strong polar effects on pyrF expression are frequently isolated (28), and (ii) several pyrF frameshifts and one nonsense mutation, which inactivate pyrF and should exert similar polarity on orf8 expression, have also been isolated (15). An alternative explanation for the observed placement of deletions is that some position-dependent property of pyrE makes it more susceptible than pyrF to various forms of mutation, including deletion. This is consistent with the following facts: duplication mutations are relatively common in pyrE but have not been found in pyrF (15); point mutations are 20-fold more frequent in pyrE than pyrF (15); in S. solfataricus, IS elements insert threefold more frequently into pyrE than into pyrF (23); and diverse forms of DNA damage all induce mutation in S. acidocaldarius (29, 34).
Potential deletion mechanisms. The low frequency of deletions in S. acidocaldarius impeded our efforts to collect many examples for analysis, yet sequences of the available alleles distinguish the dominant mode (or modes) of their formation from those commonly proposed for other organisms. No association of deletion endpoints with DRs, IRs, or stem-loop structures was supported at the 95% confidence level, despite the fact that these structures are about as abundant in the S. acidocaldarius pyrE gene as in the various target genes where they have been found to promote spontaneous deletions (3, 8, 12, 25, 26). However, genetic assays in S. acidocaldarius do provide evidence of strand slippage mechanisms of deletion under certain conditions. Tandem duplications reverted at high frequencies, which increased with increasing length of the repeated sequence (Table 3). Furthermore, homopolymeric runs and short repeats have been shown to promote frameshift mutations and triplet expansions, respectively, in the wild-type pyrE gene (15). Our observation that tandem duplications were deletion-prone, whereas natural DRs were not, may reflect the fact that the natural DRs are shorter (on average) and interrupted by nonrepeated sequences.
In some systems, consensus sequences occur at the endpoints of spontaneous deletions and have been attributed to normal or aberrant processing by DNA breaking or joining enzymes such as topoisomerases (3, 5). The available set of S. acidocaldarius deletions included only one example of a common sequence found at the 3' ends of two deletions (Δ22 and Δ31), and this remains difficult to interpret mechanistically. The lack of AA or TT dinucleotides at deletion endpoints argues that an activity resembling topoisomerase II of Sulfolobus shibatae, which cleaves DNA predominantly at AA or TT, yielding 2-nt 5' extensions (4), is not a significant source of small- and medium-sized deletions in S. acidocaldarius. However, it should be emphasized that DNA sequence specificity has not been established for other potentially relevant enzymes of S. acidocaldarius. For example, the reverse gyrase of S. acidocaldarius (a topoisomerase I) exhibits the same minimal sequence specificity as its counterpart in S. shibatae (19), so it is difficult to rule it out as a potential source of deletions. The possible role of nearby sequences in stabilizing nascent deletions ("templating" [12]) could be inferred in two cases and will require isolation of additional deletions to evaluate. Our observations also seem consistent with repair of double-strand breaks by some form of nonhomologous end joining (3), which, in turn, is consistent with the associations of MRE11 and RAD50 homologues in the genome of S. acidocaldarius and other hyperthermophilic archaea (7).
Implications for genome evolution. According to mutational analysis of the pyrE and pyrF genes, S. acidocaldarius has one of the lowest genomic error rates yet measured, despite its extremely harsh growth conditions (15). The present study further shows that, within the low mutant frequency, deletions make up an unusually low proportion of spontaneous mutations. The relative frequencies of different classes of mutations now documented in S. acidocaldarius predict a pattern for the elimination of a gene following release of selective pressure. Initially, frameshift mutations should accumulate in homopolymeric runs, then elsewhere in the sequence, followed by slower accumulation of base pair substitutions and duplications, followed in turn by yet slower accumulation of deletions. It also seems significant that in S. acidocaldarius the more-frequent classes of mutations are reversible. This would serve to provide a window of time during which function of a mutated gene can be restored should moderate selection be reapplied. This idea finds experimental support in the observation that eight lineages of S. acidocaldarius subjected to multiple cycles of forward and reverse mutation failed to generate a functional pyrE gene with an altered sequence (2). Also, analysis of other genomes, such as that of Saccharomyces cerevisiae, has identified defective but recoverable genes which represent a reserve of potential functionality (17).
The slow, incremental nature of gene elimination predicted by the mutational processes in S. acidocaldarius seems to favor genome streamlining while minimizing concomitant inactivation of beneficial genes. It should be noted, however, that this situation may not typify all hyperthermophilic archaea. S. acidocaldarius has few, if any, active IS elements on the basis of genetic assays (15) and preliminary sequence analysis of the genome (R. Garrett, personal communication), but other Sulfolobus species have a number of them (20, 23, 31). In particular, transposition of several distinct families of IS elements causes a high frequency of spontaneous mutation at the pyrE and pyrF genes of S. solfataricus and may also generate frequent chromosomal rearrangements (23, 31). In such lineages, IS elements may assume the dominant role in inactivating nonessential genes (through insertion) and in removing them (through transposase-catalyzed imprecise excision).
This work was supported in part by grant MCB9733303 from the National Science Foundation.