Journal of Bacteriology, June 2006, p. 4253-4263, Vol. 188, No. 12
0021-9193/06/$08.00+0 doi:10.1128/JB.00001-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Yoshimi Nemoto, Rebecca E. Colman, Zack Jay, and Paul Keim*
Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona 86011-5640
Received 2 January 2006/ Accepted 14 February 2006
Although many VNTRs have no phenotypic effect and generate neutral genetic variation, some VNTR loci can alter important biological functions. A well-documented example is phase variation in organisms such as Haemophilus influenzae, Neisseria meningitidis, and Mycoplasma hyorhinis (27). In these pathogens, homopolymeric or dinucleotide repeats located between the −35 and −10 regions of the promoter differentially affect transcription of downstream genes, depending upon the number of repeated sequence units (27). In other cases, VNTRs affect the actual amino acid sequence of proteins, rather than merely affecting transcription levels (27). VNTR loci located within or near genes should therefore be observed for signs of altered phenotypic and selective effects. However, in general, VNTR variation has not been associated with known biological effects and variation at many VNTR loci is likely selectively neutral (24). Regardless, the mutational processes generating VNTR variation should be similar whether a particular locus is under selection or not.
The variation generated at VNTR loci provides a high level of subtyping discriminatory power, making VNTRs very useful molecular epidemiological markers (13, 18). Typically, VNTR molecular subtyping systems consist of a series of VNTR loci around which PCR primers are designed. The resulting amplicons are then separated by electrophoresis. Differences in amplicon size at individual loci are assumed to be due to variation in repeat copy number at that locus, and the banding/peak patterns are scored accordingly. The use of different fluorescent dyes and amplicon size ranges allows multiple loci to be multiplexed and analyzed simultaneously. These multilocus VNTR analysis (MLVA) subtyping systems have provided unprecedented differentiation among strains of bacterial species previously thought to have very little sequence variation otherwise. These subtyping systems have been highly successful at differentiating among strains of a number of pathogens (for a review see reference 18). These MLVA systems have also been useful on multiple geographic scales, providing useful genetic discrimination whether the populations were worldwide (1, 12, 15) or regional (10, 15) or from a localized outbreak (10).
An understanding of VNTR mutational rates and the factors affecting mutational rates would make them more effective for epidemiological or microbial forensic investigations. Mutational rate estimates allow for probabilistic modeling of genetic relatedness. In turn, probabilistic modeling can provide statistical confidence measures that are critical in forensic or epidemiological situations for source attribution or identification of true versus fortuitous disease clusters. This approach was recently applied to a pair of New York City tourists who contracted plague (19). Using a probabilistic model developed from VNTR mutation rate data, Lowell et al. (2005) were able to assess the probability that the tourists had contracted the disease at their rural home in New Mexico as opposed to some other more nefarious urban source.
The rapid mutation rates at VNTR loci are thought to be the result of the compounding effects of various factors intrinsic to these loci. Such factors include repeat copy number, repeat unit size and sequence purity, and the functionality of the mismatch repair system (5, 7, 24). Of these, the most important factor seems to be repeat copy number, and any other factor should be evaluated only after the effect of repeat copy number has been accounted for (24). Repeat copy number effects appear to apply both across loci (2, 29) and within an individual locus across different alleles (30). However, most of these studies focused on only dinucleotide repeats or neglected to directly measure mutation rates, substituting locus diversity instead. Finally, the exact relationship between repeat copy number and mutation rate remains unclear (24). Both the amount of mutation rate variation explainable by repeat copy number and the linearity or nonlinearity of the relationship are still relatively unknown. Studies specific to bacterial VNTRs are also lacking, despite their increased use in genotyping.
Here, we estimate mutation rates for 28 Escherichia coli O157:H7 VNTR loci by performing a series of parallel, serial passage experiments (PSPEs) on nine diverse E. coli O157:H7 strains and an isogenic derivative (mutS) of one of the diverse strains. We investigate the effects of repeat copy number on mutation rate by estimating and comparing single-locus (O157-10) mutation rates for nine diverse strains where repeat copy number ranged from 9 to 66. We further investigate array size effect upon mutation rate by examining repeat copy number correlations across loci. Finally, we investigate the effect of mismatch repair on mutation rate by comparing overall mutation rates between a wild-type E. coli O157:H7 strain (EC536) and its isogenic derivative, a mutator (mutS) strain (EC1212). These data and their analysis will provide the foundation for evolutionarily based models for molecular epidemiological investigations.
TABLE 1. E. coli O157:H7 strains
291,000 total generations. Briefly, each PSPE consisted of 96 or 100 independent clonal lineages that were each serially passaged 10 or 40 times. For each of the PSPEs, a single isolated colony of the strain (T = 0) was used to start 96 or 100 independent clonal lineages by streaking for single colonies on 48 or 50 halved tryptic soy agar plates, respectively. All cultures were grown at 37°C for 24 h before the next passage. Each lineage was then serially passaged 10 or 40 times by streaking from a single colony from the previous passage. Strain ATCC 700927's lineages were passaged 40 times, while the remaining nine strains' lineages were passaged 10 times. The paired EC536 and EC1212 PSPEs were done with 100 lineages while the remaining PSPEs all involved 96 lineages. DNA was extracted from all 96 or 100 lineages at T = 10 for the T = 10 passage experiments and at T = 09, T = 19, and T = 40 for the T = 40 passage experiment using a simple heat lysis protocol (12). Mutational events for each strain were then visualized using MLVA, as previously described (14). As a confirmation of the fragment sizing scoring, we analyzed 34 mutational products and parental alleles to verify that detected mutations were due to changes in repeat copy number. Twenty-eight of the 37 previously described MLVA loci were used in the analysis. The remaining nine loci were not analyzed since they showed no diversity among geographically diverse E. coli O157:H7 isolates in the previous study and were therefore not expected to mutate in the PSPEs (14). Mutational events and products for nine of the strains were determined from an analysis of the T = 10 populations. Mutational events and products for strain ATCC 700927, however, were determined from an analysis of three time points (T = 09, T = 19, and T = 40) due to the increased likelihood of multiple mutational events occurring at the same locus and in the same lineage. 
The two earlier time points were used to detect these rare occurrences so that they could be added to the total mutations apparent at T = 40.
Mutation rate calculations.
Single-locus and combined 28-locus mutation rates were estimated for each strain by dividing the observed number of mutations by the number of total generations. The number of total generations was calculated as the average number of generations/colony × the number of lineages with usable data × the number of transfers. The number of generations per colony was the same for all calculations and was determined to be ~27.0 generations/colony using an average of viable plate count results from all the strains. The number of lineages used in each individual strain single-locus calculation is listed in Table 2 as "n." The number of lineages used in the combined single-locus mutation rates was based on an average of the number of lineages with usable data across all nine strains for each locus. The number of lineages used in the combined 28-locus mutation rate calculations was based on an average of the number of lineages with usable data across all 28 loci for each individual strain calculation and an average of the number of lineages with usable data across all 28 loci and all nine strains for the combined strain calculations. In all, the nine PSPEs used to calculate combined VNTR mutation rates in E. coli O157:H7 encompassed a total of 2.9 × 10⁵ generations, allowing the estimation of combined single-locus mutation rates greater than ~3.4 × 10⁻⁶ mutations/locus/generation.
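The rate calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis script: the function names are hypothetical, and the example PSPE (96 usable lineages, 10 transfers) is an illustrative assumption, not the per-locus usable-lineage counts listed in Table 2.

```python
GENERATIONS_PER_COLONY = 27.0  # value reported in the text


def total_generations(lineages, transfers, gens_per_colony=GENERATIONS_PER_COLONY):
    """Generations/colony x lineages with usable data x transfers."""
    return gens_per_colony * lineages * transfers


def mutation_rate(mutations, generations):
    """Observed number of mutations divided by total generations."""
    return mutations / generations


# One T = 10 PSPE, assuming all 96 lineages gave usable data (illustrative).
gens_t10 = total_generations(lineages=96, transfers=10)

# Detection limit: a single mutation over the 291,000 generations reported.
detection_limit = mutation_rate(1, 291_000)

# Combined 28-locus rate: the 186 mutational events reported in the results.
combined_rate = mutation_rate(186, 291_000)

print(f"{gens_t10:.0f} generations in one T = 10 PSPE")
print(f"detection limit: {detection_limit:.1e} mutations/locus/generation")
print(f"combined 28-locus rate: {combined_rate:.1e} mutations/generation")
```

Under these assumptions the single-event detection limit and the combined rate round to the values quoted in the text.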
TABLE 2. Mutation rates and products for 10 E. coli O157:H7 strains
(allele frequencies)² while allele number was simply the total number of alleles observed for a given locus in the isolate set. Regressions of both measures of diversity on mutation rate were performed in order to examine the relationship between mutation rate and diversity. Regressions were performed both with and without data from the most diverse locus, O157-10, due to the unusual outlier status of this locus and its strong impact on the data set (i.e., locus O157-10 data represented two-thirds of the total mutation rate data).
Analysis of repeat copy number and mismatch repair effects. The effect of repeat copy number on mutation rate was analyzed at a single diverse locus and across all mutating loci. A regression of locus O157-10 mutation rate on locus O157-10 repeat copy number across the nine strains was used to examine the effects of repeat copy number without the potential confounding effects of any other repeat-related factors. A regression of mutation rate on repeat copy number for all mutating loci in the nine PSPEs was used to confirm and generalize the results observed at locus O157-10 across multiple loci. The effect of mismatch repair was examined by comparing the number of observed mutations between EC536 and EC1212 using a chi-square test. Some lineages generated during the EC1212 PSPE possessed locus O157-56 alleles that differed by 1 bp from the T = 0 allele size. These alleles were sequenced along with controls to identify the source of the size variation.
In this study, we examine the nine wild-type strains for their VNTR mutational products, calculate the resulting mutant allele frequencies, and estimate mutational rates in both a combined multiple-locus and single-locus fashion. An in vitro mutator (mutS) strain population and a natural isolate population assemblage were used to contrast the wild-type and mutant in vitro VNTR mutational processes.
Combined 28-locus trends.
Each T = 10 experiment represented ~25,000 generations while the single T = 40 experiment represented ~100,000 generations. Overall, we report here nearly 300,000 generations of growth across ~900 clonal lineages for the nine PSPE wild-type populations. Our evaluation of 28 VNTR loci represents a combined locus x generation analysis of nearly 8.4 million loci x generations. Cumulatively, 186 mutational events were observed corresponding to a combined 28-locus mutation rate estimate of 6.4 × 10⁻⁴ mutations/generation (Table 2). Of the 186 mutations, 116 (62.4%) were insertions and 70 (37.6%) were deletions (χ² = 11.376, P = 0.001; Table 2; Fig. 1). However, this overall insertion-to-deletion bias was due to a highly biased result at a single locus in a single PSPE (O157-10, ATCC 700927). In this locus x strain example, we observed a highly significant difference of 30 (93.8%) insertions versus two (6.3%) deletions (χ² = 24.500, P < 0.001; Table 2). Indeed, when the ATCC 700927 locus O157-10 result was removed, there was no significant difference between insertions and deletions in the combined data (χ² = 2.104, P = 0.147).
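The insertion-versus-deletion comparisons above can be reproduced with a chi-square goodness-of-fit test against equal expected counts. The sketch below is an illustration only: the function names are hypothetical, and treating each comparison as a two-category test with 1 degree of freedom is an assumption that happens to reproduce the reported statistics.

```python
import math


def chi_square_equal(counts):
    """Goodness-of-fit statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def p_value_1df(stat):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


comparisons = {
    "all loci, all strains": (116, 70),            # insertions, deletions
    "O157-10 in ATCC 700927": (30, 2),
    "all data minus that case": (116 - 30, 70 - 2),
}

for label, counts in comparisons.items():
    stat = chi_square_equal(counts)
    print(f"{label}: chi-square = {stat:.3f}, P = {p_value_1df(stat):.3f}")
```

With the single locus x strain case removed, the remaining 86 insertions and 68 deletions no longer differ significantly, which is the point made in the text.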
FIG. 1. Frequency distributions of mutation products. Shown are frequencies of insertion (A), deletion (B), and total (C) mutations involving <21 repeat units plotted as a percentage of total mutations.
Large repeat copy number mutations were much more likely to be deletions than insertions and to occur at locus O157-10 than at the other VNTR loci. The vast majority (11/12 or 92%) of the large repeat copy number mutations were deletions, including three 5-repeat, one 6-repeat, one 10-repeat, two 12-repeat, two 13-repeat, one 19-repeat, and one 20-repeat mutation. The single large repeat copy number insertion event involved the addition of seven repeats (Fig. 1). While nine of the large repeat copy number mutational events occurred at locus O157-10, the remaining three were consistent with the higher-than-expected frequency trend observed above, arguing for the generality of our observations regarding large repeat copy number mutations. These three mutations included a five-repeat deletion at locus O157-9 and 6- and 13-repeat deletions at locus O157-11. While both insertion and deletion events were observed, there was a clear bias for deletions versus insertions in these large repeat copy number events. Mechanistically, both insertions and deletions are feasible, but selection could easily affect their observed frequency. VNTR alleles of very large size could be detrimental or just highly unstable, leading to the observation of a greater number of deletion events.
VNTR mutation rates. The VNTR mutation rate estimates were very high and varied across loci. Sixteen of the 28 VNTR loci mutated at least once in the nine combined PSPEs. Combined data for the nine populations provide the most accurate mutation rate estimates for individual loci. The combined population rates for the 16 mutating loci ranged from a low of 3.4 × 10⁻⁶ to as high as 4.0 × 10⁻⁴ mutations/generation (Table 2). The lowest rates were based upon single observations while the highest rates (O157-10) were based upon 124 events. No mutations were observed for 12 VNTR loci, suggesting that their individual mutation rates must be less than 3.4 × 10⁻⁶ mutations/generation, the detection limit given the 291,000 generations that we examined.
Diversity at individual loci, estimated from a large collection of natural isolates, was correlated with our in vitro mutation rate estimates. Diversity (D) and allele number are two measures of natural diversity, and both were correlated with mutation rate. Diversity across the 28 loci ranged from 0.05 to 0.97 while allele number ranged from 2 to 47 for our set of 344 isolates. Diversity was fitted best using a second-degree polynomial relationship with mutation rate (r² = 0.634, P < 0.0001; Fig. 2A and B) while total number of alleles fitted a linear relationship (r² = 0.813, P < 0.0001; Fig. 2C and D). A nonlinear relationship between mutation rate and diversity is expected because D is a limited metric with a maximal value of 1.0. Although a second-degree polynomial relationship may not be the intrinsic relationship between diversity and mutation rate, it was the best fit to the data presented here. Because allele number has no theoretical maximum value, its linear relationship is also a reasonable expectation. Additional mutation rate data between the majority of our locus data and the outlier locus O157-10 may provide further insight into these relationships. The correlation between mutation rate and the two diversity measures was significant with and without the inclusion of data from locus O157-10. However, only the correlation results for the analysis without locus O157-10 are presented, due to the outlier status of locus O157-10 in the plots (Fig. 2A and C).
FIG. 2. Diversity as a function of mutation rate. Two diversity measures, diversity (A and B) and total number of alleles (C and D), are plotted against mutation rate. Diversity was calculated from a collection of 344 diverse E. coli O157/O55 isolates. Regression plots are presented both including (A and C) and excluding (B and D) data from locus O157-10 (indicated by an arrow in panels A and C). Correlations from data excluding locus O157-10 are presented for both diversity measure plots.
FIG. 3. Locus O157-10 mutation rate as a function of repeat copy number. (A) Allele frequency distribution for 344 diverse E. coli O157/O55 isolates is presented with the PSPE strain repeat copy numbers labeled 1 to 9 and colored black. Strains 1 to 9 are H6436, ATCC 700927, Spain 401, F6750, EC536 and EC1212, 01A6820, DEC4C, 01A7146, and Spain 41, respectively. (B) A correlation between repeat copy number and mutation rate for locus O157-10 is presented along with the regression line equation.
χ² = 17.974, P = 0.021). Indeed, the 3.2 × 10⁻⁴ to 1.2 × 10⁻³ mutations/generation range of combined 28-locus mutation rates (Table 2) could be attributed solely to the differences in mutation rate at locus O157-10. One hundred twenty-four (66.7%) of the 186 observed mutations occurred at locus O157-10, and when those mutations were removed from the combined VNTR locus calculations, there were no statistically significant differences among combined 28-locus mutation rates for the nine strains (χ² = 12.280, P = 0.139). Locus O157-10 allele sizes varied the most dramatically of any locus in this study and therefore had the most impact on combined 28-locus mutation rates. Because the allele size variation at the other loci was smaller, they had a lesser effect on the combined 28-locus mutation rates. Thus, it appears that there is no pure "strain" effect on combined-locus mutation rates, but clearly "allele size" at individual loci does affect mutation rate.
We also detected a significant and strong correlation between mutation rate and repeat copy number across loci (r² = 0.833, P < 0.0001; Fig. 4A). However, this result was largely due to the locus O157-10 data. Removing the locus O157-10 data lessened the statistical significance somewhat, but differences in repeat copy number still explained nearly half of the variation in mutation rate (r² = 0.452, P < 0.0001; Fig. 4B). This was all the more dramatic because the nine wild-type strains were not preselected for allele sizes at the non-O157-10 loci and, hence, our statistical power for detecting allele size effect was greatly reduced for these other loci. Importantly, allele size appears to be a general across-locus effect and ca. 50% of mutation rate variance should be predictable from repeat copy number based upon this data set. The magnitude of the repeat copy number effect should be roughly equivalent to the slope of the regression line. Therefore, each additional repeat copy at a locus increases the rate by ~8.1 × 10⁻⁶ to 1.1 × 10⁻⁵ mutations/generation (Fig. 4A and B). Similarly at the O157-10 locus, each additional repeat copy increases the mutation rate by ~7.1 × 10⁻⁶ mutations/generation (Fig. 3B). These data provide a quantitative method for predicting mutation rate differences among alleles, solely based upon repeat copy number differences.
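As a worked illustration of the slope-based prediction, the sketch below multiplies the per-repeat slope quoted for locus O157-10 (about 7.1 × 10⁻⁶ mutations/generation per additional repeat) by a difference in repeat copy number. The function name is hypothetical, and applying a single linear slope across the whole 9-to-66-repeat range is a simplifying assumption; the regression intercept is not given in the text and is not needed for a difference.

```python
SLOPE_O157_10 = 7.1e-6  # mutations/generation per additional repeat (as quoted for Fig. 3B)


def predicted_rate_difference(copies_a, copies_b, slope=SLOPE_O157_10):
    """Predicted mutation rate difference between two alleles of one locus.

    Assumes a linear relationship between repeat copy number and rate.
    """
    return slope * (copies_b - copies_a)


# Smallest versus largest locus O157-10 allele among the nine PSPE strains.
diff = predicted_rate_difference(9, 66)
print(f"predicted difference: {diff:.2e} mutations/generation")
```

The predicted difference between the 9-repeat and 66-repeat alleles is on the order of 4 × 10⁻⁴ mutations/generation, the same order of magnitude as the highest single-locus rates reported above.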
FIG. 4. Mutation rate across loci as a function of repeat copy number. Correlations between repeat copy number and mutation rate are presented for all mutating loci in the PSPEs, both including (A) and excluding (B) data from locus O157-10. Equations for the two regression lines are also presented.
29 repeats (Fig. 5). Unlike smaller repeat mutations, the frequency of large repeat copy number events was not affected by additional repeat copies, once the 29-repeat threshold was reached (Fig. 5; χ² = 1.556, P = 0.817). The number of repeats involved in the large repeat copy number events was also not affected by additional increases in repeat copy number. For instance, at 29.67 repeats, there was one 5-, one 13-, and one 19-repeat deletion. At 41.67 copies, there was a single 12-repeat deletion. At 48.67 copies, there was one 5- and one 12-repeat deletion. At 59.67 copies, there was one 20-repeat deletion, and at 66.67 copies, there was one 7-repeat insertion and one 10-repeat deletion. Large repeat copy number mutations do not follow the same frequency progression observed for small repeat mutations, and they are highly biased towards deletion events.
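The event counts listed above can be checked against the reported test statistic. The counts in this sketch are tallied directly from the events enumerated in the text; treating the test as a goodness-of-fit comparison against equal expected counts with 4 degrees of freedom is an assumption, although it reproduces the reported values.

```python
import math

# Large repeat copy number events at locus O157-10, keyed by the repeat
# copy number of the parental allele (tallied from the text).
large_events = {29.67: 3, 41.67: 1, 48.67: 2, 59.67: 1, 66.67: 2}

counts = list(large_events.values())
expected = sum(counts) / len(counts)  # equal expectation per allele (assumed)
stat = sum((c - expected) ** 2 / expected for c in counts)

# Closed-form upper-tail probability for a chi-square variate with 4 df.
p_value = math.exp(-stat / 2.0) * (1.0 + stat / 2.0)

print(f"chi-square = {stat:.3f}, P = {p_value:.3f}")
```

The nine events are spread almost evenly over the five alleles, so the statistic is small and gives no evidence that further increases in repeat copy number change the frequency of large events.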
FIG. 5. Large repeat copy mutations occur in alleles with higher repeat copy numbers. The number of mutations involving >4 repeats at locus O157-10 is presented for the nine PSPE strains according to their repeat copy numbers.
χ² = 0.029, P = 0.866). However, the 28 VNTR loci examined here all have repeat unit sizes ranging from 5 to 62 bp, and MutS has only been shown to recognize stem-loop structures of ≤4 bp (21). Interestingly, three T = 10 lineages from the EC1212 PSPE population possessed locus O157-56 allele sizes that differed by 1 bp from the T = 0 allele size. Allele sequencing revealed that these 1-bp discrepancies were due to a 1-bp deletion in one lineage and to 1-bp insertions in two lineages. These mutations occurred in a poly(G) tract downstream from locus O157-56 but within the amplicon for that locus. No such 1-bp shifts were observed in any of the wild-type EC536 lineages. This is a small number of mutations but consistent with other reports of MutS action on VNTRs of repeat sizes less than 4 bp (16, 25). MutS is capable of recognizing and repairing mutations in single nucleotide repeats but not larger (e.g., 5-bp) repeat arrays.
Beyond a doubt, the mutational processes that lead to great VNTR diversity are complex. We have observed many different mutation products, including insertions and deletions of both single and multiple repeats, and mutation rates that vary both among loci and even among alleles at a single locus. We believe that VNTR mutations may occur by both slipped-strand mispairing (SSM) and recombination mechanisms. While all the factors governing these processes are unclear, repeat copy number certainly plays an important role in determining mutation rates both among different loci and among different alleles.
We propose here a practical, dynamic model for smaller VNTR mutations (e.g., disregarding the 12 large repeat copy number mutations that we observed here). (i) Approximately 80% of the mutational events will consist of a single-repeat change, and these will have an equal chance of being an insertion or deletion. (ii) The remaining 20% of the time, a mutation will involve multiple repeats but will have an equal chance of being an insertion or deletion. (iii) If a mutation involves >4 to 5 repeat units, a deletion event will be much more likely than an insertion, though this may be due to selection rather than an intrinsic mechanistic attribute. (iv) The number of repeats involved in a multiple-repeat mutation will roughly follow a geometric distribution, where a two-repeat change is more likely than a three-repeat change, which is more likely than a four-repeat change, etc.
Our proposed model is remarkably similar to a statistical model proposed to model VNTR mutations in humans (4). In this two-phase model, X is equal to one repeat with a probability of P. Multiple-repeat mutations would comprise the remaining mutational events with a frequency of 1 − P. The number of repeats (X) involved in the multiple-repeat mutational events would follow a geometric distribution (gj) with a specified variance (4). This allows for a certain percentage of mutations to consist of multiple repeat units and, using the values above (e.g., P = 0.80), could provide an excellent model for examining allele frequency distributions for various VNTR loci in natural populations of E. coli O157:H7. If one uses 0.80 for P, then 20% of the mutational events would be multiple-repeat events. A geometric progression following P(X = n) = P(1 − P)ⁿ⁻¹ would then suggest that 80% of the remaining 20% (16%) of these would be two-repeat mutants. The predicted three-repeat and four-repeat mutant frequencies would then be 3.2% and 0.6%, respectively. Again, disregarding large repeat copy number events, we observed 80% one-repeat, 12% two-repeat, 5% three-repeat, and 3% four-repeat mutants. These observed data approximate the theoretical geometric progression. In conclusion, this model represents a great improvement over other VNTR mutational models that assume that only single-repeat mutations will occur (8).
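The predicted frequencies quoted above (16%, 3.2%, and 0.6%) correspond to a geometric distribution of the form P(X = n) = P(1 − P)ⁿ⁻¹ with P = 0.80. The sketch below assumes that normalized form because it reproduces the quoted percentages; the function name is hypothetical, and the observed frequencies are the rounded values reported in the text.

```python
def two_phase_probability(n, p_single=0.80):
    """Probability that a mutation changes the array by n repeat units,
    assuming P(X = n) = p * (1 - p) ** (n - 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return p_single * (1.0 - p_single) ** (n - 1)


# Observed frequencies, disregarding large repeat copy number events.
observed = {1: 0.80, 2: 0.12, 3: 0.05, 4: 0.03}

for n, obs in observed.items():
    pred = two_phase_probability(n)
    print(f"{n}-repeat change: predicted {pred:.1%}, observed {obs:.0%}")
```

Under this form the model slightly overpredicts two-repeat changes and underpredicts three- and four-repeat changes relative to the observed frequencies, consistent with the statement that the data only approximate the geometric progression.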
The above pattern of single- and multiple-repeat mutations was observed for 93.5% of the mutations in this study and is consistent with SSM, the mechanism thought to be primarily responsible for VNTR mutations. An SSM model dictates that the majority of mutation events should represent the smallest possible mismatches (16). Thus, as seen here, single-repeat mutations should be more common than two-repeat mutations, which should be more common than three-repeat mutations, etc. The 12 (6.5%) mutations in this study that deviated from this pattern involved large repeat copy number mutational events that were likely due to a secondary mutational mechanism involving recombination.
Recombination has mostly been discounted as a potential VNTR mutational mechanism since mutations in recombination genes such as recA (16) and rad52 (11) have had no measurable effect upon VNTR mutation rates. Recombination seems much more likely than SSM to have produced the large repeat copy number mutations observed here. If SSM is the predominant mutational mechanism and recombination plays only a secondary role, then it is possible that any difference in mutation frequency due to a lack of recombination in previous studies was masked by the much greater frequency of SSM mutations. Very large sample sets would be needed to detect any differences in the frequency of such rare events, especially in a background of more frequent SSM mutations. Alternatively, the repeat arrays being examined could have been too short for any appreciable unequal crossing-over to occur. In E. coli, recA-mediated recombination requires a minimum of ~20 bp of homologous sequence and its frequency increases exponentially between 20 and 74 bp; thereafter, recA-mediated recombination increases linearly (28). In this study, large repeat copy number mutations were observed only at locus O157-10 when there were ≥29 repeats in the array (≥174 bp of repeat sequence). However, during an unequal crossover event the regions are offset and, thus, less than 174 bp of homologous sequence would be available for recombination. The maximum pairing of sequence during a chromosomal misalignment would consist of half of the repeats, with a corresponding homologous sequence length of 87 bp at the 29-repeat threshold. Interestingly, this is near the 74 bp that is required for near-maximum recombination frequency in E. coli. Whether SSM or recombination is the mechanism, the low frequency of large repeat copy number events combined with their tendency to occur only at large allele sizes indicates that the higher-than-expected frequency of such events could likely be ignored for smaller and more common VNTR arrays. The SSM model with its geometric distribution of multiple-repeat mutation frequencies appears to be adequate in most cases.
Is the expansion of tandem arrays bounded, or can they expand to infinite size? If expansion is intrinsically unbounded, what prevents the generation of very large arrays (7, 24)? Theoretically, insertions and deletions are equally likely using either an SSM or a recombination mutation model. Previous studies have indicated insertion (23, 26, 31) and deletion (16, 26) biases, or even no bias (9), depending upon the locus examined. Here, across a panel of loci, we determined that there was no statistically significant difference between insertion and deletion frequencies. However, we also determined that in the case of large repeat copy number mutations, deletions are far more likely than insertions, a phenomenon that has been tentatively observed elsewhere (6). This could be due to selection against large allele states and thus could reflect a stabilizing mechanism that prevents uncontrolled array expansion. Large repeat copy number deletion events could even compensate for an insertion bias since a single large deletion could effectively remove multiple insertion events. Although this process likely applies across loci, large alleles may be more detrimental at some loci than others, such as those in important coding regions, which could explain why some loci are able to maintain very large arrays while other loci remain relatively small.
For recently emerged pathogens and especially in the analysis of disease outbreak isolates, MLVA represents an important and sometimes the only tool for discriminating among isolates. While the rapid evolution of VNTR loci provides great discriminatory power among closely related isolates, it also inhibits the analytical power for detecting more distant relationships. In a set of uncharacterized isolates the true relationships will be unknown and of differing distances. This can be mitigated somewhat by using more VNTR loci (e.g., >20) with a range of mutation rates. This will increase the analytical capacity of a subtyping system to estimate a range of heterogeneous genetic relationships (14). It is important to recognize that the lack of congruence among phylogenetic reconstructions may be as much due to a lack of characters as to the molecular method. Researchers should not expect accurate phylogenetic reconstructions if only a small set of rapidly mutating markers is employed. Regardless, it is important that any phylogenetic reconstructions be viewed as hypotheses and, if possible, compared to alternatives using probability models. Bayesian approaches are only possible once the mutation rates and associated mutational product probabilities are known.
The insights into VNTR mutation processes gained from our mutation rate studies provide the foundational knowledge needed to construct probabilistic models for natural pathogen populations. Specifically, VNTR mutation rate estimates can be used for improving the quality of molecular subtyping in two unique ways. First, using mutation rate estimates and a Poisson distribution, confidence intervals can be established around the number of generations required for one or more mutations (19). This can be used to place a statistical value on the likelihood of two isolates being related to each other that goes far beyond simple fragment matching approaches. Second, VNTR mutation rate estimates provide a framework for deciding how to mathematically weight VNTR differences when constructing phylogenies. Weighting data from fast-mutating loci less than data derived from slowly mutating loci should provide more accurate phylogenies. Exactly how to weight loci is problematic, but one suggestion is to use the inverse of the mutation rate. Unfortunately, mutation rate estimates are not easily obtainable for all organisms at all loci and can vary depending upon the allele size. The correlation between diversity and mutation rate suggests that the inverse of the diversity could also be used as a weighting factor, which would at least provide a weighting value for every locus and for organisms where mutation rate studies are not available. A third possibility would be to apply a PHRANA approach wherein isolates are analyzed using slower-mutating loci first, followed by faster-mutating loci to "cluster bust" any undifferentiated clades (13). Ideally, the first markers applied in any PHRANA approach would be highly evolutionarily stable markers such as single nucleotide polymorphisms, although the approach would also benefit analyses of VNTR markers alone if rapid markers are segregated from slower ones in the analysis. 
This approach requires categorizing VNTR loci into different levels (i.e., slow, medium, fast, etc.), but does not require mutation rate data for all loci. Standardization would also be easier using this approach, although the exact cutoffs for categorizing VNTR loci would have to be determined and would, of necessity, be somewhat subjective.
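Two of the uses of mutation rate estimates discussed above, Poisson-based probabilities and inverse-rate weighting, can be sketched as follows. This is a simplified illustration and not the model of reference 19: the function names and the "slow_locus" label are hypothetical, the two rates are the highest and lowest combined single-locus estimates quoted in this article, and a constant per-generation rate at each locus is assumed.

```python
import math


def prob_at_least_one_mutation(rate, generations):
    """Poisson probability of one or more mutations at a locus."""
    return 1.0 - math.exp(-rate * generations)


def generations_for_probability(rate, prob):
    """Generations needed for the probability of >= 1 mutation to reach prob."""
    return -math.log(1.0 - prob) / rate


def inverse_rate_weights(rates):
    """Weight each locus by the inverse of its mutation rate, normalized to 1."""
    inverse = {locus: 1.0 / rate for locus, rate in rates.items()}
    total = sum(inverse.values())
    return {locus: value / total for locus, value in inverse.items()}


rates = {"O157-10": 4.0e-4, "slow_locus": 3.4e-6}  # mutations/generation

p_1000 = prob_at_least_one_mutation(rates["O157-10"], 1000)
g_95 = generations_for_probability(rates["O157-10"], 0.95)
weights = inverse_rate_weights(rates)

print(f"P(>=1 mutation at O157-10 in 1,000 generations) = {p_1000:.3f}")
print(f"generations for 95% probability at O157-10 = {g_95:.0f}")
print({locus: round(w, 4) for locus, w in weights.items()})
```

With weights this lopsided, a difference at the slowly mutating locus counts far more than one at locus O157-10, which is the behavior the weighting scheme is meant to produce; whether inverse rate, inverse diversity, or a tiered categorization is most appropriate remains the open question raised in the text.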
While the manuscript was in review, Noller and colleagues (22) published a mutation study on seven VNTR loci in E. coli O157:H7, a subset of the 28 loci studied here. Due to the smaller population sizes employed, only the most rapidly mutating loci were observed to generate mutational products in their study. These included three loci: O157-10 (TR2), O157-9 (TR1), and O157-3 (TR5). Four additional loci were studied, O157-17 (TR3), O157-19 (TR7), O157-25 (TR4), and O157-34 (TR6), but did not mutate under their experimental design (22), again, probably due to the use of smaller population sizes. The frequency of single-repeat versus multiple-repeat mutation products was consistent with our results. Single-repeat mutational products were observed ~85% of the time by Noller et al. (2006), which was consistent with our finding that ~75% of VNTR mutational products are single-repeat differences. Noller et al. (2006) also observed an insertion bias, but in our study, this was limited to a single locus x strain example and does not appear to be a general phenomenon. Nonindependent sampling could lead to an observed bias that is not intrinsic. Importantly, the rate estimates reported by Noller et al. (2006) are significantly different from and ~10-fold higher than those in our study. We believe that these significantly higher rates are overestimates due to the nonindependent design of their mutant sampling. In other words, many of their observed mutants were likely the result of a single rather than multiple mutation events. The sampling of multiple progeny from a single event would lead to elevated rate estimates. Noller et al. (2006) categorize their sampling design as nonindependent, which really represents a mutant observation rate rather than a mutational process rate. Observation of nonindependent mutants is highly dependent upon the experimental design and sample scheme and only somewhat dependent upon the underlying mechanistic rates. They can be used to establish relative rates among loci but should be used with great caution otherwise.
This work was supported by the Bioforensics Demonstration and Application Program and the Cowden Endowment in Microbiology.
Present address: Montana State University, Bozeman, Mont.