Journal of Bacteriology, February 2007, p. 1238-1243, Vol. 189, No. 4
0021-9193/07/$08.00+0 doi:10.1128/JB.01183-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
College of Dentistry,1 School of Medicine,2 Department of Biology, New York University, New York, New York3
Received 31 July 2006/ Accepted 25 October 2006
Genetic diversity is well documented within S. mutans. For example, variability among strains of S. mutans has been demonstrated for the presence of plasmids (8, 11, 23); mutacin I, II, III, and IV operons (5, 14, 33, 34, 46); serotype antigens (38, 46); competence (15, 21, 29); and the msm, bgl, cel, and gtfBC loci (46), among others. Indeed, the genome of S. mutans UA159 shows that nearly one-third of its open reading frames are of unknown function and 16% are unique to S. mutans. Recent work by Waterhouse and Russell (46) shows a mosaic of different genetic loci, or what they call "dispensable genes," distributed among strains of S. mutans. Given the wide distribution and diversity of genotypes and genetic loci in S. mutans cited above, it seems likely that different strains of S. mutans will have both unique and common genetic loci not present on UA159 (37, 46), which should prove useful in charting S. mutans' evolutionary history.
A cryptic plasmid resides in approximately 5% of the isolates of S. mutans (8, 24). The function of this plasmid remains unknown, although its sequence has been published (48). Because of its high sequence variability in the hypervariable region (HVR) and its low prevalence, the cryptic plasmid is a useful epidemiological marker for studying transmission (11) and, as here, its phylogenetic history. The 5.6-kb plasmid was initially thought to be related to bacteriocin production, because most bacteriocins of gram-positive bacteria are plasmid encoded (43). Subsequent discovery of the chromosomal locus for mutacins I, II, III, and IV (33, 34), coupled with sequencing of the plasmid, showed that mutacins are not plasmid encoded (9, 48). Nonetheless, virtually all known plasmid-bearing strains of S. mutans elaborate either mutacin I or II, and these strains are also naturally competent (29). Moreover, mutacin and competence are coordinately expressed as part of an overall mechanism to acquire DNA (16, 28).
Here, we examined the population structures of plasmid-containing strains of S. mutans from individuals of different racial/ethnic and geographic backgrounds, using both the hypervariable region of the plasmid and several chromosomal loci to construct phylogenies. We found two significantly incongruent phylogenies, one for the chromosome and another for the plasmid, which displayed independent histories, perhaps as a result of horizontal transfer.
TABLE 1. Plasmid-containing strains of S. mutans
603-bp fragment of the HVR area of the 5.6-kb cryptic plasmid, spanning base pairs 4510 to 5336 on pUA140 (48). Primers MutA-F1 (TACGTTCAGTTACACACATG) and MutA-R (CTTAGCAACAGTAACTATTG) yielded an amplicon of approximately 210 bp, and Mut2-F (ATAACGGGGGGCTTAAGCTGTA) and Mut2-R (GCCAAGAATGGTCTGAAGAAACA) yielded an amplicon of approximately 600 bp for the detection of mutacin I and mutacin II, respectively. Primers of the intergenic spacer region (IGSR) were as described previously (12), yielding an amplicon of approximately 389 bp. Primers for determining the serotype were described elsewhere (38). The HVR and IGSR were sequenced in both directions, using the specific forward and reverse primers described above.
Phylogenetic analyses. To test the robustness of the data for differences in methodology, maximum likelihood (ML) and weighted parsimony were used, as implemented by PAUP version 4.0 beta 10 (42).
For ML, ModelTest (32) was used to identify an efficient model. For all data sets, the HKY85+G model was selected with the following specified parameters and search strategies. For the IGSRs of plasmid-containing strains, the parameters were as follows: fA = 0.3100, fC = 0.1866, fG = 0.2465, and fT = 0.2569 (where f is frequency); transition/transversion ratio, r = 1.815 (κ = 3.602); gamma shape parameter for modeling rate differences among sites, α = 0.0843 (modeled using four discrete rate categories). A heuristic search for the ML tree was performed by 100 taxon addition replicates, swapping on all saved trees, so that 181,158 total trees were evaluated. For the HVRs of plasmids, the parameters were as follows: fA = 0.3553, fC = 0.1613, fG = 0.1253, and fT = 0.3581; transition/transversion ratio, r = 1.401 (κ = 3.420); gamma shape parameter, α = 0.0084 (using four discrete rate categories). A heuristic search for the ML tree was performed by 100 taxon addition replicates, swapping on all saved trees, so that 13,781 total trees were evaluated. For the IGSRs of strains without plasmids, the parameters were as follows: fA = 0.3115, fC = 0.1899, fG = 0.2450, and fT = 0.2536; r = 0.915 (κ = 1.814); α = 1.216 (four discrete categories). The same parameters and search strategy were used for bootstrap analyses, except that neighbor joining was used to generate starting trees for each replication for the plasmid HVR data.
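The relationship between the two transition/transversion quantities reported for each data set can be checked with a short calculation. The sketch below assumes that the value given in parentheses after each r is the HKY85 transition/transversion rate ratio (commonly written κ) and uses the standard HKY85 identity r = κ(fA·fG + fC·fT) / [(fA + fG)(fC + fT)]. The function name and the dictionary labels are illustrative only and do not come from PAUP or ModelTest.

```python
def expected_titv_ratio(kappa, f_a, f_c, f_g, f_t):
    """Expected transition/transversion ratio under HKY85.

    kappa is the transition/transversion *rate* ratio; the f_* values
    are the equilibrium base frequencies.
    """
    purines = f_a + f_g        # A and G
    pyrimidines = f_c + f_t    # C and T
    return kappa * (f_a * f_g + f_c * f_t) / (purines * pyrimidines)


# Parameter sets as listed in the text: (kappa, fA, fC, fG, fT).
# The labels are shorthand chosen for this example.
DATASETS = {
    "IGSR, plasmid-containing strains": (3.602, 0.3100, 0.1866, 0.2465, 0.2569),
    "HVR, plasmids": (3.420, 0.3553, 0.1613, 0.1253, 0.3581),
    "IGSR, plasmid-free strains": (1.814, 0.3115, 0.1899, 0.2450, 0.2536),
}

if __name__ == "__main__":
    for label, params in DATASETS.items():
        print(f"{label}: r = {expected_titv_ratio(*params):.4f}")
```

Under that assumption, the computed ratios agree with the reported r values (1.815, 1.401, and 0.915) to within 0.001, which also shows how r can fall below 1 for the plasmid-free IGSR data even though the rate ratio itself is greater than 1.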
For weighted parsimony, transversions were weighted twice as much as transitions to reflect the greater susceptibility of transitions to be superimposed. Also, we noted that ModelTest suggested that the data fit models where transition rates were higher than transversion rates (see the κ values above). Indel "gaps" were ignored, and parsimony-uninformative characters were excluded. The following strategies were used with the different data sets. For the IGSRs of plasmid-containing strains, two analyses were undertaken, one with only the DNA characters and another including the DNA, serotype, and mutacin characters. In the latter case, the serotype-plus-mutacin partition was weighted equally to the partition of parsimony-informative DNA characters. In both cases, a complete branch-and-bound analysis was performed. A strict consensus was calculated for the multiple maximum-parsimony trees obtained in each case. For bootstrap analysis, starting trees were obtained by 10 random taxon addition sequences per replication. For the HVRs of plasmids, only DNA characters were used. In both the search for the maximum-parsimony tree and bootstrap analysis, 10 random taxon addition sequences were performed to obtain starting trees. Evolutionary changes in mutacin and serotype were reconstructed by parsimony.
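The 2:1 transversion/transition weighting can be illustrated by scoring a single nucleotide site on a fixed tree. The sketch below uses the Sankoff dynamic-programming algorithm with a step matrix in which a transition costs 1 and a transversion costs 2. It is not PAUP's implementation, and the tree shape and the tip names t1 to t4 are hypothetical and chosen only for the example.

```python
PURINES = {"A", "G"}
BASES = "ACGT"


def step_cost(x, y):
    """Cost of changing base x into base y: 0 if unchanged,
    1 for a transition, 2 for a transversion."""
    if x == y:
        return 0
    same_class = (x in PURINES) == (y in PURINES)
    return 1 if same_class else 2


def sankoff(tree, tip_states):
    """Return {base: minimum weighted cost of the subtree given that base}.

    `tree` is either a tip name (str) or a tuple of two subtrees.
    """
    if isinstance(tree, str):
        return {b: (0 if b == tip_states[tree] else float("inf")) for b in BASES}
    left, right = (sankoff(child, tip_states) for child in tree)
    return {
        b: min(left[c] + step_cost(b, c) for c in BASES)
        + min(right[c] + step_cost(b, c) for c in BASES)
        for b in BASES
    }


def site_length(tree, tip_states):
    """Minimum weighted number of steps for one site on the tree."""
    return min(sankoff(tree, tip_states).values())


if __name__ == "__main__":
    toy_tree = (("t1", "t2"), ("t3", "t4"))  # hypothetical four-taxon tree
    one_transition = {"t1": "A", "t2": "G", "t3": "A", "t4": "A"}
    one_transversion = {"t1": "A", "t2": "A", "t3": "C", "t4": "C"}
    print(site_length(toy_tree, one_transition))
    print(site_length(toy_tree, one_transversion))
```

Summing `site_length` over all parsimony-informative sites gives the weighted length of a candidate tree, so a tree that requires a transversion is penalized twice as heavily as one that requires a transition at the same site.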
To get the mean pairwise distance of ingroup taxa, the HKY85+G model was used to derive a pairwise distance matrix. The average of all cells below the diagonal was calculated with Excel (Microsoft).
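The averaging step is simple enough to sketch directly. The example below takes a square, symmetric distance matrix and averages every cell below the diagonal, which is the same calculation the text describes doing in a spreadsheet. The matrix values are made up for illustration and are not distances from this study.

```python
def mean_pairwise_distance(matrix):
    """Average of all cells below the diagonal of a square distance matrix."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("distance matrix must be square")
    cells = [matrix[i][j] for i in range(n) for j in range(i)]
    if not cells:
        raise ValueError("need at least two taxa")
    return sum(cells) / len(cells)


if __name__ == "__main__":
    # Hypothetical distances (substitutions/site) among three ingroup taxa.
    toy_matrix = [
        [0.00, 0.02, 0.04],
        [0.02, 0.00, 0.06],
        [0.04, 0.06, 0.00],
    ]
    print(mean_pairwise_distance(toy_matrix))
```

For n taxa the average runs over n(n - 1)/2 cells, so each pair of strains is counted exactly once and the zero diagonal does not dilute the mean.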
Nucleotide sequence accession numbers. The sequences of the plasmid HVRs and IGSRs have been deposited in the GenBank database under accession numbers AF139604 to AF139611, AF077024 to AF077024, and AF093650 to AF093667.
FIG. 1. Chromosomal-DNA fingerprints of plasmid-containing strains of S. mutans from unrelated individuals, restricted with HaeIII. (A) Strains from different ethnically/racially/geographically distinct hosts. (B) S. mutans from a Caucasian population from the United States and Australia. Strain CA143 contains a 0.9-kb insertion sequence. (C) Strain CH830, obtained from an individual from the southern region of China (Hainan province), displays a chromosomal-DNA fingerprint similar to that of strain CA96, a strain isolated from a Caucasian individual from Birmingham, AL. Lane λ, bacteriophage lambda cut with HindIII size standard.
FIG. 2. Unrooted maximum-likelihood phylogeny of the cryptic 5.6-kb plasmid as inferred from HVR sequences (see Materials and Methods for details of the analysis). Taxa with names in blue represent strains with mutacin II, and those in black represent strains with mutacin I; taxa and branches in red represent strains and ancestral lineages with serotype e. One of two possible reconstructions is depicted for serotype e; the alternative possibility is that serotype e was independently derived for CH5A. The pairs of numbers on branches are bootstrap values: the first number is from a likelihood bootstrap analysis, and the second is from a weighted-parsimony bootstrap (1,000 replicates each). Hyphens and branches without numbers indicate bootstrap values that were below 50%. The branch lengths are proportional to the numbers of substitutions/site, as reconstructed using the HKY85+G likelihood model. Abbreviations for ethnicity of the human host: AF, African; AA, African American; CA, Caucasian American; CH, Chinese; JP, Japanese; BR, Brazilian; AM, Amazon Indian; SW, Swedish Caucasian; HI, Hispanic.
The evolutionary association between the plasmid and the mutacin II phenotype suggests that the mutacin loci may have been acquired independently three times, possibly by horizontal transfer (blue taxa in Fig. 2). In contrast, an association with serotype e is probably continuous along an evolutionary lineage (red lines in Fig. 2), although this association has been lost independently three times (in the lineages to CH43, JP85-5, and AA138 and a larger clade including BR15, AA31, and JP9-4). That is, the evolutionary history of serotype conversion either is linked to or parallels the plasmid's history. The serotype e strains are found in Asian (CH830 and CH5A) and African/African American (AF199, AA669, LM7, and AA545) hosts, but also in strains from a Hispanic child (HI24) and a Swedish Caucasian host (SW114). This association with the most basally derived hosts (Fig. 3) is consistent with an early introduction of the plasmid into S. mutans.
FIG. 3. Maximum-likelihood phylogeny of IGSR sequences from strains with plasmids, rooted with IGSR Streptococcus ratti CCUG 27642 (see Materials and Methods for details of the analysis). Taxa and branches in blue represent strains with mutacin II; taxon names in red represent strains with serotype e. The triplets of numbers on branches are bootstrap values: the first two numbers are from weighted-parsimony analysis including or excluding, respectively, serotype and mutacin characters (2,000 and 1,000 bootstrap replications); the third is from a likelihood bootstrap (952 replications). When the serotype and mutacin characters were included and each was weighted the same as the set of DNA characters (i.e., the three data partitions were weighted equally), HI24 grouped with the AF199 cluster with a parsimony bootstrap value of 58%. Hyphens and branches without numbers indicate bootstrap values that were below 50%. The branch lengths are proportional to the numbers of substitutions/site as reconstructed using the HKY85+G likelihood model.
Both weighted-parsimony and maximum-likelihood analyses of the IGSR DNA characters alone produced trees in which only two branches were well supported by bootstrapping. In an attempt to maximize the phylogenetic signal, characters for serotype and mutacin types were added in a maximum-parsimony analysis; each was given character weights equal to the set of nine informative nucleotides so that the three different partitions (serotype, mutacin, and IGSR) were equally weighted (Fig. 3). However, the serotype or mutacin characters did not add much to the resolution; besides the two nodes with bootstrap values shown in Fig. 3, only one other node was supported above 50% (HI24 plus the AF199 cluster; 58% bootstrap value).
Although the poor resolution limits our ability to draw decisive conclusions, there are some interesting features of the IGSR tree that contrast with the plasmid HVR tree. First, association of the strains with mutacin II is continuous along an evolutionary lineage, and associations with serotype e have evolved multiple times. Although it is formally possible that multiple independent changes to the mutacin II genotype occurred, a single-gain-single-loss scenario is most parsimonious, given the ML tree. Second, relationships among the taxa are different. For example, whereas the JP9-4 IGSR is identical to that of CH638 and CH639 but is in a distinct clade from the CA96 cluster (Fig. 3), the plasmid HVR of JP9-4 is most closely related to the CA96 cluster but is phylogenetically distinct from the CH638 and CH639 HVRs (Fig. 2). Also, the AA140 IGSR cluster is essentially identical to the SW114 IGSR (Fig. 3), but the plasmids of these strains are at opposite ends of the tree (Fig. 2).
In view of the overall diversity displayed in the CDF profiles, it was surprising to find that strains obtained from Caucasian populations were essentially identical. This pattern of similarity was also found in both HVR and IGSR loci, suggesting that strains from this population were derived from a relatively homogeneous founder population. Although most strains came from Caucasians from Birmingham, AL, two strains came from Australia and Michigan. Clearly, sampling from a more heterogeneous Caucasian population is necessary to make definitive inferences about possible bottleneck effects that restricted the diversity of these strains. Moreover, the evolutionary history of plasmid-bearing strains may not be generalizable to all S. mutans populations. It is noteworthy, however, that a similar bottleneck effect has been reported for Caucasians (35, 40, 47). This lack of diversity among Caucasian strains may also be interpreted as representing the age of the population; that is, the youngest human race is the least diverse because it has had a shorter time to evolve.
Our data show that plasmid-containing strains of S. mutans contain twice as many polymorphisms in the IGSR as the corresponding locus in plasmid-free strains. This finding could indicate that the plasmid facilitates or stabilizes the entry or recombination of foreign DNA because of its greater copy numbers within the cell. Support for this contention is found in a plasmid from strain CA143 from a Caucasian child (mutacin II; serotype c) that contained an insertion element (IS1216; GenBank accession no. AF104381) downstream of the HVR. This region shares similarity with the chromosomal mutF-mutG operon of mutacin II, in addition to two transposase fragments (SMU.226c and SMU.1329c) present in S. mutans UA159 (2). It also suggests that there may be a linkage between mutacin II and the cryptic plasmid. Alternatively, the plasmid's presence may simply be the result of the cell's genetic competence, promoting the uptake of more than one genetic element, including the mutacin locus. The greater diversity of plasmid-containing strains is also compatible with an early evolutionary association between the plasmid and S. mutans, as well as with recent plasmid loss in the plasmid-free strains.
While the HVR yielded a well-resolved tree, there seems to be little evolutionary congruence between the plasmid and the mutacin loci or the ethnic/geographic host groups, except for those clusters of like strains found at the terminal branches. The HVR portion of the plasmid was selected for constructing phylogenies because of its unusually high concentration of polymorphic sites and because the region was noncoding and hence presumably not subject to selection on gene products. The lack of a clear lineage with the mutacin loci was surprising, because the bacteriocin operons reside on plasmids for most of the lactic acid bacteria in the order Lactobacillales (43).
Interestingly, however, the association between the plasmid and the serotype e locus was well supported (Fig. 2). Most strains of S. mutans are serotype c. However, transition between serotypes e and c involves gene conversion, because the sequences flanking both sides of the serotype determinants are homologous (38). The source of the converted block of DNA must come from outside the host chromosome, since UA159 (serotype c) has no site with homology to the antigen-specific serotype e coding region (2, 38). The phylogenetic congruence of the serotype with the plasmid history suggests that the serotype conversion was plasmid mediated. A perfect correlation might not be expected, because the plasmid could be lost later, yielding a plasmid-free serotype e strain (46).
Using the IGSR to resolve phylogeny has a solid foundation for differentiating bacteria at the species level, particularly within the genus Streptococcus (12). Here, further resolution to the intraspecies level was possible for the plasmid-containing strains, but not for plasmid-free strains. The phylogeny predicted from the IGSR revealed a well-resolved "Asian clade" that included most of the strains from Japan and China (Fig. 3). Many of the strains from the Caucasian and African American racial groups also clustered together at the terminal branches, having similar or identical IGSR sequences. Mutacin II clustering was also consistent with tree topology, but like the racial clusters other than the Asian clade, the bootstrap values fell below 50%.
Other investigators, using different genetic markers, have attempted to examine S. mutans' population structure, with little or no resolution, even though multiple loci were analyzed (31, 46). One study attributed this lack of concordance among loci to the dispensable or non-core-genome nature of the loci studied (46). The implication is that these "dispensable" loci arose from horizontal transfer and, like the cryptic plasmid, have their histories in another organism. In a preliminary report, other investigators sequenced several of the housekeeping genes of S. mutans, using the multilocus sequence-typing approach (31). They were unable to derive a strong phylogenetic signal, however, due to the small variation within the alleles sequenced. They went on to speculate that S. mutans may have only recently associated with its human host, perhaps tied to the development of agriculture, and had little time to diverge. While the multilocus sequence-typing approach has uncovered population structures in several species of bacteria and fungi (26, 39), the method apparently could not resolve S. mutans' structure. In another effort to characterize the population structure of S. mutans, anchoring the tree based on mutacin production, Balakrishnan and coworkers (5) were able to separate S. mutans based on multiple phenotypic and genotypic characters but were not able to construct a consistent phylogeny due to the disparate signal.
In summary, this study examined host-parasite coevolution at four levels: gene, plasmid, S. mutans, and human host. Although the data were not sufficient to show parallel evolutionary histories at all four levels, the collective patterns of individual segments allowed the reconstruction, albeit incomplete, of several parallel phylogenies. The data were able to show fairly strong evidence for incongruence between the phylogenies of a cryptic plasmid and S. mutans. Intriguingly, this incongruence did not rule out coevolution, since there was a phylogenetic correspondence between plasmid evolution and serotype e, which is encoded on the bacterial chromosome. In contrast, the history of the mutacin loci appears to be independent of the plasmid, consistent with mutacin's chromosomal linkage. With the discovery of additional informative loci, S. mutans may serve as a useful model to study coevolution with its human host at a variety of levels.
The research was supported by NIDCR grants R01DE013937 and DE11147.
Published ahead of print on 3 November 2006.