Journal of Bacteriology, December 2005, p. 8312-8321, Vol. 187, No. 24
0021-9193/05/$08.00+0 doi:10.1128/JB.187.24.8312-8321.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
New York Medical College, Department of Microbiology and Immunology, Valhalla, New York 10595 (1); Virginia Commonwealth University, Department of Internal Medicine, Richmond, Virginia 23298 (2); University of Bath, Department of Biology and Biochemistry, Bath, United Kingdom BA2 7AY (3); Imperial College London, Department of Infectious Disease Epidemiology, London, United Kingdom W2 1PG (4)
Received 20 July 2005/ Accepted 4 October 2005
It has been proposed that the inhibitory activity of agr groups may serve to isolate bacterial populations and facilitate the evolution of new strains or even species (31). This notion has been reinforced by the observation that a given genetic background is usually represented by a given agr group; seldom is a given genetic background represented by multiple agr groups (56). Associations between agr group and other strain characteristics may include resistance to glycopeptides (agr's I and II) (41, 55), isolation from toxic shock syndrome and from community-acquired methicillin-resistant S. aureus disease (agr III) (23, 52), and isolation from staphylococcal scalded skin syndrome (agr IV) (21). These observations have led to the hypothesis that agr groups delineate fundamental subdivisions within the species (22, 30, 31, 56). However, a study of S. aureus population genetic structure based on multilocus sequence typing (MLST) hinted that the species may be fundamentally subdivided into two groups, each consisting of multiple clonal complexes (11). The evolutionary relationships between the different clonal complexes represented by a given agr group have not been considered (56). Furthermore, epidemiological studies generally conclude that agr groups have no obvious influence on strain colonization and competition dynamics in humans (5, 24, 26, 46, 53), which calls into question the proposal that agr-mediated bacterial interference is an important means of isolating bacterial populations.
The mechanism by which agr groups diversify is also unknown. The P2 operon exhibits a hypervariable region that spans the 3' end of agrB, all of agrD, and the 5' end of agrC, flanked by conserved regions (23). The hypervariable region encodes the agr group specificity, and the separation between the hypervariable and conserved regions is abrupt. It has been noted that this genetic organization is consistent with both site-specific recombinational and hypermutational mechanisms of diversification (23, 30). Initial studies using partial agr sequences reported no evidence for recombination of agr within S. aureus (53) or between staphylococcal species (8). A recent study using partial agr sequences reported possible instances of horizontal genetic transfer of the hypervariable region of agr between strains with agr I and agr II (16).
Here, we analyze nucleotide sequence variation at 14 genes from 27 genetically diverse strains to obtain an improved clone phylogeny, upon which agr groups are mapped. We rigorously test hypotheses of the relationship between agr groups and clone phylogeny. We also obtain complete agr sequences from the diverse strains to study the mechanism of agr diversification.
TABLE 1. Genetic characteristics of 27 diverse strains
Sequence alignments.
Sequences were aligned using CLUSTALW (51) with default parameters, followed by manual inspection. For sequences of unequal length, alignments were made on the translated amino acid sequences and back-translated to nucleotide sequences using MegAlign 5.00 (DNASTAR Inc.). Translational start and stop sites of the different agr genes were assigned from the following annotations: the COL genome sequence (13) for agr's I, IV, and I/IV, the N315 genome sequence (25, 41) for agr II, and the MW2 genome sequence (1) for agr III. AgrC of agr II is predicted to be 59 amino acids shorter than the AgrC of other agr groups. Phylogenetic analyses conducted with an alternative translational start site for AgrC of agr II (23) and with the N315 annotation (25, 41) produced similar results.
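The amino-acid-guided alignment step can be illustrated with a short Python sketch. It assumes the protein rows have already been aligned (the study used CLUSTALW and MegAlign for this) and that each nucleotide sequence is in frame with its terminal stop codon removed; `back_translate` is a hypothetical helper, not a MegAlign function. Each residue is replaced by its own codon and each gap by three gap characters, which keeps codons in register across the aligned sequences.

```python
def back_translate(aligned_protein, cds, gap="-"):
    """Thread an ungapped coding sequence through one gapped protein alignment row.

    Each amino acid consumes the next codon of `cds`; each gap becomes three gap
    characters, so codon boundaries stay aligned across sequences.
    Assumes `cds` is in frame and has no terminal stop codon.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be three times the number of residues")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


# Toy example: "MK-L" and "MKAL" are two rows of a protein alignment.
print(back_translate("MK-L", "ATGAAACTG"))     # ATGAAA---CTG
print(back_translate("MKAL", "ATGAAAGCTCTG"))  # ATGAAAGCTCTG
```

Because every codon is copied from the original nucleotide sequence, this mapping never changes the sequence data; the protein alignment only decides where the gaps go.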
Phylogenetic analyses. Insertion and deletion (indel) polymorphisms were excluded from all analyses. Optimal maximum likelihood models of nucleotide substitution, hereafter called optimal models, were determined using Modeltest 3.06 (36). Rate heterogeneity among sites was examined assuming a discrete gamma distribution with eight rate categories. Maximum likelihood (ML) and maximum parsimony (MP) phylogenetic trees were determined using PAUP*4.0b10 (48). For ML trees, we used the optimal model, a neighbor-joining starting tree, and tree-bisection-reconnection branch swapping unless nearest-neighbor interchange (NNI) is noted. For MP trees, we used parsimony-informative sites only, 20 replicates of random taxa addition, and tree-bisection-reconnection branch swapping. Nonparametric bootstrapping with 200 replicates was performed using a neighbor-joining analysis, with distances derived from the optimal model for the ML trees, or under parsimony with the procedures outlined above for MP trees. A detailed description of our use of the nonparametric Shimodaira-Hasegawa (SH) test of tree topologies (18, 45, 54) and sequence simulations (20, 38, 42), including their application to parametric bootstrap tests, is provided as supplementary text in the supplemental material.
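The resampling behind nonparametric bootstrap support can be sketched as follows: alignment columns are drawn with replacement to build pseudo-replicate alignments of the original length, a tree is inferred from each replicate, and a node's support is the fraction of replicate trees containing it. This Python sketch covers only the resampling and tallying; tree inference (neighbor-joining or parsimony in PAUP* in the study) is replaced by a hypothetical `has_node` callback, and the toy alignment and helper names are illustrative assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate alignment.

    `alignment` maps taxon name -> aligned sequence. Columns are drawn with
    replacement until the replicate has as many columns as the original.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def bootstrap_support(alignment, has_node, replicates=200, seed=1):
    """Fraction of replicates for which `has_node(replicate)` is True.

    `has_node` stands in for 'infer a tree from the replicate and check whether
    it contains the node of interest'.
    """
    rng = random.Random(seed)
    hits = sum(bool(has_node(bootstrap_alignment(alignment, rng))) for _ in range(replicates))
    return hits / replicates


def p_distance(x, y):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


# Toy alignment; the "node" here is simply "A is closer to B than to C".
aln = {"A": "ACGTACGTAA", "B": "ACGTACGTAT", "C": "TCGAACCTTT"}
support = bootstrap_support(
    aln, lambda rep: p_distance(rep["A"], rep["B"]) < p_distance(rep["A"], rep["C"])
)
print(f"bootstrap support: {support:.0%}")
```

Resampling whole columns keeps the characters of each site together, so each replicate preserves the site-level signal while varying which sites are represented and how often.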
Recombination analyses. Recombinant agr sequences were detected using RDP 2.0b08 (28). This program implements a variety of methods for detecting putative recombination events and breakpoints. The three methods that we used, Geneconv (43), MaxChi (47), and Chimaera (37), all use patterns of nucleotide substitution as a basis for detecting recombination. These methods are among the most powerful recombination detection methods available and are unlikely to infer a recombination event if one is not present (37). For all methods, sequences were treated as linear, statistical significance was assessed at the P < 0.05 level with the Bonferroni correction for multiple comparisons, consensus daughter sequences were found, breakpoints were polished, and only events detected by two of the three methods were examined.
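The idea behind MaxChi can be shown with a simplified Python sketch. RDP's implementation works on sequence triplets with sliding windows; this version is a pairwise, whole-sequence simplification. Each polymorphic site is scored as a match or mismatch between two sequences, every possible breakpoint splits the sites into a 2x2 table (left/right by match/mismatch), the largest chi-square is kept, and significance comes from shuffling the site order. Function names, the permutation count, and the toy sequences are assumptions for illustration.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_chi(matches):
    """Best (chi-square, breakpoint) over all splits of a match/mismatch vector.

    A breakpoint k puts sites [0, k) on the left and [k, n) on the right; the
    2x2 table counts matches and mismatches on each side.
    """
    n, total = len(matches), sum(matches)
    best, best_k, left = 0.0, None, 0
    for k in range(1, n):
        left += matches[k - 1]
        stat = chi_square_2x2(left, k - left, total - left, (n - k) - (total - left))
        if stat > best:
            best, best_k = stat, k
    return best, best_k


def maxchi_pair_test(seq1, seq2, sites, permutations=1000, seed=0):
    """Simplified pairwise MaxChi with a permutation p-value.

    `sites` lists the polymorphic alignment columns to compare. Shuffling the
    match/mismatch vector destroys spatial clustering while keeping the counts.
    """
    matches = [seq1[i] == seq2[i] for i in sites]
    observed, breakpoint = max_chi(matches)
    rng = random.Random(seed)
    shuffled = matches[:]
    exceed = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if max_chi(shuffled)[0] >= observed:
            exceed += 1
    return observed, breakpoint, (exceed + 1) / (permutations + 1)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value for one of n_tests comparisons."""
    return min(1.0, p * n_tests)


# Toy pair: identical over the first 10 polymorphic sites, different over the last 10.
s1 = "A" * 20
s2 = "A" * 10 + "G" * 10
stat, bp, p = maxchi_pair_test(s1, s2, sites=range(20))
print(stat, bp, p, bonferroni(p, n_tests=3))
```

A breakpoint that cleanly separates a run of matches from a run of mismatches gives the maximal chi-square, which is exactly the mosaic pattern expected after recombination.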
Nucleotide sequence accession numbers. The agr sequences reported in this paper have been deposited in GenBank with accession numbers DQ157957 to DQ157983.
The MLST data set resolved only one node with >70% bootstrap support in both ML and MP analyses (Fig. 1A). The SAS and COMBINED data sets resolved six and eight nodes, respectively, with >70% bootstrap support in both ML and MP analyses (Fig. 1B and C). Interestingly, the only node present on all three trees was the previously identified node that subdivided the species into two groups (Fig. 1A and C). On the COMBINED tree, this conserved node received higher bootstrap support (72% and 75% in the ML and MP analyses, respectively) than on the trees from the individually analyzed data sets.
FIG. 1. ML phylogenetic trees based on (A) MLST, (B) SAS, and (C) COMBINED data sets. Nonparametric bootstrap support values for nodes with >70% support in both ML and MP analyses are shown above and below the branches, respectively. The agr group and, in parentheses, the number of isolates screened from that clonal complex are shown adjacent to the COMBINED tree. Arrows indicate the position of the conserved node that subdivides the species into two groups.
To assess whether the observed incongruences could be due to statistical error in the data sets, we simulated sequences given the same optimal models and ML trees with branch lengths as the real data sets and tallied the number of times the real ML trees were recovered from the simulated sequences. The results showed that the simulated MLST data sets recovered their underlying tree only 0 to 2% of the time, and their power did not improve as longer sequences were simulated (Fig. 2). In contrast, the simulated SAS and COMBINED data sets recovered their underlying tree 8 to 79% and 35 to 85% of the time, respectively, and their power improved considerably as longer sequences were simulated (Fig. 2). These results suggest that the MLST data set alone was too statistically noisy and inconsistent to resolve a reliable tree. Although the COMBINED data set only attained 35% power, it was a statistically consistent data set and of sufficient power to resolve the conserved node with confidence.
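The logic of these simulations (generate data on a known tree, re-estimate the tree, and tally how often the known tree comes back at several sequence lengths) can be shown on a four-taxon toy in Python. This sketch swaps in much simpler machinery than the study's ML analyses with NNI branch swapping: sequences evolve under the Jukes-Cantor (JC69) model on the quartet ((A,B),(C,D)), and the tree is re-estimated with the four-point condition on JC-corrected distances. Branch lengths, trial counts, and all function names are illustrative assumptions.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, t, rng):
    """Evolve a sequence along a branch of length t (expected substitutions/site) under JC69."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < p_change else base
        for base in seq
    )


def jc_distance(x, y, cap=10.0):
    """Jukes-Cantor corrected distance; saturated pairs (p >= 0.75) are capped."""
    p = sum(a != b for a, b in zip(x, y)) / len(x)
    if p >= 0.75:
        return cap
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def recovery_rate(length, terminal=0.1, internal=0.1, trials=100, seed=42):
    """Fraction of simulated data sets from which the true quartet ((A,B),(C,D)) is recovered."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        u = "".join(rng.choice(BASES) for _ in range(length))  # ancestor of A and B
        v = evolve(u, internal, rng)                           # ancestor of C and D
        a, b = evolve(u, terminal, rng), evolve(u, terminal, rng)
        c, d = evolve(v, terminal, rng), evolve(v, terminal, rng)
        ab_cd = jc_distance(a, b) + jc_distance(c, d)
        ac_bd = jc_distance(a, c) + jc_distance(b, d)
        ad_bc = jc_distance(a, d) + jc_distance(b, c)
        # Four-point condition: the true split has the strictly smallest distance sum.
        if ab_cd < ac_bd and ab_cd < ad_bc:
            correct += 1
    return correct / trials


for n in (20, 200, 2000):
    print(n, recovery_rate(n))
```

Because the model used for simulation and estimation is the same here, the recovery rate should climb toward 1 as sequences lengthen; that rising curve is the signature of statistical consistency that the MLST simulations failed to show.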
FIG. 2. Results of sequence simulations to identify statistical error and consistency in the MLST (triangles), SAS (squares), and COMBINED (circles) data sets. Each point shows the probability of obtaining the correct tree given that the tree is true and represents 100 simulated data sets evaluated with ML analysis and NNI branch swapping.
Note that the conserved node on the COMBINED tree subdivided the species into two groups that each consisted of multiple agr groups and that all of the nodes with strong bootstrap support included multiple agr groups (Fig. 1C). Moreover, tree topologies constrained to include three or five monophyletic agr groups were incongruent with the COMBINED data set (SH test, P < 0.05). As expected, the tree topology constrained to include two polyphyletic agr groups was identical to the COMBINED tree (SH test, P = 1.00). These results cast serious doubt on the hypothesis that the species is subdivided in a manner that corresponds to agr groups and suggest that, over the long term, agr group is a relatively unstable characteristic of a clonal complex.
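For readers unfamiliar with the SH test, one common way to compute it is with RELL resampling, which bootstraps per-site log likelihoods instead of re-optimizing each tree. The Python sketch below assumes the per-site log likelihoods for each candidate tree have already been obtained from an ML program; the function name and the toy likelihood values are hypothetical, and this is not presented as the exact PAUP* implementation used in the study.

```python
import random


def sh_test(site_lnl, replicates=1000, seed=0):
    """Shimodaira-Hasegawa test using RELL bootstrap resampling.

    site_lnl[i][s] is the log likelihood of alignment site s under candidate tree i.
    Returns one p-value per tree; a small p-value means that tree fits
    significantly worse than the best tree in the candidate set.
    """
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    totals = [sum(row) for row in site_lnl]
    best = max(totals)
    delta = [best - t for t in totals]  # observed deficit of each tree

    rng = random.Random(seed)
    boot = [[0.0] * replicates for _ in range(n_trees)]
    for r in range(replicates):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        for i in range(n_trees):
            boot[i][r] = sum(site_lnl[i][s] for s in idx)

    # Centre each tree's resampled log likelihoods on zero (the SH null).
    centred = []
    for row in boot:
        mean = sum(row) / replicates
        centred.append([x - mean for x in row])

    exceed = [0] * n_trees
    for r in range(replicates):
        top = max(centred[i][r] for i in range(n_trees))
        for i in range(n_trees):
            if top - centred[i][r] >= delta[i]:
                exceed[i] += 1
    return [e / replicates for e in exceed]


# Hypothetical per-site log likelihoods for three candidate trees.
gen = random.Random(7)
tree_a = [gen.uniform(-3.0, -1.0) for _ in range(200)]
tree_b = [x + gen.gauss(0.0, 0.05) for x in tree_a]  # nearly as good as tree_a
tree_c = [x - 0.5 for x in tree_a]                    # worse at every site
print(sh_test([tree_a, tree_b, tree_c]))
```

The best tree always receives P = 1.00 under this construction, which matches how the unconstrained-equivalent topology is reported above.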
Parametric bootstrap tests of the competing hypotheses relating agr groups to clone phylogeny were conducted by simulating sequences assuming that the hypotheses were true. The differences in log likelihood (δ) between an unconstrained tree and trees constrained to include three or five monophyletic agr groups, based on the real COMBINED data set, were enormous compared to the δ values from COMBINED data sets simulated under each hypothesis (δ = 706 [P < 0.01] and δ = 602 [P < 0.01], respectively). These tests therefore soundly rejected the hypotheses that the species is composed of three or five monophyletic agr groups but, as expected, failed to reject the hypothesis that the species is composed of two polyphyletic agr groups (δ = 0, P = 1.00).
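The final step of such a parametric bootstrap test, comparing the observed δ with the δ values from data sets simulated under the constrained hypothesis, can be sketched in a few lines of Python. The simulated values below are hypothetical, and the "+1" convention (counting the real data set as one more draw) is one common choice rather than necessarily the one used in the study; with it, the smallest attainable P is 1/(n + 1) for n simulations.

```python
def delta_lnl(lnl_unconstrained, lnl_constrained):
    """delta = lnL(best unconstrained tree) - lnL(best tree under the constraint)."""
    return lnl_unconstrained - lnl_constrained


def parametric_bootstrap_p(observed_delta, simulated_deltas):
    """One-sided Monte Carlo p-value for a parametric bootstrap test.

    simulated_deltas holds the delta values obtained by analysing data sets
    simulated under the constrained hypothesis. The +1 terms count the real
    data set as one more draw.
    """
    exceed = sum(1 for d in simulated_deltas if d >= observed_delta)
    return (exceed + 1) / (len(simulated_deltas) + 1)


# Hypothetical null distribution from 99 simulated data sets.
null_deltas = [float(x) for x in range(99)]
print(delta_lnl(-10000.0, -10706.0))               # 706.0
print(parametric_bootstrap_p(706.0, null_deltas))  # 0.01
print(parametric_bootstrap_p(0.0, null_deltas))    # 1.0
```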
Our explanation for these results is that S. aureus is not fundamentally subdivided in a manner that corresponds to agr groups. Individual clonal complexes are predominantly of a single agr group, which accounts for the previous observations linking agr group and genetic background. However, clonal complexes are assembled into broader subspecies groups that do not reflect a simple clonal descent of agr group within the species.
Patterns of genetic variation at the agr locus. To study genetic variation at the agr locus, we obtained complete agr sequences from all 27 diverse strains. Using the translated sequences of the P2 operon, we assigned unique amino acid sequences as agr alleles to reflect potential functional differences. Unique combinations of agr alleles were found for 22 of the 27 diverse strains, indicating that the different clonal complexes generally encode agr's that differ at the amino acid level (Table 1). agrA alleles were shared among agr groups, whereas agrB and agrC alleles were unique to agr groups (Table 1). agrD alleles defined agr groups, with the noted exception.
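The allele-assignment rule (one allele number per distinct translated sequence, per gene) and the count of unique allele combinations can be sketched in Python as follows. The sketch translates with the standard genetic code, whose amino acid assignments are the same as those of the bacterial translation table (table 11 differs only in its alternative start codons). The function names and toy sequences are hypothetical.

```python
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(cds):
    """Translate an in-frame coding sequence codon by codon (a trailing partial codon is ignored)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def assign_alleles(strains):
    """Number each distinct protein per gene and build per-strain allelic profiles.

    strains: {strain: {gene: coding sequence}}; every strain must have the same genes.
    Returns (alleles, profiles): alleles[gene][protein] -> allele number, and
    profiles[strain] -> tuple of allele numbers in sorted gene order.
    """
    alleles = {}
    profiles = {}
    for strain, genes in strains.items():
        profile = []
        for gene in sorted(genes):
            ids = alleles.setdefault(gene, {})
            profile.append(ids.setdefault(translate(genes[gene]), len(ids) + 1))
        profiles[strain] = tuple(profile)
    return alleles, profiles


# Hypothetical toy sequences; s2 differs from s1 only by a synonymous change in agrD.
toy = {
    "s1": {"agrB": "ATGAAA", "agrD": "ATGGCT"},
    "s2": {"agrB": "ATGAAA", "agrD": "ATGGCC"},
    "s3": {"agrB": "ATGAAG", "agrD": "ATGGAT"},
}
alleles, profiles = assign_alleles(toy)
print(profiles)  # {'s1': (1, 1), 's2': (1, 1), 's3': (1, 2)}
print(len(set(profiles.values())), "unique allele combinations")
```

Working at the amino acid level means synonymous nucleotide differences collapse into the same allele, consistent with the stated aim of reflecting potential functional differences.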
The housekeeping genes that flank the agr locus encode a putative carbon-nitrogen hydrolase and a fructokinase. We excluded the sequences from the P3 operon from phylogenetic analyses because this region is predicted to fold into elaborate RNA secondary structures (2) that would complicate the analyses. We included the coding nucleotide sequences from the P2 operon in the analyses. Characteristics of the FLANKING and CODING data sets are provided in Table S1 in the supplemental material.
The FLANKING data set resolved four nodes with >70% bootstrap support in both ML and MP analyses (Fig. 3A). The CODING data set resolved three of the four recognized agr groups with >70% bootstrap support in both ML and MP analyses, and it resolved a novel agr group that branched between agr's I and IV, here called agr I/IV (Fig. 3B). agr I received 100% bootstrap support in the MP analysis but only 58% in the ML analysis. A node that included agr's I, IV, and I/IV was resolved with >70% bootstrap support in both ML and MP analyses. The FLANKING and CODING data sets were statistically incongruent with each other's trees and with the MLST, SAS, and COMBINED trees (SH test, P < 0.05). However, the conserved node that subdivides the species into two groups was apparent on the FLANKING tree, even though it did not attain sufficient bootstrap support and did not include ST55 (Fig. 3A).
FIG. 3. ML phylogenetic trees based on (A) FLANKING and (B) CODING data sets. Nonparametric bootstrap support values for nodes with >70% support in both ML and MP analyses are shown above and below the branches, respectively. The arrow in panel A indicates the position of the conserved node that subdivides the species into two groups.
FIG. 4. ML phylogenetic trees based on the (A) CODING(I,IV,I/IV), (B) CODING(II), and (C) CODING(III) data sets. Nonparametric bootstrap support values for nodes with >70% support in both ML and MP analyses are shown above and below the branches, respectively. Arrows indicate the position of the conserved node that subdivides the species into two groups.
Detection of recombination. We sought additional evidence for recombination at the agr locus by examining polymorphisms in the agr sequences. To reduce the risk that clustering of polymorphic sites might be due to selective constraints on this functionally interacting locus, we conducted recombination analyses on the CODING data sets of the individual agr groups using only third-codon positions. Even though this approach to detecting recombination is conservative, numerous putative recombination events were detected with high levels of statistical significance (P << 0.05). Example recombinant sequences from each agr group are presented in Fig. 5. Evidence for recombination was detected by all three recombination tests in the examples presented in Fig. 5A to C and by two of three recombination tests in the example presented in Fig. 5D.
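Restricting the analysis to third-codon positions, and then to the polymorphic ones shown in the alignments below, is a simple column filter. This Python sketch assumes an in-frame codon alignment stored as a name-to-sequence mapping and skips gapped columns in line with the exclusion of indels; the function names and toy sequences are hypothetical.

```python
def third_codon_polymorphic_sites(alignment, gap="-"):
    """0-based columns that are third codon positions and polymorphic.

    alignment maps name -> in-frame aligned coding sequence (all the same length).
    Columns containing a gap are skipped, in line with excluding indels.
    """
    seqs = list(alignment.values())
    positions = []
    for pos in range(2, len(seqs[0]), 3):  # columns 2, 5, 8, ... are third positions
        column = {seq[pos] for seq in seqs}
        if gap not in column and len(column) > 1:
            positions.append(pos)
    return positions


def extract_sites(alignment, positions):
    """Reduced alignment containing only the chosen columns."""
    return {name: "".join(seq[p] for p in positions) for name, seq in alignment.items()}


# Hypothetical toy alignment; the first-position difference in z is ignored.
toy = {"x": "ATGGCTAAA", "y": "ATGGCCAAG", "z": "CTGGCTAAA"}
sites = third_codon_polymorphic_sites(toy)
print(sites)                      # [5, 8]
print(extract_sites(toy, sites))  # {'x': 'TA', 'y': 'CG', 'z': 'TA'}
```

Third positions are mostly synonymous, so clustering of differences there is less likely to reflect selection on the interacting AgrB, AgrD, and AgrC proteins.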
FIG. 5. Alignment of polymorphic nucleotide sites from third codon positions from the CODING data sets of separate agr groups: (A) 146 sites from the CODING(I,IV,I/IV) data set; (B) 79 sites from the CODING(I,IV,I/IV) data set; (C) 64 sites from the CODING(II) data set; and (D) 44 sites from the CODING(III) data set. Dots indicate nucleotide identity with the top sequence. Underlining indicates putative recombination breakpoints identified from the MaxChi tests. A map of the agr locus is shown below each alignment, with vertical bars marking the bounds of the following genes (left to right): agrB, agrD, agrC, and agrA.
The sequence characteristics of agr I/IV deserve special consideration. Our finding of clustered polymorphisms from third codon positions within agr I/IV that matched agr's I and IV in different regions of the locus (Fig. 5A) strongly supports a hypothesis of its recombinant origin. In contrast, if agr I/IV were an intermediate between agr's I and IV, the polymorphisms from third codon positions should not be clustered. Inspection of the entire agr I/IV coding sequence (see Fig. S1 in the supplemental material) reveals that it has characteristics of agr I from the 5' end of agrB through agrD, characteristics of agr IV from the middle of agrC through agrA, and unique characteristics in portions of agrB and in the 5' end of agrC.
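The contrast between clustered and interspersed polymorphisms can be made concrete with a simple permutation test, swapped in here for illustration and not one of the RDP methods used in the study. At each site where two putative parents differ, the query is labeled by the parent it matches; few switches between labels along the locus indicate clustering (a mosaic), whereas an intermediate sequence would be expected to show switches at about the rate of a random ordering. The function names, permutation count, and toy sequences are hypothetical.

```python
import random


def parent_labels(query, parent_a, parent_b):
    """Label informative sites: 'A' if the query matches parent_a, 'B' if it matches parent_b.

    Only sites where the two parents differ and the query matches one of them are used.
    """
    labels = []
    for q, a, b in zip(query, parent_a, parent_b):
        if a != b and q in (a, b):
            labels.append("A" if q == a else "B")
    return labels


def count_switches(labels):
    """Number of adjacent informative sites whose labels differ."""
    return sum(1 for x, y in zip(labels, labels[1:]) if x != y)


def clustering_test(labels, permutations=2000, seed=0):
    """Permutation p-value for 'fewer switches than expected if site order were random'."""
    observed = count_switches(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    at_most = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if count_switches(shuffled) <= observed:
            at_most += 1
    return observed, (at_most + 1) / (permutations + 1)


# Hypothetical parents and a query matching parent A for its first half, parent B after.
parent_a = "A" * 16
parent_b = "C" * 16
mosaic = "A" * 8 + "C" * 8
print(clustering_test(parent_labels(mosaic, parent_a, parent_b)))  # 1 switch, small p
print(clustering_test(["A", "B"] * 8))                             # (15, 1.0)
```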
Recombination is a well-known phenomenon that influences the evolution of strains and genes (10). With recombination, the bacterial chromosome is essentially subdivided into mobile linkage blocks that can reflect different evolutionary histories. Since the inference of strain relatedness can differ depending on which linkage block is investigated (40), it is important to be able to detect recombination. Fortunately, powerful statistical tools are available for detecting recombination (37). The identification of clustered polymorphisms from third codon positions within each agr group provided the strongest evidence for recombination at the agr locus.
Recognition that S. aureus has a novel agr group with characteristics of both agr's I and IV might help to reconcile the conflict as to whether agr IV induces agr I activity (21, 27) or inhibits agr I activity (17, 29). The activity of a true agr I may be different from that of agr I/IV. The genetic variation characterized here makes it possible to distinguish between these related alleles. We are aware of two other reports of a novel agr. Takeuchi et al. (50) isolated two strains with an agr similar to agr I across the 3' end of the locus but similar to agr's II and III across the 5' end of the locus (14, 15). Goerke et al. (16) isolated one strain with an agr similar to agr I across agrD but similar to agr IV elsewhere in the locus. We compared single sequences of each of our agr groups with the novel agr Ic sequence reported by Goerke et al. (16) and found agr I/IV to be most similar to agr Ic, differing at 28 nucleotide sites across three regions of clustered polymorphisms (data not shown). Thus, we believe that agr Ic is a recombinant variant of agr I/IV rather than an evolutionary intermediate. We note that recombination at the agr locus followed by selection for particular variants provides a simple mechanism by which novel agr alleles can originate. Since agr dysfunction might be a selectable trait under certain conditions (41, 44), variant agr alleles in various stages of divergence would not necessarily have to be functional to be maintained in the population.
Ancestral polymorphism is seldom discussed within the context of a single species (19) or with respect to bacteria, but this phenomenon could be a cause of phylogenetic incongruence in studies aimed at characterizing intraspecific bacterial variation. Lineage sorting is the elimination of ancestral polymorphisms from a species. When speciation begins in sexual eukaryotes, the sister species will share ancestral polymorphisms. Genetic drift will stochastically shift the frequencies of these shared ancestral polymorphisms until each sister species becomes fixed for a given allele. Once genetic drift has led to monophyletic alleles at each locus, lineage sorting is complete; until this process is finished, lineage sorting is incomplete. With incomplete lineage sorting, incongruence between the species tree and gene trees will arise because different alleles will reach monophyly at different times (34, 49, 57). Bacterial populations cannot exist in a true state of incomplete lineage sorting because a diverging subspecies group of bacteria will have a single ancestral clone rather than a population of ancestors. However, recombination of ancestral polymorphisms into a diverging subspecies group of bacteria may mimic the phylogenetic pattern produced by incomplete lineage sorting. No standard statistical approaches are available for testing the hypothesis of ancestral polymorphism. Rather, efforts are made to rule out the other causes of phylogenetic incongruence.
New model for the evolution of agr. The hypothesis of Novick and colleagues proposes that the divergence of agr groups in S. aureus preceded the development of the nucleotide polymorphisms currently used for strain typing and, therefore, that the species is phylogenetically structured according to agr groups (56). Their hypothesis is based on the observation that no multilocus sequence type (or other means of strain typing) generally occurs in more than one agr group (56). We have confirmed that groups of closely related multilocus sequence types (or clonal complexes) tend to be of a given agr group. However, we have also shown that clonal complexes themselves belong to at least two subspecies groups and that agr's I to III occur in both of these subspecies groups. Thus, because of recombination, agr's I to III are not monophyletic, and the hypothesis that the species is phylogenetically structured according to agr groups is refuted. We propose a new model for the evolution of agr that takes the evolutionary relatedness of the clonal complexes and their agr's into account.
T1 is the speciation event that led to the origin of S. aureus (Fig. 6). It is not clear what agr groups existed at T1 or their relative order of appearance. Related staphylococcal species have different agr groups from those found in S. aureus (8), and there is some evidence for cross-species activity of agr (33). Visual inspection of trees generated from partial nucleotide sequences of agrB, agrD, and agrC from many staphylococcal species (8) shows that agr's I and III generally cluster together, whereas agr II is basal and sometimes clusters with the agr's of other species. Thus, between T1 and T2 may be another node that represents a common ancestor for agr's I and III.
FIG. 6. Model for the evolution of agr. The clone phylogeny (or species tree) is shown as a bold outline. The agr phylogenies are shown as thin lines within the clone phylogeny. Different agr groups are shown as shaded nodes. Major agr recombinations are shown with arrows. Historical events T1 through T4 are shown alongside the clone phylogeny and are described in the text.
An alternative hypothesis is that the divergence of the two subspecies groups preceded the divergence of the agr groups. We believe that under such a hypothesis the branch lengths of the individual agr trees would depend on the order in which the agr groups appeared and on the time required for transfer of agr to the other subspecies group. Thus, under such a hypothesis the individual agr trees could have widely varying branch lengths that would not be expected to reflect the branch lengths of the clone phylogeny. We note that recombinations involving the agr's of ST45 and ST93 may be more recent, and their longer branches (Fig. 4) may reflect a longer period of evolution within subspecies group 1. Our model does not require assumptions about which subspecies group or agr group is ancestral, but we favor the notion that subspecies group 1 and agr I may be ancestral.
T3 is the divergence of agr's I and IV (Fig. 6). It is not proven that the divergence of agr's I and IV followed the divergence of the two subspecies groups; that is, the event shown at T3 may have occurred before T2. However, the facts that agr IV is largely found within subspecies group 1 (a single isolate to the contrary was reported by Peacock et al. [35]) and that agr's I and IV are much more similar to each other than to agr's II and III support the arrangement shown.
T4 is the recombination event between agr's I and IV, resulting in agr I/IV (Fig. 6). Since unique polymorphisms have already developed within agr I/IV and since this agr group is found in multiple clonal complexes, this recombination event is probably not very recent.
Ironically, S. aureus populations may yet become structured according to agr group even though they are not currently structured in this manner. The probability that a species tree and its gene trees will be congruent is related to T/2Ne, where T is the internode divergence time and Ne is the effective population size (34). In our model, T would be the time interval between T1 and T2. Longer internode times and smaller populations favor lineage sorting, whereas shorter internode times and larger populations favor incomplete lineage sorting (34). Eventually, genetic drift and selection may cause different agr groups to become fixed within the two subspecies groups in the same manner that different agr groups have become fixed within different staphylococcal species.
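The dependence on T/2Ne can be made concrete with the standard three-taxon coalescent result: if the two lineages fail to coalesce during the internode (probability exp(-T/2Ne)), each of the three possible topologies is then equally likely, so the gene tree matches the species tree with probability 1 - (2/3)exp(-T/2Ne). The Python sketch below assumes a neutral, randomly mating diploid population of constant size and no recombination within the locus; it illustrates the general relationship and is not necessarily the exact form used in reference 34. For a haploid population, the time scale would be Ne generations rather than 2Ne.

```python
import math


def prob_gene_tree_matches(t_over_2ne):
    """P(gene tree topology = species tree topology) for three taxa under the coalescent.

    t_over_2ne = T / (2 * Ne): internode length T in generations, scaled by the
    number of gene copies in a diploid population of effective size Ne.
    With probability exp(-t) the two lineages fail to coalesce in the internode,
    and then only one of three equally likely topologies matches the species tree.
    """
    if t_over_2ne < 0:
        raise ValueError("internode length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t_over_2ne)


for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"T/2Ne = {t:>3}: P(congruent) = {prob_gene_tree_matches(t):.3f}")
```

At T/2Ne = 0 the gene tree matches only by chance (one in three), and congruence approaches certainty only when the internode is several coalescent units long, which is why short internodes and large populations favor incomplete lineage sorting.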
Concluding remarks.
S. aureus is a species that currently has a relatively clonal population structure, in which variation at housekeeping genes is estimated to occur 15 times more often by point mutation than by recombination (11). Therefore, individual clonal complexes as well as broader subspecies groups are expected to have the same agr group through simple clonal descent. However, we have presented a case here that recombination has been involved in distributing agr groups across the species. Amino acid variation in agr may produce variation in agr activity beyond the consensus activities of the four interference groups. Since agr influences the expression of many virulence genes (30, 31), small phenotypic differences encoded by different agr alleles might be selectable into larger evolutionary differences. It has been reported that the regulatory effects of agr can differ among strains (3). Thus, it is reasonable to hypothesize that recombination of agr between clonal complexes could occasionally result in novel, advantageous patterns of virulence gene expression. It may be that host-pathogen interactions affected by agr-mediated virulence gene expression are more important in the evolution of S. aureus than pathogen-pathogen interactions affected by agr-mediated bacterial interference.
This work was supported by the Wellcome Trust. E. Feil is funded by an MRC Career Development Award. M. C. Enright is a Royal Society University Research Fellow.
Supplemental material for this article may be found at http://jb.asm.org/.