Journal of Bacteriology, January 2004, p. 110-121, Vol. 186, No. 1
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.1.110-121.2004
and Debra E. Bessen*
Department of Ecology & Evolutionary Biology, Yale University, New Haven, Connecticut
Received 24 June 2003/ Accepted 29 September 2003
4% of the codons underwent strong diversifying selection. Horizontal acquisition of one ska lineage from a commensal Streptococcus donor species was also evident. Together, the data suggest that new phenotypes can be acquired through interspecies recombination between orthologous genes, while constrained functions can be preserved; in this way, orthologous genes may provide a rich and ready source for new phenotypes and thereby play a facilitating role in the emergence of new niche adaptations in bacteria.
Both population and experimental studies have been used to better understand the molecular basis for tissue-specific adaptations among GAS. Organisms exhibiting high fitness for just one of the tissue sites have an increased frequency of tissue-specific adaptive alleles in their gene pool relative to the frequency in the other subpopulations. The emm pattern is a genetic marker that distinguishes many throat- and skin-tropic strains of GAS (5, 7, 15); the emm pattern is defined by the chromosomal arrangement of emm subfamily genes. emm pattern A-C strains are usually recovered from the URT, whereas emm pattern D isolates are mostly found in association with impetigo. As a group, the emm pattern E strains display no clear-cut preference for tissue site of infection. Despite niche separation, there is an ample flow of neutral housekeeping genes between emm pattern groups (27), and there are high rates of genetic recombination within the species as a whole (18). In instances where neutral housekeeping alleles are randomly distributed with respect to ecologically distinct populations (27), genetic variation that is strongly associated with the different populations may be directly responsible for adaptation to an ecological niche, and thus, emm gene products (or closely linked genes) may have a direct role in tissue tropism.
The emm genes encode M surface proteins, which display extensive heterogeneity in terms of structure and function (14, 20). More than 150 distinct emm types are recognized, where an emm type is based on nucleotide sequence differences near the 5' end of the emm gene (17). Plasminogen (Plg)-binding group A streptococcal M protein (PAM) is encoded by an emm gene that is uniquely associated with emm pattern D strains (40). Many, but not all, emm pattern D isolates contain PAM, and a high-affinity Plg-binding site is localized to the central portion of the M protein surface fibril. By using an experimental model for impetigo that measures net bacterial reproductive growth at a superficial skin site, a role for PAM in impetigo has been demonstrated (41). When considered together, the experimental, epidemiological, and population genetics findings provide strong evidence that PAM contributes to the establishment of tissue tropism for the skin.
Host Plg presented in a PAM-bound form interacts with streptokinase, a GAS-secreted Plg activator, yielding bacterium-bound plasmin activity; plasmin is a broad-spectrum proteinase involved in blood clot dissolution and cellular migration. Insertional inactivation of the gene encoding streptokinase (ska) also leads to attenuated infection in the experimental model for GAS impetigo (41). It is postulated that during impetigo lesion formation, the combined action of streptokinase and PAM-bound Plg leads to fibrinolysis, which retards scabbing and prevents the lesion from drying out. This, in turn, expands the window of opportunity for GAS reproduction and transmission to new hosts.
In this report, the evolution of streptococcal virulence genes involved in tissue-specific adaptations is examined in depth. The nucleotide sequences of ska genes derived from GAS isolates characterized for the presence of PAM were determined, and phylogenetic analysis was performed.
TABLE 1. GAS isolates studied
Statistical and computational analyses. (i) Phylogenetic trees. All trees were constructed by the neighbor-joining method by using MEGA, version 2.1; the Kimura two-parameter distance measure was used for nucleotide sequences, and the Poisson-corrected distance measure was used for amino acid sequences. The maximum-likelihood method was employed for trees analyzed by PAML (phylogenetic analysis by maximum likelihood) (see below) by using PAUP, version 4.0 beta 10. The desired evolutionary model of DNA substitution and the parameters were optimized by using hierarchical likelihood ratio tests (24) with MODELTEST, version 3.0 (36).
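The Kimura two-parameter distance mentioned above can be illustrated with a minimal sketch. This is not the MEGA implementation; the function name and the handling of gaps and ambiguous bases (such sites are simply skipped) are assumptions made for illustration. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a character other than A, C, G, or T
    are skipped (an assumption; real programs offer several gap options).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log / math.sqrt raise ValueError if the sequences are too
    # divergent for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For two identical sequences the function returns 0; a single transition in 10 sites gives -0.5 ln(0.8), or about 0.112 substitutions per site.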
(ii) Gene conversion. Geneconv, version 1.81, was used to detect gene conversion events among full-length ska alleles; the default settings were used (37). Bonferroni-corrected Karlin-Altschul P values that were less than 0.05 are reported below for global fragments. The analysis included full-length ska sequences (n = 13), as defined in Fig. 1; strain 89-465 was not included because of the presence of an indel, and strain D998 was not included because it was nearly identical to D633.
FIG. 1. Phylogenetic tree for full-length ska alleles. The relationships of 1,320-bp nucleotide sequences of the ska gene derived from 15 strains of GAS are indicated by an unrooted radial tree constructed by the maximum-likelihood method, in which the rate matrix was optimized to a submodel of GTR+G+I (K81uf). Bootstrap values of 90% (1,000 replicates) are indicated at the nodes. Taxon designations indicate GAS strains that are listed in Table 1, except for MGAS8232 (SPyM18-2042; GenBank accession no. AE009940). The ska genes derived from strains MGAS315 (SPyM3-1698; GenBank accession no. AE014074) and SF370 (SPy1979; GenBank accession no. AE004092) exhibit 100% nucleotide identity with the genes of strains 88-019 (emm type 3) and 86-779 (emm type 1), respectively. Bar = 0.05 substitution per site. The tree topology was very similar when the neighbor-joining method was used. The ska lineage corresponding to PAM-positive strains is indicated by the dotted circle. The GenBank accession numbers for 14 new ska sequences are AY234128 to AY234141.
(iv) PAML. A maximum-likelihood approach was used to examine selection pressures acting on ska. The ratios of nonsynonymous nucleotide substitutions (dN) to synonymous nucleotide substitutions (dS) (ω ratios) were determined codon by codon by using several models of codon substitution that differ in how the ω ratios are allowed to vary along the sequence. Six models of codon substitution were used (see below). All models were implemented with the codeml program of the PAML package (version 3.13) (11, 42, 44, 45, 49, 50). Nested models were compared by using the likelihood ratio test; in this test twice the difference in log likelihood (ln L) between two models was compared to the value obtained under a χ2 distribution, and the degrees of freedom was equal to the difference in the number of parameters used in each model. Positive selection could be inferred when a group of codons having a ω ratio of more than 1 was identified and the likelihood of the codon substitution model in question was significantly higher (P < 0.01) than the likelihood of a nested model which did not take positive selection into account. Bayesian methods implemented (automatically) in PAML identify any codons under positive Darwinian selection.
The M0 model assumes that all codons are subject to the same selection pressure, so that a single ω ratio value is estimated. Model M1 divides codons into two categories; one category represents the codons that are invariant (p0), with ω0 fixed at 0, and the other represents codons that are neutral (p1), with ω1 set to 1. The M2 model accounts for positive selection by addition of a third category of codons (p2) with ω2, which can take on any value (including 1) estimated from the data; however, this model cannot simultaneously account for sites with 0 < ω ratio < 1 and sites with an ω ratio of >1. The M3 model estimates ω ratios for three codon site classes and provides a more sensitive test for positive selection, such that all ω ratios are estimated from the data and all values may be greater than 1. The M7 model uses a discrete β distribution, whose shape varies depending on the parameters p and q, to model ω ratios of codons; in the M7 model, no class of codons can have an ω ratio of >1. Model M8 also uses a β distribution, but an extra class of codons is incorporated, in which the ω ratio can be more than 1. A likelihood ratio test of a comparison of the M7 and M8 models is much less affected by the presence of recombination than tests for the other comparisons (1).
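The likelihood ratio test described above can be sketched in a few lines. This is an illustrative sketch rather than what codeml does internally: the function names are hypothetical and the log-likelihood values in the usage note are invented. The χ2 upper-tail probability is computed with the standard recurrence for the regularized incomplete gamma function, which is exact for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with
    positive integer degrees of freedom df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from a closed form: Q(1, y) = exp(-y) for even df,
    # Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (test statistic, P value) for two nested models.

    lnl_null: log likelihood of the simpler (nested) model.
    lnl_alt:  log likelihood of the more general model.
    df:       difference in the number of free parameters.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, df)
```

As a usage example with made-up values, `likelihood_ratio_test(-3000.0, -2990.0, 2)` gives a statistic of 20 and a P value of about 4.5e-5, which would pass the P < 0.01 criterion. With the parameterizations commonly used for these site models, the M1-versus-M2 and M7-versus-M8 comparisons each have 2 degrees of freedom and the M0-versus-M3 comparison (three site classes) has 4.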
(v) Tests for independence. Tests for independence, used to establish nonrandom relationships (linkage disequilibrium), were performed with Fisher's exact test (DnaSP, version 3.52).
Based on extensive structural and functional studies, streptokinase is recognized as having three principal domains, α, ß, and γ (46). The lengths of the three domains are approximately equal (146, 144, and 123 amino acid residues, respectively) (Fig. 2). The ß-domain of streptokinase displayed the highest level of predicted amino acid sequence divergence when the 15 ska genes (Fig. 1) were compared. During plasmin formation, the ß-domain of at least one form of streptokinase has direct molecular contact with Plg and docks Plg as an initial step in the formation of the streptokinase-plasmin(ogen) activation complex in the fluid phase (10, 31).
FIG. 2. Domain structure of streptokinase. The sequence positions of the three principal domains of streptokinase (α, ß, and γ) are illustrated (46). The maximal nucleotide (nt) sequence divergence and the maximal amino acid (aa) sequence divergence between ska alleles shown in Fig. 1 are indicated for each of the three major streptokinase domains. Since strain 89-465 has a deletion within the -domain, it was not included in the -domain analysis.
The phylogeny of the portion of the ska locus encoding the ß-domain was examined for GAS strains representing a broad spectrum of genetic diversity (Fig. 3). For 90 GAS isolates, representing 78 emm types, 64 distinct (partial) ska alleles encoding the ß-domain were identified (Table 1). Two major sequence clusters that had strong bootstrap support were clearly evident (clusters 1 and 2) (Fig. 3). Both major clusters, clusters 1 and 2, contained several smaller subclusters of alleles having strong bootstrap support.
FIG. 3. Phylogenetic tree for the ß-domain-encoding region of ska. The relationships of the nucleotide sequences of a 423-bp portion of ska encoding amino acid residues 173 through 316 of the streptokinase protein (the first residue of the leader peptide is designated residue 1) (Fig. 2), derived from 90 strains of GAS, are indicated by a neighbor-joining tree. For visual clarity, the tree is midpoint rooted. Bootstrap values of 90% (500 replicates) are indicated at the nodes. The designations indicate the ska alleles, which are listed in Table 1. Bar = 0.05 substitution per site. The GenBank accession numbers for 64 new partial ska sequences are AY234261 to AY234324.
The relationship between the ß-domain-encoding region of ska and the emm pattern marker for tissue site preference was examined. Each of the three emm pattern groups (pattern A-C [throat preference], pattern D [skin preference], and pattern E [no preference]) was represented by numerous strains having cluster 1 alleles (Tables 1 and 2). Strikingly, all nine of the emm pattern D isolates having a cluster 1 ska allele lacked PAM.
TABLE 2. Relationship between ß-domain form of ska, emm pattern group, and PAM site
In summary, nearly all PAM-positive emm pattern D strains (18 of 19 strains; 95%) had a subcluster 2b ska allele (Table 2). The vast majority of emm pattern D strains lacking a PAM site had a cluster 1 ska allele (9 of 11 strains; 82%). emm pattern D strains harboring ska cluster 1 genes also tended to be strains belonging to rarely recovered emm types (www.cdc.gov/ncidod/biotech/strep/strepindex.html). The association between subcluster 2b ska alleles and emm pattern D strains with a PAM site was highly significant (P = 0.00004, as determined by Fisher's two-tailed exact test), which was indicative of a strong linkage disequilibrium. None of the strains harboring a subcluster 2b ska allele was known to be recovered from the URT (Table 1).
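A two-tailed Fisher's exact test on a 2-by-2 contingency table, of the kind used for the association above, can be sketched as follows. This is a generic illustration, not the DnaSP routine; the function name is hypothetical, the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention, and the counts in the usage note are toy values rather than the counts behind the reported P value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The P value is the sum of the hypergeometric probabilities of all
    tables with the same margins that are no more probable than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k, given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

For a toy table such as `fisher_exact_two_sided(10, 0, 0, 10)`, in which two traits co-occur perfectly in 20 strains, the P value is 2/184,756, or about 1.1e-5.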
Epistasis and linkage of subcluster 2b ska and pam. The finding that there is a strong linkage disequilibrium between the subcluster 2b form of the streptokinase ß-domain and the presence of PAM strongly suggests that the corresponding genotypes are coinherited. Coinheritance could arise by clonal descent within a population exhibiting low rates of recombination and/or through tight physical linkage (i.e., close proximity on the genome) between the ska and emm (pam) loci. Alternatively, coinheritance could be maintained by epistasis, driven by phenotypic interactions between streptokinase and PAM that give rise to an essential adaptive function. Epistasis can occur against a background of high levels of genetic recombination.
Statistical tests were used to estimate the level of recombination within the GAS population by examining neutral loci. The genetic background of each of the 90 GAS isolates (Table 1) was defined for allelic profiles (sequence types [ST]) based on seven housekeeping loci (16), which yielded 87 unique emm type-ST combinations (data not shown). Previous studies have shown that the associations between housekeeping loci of GAS are random, based on a maximum-likelihood method for measuring the extent of congruency between housekeeping gene tree topologies (18, 27). As observed in previous studies performed with slightly different sets of GAS strains (18, 27) and a linkage distance cutoff of 0.55, no significant congruence between gene trees was observed for this particular set of GAS isolates, and the differences in the likelihoods of the trees fell within the 99th percentile of the random distribution of random tree topologies for all 42 possible pairwise comparisons of housekeeping genes (data not shown). Therefore, when deep phylogenetic relationships were considered, the rates of recombination among housekeeping loci are relatively high for this particular set of GAS isolates. There is no evidence that throat- and skin-tropic strains of GAS comprise distinct evolutionary lineages (27).
Despite the strong linkage disequilibrium observed between subcluster 2b ska forms and PAM, several individual ska alleles (n = 12), as defined by the ß-domain-encoding region, show a history of horizontal movement between GAS strains having distantly related STs (linkage distance, >0.6) (Fig. 4). In addition, for one clone, as defined by seven of seven identical housekeeping alleles, there were isolates that had highly divergent ska alleles (ska44 and ska54); this finding is also indicative of horizontal movement of ska between different GAS strains. Of the 18 strains having a PAM site, a subcluster 2b ska allele, and unique emm type-ST combinations, 12 differed from all other isolates by a linkage distance of >0.6 (Fig. 4). Although some of the isolates having both PAM and a subcluster 2b ska allele are close genetic relatives, the majority of the strains are genetically distant in terms of their neutral housekeeping genes.
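The linkage distance between two allelic profiles can be illustrated with a short sketch. It assumes that the distance is the proportion of the seven housekeeping loci at which two strains carry different alleles (that is, 1 minus the proportion of shared alleles); the function names and the allele numbers in the usage note are hypothetical.

```python
def linkage_distance(profile_a, profile_b):
    """Proportion of loci at which two allelic profiles differ."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    mismatches = sum(1 for x, y in zip(profile_a, profile_b) if x != y)
    return mismatches / len(profile_a)


def same_clonal_complex(profile_a, profile_b, min_shared=5):
    """True if two profiles share at least min_shared alleles
    (five or more of seven loci by default)."""
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x == y)
    return shared >= min_shared
```

Under this assumption, two strains with profiles such as `(1, 2, 3, 4, 5, 6, 7)` and `(1, 2, 9, 9, 9, 9, 9)` differ at five of seven loci, a distance of 5/7 (about 0.71). With seven loci, any distance of >0.6 therefore corresponds to strains that share at most two housekeeping alleles.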
FIG. 4. Unweighted pair group method with arithmetic averages dendrogram based on housekeeping loci. A matrix of pairwise differences in allelic profiles between strains was constructed based on the proportion of housekeeping loci having shared alleles (16). The relationships between housekeeping gene allelic profiles at seven loci are shown for 78 GAS strains having unique emm type-ST combinations. For the 90 GAS isolates listed in Table 1 having 87 unique emm type-ST combinations, clonal complexes having the same emm type are reduced to one representative strain; clonal complexes are defined as groups of clones that share five or more of the seven housekeeping alleles. The branch labels indicate the ska allele corresponding to the ß-domain-encoding region (Fig. 3; Table 1) for each GAS strain. The various symbols indicate sets of identical ska alleles that are distributed among GAS strains that differ at three or more housekeeping loci. The arrows indicate branch tips representing isolates having both a PAM site and subcluster 2b ska allele (n = 18). Isolates having identical emm types and STs also tend to have identical or nearly identical ska alleles (Table 1), as follows: emm1-ST28, ska66; emm5-ST99, ska68; emm6-ST37, ska25; and emm44/61-ST31, ska59 and ska77. Isolates that have the same emm type and differ at only one or two housekeeping alleles (clonal complexes) also tend to have identical ska alleles (emm3-ska22, emm19-ska65, emm1-ska66). The multilocus sequence typing raw data were published previously for all isolates except the nine strains whose designations begin with SS (16).
α-domain-ß-domain junction were most likely to involve alleles corresponding to the PAM-positive cluster and strain MGAS8232, whereas crossover sites spanning the ß-domain-γ-domain junction involved alleles of numerous distant taxa (data not shown). The gene conversion findings for ska are consistent with the findings for housekeeping genes, indicating that GAS display high levels of genetic recombination.
The complete genome sequences of several GAS strains, containing either a cluster 1 or subcluster 2a ska allele, show that the distance between emm and ska ranges from 33 to 38 kb (3, 19, 33, 39; www.sanger.ac.uk). By using a PCR-based mapping approach, the genomic content and distance between the emm and ska loci in a PAM-positive, subcluster 2b ska-positive, emm pattern D strain (Alab49) were found to be very similar to those of the GAS strains whose complete genome sequences are known (data not shown).
Although the genes encoding PAM and streptokinase are not too far apart on the genome, the combined findings for random associations between housekeeping genes, intragenic recombination between ska genes of different strains, the horizontal movement of ska alleles to distant strain backgrounds, and high sequence diversity among PAM from different strains argue strongly against coinheritance due to physical proximity. In summary, the ß-domain-encoding region of subcluster 2b ska maintains strong linkage disequilibrium with PAM-positive emm pattern group D. Combined with experimental evidence that streptokinase and PAM play key roles in impetigo (41), the findings suggest that the linkage between PAM and the subcluster 2b form of the streptokinase ß-domain arises from strong coselective pressures due to epistasis.
Positive selection within the streptokinase ß-domain. The relative proportion of dN and dS, leading to a change and no change in amino acid residues, respectively, can provide insight into the role of natural selection in the evolution of a gene. It is widely assumed that ω ratios of more than 1 signify diversifying (positive) selection. The average ω ratio for full-length ska genes (Fig. 1) is 0.449, suggesting that purifying (negative) selection has been a major force in ska gene evolution when all codons are considered together. However, this ratio does not consider individual codons, and it was of interest to ascertain whether specific codons of ska were under diversifying selection. By using a statistical approach, ω ratios were determined codon by codon. Maximum-likelihood analysis of the selection pressures acting on ska by using the tree topology of Fig. 1 and allowing for heterogeneous ω ratios among sites provided evidence that there has been diversifying selection within streptokinase (Table 3).
TABLE 3. Parameter estimates for maximum-likelihood analysis of selection pressures acting on streptokinase
ω0 = 0.045), 31.4% are under very weak diversifying selection (ω1 = 1.049), and 4.0% are under strong diversifying selection (ω2 = 5.538) (Table 3). All models that allow for positively selected sites (M2, M3, and M8) indicated that there are such sites, and 4% of the codons are under strong positive selection (ω ratio, >5).
Since ska genes were found to undergo intragenic recombination and tests for positive selection by the maximum-likelihood method assumed a phylogenetic tree, the ω ratio was also estimated from a star phylogeny (45). For the full-length ska genes of the strains shown in Fig. 1 but with a tree in which all sequences diverge from a single node, there was still evidence of significant positive selection. For the M3 model with the star phylogeny, 10.1% of amino acid sites were under strong diversifying selection (ω2 = 5.134), which was less conservative than the values estimated with the maximum-likelihood tree (Table 3). Therefore, intragenic recombination between distantly related ska genes does not appear to weaken the findings for positively selected codons.
The Bayes approach can be used to identify specific amino acid sites likely to be under positive selection. For the M3 model (ω ratio, >1), 46 codons exceed the 99% posterior probability threshold (Table 3). For the M2 and M8 models, 16 and 19 codons, respectively, exceeded this threshold. All of the positively selected codons identified by the M2 model were a subset of the codons identified by both the M3 and M8 models; all codons identified by the M8 model were a subset of the M3 model codons. M3 is more sensitive than the other models and detected more codons under positive selection, because it incorporates more codon site classes (50).
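The idea behind the Bayes identification of sites can be sketched as follows, assuming an empirical Bayes calculation in which the estimated class proportions act as prior probabilities. The function name is hypothetical, and the per-class site likelihoods in the usage note are invented for illustration; in practice they come from the fitted codon model.

```python
def site_class_posteriors(proportions, site_likelihoods):
    """Posterior probability that one codon site belongs to each site class.

    proportions:      estimated proportion of sites in each class (the prior).
    site_likelihoods: likelihood of the data at this site under each class.
    """
    if len(proportions) != len(site_likelihoods):
        raise ValueError("one likelihood is needed per site class")
    weighted = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(weighted)
    if total <= 0:
        raise ValueError("site likelihoods must not all be zero")
    # Bayes' rule: prior times likelihood, normalized over all classes.
    return [w / total for w in weighted]
```

For example, with class proportions of 0.646, 0.314, and 0.040 (the last two as reported in the text, the first taken as the remainder) and hypothetical site likelihoods of 1e-6, 5e-6, and 4e-4, the posterior probability for the third class is about 0.88. That site would fall short of the 99% threshold even though the third class fits its data best.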
Of the 46 positively selected codons suggested by the M3 model (Table 3), 35 (76%) are in the ß-domain-encoding region and comprise 24% of the total ß-domain residues. For the M2 and M8 models, 75 and 79% of the positively selected sites, respectively, lie within the ß-domain-encoding region. Thus, diversifying selection appears to have played a major role in the evolution of the streptokinase ß-domain. The strong purifying selection observed within the α- and γ-domains may be the consequence of functional constraints.
Lineage-specific, fixed amino acid differences in the ß-domain. Of the codons identified to be under diversifying selection based on the 15 full-length ska alleles (Table 3), the ß-domain-encoding regions of 64 partial ska alleles (Fig. 3) were assessed for fixed amino acid differences between any two of the three major sequence (sub)clusters. At 11 amino acid sites, all subcluster 2a and 2b predicted products were identical to each other, but they differed from all cluster 1 ska products (residues 174, 183, 191, 195, 197, 199, 208, 226, 228, 231, and 234). Cluster 1 and subcluster 2a forms also displayed a fixed amino acid difference at residue 243.
At three codon sites (residues 279, 280, and 282), all subcluster 2b products have a different amino acid sequence than all subcluster 2a products. Site 282 contains a Lys in subcluster 2a streptokinase forms that has been shown by site-specific mutagenesis to be important for Plg activation in the fluid phase (10).
In summary, at least some of the amino acid residues that evolved under diversifying selection (Table 3) also appear to have contributed to the lineage-specific differences observed for the ß-domain-encoding region of ska (Fig. 3).
Interspecies spread of ska-related alleles. Human isolates of GCS and GGS, which are classified as Streptococcus dysgalactiae subsp. equisimilis, are the closest known genetic relatives of GAS. GCS and GGS are considered to be more commensal-like than GAS, primarily inhabiting the URT, although GCS and GGS can be recovered in association with disease. Since GAS and S. dysgalactiae subsp. equisimilis show evidence for recent horizontal exchange of housekeeping alleles (26), it was of interest to assess the relationships between the major phylogenetic lineages of GAS ska genes and orthologous streptokinase genes derived from GCS and GGS (designated skcg).
The nucleotide sequences of the ß-domain-encoding region of the skcg genes of 34 human isolates of GCS and GGS, representing 34 distinct STs as defined by housekeeping alleles (26), were determined. The results obtained are shown in a neighbor-joining tree in Fig. 5 and include results for several cluster 1 and subcluster 2a and 2b ska alleles (Fig. 3) for comparison. Among the 34 GCS and GGS isolates, 19 distinct skcg alleles corresponding to the ß-domain-encoding region were identified. The levels of nucleotide sequence identity among the 19 skcg alleles ranged from 94.8 to 99.8%, indicating that there was a relatively high degree of homogeneity. This finding is in marked contrast to the data for the ska-encoded ß-domains of GAS, in which the maximal nucleotide sequence divergence exceeds 40% (divergence between a cluster 1 allele and a subcluster 2b allele) (data not shown).
FIG. 5. Phylogenetic tree based on the ß-domain-encoding regions of skcg and ska. The relationships of the nucleotide sequences of a 423-bp portion of skcg encoding the ß-domain, derived from 34 strains of GCS and GGS, are indicated by a neighbor-joining tree that was obtained by using the Kimura two-parameter distance measure. Bootstrap values of 90% (1,000 replicates) are indicated at the nodes. Also included in the analysis were 21 ska alleles from cluster 1 and subclusters 2a and 2b (Fig. 3). Subcluster 2a ska alleles are indicated by boldface type. The designations indicate skcg and ska alleles. Bar = 0.05 substitution per site. The GenBank accession numbers for 19 new partial skcg sequences are AY234242 to AY234260.
The data strongly support the idea that streptokinase alleles underwent interspecies transfer and that most subcluster 2a ska alleles and skcg alleles have a relatively recent common ancestor.
Linkage disequilibrium can be maintained within recombining populations of bacteria through host immune selection (21, 22). Two antigenic epitope regions within the outer membrane protein, PorA, of Neisseria meningitidis provide an example of how a strongly cross-protective immune response can lead to the emergence of nonoverlapping combinations of antigenic variants. Like GAS, N. meningitidis is highly prevalent and usually found in association with asymptomatic carriage and displays high levels of genetic recombination, as shown by HK loci (18). However, a host protective response to just one of the two PorA epitope regions leads to loss of antigenic variants associated with a strain, and over time the bacterial population can acquire a discrete nonoverlapping structure. However, unlike the outer membrane protein PorA, streptokinase is secreted and diffusible, and thus, host immunity to streptokinase may be far less effective in leading to loss of the entire bacterial cell. On the basis of these findings along with epidemiological and experimental findings (40, 41), we favor the idea that the linkage disequilibrium observed between streptokinase (subcluster 2b) and PAM results from a direct biological interaction.
It is important to emphasize that while emm pattern D strains are associated significantly more often with impetigo than with pharyngitis (5, 7, 15), the link between emm pattern D strains and the skin is not absolute. This is probably because all (or most) GAS strains can persist in both the throat and the skin to at least some small degree; this is particularly true for the URT, where colonization or secondary infection following impetigo is not uncommon (8). Also, neither PAM nor subcluster 2b ska is essential for streptococcal impetigo, because many emm pattern E strains are frequently recovered from impetigo lesions (5). Therefore, pattern E strains, which uniformly lack a high-affinity Plg-binding protein (40), appear to use an entirely different molecular strategy for causing this disease. Presumably, nonbullous impetigo caused by Staphylococcus aureus involves a different molecular strategy as well. Thus, coselection of PAM and subcluster 2b ska is the result of a strong adaptive advantage for GAS reproduction and transmission at the skin, even though bacterial adaptation to the skin can occur by an alternate (although undefined) route.
The evolutionary history of the ska lineages within GAS is the result of a series of genetic events, the order of which is not entirely certain. All 34 GCS and GGS isolates have skcg alleles that are highly homologous to subcluster 2a ska alleles, which is indicative of a recent common ancestor. Furthermore, the 34 GCS and GGS isolates do not appear to have undergone a recent bottleneck, since they are highly variable in terms of the complement of housekeeping alleles (26). Therefore, the most plausible model is that an skcg allele from a GCS or GGS donor strain underwent lateral transfer to a GAS recipient strain, yielding a subcluster 2a ska allele. Thus, the ancestral form of ska within GAS most likely evolved into either the cluster 1 or subcluster 2b ska lineage. Given the high level of sequence divergence, it seems likely that either cluster 1 or subcluster 2b ska, whichever is not the ancestral form, was also acquired by an interspecies transfer event rather than having been derived from the other form. Since the subcluster 2b ska lineage allele is somewhat homologous to skcg, it may have been acquired earlier by GAS from a GCS or GGS donor strain as an ancestral skcg allele and may have subsequently evolved along a separate path within GAS. Alternatively, subcluster 2b ska may have been acquired by GAS from another closely related (but unidentified) streptococcal species.
It is plausible that PAM and subcluster 2b ska on occasion may have been packaged together and mobilized between GAS via bacteriophage-mediated generalized transduction. However, since the Plg-binding region of PAM displays extensive sequence diversity, which could have arisen only after an extended period of evolution, it is unlikely that cotransfer of PAM and subcluster 2b ska occurred to any significant extent in recent history. The intergenic region between the emm and ska loci of a PAM- and subcluster 2b ska-positive isolate was very similar in terms of both distance and gene content to the intergenic regions of GAS strains containing either cluster 1 ska or subcluster 2a ska but was markedly different from the emm-skcg region of GCS (20a). Thus, importation of both PAM and subcluster 2b ska in a single step from another bacterial species donor is also unlikely. Combined with evidence for intragenic recombination within ska, the data best support the idea that epistasis had an important role in the observed linkage disequilibrium between PAM and subcluster 2b ska.
The data for skcg alleles from GCS and GGS strongly support the idea that one or more of the three major ska lineages present in contemporary isolates of GAS originated in another bacterial species and recently was laterally transferred to GAS. Orthologous genes can arise by sequence divergence under ecological or sexual isolation conditions. Such isolation can promote speciation following multiple rounds of periodic selection for mutants that are fitter for a particular niche (12, 13). Sites within an ancestral gene that are critical for adaptation to a new niche undergo positive selection. Portions of the ancestral gene that are subject to strong purifying selection tend to have lower levels of nucleotide sequence diversity than regions experiencing strong diversifying selection. Homologous recombination between the donor and recipient (target) genes is favored in stretches where there is low sequence diversity. Through interspecies recombination between orthologous genes, new phenotypes can be acquired, while constrained functions can be preserved. Newly acquired orthologous genes potentially provide a rich and ready source for new bacterial phenotypes, which in turn may provide an adaptive advantage under certain ecological conditions.
The first direct encounter between PAM and a subcluster 2b ska product may have occurred following evolution of ancestral ska in discrete phylogenetic lineages. Therefore, PAM did not necessarily shape the environment in which the subcluster 2b ska lineage evolved. The increased fitness at the skin resulting from the PAM gene and subcluster 2b ska being brought into direct contact, by residing within a single genome, may simply have been a chance event. Thus, the strong epistatic coselection observed for PAM and subcluster 2b ska is not necessarily a driving force for the positive Darwinian selection that was detected at many of the codon sites for the streptokinase β-domain. Based on our data, the epistatic coselection observed for pam and subcluster 2b ska and the diversifying selection observed for ska could have been either coupled or independent.
Several of the ska codons identified as being under diversifying selection also represent fixed amino acid differences among the three major lineages of the β-domain-encoding region of ska. Therefore, at least some of the diversifying selection pressures acting on ska likely contributed to the evolution of discrete lineages. Furthermore, one or two of the three major ska lineages likely evolved within distinct bacterial species. The unique environment provided by each bacterial species or GAS strain can account for the differential selection pressures encountered during the evolution of each ska lineage. During infection, streptokinase interacts directly with mammalian host Plg and the mammalian host immune response (43), as well as with bacterial proteases (41) and Plg bound via different bacterial proteins (30, 34, 35). Any of these host or bacterial factors has the potential to provide positive selection pressure on the ska gene.
Most structural studies of streptokinase have used the product of an skcg gene (46), which is most closely related to the subcluster 2a form of streptokinase. In the fluid phase, the β-domain of streptokinase is engaged in direct molecular contact with kringle 5 of human-derived Plg (10, 31). One possibility is that subcluster 2b forms of streptokinase are highly adapted to Plg when it is presented in a form that is bound by PAM, which occurs via kringle 2 (48). GAS also express low-affinity Plg-binding proteins on the cell surface (30, 35). The molecular interactions of the β-domain of streptokinase with Plg may be different for a fluid-phase form and a bound form and may be dependent on the type of Plg-binding protein as well. GCS and GGS express Plg-binding proteins that are structurally distinct from PAM and all other known GAS proteins (34). Another possible selective influence is that one of the GAS ska forms had a long history of coevolution with Plg in another mammalian host. Streptokinase-mediated activation of Plg derived from nonhuman sources can be less effective than activation of human Plg (38). There are numerous streptococcal species that infect other animals whose streptokinase genes have yet to be analyzed.
It is potentially significant that the subcluster 2a form of ska, present in several throat-tropic strains of GAS (emm pattern A-C), probably originated from GCS and GGS, which are commensals of the URT in humans. Several of the GAS strains harboring the subcluster 2a form of ska, corresponding to emm types 1, 3, 6, and 18, also appear to be responsible for a significant proportion of recent cases of GAS pharyngitis in the United States (23, 25, 28, 29). It remains to be established whether ska facilitates colonization in the throat.
Population genetics and phylogenetics are powerful tools that can be used to guide future experimental studies. For example, site-specific mutagenesis at codons under diversifying selection provides a rational approach for studying the effect of each adaptive change on the in vitro functional activity and immunogenicity of streptokinase. Isogenic mutants, generated by directed allelic replacement of the parental ska gene with an ska allele of another lineage, can be used to measure biological properties of GAS by using in vivo models for infection or colonization. Studies on swapping ska alleles are planned.
The molecular basis for niche adaptation by bacteria can be complex. Experimental findings, epidemiological surveys, population genetics, and evolutionary inferences can all contribute to a comprehensive understanding of this complex phenotype. Epistatic coselection arising between bacterial proteins (PAM and subcluster 2b streptokinase) acting on a common host factor (Plg) appears to contribute to tissue-specific adaptation of emm pattern D GAS at the skin. Recombination between orthologous genes may also play a facilitating role in the emergence of new adaptive phenotypes in bacteria.
This work was supported by grants AI-28944, AI-53826, and GM-60793 from NIH and by a grant-in-aid from the American Heart Association to D.E.B. A.K. was a recipient of a Brown-Coxe postdoctoral fellowship.
Present address: Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO 63110.
Copyright © 2009 by the American Society for Microbiology.