Journal of Bacteriology, July 2004, p. 4285-4294, Vol. 186, No. 13
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.13.4285-4294.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom,1 Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri,2 Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut,3 Respiratory Diseases Branch, Centers for Disease Control and Prevention, Atlanta, Georgia,4 Department of Microbiology and Immunology, New York Medical College, Valhalla, New York5
Received 17 December 2003/ Accepted 1 April 2004
Numerous typing schemes have been used to characterize and measure the genetic diversity among isolates of S. pyogenes. Perhaps the most common tool used today is emm typing (3, 13), which is based on sequence at the 5' end of a locus (emm) that is present in all isolates. The targeted region of emm displays the highest level of sequence polymorphism known for a widely distributed S. pyogenes gene; >150 emm types have been described to date (B. Beall, http://www.cdc.gov/ncidod/biotech/strep/emmtypes.htm). emm encodes the M protein, which forms the basis of a serological typing scheme (28). For many M proteins, the type-specific epitopes elicit strong host protective immunity (23).
There are four major subfamilies of emm genes, which are defined by sequence differences within the 3' end, encoding the peptidoglycan-spanning domain (22). The chromosomal arrangement of emm subfamily genes reveals five major emm patterns, denoted emm patterns A through E (6); strains with patterns B and C are rare and are currently grouped with emm pattern A strains (referred to as pattern A-C strains). A given isolate of S. pyogenes has one, two, or three emm genes lying in tandem on the chromosome, and each gene differs in sequence from the others. In strains having three emm genes, the determinants of emm type lie within the central emm locus.
The emm pattern A-C strains are usually recovered from cases of pharyngitis, whereas emm pattern D strains are most often isolated from impetigo lesions (4, 6, 10). As a group, emm pattern E strains are readily found at both primary tissue sites. For example, in tropical Australia, 84% of isolates recovered by population-based surveillance of an aboriginal community experiencing high rates of streptococcal impetigo and no cases of pharyngitis were either emm pattern D or E (4). In Rome, 98% of pharyngitis isolates were of emm types associated with emm pattern A-C or E (10). Thus, emm pattern can serve as a genotypic marker for tissue site preferences among S. pyogenes strains.
Multilocus sequence typing (MLST) is a relatively new tool for molecular typing of bacteria (8, 33). A principal advantage of MLST over gel-based methods is that the sequence data, which are generated for several neutral housekeeping loci, are unambiguous, electronically portable, and readily queried via the Internet (www.mlst.net). In this report, MLST and emm pattern determination are performed for many previously untested emm types of S. pyogenes. When these data are combined with data from previous reports (10, 12, 31), it is found that the large majority of known emm types (http://www.cdc.gov/ncidod/biotech/strep/emmtypes.htm) are represented. An analysis of the relationships among emm type, emm pattern, and the genetic relatedness defined by MLST is presented.
TABLE 1. New GAS isolates associated with this report
emm sequence typing. emm type, which closely corresponds to M serotype, was ascertained by nucleotide sequence determination as previously described (3, 12, 29); a unique emm type is defined as having <95% sequence identity to any other known type over the first 160 bp of sequence, allowing for small indels. A complete and current listing of GAS emm types is posted at ftp://ftp.cdc.gov/pub/infectious_diseases/biotech/emmsequ/ and http://www.cdc.gov/ncidod/biotech/strep/emmtypes.htm. emm pattern was determined by a PCR-based method, as previously described (4).
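The 95% rule above can be illustrated with a short sketch. It is a simplification under stated assumptions: the comparison is ungapped, so it does not model the allowance for small indels, and the function names, the `known_types` mapping, and the toy sequences are hypothetical rather than part of the published typing procedure.

```python
def percent_identity(a: str, b: str, window: int = 160) -> float:
    """Ungapped percent identity over the first `window` positions.

    Assumption: sequences are already aligned at their 5' ends; indels
    are NOT handled here, unlike the real typing scheme.
    """
    a, b = a[:window].upper(), b[:window].upper()
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / n


def is_new_emm_type(query: str, known_types: dict, threshold: float = 95.0) -> bool:
    """True if the query is <threshold% identical to every known type."""
    return all(percent_identity(query, ref) < threshold
               for ref in known_types.values())


# Toy data (hypothetical): a 160-nt reference and two mutated copies.
reference = "ACGT" * 40
eight_mismatches = "N" * 8 + reference[8:]   # 152/160 matches
nine_mismatches = "N" * 9 + reference[9:]    # 151/160 matches
known_types = {"toy_type": reference}
```

With these toy inputs, a query carrying eight mismatches sits exactly at the threshold and is not a new type, whereas nine mismatches drop it below 95%.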
MLST. Internal fragments of seven housekeeping genes (gki, gtr, murI, mutS, recP, xpt, and yqiL) were amplified and sequenced with primers and under conditions described previously (12). For each locus, distinct allele numbers were assigned to each unique sequence, generating a seven-integer allelic profile for each isolate. Isolates with identical allelic profiles were assigned to the same sequence type (ST). A complete database of alleles, allele sequences, and STs is maintained on the Internet at www.mlst.net.
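The bookkeeping behind allelic profiles and STs can be sketched as follows. This is a minimal illustration, not the www.mlst.net implementation: the function name `assign_sequence_types`, the isolate labels, and the toy sequences are hypothetical, and allele and ST numbers are assigned here in order of first appearance rather than taken from the curated database.

```python
LOCI = ["gki", "gtr", "murI", "mutS", "recP", "xpt", "yqiL"]


def assign_sequence_types(isolates: dict) -> dict:
    """Map each isolate to (seven-integer allelic profile, ST number).

    `isolates` maps an isolate label to {locus: sequence}.
    Numbers are local to this run (assumption), not database identifiers.
    """
    allele_ids = {locus: {} for locus in LOCI}   # sequence -> allele number
    st_ids = {}                                  # profile -> ST number
    typed = {}
    for name, seqs in isolates.items():
        profile = []
        for locus in LOCI:
            table = allele_ids[locus]
            seq = seqs[locus]
            if seq not in table:                 # new unique sequence
                table[seq] = len(table) + 1
            profile.append(table[seq])
        profile = tuple(profile)
        if profile not in st_ids:                # new allelic profile
            st_ids[profile] = len(st_ids) + 1
        typed[name] = (profile, st_ids[profile])
    return typed


# Toy input (hypothetical): iso3 differs from the others only at xpt.
base = {locus: "ACGT" for locus in LOCI}
toy_isolates = {
    "iso1": dict(base),
    "iso2": dict(base),
    "iso3": {**base, "xpt": "ACGA"},
}
typed = assign_sequence_types(toy_isolates)
```

Isolates with identical profiles receive the same ST, and a single variant allele is enough to define a new ST.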
Additional nucleotide sequence determination. Using bacterial DNA as a template, PCR amplification products were generated (annealing temperature, 50 or 55°C) with the following oligonucleotide primers: for the cpa locus, 5'-GGA TAT GAG ATT GCC GAA CCT ATT ACT TTT AAA G-3' (forward) and 5'-GGA GCC TGT TTA TCT TCC ATT CGA ATA ATA TCC AC-3' (reverse) (product size, 600 bp); for the prtF1 locus, 5'-TGC GCG GGT TCT ATC GGT TTT GGT CAA GTA-3' (forward) and 5'-AAT TAG TTT T(T/C)T CA(G/A) (T/A)GC (T/C)TC ACG CAT TAA-3' (reverse) (product size, 360 bp). The same primers were used for nucleotide sequence determination.
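The prtF1 reverse primer is degenerate: each parenthesized position, such as (T/C), stands for a mixture of bases. The sketch below enumerates the individual oligonucleotides that this notation implies. The function name `expand_degenerate` is hypothetical, and the parser assumes only the `(X/Y)` notation used in the text.

```python
import re
from itertools import product


def expand_degenerate(primer: str) -> list:
    """Expand '(X/Y)' degenerate positions into all concrete sequences."""
    compact = primer.replace(" ", "")
    # Each token is either a parenthesized alternative set or a single base.
    tokens = re.findall(r"\(([ACGT](?:/[ACGT])+)\)|([ACGT])", compact)
    choices = [alt.split("/") if alt else [base] for alt, base in tokens]
    return ["".join(combo) for combo in product(*choices)]


PRTF1_REVERSE = "AAT TAG TTT T(T/C)T CA(G/A) (T/A)GC (T/C)TC ACG CAT TAA"
variants = expand_degenerate(PRTF1_REVERSE)
```

Four two-fold degenerate positions yield 2 x 2 x 2 x 2 = 16 distinct 30-nt primers.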
Computational analysis. Sequence (nucleotide and amino acid) alignments and percent sequence identity calculations were performed with Clustal W (DNAStar; version 5.05). The eBURST algorithm was applied with software available at http://eburst.mlst.net (15). Average distances between STs were calculated by the START-distance matrix method (www.mlst.net). For tests for independence, Fisher's two-tailed exact test was used (DnaSP; version 3.99).
Nucleotide sequence accession numbers. The new housekeeping allele sequences generated as part of this report were submitted to GenBank and assigned accession numbers AY520918 through AY521006. The new allele sequences associated with the cpa and prtF1 loci were submitted to GenBank and assigned accession numbers AY579608 through AY579635.
Markers for tissue site preference. emm pattern serves as a useful genotypic marker for tissue site preferences of individual strains and clones. Of the 158 emm types represented within the complete set of 495 isolates, emm pattern was established for one or more isolates of 156 emm types (Table 2). Of the 76 emm types for which emm pattern was determined for two or more isolates, 74 (97%) of the emm types included isolates belonging to a single emm pattern group (i.e., A-C, D, or E). Only two emm types (54 and st854) were found in association with two emm pattern groups. Therefore, isolates of a given emm type usually have the same emm pattern grouping.
TABLE 2. emm types according to emm pattern marker for tissue site preference
The relationship between emm pattern subpopulations and genetic diversity, as defined by MLST, was also evaluated. Of the 220 STs resolved by MLST, emm pattern was determined for at least one representative of 202 STs. The classical throat strains (emm pattern A-C) displayed the least genetic diversity in their allelic profiles, accounting for only 18% of the 202 STs examined. STs associated with patterns D and E were most abundant, representing 36 and 47%, respectively, of the total number of STs. The data show that emm pattern E strains, as a group, display the most diversity in ST, whereas pattern A-C strains display the least. Pattern D strains are intermediate in their overall diversity of STs.
Relationships among STs. Of the 220 STs of GAS, the average distance from an ST to all other STs was 6.21 housekeeping alleles, calculated by the START-distance matrix method. The mean distance of an ST to the ST with the most similar allelic profile was 2.35 housekeeping alleles. Thus, many STs are distantly related to all others.
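Both summary statistics can be reproduced in principle from the allelic profiles alone. The sketch below assumes that the distance between two STs is the number of loci at which their alleles differ, which matches the units reported (housekeeping alleles) but is not a description of the START software itself. The function names and toy profiles are hypothetical.

```python
def allelic_distance(a: tuple, b: tuple) -> int:
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_summary(profiles: dict) -> tuple:
    """Return (mean distance from an ST to all others,
               mean distance from an ST to its nearest neighbour)."""
    sts = list(profiles)
    if len(sts) < 2:
        raise ValueError("need at least two STs")
    avg_to_others, nearest = [], []
    for a in sts:
        dists = [allelic_distance(profiles[a], profiles[b])
                 for b in sts if b != a]
        avg_to_others.append(sum(dists) / len(dists))
        nearest.append(min(dists))
    return sum(avg_to_others) / len(sts), sum(nearest) / len(sts)


# Toy profiles (hypothetical labels, not real ST numbers).
toy_profiles = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1, 2),   # differs from A at one locus
    "C": (2, 2, 2, 2, 2, 2, 2),   # differs from A at seven, from B at six
}
mean_all, mean_nearest = distance_summary(toy_profiles)
```

A large gap between the two numbers, as in the reported 6.21 versus 2.35, indicates a population in which most STs have at most a few close relatives.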
eBURST is an algorithm that can be used to subdivide MLST data into nonoverlapping groups of STs with a user-defined level of similarity in their allelic profiles (15). The most stringent definition of an eBURST group, where all STs assigned to the same group must share alleles at six or more of the seven MLST loci with at least one other ST in the group, identifies clusters of closely related genotypes that are considered to be descended from the same founder and that are defined as clonal complexes (15). To obtain a population snapshot, the group definition is set at zero of seven shared housekeeping alleles. Thirty-one clonal complexes were observed among the 220 STs with eBURST, and most of these were small clusters of two or three linked STs (Fig. 1). eBURST identifies the most likely founder of a clonal complex and provides bootstrap support for the assignment. For the 220 GAS STs, a founder ST was assigned in only 11 of the 31 clonal complexes; 65% of the clonal complexes were doublets where the direction of evolution is unknown. However, the bootstrap support was <70% for all founder STs, except for ST65 (99% confidence). For each of the 31 clonal complexes, all STs had emm types belonging to the same emm pattern group (Table 2).
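The grouping rule of the stringent definition (each ST shares at least six of seven alleles with at least one other member) amounts to single-linkage clustering, which can be sketched with a union-find structure. This illustrates the grouping step only; it is not the eBURST software and does not attempt founder assignment or bootstrap support. The function name and the toy profiles are hypothetical.

```python
def clonal_complexes(profiles: dict, min_shared: int = 6) -> tuple:
    """Single-linkage grouping of STs by shared alleles.

    Returns (complexes, singletons): complexes are groups of two or more
    linked STs; singletons are STs linked to no other ST.
    """
    sts = list(profiles)
    parent = {st: st for st in sts}

    def find(x):
        # Follow parent pointers to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(sts):
        for b in sts[i + 1:]:
            shared = sum(x == y for x, y in zip(profiles[a], profiles[b]))
            if shared >= min_shared:
                parent[find(a)] = find(b)    # merge the two groups

    groups = {}
    for st in sts:
        groups.setdefault(find(st), []).append(st)
    complexes = sorted(sorted(g) for g in groups.values() if len(g) >= 2)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return complexes, singletons


# Toy profiles (hypothetical labels): A-B-C form a chain of single-locus
# variants, D is unlinked, and E-F form a doublet.
toy = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1, 2),
    "C": (1, 1, 1, 1, 1, 3, 2),
    "D": (5, 5, 5, 5, 5, 5, 5),
    "E": (6, 6, 6, 6, 6, 6, 6),
    "F": (6, 6, 6, 6, 6, 6, 7),
}
complexes, singletons = clonal_complexes(toy)
```

Note that A and C share only five alleles yet fall in the same complex because both are linked to B, which is the chaining behaviour the definition implies.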
FIG. 1. Population snapshot by eBURST. The entire S. pyogenes database of 495 isolates is displayed as a single eBURST diagram, by setting the group definition to zero of seven shared alleles, which places all isolates in a single group. Each dot represents an ST, and the size of the dot reflects the number of GAS isolates in each ST for the set of 495 isolates under study. STs that differ by a single locus are linked with a solid line; clusters of linked isolates correspond to clonal complexes. Founder STs are labeled (arrows), although, except for ST65, the bootstrap support for the founders was low. The distribution and spacing of unlinked STs and clonal complexes in a population snapshot are not relational and provide no information about the genetic distances between them.
Among the 48 SLV pairs identified by eBURST among the 220 STs (Fig. 1), 20 allelic changes were designated recombination events, based on multiple nucleotide differences among alleles at the variant locus (data not shown). The remaining 28 SLV pairs had a single nucleotide difference among the alleles. Of these, in eight cases both alleles of the SLV pair were present in one or more distantly related STs. Thus, 28 of the allelic changes were considered to be due to recombination and 20 were considered to be due to point mutation, and housekeeping loci in GAS are estimated to change by recombination at least 1.4 times more frequently than by point mutation.
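The classification rule and the resulting ratio can be written out explicitly. The sketch below encodes the two criteria stated in the text and rebuilds the tally from the reported counts. The function name and the tuple encoding are hypothetical, and the value 2 is only a placeholder for "more than one nucleotide difference", since the actual numbers of differences are not given.

```python
def classify_slv(n_nucleotide_diffs: int, both_alleles_in_distant_sts: bool) -> str:
    """Classify the allelic change separating a single-locus variant pair."""
    if n_nucleotide_diffs > 1:
        return "recombination"        # multiple differences at the variant locus
    if both_alleles_in_distant_sts:
        return "recombination"        # single difference, but both alleles
                                      # also occur in distantly related STs
    return "point mutation"


# Reported counts: 20 pairs with multiple differences, 8 single-difference
# pairs with both alleles found elsewhere, 20 remaining single-difference pairs.
slv_pairs = [(2, False)] * 20 + [(1, True)] * 8 + [(1, False)] * 20

calls = [classify_slv(n, distant) for n, distant in slv_pairs]
n_recombination = calls.count("recombination")
n_mutation = calls.count("point mutation")
ratio = n_recombination / n_mutation
share_recombination = n_recombination / len(calls)
```

The tally gives 28 recombination events against 20 point mutations, a ratio of 1.4, with recombination accounting for 28 of 48 changes (about 58%). The estimate is a lower bound because a single-nucleotide change can also be introduced by recombination with a very similar donor allele.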
Of the 28 allelic changes classified as recombination events, all involved emm pattern D or E strains (16 and 12 genetic events, respectively). Further studies are required to obtain a more precise estimate of the ratio of recombination to mutation and to firmly establish whether recombination is a more common mode of evolutionary change at housekeeping loci in emm pattern D and E strains than in pattern A-C strains.
Association of multiple emm types with a single ST. The great majority of STs were found in association with a single emm type (208 of 220; 95%). Only 12 STs included isolates of two or more different emm types (Table 3); these are referred to as emm-variable STs. However, the 12 emm-variable STs involved a disproportionately large fraction of the total number of emm types (30 of 158, 19%). Three emm-variable STs were associated with emm pattern A-C strains, eight were associated with pattern D strains, and only one was associated with pattern E strains. None of the STs were associated with emm types corresponding to different emm pattern groups.
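Identifying emm-variable STs is a simple grouping operation over (ST, emm type) pairs. In the sketch below, only the ST65 entries reflect the text (emm type 30 and st1RP31 both occur on ST65); the function names and the labels `ST-x`, `ST-y`, `hyp1`, and `hyp2` are hypothetical placeholders.

```python
from collections import defaultdict


def emm_variable_sts(isolates: list) -> dict:
    """Return {ST: sorted emm types} for STs carrying two or more emm types."""
    types_by_st = defaultdict(set)
    for st, emm_type in isolates:
        types_by_st[st].add(emm_type)
    return {st: sorted(types) for st, types in types_by_st.items()
            if len(types) >= 2}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


toy_isolates = [
    ("ST65", "30"),
    ("ST65", "st1RP31"),
    ("ST-x", "hyp1"),     # hypothetical ST with a single emm type
    ("ST-x", "hyp1"),
    ("ST-y", "hyp2"),     # hypothetical ST with a single emm type
]
variable = emm_variable_sts(toy_isolates)

# Proportions reported in the text.
single_type_share = percent(208, 220)     # STs with one emm type
emm_types_involved = percent(30, 158)     # emm types found on emm-variable STs
```

The contrast between the two proportions is the point of the paragraph: about 5% of STs (12 of 220) carry 19% of the emm types.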
TABLE 3. STs associated with more than one emm type
The extent of similarity between the emm sequences of those isolates that have the same ST but different emm types was examined, as this may distinguish variation in emm type that has arisen by the accumulation of point mutations from that arising by horizontal gene transfer. For many of these emm-variable STs, the different emm types have <50% nucleotide sequence identity, and, for all emm types associated with the same ST, the emm type sequences were ≤91% identical in nucleotide sequence and ≤84% identical in the corresponding amino acid sequence of the M protein (Table 3). However, close examination of sequence alignments suggests that emm type st1RP31 arose from emm type 30 via intragenic recombination resulting in small deletions; both strains are ST65. Furthermore, emm type sts104 appears to have arisen via fusion of the leader-coding region of emm4 with a downstream emm gene (enn4), on an ST39 genetic background, although the emm type sts104 strain was successfully mapped as emm pattern E. Aside from the two exceptions noted, the large number of sequence differences between emm types strongly suggests that horizontal transfer of emm followed by intergenomic recombination is the primary mechanism underlying emm-variable STs, rather than intragenomic recombination or divergence by point mutation.
Analysis of other adaptive loci in emm-variable STs. If recombinational replacement of emm type is a recent event, then other loci distant from emm on the genome should display little or no sequence variation among isolates that have the same ST but which differ in emm type. The FCT (fibronectin-collagen-T antigen) region of the GAS genome encodes surface proteins that bind host extracellular matrix proteins (fibronectin and collagen). The FCT region displays high overall levels of genetic diversity and lies 300 kb from the emm region (5, 17, 27, 32). Two FCT region genes, prtF1 and cpa, were examined for sequence diversity in GAS isolates sharing the same ST but differing in emm type (Table 4).
TABLE 4. Genetic diversity at other adaptive loci for isolates of differing emm types sharing an ST
Many emm pattern D strains harbor a cpa gene, rather than the prtF1 gene, within their FCT regions. Of 20 pattern D isolates, belonging to eight STs and representing 20 different emm types, the nucleotide sequence was determined for an internal portion (5' end region) of the cpa gene for 18 strains (Table 4). The partial cpa genes formed three discrete sequence clusters, with two alleles in each major cluster. The percent nucleotide sequence identity among cpa alleles belonging to the same sequence cluster was high (>99%). However, the percent nucleotide sequence identity among alleles belonging to different cpa sequence clusters was much lower, ranging from 62 to 68%; the amino acid sequence identity among different clusters ranged from 49 to 60%. The sequence data suggest that, like prtF1, the cpa locus has a history of being subject to strong diversifying selection.
In contrast to what was found for prtF1, strains having distinct emm types but the same ST were not necessarily uniform in their cpa genes (Table 4). Although four of the seven emm-variable STs examined had identical cpa alleles in strains with different emm types (ST3, -123, -174, and -182), two emm-variable STs had cpa genes belonging to distant sequence clusters (ST9 and -11); in a third (ST4), the cpa fragment could not be amplified from one of the two strains with the cpa primers. These findings suggest that, for the emm pattern D subpopulation, the emergence of strains of the same ST, but with different emm types, may in some cases be more complex than a one-step recombinational replacement of emm.
At least 58% (28 of 48) of the recent changes at housekeeping loci in GAS appear to be due to recombination, and this value may be substantially greater, since many alleles among SLVs that differ at a single nucleotide site may have arisen by recombination involving a very similar donor allele rather than by point mutation. The best estimate at present is that recombination changes alleles of housekeeping loci at least 1.4 times more commonly than point mutation. The major contribution of recombination to allelic change is consistent with previous findings that demonstrated a complete absence of congruence among the gene tree topologies for the seven MLST loci, for GAS genotypes representing all emm pattern groups (14). The lack of congruency between loci suggests that, in the long term, recombination has eliminated all phylogenetic signal from gene trees. This finding is further supported by a lack of strong bootstrap support in a phylogenetic tree based on concatenated housekeeping alleles (25).
The emm pattern A-C subpopulation (throat specialists) of S. pyogenes may differ from the skin specialists (emm pattern D) and generalists (emm pattern E) in the relative impact of recombination compared to point mutation in genetic diversification at housekeeping loci. Those recombinational changes at MLST loci that can clearly be discerned appear to have been much more common in emm pattern group D and E strains than in pattern A-C strains. This trend was also observed in an analysis of congruence among housekeeping gene tree topologies, where 5, 0, and 1 of the 42 possible pairwise tree comparisons were significantly congruent for the emm pattern A-C, D, and E subpopulations, respectively (25). Although allelic changes by recombination were less readily detected among the emm pattern A-C strains using eBURST, it is important to emphasize that recombination was observed in all of the emm pattern-defined subpopulations according to several analytic methods (25).
The total number of STs within each clonal complex identified by eBURST was rather low and probably reflects our sampling strategy. In general, eBURST may identify few clonal complexes, and few large clonal complexes, in populations where sampling has largely been designed to uncover the genetic diversity within the species (11, 16, 34), as in this work, where a small number of isolates or a single isolate of most emm types was examined. Thus, a more optimal sampling of GAS will be required for identifying many additional clonal complexes, for defining their founding genotypes, and for exploring the patterns of descent, in order to provide a better assessment of the impact of recombination and mutation.
The data suggest that a significant proportion of emm pattern A-C and D strains, but not pattern E strains, have a recent history of recombinational replacement of emm type, yielding STs that are associated with multiple, divergent emm types. These events may be relatively recent, as no variation in a gene that is believed to be under diversifying selection (prtF1) was detected in isolates of the emm-variable STs of emm pattern A-C. Pattern D strains generally lack prtF1 but instead harbor cpa, which is located at the same approximate position within the genome (5, 32). Not all pattern D strains sharing the same ST and harboring divergent emm types had the same cpa allele; distant sequence clusters of cpa genes were observed on the same ST background in association with different emm types. Thus, in some cases, diversification at the rapidly evolving cpa locus may have occurred subsequent to the recombinational replacement of the emm gene. Analysis of additional loci may aid in obtaining a more complete understanding of the recent evolutionary history of these strains.
Recombinational replacement of emm type, which may occur during coinfection of a single host tissue site by multiple GAS strains, can potentially provide an avenue for immune escape. The ability of a strain to successfully be transmitted to a new human host diminishes as protective immunity arising from infection gradually builds among the host population (1, 19, 20). For many GAS strains, the type-specific epitopes of the M protein elicit strong protective immunity (2, 9, 23, 24, 28). If the emm type of a parent (recipient) strain is replaced with a new emm type from an unrelated donor strain, the new genotype may have a strong selective advantage if the host population is largely nonimmune to the emm type of the donor strain and immune to the emm type of the parent strain. The ability to recover multiple emm types in association with a single ST through epidemiologic sampling, as shown in this report, may reflect past strain-to-strain competition mediated through herd immunity. Patients with impetigo often differ from those with pharyngitis in their immune response to specific S. pyogenes antigens (26). This may be the result of fundamental differences in the host immune response to infection at these two tissue sites, which in turn, may provide a basis for differential selection pressures on the subpopulations of strains. Examination of the relationships between alleles at neutral (housekeeping) and adaptive (e.g., emm, prtF1, and cpa) loci of GAS may allow one to make reasonable predictions on the strength of host immune selection acting on each adaptive locus.
Of the 48 SLV pairs identified by eBURST, 17 pairs were represented by an ST that was also a recipient for recombinational replacement of emm type. In fact, all except 1 of the 12 emm-variable STs (ST182) were represented among the clonal complexes identified by eBURST. Among the 17 SLV pairs represented by an emm-variable ST, nine (53%) of the genetic diversification events at housekeeping loci were attributed to recombination; however, in most cases, it remained unclear as to whether the emm-variable ST was the likely ancestral ST. Frequent acquisition of genes via horizontal transfer could be due to high prevalence of the recipient strain within the human host population, with increased opportunities to be present within mixed infections, or, alternatively, could be due to intrinsic properties that render certain STs highly efficient as recipients of recombinational and/or lateral gene transfer events. It is perhaps of relevance here that some strains of S. pyogenes appear to be naturally transformable and that, furthermore, the locus (sil) that confers the competence phenotype has a limited distribution among strains (21). Generalized transduction may be an important mechanism for horizontal transfer leading to homologous recombination in S. pyogenes.
A comprehensive catalogue of STs and emm patterns for the majority of known emm types of GAS, as presented in this report, provides a foundation for addressing questions on the population substructure of this biologically diverse bacterial pathogen. emm pattern D and E strains account for >80% of emm types, and therefore, from a global standpoint, these strains are of medical importance. The STs of emm pattern E isolates are rarely associated with more than one emm type; divergent emm types associated with the same ST were a far more common feature of pattern A-C and D emm types. However, the genetic mechanisms underlying the emergence of population structures of the emm pattern A-C versus emm pattern D subpopulations seem to be distinct. Genetic diversification by recombination appeared to be the dominant mechanism in emm pattern D and E strains but was less readily detectable among pattern A-C strains. When genetic diversification is combined with differential effects of host immune selection on each of the emm pattern-defined subpopulations, distinct population substructures can emerge.
This work was supported by the National Institutes of Health (GM60793, to D.E.B. and B.G.S.; AI053826, to D.E.B.), the American Heart Association (grant-in-aid, to D.E.B.), and the Wellcome Trust (to B.G.S.). B.G.S. is a Wellcome Trust Principal Research Fellow.