ABSTRACT
Comparisons of the 1.84-Mb genome of serotype M5 Streptococcus pyogenes strain Manfredo with previously sequenced genomes emphasized the role of prophages in diversification of S. pyogenes and the close relationship between strain Manfredo and MGAS8232, another acute rheumatic fever-associated strain.
Streptococcus pyogenes (alternatively referred to as group A Streptococcus) is responsible for diverse diseases in humans, including pharyngitis, toxic shock syndrome, impetigo, and scarlet fever, and the postinfection sequela acute rheumatic fever (ARF) (7). We sequenced the genome of a serotype M5 strain of S. pyogenes, strain Manfredo, which was isolated from an ARF patient in 1952 in the United States (19). The genomes of 11 other S. pyogenes strains have been sequenced, including strains that were associated with various clinical conditions and representatives of the following eight serotypes: M1 (10, 22), M2 (4), M3 (5, 14), M4 (4), M6 (2), M12 (4), M18 (20), and M28 (11). Based on multilocus sequence typing (MLST), which has been used to investigate the genetic relationships of S. pyogenes (8), the 12 sequenced S. pyogenes strains can be placed into nine sequence types (ST) that are distributed throughout the S. pyogenes population (Fig. 1). Strain Manfredo (ST99) is most closely related to the serotype M6 strain MGAS10394 (ST382) (3) and the serotype M18 strain MGAS8232 (ST42) (20). In each case, three of the seven MLST alleles are identical to those of Manfredo (www.mlst.net), and only two of these alleles are present in all three strains, suggesting that although these strains are clearly related, they are not clonal.
Phylogenetic diversity of the sequenced S. pyogenes strains: unrooted neighbor-joining tree constructed using concatenated sequences of the seven loci used in MLST for a representative selection of STs from the S. pyogenes MLST database (www.mlst.net). The tree was constructed using ClustalX (23) and Phylip (9) with the Kimura two-parameter method, and the plot was generated using NJplot (17). The positions of all 12 strains for which complete genome sequences are available are shown.
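The Kimura two-parameter distance used for the tree in Fig. 1 corrects the observed proportion of differing sites separately for transitions (P) and transversions (Q): d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The following Python function is a minimal sketch of that formula for two pre-aligned sequences. It is not the Phylip implementation, and the function name and the handling of gaps (skipped) are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguity codes are ignored
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A matrix of such pairwise distances over the concatenated MLST loci is the kind of input a neighbor-joining method takes.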
The Manfredo sequence was assembled, finished, and annotated as described previously (12, 15), using Artemis to collate data and facilitate annotation (18). The genome consists of a single 1,841,271-bp circular chromosome, which contains 1,819 protein-coding sequences (CDSs). Approximately 14% of these CDSs (254 CDSs) are contained in prophages. A comparison of all the other S. pyogenes sequenced strains except serotype M3 strain SSI-1 showed that the genomes are colinear (14), with multiple prophage insertions throughout the chromosomes (2). In the case of Manfredo and SSI-1 there is a large central inversion (∼1.3 Mb) (Fig. 2), which probably resulted from reciprocal recombination between rrn-comX regions that are similar distances from the terminus of replication (14) (Fig. 2). Notably, this large inversion is visible in the comparison of Manfredo with MGAS10394 and MGAS8232 (Fig. 2), which are more closely related to Manfredo than SSI-1 is (Fig. 1). This suggests that the inversion occurred independently in Manfredo and SSI-1. There is an additional smaller (∼200-kb) rearrangement near the terminus of strain SSI-1 (compared to MGAS315) (Fig. 2), due to reciprocal recombination between prophages across the replication axis (14). This smaller inversion was not found in the Manfredo chromosome, which lacks prophages inserted at equivalent sites (Fig. 2). It seems clear that intrachromosomal recombination is an important mechanism contributing to the evolution of both S. pyogenes genomes and prophages and has the potential to generate novel recombinant prophages with alternative cargos (14).
Comparison of the genome structures of S. pyogenes: pairwise comparisons of the S. pyogenes MGAS315, SSI-1, Manfredo, MGAS10394, and MGAS8232 chromosomes displayed using the Artemis Comparison Tool (ACT) (6). The sequences were aligned using the predicted replication origins (oriC) (left), with the terminus of replication in the center. The colored bars separating the genomes (red and blue) represent matches identified by BLASTN (1). Red lines link matches in the same orientation, and blue lines link matches in the reverse orientation. Regions of the chromosomes containing prophages are indicated by pink boxes.
The Manfredo genome contains five prophages (φMan.1, φMan.2, φMan.3, φMan.4, and φMan.5) (Fig. 2) that exhibit mosaic relationships with other S. pyogenes prophages. To illustrate the relationships of the prophages, for each of the sequenced S. pyogenes strains all of the resident prophages were concatenated (joined end to end) and compared using DOTTER (21), which displays nucleotide similarity as a dot matrix plot (Fig. 3). In Fig. 3, individual colored bars on the x and y axes represent the concatenated prophage sequences for each of the strains indicated, and vertical and horizontal lines indicate the junctions between individual prophages on the x and y axes, respectively. Diagonal lines indicate sequence similarity; self-matching of the concatenated sequences generated the continuous central diagonal line. Lines on either side of the continuous central diagonal line indicate regions where there is extended sequence identity in the forward (parallel) and reverse (perpendicular) orientations for intersecting sequences. Of particular note is the fact that the distribution of prophages and the overall genetic relatedness between the strains are clearly not congruent. Divergent prophages are inserted at exactly the same sites in closely related strains, while identical sites in divergent strains can be occupied by highly conserved prophages. For example, φMan.5 and φ10394.8 are clearly distinct prophages (Fig. 3) occupying the same sites in the closely related serotype M5 strain Manfredo and serotype M6 strain MGAS10394 (Fig. 1), while φMan.4 and φ10750.1 are similar prophages (Fig. 3) at the same chromosomal location in distantly related serotype M5 strain Manfredo and serotype M4 strain MGAS10750 (Fig. 1).
Comparative nucleotide sequence analysis of S. pyogenes prophages: dot matrix showing the relatedness of the nucleotide sequences of prophages generated with DOTTER (21). The prophages used in the comparison (in order) were joined end to end and were obtained from Manfredo (φMan.1, φMan.2, φMan.3, φMan.4, and φMan.5), SSI-1 (SPsP1, SPsP2, SPsP3, SPsP4, and SPsP5) (14), SF370 (370.1, 370.2, 370.3, and 370.4) (10), MGAS315 (φ315.1, φ315.2, φ315.3, φ315.4, φ315.5, and φ315.6) (5), MGAS8232 (φspeA, φspeC, φspeL/M, φ370.3-like, and φsda) (20), MGAS10394 (φ10394.1, φ10394.2, φ10394.3, φ10394.4, φ10394.5, φ10394.6, φ10394.7, and φ10394.8) (3), MGAS6180 (φ6180.1, φ6180.2, φ6180.3, and φ6180.4) (11), MGAS5005 (φ5005.1, φ5005.2, and φ5005.3) (22), MGAS2096 (φ2096.1 and φ2096.2) (4), MGAS9429 (φ9429.1, φ9429.2, and φ9429.3) (4), MGAS10270 (φ10270.1, φ10270.2, φ10270.3, φ10270.4, and φ10270.5) (4), and MGAS10750 (φ10750.1, φ10750.2, φ10750.3, and φ10750.4) (4). The colored bars indicate the extents of the concatenated prophages for the strains, and the vertical and horizontal lines indicate the extents of the individual prophages. The green arrows indicate prophages referred to in the text.
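The logic of the concatenated dot matrix in Fig. 3 can be illustrated with a short Python sketch. This is a deliberate simplification and not the DOTTER algorithm: DOTTER scores sliding windows, whereas the sketch below records only exact shared words. The function names and the word length are assumptions chosen for illustration. Forward matches correspond to lines parallel to the main diagonal, and reverse-complement matches correspond to lines perpendicular to it.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string; unknown symbols become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def concatenate(prophages):
    """Join sequences end to end; return the joined sequence and the
    offsets of the junctions between consecutive prophages."""
    junctions, offset = [], 0
    for seq in prophages:
        offset += len(seq)
        junctions.append(offset)
    return "".join(prophages), junctions[:-1]


def word_matches(seq_x, seq_y, word=12):
    """Return (forward, reverse) lists of (x_start, y_start) coordinates
    where seq_x and seq_y share an exact word of the given length."""
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    index = {}
    for i in range(len(seq_x) - word + 1):
        index.setdefault(seq_x[i:i + word], []).append(i)
    rc_y = reverse_complement(seq_y)
    forward, reverse = [], []
    for j in range(len(seq_y) - word + 1):
        for i in index.get(seq_y[j:j + word], []):
            forward.append((i, j))
        # rc_y[j:j+word] is the reverse complement of the word that
        # starts at position len(seq_y) - word - j in seq_y.
        for i in index.get(rc_y[j:j + word], []):
            reverse.append((i, len(seq_y) - word - j))
    return forward, reverse
```

Comparing a concatenated sequence with itself yields the continuous central diagonal, and the junction offsets returned by `concatenate` give the positions of the vertical and horizontal separator lines.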
Four of the five Manfredo prophages encode putative virulence factors that are identical or very similar (>98% amino acid identity) to proteins encoded by CDSs present in prophages in other S. pyogenes strains. These factors include for φMan.1, streptococcal phage DNase SpyM50534; for φMan.2, streptococcal phage DNase SpyM50691; for φMan.3, exotoxin H SpyM51021 and exotoxin I (pseudogene) SpyM51024; and for φMan.4, streptococcal phage DNase SpyM51263 and exotoxin C SpyM51264. The fifth prophage, φMan.5, does not contain any CDSs with similarity to CDSs encoding known virulence factors and appears to be a satellite phage. The potential for prophage recombination events to generate novel combinations of virulence genes is highlighted by the fact that mosaic structures are evident in the regions carrying the virulence determinants. For example, prophages φMan.2 and φ8232.4, which are inserted at the same attachment site in their respective genomes, exhibit extended similarity to each other but also include clearly divergent sequences (Fig. 3). The sequences at the left ends of these prophages are divergent and encode streptococcal phage DNases with less than 30% amino acid identity. Notably, the streptococcal phage DNase of φ8232.4 is almost identical (99.624% amino acid identity) to the streptococcal phage DNase encoded by φMan.1, a prophage that displays far less sequence conservation with φ8232.4 than φMan.2 displays (Fig. 3).
Excluding genes associated with prophages, ∼68% of the CDSs in Manfredo have orthologs in all of the other sequenced strains. Therefore, taking into account the contribution that prophages make, between 14 and 18% of the genome is composed of CDSs that are not conserved in one or more of the sequenced strains. Some of the CDSs in this variable component may contribute to the clinical differences between the strains. For the 12 sequenced strains, there is very strong evidence linking both serotype M5 strain Manfredo and serotype M18 strain MGAS8232 with ARF (7, 20), and the genetic relationship between these two strains is interesting (Fig. 1). However, reciprocal FASTA (16) analysis identified only a single CDS that was conserved in Manfredo and MGAS8232 but absent in the other S. pyogenes strains. This CDS encodes a surface-anchored protein (SpyM50104) that is a shortened allelic variant of a fibronectin-binding protein commonly found in the variable FCT (9) region of S. pyogenes genomes. The fibronectin-binding protein variants of Manfredo and MGAS8232 appear to be truncated compared to the other variants, but they retain functionally important N-terminal signal and C-terminal sortase processing motifs (VPXTG) that predict that they are likely to be expressed on the cell surface. However, while the conservation of this locus in strains that have been very clearly associated with ARF may be worthy of further investigation, it must be emphasized that there is no evidence at this stage that the locus influences the pathology of these strains. It must also be emphasized that it is by no means clear that these are the only two representatives of the 12 sequenced strains that have the capacity to cause ARF, since establishing epidemiological relationships between ARF and individual S. pyogenes strains is notoriously difficult (13). 
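The reciprocal comparison described above can be sketched as a generic reciprocal-best-hit filter. The search parameters and cutoffs used with FASTA are not given in this text, so the Python sketch below assumes only that each search has been reduced to (query, subject, score) tuples. All function names and identifiers are hypothetical.

```python
def best_hits(hits):
    """Map each query to its top-scoring subject.

    hits: iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if backward.get(b) == a}


def shared_only(rbh_with_partner, queries_hit_elsewhere):
    """CDSs with a reciprocal best hit in the partner strain and no
    match in any of the other strains (given as a set of query IDs)."""
    return sorted(q for q in rbh_with_partner if q not in queries_hit_elsewhere)
```

With one genome as the query set and a second as the partner, `shared_only` returns the candidates that are conserved in the pair but absent elsewhere, which is the kind of set that the single CDS reported here would belong to.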
Alternatively, it is possible that subtle differences in the conserved components of the genome may be more important in distinguishing ARF isolates. Investigating the distribution and functional effects of single-nucleotide polymorphisms (SNPs) in ARF strains and related non-ARF strains may hold the key to unraveling the causes of this complex disease. To this end, 14,962 SNPs were identified in the comparison of ARF strains (Manfredo and MGAS8232), in contrast to the 15,460 SNPs identified in a comparison of similarly related strains (Manfredo and MGAS10394). To pinpoint the functionally significant SNPs among the genetic noise, wider population studies are required.
Nucleotide sequence accession number.
The sequence and annotation of the Manfredo genome have been deposited in the EMBL database under accession number AM295007.
ACKNOWLEDGMENTS
We acknowledge the support of the Wellcome Trust Sanger Institute core sequencing and informatics groups.
This work was supported by the Wellcome Trust through its Beowulf Genomics initiative.
FOOTNOTES
- Received 4 August 2006.
- Accepted 19 September 2006.
- Copyright © 2007 American Society for Microbiology