Journal of Bacteriology, April 2005, p. 2698-2704, Vol. 187, No. 8
0021-9193/05/$08.00+0 doi:10.1128/JB.187.8.2698-2704.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Department of Biological Sciences, University of South Carolina, Columbia, South Carolina
Received 1 November 2004/ Accepted 14 January 2005
Comparisons of complete genomes of different Staphylococcus aureus isolates illustrated that, in addition to a conserved core of shared genes, different isolates have independently acquired different sets of large mobile elements carrying genes responsible for virulence or drug resistance (11). The best-studied such element is the staphylococcal cassette chromosome mec (SCCmec) element, which confers resistance against the β-lactam family of antibiotics (3, 10). However, no study has so far addressed the extent to which the set of genes shared among S. aureus genomes has come to exhibit divergent sequence patterns as a result of homologous recombination between genomes.
Comparisons of pairs of closely related bacterial genomes have revealed the presence of certain orthologous gene pairs that show anomalously high divergence at synonymous nucleotide sites (12, 13). Since synonymous mutations do not affect amino acid sequence, they are generally expected not to be subject to strong natural selection. In some species of Bacteria, there is evidence of selection on codon usage, but selection coefficients are estimated to be quite low (2, 9). Thus, synonymous mutations accumulate mainly as a function of mutation rate and evolutionary time; and in the absence of high intragenomic variation in mutation rate, a gene pair showing much greater than average synonymous divergence between two related genomes is indicative of a homologous recombination event that brought in one of the two orthologs from a genetic background distinct from the rest of the genome (12, 13).
Here we apply this reasoning to a set of orthologous genes (i.e., genes descended from a common ancestor without gene duplication) found in each of five complete genomes of S. aureus. We used phylogenetic analysis to identify sister pairs of genomes and then used multivariate statistical methods to identify pairs of orthologs that were unusually divergent at synonymous sites between these pairs of genomes. Then we used the organismal phylogeny to reconstruct the recombinational events that gave rise to these divergent sequences.
In order to assign genes to gene families, we applied the BLASTCLUST program (1), which assembles families by homology search using a single-link method, to the sets of predicted protein sequences from the above-mentioned genomes. We used different sets of homology search criteria depending on the goals of the search. In order to identify a set of putative orthologs found in a single copy in all S. aureus genomes and all outgroup genomes, we used an E value of 10⁻⁶ for the BLASTP homology search and scored a match between two sequences only if they were at least 30% identical over at least 50% of their length. Using these criteria, we found 506 families represented by a single member in each of the nine genomes. In order to identify a set of putative orthologs found in a single copy in all S. aureus genomes, we used a stricter set of criteria as follows: we used an E value of 10⁻⁶ for the BLASTP homology search and scored a match between two sequences only if they were at least 50% identical over at least 70% of their length. These criteria identified 2,129 gene families represented by a single member in each S. aureus genome (see Table S1 in the supplemental material).
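The family-assignment logic described above can be illustrated with a minimal Python sketch. It is not the BLASTCLUST implementation: it assumes the pairwise hits have already been computed, and the function names, the toy protein identifiers, and the `(protein_a, protein_b, percent_identity, coverage)` tuple layout are hypothetical. Proteins are merged by single linkage whenever a hit passes the identity and coverage thresholds, and only families with exactly one member per genome are kept.

```python
from collections import defaultdict

def single_link_families(hits, min_identity, min_coverage):
    """Group proteins into families by single linkage over qualifying hits.

    hits: iterable of (protein_a, protein_b, percent_identity, coverage),
    where coverage is a fraction of sequence length (hypothetical layout).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        root_a, root_b = find(a), find(b)  # registers both proteins
        if identity >= min_identity and coverage >= min_coverage:
            parent[root_a] = root_b        # one qualifying hit joins two families

    families = defaultdict(set)
    for protein in list(parent):
        families[find(protein)].add(protein)
    return list(families.values())

def single_copy_families(families, genome_of, genomes):
    """Keep only families with exactly one member from every genome."""
    kept = []
    for family in families:
        counts = defaultdict(int)
        for protein in family:
            counts[genome_of[protein]] += 1
        if set(counts) == set(genomes) and all(n == 1 for n in counts.values()):
            kept.append(family)
    return kept

# Toy data (invented): three genomes, one clean ortholog set, one set with a paralog.
hits = [
    ("g1_a", "g2_a", 80, 0.90),
    ("g2_a", "g3_a", 75, 0.95),
    ("g1_b", "g2_b", 60, 0.80),
    ("g2_b", "g3_b", 65, 0.85),
    ("g1_c", "g1_b", 90, 1.00),  # paralog: second copy in genome g1
    ("g1_a", "g1_b", 35, 0.60),  # passes 30%/50% but fails 50%/70%
]
genome_of = {p: p.split("_")[0] for hit in hits for p in hit[:2]}
genomes = {"g1", "g2", "g3"}

strict = single_copy_families(single_link_families(hits, 50, 0.70), genome_of, genomes)
loose = single_copy_families(single_link_families(hits, 30, 0.50), genome_of, genomes)
```

In this toy example the looser thresholds let one weak hit chain all proteins into a single family, which is a known property of single-link clustering and one reason thresholds matter for the resulting ortholog set.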
Phylogenetic analyses. Homologous sequences were aligned at the amino acid level with the CLUSTALW program (23), and phylogenetic analyses were applied to the concatenated amino acid sequence of the 506 protein families represented by a single copy in both the S. aureus and outgroup genomes. The following methods of phylogenetic reconstruction were used: (i) the maximum parsimony (MP) method, implemented in the PAUP* program (22); (ii) the quartet maximum-likelihood method (QML), implemented in the PUZZLE 5.2 program (21); and (iii) the neighbor-joining (NJ) method (19), implemented in the MEGA2 program (15). The NJ tree was based on the gamma-corrected amino acid distance, with the shape parameter (α = 0.64) estimated by the PUZZLE 5.2 program. In the MP and NJ trees, the reliability of the internal branches was assessed by bootstrapping (7); 1,000 bootstrap pseudosamples were used. In the QML tree, the percentage of puzzling steps supporting a given branch provides a conservative test of the reliability of the branch, analogous to bootstrapping (21).
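As an illustration of the gamma-corrected distance mentioned above, the following Python sketch uses the commonly cited form d = a[(1 − p)^(−1/a) − 1], where p is the proportion of differing amino acid sites and a is the gamma shape parameter. That this is exactly the formula applied by MEGA2 in this analysis is an assumption, and the function names and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over aligned, ungapped columns."""
    compared = differing = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared

def gamma_distance(p, shape):
    """Gamma-corrected distance (assumed form): shape * ((1 - p)**(-1/shape) - 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return shape * ((1 - p) ** (-1 / shape) - 1)

# Toy aligned sequences (invented); the shape value 0.64 is the one reported in the text.
p = p_distance("MKTAYIAK", "MKSAYIAR")
d = gamma_distance(p, shape=0.64)
```

Because a small shape parameter implies strong rate variation among sites, the corrected distance exceeds the raw proportion of differences, and the gap widens as p grows.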
Nucleotide substitution. After the amino acid sequence alignment was imposed on the DNA sequences, the number of synonymous nucleotide substitutions per synonymous site (dS) and the number of nonsynonymous nucleotide substitutions per nonsynonymous site (dN) were estimated by a maximum-likelihood method (27) using the software package PAML (26). Genes with anomalously high degrees of synonymous divergence were identified by k-means clustering (12). We conducted nonhierarchical k-means clustering using MacQueen's algorithm (14). This is a method of creating clusters of observed multivariate data points such that variability within clusters is minimized and variability between clusters is maximized. All statistical analyses were conducted with the Minitab statistical package, release 13 (http://www.minitab.com/).
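The clustering step can be sketched in Python as follows. This is a MacQueen-style illustration, not a reproduction of the Minitab procedure: it assumes each gene is represented by a vector of dS values (one per genome comparison), seeds the centroids with the first k points, updates centroids as running means during a first pass, and then repeats reassignment until the labels stop changing. The function names and the toy dS values are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def macqueen_kmeans(points, k, max_passes=10):
    """MacQueen-style k-means: running-mean pass, then reassignment passes."""
    if len(points) < k:
        raise ValueError("need at least k points")
    centroids = [list(p) for p in points[:k]]  # seeds: the first k points
    counts = [1] * k
    # First pass: fold each remaining point into its nearest centroid.
    for p in points[k:]:
        i = nearest(p, centroids)
        counts[i] += 1
        centroids[i] = [c + (x - c) / counts[i] for c, x in zip(centroids[i], p)]
    # Reassignment passes: relabel, recompute means, stop when labels are stable.
    labels = None
    for _ in range(max_passes):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy dS vectors (invented): most genes are nearly identical, two are highly divergent.
ds_vectors = [
    (0.01, 0.02), (0.60, 0.55), (0.02, 0.01),
    (0.00, 0.01), (0.55, 0.70), (0.01, 0.00),
]
labels, centroids = macqueen_kmeans(ds_vectors, k=2)
```

The result of this kind of procedure depends on the seed points and on k, so in practice the choice of both would need to be justified against the data.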
FIG. 1. Topology of phylogenetic trees constructed by the NJ, MP, and QML methods based on 506 orthologous genes present in S. aureus and outgroup genomes (151,846 aligned amino acid residues). Numbers on the branches represent the percentage of 1,000 bootstrap samples supporting the branch in both NJ and MP trees; the values were the same in both trees and also were identical to the proportion of puzzling steps supporting the branches in the QML tree.
TABLE 1. Numbers of synonymous and nonsynonymous substitutions per site at 2,127 orthologous loci in comparisons between S. aureus genomes
One possible explanation for such homogeneity at synonymous sites among genomes is that these 108 genes were recently transferred by recombination events among genomes. An alternative hypothesis is that these genes are subject to some unusual constraint at synonymous sites that substantially reduces the rate of synonymous substitution. In order to decide between these hypotheses, we compared the 108 genes without synonymous differences to the other genes in our sample (Table 2). Both median and mean length (number of aligned codons) were significantly lower in these 108 genes than in the other genes (Table 2). Mean percent G+C at third codon positions was slightly but significantly lower in the 108 genes than in the other genes (Table 2). Finally, both mean and median dN in comparisons between the other genomes and the outgroup MRSA252 were significantly lower in the 108 genes than in the other genes (Table 2).
TABLE 2. Comparison of orthologous genes identical at synonymous sites among five S. aureus genomes with other orthologous genes
TABLE 3. Median and mean dS for clusters of genes identified by k-means clustering
In spite of the high dS values, the corresponding values of dN were not exceptionally high (Table 4). As with dS, there were significant differences among clusters with respect to both median and mean dN (Table 4). The fact that the dN values were moderate implies that the anomalously high dS values were not caused by faulty alignment. Although somewhat divergent in amino acid sequence, the genes in clusters 2 and 3 were extraordinarily divergent at synonymous sites only.
TABLE 4. Median and mean dN for clusters of genes identified by k-means clustering
FIG. 2. Hypothetical scenarios for recombination events in the history of S. aureus genomes.
TABLE 5. Candidates for intergenomic recombination identified by k-means clustering
The genes for cassette chromosome recombinases A and B and a linked gene of unknown function (Table 5) in MSSA476 were highly divergent at synonymous sites from the orthologous genes in the other genomes. This pattern suggests a recombination into MSSA476 from an unknown source (Fig. 2D). Alternatively, it is possible that MSSA476 possesses the ancestral form of this set of linked genes and that the sequences of these genes in MW2, Mu50, N315, and MRSA252 have resulted from independent events of recombination from an unknown source. Similarly, N315 possessed a sequence at the locus encoding enterotoxin P (Table 5) that was highly divergent at synonymous sites from the orthologous genes in the other genomes, suggesting a recombination into that genome from an unknown source (Fig. 2E).
Finally, there were cases in which certain genomes showed sequences identical at both synonymous and nonsynonymous sites to those of the distantly related genome MRSA252 but highly divergent from the other genomes. Such a pattern suggests a recent recombination from MRSA252 or from a genome closely related to MRSA252 (Fig. 2F and G). Alternatively, independent events of recombination may have occurred in both MRSA252 and these other genomes. In the case of three genes of unknown function, a sequence identical to that of MRSA252 was shared by Mu50 and N315 (Table 5), suggesting recombination into the ancestor of these two genomes (Fig. 2F). In the gene for a phage anti-repressor (Table 5), the MRSA252-like sequence was found only in N315, suggesting recombination into that genome (Fig. 2G).
These 108 loci included a number of known genes encoding short, highly conserved proteins: for example, 25 of 108 loci (23.1%) encoded ribosomal proteins (see Table S2 in the supplemental material). Thus, it seems likely that most of these genes are subject to strong functional constraints at the amino acid level. The absence of synonymous substitution suggests that there may be additional strong constraints on synonymous codon usage. However, a number of the predicted proteins were of unknown function (see Table S2 in the supplemental material). Thus, it is possible that some of these predicted genes do not really correspond to protein-coding genes but rather represent noncoding sequences that are conserved for other reasons. A possible example was the set of predicted orthologous loci represented by MW060 in the MW2 genome; the predicted protein in this case is only 26 amino acids long (see Table S2 in the supplemental material).
A second group of unusual genes comprised 45 genes that showed anomalously high levels of synonymous substitution. At these loci, the value of dS in one or more comparisons among the four most closely related genomes (MW2, MSSA476, Mu50, and N315) was 1 or even 2 orders of magnitude higher than those seen at typical loci. There is no known form of natural selection that can cause such an elevation in the rate of synonymous substitution (12). The only known form of selection affecting synonymous sites is selection on synonymous codon usage, but this selection is likely to be purifying and thus to reduce the rate of synonymous substitution rather than enhance it (20). Therefore, any gene that has an unusually high dS in the comparison between two closely related genomes is likely to have been introduced into one of the two genomes by homologous recombination from a more distantly related genome (12, 13).
In the case of the 45 genes identified by unusually divergent values of dS in the present data set, we examined the pattern of synonymous substitution in order to reconstruct the hypothetical recombination event or events. The most common type of event involved independent recombination into the common ancestors of each of the two sister pairs (Fig. 2A). Several of the genes showing this pattern were potentially important for pathogenesis, including genes encoding staphylocoagulase, exotoxins, and Ser-Asp fibrinogen-binding bone sialoprotein-binding proteins SdrC and SdrE (Table 5). The latter in particular are known to play a key role in hematogenous tissue infection of humans (24). In a number of cases, the genes involved in putative recombination events were linked together in a cluster, suggesting that recombination events can span a number of linked loci. A striking example involved the linked aroA, aroB, and aroC genes (Table 5), which function in the synthesis of aromatic amino acids (18).
Putatively recombinant genes detected by our approach included proteins apparently introduced by mobile genetic elements such as phage (phage tail length tape measure protein and phage anti-repressor; Table 5) and the SCCmec element. The latter included the linked genes for cassette chromosome recombinases A and B, in which MSSA476 had a sequence highly divergent at synonymous sites from those of the other genomes (Table 5). This pattern suggested that either (i) MSSA476 received these genes by recombination or (ii) MSSA476 possesses the ancestral form of this set of linked genes and the sequences of these genes in MW2, Mu50, N315, and MRSA252 have resulted from independent events of recombination from the same source. Given the phylogeny of the genomes (Fig. 1), at least three independent recombination events (to MW2, to MRSA252, and to the ancestor of Mu50 and N315) would be required under the latter hypothesis. Thus, this hypothesis is not as parsimonious as the hypothesis of a single recombination event into MSSA476. However, the less parsimonious hypothesis is attractive because of the association of the cassette chromosome recombinases with the SCCmec island, which is found in MW2, Mu50, N315, and MRSA252 but not in MSSA476.
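The event counts in the parsimony argument above can be checked with a small Python sketch. It assumes a rooted topology inferred from the text, with MW2 and MSSA476 as one sister pair, Mu50 and N315 as the other, and MRSA252 as the outgroup to both (Fig. 1 is not reproduced here). It also treats each recombination as a single change of sequence type along one branch. The state labels `common` and `divergent` and the function name are hypothetical.

```python
def min_changes(tree, leaf_state, states):
    """Minimum number of state changes below a node, for each possible node state.

    tree: a leaf name (str) or a tuple of subtrees.
    Returns a dict mapping each state to the minimum change count (unit costs).
    """
    if isinstance(tree, str):
        # A leaf can only carry its observed state.
        return {s: (0 if leaf_state[tree] == s else float("inf")) for s in states}
    child_costs = [min_changes(child, leaf_state, states) for child in tree]
    return {
        s: sum(min(cost[t] + (s != t) for t in states) for cost in child_costs)
        for s in states
    }

# Assumed topology: ((MW2, MSSA476), (Mu50, N315)) with MRSA252 as outgroup.
tree = ((("MW2", "MSSA476"), ("Mu50", "N315")), "MRSA252")
states = ("common", "divergent")
leaf_state = {
    "MW2": "common", "MSSA476": "divergent",
    "Mu50": "common", "N315": "common", "MRSA252": "common",
}

costs = min_changes(tree, leaf_state, states)
events_if_mssa476_recombinant = costs["common"]    # ancestor had the common form
events_if_mssa476_ancestral = costs["divergent"]   # ancestor had the MSSA476 form
```

Under these assumptions the count is one event if the ancestor carried the widely shared form and three if it carried the MSSA476 form, matching the comparison made in the text. The sketch counts events only; it says nothing about the biological plausibility that the text weighs against parsimony.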
It was also of interest that both Mu50 and N315 possessed three genes that were identical at synonymous sites to the corresponding genes of the distantly related MRSA252 genome but highly divergent from the corresponding genes of MW2 and MSSA476 (Table 5). Similarly, the phage anti-repressor gene of N315 was identical at both synonymous and nonsynonymous sites to that of MRSA252 but divergent from homologues in more closely related genomes (Table 5). These patterns suggest recombination events in which the donor sequence was MRSA252 or a closely related genome (Fig. 2F and G). Again a less parsimonious alternative is possible, which would involve independent recombination events from an unknown source into both MRSA252 and these other genomes.
Although much less divergent at nonsynonymous sites than at synonymous sites (Tables 3 and 4), the genes introduced by recombination did show differences at the amino acid level. The results thus indicate that homologous recombination in S. aureus can be a source of genes encoding proteins having amino acid sequence differences and thus potential functional differences. The proteins encoded by these recombinant loci include some with known or likely functions in pathogenesis (e.g., staphylocoagulase, exotoxin, Ser-Asp fibrinogen-binding bone sialoprotein-binding protein, fibrinogen and keratin-10 binding surface-anchored protein, fibrinogen-binding protein ClfA, and enterotoxin P). Thus, the results support the hypothesis that exchange of homologous genes among S. aureus genomes plays a role in the evolution of pathogenesis in this species.
Supplemental material for this article may be found at http://jb.asm.org/.