Journal of Bacteriology, November 2007, p. 7808-7818, Vol. 189, No. 21
0021-9193/07/$08.00+0 doi:10.1128/JB.00796-07
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
Alexis Delétoile,1,
Martine Lefevre,1
Ivan Ciznar,3
Karel Krovacek,4
Patrick Grimont,1 and
Sylvain Brisse1*
Unité Biodiversité des Bactéries Pathogènes Emergentes, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France,1 Department of Crop Systems, Forestry and Environmental Sciences, University of Basilicata, Potenza, Italy,2 Slovak Medical University, Faculty of Public Health, 83303 Bratislava, Limbova 12, Slovak Republic,3 Department of Biomedical Sciences and Veterinary Public Health, Faculty of Veterinary Medicine and Animal Science, SLU, Box 7036, 750 07 Uppsala, Sweden4
Received 23 May 2007/Accepted 1 August 2007
The level of genetic diversity of P. shigelloides and its population structure are currently unknown. Whether clones with distinct ecological specialization or epidemiology exist is an important question for the control of P. shigelloides infections. Strain diversity in this species has been demonstrated by pulsed-field gel electrophoresis (49) and random amplified polymorphic DNA analysis (28). However, these methods are difficult to standardize and provide limited information on the phylogenetic relationships among strains. To our knowledge, no other molecular method has been applied to the characterization of P. shigelloides strains. A serotyping scheme based on the somatic (O) and flagellar (H) antigens has been available since 1996 and was last updated in 2000 (4, 5). This scheme describes about 100 O types and 50 H types, which occur in many combinations. A correlation between some serovars and sources of isolation has been suggested (27).
Bacterial species differ widely in the rate of homologous recombination (19, 25, 50, 57). This rate is particularly relevant for understanding strain evolution. For example, high rates of recombination break down the link between the genomic background of strains and surface antigens that are important for vaccine design (43) or strain typing (8). In addition, homologous recombination renders bacterial clones unstable over time (2, 25, 31), with important consequences for the interpretation of molecular markers as applied to epidemiological follow-up of strains. For all Enterobacteriaceae species for which the impact of recombination on strain evolution was estimated previously, a low or limited impact of recombination on clonal diversification and population structure was inferred (see Discussion). However, it is not known whether a low recombination rate is a general feature of the Enterobacteriaceae family.
The population structure of bacterial species and strain evolution are best studied using standardized methods based on nucleotide sequences (1, 39, 40, 51). Multilocus sequence typing (MLST) is now widely used to study strain evolution and for strain typing in many species (15, 54). Because MLST data are unambiguous and portable, results obtained in different laboratories can be compared directly, which is necessary to obtain a comprehensive overview of strain diversity and distribution. In addition, these methods are suitable for studying strain phylogeny and population structure (20, 39).
In this study, we used MLST based on five genes (fusA, leuS, pyrG, recG, and rpoB) to investigate the diversity of a collection of 77 P. shigelloides strains isolated from diverse sources and different countries. We found a high level of diversity and demonstrated that P. shigelloides strains undergo high levels of homologous recombination, which eliminates congruence between trees based on single genes and disrupts correspondence between serotypes and genomic background.
FIG. 1. Allelic profiles, STs, O and H serotypes, sources, and countries of isolation of the 77 strains of P. shigelloides. The dendrogram was constructed by the unweighted-pair group method using average linkage (UPGMA) from a matrix of pairwise distances between allelic profiles.
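The Fig. 1 dendrogram is an average-linkage (UPGMA) clustering of allelic profiles. A minimal sketch follows, assuming the pairwise distance is the proportion of the five loci carrying different allele numbers (the exact BioNumerics settings are not stated here). The function names and the four profiles are hypothetical, not strains from this study.

```python
from itertools import combinations


def mismatch_distance(p, q):
    """Proportion of loci at which two allelic profiles carry different alleles."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def upgma(profiles):
    """Average-linkage (UPGMA) clustering of named allelic profiles.

    Returns the tree as nested tuples and a list of merges
    (cluster_a, cluster_b, node_height), where height = distance / 2.
    """
    clusters = {name: (name, 1) for name in profiles}  # label -> (subtree, leaf count)
    dist = {
        tuple(sorted((a, b))): mismatch_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    merges = []
    step = 0
    while len(clusters) > 1:
        # Closest pair of clusters; ties broken by label order for reproducibility
        (a, b), d = min(dist.items(), key=lambda kv: (kv[1], kv[0]))
        del dist[(a, b)]
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        new = f"node{step}"
        step += 1
        # Average linkage: leaf-weighted mean of the distances to the two merged clusters
        for c in clusters:
            d_ac = dist.pop(tuple(sorted((a, c))))
            d_bc = dist.pop(tuple(sorted((b, c))))
            dist[tuple(sorted((new, c)))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
        merges.append((a, b, d / 2))
    tree, _ = next(iter(clusters.values()))
    return tree, merges


# Hypothetical five-locus profiles (allele numbers), not data from this study
profiles = {
    "ST1": (1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 2),
    "ST3": (2, 2, 2, 2, 2),
    "ST4": (2, 2, 2, 3, 2),
}
tree, merges = upgma(profiles)
print(tree)  # (('ST1', 'ST2'), ('ST3', 'ST4'))
```

Average linkage weights each original profile equally, so large groups of identical or near-identical STs do not dominate the merge distances.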
PCR primers. For four genes (fusA, leuS, pyrG, and recG), we designed primers suitable for amplification of P. shigelloides by reducing the degeneracy of universal oligonucleotides (46), using the sequenced genomes of four members of the Enterobacteriaceae (Escherichia coli, Salmonella enterica, Klebsiella pneumoniae, and Yersinia pestis). For the rpoB gene, we designed primers VIC4 and VIC6 (Table 1). The five selected genes belong to a set of evolutionarily conserved genes that are present as a single copy in 95% of sequenced genomes (46). Each primer consists of two regions: a 5' consensus region with no degeneracy and a 3' degenerate region. Primer characteristics are shown in Table 1.
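The two-region primer layout can be made concrete with IUPAC ambiguity codes: the degeneracy of a primer is the product of the number of bases each position allows. The sketch below is illustrative only; the primer sequence is hypothetical (the real primers are in Table 1), and splitting at the first ambiguity code is a simplifying assumption about where the 3' degenerate region starts.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer.upper())):
        yield "".join(combo)


def split_regions(primer):
    """Split at the first ambiguity code: (5' non-degenerate part, 3' degenerate part)."""
    for i, base in enumerate(primer.upper()):
        if base not in "ACGT":
            return primer[:i], primer[i:]
    return primer, ""


# Hypothetical primer for illustration only
primer = "TCGGCGATGGCAAYGGNAC"
print(split_regions(primer))  # ('TCGGCGATGGCAA', 'YGGNAC')
print(degeneracy(primer))     # 8
```

Reducing degeneracy, as done here for the universal primers, shrinks this product and therefore raises the effective concentration of each matching oligonucleotide in the reaction.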
TABLE 1. Primers and conditions used for PCR amplification of internal portions of five protein-encoding genes
Serotype characterization. Serotype characterization was performed at the National Reference Laboratory for Vibrionaceae Komárno and Faculty of Public Health, Bratislava, Slovak Republic, using standard protocols (4, 5) and specific Plesiomonas sera.
Data analysis. Chromatogram traces were edited and analyzed using BioNumerics v4.5 (Applied Maths, Sint-Martens-Latem, Belgium). Each base of the selected template region was confirmed by at least two chromatograms (forward and reverse); if a sequence contained ambiguities, sequencing was repeated. MLST data were analyzed by the standard MLST approach: for each gene, a distinct allele number was assigned to each allelic variant, and the sequence type (ST) of a strain was defined by the combination of the allele numbers at the five genes. A clonal complex (CC) was defined using eBURST (22) as a group of strains in which each strain shares at least four alleles (i.e., no more than one difference) with at least one other strain of the group. Minimum spanning tree, unweighted-pair group method using average linkage (UPGMA), and Pearson correlation analyses were performed using BioNumerics v4.5. Nucleotide diversity and amounts of polymorphism were calculated using DnaSP, version 4 (44).
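The allele-numbering, ST, and CC rules above can be sketched as follows. This is a minimal illustration, not the BioNumerics or eBURST implementation: the function names and toy sequences are hypothetical, and the CC step applies only the stated grouping rule (single linkage at four or more shared alleles), not eBURST's founder inference.

```python
from itertools import combinations

LOCI = ("fusA", "leuS", "pyrG", "recG", "rpoB")


def assign_profiles(strains):
    """Number alleles per locus and STs in order of first appearance.

    strains: dict strain -> dict locus -> sequence.
    Returns (profiles, st_of_strain); profiles maps ST number -> allele-number tuple.
    """
    allele_ids = {locus: {} for locus in LOCI}  # locus -> sequence -> allele number
    st_ids = {}                                 # allele tuple -> ST number
    st_of_strain = {}
    for strain, seqs in strains.items():
        profile = tuple(
            allele_ids[locus].setdefault(seqs[locus], len(allele_ids[locus]) + 1)
            for locus in LOCI
        )
        st_of_strain[strain] = st_ids.setdefault(profile, len(st_ids) + 1)
    profiles = {st: profile for profile, st in st_ids.items()}
    return profiles, st_of_strain


def clonal_complexes(profiles, min_shared=4):
    """Single-linkage groups of STs sharing >= min_shared alleles with another member."""
    parent = {st: st for st in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        shared = sum(x == y for x, y in zip(profiles[a], profiles[b]))
        if shared >= min_shared:
            parent[find(a)] = find(b)
    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    # Singleton STs are not clonal complexes
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy sequences (real MLST alleles are several hundred bp long)
raw = {
    "strain1": ("AAC", "GGT", "TTA", "CCG", "ATG"),
    "strain2": ("AAC", "GGT", "TTA", "CCG", "ATG"),  # same ST as strain1
    "strain3": ("AAC", "GGT", "TTA", "CCG", "ATA"),  # single-locus variant
    "strain4": ("AGC", "GCT", "TTA", "CAG", "ATA"),  # not a single-locus variant of any ST
}
strains = {name: dict(zip(LOCI, seqs)) for name, seqs in raw.items()}
profiles, st_of_strain = assign_profiles(strains)
print(st_of_strain)                # {'strain1': 1, 'strain2': 1, 'strain3': 2, 'strain4': 3}
print(clonal_complexes(profiles))  # [[1, 2]]
```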
To test for phylogenetic congruence among the genes, the sequences of the 64 distinct STs were used. Neighbor-joining trees were generated using PAUP* v4 (55) for each gene individually and for the concatenated sequence of the five genes. As described by Feil et al. (21), for each gene, the difference in log likelihood between the tree for that gene and the tree constructed from each of the other genes was computed using PAUP*, with branch lengths optimized. These differences were compared to those obtained for 100 randomly generated trees.
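Once PAUP* has produced the log likelihoods, the comparison itself is simple bookkeeping. The sketch below assumes one reading of the Feil et al. criterion: another gene's tree is counted as congruent only if its difference in log likelihood (delta lnL) is smaller than that of every random tree. The function name and all numbers are hypothetical.

```python
def congruence_test(lnl_own, lnl_other_trees, lnl_random_trees):
    """Compare delta lnL for other genes' trees with delta lnL for random trees.

    All values are log likelihoods of ONE gene's alignment evaluated on different
    topologies (branch lengths optimized), e.g. exported from PAUP*.
    Returns tree name -> (delta, fits_better_than_all_random_trees).
    """
    best_random_delta = min(lnl_own - x for x in lnl_random_trees)
    return {
        name: (lnl_own - lnl, lnl_own - lnl < best_random_delta)
        for name, lnl in lnl_other_trees.items()
    }


# Hypothetical log likelihoods of one gene's data (not values from this study)
result = congruence_test(
    lnl_own=-2500.0,
    lnl_other_trees={"leuS": -2620.0, "fusA": -2710.0},
    lnl_random_trees=[-2650.0, -2700.0, -2680.0],  # in practice, 100 random trees
)
print(result)  # {'leuS': (120.0, True), 'fusA': (210.0, False)}
```

A gene tree whose delta lnL falls inside the random-tree distribution carries no more information about another gene's history than an arbitrary topology, which is the pattern expected under frequent recombination.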
The relative contributions of recombination and mutation were calculated using the simplest implementation of the clonal diversification method (23, 29) in the MultiLocus Analyzer software, which was designed specifically for this purpose (S. Brisse, unpublished). Briefly, for all pairs of STs differing at only one or two loci, each pair of distinct alleles was inspected to determine the number of nucleotide differences. Allele pairs differing by a single nucleotide were considered to result from mutation, whereas those differing at more than one nucleotide were attributed to recombination. No correction was made for single nucleotide differences possibly introduced by recombination. For comparison, we used MLST data sets available for Neisseria (the first 200 profiles available at pubmlst.org, all except 4 corresponding to Neisseria meningitidis), Streptococcus pneumoniae (the first 200 profiles available at www.mlst.net), E. coli (the first 200 profiles from Mark Achtman's website, http://web.mpiib-berlin.mpg.de/mlst/), and Yersinia pseudotuberculosis (50 STs from Mark Achtman's website).
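The classification rule can be sketched in a few lines. This is not the MultiLocus Analyzer code: the function names are hypothetical, and counting every qualifying ST pair separately (rather than each distinct allele pair once) is an assumption. Each mutation changes one nucleotide, so the nucleotide ratio divides the recombined sites by the number of mutational events.

```python
from itertools import combinations


def nucleotide_differences(s, t):
    """Number of positions at which two aligned alleles differ."""
    return sum(x != y for x, y in zip(s, t))


def clonal_diversification(profiles, alleles, max_d=2):
    """Allelic and nucleotide recombination/mutation ratios (simplest rule).

    profiles: dict ST -> tuple of allele numbers, one per locus.
    alleles: list with one dict per locus, mapping allele number -> aligned sequence.
    Only ST pairs differing at 1..max_d loci are inspected.
    """
    mutations = recombinations = recombined_sites = 0
    for a, b in combinations(profiles, 2):
        differing = [i for i, (x, y) in enumerate(zip(profiles[a], profiles[b])) if x != y]
        if not 1 <= len(differing) <= max_d:
            continue
        for i in differing:
            n = nucleotide_differences(alleles[i][profiles[a][i]], alleles[i][profiles[b][i]])
            if n == 1:
                mutations += 1          # single nucleotide change: mutation
            else:
                recombinations += 1     # several changes at once: recombination
                recombined_sites += n
    if mutations == 0:
        raise ValueError("no mutational events; ratios are undefined")
    return recombinations / mutations, recombined_sites / mutations


# Hypothetical two-locus toy data
alleles = [
    {1: "AAAAAAAA", 2: "AAAAAAAT"},  # locus 1: alleles differ at 1 site
    {1: "CCCCCCCC", 2: "CCGGCCGC"},  # locus 2: alleles differ at 3 sites
]
profiles = {1: (1, 1), 2: (2, 1), 3: (1, 2)}
print(clonal_diversification(profiles, alleles))  # (1.0, 3.0)
```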
Nucleotide sequence accession numbers. The sequences of fusA, leuS, pyrG, and rpoB obtained for 31 Enterobacteriaceae species have been deposited in the GenBank/EMBL database under accession numbers EU010012 to EU010119. The sequences of the five MLST genes and corresponding STs are available at Institut Pasteur's MLST website, www.pasteur.fr/mlst.
The phylogenetic relationships of P. shigelloides with other members of the family Enterobacteriaceae have not been precisely defined. Based on 16S rRNA sequences, P. shigelloides was placed on a branch external to the other family members (41, 45), leading some authors to suggest that P. shigelloides could belong to a new taxonomic family (45). To address this question, the fusA, leuS, pyrG, and rpoB genes of 30 other species representing the major phylogenetic clades in the family Enterobacteriaceae (A. Deletoile, P. Grimont, and S. Brisse, unpublished) were amplified by PCR using our broad-range primers and sequenced. Phylogenetic analysis of the concatenated sequences (Fig. 2), using three other gammaproteobacterial species (Pasteurella multocida, Haemophilus influenzae, and Shewanella oneidensis) as outgroups, clearly showed that P. shigelloides constitutes a distinct branch nested deep within the Enterobacteriaceae tree. This result therefore firmly establishes the phylogenetic position of P. shigelloides within the family Enterobacteriaceae (11, 32).
FIG. 2. Unrooted neighbor-joining tree of 31 Enterobacteriaceae species (type strains were sequenced) and three other Gammaproteobacteria, constructed using the concatenated sequences of four loci (fusA, leuS, pyrG, and rpoB). Sequences of the following strains were retrieved from public databases: Y. pestis CO92 (accession no. NC_003143), P. multocida subsp. multocida Pm70 (NC_002663), S. oneidensis MR-1 (NC_004347), and H. influenzae Rd KW20 (NC_000907). All P. shigelloides strains were placed on the same branch as the type strain. Numbers at the nodes are bootstrap values (1,000 replicates); only values greater than 80% are shown.
TABLE 2. Polymorphism observed in five protein-encoding genes among 77 P. shigelloides strains
Allelic and genotypic variation in P. shigelloides. The number of distinct alleles per gene ranged from 8 (pyrG) to 57 (leuS). When the five genes were combined, 64 STs were distinguished (Fig. 1). Two STs contained three strains, and 11 STs contained two strains. The genotypic diversity index (Simpson's index) was 99.7%, which is among the highest values found so far in MLST analyses of bacterial species.
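A discriminatory index of this kind is commonly computed with the Hunter-Gaston form of Simpson's index, D = 1 - sum n_j(n_j - 1) / [N(N - 1)], where n_j is the number of strains in ST j and N the total number of strains. Whether exactly this form was used here is an assumption; the sketch below uses hypothetical labels rather than the study data.

```python
from collections import Counter


def simpson_diversity(types):
    """Simpson's index of diversity (Hunter-Gaston form) for a list of type labels."""
    counts = list(Counter(types).values())
    n = sum(counts)
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical ST assignments for four strains
print(round(simpson_diversity(["ST1", "ST1", "ST2", "ST3"]), 4))  # 0.8333
```

The index is the probability that two strains drawn at random without replacement belong to different STs, so it approaches 1 when most STs are represented by a single strain.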
Relationships based on allelic profiles can be more reliable than nucleotide-based phylogenies when homologous recombination occurs: import of a single divergent allele counts as only one difference between allelic profiles, whereas it can strongly affect the position of the recipient strain in a nucleotide-based phylogeny. A clonal complex (CC) can be defined as a group of profiles in which each profile differs at no more than one gene from at least one other profile of the group (22). STs that differ at only one of the five genes are very likely to share a recent common ancestor. eBURST analysis (not shown) and minimum spanning tree analysis (Fig. 3) of allelic profiles revealed only two CCs, each with only two STs. CC1 (ST1 and ST2) and CC3 (ST3 and ST4) comprised four and three strains, respectively. Each of these two CCs formed a single branch in the nucleotide-based phylogeny (see Fig. S1 in the supplemental material). All other STs differed at two or more of the five genes from every other ST in our strain collection; the remaining relationships shown in Fig. 3 should therefore not be considered reliable, as many alternative branchings exist (not shown).
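A minimum spanning tree on allelic mismatch counts can be sketched with Prim's algorithm. This plain version breaks ties by ST number only; BioNumerics applies its own priority rules when several links have equal weight, which is exactly where the alternative branchings mentioned above arise. Function names and profiles are hypothetical.

```python
def allelic_mismatches(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def minimum_spanning_tree(profiles):
    """Prim's algorithm on allelic mismatch counts.

    Returns edges as (st_a, st_b, mismatches); ties are broken by ST order.
    """
    sts = sorted(profiles)
    in_tree = {sts[0]}
    edges = []
    while len(in_tree) < len(sts):
        d, a, b = min(
            (allelic_mismatches(profiles[x], profiles[y]), x, y)
            for x in sorted(in_tree)
            for y in sts
            if y not in in_tree
        )
        in_tree.add(b)
        edges.append((a, b, d))
    return edges


# Hypothetical five-locus profiles: two single-locus-variant pairs
profiles = {1: (1, 1, 1, 1, 1), 2: (1, 1, 1, 1, 2), 3: (2, 2, 2, 2, 2), 4: (2, 2, 2, 3, 2)}
print(minimum_spanning_tree(profiles))  # [(1, 2, 1), (2, 3, 4), (3, 4, 1)]
```

Only the links of weight 1 correspond to CC membership; the weight-4 link joining the two pairs could equally have been drawn between other STs at the same distance.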
FIG. 3. Minimum spanning tree analysis of the 77 strains of P. shigelloides based on the number of allelic mismatches among MLST profiles. The colors of the circles represent the serotypes of the strains. (A) O antigen. (B) H antigen. Each circle corresponds to one ST; the size of the circle indicates the number of strains in the ST. The numbers on the lines between circles are the numbers of allelic differences between the corresponding profiles.
In some cases, there was concordance between MLST and serotyping (Fig. 3). Of the 11 pairs of strains with the same ST, 5 were of the same serotype. CC3 (which includes ST3 and ST4) included three strains with the same serotype (O66:H3).
However, in many instances, MLST and serotyping data did not fully coincide. First, serotyping subdivided pairs or triplets of strains within a particular ST. Five of the 11 STs containing pairs of strains were heterogeneous with regard to serotype, and one ST contained strains with the same H antigen but distinct O antigens. ST1 included strains with three distinct serotypes (O40:H6, O66:H2, and O90,17:H6). Second, MLST subdivided groups of strains with the same serotype (O12:H4, O2:H1, O22:H3, O23:H1, O35:H11, O52:H3, O60:H2, O66:H3, and O90,17:H6), and the distinct STs of strains of a given serotype were not closely related. Overall, the adjusted Rand coefficients between ST and O antigen, H antigen, and complete serotype (both antigens) were 0.17, 0.06, and 0.26, respectively (12).
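The adjusted Rand coefficient measures agreement between two partitions of the same strains, corrected for the agreement expected by chance (1 for identical partitions, about 0 for random ones). A minimal sketch of the standard Hubert-Arabie formula follows; the function name and the four-strain assignments are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand index (Hubert-Arabie) between two partitions of the same strains."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions all singletons
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical ST and serotype assignments for four strains
st = ["ST1", "ST1", "ST2", "ST3"]
serotype = ["O66:H3", "O66:H3", "O12:H4", "O12:H4"]
print(round(adjusted_rand(st, serotype), 3))  # 0.571
```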
Pairs or triplets of strains sharing the same ST were always isolated in the same country; hence, no international ST was identified. In addition, both CCs were found only in a single country, Finland.
Strains sharing the same ST did not always come from the same host. For example, ST1 strains were isolated from a fox and two humans, ST3 strains were isolated from two distinct bird species, and ST36 strains were isolated from a cat and a human. Some pairs of environmental strains also had the same ST (Fig. 1). CC1 and CC3 were not homogeneous with respect to host (humans and a fox for CC1 and three distinct host species for CC3). MLST thus deserves further evaluation, using epidemiologically well-defined samples, to determine its capacity to define the routes and modes of contamination with P. shigelloides strains.
Evidence for homologous recombination. Over time, homologous recombination progressively erodes the phylogenetic signal carried by gene sequences (18). As a consequence, phylogenies of distantly related strains derived from sequences of independent genes may not be congruent. Congruence can be conveniently tested by computing the likelihood of one gene's sequence data on the phylogeny reconstructed from another gene (21). We assessed the congruence of each of the five genes with each of the others. Remarkably, the congruence between trees reconstructed using the five genes was no better than that between each tree and random trees, with only one exception: the likelihood of the recG data on the leuS tree was slightly higher than that on all random trees (data not shown but available upon request). These results are similar to those obtained for N. meningitidis and S. pneumoniae (21), two species with high rates of homologous recombination. Our data show that the five genes have distinct evolutionary histories in P. shigelloides and indicate that in this species homologous recombination is frequent enough to eliminate the phylogenetic signal among distantly related strains.
Short-term impact of homologous recombination. The clonal diversification method provides a quantitative estimate of the contribution of recombination relative to mutation in the generation of genotypic diversity (23, 29). For each pair of closely related allelic profiles, the number of nucleotide differences in the alleles that differ is counted. In the simplest implementation, a single nucleotide difference is considered likely to be caused by mutation, whereas more than one difference in the same portion of a gene is considered to result from recombination, as it is unlikely that two mutations would occur in the same gene while the other genes remain identical. We applied this method to all evolutionary links between profiles sharing at least three of five alleles (i.e., a number of allelic mismatches [D] of up to 2, corresponding to a maximal profile distance of 2/5 [40%]). The results (Table 3) indicate that P. shigelloides alleles are seven times more likely to be changed by recombination than by mutation. In addition, nucleotides are 77 times more likely to be changed by recombination than by mutation, as recombination events introduce 11 nucleotide changes at once on average. These allelic and nucleotide recombination/mutation ratios are comparable to those found in S. pneumoniae at a similar maximal profile distance of 43% (three of seven loci distinct; allelic and nucleotide ratios of 7 and 66, respectively) and lower than those in N. meningitidis (11 and 196, respectively) (Table 3). Remarkably, the recombination rate in P. shigelloides is much higher than that observed in the other Enterobacteriaceae species (E. coli and Y. pseudotuberculosis) for which data sets are publicly available (Table 3). When only single-locus variants (D = 1) were used, the nucleotide ratios were, as expected, lower (allelic and nucleotide ratios of 7.5 and 49 for S. pneumoniae and 5.6 and 108 for N. meningitidis), consistent with previous reports (23, 24). In contrast, the nucleotide ratio for E. coli, previously estimated to be around 50 (29), was estimated here to be around 5. The difference may be explained by the small number of values used to obtain the previous estimate (only 12 strains were examined) (29).
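The link between the two ratios can be written out explicitly. In the notation below (introduced here for illustration, not taken from the original analysis), R is the number of recombination events, M the number of mutation events, n_k the number of nucleotides changed by recombination event k, and n̄ their mean. Because each mutational event changes a single nucleotide, the nucleotide ratio is the allelic ratio scaled by n̄; with the rounded values reported above:

```latex
\[
\left(\frac{r}{m}\right)_{\mathrm{nt}}
  = \frac{\sum_{k=1}^{R} n_k}{M}
  = \frac{R}{M}\,\bar{n}
  = \left(\frac{r}{m}\right)_{\mathrm{allele}} \bar{n}
  \approx 7 \times 11 = 77
\]
```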
TABLE 3. Recombination/mutation ratios for allelic and nucleotide replacements
FIG. 4. Relationship between allelic profile distance (x axis) and average number of nucleotide differences (y axis) in the distinct alleles. The error bars indicate the standard deviations for N. meningitidis and P. shigelloides. A positive trend was observed for S. aureus, E. coli, and Y. pseudotuberculosis, three species that are generally considered clonal. In contrast, P. shigelloides did not show a positive trend, similar to the recombining bacteria N. meningitidis and S. pneumoniae.
We developed an MLST scheme based on five genes for evolutionary analysis and strain typing in P. shigelloides. Although most MLST schemes are based on seven genes, there is no intrinsic reason to use this number of loci, particularly if fewer loci provide sufficient discrimination. Allelic diversity at five genes was sufficient to discriminate most P. shigelloides strains, and the proposed MLST scheme is one of the most discriminatory among those currently described (www.mlst.net; pubmlst.org; www.pasteur.fr/mlst; web.mpiib-berlin.mpg.de/mlst/). Among the advantages of MLST for strain typing are its accuracy and reproducibility. In addition, for P. shigelloides, its capacity to discriminate strains appears to be better than that of serotyping, and all strains could be analyzed, whereas a number of strains were untypeable by serotyping. We propose that MLST could become the reference method for strain tracking and global epidemiology of this pathogen. To this end, a publicly available MLST website was constructed (www.pasteur.fr/mlst).
A high level of nucleotide diversity was found, even though the diversity of the sample of P. shigelloides strains is probably lower than that of the global population of the species. The allelic divergence at several loci was very high, and the nucleotide diversity values were as high as those observed for a global collection of E. coli and were much higher than those for, e.g., S. aureus (21).
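Nucleotide diversity (Nei's pi) is the average proportion of differing sites over all pairs of aligned sequences. The sketch below assumes gap-free sequences of equal length; DnaSP additionally handles gaps and missing data, and whether pi is computed over strains or over distinct alleles is a choice the text does not specify. The function name and sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nei's pi: mean pairwise proportion of differing sites among aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical aligned alleles
print(round(nucleotide_diversity(["AAAA", "AAAT", "AATT"]), 4))  # 0.3333
```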
The observed level of genotypic diversity, based on allelic profiles, was very high. First, most genotypes were rare: 49 of 77 strains had unique STs, and no ST comprised more than three strains, resulting in a genotypic diversity index of 99.7%. Second, all but two pairs of genotypes differed by at least two alleles. Therefore, many evolutionarily intermediate genotypes are not represented, and future studies using larger strain samples can be expected to reveal a very high number of novel genotypes.
One possible factor contributing to the high level of genotypic diversity is rapid diversification due to high recombination rates: with the same mutation rate, genotypes diversify much more quickly in a species with a high recombination rate than in a clonal species. For example, several thousand genotypes have already been described for Neisseria, a genus with a high recombination rate (pubmlst.org). We estimated that, due to recombination, nucleotides in P. shigelloides evolve approximately 80 times more quickly than they would if the evolutionary process were exclusively clonal. Because the number of comparisons of single-locus variants (n = 2) was insufficient for P. shigelloides, we used comparisons of profiles differing at either one or two loci, in contrast to previous estimates (19, 23, 24) based on single-locus variants only. The deduced rate of recombination was therefore expected to be higher, as double-locus variants are less closely related than single-locus variants, which increases the probability of two or more mutations in the same portion of a gene. Moreover, because we used only five genes instead of the seven used for the comparative species, the allelic distance of double-locus variants is 40%, whereas it is only 29% for profiles defined by seven genes. In turn, triple-locus variants in seven-gene profiles correspond to 43% divergence. Therefore, the recombination ratio calculated using a D value of 3 may provide a more appropriate comparison. Based on this ratio, P. shigelloides shows a much higher rate of recombination than other Enterobacteriaceae species and is comparable to S. pneumoniae in both its allelic and nucleotide ratios, while N. meningitidis shows a 2.5-fold-higher nucleotide ratio. When only single-locus variants were used, we obtained values similar to those previously reported for N. meningitidis (23) and S. pneumoniae (24).
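The percentages behind the choice of comparison are simply D divided by the number of loci. The short sketch below (hypothetical function name) tabulates them for five- and seven-locus schemes and recomputes the N. meningitidis/P. shigelloides nucleotide ratio from the values reported in the Results.

```python
def allelic_distance(mismatches, n_loci):
    """Proportion of loci that differ between two allelic profiles."""
    return mismatches / n_loci


for n_loci in (5, 7):
    for d in (1, 2, 3):
        print(f"{n_loci} loci, D={d}: {allelic_distance(d, n_loci):.0%}")

# Nucleotide r/m ratios at comparable distances: N. meningitidis 196, P. shigelloides 77
print(f"fold difference: {196 / 77:.1f}")  # fold difference: 2.5
```

With five loci, D = 2 (40%) sits closest to D = 3 with seven loci (43%), not to D = 2 with seven loci (29%).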
Bacterial species vary widely in their rates of homologous recombination (25, 50), ranging from very clonal species, such as Staphylococcus aureus (19), to species with high rates of recombination, such as Helicobacter pylori (53). Pathogens belonging to the Enterobacteriaceae family for which data are available (E. coli, K. pneumoniae, S. enterica, and Y. pseudotuberculosis) have not been shown to have high rates of recombination. S. enterica and Y. pseudotuberculosis have been considered highly clonal based on multilocus enzyme electrophoresis data (7, 16, 47), whereas E. coli shows intermediate levels of homologous recombination among housekeeping genes (48, 56, 57). Publicly available Klebsiella sequence data sets (pubmlst.org/kpneumoniae; S. Brisse, unpublished) show very low levels of recombination. Our clonal diversification analysis shows that recombination has an impact on clonal diversification in these species (except Klebsiella), but this impact is relatively limited compared to that in P. shigelloides. Therefore, P. shigelloides stands out as the member of the Enterobacteriaceae described so far that recombines most. The biological properties of P. shigelloides that could explain the greater impact of homologous recombination are currently unknown. Recombination could be favored by ecological factors, such as the free-living capacity of P. shigelloides, which may favor the colocalization of donor and recipient strains, or of transducing phages and bacteria, for DNA exchange. Alternatively, colonization of unidentified hosts could result in the high density of bacteria, phages, and/or DNA necessary for horizontal transfer. Other species with high rates of recombination, such as N. meningitidis and S. pneumoniae, are naturally transformable. It is not known whether P. shigelloides is naturally transformable; if it is, transformation would likely contribute to its high recombination rate.
However, natural transformation in itself is not enough to explain high rates of recombination, as some naturally transformable species, such as H. influenzae, do not exhibit a highly recombinant population structure (21).
High rates of recombination increase the speed of genome diversification and hence affect the interpretation of genomic differences observed among strains. Our results show that there is no strong correspondence between serotype and ST. This result could have several explanations, such as cross-reactivity of unrelated antigens, parallel evolution of antigenic structures, or distinct rates of evolution. However, because recombination reduces the linkage between a given genomic background and individual genes, horizontal transfer of the genetic determinants of the antigens (O-antigen gene cluster or flagellin genes) between distinct genetic backgrounds probably contributes to the weak correspondence between ST and serotype. In addition, rapid evolution of these genetic determinants by homologous recombination can also blur the concordance between the two kinds of markers, as classically described for S. pneumoniae (14, 17, 52). Because housekeeping genes are unlikely to be positively selected for variation, detection of recombination in these genes indicates that recombination is relatively frequent in the population (25). In contrast, surface-exposed antigens are likely to be under diversifying selection, and hence variants created by recombination may be more frequently retained in the population.
We found no meaningful intraspecies structure within P. shigelloides. Given the homogenizing effect of recombination, this result is consistent with our high recombination rate estimates. Horizontal transfer of genes may also result in the lack of association of genotypes or CCs with particular hosts or sources, as host adaptation factors could be transmitted frequently between genomic backgrounds. However, in N. meningitidis, some STs or CCs are known to be associated with specific epidemic or virulence properties (13, 37), with important consequences for infection control and vaccination strategies. The MLST system described here should be useful for future investigations of these important public health-related questions in P. shigelloides.
Financial support was provided by Institut Pasteur and by a generous gift from the Conny-Maeva Charitable Foundation.
Published ahead of print on 10 August 2007.
Supplemental material for this article may be found at http://jb.asm.org/.
A.S. and A.D. contributed equally to this work.
Present address: Department of Environmental Sciences, Parthenope University of Naples, Via Acton 38, 80122 Naples, Italy.