Journal of Bacteriology, November 2008, p. 7060-7067, Vol. 190, No. 21
0021-9193/08/$08.00+0 doi:10.1128/JB.01552-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
J. Doyle,1,
P. I. Fields,2
R. V. Tauxe,1,2 and
J. M. Logsdon Jr.1,4*
Program in Population Biology, Ecology and Evolution, Emory University, Atlanta, Georgia 30322,1 Division of Foodborne, Bacterial and Mycotic Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia 30333,2 Laboratorio Nacional de Referencia de Salmonella y Shigella, Centro Nacional de Microbiologia, Instituto de Salud Carlos III, 28220 Majadahonda, Madrid, Spain,3 Department of Biology, Roy J. Carver Center for Comparative Genomics, University of Iowa, Iowa City, Iowa 52242,4
Received 26 September 2007/ Accepted 15 August 2008
The taxonomic classification and nomenclature of the Salmonella have been controversial for decades. Some clarity was provided by Judicial Opinion 80 of the Judicial Commission of the International Committee on Systematics of Prokaryotes (12). The classification used in this study follows those proposed by Le Minor and Popoff (16), Reeves et al. (27), and Tindall et al. (36), the last of which combines the two former classifications. The nomenclature reflects the differentiation of the Salmonella subspecies on the basis of phenotypic traits, such as carbon source utilization, and has been validated to a considerable extent by DNA-DNA hybridization (5). Subspecies are determined by the presence or absence of 11 biochemical traits (18). Currently, the Salmonella are divided into two species, Salmonella enterica and Salmonella bongori (8, 12, 28). Salmonella enterica is further divided into six subspecies, which Tindall et al. (36) categorized as follows: Salmonella enterica subsp. enterica (subsp. I), Salmonella enterica subsp. salamae (subsp. II), Salmonella enterica subsp. arizonae (subsp. IIIa), Salmonella enterica subsp. diarizonae (subsp. IIIb), Salmonella enterica subsp. houtenae (subsp. IV), and Salmonella enterica subsp. indica (subsp. VI). A further group, subsp. VII, was described by Boyd et al. (2) on the basis of multilocus enzyme electrophoresis (MLEE) data; however, this subspecies cannot be identified by unique biochemical properties. The group originally identified as subsp. V (Salmonella subsp. bongori) is now recognized as the separate species Salmonella bongori (27). In this study, we designate the S. enterica subspecies with Roman numerals (i.e., I to VII).
In addition to the taxonomic classification into subspecies, the salmonellae are further subdivided by serotype, a subtyping method based on two surface structures: the O antigen of the lipopolysaccharide and the flagellar, or H, antigen. This method has been invaluable for understanding the epidemiology of Salmonella. The combination of subspecies, 46 O groups, and 114 H antigens accounts for all recognized serotypes of Salmonella (23, 24).
The most frequently encountered subspecies is Salmonella enterica subsp. I. Found primarily in mammals, this subspecies is the most common cause of human disease (4). The other six subspecies of Salmonella enterica, as well as Salmonella bongori, are found primarily in nonhuman hosts and only occasionally cause disease in humans. Of the 2,541 recognized serotypes, 1,504 belong to Salmonella enterica subsp. I (24). Approximately 1% of human infections each year are due to subspecies other than subsp. I; in 2005, a total of 36,183 Salmonella isolates were reported to the national Salmonella surveillance system (4).
Many salmonellae, but not all, express two independent yet coordinately regulated flagellin loci (fliC and fljB) with distinctive protein and antigenic structures. This expression of two separate antigens is unique to Salmonella and was recognized before the nature of flagella was known (13) and described as "phases." Thus, salmonellae possessing the capacity to express two antigens are termed "diphasic" and capable of "phase variation" with respect to their flagellar antigen. The expression of these two loci is regulated by a switch mechanism (hin) so that only one variety of flagellin protein is expressed at a time (33). This diphasic characteristic, however, is limited to four of the Salmonella enterica subspecies (I, II, IIIb, and VI), whereas subspecies IIIa, IV, and VII, as well as Salmonella bongori, have only one flagellin locus (fliC) and are considered "monophasic." Specific serotypes within the diphasic subspecies can also be monophasic.
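The switching logic described above can be summarized as a small state model. This is a toy sketch, not a mechanistic model: it assumes the arrangement commonly described for serotype Typhimurium, in which Hin inverts a DNA segment carrying the promoter of the fljBA operon, and in which FljA, co-expressed with FljB, represses fliC. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlagellinLoci:
    """Toy model of flagellar phase expression (hypothetical names)."""

    has_fljB: bool                    # diphasic strains carry the second locus (fljBA, with hin)
    fljBA_promoter_on: bool = False   # orientation of the Hin-invertible segment

    def invert(self) -> None:
        """Hin-mediated inversion flips the segment; no effect in monophasic strains."""
        if self.has_fljB:
            self.fljBA_promoter_on = not self.fljBA_promoter_on

    def expressed_flagellin(self) -> str:
        """Return the single flagellin expressed in the current state."""
        if self.has_fljB and self.fljBA_promoter_on:
            # fljB is transcribed together with fljA, whose product represses fliC
            return "fljB"
        return "fliC"


diphasic = FlagellinLoci(has_fljB=True)
monophasic = FlagellinLoci(has_fljB=False)

phases = []
for _ in range(3):
    phases.append(diphasic.expressed_flagellin())
    diphasic.invert()
print(phases)                             # ['fliC', 'fljB', 'fliC']

monophasic.invert()
print(monophasic.expressed_flagellin())   # fliC
```

Only one flagellin is ever returned for a given state, which mirrors the point that a diphasic cell expresses one flagellin protein at a time.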
A variety of methods have been used in previous studies to examine the phylogenetic history of Salmonella. In 1973, Crosa et al. (5) used DNA-DNA hybridization to define the species and subspecies of Salmonella and to differentiate them from other members of the Enterobacteriaceae. Two more recent studies have helped clarify the phylogeny of the Salmonella subspecies. Boyd et al. (2) defined the relationships of Salmonella on the basis of MLEE and of DNA sequence analysis of housekeeping and invasion genes, and Porwollik et al. (26) used microarray analysis of gene presence or absence to compare Salmonella subspecies and serotypes. These studies, summarized in Fig. 1a to d, reached similar conclusions, with some notable exceptions. The MLEE-based phylogeny (Fig. 1a) conflicted at many points with the DNA sequence-based phylogenies (Fig. 1b and d) and with the more recent microarray-based phylogeny (Fig. 1c). The MLEE data grouped the diphasic (i.e., containing two flagellin loci) subsp. II with subsp. IV and VII, both of which are monophasic, and grouped the monophasic subsp. IIIa with subsp. I, IIIb, and VI, which are all generally diphasic. If correct, this topology would require multiple acquisitions or losses of the second flagellin locus. The sequence-based trees for the invasion genes (Fig. 1d) and the housekeeping genes (Fig. 1b) both separated the monophasic from the diphasic subspecies, although their topologies differ slightly. The microarray-based tree of Porwollik et al. (26) agrees closely with the housekeeping gene tree; the most prominent difference is the placement of subsp. IIIa close to the diphasic subspecies.
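The argument about multiple acquisitions or losses can be made concrete with Fitch parsimony, which counts the minimum number of state changes of a character on a fixed tree. The sketch below scores the monophasic (M) versus diphasic (D) character on two simplified, hypothetical topologies: one loosely following the MLEE grouping described above and one in which the diphasic subspecies form a single clade. The branching orders, the use of S. bongori as outgroup, and the assignment of one state per subspecies (even though some serotypes in diphasic subspecies are monophasic) are assumptions for illustration, not the published trees.

```python
from typing import Dict, Set, Tuple, Union

# Nested 2-tuples encode a rooted binary tree; strings are tips.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Flagellar phase character, generalized per subspecies: M = monophasic, D = diphasic
STATES: Dict[str, str] = {
    "bongori": "M", "IIIa": "M", "IV": "M", "VII": "M",
    "IIIb": "D", "II": "D", "I": "D", "VI": "D",
}


def fitch(tree: Tree) -> Tuple[Set[str], int]:
    """Fitch parsimony: return (state set at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return {STATES[tree]}, 0
    left, right = tree
    s1, c1 = fitch(left)
    s2, c2 = fitch(right)
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # disjoint state sets cost one change


# Hypothetical simplification of the MLEE grouping: IIIa with I/IIIb/VI, II with IV/VII
mlee_like = ("bongori", (("IIIa", ("IIIb", ("I", "VI"))), ("II", ("IV", "VII"))))
# Hypothetical tree in which the diphasic subspecies form one clade
sequence_like = ("bongori", ("IIIa", (("IV", "VII"), ("IIIb", ("II", ("I", "VI"))))))

print(fitch(mlee_like)[1])      # 2
print(fitch(sequence_like)[1])  # 1
```

A single change suffices when the diphasic subspecies are monophyletic, whereas any tree that interleaves them with monophasic lineages needs at least two.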
FIG. 1. Summary of previous phylogenetic studies. (a) Tree based on MLEE data from Boyd et al. (2). (b) Tree based on housekeeping gene sequence from Boyd et al. (2). (c) Tree based on gene acquisition data from microarray analysis from Porwollik et al. (26). (d) Tree based on invasion gene sequence from Boyd et al. (2).
The goal of this study was to clearly define the species and subspecies phylogeny of Salmonella on the basis of DNA sequence analysis. The inconsistent topologies among previous studies may reflect the small number of taxa representing each subspecies. To reduce the risk of an incorrect phylogeny, we increased the number of Salmonella isolates studied from the 16 to 20 used previously to 69; with these data, we were able to resolve the relationships among the subspecies with respect to the division between the monophasic and diphasic subspecies. We selected four genes for analysis (recA, mdh, phoP, and gapA) on the basis of their distribution across the genome, their conservation across the salmonellae, and their analysis in previous studies. We present here a robust phylogeny of the salmonellae based on these four genes (3,459 bp per isolate) from 67 isolates of common serotypes and two published genomes. These new data provide a more comprehensive representation of the species and subspecies of Salmonella. We also determined the genetic distances between the subspecies of Salmonella and found new evidence of lateral transfer of the recA gene between two subspecies. This phylogeny provides a template, or "backbone," onto which questions of gene and genome evolution in Salmonella can be mapped.
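For the concatenated analyses, the four gene sequences of each isolate are joined into a single record (3,459 bp per isolate in this study). A minimal sketch of that step, assuming per-gene alignments stored as dictionaries keyed by isolate name (a hypothetical layout, not the study's file format); the toy sequences are invented and far shorter than the real genes.

```python
from typing import Dict, List


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments (isolate -> aligned sequence) into one supermatrix.

    Every gene must contain the same isolates, and all sequences within a gene
    must share one aligned length, so that columns stay homologous.
    """
    if not alignments:
        return {}
    isolates = set(alignments[0])
    for gene in alignments:
        if set(gene) != isolates:
            raise ValueError("genes do not share the same isolate set")
        if len({len(seq) for seq in gene.values()}) != 1:
            raise ValueError("sequences within a gene alignment differ in length")
    # Genes are appended in the order given, e.g. recA, mdh, phoP, gapA
    return {iso: "".join(gene[iso] for gene in alignments) for iso in sorted(isolates)}


# Toy alignments for two hypothetical isolates and two genes
gene_a = {"iso1": "ATGGCT", "iso2": "ATGGCC"}
gene_b = {"iso1": "TTAC", "iso2": "TTGC"}
matrix = concatenate([gene_a, gene_b])
print(matrix)  # {'iso1': 'ATGGCTTTAC', 'iso2': 'ATGGCCTTGC'}
```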
PCR amplification and sequencing. All genomic DNA was prepared with the DNeasy kit (Qiagen, Valencia, CA). Amplification and sequencing primers used in this study are listed in Table S2 in the supplemental material; mdh primers are from Boyd et al. (2). PCR amplification was performed with Ready-To-Go PCR beads (GE Biosciences, Piscataway, NJ) according to the manufacturer's specifications. Each reaction contained one Ready-To-Go PCR bead, 1 µl of pooled forward and reverse primers for the respective gene (0.5 µM each), and 1 µl of genomic DNA at 100 ng/µl (100 ng of DNA in total). PCR conditions for the individual genes were as follows: recA, 96°C for 2 min, followed by 35 cycles of 96°C for 30 s, 53°C for 30 s, and 72°C for 2 min; mdh, 96°C for 2 min, followed by 35 cycles of 96°C for 30 s, 54°C for 30 s, and 72°C for 1 min; phoP, 96°C for 2 min, followed by 35 cycles of 96°C for 30 s, 53°C for 20 s, and 72°C for 1 min; and gapA, 96°C for 2 min, followed by 35 cycles of 96°C for 30 s, 55°C for 30 s, and 72°C for 1 min. PCR products were purified for sequencing with the QIAquick PCR cleanup kit (Qiagen, Valencia, CA).
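The cycling conditions above can be captured as data, which makes the four programs easy to compare side by side. This sketch encodes only what the text states (no final extension step is listed), uses a hypothetical structure rather than any thermocycler's file format, and reports nominal hold time only, ignoring ramp rates.

```python
from typing import Dict, List, Tuple

# (initial denaturation in s, number of cycles, [(temperature in C, hold in s), ...])
Program = Tuple[int, int, List[Tuple[int, int]]]

PROGRAMS: Dict[str, Program] = {
    "recA": (120, 35, [(96, 30), (53, 30), (72, 120)]),
    "mdh":  (120, 35, [(96, 30), (54, 30), (72, 60)]),
    "phoP": (120, 35, [(96, 30), (53, 20), (72, 60)]),
    "gapA": (120, 35, [(96, 30), (55, 30), (72, 60)]),
}


def hold_time_minutes(program: Program) -> float:
    """Sum of programmed hold times in minutes; temperature ramping is ignored."""
    initial, cycles, steps = program
    return (initial + cycles * sum(seconds for _, seconds in steps)) / 60


for gene, program in PROGRAMS.items():
    annealing_temp = program[2][1][0]
    print(f"{gene}: anneal {annealing_temp} C, {hold_time_minutes(program):.1f} min")
```

The longer 2-min extension for recA is the main reason its program runs longest.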
Sequencing. All sequencing was performed directly on PCR products with either the CEQ 8000 genetic analysis system (Beckman Corp., Fullerton, CA) or the ABI 3700 genetic analyzer (Applied Biosystems, Foster City, CA), following the manufacturer's methods and reagents for each system.
Analysis of DNA sequences. DNA sequences were confirmed in both directions with fourfold coverage. Sequences were edited with Lasergene 5.0 (DNAStar, Madison, WI) and aligned with ClustalX (35). Subspecies distance matrices were calculated in MEGA 3.1 (15) and exported into Microsoft Excel. Synonymous substitution distances for the concatenated alignment of the four housekeeping genes were determined in MEGA 3.1, and these distances were used to generate a neighbor-joining tree, also in MEGA 3.1.
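A subspecies distance matrix of this kind averages pairwise distances over isolates. MEGA's synonymous, codon-based calculation is more involved; as a simpler stand-in, the sketch below uses the nucleotide p-distance with the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), and averages it over all isolate pairs drawn from two subspecies. The sequences and group labels are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List


def jc_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance between two aligned sequences (gapped sites skipped)."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    p = sum(x != y for x, y in sites) / len(sites)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def mean_between(groups: Dict[str, List[str]], g1: str, g2: str) -> float:
    """Average JC distance over every sequence pair drawn from two different groups."""
    pairs = list(product(groups[g1], groups[g2]))
    return sum(jc_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignments for two subspecies
groups = {
    "I": ["ACGTACGTAC", "ACGTACGTAA"],
    "II": ["ACGAACGTTC", "ACGAACGTTA"],
}
print(round(mean_between(groups, "I", "II"), 4))
```

Filling a matrix with `mean_between` for every pair of subspecies gives the kind of input a neighbor-joining step needs.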
Sequence alignment editing and branch swapping for the topology tests were performed in MacClade (17). Phylogenetic analysis was performed with MEGA (15) and MrBayes (11), and the resulting tree files were viewed and edited in Treeview (21). Polymorphism, substitution, and G+C content calculations were performed in DnaSP (30) (see Table S3 in the supplemental material).
LGT. We assessed the possibility of lateral gene transfer (LGT) events in the individual recA and mdh trees. Phylogenetic trees were generated from the nucleotide and amino acid sequences of these genes by maximum likelihood and distance methods. Topology tests were applied in Tree-Puzzle (31) to four trees for each gene: (i) the consensus tree, (ii) the best tree generated by Tree-Puzzle, (iii) a tree in which the branch hypothesized to have undergone LGT was moved to its position in the four-gene concatenated tree (swap tree), and (iv) a negative control tree in which the Salmonella bongori branch was moved to the subsp. I node (an implausible topology). All statistical comparisons were made against the best tree. If LGT has occurred, the swap tree is expected to be rejected by all tests, as is the negative control, whereas the best and consensus trees serve as positive controls and should not be rejected by any test.
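The expectations stated above amount to a small decision rule. The sketch below encodes that rule over per-test rejection outcomes; the test labels and the outcome patterns are hypothetical, not the study's results. Treating a mixed outcome for the swap tree (rejected by some tests only) as not supporting LGT is a conservative choice made for this sketch.

```python
from typing import Dict

TESTS = ("KH one-sided", "SH", "ELW", "KH two-sided")


def interpret(rejected: Dict[str, Dict[str, bool]]) -> str:
    """Interpret branch-swapping topology tests.

    rejected[tree][test] is True when that test rejects the tree relative to the
    best tree. Expected tree keys: "consensus", "best", "swap", "control".
    """
    def rejected_by_all(tree: str) -> bool:
        return all(rejected[tree][t] for t in TESTS)

    def rejected_by_none(tree: str) -> bool:
        return not any(rejected[tree][t] for t in TESTS)

    # Positive controls must pass every test; the negative control must fail every test
    if not (rejected_by_none("best") and rejected_by_none("consensus")
            and rejected_by_all("control")):
        return "controls failed; results not interpretable"
    if rejected_by_all("swap"):
        return "LGT supported"
    return "LGT not supported"


def outcome(flag: bool) -> Dict[str, bool]:
    """Same rejection outcome for every test (helper for the hypothetical examples)."""
    return {t: flag for t in TESTS}


# Hypothetical outcome patterns, not values from this study
pattern_a = {"best": outcome(False), "consensus": outcome(False),
             "swap": outcome(True), "control": outcome(True)}
pattern_b = {**pattern_a, "swap": outcome(False)}

print(interpret(pattern_a))  # LGT supported
print(interpret(pattern_b))  # LGT not supported
```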
The following tests were run on these trees and statistically evaluated: (i) a one-sided Kishino-Hasegawa test based on pairwise Shimodaira-Hasegawa tests (10, 14), (ii) a Shimodaira-Hasegawa test (32), (iii) an expected likelihood weight test (34), and (iv) a two-sided Kishino-Hasegawa test (14).
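To show what the two-sided Kishino-Hasegawa test computes, the sketch below compares two trees from their per-site log-likelihoods: the variance of the total log-likelihood difference is estimated from the site-wise differences, and a normal approximation gives the p-value. This is one common formulation, not Tree-Puzzle's code, and the per-site values are invented. The KH test assumes both trees were specified a priori; when one of them is the best tree found from the same data, the Shimodaira-Hasegawa test provides the appropriate correction, which is one reason to run both.

```python
import math
from typing import Sequence, Tuple


def kh_test(site_lnl_a: Sequence[float],
            site_lnl_b: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Kishino-Hasegawa test (normal approximation).

    Returns (delta, z, p), where delta is the summed log-likelihood difference a - b.
    """
    n = len(site_lnl_a)
    if n != len(site_lnl_b) or n < 2:
        raise ValueError("need two equal-length per-site log-likelihood vectors")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    mean = delta / n
    # Variance of the summed difference: n times the sample variance of the site differences
    var = n * sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("site differences have zero variance")
    z = delta / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return delta, z, p


# Hypothetical per-site log-likelihoods for two trees over four sites
site_a = [-1.0, -2.0, -1.5, -1.0]
site_b = [-1.2, -2.1, -1.4, -1.3]
delta, z, p = kh_test(site_a, site_b)
print(f"delta={delta:.2f} z={z:.2f} p={p:.3f}")
```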
Mean distance matrices for each subspecies were calculated in MEGA 3.1 with the following model: codon, modified Nei-Gojobori (Jukes-Cantor); transition/transversion ratio, 2; uniform rates; 1,150 sites. These matrices were used to generate a neighbor-joining tree of the subspecies (Fig. 5).
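A neighbor-joining tree like the one in Fig. 5 is built from such a matrix by repeatedly joining the pair of nodes that minimizes the Q criterion and replacing them with a new internal node. MEGA performed this step in the study; the sketch below is a plain implementation of the Saitou-Nei algorithm applied to a small hypothetical additive matrix, not the study's distances. Ties are broken by input order.

```python
from typing import Dict, List, Tuple


def neighbor_joining(names: List[str], d: List[List[float]]) -> str:
    """Saitou-Nei neighbor joining on a symmetric distance matrix; returns Newick."""
    nodes: List[str] = list(names)
    dist: Dict[Tuple[str, str], float] = {
        (a, b): d[i][j] for i, a in enumerate(names) for j, b in enumerate(names)
    }
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a, b] for b in nodes) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j (first pair wins ties)
        i, j = min(
            ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
            key=lambda pair: (n - 2) * dist[pair] - r[pair[0]] - r[pair[1]],
        )
        # Branch lengths from the new internal node u to i and j
        li = dist[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[i, j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distances from u to every remaining node
        for k in nodes:
            if k not in (i, j):
                dist[u, k] = dist[k, u] = (dist[i, k] + dist[j, k] - dist[i, j]) / 2
        dist[u, u] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{dist[a, b]:.3f});"


# Hypothetical additive distances for the tree ((A:1,B:2):1,(C:3,D:4))
names = ["A", "B", "C", "D"]
d = [
    [0, 3, 5, 6],
    [3, 0, 6, 7],
    [5, 6, 0, 7],
    [6, 7, 7, 0],
]
print(neighbor_joining(names, d))  # ((A:1.000,B:2.000),(C:3.000,D:4.000):1.000);
```

Because the toy matrix is additive, the algorithm recovers the generating tree and its branch lengths exactly.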
FIG. 4. (a) Topology tests based on the Bayesian consensus and best trees for recA and mdh. Swap, tree with the tested branch relocated to its position in the concatenated tree; control, tree with the S. bongori branch relocated to the subsp. I branch. (b) Statistical analyses of the Bayesian trees testing the hypothesis of an LGT event for recA and mdh. Cons, consensus tree; best, best tree; swap, tree with the branch relocated on the basis of the concatenated tree; control, tree with the Salmonella bongori branch relocated to subsp. I.
FIG. 5. Neighbor-joining tree generated from the average subspecies distance matrix. Dates at each node are given in millions of years ago (MYA).
Nucleotide sequence accession numbers. GenBank accession numbers of deposited sequences are as follows: recA, DQ644868 to DQ644934; mdh, DQ644734 to DQ644800; phoP, DQ644801 to DQ644867; and gapA, DQ644634 to DQ644700.
TABLE 1. Description of the four genes analyzed in this study^a
FIG. 2. Phylogeny based on individual gene sequences. Bayesian consensus of 900 trees from the individual nucleotide sequences. Support values of all major nodes are listed; internal branches with values higher than 0.75 are reported. Bars, 0.1 substitutions per site.
FIG. 3. Phylogeny based on concatenated gene trees. Bayesian consensus of 900 trees representing the phylogeny of Salmonella based on the four housekeeping genes, recA, mdh, phoP, and gapA. The node support value represented by an asterisk denotes the weakly supported node prior to removal of the subspecies IIIa recA sequence.
The individual mdh tree grouped subsp. IV and VII with subsp. II, although this relationship was not supported (Fig. 2b). This suggested a possible transfer of mdh between these subspecies. The four topology tests were applied to the individual mdh tree after the subsp. IV and II branches were swapped to their positions in the concatenated tree. This analysis failed to reject the null hypothesis of no LGT event for mdh (Fig. 4).
Recognition of an LGT event involving recA implies that subsp. IIIa once carried an ancestral, vertically inherited recA lineage that was displaced by the transferred gene. In an attempt to locate this ancestral lineage, we identified 21 other subsp. IIIa strains with unusual biochemical or MLEE properties resembling those of subsp. IV or Salmonella bongori, described either by the National Salmonella Reference Laboratory at the CDC or by Reeves et al. (27). All 21 subsp. IIIa isolates carried the laterally transferred recA sequence from the subsp. IIIb lineage (unpublished observations).
Because the recA gene was transferred from subsp. IIIb to subsp. IIIa, the recA sequences of all subsp. IIIa serotypes were removed from the concatenated analysis, and the tree was reestimated (Fig. 3). After this removal, both neighbor-joining and Bayesian methods generated a consensus tree with strong support at the major node for each subspecies. In this tree, every serotype tested falls in the same clade as the other members of its subspecies. When the tree is rooted with E. coli and Shigella flexneri sequences, Salmonella bongori is the earliest diverging lineage of the Salmonella. Subspecies IIIa is the earliest diverging lineage of Salmonella enterica, followed by subsp. IV. In all analyses, subsp. VII was recovered as the sister group of subsp. IV, as previously described by Boyd et al. (2).
This analysis also resolved the four diphasic subspecies, IIIb, II, VI, and I, as being monophyletic and separate from the three monophasic subspecies, IIIa, IV, and VII, with Salmonella bongori representing the earliest diverging lineage of the Salmonella.
Evolutionary dates. Given the importance of the relationship of the subspecies to specific niches, we attempted to relate the major steps in the evolution of Salmonella to identifiable periods in geologic history. Each major subspecies node in the concatenated tree topology was labeled with an approximate date of divergence, estimated both with the substitution rate of Berg and Martelius (1) and with the rate generated in this study, 6.32 × 10⁻¹⁰ (Fig. 5). Both rates were calibrated to the divergence of Salmonella and Escherichia coli from their common ancestor at the accepted date of 100 million years ago (MYA) (7, 19). These estimates place the divergence of Salmonella enterica from Salmonella bongori between 40.0 and 63.4 MYA (spanning the Paleocene and Eocene epochs); subsp. IIIa diverged between 21.5 and 34.0 MYA, and subsp. IV diverged from the diphasic subspecies between 14.2 and 22.4 MYA, during the Miocene epoch (Fig. 5).
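Under a strict molecular clock, a pairwise distance accumulates along both lineages descending from a node, so a node's age is T = d / (2r). Calibrating at the Salmonella/E. coli split (100 MYA) fixes r = d_ref / (2 × 100 Myr), which makes every node age simply 100 MYA × d / d_ref. The sketch below assumes a per-lineage rate in substitutions per site per year; this excerpt does not state the units or convention of the study's 6.32 × 10⁻¹⁰ rate, and all distances and node labels are hypothetical.

```python
CALIBRATION_MYA = 100.0  # Salmonella/E. coli divergence date used as the reference point


def rate_from_calibration(d_ref: float, t_ref_mya: float = CALIBRATION_MYA) -> float:
    """Per-lineage rate (substitutions per site per year) implied by the calibration.

    The pairwise distance accumulates along both descending branches, hence the factor 2.
    """
    return d_ref / (2 * t_ref_mya * 1e6)


def divergence_mya(d: float, rate: float) -> float:
    """Strict-clock age, in MYA, of the node joining two lineages at pairwise distance d."""
    return d / (2 * rate) / 1e6


# Hypothetical distances, not values from this study
d_salmonella_ecoli = 0.20
rate = rate_from_calibration(d_salmonella_ecoli)
for label, d in [("node X", 0.08), ("node Y", 0.03)]:
    print(label, round(divergence_mya(d, rate), 1))
```

Using two different rates, as in the study, is what yields a date range rather than a single value for each node.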
Many previous studies have tried to clarify the complex phylogeny of the species of Salmonella. The phylogeny defined in our study agrees closely with the invasion gene tree of Boyd et al. (2) (Fig. 1d), except for the relationships among the diphasic subsp. I, II, VI, and IIIb. The housekeeping gene tree of Boyd et al. (Fig. 1b) differed from ours by a single branch: in that study, subsp. II and IIIb formed a clade. This study confirms the clear separation of the monophasic and diphasic Salmonella subspecies and resolves conflicts among previous studies.
The phenotypic change from the monophasic to the diphasic state results from acquisition of hin and the fljBA flagellin operon (33). The MLEE data of Boyd et al. (2) yield a tree that places subsp. I, VI, IIIa, and IIIb together, separate from subsp. II, IV, and VII. If correct, this would imply either that the diphasic state evolved twice or that it arose earlier than other data indicate. The microarray study of Porwollik et al. (25), based on the presence or absence of genes, presents a tree in which subsp. IIIa is a sister group of the diphasic subspecies, whereas our data place subsp. IIIa as a basal lineage of Salmonella enterica among the monophasic subspecies. This discrepancy may reflect the transfer of recA from subsp. IIIb to subsp. IIIa that we identified; perhaps additional genes from the same region or from other parts of the genome have also been transferred between these subspecies.
In our first analysis of the concatenated gene sequences, the node separating subsp. IIIb from the other diphasic subspecies was consistently unsupported, suggesting that one or more of the genes had a conflicting evolutionary history. To identify potential horizontal transfer events, we inferred trees with Bayesian and neighbor-joining methods from the nucleotide sequences of each individual gene and of combinations of the genes. The individual recA and mdh trees were both inconsistent with the concatenated tree. To determine whether horizontal exchange had occurred, we used branch-swapping topology tests (Fig. 4). These tests indicated that recA had been transferred from subsp. IIIb to subsp. IIIa, whereas a recent transfer of mdh was not supported. This probable lateral transfer of recA from subsp. IIIb to subsp. IIIa shows that such events occur within Salmonella, and care should be taken when inferring a phylogeny from a limited number of genes.
The resulting consensus tree (Fig. 3) gives a comprehensive picture of the phylogeny of the Salmonella subspecies. The dates we infer (Fig. 5) also raise new questions about how these subspecies relate to niches that may have opened during the corresponding time periods. One example is the acquisition of the second flagellin operon at the divergence of the diphasic subspecies from subsp. IV. According to our estimates, this acquisition occurred during the Miocene epoch, which was characterized by the rapid expansion of grasslands and of hoofed mammals (20). This evolutionary novelty may have opened a series of new niches for Salmonella.
The isolates in this study were selected on the basis of their frequency in human infections and to balance representation among the subspecies and the two species of Salmonella. Other studies have used the SARCs A, B, and C; we also examined whether the phylogenetic relationships inferred from the SARCs hold in a more extensive collection. A limitation of this study is that all isolates were human clinical isolates submitted to the Centers for Disease Control and Prevention and the Centro Nacional de Microbiologia, Spain, and were therefore limited to these geographic locations. Study of a globally representative set of serotypes may yield further insights. Serotypes that are rarely associated with human infection are underrepresented, which may limit our view of the true history of Salmonella. It is possible that the transfer of recA from subsp. IIIb to subsp. IIIa occurred in only one lineage, which then persisted among strains that more frequently infect humans. We therefore examined the recA allele from 21 additional subsp. IIIa isolates that had atypical biochemical properties or MLEE patterns and came from nonhuman sources. This search for an ancestral subsp. IIIa recA was unsuccessful, as all 21 isolates carried the allele that originated in subsp. IIIb. It remains to be seen whether any subsp. IIIa lineage carrying its ancestral recA gene has survived to the present, although we cannot exclude the possibility that such a lineage persists undetected in remote niches and never appears in humans, even incidentally.
This study resolves the phylogeny of the Salmonella species and subspecies using the sequences of four housekeeping genes. After correction for a single, previously unsuspected LGT event, a strongly supported consensus tree emerged. This tree reconciles the variant trees previously generated with smaller strain collections and other methods. The consensus phylogeny indicates that the species and subspecies of Salmonella are evolving as separate lineages and that the similarities within each subspecies reflect common ancestry. Intragenic transfer and LGT within subspecies may play a role in the isolation and divergence of the subspecies, as described by Brown et al. (3); however, recombination within Salmonella subspecies was not assessed in this study.
The information presented here will serve as a template of the salmonellae for further studies investigating when other major evolutionary events occurred in the history of Salmonella.
Published ahead of print on 29 August 2008.
Supplemental material for this article may be found at http://jb.asm.org/.
B.C.W. and J.D. contributed equally to this study.