Journal of Bacteriology, April 2009, p. 2864-2870, Vol. 191, No. 8
0021-9193/09/$08.00+0 doi:10.1128/JB.01581-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.
(1) Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona 86011-4073; (2) Translational Genomics Research Institute, Phoenix, Arizona 85004; (3) Microbial Program, Joint Genome Institute, Walnut Creek, California 94598; (4) Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California 94550; (5) Center for Microbial Ecology and Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan 48824; (6) Idaho National Laboratory, Idaho Falls, Idaho 83415; (7) National Biodefense Analysis and Countermeasures Center, Frederick, Maryland 21703; (8) Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545
Received 7 November 2008/ Accepted 23 January 2009
Determining the relationships among Brucella species is essential for understanding the ecology, evolutionary history, and host relationships of the genus and for developing accurate genotyping methods. Multilocus sequence typing, which assesses single nucleotide polymorphisms (SNPs) and other mutations in housekeeping genes, has revealed considerable taxonomically informative variation among Brucella isolates (48). Individual SNPs can then be used to identify Brucella species because they are evolutionarily stable and can be incorporated into genotyping methods (13, 14, 41). Multilocus sequencing, however, does not capture enough variation in many species, because conserved genomes often have too few polymorphic loci. Highly resolved phylogenies therefore depend on many loci, particularly in highly conserved genomes such as those of the brucellae.
Fortunately, the ability to create highly accurate, high-resolution phylogenies is increasing rapidly with ongoing developments in sequencing technologies (17). Because bacterial genomes are relatively small, whole-genome phylogenies for bacteria show the greatest immediate potential for deciphering evolutionary histories at the species or genus level (2, 12, 19, 37). Rather than drawing phylogenetic inferences from a small portion of the genome, researchers can now compare entire genomes. Such in-depth work on a single genus also differs from studies that attempt to draw the tree of life from many phylogenetically diverse genomes, because SNP comparisons among similar genomes cover a far greater proportion of each genome. Traditional whole-genome phylogenies involve comparisons of homologous genes (10, 27, 42). Among closely related species, SNPs appear to be a better choice for phylogenetics because they cover the entire genome, including intergenic regions, are relatively stable over evolutionary time, and are easy to compare (4, 33). The sheer number of SNPs between the genomes of closely related species can provide hundreds to thousands of characters for phylogenetic reconstruction, helping to resolve problems of character state conflict and producing finely resolved topologies. Furthermore, selecting only orthologous SNPs, rather than also including paralogous SNPs, improves phylogenetic inference. To date, whole-genome comparisons in Brucella have been done only on a limited scale, involving two or three genomes (5, 9, 18, 36).
We compared the whole genomes of 13 Brucella isolates of five species: five genomes of Brucella suis from four of the five recognized biovars; three B. melitensis genomes, one from each of the three recognized biovars; three B. abortus genomes from the most widespread biovar; and one each from B. canis and B. ovis. We used only orthologous SNPs that were shared among all genomes. The phylogeny was rooted with the closely related soil bacterium Ochrobactrum anthropi to polarize each SNP into ancestral or derived states. Finally, in pairwise comparisons of the genomes, we used a molecular clock based on the accumulation of synonymous mutations to estimate the relative age of the genus and the divergence times of the species. The present study provides a solid, comprehensive phylogenetic framework for a detailed understanding of the evolution and ecology of Brucella, which is crucial for research in nearly all aspects of Brucella biology.
TABLE 1. Brucella genomes and O. anthropi outgroup used in phylogenetic comparisons
To root the phylogeny, we performed the same comparison procedures but also included the O. anthropi genome. This allowed us to polarize the Brucella SNP characters into ancestral and derived states and to identify the most basal taxon (i.e., the root) of the phylogeny. We recognize that the choice of O. anthropi as an outgroup may affect which taxon appears most basal because of potential long-branch attraction (21), but it is the most closely related species currently known. The Brucella phylogeny itself was constructed using only SNPs shared among all Brucella genomes; omitting O. anthropi from this step increased the number of shared loci and therefore allowed a more detailed and accurate depiction of topology and branch lengths within the genus. Two of the B. suis genomes (23445 and Thomsen) represent the same strain but were sequenced by two different laboratories with different sequencing strategies, providing a direct comparison of the sequencing/assembly approaches and a validation of our SNP discovery technique.
Phylogenetic reconstructions. We generated a matrix of SNP states for each genome that recorded each SNP's position in the reference B. melitensis 16M genome and a mismatch cutoff value giving the distance to the nearest neighboring SNP. For the phylogeny, we applied a mismatch cutoff of eight bases: if two SNPs were within eight bases of each other, neither SNP was included in the analyses. This cutoff excluded potential sequencing errors typical of pyrosequencing, such as errors in homopolymeric repeats, as well as potential alignment errors, while retaining the majority of the data. As discussed more fully in Results, this mismatch cutoff did not affect the topology of the tree. We generated a NEXUS text file of the concatenated SNP sequence for each sample and analyzed the aligned sequences with the neighbor-joining (NJ) and maximum-parsimony (MP) algorithms in PAUP* (43). The best substitution model was selected with ModelTest (38). The analysis conditions, using the substitution model and parameters selected by ModelTest, were as follows: NJ, general time reversible model; MP, full heuristic search with a random seed and 1,000 bootstrap repetitions.
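The mismatch-cutoff rule can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes SNPs are supplied as hypothetical (chromosome, position) pairs on the B. melitensis 16M reference and that "within eight bases" means a distance of eight or fewer bases. As described above, both SNPs of any close pair are dropped.

```python
from collections import defaultdict


def filter_clustered_snps(snps, cutoff=8):
    """Drop every SNP that has another SNP within `cutoff` bases on the same
    reference chromosome (both members of a close pair are removed).

    `snps` is a list of (chromosome, position) tuples. Treating "within" as
    a distance <= cutoff is an assumption of this sketch.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)

    kept = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        for i, pos in enumerate(positions):
            # Check the nearest neighbor on each side of this SNP.
            left_close = i > 0 and pos - positions[i - 1] <= cutoff
            right_close = i + 1 < len(positions) and positions[i + 1] - pos <= cutoff
            if not (left_close or right_close):
                kept.append((chrom, pos))
    return kept
```

With `cutoff=0`, no distinct SNP positions are excluded, which corresponds to the full data set without a mismatch cutoff discussed in Results.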
Molecular clock. Estimating the rate of evolution for a molecular clock requires the number of synonymous SNPs (sSNPs), the number of potential sSNP sites, the mutation rate, and the number of generations per year. We first pared the genome sequences down to coding regions, using the genes of the B. melitensis 16M genome as the reference. We then located each SNP within its three-base codon and determined whether it changed the encoded amino acid; the SNPs that did not were summed to give the total number of sSNPs. The number of potential sSNP sites for each codon was taken from a lookup table of codon possibilities, and these values were summed over all codons to give the number of potential synonymous sites in the sequence. We chose sSNPs because they are presumably selectively neutral or nearly neutral and therefore allow a relatively unbiased estimate of SNP accumulation.
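The codon bookkeeping can be sketched as follows. This is an illustration under stated assumptions, not the authors' lookup table: it uses the standard genetic code (the bacterial code, NCBI table 11, has the same amino acid assignments), counts potential synonymous sites per codon as the fraction of the three possible substitutions at each position that leave the amino acid unchanged (a common fractional counting scheme), treats changes to a stop codon as nonsynonymous, and classifies each SNP independently against its reference codon. All function and variable names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_synonymous(codon, position, alt_base):
    """True if replacing codon[position] with alt_base keeps the same amino acid."""
    mutant = codon[:position] + alt_base + codon[position + 1:]
    return CODON_TABLE[mutant] == CODON_TABLE[codon]


def synonymous_sites(codon):
    """Potential synonymous sites in one sense codon: for each position, the
    fraction of the three possible substitutions that are synonymous."""
    return sum(
        sum(is_synonymous(codon, pos, b) for b in BASES if b != codon[pos]) / 3
        for pos in range(3)
    )


# Lookup table of potential synonymous sites for all 61 sense codons.
SITE_TABLE = {c: synonymous_sites(c) for c, aa in CODON_TABLE.items() if aa != "*"}


def count_synonymous(cds, snps):
    """Return (number of sSNPs, number of potential sSNP sites) for one gene.

    `cds` is an in-frame reference coding sequence without stop codons;
    `snps` maps a 0-based offset in the CDS to the alternative base.
    """
    n_sites = sum(SITE_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))
    n_syn = 0
    for offset, alt in snps.items():
        start = (offset // 3) * 3
        n_syn += is_synonymous(cds[start:start + 3], offset % 3, alt)
    return n_syn, n_sites
```

For instance, the leucine codon CTG contributes 4/3 potential synonymous sites (all three third-position changes and one first-position change, to TTG, keep leucine), while ATG (Met) contributes none.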
We then made pairwise comparisons between all genomes, recording an absolute base count (the total number of bases in all of the genes included in a comparison) and a filtered base count (only the bases of genes shared by both genomes, excluding indels). The SNPs used in these comparisons therefore differed slightly from those used in the phylogeny, because the requirements for SNP inclusion differed. The following equation was used to roughly estimate the time since divergence for each pairwise comparison: age (years) = number of sSNPs/(number of potential sSNP sites × mutation rate × number of generations per year × 2).
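The equation can be read as a rearrangement of the expected number of synonymous differences between two genomes, assuming, as the text does, a constant clock rate: in T years each of the two lineages accumulates about T g μ N_s synonymous changes, so the pair differs by about twice that. The symbols are introduced here for illustration only.

```latex
% S    = number of sSNPs between the two genomes
% N_s  = number of potential sSNP sites compared
% \mu  = synonymous mutation rate (per bp per generation)
% g    = generations per year;  T = years since divergence
\begin{aligned}
S &\approx 2\,T\,g\,\mu\,N_s \qquad \text{(changes accumulate on both lineages)}\\
T &\approx \frac{S}{2\,N_s\,\mu\,g}
\end{aligned}
```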
We used a synonymous mutation rate of 1.4 × 10^-10 mutations per base pair per generation, based on mutation rates from Escherichia coli (26). Age estimates are sensitive to this value because mutation rate estimates vary considerably. The number of generations per year of Brucella species in natural hosts is not known, so we have given a range of 50 to 150 generations per year; the actual value in the wild remains to be determined. We recognize that this range also introduces variation into the age estimates; for comparison, estimates range from 22 to 43 generations per year for Bacillus anthracis (44) and from 100 to 300 generations per year for E. coli (34), and the range we chose is consistent with Brucella biology. The "2" in the denominator of the equation accounts for mutations accumulating independently along both lineages since the two genomes diverged (1).
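Because the number of generations per year sits in the denominator, the range of 50 to 150 generations per year translates directly into a threefold spread of ages. A minimal Python sketch of the equation, using the 1.4 × 10^-10 rate from the text; the SNP and site counts in the example are hypothetical placeholders, not values from this study.

```python
MUTATION_RATE = 1.4e-10  # synonymous mutations per bp per generation (E. coli-based value from the text)


def divergence_years(n_ssnps, n_ssnp_sites, gens_per_year, mu=MUTATION_RATE):
    """Years since two genomes diverged: sSNPs / (sites * mu * generations/year * 2)."""
    return n_ssnps / (n_ssnp_sites * mu * gens_per_year * 2)


def divergence_range(n_ssnps, n_ssnp_sites, gens_low=50, gens_high=150, mu=MUTATION_RATE):
    """(youngest, oldest) age estimates: more generations per year gives a younger age."""
    youngest = divergence_years(n_ssnps, n_ssnp_sites, gens_high, mu)
    oldest = divergence_years(n_ssnps, n_ssnp_sites, gens_low, mu)
    return youngest, oldest
```

For example, `divergence_range(100, 1_000_000)` returns roughly (2,381, 7,143) years for these made-up counts; the upper bound is always three times the lower bound when the 50 to 150 range is used.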
9,000 SNPs among the Brucella species. Phylogenetic analysis including O. anthropi indicated that the B. ovis lineage was the first to split from the rest of the Brucella. Therefore, B. ovis was used to root subsequent trees constructed using only Brucella genomes. Excluding Ochrobactrum from SNP discovery within the brucellae reduced homoplasy and yielded more SNPs for resolving relationships within the genus. Alignments of the 13 Brucella genomes yielded 20,154 SNPs that were present in all genomes (Table 2). Of this total, 16,803 SNPs were in coding regions and 3,351 were in noncoding regions. At least 1,398 SNPs were located on a different chromosome in at least one of the genomes (excluding B. suis 686, which has a single chromosome). The reduced data set with a mismatch cutoff of eight bases (i.e., ignoring SNPs within 8 bp of one another) contained 17,032 SNPs, 9,021 of which were parsimony informative. In this data set, the incidence of homoplasy or possible sequencing error was extremely low (homoplasy index = 0.0104, excluding SNPs found only on terminal branches). The resulting Brucella phylogenetic tree shows strong differentiation by species (Fig. 1). Trees built from data sets with SNP mismatch cutoffs of 0 to 30 bp had an identical topology but slightly different branch lengths (data not shown), indicating that the choice of an 8-bp cutoff did not affect the inferred relationships. NJ and MP analyses each yielded only one tree. Bootstrap support for MP was 100% for all nodes within the Brucella data set; with the O. anthropi outgroup included in the analysis, support for the basal B. ovis clade was 99%. Maximum-likelihood analyses gave similar results (data not shown).
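The homoplasy index quoted above is conventionally defined as HI = 1 - CI, where the consistency index CI is the minimum possible number of changes, summed over characters, divided by the number of changes the characters actually require on the tree. A minimal sketch of that definition, assuming a hypothetical binary tree written as nested tuples and scoring each character with Fitch parsimony; the `informative_only` switch (keep only characters with at least two states each present in two or more taxa) approximates the exclusion of SNPs found only on terminal branches. This is not the PAUP* implementation.

```python
from collections import Counter


def fitch_length(tree, states):
    """Minimum number of changes for one character on a rooted binary tree
    (Fitch parsimony). `tree` is a nested tuple of taxon names; `states`
    maps each taxon to its base."""
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]


def is_informative(column):
    """Parsimony informative: at least two states, each in two or more taxa."""
    counts = Counter(column.values())
    return sum(1 for n in counts.values() if n >= 2) >= 2


def homoplasy_index(tree, matrix, informative_only=True):
    """HI = 1 - CI over all SNP columns of `matrix` (taxon -> aligned SNP string)."""
    taxa = list(matrix)
    min_steps = observed = 0
    for i in range(len(matrix[taxa[0]])):
        column = {t: matrix[t][i] for t in taxa}
        if informative_only and not is_informative(column):
            continue
        min_steps += len(set(column.values())) - 1
        observed += fitch_length(tree, column)
    if observed == 0:
        return 0.0
    return 1 - min_steps / observed
```

In a toy example with tree `(("A", "B"), ("C", "D"))`, one SNP grouping A with B fits the tree in one step, while one grouping A with C needs two, giving CI = 2/3 and HI = 1/3.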
TABLE 2. Number of SNPs defining the branches of Brucella phylogeny
FIG. 1. Rooted phylogeny of the genus Brucella, including 13 genomes of five species. Tree was constructed by using neighbor joining. MP analysis gave the same topology, and percent bootstrap support based on 1,000 repetitions is shown at each node. The outgroup O. anthropi was used to root this tree but is not shown because of the long branch length.
TABLE 3. Mean divergence times in years since the last common ancestor for five Brucella species
SNP genotyping and analysis. The SNPs discovered in these genomes provide thousands of potential targets for assays that differentiate the various species. For example, we identified as many as 253 SNPs that distinguish B. canis 23365 from its closest sequenced relative, B. suis 40, including the previously identified distinguishing mutation in outer membrane proteins (47); any of these could serve as assay targets. For most branches, the SNPs defining them are redundant and interchangeable for genotyping. Because SNPs change at an extremely low rate, a single SNP is sufficient to define a particular clade; such a SNP can be designated a canonical SNP (23).
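Screening for candidate canonical SNPs can be sketched as follows: polarize each SNP against the outgroup base (the role O. anthropi played here) and keep the SNPs whose derived state is carried by exactly the taxa of the target clade. The matrix layout, taxon labels, and function name are hypothetical, and the sketch assumes the outgroup base is the ancestral state.

```python
def canonical_snp_candidates(matrix, outgroup, clade):
    """Indices of SNPs whose derived state is carried by exactly the taxa in `clade`.

    `matrix` maps taxon name -> aligned SNP string. The outgroup's base at each
    position is assumed to be ancestral; any other base counts as derived.
    """
    target = set(clade)
    ingroup = [taxon for taxon in matrix if taxon != outgroup]
    hits = []
    for i, ancestral in enumerate(matrix[outgroup]):
        derived = {taxon for taxon in ingroup if matrix[taxon][i] != ancestral}
        if derived == target:
            hits.append(i)
    return hits
```

Any index returned defines the clade on its own, which is why, for most branches, the candidates are interchangeable for assay design.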
The SNP data set identified in the present study provides essentially no evidence for recombination among Brucella species. Using the full data set with no SNP mismatch cutoff, we found one major pattern of shared SNPs (n = 248) that was inconsistent with the phylogeny, grouping together B. abortus 2308 and 9-941, B. canis 23365, B. ovis 25840, and B. suis 1330 and Thomsen. Notably, B. suis 23445 did not fall into this group even though it is the same strain as Thomsen. None of these SNPs was retained with a mismatch cutoff of eight bases, and we know of no biological mechanism that would produce this pattern. Lateral gene transfer from other organisms cannot be ruled out with this approach, although it is difficult to conceive of a scenario in which such anomalous results would be limited to a few taxa.
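The search for SNP patterns that conflict with the phylogeny can be sketched the same way: a polarized SNP is compatible with the tree when the taxa carrying the derived state form a clade (a single taxon counts as a terminal branch), and any other grouping is tallied as a conflicting pattern. The tree, matrix, and labels below are hypothetical toy data, and the outgroup base is assumed to be ancestral.

```python
from collections import Counter


def clades(tree):
    """Leaf sets (as frozensets) of every subtree of a nested-tuple tree."""
    found = []

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.append(leaves)
        return leaves

    visit(tree)
    return set(found)


def conflicting_patterns(tree, matrix, outgroup):
    """Count SNP patterns whose derived taxa do not form a clade of `tree`.

    `tree` contains only ingroup taxa; `matrix` maps taxon -> aligned SNP string.
    """
    tree_clades = clades(tree)
    ingroup = [t for t in matrix if t != outgroup]
    counts = Counter()
    for i, ancestral in enumerate(matrix[outgroup]):
        derived = frozenset(t for t in ingroup if matrix[t][i] != ancestral)
        if derived and derived not in tree_clades:
            counts[derived] += 1
    return counts
```

A recurring grouping with a large count, like the pattern of 248 SNPs described above, stands out immediately in the returned tally.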
The three SNP differences between B. suis 23445 and B. suis Thomsen, which represent the same type strain, result either from sequencing/alignment errors or from mutations that arose during laboratory passage. Whole-genome comparisons of the same strain are known to reveal mutational differences (45), and in our case the exact same archival isolate sample was not used. The definitive test is to sequence the exact same strain on different platforms (19). Nonetheless, such a small number of differences supports the accuracy of the 454 sequencing platform and of our SNP discovery methods.
Brucella phylogeny. Ever since the early microbiological work of Wilson (50), researchers have developed increasingly sophisticated methods of classifying Brucella species. Most of these methods have roughly reproduced the evolutionary relationships seen in the whole-genome phylogeny. For instance, restriction mapping suggested the close relationship of B. abortus and B. melitensis and the more distant grouping of B. suis (30). The basal position of B. ovis in the Brucella phylogeny was suggested on the basis of the likely inheritance of certain genes (11). Multilocus sequence typing trees of Brucella roughly approximate the whole-genome phylogeny but use only seven housekeeping genes (48). Variable-number tandem repeat analyses correctly group and depict the taxonomic relationships of all of the major Brucella clades, such as the close relationship of B. suis biovars 3 and 4 to B. canis and the close but more distant relationship of B. suis biovar 1 (20, 25, 49).
Although each of these approaches has its value, particularly when low-cost genotyping is the goal, only whole-genome sequencing can capture the full extent of genetic variation. Furthermore, only whole-genome phylogenies allow us to gauge the accuracy of previous genetic methods. Understanding the evolutionary framework of the genus Brucella is essential for designing assays that differentiate the various strains or biovars, and only a rooted phylogeny reveals the direction of evolutionary change. Because all prior phylogenetic reconstructions relied on reduced sets of markers, they approximate the "true" phylogeny less accurately than whole-genome analysis does, and some incorrect conclusions about the relationships among Brucella isolates were inevitable.
B. suis is the most diverse Brucella species examined thus far. Exceptional diversity in this clade was expected because our data set contained B. suis from four of the five recognized biovars. Furthermore, a range of genetic analyses have indicated considerable diversification within the B. suis clade and have even suggested likely relationships among the biovars (14, 20, 25). Most studies of variation within B. suis have had difficulty differentiating its isolates from those of B. canis (3, 6, 14, 15), suggesting a close relationship between the two species. Whole-genome comparisons make it clear that B. canis, B. suis 686 (biovar 3), and B. suis 40 (biovar 4) are all highly similar at the nucleotide level. B. canis appears to have arisen directly from a B. suis ancestor, making the currently defined B. suis paraphyletic. Therefore, no single DNA-based assay will be able to distinguish all B. suis isolates from the other Brucella species, because the paraphyly of B. suis means that any such assay will also identify B. canis. Early restriction endonuclease fragment analysis by Allardet-Servent et al. (3) also suggested that B. canis likely evolved from a strain of B. suis. Interestingly, SNPs readily resolved the relationship of B. suis biovar 3 to the other Brucella species and B. suis biovars even though it contains only one large chromosome rather than the two chromosomes seen in all other Brucella (22). Regardless of genome arrangement, B. suis biovar 3 is a subclade of B. suis. The only B. suis biovar not included in our analyses, biovar 5 (strain 513), which was isolated from rodents in the former Soviet Union, is likely quite different genetically from the other four biovars (25, 49). Previous research has suggested that B. suis biovar 5 is most closely related to the brucellae of marine mammals (14, 28, 48), but whole-genome-based phylogenetic analyses are needed to confirm this hypothesis.
The radiation of the three recognized biovars of B. melitensis occurred rapidly. The three strains diverged at roughly the same time but are now clearly differentiated, having undergone considerable evolution since their divergence. The B. abortus clade was minimally differentiated, but the three strains from biovar 1 represent only a small portion of the diversity within this species.
Because character conflict within the tree is extremely low, alternative phylogenetic methods give the same topology and similar branch lengths, indicating that the results are not an artifact of the analysis algorithm. The low genetic variation in Brucella is likely due to the relative youth of the lineage and to the apparent absence of lateral gene transfer among Brucella species, although a few genomic islands consistent with horizontal transfer from other bacteria have been observed (36, 39). The genetic isolation of Brucella species results from their limited ecological niche: fastidious growth only within hosts, few known mechanisms of genetic exchange, and virulence restricted to one or a few hosts (30). Whether the degree of differentiation in Brucella warrants species status for each traditional group has been debated for years. The whole-genome phylogeny presented here resolves this issue: Brucella species are reproductively isolated and, with the exception of B. suis and B. canis, constitute reciprocally monophyletic lineages separated by relatively long branches within the genus; thus, all species, including B. canis, deserve species status. In fact, several biovars within B. suis could be classified as additional species, although all would be identical on the basis of 16S rRNA sequence, the standard method of bacterial identification.
Age and origin of Brucella species. Previous whole-genome comparisons have indicated the close relationship of B. abortus and B. melitensis (5, 18), but the full phylogeny of the genus, with B. ovis as the most basal species, has not been described previously. Our rooted phylogeny suggests that brucellosis in animals such as pigs, goats, and cattle emerged from contact with infected sheep. Furthermore, this contact was relatively recent, occurring roughly within the past 86,000 to 296,000 years. These estimates nonetheless predate livestock domestication in the Middle East, which occurred within the past 10,000 years (51), indicating that the disease was endemic in wildlife populations rather than emerging as a result of domestication. The proposed coevolution of brucellae with their respective hosts (5, 31) is inconsistent with the whole-genome phylogeny in both topology and the likely rate of mutational change. For instance, it has been hypothesized that B. abortus and B. melitensis diverged roughly 20 million years ago, along with the divergence of their bovine and caprine (goat) hosts (31). Similarly, the early differentiation of the genus has been speculated to have occurred 20 to 25 million years ago (29). However, B. ovis, the most basal species in our phylogeny, is distant from B. melitensis, even though their sheep and goat hosts are very closely related. Our data indicate a much more recent association, implying that B. abortus, B. melitensis, and B. suis infections were acquired independently by their respective hosts after host speciation. Furthermore, Brucella as a genus is exceptionally monomorphic, with relatively few SNPs, which strongly suggests that the entire lineage is considerably younger than previous estimates. Transmission of brucellae from pigs to canids likely resulted from wolves or other canids feeding on animals infected with the ancestor of B. suis 40 within the past 22,500 years. Why other Brucella species have not evolved within canids, despite likely infections, is unknown.
How the genomes of other Brucella species fit into the phylogeny described here will reveal much about the evolutionary history of the genus. Our phylogeny and analyses provide a framework for understanding phylogenetic differentiation within Brucella. Genomes of other Brucella species, such as B. neotomae, B. ceti, B. pinnipedialis, and B. microti, and of additional biovars of B. abortus will provide a more complete understanding of diversity and relationships in the genus; sequencing of these genomes is in progress (David O'Callaghan, unpublished data). In addition, sequencing bacteria more closely related to Brucella than O. anthropi could resolve potential long-branch attraction, which can arise when distantly related taxa are used as outgroups (21). Among the many interesting avenues for future research in Brucella are the mechanisms of speciation. How did the various species adapt to and become isolated in their respective hosts? Of particular interest are the relationship between marine and terrestrial brucellae, the timing of the emergence of the disease in marine organisms, and the evolutionary history of Brucella species that are currently limited to wildlife populations, such as B. neotomae in wood rats, B. suis in caribou, and B. microti in voles. The genus also poses the question of why some brucellae exhibit such distinct host preferences while others do not.
We thank Jim Burans and the staff at the National Bioforensics Analysis Center for the 454 pyrosequencing data.
Use of product or trade names does not constitute endorsement by the U.S. Government.
Published ahead of print on 6 February 2009.
Supplemental material for this article may be found at http://jb.asm.org/.