Previous Article | Next Article ![]()
Journal of Bacteriology, April 2009, p. 2474-2484, Vol. 191, No. 8
0021-9193/09/$08.00+0 doi:10.1128/JB.01786-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.
,
James S. Beckstrom-Sternberg,1
Anders Johansson,3
Ashley Clare,1
Jordan L. Buchhagen,1
Jeannine M. Petersen,4
Talima Pearson,1
Josée Vaissaire,5
Michael P. Dempsey,6
Paul Foxall,7
David M. Engelthaler,2
David M. Wagner,1 and
Paul Keim1,2*
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona 86011-4073,1 Translational Genomics Research Institute, Phoenix, Arizona 85004,2 Department of Clinical Microbiology, Infectious Diseases and Bacteriology, Umeå University, SE 901 85 Umeå, and Division of CBRN Defence and Security, Swedish Defence Research Agency, SE 901 82 Umeå, Sweden,3 Centers for Disease Control and Prevention, Fort Collins, Colorado 80521,4 Agence Française de Sécurité Sanitaire des Aliments, Laboratoire d'Etudes et de Recherches en Pathologie Animale et Zoonoses, 94700 Maison-Alfort, France,5 Division of Microbiology, Armed Forces Institute of Pathology, Washington, D.C. 20306,6 Affymetrix, Inc., Santa Clara, California 950517
Received 19 December 2008/ Accepted 10 February 2009
|
|
|---|
|
|
|---|
The four subspecies differ in their genetic diversity. The rare F. tularensis subsp. mediasiatica and F. tularensis subsp. novicida subspecies appear to contain significant diversity (18), but due to the scarcity of samples, their true genetic diversity remains unknown. The two dominant and highly pathogenic subspecies, F. tularensis subsp. tularensis and F. tularensis subsp. holarctica, differ substantially in their levels of genetic diversity. F. tularensis subsp. tularensis has two distinct, genetically differentiated subpopulations (A.I and A.II), with separate geographic distributions (11, 18, 19, 29, 30). This division in F. tularensis subsp. tularensis is apparent via a variety of molecular typing methods, including whole-genome single nucleotide polymorphism (SNP) analysis (19), multilocus sequence typing (30), ribotyping (12), regional difference (RD) analysis (8), pulsed-field gel electrophoresis (12, 29), amplified fragment length polymorphism (12, 13), canonical insertion-deletions (22), and multilocus variable-number tandem repeat (VNTR) analysis (MLVA) (11, 18). These subpopulations are correlated with different host and vector distributions (11, 19) and may differ in virulence (29). In addition, subpopulations A.I and A.II have been shown to contain substantial genetic diversity via the relatively high-resolution molecular methods of pulsed-field gel electrophoresis (29) and MLVA (11, 18). In contrast, very little genetic diversity has been identified within F. tularensis subsp. holarctica with any of the molecular methods outlined above. This lack of diversity, combined with F. tularensis subsp. holarctica's wide geographic distribution, has led several authors to suggest that this subspecies recently experienced a genetic bottleneck and expanded across the northern hemisphere (8, 11, 18, 19).
As a consequence, the lack of genetic diversity within F. tularensis subsp. holarctica has made defining population structure within this subspecies especially problematic. Although all evidence points to a recent global expansion of this subspecies, there is little information on this expansion. MLVA provides the greatest discrimination among F. tularensis subsp. holarctica isolates (18, 19), but highly mutable VNTR markers can be compromised for phylogenetic analyses due to the likelihood of convergent evolution and the resulting homoplasy (20). Indeed, MLVA of global F. tularensis subsp. holarctica isolates has identified a few cases where North American and Scandinavian isolates are similar, although the majority of the data indicated that these two populations are distinct (18). This could be due to the hypervariability and fortuitous convergent evolution of the markers used (homoplasy) or may represent multiple dispersals of F. tularensis to North America, Scandinavia, or both. A set of evolutionarily stable, nonhomoplastic molecular markers would provide the means for addressing such questions by defining deeper population structure within F. tularensis and, especially, within F. tularensis subsp. holarctica.
Whole-genome SNP analysis has been shown to be remarkably effective at defining population structure among highly clonal bacterial pathogens lacking much genetic diversity (1, 20, 26, 31), as long as the discovery bias inherent to this method is taken into consideration (26). Because of the vast amount of sequence coverage, this method has the capacity to discover thousands of SNPs, even among very recently emerged genetically homogeneous organisms (20, 26). Once an accurate population structure has been defined using these SNPs, canonical SNPs (canSNPs) that define each branch in the phylogeny (20), whether species specific (9, 10), major lineage specific (31), or strain specific (32, 34), can be selected. This approach maximizes the information obtained, while limiting the number of assays, by eliminating redundancy and can be combined, in a stepwise hierarchical fashion, with more-variable molecular markers to provide highly accurate and highly discriminating phylogenetic analyses (20). The ever-expanding availability of whole-genome sequences for comparison has made this approach feasible for a number of important bacterial pathogens, including F. tularensis.
In this study, we further investigated the phylogenetic structure of F. tularensis via three distinct SNP analyses that together maximized our genomic and isolate coverage: whole-genome SNP analysis, molecular inversion probe (MIP) SNP analysis, and canSNP analysis (Table 1). We discovered SNPs among 13 F. tularensis genomes and used them to construct a highly accurate phylogeny for F. tularensis. We used high-throughput MIP microarray technology to genotype 1,655 SNPs across 95 genetically and geographically diverse F. tularensis isolates to identify additional phylogenetic structure and designate new canonical groups within F. tularensis, with an emphasis on F. tularensis subsp. holarctica. Finally, we defined canSNPs for these groups and screened them across 496 genetically and geographically diverse F. tularensis isolates to identify the worldwide distribution of these groups. Altogether, our phylogenetic and geographic distribution analyses represent the most comprehensive description to date of the worldwide diversity and historical spread of F. tularensis.
|
View this table: [in a new window] |
TABLE 1. F. tularensis SNP analysis summarya
|
|
|
|---|
![]() View larger version (20K): [in a new window] |
FIG. 1. Summary flow chart of three SNP analyses used. The methods for the three principal SNP analyses (whole-genome SNP analysis, MIP SNP analysis, and canSNP analysis) are presented.
|
![]() View larger version (8K): [in a new window] |
FIG. 2. Whole-genome SNP phylogeny of 13 F. tularensis strains. Maximum parsimony was used to construct this phylogeny from 29,774 SNPs discovered among the whole-genome sequences of these 13 strains. Red branches indicate the branches along which SNPs were discovered for the MIP SNP analysis, as only eight whole-genome sequences were available for SNP discovery when the MIP SNPs were selected. This analysis was highly robust due to the large number of characters and the very high consistency index of >0.99.
|
A total of 3,726 putative SNPs were selected for inclusion in the MIP probe panel on the basis of their potential for identifying new subclades and the available space on the microarray (Fig. 1). MIP assay analysis was performed as previously described (15, 16) on whole-genome amplification (WGA) products (Qiagen, Valencia, CA) from 95 genetically and geographically diverse F. tularensis isolates (Fig. 1; also see Table S1 in the supplemental material). Briefly, 4.5 µl of a 1/10 dilution of WGA product was added to the MIP reaction mixture containing the 3,726 F. tularensis probes. Probes were annealed to target DNA, starting at 95°C and ramping slowly to 58°C (–0.2°C every 5 min), and then held at 58°C for a total annealing time of 16 to 24 h. The mixture was then split into four tubes for nucleotide-specific gap filling, ligation, exonuclease digestion of nonligated probes, probe cleavage, amplification, and labeling according to the manufacturer's specifications. Labeled probes were repooled and hybridized to the tag arrays overnight at 39°C. Tag arrays were washed and stained according to the manufacturer's specifications and scanned using a GeneChip AT charge-coupled-device imager, and the data were analyzed using Affymetrix GeneChip targeted genotyping analysis software. As this system was designed for diploid organisms, the resulting genotype data consisted of "homozygous" SNP calls for either of two possible states, undetermined calls and "heterozygous" SNP calls. "Heterozygous" results were treated as undetermined for this haploid organism.
Genotype data from polymorphic MIP SNPs with
80% call data across the 95-isolate screening panel were used to construct a phylogeny detailing subclades discovered among the screened isolates (Fig. 1 and 3). Isolates with missing data were assigned SNP states for those SNPs with undetermined genotype calls. These SNP calls were consistent with the SNP states observed in other redundant SNP markers. This approach was justified given the very low homoplasy observed in the whole-genome sequence SNP analysis and the clonality of F. tularensis. This approach ensures that the overall length of the resulting phylogeny is conserved and not inflated due to missing data being assigned values that would increase the homoplasy. The phylogeny was constructed using the software package PAUP 4.0b10 (D. Swofford, Sinauer Associates, Inc., Sunderland, MA). MIP SNPs within this subset that exhibited homoplasy or theoretically impossible derived states (i.e., unique derived states in nondiscovery strains) (26, 36) were eliminated from the phylogenetic analysis given their unlikelihood and the impracticality of confirming the results via another method. Undetermined and synapomorphic MIP SNPs for which ancestral and derived groups could not be identified, due to undetermined results in root isolates, were also eliminated from the phylogenetic analysis (Fig. 1). Undetermined MIP SNPs were defined as SNPs with no observed derived states across the 95 screened isolates (derived states could have existed among isolates with missing data) or SNPs for which a large percentage of likely "derived isolates" were missing data. These steps were taken to ensure the most conservative approach for subclade discovery. Branch and subclade naming followed the approach suggested by Van Ert et al. (31), in which terminal groups containing discovery strains are referred to as subclades named for the discovery genome and intervening nodes are referred to as subclades named for their neighboring branches. Branches are named based upon their major group affiliation (A, A.I, A.II, B, M, or N for the different subspecies and two major subpopulations of F. tularensis subsp. tularensis) and an arbitrary three-digit number.
![]() View larger version (23K): [in a new window] |
FIG. 3. MIP SNP phylogeny indicating canSNPs, clades, and subclades. Stars indicate terminal subclades defined by one of the eight F. tularensis genomes used for MIP SNP discovery. Circles represent collapsed branch points along the lineages that contain subgroups of isolates. These subclades are named for the two flanking canSNPs. canSNP names and positions are indicated in red. Branch A.II.Br.006 has a branch length of 1 SNP but is presented as a longer, dashed line to avoid obscuring other parts of the tree.
|
MLVA. All 496 F. tularensis isolates were also genotyped using an 11-marker MLVA system (33). This was done in order to determine the level of genetic diversity within each identified subclade.
Geographic distribution of clades and subclades. We mapped the worldwide distributions of the 16 synapomorphic F. tularensis subsp. tularensis and F. tularensis subsp. holarctica subclades that we identified to determine their genetic geographic patterns (Fig. 1). This was done using color codes for each subclade and corresponding pie charts for each major geographic region that indicated both the number of isolates from a given region and the proportion of each subclade found within that region.
|
|
|---|
Whole-genome SNP analysis. Comparisons among 13 F. tularensis whole-genome sequences led to the discovery of 29,933 SNPs within this species (Fig. 1). We used 29,774 of these SNPs to construct a detailed and robust phylogenetic hypothesis for these diverse strains (Fig. 1 and 2). Consistent with a previous analysis (19), F. tularensis subsp. tularensis and F. tularensis subsp. mediasiatica are sister taxa that split at a deeper node into their own lineage whereas F. tularensis subsp. holarctica occupies a lineage by itself. The A.I and A.II subpopulation division within F. tularensis subsp. tularensis is clearly visible, with limited diversity between the two strains included in each of these subpopulations. The single Japanese F. tularensis subsp. holarctica genome (FSC 022) is genetically distinct from the other F. tularensis subsp. holarctica strains and diverged from the lineage leading to the remaining F. tularensis subsp. holarctica isolates (Fig. 2). This is consistent with previous findings of its divergence from other F. tularensis subsp. holarctica isolates (8, 18, 30); this supports the suggestion that Japanese F. tularensis subsp. holarctica isolates may constitute a separate subspecies, identified as Francisella tularensis subsp. japonica (24). F. tularensis subsp. novicida is genetically distant from the other F. tularensis subspecies and was used only to root the tree in accordance with previous analyses (19).
MIP SNP analysis. In order to further refine the phylogenetic topology of closely related strains, we used 1,655 SNPs to genotype 95 genetically and geographically diverse F. tularensis isolates by using a MIP assay. These SNPs represented a subset of the total SNPs discovered among eight F. tularensis whole-genome sequences that were available at the time of MIP SNP discovery. This is one of the first reports of MIP technology applied to pathogens, indicating that this technology is a viable approach for dense SNP genotyping of prokaryotic populations and for the use of WGA templates.
Discovery bias is a major consideration when limited but comprehensive whole-genome sequence discovery is coupled with high-capacity genotyping methods. Whole-genome sequence SNP discovery will discover only SNPs that occur on the evolutionary path between any two compared genomes (26, 36). Hence, our MIP SNP tree, based upon SNPs discovered via pairwise comparisons among eight discovery genomes, represents a combination of 28 linear (pairwise) phylogenies and possesses eight termini (Fig. 3). We also identify 13 intervening nodes (Fig. 3), which group isolates together at the point where they started their own evolutionary pathways. These secondary branches are collapsed because their independent evolutionary history was not captured during the MIP SNP discovery process (Fig. 2 and 3) (26, 36). As expected, the MIP SNP tree topology parallels that of the whole-genome sequence SNP tree seen in Fig. 2, except for branch lengths (Fig. 2 and 3). Branch lengths in Fig. 2 are highly accurate since they are based upon all SNPs discovered among the 13 strains. Because only a subset of the whole-genome SNPs were screened using the MIP assay (Fig. 1), the MIP SNP tree contains inaccurate branch lengths. However, the relative positioning of subclades and overall topology are highly accurate.
Novel genomes can easily be placed onto the phylogenetic tree by in silico genotyping of the SNPs used in this analysis. For example, FSC 022, the Japanese F. tularensis subsp. holarctica whole-genome sequence that was publicly released after the MIP design and not included in the MIP SNP discovery process, is a member of subclade B.Br.001/002 on the MIP SNP tree (Fig. 3; also see Table S1 in the supplemental material). This is the subclade representing the collapsed Japanese F. tularensis subsp. holarctica (F. tularensis subsp. japonica) branch that can be observed in Fig. 2. Similarly, the F. tularensis subsp. novicida U112 genome is categorized with the F. tularensis subsp. novicida subclade (N) on the MIP SNP tree (Fig. 3; also see Table S1 in the supplemental material). Three other examples are the FSC 200 and 257 strain genomes, which can be categorized as members of subclade B.Br.013/014, and the FSC 033 strain genome, which belongs to subclade A.I.Br.001/002 (Fig. 3). When additional F. tularensis genome sequences become available, they could be placed into this phylogenetic structure in a similar manner.
Three of the eight terminal subclades, B.Br.LVS, A.II.Br.WY96-3418, and A.II.Br.ATCC 6223, were found to contain only the corresponding discovery strain after the MIP SNP analysis across the 95 genetically and geographically diverse strains (Fig. 3; also see Table S1 in the supplemental material), indicating that the SNPs identifying these terminal groups were likely strain specific. As such, SNPs for these branches (A.II.Br.005, A.II.Br.007, and B.Br.014) (Fig. 3) were not converted into high-throughput assays or screened against the larger, 496-strain panel. Potential strain-specific SNPs were not identified for terminal subclades B.Br.OR96-0246, B.Br.FTNF002-00, and M.Br.FSC 147, since the corresponding discovery strain was not included in the MIP assay screening panel (Fig. 3; also see Table S1 in the supplemental material). Although the corresponding discovery strain was included in the MIP assay screening panel for terminal subclades B.Br.OSU18 and A.I.Br.SCHU S4 (Fig. 3; also see Table S1 in the supplemental material), the use of an incomplete sequence and the lack of a second A.I strain at the time of MIP SNP discovery, respectively, may have hampered the discovery of potential strain-specific SNPs for these two strains.
canSNP and MLVA genotyping analyses. Figures 4 and 5 graphically depict the distribution of 484 of the 496 isolates in 16 subclades of F. tularensis subsp. tularensis and F. tularensis subsp. holarctica identified in the MIP SNP analysis (Fig. 4 and 5, columns N). The remaining 12 isolates were either F. tularensis subsp. mediasiatica or F. tularensis subsp. novicida, were members of subclade M.Br.FSC 147 or N, respectively (see Table S1 in the supplemental material), and are not depicted in Fig. 4 or 5. Subclade placement for the 496 isolates was determined by their canSNP genotypes across 23 canSNPs (see Table S1 in the supplemental material). canSNP genotypes for the 95 isolates screened using the MIP assay were consistent with the available MIP assay results for those SNPs, although some missing MIP assay data prevented a full comparison. The position of each of the 23 canSNPs is illustrated in Fig. 3, and the canSNP genotype for each of the 18 synapomorphic subclades is depicted in Table 2. canSNPs for branches A/M.Br.001, M.Br.001, A.Br.001, A.I.Br.001, A.II.Br.001, and B.Br.001 (Fig. 3) were identified elsewhere and were genotyped using TaqMan minor groove binding allelic discrimination assays (D. Birdsell, A. J. Vogler, D. M. Wagner, and P. Keim, unpublished data). canSNPs for branches A.I.Br.002, A.II.Br.002-004, A.II.Br.006, and B.Br.002-013 (Fig. 3) were selected from among the MIP SNPs and were genotyped using the Melt-MAMAs described here (see Materials and Methods). As mentioned previously, canSNPs for branches A.II.Br.005, A.II.Br.007, and B.Br.014 (Fig. 3) were not selected or screened, due to the probable strain-specific nature of these SNPs.
![]() View larger version (23K): [in a new window] |
FIG. 4. Distribution of F. tularensis subsp. tularensis subclades throughout North America. The F. tularensis subsp. tularensis portion of the canSNP phylogeny from Fig. 3 is presented along with a map indicating the frequencies and geographic distribution of F. tularensis subsp. tularensis subclades across North America. The number of isolates (N) and number of MLVA genotypes (G) within each subclade are indicated. Subclades are color coded to facilitate mapping. For the purposes of mapping and determining isolate and MLVA genotype totals, strains belonging in strain-specific subclades (A.II.Br.ATCC 6223 and A.II.Br.WY96-3418) are included within the subclade immediately basal to their actual subclade. Colors within the mapped pie charts correspond to the subclade color designations. For explanation of stars and circles, see the legend to Fig. 3.
|
![]() View larger version (31K): [in a new window] |
FIG. 5. Worldwide distribution of F. tularensis subsp. holarctica subclades. The F. tularensis subsp. holarctica portion of the canSNP phylogeny from Fig. 3 is presented along with a map indicating the frequencies and geographic distribution of F. tularensis subsp. holarctica subclades throughout the world. The number of isolates (N) and number of distinct MLVA genotypes (G) within each subclade are indicated. Subclades are color coded to facilitate mapping. For the purposes of mapping and determining isolate and MLVA genotype totals, the strain belonging in the strain-specific subclade B.Br.LVS is included within the subclade immediately basal to its actual subclade, B.Br.013/014. Colors within the mapped pie charts correspond to the subclade color designations. Gray regions indicate the known distribution of F. tularensis, by country (19). For explanation of stars and circles, see the legend to Fig. 3.
|
|
View this table: [in a new window] |
TABLE 2. F. tularensis canSNP genotypesa
|
Geographic distribution of clades and subclades. Figures 4 and 5 depict the geographic distribution of each of the subclades. North America, Sweden, and France are well represented in this study, with scattered representatives from other locations. Despite these sampling biases, major genetic and geographic trends are still apparent.
The subclades identified within the A.I and A.II subpopulations (two and four, respectively) showed no additional geographic associations, just the previously described eastern North American/Californian and western North American mountain trends for the A.I and A.II subpopulations, respectively (Fig. 4) (11, 19, 29). The two subclades A.I.Br.SCHU S4 and A.I.Br.001/002 are found throughout the eastern United States and in California. Subclade A.I.Br.001/002 is also found in Alaska and British Columbia, Canada, on the basis of the limited samples available from these locations (Fig. 4; also see Table S1 in the supplemental material). The presence of both of these subclades in California and the eastern United States is consistent with previous hypotheses that F. tularensis subsp. tularensis may have been introduced to these areas from the south central United States, where both A.I subclades are also found (Fig. 4) (11). These hypotheses are based upon a center-of-diversity argument wherein the geographic region with the greatest observed genetic diversity (the south central United States) is assumed to be the location where the tested group (A.I) originated, diversified, and then dispersed from. Two Slovakian isolates, the only known F. tularensis subsp. tularensis isolates from outside North America (14), also fell in subclade A.I.Br.SCHU S4 (see Table S1 in the supplemental material). This is consistent with previous reports regarding the close resemblance of these isolates to strain SCHU S4 (6, 18).
Subclades within subpopulation A.II exhibited slightly stronger geographic associations than those within subpopulation A.I. This is consistent with previous findings of a positive correlation between genetic and geographic distance for this subpopulation (11). The basal subclade to the strain-specific subclade A.II.Br.ATCC 6223, A.II.Br.004/005, appears restricted to Utah, the origin of strain ATCC 6223 (Fig. 4; also see Table S1 in the supplemental material). Most of the subclade A.II.Br.003/004 isolates are from California, although single isolates from Nevada, Texas, and Ontario, Canada, also belong in this subclade (Fig. 4; also see Table S1 in the supplemental material). Finally, A.II.Br.001/002 and A.II.Br.006/007 (which includes subclade A.II.Br.WY96-3418) lack any strong geographic associations, being found throughout the geographic range of subpopulation A.II (Fig. 4).
We discovered 10 subclades within F. tularensis subsp. holarctica, most of which show geographic correlations (Fig. 5). The most basal F. tularensis subsp. holarctica subclade, B.Br.001/002, contains Japanese F. tularensis subsp. holarctica isolates. The next subclade, B.Br.002/003, contains two isolates from California and represents a previously unknown group within F. tularensis that is genetically distinct and diverged between the Japanese F. tularensis subsp. holarctica isolates and the terminal F. tularensis subsp. holarctica clade (Fig. 5; also see Table S1 in the supplemental material). The remaining F. tularensis subsp. holarctica subclades are closely related and likely reflect the radiation event (hereafter referred to as the B radiation) wherein F. tularensis subsp. holarctica dramatically spread throughout the northern hemisphere (Fig. 5). Subclade B.Br.FTNF002-00 corresponds to a previously described clone of F. tularensis subsp. holarctica found in France and the Iberian Peninsula (7). Interestingly, the subclade immediately basal to subclade B.Br.FTNF002-00, B.Br.010/011, contains the same Swedish isolate (F0228) (Fig. 5; also see Table S1 in the supplemental material) that was found to share RD23 with French and Iberian Peninsula isolates (7), making RD23 specific for branch B.Br.010 (Fig. 3). Subclades B.Br.012/013 and B.Br.013/014 (which includes subclade B.Br.LVS), marked by branch B.Br.012 (Fig. 3), are found in Sweden and Eurasia, respectively (Fig. 5). Subclades B.Br.008/009 and B.Br.OR96-0246, marked by branch B.Br.008 (Fig. 3), and subclade B.Br.OSU18 are found in North America (Fig. 5). Subclade B.Br.OSU18 is not specific to North America, however, because it contains seven Scandinavian isolates as well (Fig. 5; also see Table S1 in the supplemental material). Likewise, subclade B.Br.007/008, basal to subclades B.Br.008/009 and B.Br.OR96-0246, is found in Scandinavia (Fig. 5), making the three subclades identified by B.Br.007 (Fig. 3) nonexclusive for North America.
Many of these F. tularensis subsp. holarctica subclades correlate with groups previously identified using MLVA (see Table S1 in the supplemental material) (18). Specifically, B.Br.OSU18 corresponds to MLVA clade B.II; B.Br.012/013 and B.Br.LVS correspond to MLVA clade B.III; B.Br.007/008, B.Br.008/009, and B.Br.OR96-0246 (marked by branch B.Br.007) (Fig. 3) correspond to MLVA clade B.IV; B.Br.001/002 corresponds to MLVA clade B.V; and B.Br.010/011 and B.Br.FTNF002-00 (marked by branch B.Br.010) (Fig. 3) correspond to the Spain, France, and Sweden MLVA clade. Members of subclade B.Br.013/014 were found in two clades, B.I and B.III, in the previous MLVA (see Table S1 in the supplemental material) (18).
|
|
|---|
In proposing a North American origin for the B radiation, one caveat must be mentioned. Specifically, it must be noted that we lack a comprehensive set of isolates from Asia. Given the possibility of an Asian origin for F. tularensis subsp. holarctica, as evidenced by the basal positioning of the Japanese F. tularensis subsp. holarctica isolates and our limited isolate coverage for Asia, it is possible that we are missing considerable diversity from Asia. Analysis of additional Asian isolates could reveal additional intermediate lineages that could preclude a North American origin for the B radiation. However, our current data set points to a North American origin, and we will discuss potential transmission patterns for the B-radiation clade on the basis of this hypothesis.
A North American origin for the B radiation is supported by the North American source of the subclade (B.Br.002/003) immediately basal to the divergence of the radiation clade (Fig. 5). This suggests that the main F. tularensis subsp. holarctica lineage was present in North America prior to the emergence of the B-radiation clade. The "Californian" B.Br.002/003 subclade appears to be a rare subclade that diverged from the main F. tularensis subsp. holarctica lineage after the divergence of the Japanese F. tularensis subsp. holarctica lineage but before the divergence of the ancestor to the B-radiation clade (Fig. 3 and 5). The presence of this subclade in California (Fig. 5) suggests that the main F. tularensis subsp. holarctica lineage was present in North America prior to the emergence of the B-radiation clade.
A North American origin for the B radiation is further supported by the likely North American source of the first subclade to diverge within the main B radiation. B.Br.OSU18, the first subclade to diverge within the main B radiation, contains both North American and Scandinavian isolates (Fig. 5; also see Table S1 in the supplemental material), but the basal B.Br.002/003 subclade argues for a North American rather than a Scandinavian emergence. In addition, the B.Br.OSU18 subclade is relatively rare in Scandinavia compared to other subclades (Fig. 5), and the relative lack of MLVA diversity in Scandinavian B.Br.OSU18 isolates compared to that in North American B.Br.OSU18 isolates suggests a recent introduction into the Old World. There were four MLVA genotypes among the seven Scandinavian B.Br.OSU18 isolates that differed only at the extremely diverse VNTR locus Ft-M03 (data not shown), a locus that has previously been shown to be diverse among even related outbreak isolates (18). This is in contrast to the North American B.Br.OSU18 isolates, which exhibited 22 MLVA genotypes among 45 unrelated isolates and differed at loci other than Ft-M03 (data not shown), indicating a higher level of diversity consistent with an older population. These frequency and diversity results are not likely due to sampling bias, given the relatively large sample sets analyzed from both North America and Sweden. Additional SNP discovery using one of these Scandinavian isolates and one or more North American B.Br.OSU18 isolates would confirm a North American origin for this subclade if subsequent SNP screening placed the North American B.Br.OSU18 isolates basally to the Scandinavian B.Br.OSU18 isolates. This, in turn, would support our hypothesis of a North American origin for the B-radiation clade.
Following a North American origin, the B-radiation clade would have to have spread and diversified throughout North America and been introduced to Eurasia, where it also must have spread, resulting in the current geographic distribution of F. tularensis subsp. holarctica. This clade may have spread throughout North America very rapidly, which would explain the wide distribution of subclade B.Br.OSU18 throughout North America (Fig. 5). However, the relatively large amount of MLVA diversity within this subclade (Fig. 5) suggests that additional geographic patterns likely exist. Additional whole-genome sequencing and SNP discovery efforts within subclade B.Br.OSU18 would likely reveal additional subclades. Mapping the geographic distributions of these novel subclades could provide additional insight into the spread of the B-radiation clade in North America and confirm or refute a rapid spread. A widespread basal subclade with geographically isolated distal subclades would indicate a rapid spread followed by local differentiation.
An introduction to Eurasia would have to have occurred and most likely is marked by branch B.Br.005 (Fig. 3), as nearly all of the Eurasian isolates are contained in the subsequent subclades (Fig. 5). Following its introduction to Eurasia, the B-radiation clade could have diverged and spread, giving rise to subclades B.Br.007/008, B.Br.010/011, B.Br.FTNF002-00, B.Br.012/013, and B.Br.013/014 (Fig. 3 and 5). It is difficult to determine the exact distributions of these subclades due to the limited samples available from much of Europe and Asia. However, some patterns can be identified. Subclade B.Br.FTNF002-00 clearly represents a previously described clone (7) that was introduced to and spread throughout France and the Iberian Peninsula (Fig. 5). The lack of MLVA diversity among our expanded set of French B.Br.FTNF002-00 isolates (Fig. 5; also see Table S1 in the supplemental material) corroborates previous findings (7) and indicates that the spread of this clade throughout France and the Iberian Peninsula was likely a very recent event. The relative nearest to this clade, isolate F0228, is the only isolate within subclade B.Br.010/011 and may indicate a Scandinavian origin for the clade found in France and the Iberian Peninsula (Fig. 5; also see Table S1 in the supplemental material). Indeed, Scandinavia may be the origin point for many of the Eurasian subclades, as indicated by the canSNP diversity that we observed among the Scandinavian isolates. Subclades B.Br.007/008, B.Br.010/011, and B.Br.012/013 are exclusively found in Scandinavia, and subclade B.Br.013/014 is also present there (Fig. 5). This could easily be the result of sample bias, given the limited samples available from other European and Asian countries, but it may also suggest a Scandinavian origin for Eurasian F. tularensis subsp. holarctica. An expanded analysis of additional Central European and Asian isolates and additional SNP discovery could provide further insight into these possibilities. The presence of subclade B.Br.007/008, B.Br.010/011, or B.Br.012/013 in other locations could weaken the hypothesis of a Scandinavian origin for Eurasian F. tularensis subsp. holarctica and might suggest alternative scenarios for the spread of F. tularensis subsp. holarctica throughout Eurasia. Additional SNP discovery within subclade B.Br.013/014, the one subclade known to be present in Central Europe and Asia (Fig. 5), could also provide additional insight. Given the significant amount of MLVA diversity within subclade B.Br.013/014 (Fig. 5), the discovery of several additional canSNP groups is likely and could easily reveal additional geographic patterns and clues as to the spread of F. tularensis subsp. holarctica throughout Eurasia.
Additional, subsequent dispersals to both North America and Eurasia are required for a full explanation of the current distribution of F. tularensis subsp. holarctica. An additional introduction from North America to Scandinavia may have occurred, as evidenced by the seven Scandinavian isolates found in subclade B.Br.OSU18 (Fig. 5; also see Table S1 in the supplemental material). As discussed previously, these isolates likely reflect an introduction from North America to Scandinavia, rather than vice versa, given the small percentage of Scandinavian isolates belonging in this subclade (Fig. 5) and the greater MLVA diversity found among the North American B.Br.OSU18 isolates. Finally, there may have been a reintroduction event into North America from Eurasia, as evidenced by subclades B.Br.008/009 and B.Br.OR96-0246, which are the only North American groups in the Eurasian clade marked by branch B.Br.005 (Fig. 3 and 5). This reintroduction event is likely marked by branch B.Br.008, since it contains both subclade B.Br.008/009 and subclade B.Br.OR96-0246 (Fig. 3).
In summary, the clades, subclades, and lineages described in this study represent stable molecular groups that can be used to further classify F. tularensis global populations. A set of 23 canSNPs clearly defines four subspecies, two F. tularensis subsp. tularensis subpopulations, and 18 new synapomorphic subclades. As demonstrated here for F. tularensis subsp. holarctica, these groups provide an excellent means of tracking the origins and spread of this pathogen on a global scale. Discriminating among isolates within each of these groups for further epidemiological tracking requires the use of a higher-resolution molecular subtyping method, such as MLVA (18, 33). Combining these SNP and MLVA markers in progressive hierarchical resolving assays using nucleic acids, the approach suggested by Keim et al. (20), will provide highly accurate and highly discriminating phylogenetic analyses for F. tularensis where deeper phylogenetic relationships are defined by canSNP markers and strain discrimination is provided by MLVA. Subclades containing relatively high levels of MLVA diversity, such as B.Br.OSU18 and B.Br.013/014 (Fig. 5), indicate excellent isolate sets from which to select candidates for future whole-genome sequencing and SNP discovery. Sequencing isolates from these groups will provide the largest amount of new information useful for further elucidating the deeper phylogenetic structure of F. tularensis.
Note that the use of products/names does not constitute endorsement by the Department of Homeland Security of the United States.
We thank Sumathi Venkatapathy for technical assistance with the MIP assay.
Published ahead of print on 27 February 2009. ![]()
Supplemental material for this article may be found at http://jb.asm.org/. ![]()
Present address: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT. ![]()
|
|
|---|
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2009 by the American Society for Microbiology. For an alternate route to Journals.ASM.org, visit: http://intl-journals.asm.org | More Info»