Journal of Bacteriology, March 2005, p. 1783-1791, Vol. 187, No. 5
0021-9193/05/$08.00+0 doi:10.1128/JB.187.5.1783-1791.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Microbial Evolution Laboratory, National Food Safety and Toxicology Center, Michigan State University, East Lansing, Michigan1
Received 4 October 2004/ Accepted 30 November 2004
140 times greater than divergence at the nucleotide sequence level. A >100-kb region around the O-antigen gene cluster contained highly divergent sequences and also appears to be duplicated in its entirety in one lineage, suggesting that the whole region was cotransferred in the antigenic shift from O55 to O157. The β-glucuronidase-positive O157 variants, although phylogenetically closest to the Sakai strain, were divergent for multiple adherence factors. These observations suggest that, in addition to gains and losses of phage elements, O157:H7 genomes are rapidly diverging and radiating into new niches as the pathogen disseminates.
Evolutionary analysis has shown that O157:H7 strains are genetically most closely related to enteropathogenic E. coli O55:H7 strains (40) and has engendered a model (7) specifying that O157:H7 evolved through a series of transitional steps from a nontoxigenic progenitor (Fig. 1A). The model predicts that the most recent common ancestor (A1 in Fig. 1) of today's O157:H7 and O55:H7 strains contained the LEE and presumably could elicit diarrhea via an attachment-effacement mechanism. In addition, the ancestor resembled wild-type E. coli in its abilities to ferment sorbitol (SOR+) and express β-glucuronidase (GUD+). In a first step towards O157:H7, A1 acquired Stx2, presumably through transduction, resulting in a Stx2-positive O55:H7 (A2 in Fig. 1). In the next step, the large virulence plasmid (pO157) was gained and the somatic antigen switched from O55 to O157. From this stage (A3), two separate lines evolved. One branch lost motility by mutation in the flagellar operon (23), resulting in the SOR+ O157 (also called SF O157) clone discovered in hemolytic uremic syndrome cases in Germany (15, 17) and hereafter referred to as the German clone (A4). The other branch, from which the GUD+ O157 strains descended, lost the ability to ferment sorbitol and gained Stx1 (A5). Subsequently, mutational inactivation of the uidA gene (24) resulted in the non-sorbitol-fermenting, β-glucuronidase-negative phenotype typical of E. coli O157:H7 (A6). It was the clonal descendants of A6 that expanded and spread geographically and that now account for most disease caused by EHEC (19).
FIG. 1. Evolutionary genomic changes in the emergence of E. coli O157:H7. (A) Stepwise model for the evolution of E. coli O157:H7 from an enteropathogenic E. coli-like ancestor (modified from reference 7 with permission of the publisher). (B) Maximum-parsimony tree (21) based on the presence and absence of genes inferred by microarray hybridizations for 5,121 genes. Boxes indicate the reference genomes. The number of events for each branch is based on the assumption that gene gains and losses occur independently. Triangles mark occurrences of seven SNPs (gray, synonymous; black, nonsynonymous) found by sequencing 7,470 bp in 15 conserved genes. Note that the inferred topology based on gene content is identical to the model in panel A.
TABLE 1. Pathogenic E. coli strains used in this study
80%). We excluded probes that showed potential for nonspecific hybridizations (hits of 37 to 39 bp of overall identity or hits of 25 to 36 bp of overall identity including a stretch of 15 or more consecutive base pairs with 100% identity) or multiple target hybridizations with K-12 or Sakai DNA (because these two strains were used as reference genomes in this study). With respect to the K-12 and Sakai genomes, out of the 6,176 probes, 14 had no target (EDL-933 specific), 38 had a potential for nonspecific hybridization, 433 had multiple targets (with 365 of these belonging to phage or phage-like elements), and 5,691 matched single genome targets. Of these 5,691 probes, 3,963 target both genomes, 1,257 target only Sakai, and 471 target only K-12. All probes were assigned ORF designations (b-, ecs-, or z- numbers) or intergenic region labels based on the RefSeq database available on the National Center for Biotechnology Information website (38). Of these 5,691 probes, 5,353 target the same genomes (Sakai, K-12, or both) as given in the original annotation by MWG Biotech, and 338 show differences.

DNA labeling. Genomic DNA was sheared into 500- to 5,000-bp fragments in a cup sonicator (Heat Systems Ultrasonics W-225; 20 kHz, 200 W). A total of 250 ng of sheared DNA was aminoallyl-dUTP (Sigma, St. Louis, Mo.) labeled with the Invitrogen (Carlsbad, Calif.) DNA labeling system, using a modified 25× deoxynucleoside triphosphate mix consisting of 12.5 mM (each) dATP, dGTP, and dCTP; 2.1 mM dTTP; and 10.4 mM aminoallyl-dUTP. The DNA was purified with Qiagen (Valencia, Calif.) PCR purification columns, using modified amine-free wash (5 mM KPO4, pH 8.0, 80% ethanol) and elution (5 mM KPO4 [pH 8.0]) buffers.
The aminoallyl-labeled DNA was dried down in a vacuum centrifuge, suspended in 4.5 µl of 0.1 M Na2CO3 (pH 9.3), Cy coupled by the addition of 4.5 µl of Cy3 or Cy5 dye in dimethyl sulfoxide (1/16 of one vial of Mono-reactive Cy dye [Amersham, Piscataway, N.J.]), and incubated for 1 h at room temperature in the dark. Cy-labeled DNA was purified with Qiagen PCR purification columns. DNA and dye concentrations were determined with a spectrophotometer (NanoDrop Technologies, Rockland, Del.), and labeled DNA was dried down by vacuum centrifugation.
Microarray hybridizations and data processing. Equal amounts of DNAs from strains to be compared, labeled with different Cy dyes, were suspended and combined in a final volume of 35 µl of formamide-based hybridization buffer (MWG Biotech). MWG E. coli O157 arrays were hybridized and washed according to the manufacturer's instructions for hybridization with coverslips. Lifter slips (22x40I-2-4710) from Erie Scientific Company (Portsmouth, N.H.) were used. The used arrays were stripped and reused once. For stripping, the arrays were washed two or three times for 5 min each in 90°C H2O and twice for 10 s each at room temperature in H2O and then were dried by centrifugation (3 min at 500 x g). Test strains were hybridized twice with Sakai as a reference: once on a new chip with the test strain Cy3 labeled and Sakai Cy5 labeled and once on a used chip with the test strain Cy5 labeled and Sakai Cy3 labeled.
Arrays were scanned with a Genepix 4000B instrument (Axon Instruments, Union City, Calif.), and probe intensities (median pixel intensities) were retrieved with Genepix 3.0 software (Axon Instruments). The data quality and the normalization effects were assessed by viewing plots of M versus A [M = log2 (test/reference); A = log2 (test × reference)/2] and by checking for spatial effects with GeneTraffic (Iobion, La Jolla, Calif.) and MAANOVA (41) software. Arrays were generally normalized by global LOWESS normalization, unless they showed spatial bias (in which case subgrid LOWESS normalization was used) or unless normalization skewed the data in the plot of M versus A (in which case raw values were used).
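The M and A quantities defined above can be computed per probe as in the following minimal sketch. The function name and the example intensities are illustrative assumptions, not values from this study, and LOWESS normalization is not shown.

```python
import math


def ma_values(test, reference):
    """Return (M, A) for one probe from its two channel intensities.

    M = log2(test / reference) and A = log2(test * reference) / 2,
    following the definitions given in the text. Both inputs must be > 0.
    """
    if test <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    m = math.log2(test / reference)
    a = math.log2(test * reference) / 2
    return m, a


# Hypothetical median pixel intensities (test, reference); not study data.
for test, ref in [(256.0, 256.0), (1024.0, 256.0), (64.0, 1024.0)]:
    m, a = ma_values(test, ref)
    print(f"test={test:7.1f} ref={ref:7.1f} M={m:+.2f} A={a:.2f}")
```

Probes with equal signal in both channels fall at M = 0, so strain-specific probes appear as points far above or below that line.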
Data analysis. Data points were filtered out before further analysis if probes either showed printing abnormalities or exhibited a low signal in hybridization with the Sakai reference strain. After filtering and normalization, hybridization data were analyzed as the distribution of the two-color signal ratios by using GACK (Genomotyping Analysis by Charlie Kim) (18). The GACK program uses the shape of the log2 distributions to locate signal ratio cutoffs for classifying genes as present in a genome or absent. For each array, analyses of the log2 (test strain/reference strain) distribution (GACK1) as well as of the reciprocal ratio, log2 (reference strain/test strain) (GACK2), were performed. The GACK2 value provides information about probes without targets in the reference strain and might also detect duplicated targets. We classified genes with a GACK1 value of <0.4 as absent and those with a GACK1 value of ≥0.4 as present. Genes with a GACK2 value of <0.4 were classified as duplications (see also "Chip validation" below). For probes without targets in the reference strain, GACK2 values of <0.4 indicate presence in the test strain if the signal exceeds the low-intensity cutoff. Otherwise these genes were also classified as absent in the test strain.
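The classification rules above can be summarized in a short sketch. It does not reimplement GACK; it only applies the stated cutoff to GACK1 and GACK2 values that are assumed to have been computed already. The function name, argument names, and the order in which the rules are checked are assumptions, as is the reading that values at or above the cutoff are called present. The 0/1/2 coding follows the trinary scheme used for the dye-swap results.

```python
CUTOFF = 0.4  # GACK cutoff stated in the text

ABSENT, PRESENT, DUPLICATED = 0, 1, 2  # trinary coding


def classify_probe(gack1, gack2, has_reference_target=True,
                   above_low_intensity=True):
    """Classify one probe from precomputed GACK1/GACK2 values (hypothetical helper)."""
    if has_reference_target:
        if gack1 < CUTOFF:
            return ABSENT
        if gack2 < CUTOFF:
            return DUPLICATED
        return PRESENT
    # Probe without a target in the reference strain: a low GACK2 value
    # indicates presence only if the signal passes the low-intensity cutoff.
    if gack2 < CUTOFF and above_low_intensity:
        return PRESENT
    return ABSENT


# Hypothetical values, for illustration only.
print(classify_probe(0.1, 0.9))   # absent
print(classify_probe(0.9, 0.9))   # present
print(classify_probe(0.9, 0.1))   # duplicated
```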
For probes without targets in the reference strain, an additional indirect analysis was done. The log2 (test strain/K-12) values were calculated from the test strain-Sakai and the K-12-Sakai hybridizations as follows: log2 (test/K-12) = log2 (test/Sakai) − log2 (K-12/Sakai). GACK analysis was done with these log2 (test/K-12) values.
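The indirect ratio is a difference of two log ratios, since log2(a/c) − log2(b/c) = log2(a/b). The sketch below illustrates that identity with made-up intensities; the function name is an assumption. In practice the two log ratios come from separate arrays, so the result is only an estimate of a direct test-versus-K-12 comparison.

```python
import math


def indirect_log2_ratio(log2_test_vs_sakai, log2_k12_vs_sakai):
    """Estimate log2(test/K-12) from two hybridizations that share Sakai as reference."""
    return log2_test_vs_sakai - log2_k12_vs_sakai


# Hypothetical intensities with one shared Sakai value, so the identity is exact.
test, k12, sakai = 800.0, 200.0, 400.0
indirect = indirect_log2_ratio(math.log2(test / sakai), math.log2(k12 / sakai))
direct = math.log2(test / k12)
print(indirect, direct)
```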
Multilocus sequence analysis. The nucleotide sequences of internal fragments of multiple housekeeping genes were determined as described previously (12, 32). The 15 loci were arcA, aroE, aspC, clpX, cyaA, dnaG, fadD, grpE, icdA, lysP, mdh, mtlD, mutS, rpoS, and uidA. The sequencing protocols are available on the STEC website (http://www.shigatox.net/mlst).
Phylogenetic analysis. A phylogeny for the genomes was inferred by parsimony analysis of the presence or absence of genes by using the PAUP (34) program (version 4.0b10). The presence or absence of individual genes was coded as 0 (absence) or 1 (presence) in binary characters. Parsimony analysis was based on the subset of genes that were phylogenetically informative, using the ordinary parsimony algorithm with all steps counted and random sequence addition. Character states were unordered and given equal weight. All genes that were variably absent or present were included in a second parsimony analysis with MEGA (21) to infer the total number of gene gains and losses in genome divergence.
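As an illustration of the binary coding, the sketch below selects phylogenetically informative characters from a presence/absence matrix. It assumes the standard definition for binary characters (each state occurs in at least two genomes); the text does not state how the informative subset was chosen, and the gene names and calls are hypothetical.

```python
def is_parsimony_informative(states):
    """True if both states (0 = absent, 1 = present) occur in at least two genomes."""
    return states.count(0) >= 2 and states.count(1) >= 2


# Hypothetical matrix: gene -> calls across six genomes.
matrix = {
    "geneA": [1, 1, 1, 1, 1, 1],  # constant, uninformative
    "geneB": [1, 1, 1, 0, 0, 0],  # informative
    "geneC": [1, 0, 0, 0, 0, 0],  # present in one genome only, uninformative
    "geneD": [1, 1, 0, 0, 1, 1],  # informative
}

informative = [gene for gene, states in matrix.items()
               if is_parsimony_informative(states)]
print(informative)
```

Constant and single-genome characters do not favor one tree topology over another under parsimony, although they still count as gain or loss events when all variable genes are tallied.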
We compared the in silico analysis described above with the performance of actual two-color hybridizations with Cy5-labeled Sakai DNA and Cy3-labeled K-12 DNA. The log intensity ratio (M) plotted against the mean log intensity (A) shows outstanding separation of the strain-specific probes (Fig. 2). We adjusted the cutoff values in separate GACK analyses, with Sakai [log2 (K-12/Sakai)] and K-12 [log2 (Sakai/K-12)] as references, to classify genes as present or absent. This analysis used only probes with identical targets in both strains or with targets in only one strain (Fig. 2). A GACK cutoff of 0.4 (Fig. 2) gave <0.6% false negatives (i.e., genes known to be present but classified by the GACK value as absent) and a maximum sensitivity and specificity (1) in distinguishing genes that are present from those that are absent (Table 2). Targets with sequence differences between the genomes are more difficult to classify, because their signals overlap with both present and absent targets (Fig. 2). The sequence similarity within these 269 divergent 50-mer probes ranges from 84 to 98%. Most targets with 96 to 98% similarity are called present, and we estimate that about 50% of targets with 94% similarity are called divergent at a GACK cutoff of 0.4 (see Fig. S6 in the supplemental material).
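A sketch of how sensitivity and specificity at a given cutoff could be tabulated from probes whose true status is known from the in silico analysis. The function name, the input values, and the convention that a probe is called present when its value is at or above the cutoff are assumptions for illustration; the numbers are not data from Table 2.

```python
def sensitivity_specificity(values, truly_present, cutoff):
    """Return (sensitivity, specificity) for calls of present at value >= cutoff."""
    tp = fn = tn = fp = 0
    for value, present in zip(values, truly_present):
        called_present = value >= cutoff
        if present and called_present:
            tp += 1
        elif present:
            fn += 1  # false negative: known present, called absent
        elif called_present:
            fp += 1  # false positive: known absent, called present
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one present and one absent probe")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical GACK values and known status for eight probes.
values = [0.9, 0.8, 0.45, 0.3, 0.1, 0.05, 0.5, 0.2]
truth = [True, True, True, True, False, False, False, False]
for cutoff in (0.2, 0.4, 0.6):
    print(cutoff, sensitivity_specificity(values, truth, cutoff))
```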
FIG. 2. Plot of M versus A for two-color hybridization of Sakai and K-12 genomic DNAs for 5,667 probes (24 of the 5,691 single-copy probes were excluded because of poor signals). Probes are placed into five groups (color coded) based on in silico analysis of single-copy targets in the Sakai and K-12 genomes. Gray, identical target sequences in both genomes (n = 3,684); blue, Sakai-only targets (n = 1,246); green, K-12-only targets (n = 468); red, Sakai-like targets (n = 145); yellow, K-12-like targets (n = 124). In the last two groups, homologous target sequences occur in both genomes but have diverged in sequence from 1 to 8 bp. Lines show the cutoffs (GACK value of 0.4) used for calls of present or absent. Targets between the lines are classified by hybridization as present in both strains; others are classified as present in one strain only.
TABLE 2. Sensitivity and specificity analysis for different GACK cutoff values
Gene content in the stepwise evolution of O157:H7. Genomic DNAs from nine strains representing different stages in the stepwise evolution model of O157:H7 (Fig. 1A) were labeled and hybridized to the oligoarray. All strains were tested twice with a dye swap, using Sakai genomic DNA for a reference. GACK values were calculated to classify genes as present or absent (cutoff = 0.4). From the nine experiments, the normalized log2 signal ratios (mean ± standard deviation [SD]) corresponding to the GACK value of 0.4 were 0.5 ± 0.04 (GACK1) and 0.48 ± 0.08 (GACK2) for the new arrays and 0.61 ± 0.12 (for both GACK1 and GACK2) for the used arrays. The reproducibilities for the nine dye-swap experiments were 99.0% ± 0.6% (mean ± SD) with binary classification (0 = absent; 1 = present) and 98.1% ± 1.1% (mean ± SD) with trinary classification (0 = absent; 1 = present; 2 = duplicated). Out of 5,691 probes, 5,121 (90%) gave perfectly consistent results, with identical binary classification in the replicate experiments throughout all experiments. This set of high-quality data for all strains was used in the subsequent analysis. Final data sets, including trinary scores, are available in Table S5 in the supplemental material.
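The reproducibility figures and the consistent probe set can be thought of as in the following sketch, which compares the calls from the two dye-swap replicates of each strain. The function names and the small example are hypothetical; the study's actual pipeline is not described at this level of detail.

```python
def replicate_agreement(calls_a, calls_b):
    """Fraction of probes given the same call in two replicate hybridizations."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("replicates must be non-empty and of equal length")
    same = sum(1 for a, b in zip(calls_a, calls_b) if a == b)
    return same / len(calls_a)


def consistent_probes(experiments):
    """Indices of probes whose two replicate calls agree in every strain.

    `experiments` is a list of (calls_new_array, calls_used_array) pairs,
    one pair per test strain.
    """
    n_probes = len(experiments[0][0])
    return [i for i in range(n_probes)
            if all(new[i] == used[i] for new, used in experiments)]


# Hypothetical binary calls (0 = absent, 1 = present) for two strains, five probes.
experiments = [
    ([1, 1, 0, 1, 0], [1, 1, 0, 0, 0]),
    ([1, 0, 0, 1, 1], [1, 0, 1, 1, 1]),
]
print([replicate_agreement(a, b) for a, b in experiments])
print(consistent_probes(experiments))
```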
A total of 4,230 (82.6%) of the 5,121 genes were present in all nine tested strains, and 311 (6.1%) were K-12 specific and absent in the nine strains (Table 3). The remaining 580 genes (11.3%) were present in some strains and clearly absent or highly divergent in others. We refer to these variably absent or present genes as VAP genes. The majority of the VAP probes targeted prophage or phage-like genes. This was particularly striking for the Sakai-specific regions, where >40% of all 1,084 Sakai-specific probes were VAP prophage and phage-like elements. In contrast, there are only 26 VAP elements (0.7%) in the 3,643 probes of the Sakai-K-12 backbone (Table 3). The percentage of Sakai-specific genes that are VAP genes is an order of magnitude greater (7.4%) than that for backbone genes (0.7%) but is 1/10 of that for phage-related Sakai genes (74.1%).
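The percentages quoted above follow directly from the counts in the text, as this short check shows. The helper name and labels are illustrative; only counts stated in the paragraph are used.

```python
def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


TOTAL_GENES = 5121
counts = {
    "present in all nine strains": 4230,
    "K-12 specific, absent in all nine strains": 311,
    "variably absent or present (VAP)": 580,
}

# The three categories partition the consistent probe set.
assert sum(counts.values()) == TOTAL_GENES

for label, count in counts.items():
    print(f"{label}: {count} ({percent(count, TOTAL_GENES)}%)")

# VAP elements among the backbone probes shared by Sakai and K-12.
print(f"backbone VAP: {percent(26, 3643)}%")
```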
TABLE 3. Numbers of backbone and phage-related genes that are conserved (present in all) or variably absent or present among recent relatives of E. coli O157:H7
SNPs. We assessed the level of sequence divergence by comparative sequencing of approximately 500 bp within each of 15 housekeeping genes (7,470 bp total), which uncovered seven single-nucleotide polymorphisms (SNPs) among the 11 genomes (K-12 excluded) (Table 4). Four of the seven SNPs are synonymous mutations, and three predict amino acid replacements. The comparison of the SNP distributions among strains (Table 4) and the stepwise model allows one to infer the step in which the point mutation occurred (Fig. 1B). For example, the mutation A776G in uidA (SNP-1) was inferred to have occurred between A3 and A5 previously by Monday et al. (24). In this way, mutations underlying all seven SNPs can be placed on the genome phylogeny as single events without any homoplasies (i.e., multiple origins, reversals, or losses).
TABLE 4. SNPs among E. coli O157:H7 and closely related strains
140 times that for point mutations in housekeeping genes.

Islands. To investigate the genomic distribution of the presence or absence of polymorphisms, we plotted the binary data for 5,121 genes against map position in the Sakai genome (Fig. 3). Stretches of K-12-specific DNA were reduced for illustration (Fig. 3). Most (27 out of 33) of the larger (>5-kb) O islands identified in the comparison of the EDL-933 and K-12 genomes are conserved in gene content among the recent relatives of E. coli O157:H7. There are 27 variable regions where gain or loss of two or more adjacent genes has occurred (Fig. 3). Of the 18 Sakai prophages (Sp), only Sp16 (OI-102) was conserved in all strains, but only two of the six Sakai phage-like elements (SpLE), SpLE1 (TAI, OI-43/48) and SpLE5 (OI-172), were variable.
FIG. 3. Gene contents of strains from the stepwise evolution model for the emergence of E. coli O157:H7. Genes are ordered by their position in the Sakai genome from left to right and top to bottom. Genes are color coded as follows: black, backbone; blue, Sakai specific; green, K-12 specific. Rows represent 12 genomes in the phylogenetic order as shown in Fig. 1B: 1, Sakai; 2, 93-111; 3, EDL-933; 4, 86-24; 5, ST-530; 6, G5101; 7, CB2755; 8, 493/89; 9, 5905; 10, TB182A; 11, DEC 5d; 12, K-12. O islands are labeled by number below the rows, with those implicated in virulence highlighted in yellow. Known islands or putative functions are labeled above the rows (Fe, iron transport; Fim, fimbrial operon; FA syn, fatty acid synthesis; Rib, ribose transport; Sor, sorbose transport; Suc, sucrose transport). Arrowheads in Sp9 and Sp11/12 mark the sites of the large inversion in EDL-933 compared to Sakai.
Variable regions of K-12-specific genes, inferred both by the direct and indirect methods (see Materials and Methods), identified nine regions (Fig. 3, with the b- number given for first gene in region), four of which are phage related and five of which have other putative functions. None of these regions were found in typical O157:H7 strains. Three regions were detected in all strains except the typical E. coli O157:H7; these are ORFs with unknown functions (b1160 to -72), genes for a putative sensor-type regulator and a putative adherence and penetration protein (b1201 to -2), and genes for starvation-sensing proteins (b1580 to -81). The other six regions found in some genomes contained secretion pathway and glycolate metabolism genes (b2968 to -86), fimbrial protein genes (b2106, b2108, b2111, and b2112), genes for cold shock-like proteins (b1550 to -68 on qin), prophage P2 proteins (b2082 to -84), and genes with unknown function (b1145 to -48 on e14 and b2356 to -61 on KpLE1).
LEE and Shiga toxins. The expression of both the LEE (SpLE4, OI-148) and the Shiga toxin genes contributes to the full virulence of EHEC strains. In agreement with previous results, the LEE has been found complete and intact in the E. coli O55:H7 strains and other close relatives of E. coli O157:H7. However, we found that nleA, which is located outside the LEE on Sp9, is absent in the O55:H7 genomes but present in the EHEC O157 strains. NleA is an effector secreted by the LEE-encoded type III secretion system and plays a key role in virulence of Citrobacter rodentium in a mouse infection model (10). EspFu, encoded on Sp14, is a second effector secreted by EHEC that is not encoded by the LEE and has recently been shown to be essential for pedestal formation together with the LEE (3). We found the EspFu gene to be present in all of the close relatives of EHEC O157 examined here, suggesting that it was present in the progenitor of E. coli O55:H7 and EHEC O157:H7.
Microarray hybridizations correctly identified the Stx1 and Stx2 genes in toxin-producing strains. However, the presence of the toxin genes does not necessarily mean that the corresponding Stx-carrying Sakai phages (Sp5 for Stx2 and Sp15 for Stx1) have been conserved. In fact, all of the Stx1- or Stx2-producing strains before the proposed emergence of A6 (Fig. 1) are divergent in most phage genes found in Sp15 or Sp5 (Fig. 3). Both Sp5 (OI-45) and Sp15 (OI-79/93) were completed as recently as in the step leading to A6. The entire phage (all Sp5 genes) is missing in the Stx2-negative O55:H7 strains (DEC 5d and TB182A), whereas in the five Stx2-positive strains, only 12 to 23% of Sp5 ORFs are found. Even among the typical O157:H7 strains EDL-933 and 86-24, there are stretches of >20 ORFs missing (Fig. 3, Sp5).
Strains lacking Stx1 contain several genes of the Sp15 phage. Sp15 in Sakai corresponds to OI-93, which is the Stx1 island, and parts of OI-79, which is located 200 ORFs away from OI-93 in the EDL-933 genome. Interestingly, all strains except O55:H7 DEC 5d and O157:H7 86-24 contain the genes common to Sp15 and OI-79 (Fig. 3, left half of Sp15) but lack most of the ORFs common to Sp15 and OI-93 (Fig. 3, right half of Sp15). Even the Stx1-positive strains G5101 and ST-530 have only about 30% of the OI-93 genes.
O-antigen region. As expected, the complex locus specifying O157 lipopolysaccharide (OI-84) was absent in the three O55 strains and present in all O157 strains (Fig. 3 and 4). Previous sequence analysis of the gnd locus demonstrated an allelic difference between the O55:H7 and O157:H7 strains that led to the conclusion that gnd cotransferred during the antigenic shift (36). Wang et al. (39) identified a recombination site far downstream based on sequence differences in the his operon between strains O55 TB182A and Sakai. Interestingly, we found the same pattern of divergence between O55 and O157 strains for targets scattered over a stretch of about 100 ORFs between SpLE2 and Sp15 (ecs2819 to -2925) (Fig. 4). In this region, it appears that the K-12-specific fimbrial genes (b2106, b2108, b2111, and b2112) in the ancestral O55 strains were replaced by type 1 fimbrial genes (Fig. 4) in immediate O157 ancestors. In the O157 strain G5101, 103 of the 127 targets between ecs2813 and ecs2937 were scored as duplications in replicate hybridization experiments (Fig. 4), suggesting that the whole region is duplicated in strain G5101. Together, these observations suggest that the entire 140-kbp segment, including three regions encoding surface properties (LPS, colanic acid biosynthesis, and type 1 fimbrial genes), was cotransferred horizontally and recombined in the O55-to-O157 antigenic shift.
FIG. 4. Gene content in the region between SpLE2 and Stx1 prophage (ecs2806 to -2937). Genes are ordered by their position in the Sakai genome. Colors: white, absent; light red, present; dark red, duplicated; light blue, found present in one but absent in the other of two replicates (i.e., high probability of sequence divergence); dark blue, found duplicated in one of two replicates; gray, flagged spot in at least one of the replicates. Note that in addition to the genes determining the O antigens (rfb), several other genes in this region show a distinct pattern between O157 and O55 strains.
TAI, the tellurite resistance and adherence-conferring island (SpLE1), appears in O157 genomes after A3 and is duplicated in EDL-933 (OI-43 and OI-48). Interestingly, O55 strain 5905 contains about half of the island (ecs1359 to -1409), which could be a remnant or, most likely, was acquired independently. This part of the island contains the Iha adhesin but lacks the tellurite resistance and urease gene clusters. The GUD+ strains are also highly divergent in two regions of the TAI island (ecs1306 to -1313 and ecs1384 to -1396), which encode an AIDA-1 adhesin-like protein (ecs1396) and a putative complement resistance protein (ecs1312).
The complete OI-115 island, a putative type III secretion system (ETT2), was found in all strains except the most ancestral O55:H7 strains (DEC 5d and TB182A). In these two strains at the base of the phylogeny, only about 25% of the island is present (ecs3731 to -3736), and the rest of the island (ecs3716 to -3730) and also a part of the backbone (ecs3709 to -3715) are missing. This pattern of loss suggests that the whole island was present in the common ancestor (A1 in Fig. 1) but has eroded, as part of it, together with some backbone genes, was deleted in the lineage leading to DEC 5d and TB182A.
In addition to the factors on the LEE island and the Stx phages, 28 putative adhesin and toxin genes are found on the Sakai chromosome (11, 30, 31). Eleven of these are located on the seven large islands discussed above. The other 17 are conserved in all strains, with the exception of an intimin-like ORF on the OI-173 (ecs5290) and two other putative adhesin genes (ecs0350 on OI-14 and ecs2776) that are divergent in the GUD+ O157 strains. Finally, 12 of the 14 loci for fimbrial biosynthesis (11) were found in all strains. One fimbrial locus showed the same pattern of distribution as the O-antigenic region (see above), and one locus was present in all except TB182A (ecs2112 to -2113). It has been shown recently that the latter fimbrial locus (ecs2113) is important in colonization of calves (5); however, its role in human disease is unknown. All strains also contain the six iron uptake systems found in Sakai and are missing the fec transport system found in K-12.
GUD+ O157 strains. The genomes of the GUD+ O157 strains showed several stretches of divergence scattered all over the chromosome. As noted above, these O157 strains are highly divergent in islands encoding five putative adherence factors and, in addition, have lost or are highly divergent in at least 10 more genomic regions, many of which contain genes of unknown function. One lost region specifies a putative complement resistance protein, TraT, and two others contain genes for resistance against acridine (ecs4134 to -4139) and methylviologen (ecs1611 to -1619). Clearly, none of the deleted regions are absolutely required for virulence of O157 strains.
94% sequence identity are scored as absent. A comparable GACK analysis of a PCR-based Helicobacter pylori array assigned 50% of genes with 89% identity as divergent (18). The majority of PCR products on the H. pylori array had 93 to 97% sequence identity to the test strain (18), whereas the majority of probes on the O157 oligoarray are 100% identical to the test genome targets. Because this majority defines gene presence, in a test strain with most sequences slightly divergent from the reference strain (as in the H. pylori case), the sequence identity value below which genes are called absent or highly divergent will also be lower. Thus, the detection limit of oligoarrays is nearly equivalent to that of PCR-based whole-ORF arrays when investigating present or absent polymorphisms in closely related genomes.

The hybridization data are consistent with previous knowledge about the mobile virulence elements in the pathogens investigated here, which further validates the accuracy of these multigenome oligoarrays. The presence of the Stx1 and Stx2 genes and the LEE island was correctly assigned in all tests. Our analysis showed that the whole SpLE5 island (half of OI-172) is missing in O157 strain 86-24, a result consistent with a previous study (20). The results also indicate that tellurite resistance encoded by the TAI island was acquired recently in the step from A3 to A5, as proposed by Tarr et al. (35). The observation that stx1 is missing but elements of the Stx1 phage are present in 86-24 (Sp15 in Fig. 3) was also reported previously (33).
The Stx1 and Stx2 phages have a complex history even over the short time scale separating the immediate ancestors of E. coli O157:H7. Shaikh and Tarr (33) mapped the phage integration sites, which, with our results, indicate that the Stx1 and Stx2 phages integrated at the same site as in the Sakai strain (wrbA for Stx2 and yehV for Stx1) are similar in gene content, whereas phages integrated into other sites in related strains are divergent in gene content. Data from Ohnishi et al. (29), who determined position and diversity of Stx phages in eight O157 strains from Japan by PCR scanning, showed the same pattern. Together these findings support a scenario in which the phage-borne toxin genes were acquired early and conserved despite evidence of dynamic turnover in phage genes, resulting from phage replacement, localized recombination, and island erosion. The comparative genomic analysis indicates that after occupation of yehV by the Stx1 phage (or truncated Stx1 phage) and the occupation of wrbA by the Stx2 phage, these prophages diversified and gained the additional genes (or the prophages were replaced or recombined with other phages) to achieve the gene complement of O157 Sakai.
We identified the Sakai and K-12 genes that were gained or lost during the emergence of E. coli O157:H7 from its O55:H7-like ancestor. Overall, the phylogeny inferred from differences in gene content confirms the stepwise evolution model (7). The phylogeny shows that the representative intermediates have diverged from the proposed ancestral nodes (A1 to A6). It also reveals that the variation between pairs of strains derived from each ancestral node is small. For example, the two typical E. coli O55:H7 strains (DEC 5d and TB182A) descended from A1 are very similar but are clearly different from the atypical Stx2-positive O55:H7 strain (5905) derived from A2. The multigenome array used here also permits a second, independent enumeration of present or absent polymorphisms by using only genes present in K-12, excluding Sakai-specific genes. This second tree topology is identical to that in Fig. 1B with the exception that K-12 clusters with the O157 German clone, which contains several K-12-specific genes (Fig. 3) (b1145, b1547, and b2350) that are absent in most other strains. These results demonstrate that a K-12 array is useful not only for phylogenetic analysis of pathogenic E. coli strains that are distantly related to one another, as shown by Fukiya et al. (8), but even for phylogenetic resolution of a group of closely related strains.
About 85% of the genes that are variably present (so-called VAP genes) are phage related, underscoring the dominant role of phages in diversification of the chromosomal architecture of E. coli O157 strains (11, 31). Several Sakai prophages vary in gene content even among very closely related host strains (Fig. 3), indicating that phage genomes themselves rapidly diversify. Thus, these bacteria can act as "phage factories," producing a variety of chimeric phages (28). The extent to which these differences in gene content among phages contribute to variation in virulence among toxin-producing strains has yet to be elucidated.
To investigate further the genomic impact of phage variability, we classified the VAP genes into clusters of orthologous proteins (COGs). Most VAP genes could not be classified. Of those that could be classified, the 218 genes involved in DNA replication, recombination, and repair are the most variable, with >20% of them being VAP genes (Fig. 5). This is because most VAP genes are phage related, so that COGs with a high fraction of phage-related genes also showed a high percentage of VAP genes (correlation coefficient = 0.97, P < 0.001, t = 17.43). In fact, there is a significant excess of phage-related VAP genes in DNA replication and recombination (e.g., integrases) and transcription (Fig. 5). In contrast, there was a significant deficiency of phage-related VAP genes associated with proteins involved in secretion, cell motility, cell membrane biogenesis, and carbohydrate metabolism (Fig. 5). Overall, the most conserved COGs include genes involved in either nucleotide or coenzyme transport. Interestingly, the more stable groups among the phage-related genes seem to be the more variable groups among the non-phage-related genes. This inverse relationship suggests the hypothesis that most phage genes are gained and lost quickly, except for the ones conferring advantages to the bacterial host, which are retained by natural selection. The same selective pressure drives diversity in non-phage-related genes of these functional groups. It is also possible that the phage origins might be obscured after the erosion of the extraneous phage genes so that the remaining genes are considered to be native and not phage related in origin. Phage and cell envelope genes were identified as the ones most often horizontally transferred based on sequence analysis of 116 prokaryotic genome sequences (25).
FIG. 5. Percentages of VAP genes in COGs. For each functional group, the fraction of genes that are VAP genes is shown, divided into phage-related and non-phage-related categories. Functional groups were defined by the COG database (37). Significant excesses (+) and deficiencies (−) of phage-related VAP genes were determined by residual analysis of the contingency table (6).
Surprisingly, the GUD+ O157 strains, although phylogenetically closest to the typical O157:H7 strains, are highly divergent in several loci encoding adherence factors and defense mechanisms; such changes in adherence properties can indicate shifts in the environmental niche of a bacterium (13). Thus, it appears that genomic dynamics, primarily fostered in the O157 lineage by phage mobility, island acquisitions, and subsequent erosions, is a trial-and-error process that can promote genetic change underlying the ecological diversification of bacterial pathogens.
This project has been funded in part by the MSU foundation and in part with funds from NIH grant AI47499. The STEC Center is supported with funds from the NIAID, NIH, DHHS, under NIH research contract N01-AI-30058.
Supplemental material for this article may be found at http://jb.asm.org/.