Journal of Bacteriology, May 2007, p. 3532-3546, Vol. 189, No. 9
0021-9193/07/$08.00+0 doi:10.1128/JB.01744-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, Michigan 48109,¹ Department of Microbiology, University of Texas Southwestern Medical Center, Dallas, Texas 75235²
Received 13 November 2006/ Accepted 19 February 2007
10⁵ CFU/ml midstream urine (93, 105). However, Stamm and colleagues studied women with presumptive lower numbers of UTIs and discovered that up to 50% of symptomatic women with coliforms in their urine were not detected by using this criterion (98), suggesting that UTI is even more common than reported. Although UPEC strains exist within the intestinal tract of humans, they are distinct from most diarrheagenic or commensal E. coli strains in that UPEC isolates possess specific factors that permit their successful transition from the intestinal tract to the urinary tract. A range of putative and established virulence genes have been identified in UPEC that enable these isolates to overcome host defenses and establish infection in this unique niche. These factors include fimbrial adhesins (type 1, P, and S/F1C), toxins (cytotoxic necrotizing factor 1 [cnf1], hemolysin, and secreted autotransporter toxin), host defense avoidance mechanisms (capsule or O-specific antigen), and multiple iron acquisition systems (aerobactin, enterobactin, enterobactin-like, including iroN, and yersiniabactin) (47, 84). Additionally, sequencing of the prototypic pyelonephritogenic UPEC isolate E. coli CFT073 (104) has revealed that many of the coding sequences (CDS) could be assigned no function and are labeled as hypothetical or have been assigned putative functions. This abundance of unknown genes strongly suggests the existence of novel virulence determinants that may play an important role in UTI pathogenesis. Despite the identification of multiple virulence-associated genes in UPEC, no single profile of urovirulence has been determined, with half of all UPEC isolates containing none or only one of the urovirulence determinants identified to date (60).
The genome size of naturally occurring E. coli isolates can differ by up to 1 Mb, ranging from approximately 4.5 to 5.5 Mb (5). This variability is reflected in the commensal E. coli K-12 isolate MG1655 (4.64 Mb) (9), the enterohemorrhagic E. coli (EHEC) strains O157:H7 Sakai (5.50 Mb) (38) and O157:H7 EDL933 (5.53 Mb) (77), the enteroaggregative E. coli (EAEC) strain O42 (5.36 Mb) (www.sanger.ac.uk), and UPEC isolates CFT073 (5.23 Mb) (104), J96 (5.06 Mb) (83), 536 (4.94 Mb) (16), and UTI89 (5.07 Mb) (20). The observed differences in genome size between different E. coli strains are primarily due to the insertion or deletion of a few large chromosomal regions, with overall gene order maintained between different strains (83).
The acquisition of DNA by horizontal gene transfer (HGT) is an effective mechanism of generating diversity between bacterial species. The acquisition of plasmids and bacteriophages also plays an important role in generating genomic diversity (24). HGT results in an unusually high degree of similarity in DNA composition between the exchanged region of the donor and recipient genomes (69). If the newly acquired DNA confers an advantage to the organism, then it is retained and may be stably integrated into the genome through the process of natural selection (56). HGT is believed to be essential for adaptive evolution of bacterial species (87). The amount of genetic material which has been acquired through HGT is unexpectedly high in a number of bacterial pathogens (54, 69). For example, 18% of the genes in the E. coli K-12 MG1655 chromosome appear to have been acquired from other bacterial species through HGT (57).
Overall, the G+C content of bacterial species can differ significantly; however, within a single species, base composition and codon usage are generally conserved. Regions of genome plasticity (plasticity zones) can be identified as areas with an atypical G+C content relative to the rest of the genome, suggesting that such DNA segments originated from a different organism (57, 69). Large regions of genomic DNA, ranging from 5 to 100 kb in length, are frequently exchanged between bacterial isolates. These regions of DNA are referred to as "genomic islands" (GIs) or, if these DNA segments contain virulence factors or virulence-associated genes, the term "pathogenicity island" (PAI) is commonly used (33). PAIs are large (>30 kb), unstable regions of chromosomal DNA that contain bacterial virulence genes (33, 34). The G+C content of PAIs frequently differs from the rest of the genome, indicating possible acquisition from a related bacterial species by HGT, and PAIs are frequently associated with tRNA genes, which have been suggested to act as integration sites for foreign DNA. Insertion sequences or direct repeats often flank these pathogenicity-associated GIs, and mobility genes (often cryptic), including insertion sequence elements, transposases, origins of plasmid replication, and integrases are often found within PAIs. Additionally, PAIs are commonly found in pathogenic strains but are absent or rarely found in nonpathogenic strains (33, 34). PAIs were first described for the UPEC strain 536 (11) and have since been identified in three other UPEC strains, the pyelonephritis isolates J96 (102) and CFT073 (32, 75, 81) and the cystitis isolate UTI89 (20).
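The idea of flagging regions with atypical G+C content can be illustrated with a sliding-window calculation. The following is a minimal sketch, not the method used in this study: the function names, window size, step, and tolerance are all hypothetical choices, and the toy sequence is invented purely for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window, step, tolerance):
    """Return (start, gc) for each window whose G+C fraction differs from
    the genome-wide fraction by more than `tolerance` (hypothetical cutoff)."""
    baseline = gc_fraction(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - baseline) > tolerance:
            hits.append((start, gc))
    return hits


# Toy sequence: two 50% G+C stretches flanking an AT-only insert.
toy = "ATGC" * 50 + "AT" * 50 + "ATGC" * 50
flagged = atypical_gc_windows(toy, window=100, step=100, tolerance=0.2)
print(flagged)
```

On real genomes the window would be on the order of kilobases, and a deviation in G+C content is only suggestive of foreign origin; the other island hallmarks described above (tRNA association, flanking repeats, mobility genes) would still need to be checked.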
Identification of the somatic (O) and flagellar (H) antigens of E. coli by serotyping is the traditional diagnostic classification system of pathogenic E. coli. Various groups of O and H antigens have been associated with specific E. coli pathotypes, with more than 176 O serogroups described to date (6, 71, 89). Ten of these O serogroups (O1, O2, O4, O6, O7, O8, O16, O18, O25, and O75) are preferentially associated with UPEC strains (48, 71). The majority of ExPEC isolates belong to phylogenetic group B2 and, to a lesser extent, group D (7, 13, 51, 78), whereas most commensal strains, including K-12 MG1655, belong to group A (41). Well-studied UPEC isolates 536 and J96 both belong to phylogenetic group B2 and have the serotypes O6:H31 and O4:H5, respectively.
The use of comparative genomic hybridization (CGH) analysis capitalizes upon the rapidly expanding fields of microbial genomics, bioinformatics, and microarray technology and is a powerful tool for comparing the gene content of multiple bacterial genomes. For this study, the genomes of three pyelonephritis strains, four cystitis strains, and three fecal/commensal E. coli isolates (including E. coli K-12 MG1655) were hybridized against the E. coli CFT073 microarray. A distinction could be made between genes with "core functions" (present in all 11 E. coli strains) and genes that were potentially involved in the pathogenesis of UTIs. Using this technique, we were able to clearly delineate 13 genomic or phage islands in strain CFT073 and identify UPEC-specific genes. Additional bioinformatic screens confirmed our CGH findings and allowed the inclusion of the recently sequenced UPEC strains UTI89, 536, and F11 in comparative analyses. Using these methods, we were able to conclusively identify 131 genes that were exclusively found in UPEC relative to commensal and fecal isolates. Half of these genes are annotated as hypothetical or have little functional characterization, thus identifying a pool of potential urovirulence factors.
10⁵ CFU/ml, pyuria, fever, and no other source of infection) (65). Cystitis strains (F3, F11, F24, and F54) were isolated from the urine of women under the age of 30 years with first episodes of cystitis and bacteriuria of 10⁵ CFU/ml (99). Fecal/commensal E. coli isolates (EFC4 and EFC9) were collected from healthy women aged 20 to 50 years with no history of diarrhea, antibiotic usage, or symptomatic UTI within the past month (65) and are avirulent in the murine model of UTI (64). Additionally, the laboratory-adapted fecal/commensal E. coli isolate K-12 MG1655 was used as a negative control for CGH microarray experiments as the genome sequence has been determined (9).
Serotyping and virulence gene identification. All serotyping and virulence gene identification was conducted by the Gastroenteric Disease Center at Pennsylvania State University. Using PCR, strains were tested for the presence of a range of virulence genes associated with UPEC and diarrheagenic E. coli strains: heat labile toxin (LT), heat stable toxin a and b (STa/STb), Shiga toxin types 1 and 2 (STX1/STX2), cytotoxic necrotizing factor 1 and 2 (CNF1/2), intimin-gamma (EAE), bundle-forming pili (BFP), O antigen type 157 (O157), P-fimbrial adhesin genes (papG alleles I and III), S-fimbrial adhesin (SFA), and F1C-fimbrial adhesin (focG).
Genome alignments of E. coli strains CFT073 and K-12 MG1655.
The full genomes of E. coli CFT073 (104) (GenBank accession no. AE014075) and K-12 MG1655 (9) (GenBank accession no. U00096) were sequentially aligned in 20-kb segments using coliBASE software (http://colibase.bham.ac.uk) (19). Using a gene-by-gene comparison of these two genomes, it was possible to identify CFT073 genes that are present in K-12 but may not have been annotated as present. Genes were classified as present if (i) the same gene was annotated in both strains, (ii) an orthologous gene was identified in K-12, or (iii) a gene with a high level of nucleotide identity to a CFT073 gene was found in K-12. Genes that were severely truncated in either strain were not considered present. The findings from this gene-by-gene comparison between the E. coli CFT073 and K-12 genomes were used to validate the microarray data.
CGH microarray analysis. The E. coli CFT073-specific DNA microarray (NimbleGen Systems, Inc., Madison, WI) includes 5,379 annotated CDS from the CFT073 genome sequence (104). Each of the CDS is represented on the glass slide by a minimum of 17 unique probe pairs of 24-mer in situ-synthesized oligonucleotides. Probes are evenly spaced throughout the CDS, and intergenic sequences are not included on the array. Each pair consists of a sequence perfectly matched to the CDS, and another adjacent sequence harbors two mismatched bases for the determination of background and cross-hybridization, equating to 190,000 probes per array.
Total genomic DNA from log-phase UPEC and fecal/commensal E. coli isolates was isolated using Genomic-Tip 500/G columns (QIAGEN) according to the manufacturer's protocol, and the DNA concentration was adjusted to approximately 1 µg/µl. Genomic DNA was labeled with a randomly primed reaction (92). DNA (1 µg) was mixed with 1 optical density of 5' Cy3-labeled random nonamer (TriLink Biotechnologies) in 62.5 mM Tris-HCl, 6.25 mM MgCl2, and 0.0875% ß-mercaptoethanol; denatured at 98°C for 5 min; chilled on ice; and incubated with 100 U Klenow fragment (NEB) and deoxynucleoside triphosphate mix (6 mM each in Tris-EDTA) for 2 h at 37°C. Reactions were terminated with 0.5 M EDTA (pH 8.0), precipitated with isopropanol, and resuspended in water. A 50-fold amplification was typically achieved. Labeled genomic DNA was hybridized to arrays in 1x NimbleGen hybridization buffer for 16 h at 45°C using a Hybriwheel hybridization apparatus (NimbleGen) in a rotisserie oven. The next morning, arrays were washed with nonstringent wash buffer (6x SSPE [1x SSPE is 0.18 M NaCl, 10 mM NaH2PO4, and 1 mM EDTA {pH 7.7}], 0.01% [vol/vol] Tween 20) for 2 min and then twice in stringent wash buffer (100 mM morpholineethanesulfonic acid [MES], 0.1 M NaCl, 0.01% [vol/vol] Tween 20) for 5 min, all at 47.5°C. Finally, arrays were washed again in nonstringent wash buffer (1 min) and rinsed twice for 30 s in 0.05x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate). Arrays were spun dry in a custom centrifuge and stored until scanned. Microarrays were scanned at a 5-µm resolution using the Genepix 4000b scanner (Axon Instruments, Union City, CA), and pixel intensities were extracted using NimbleScan image extraction and analysis software.
Data from all microarray experiments were normalized using the technique described by Irizarry and colleagues (44) and log2 transformed prior to analysis. The normalized data took into account the signal intensity from every probe (perfect match and mismatch oligonucleotides) for each CDS in the genome and permitted comparative analyses to be made between individual hybridization experiments. Normalized data were analyzed for the presence/absence of annotated CDS relative to the E. coli CFT073 reference strain. CDS with normalized array values of less than 7.9 were considered to be absent from the test strain relative to the reference strain, E. coli CFT073. The cutoff value differs between individual microarray experiments, as the normalization of data from multiple experiments is dependent upon the set of input data. To validate the normalized, log2-transformed microarray data, a gene-by-gene comparison between the E. coli CFT073 and K-12 genomes was conducted using coliBASE software (19).
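The presence/absence call described above reduces to a threshold on the normalized, log2-transformed value for each CDS. The sketch below assumes the normalization has already been done; the function name and the example values are hypothetical, and only the 7.9 cutoff comes from the text.

```python
ABSENT_CUTOFF = 7.9  # cutoff stated in the text for this data set


def call_presence(normalized_log2, cutoff=ABSENT_CUTOFF):
    """Map each CDS to True (present) or False (absent).

    A CDS is called absent when its normalized log2 value is below the
    cutoff, as described in the text; values at or above it are present.
    """
    return {cds: value >= cutoff for cds, value in normalized_log2.items()}


# Hypothetical normalized values for three CFT073 locus tags.
example_values = {"c0001": 11.2, "c0253": 6.4, "c2436": 7.9}
calls = call_presence(example_values)
print(calls)
```

Because the cutoff depends on the set of arrays normalized together, it is passed as a parameter rather than hard-coded into the function body.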
Bioinformatic screen of the E. coli CFT073 genome.
Each of the CDS for the E. coli CFT073 genome was compared against the CDS for the publicly available UPEC genomes (UTI89, 536, and F11) as well as all other commensal and diarrheagenic E. coli strains listed in Table 1 by using BLAST score ratio (BSR) analysis (80). The comparisons in this study were performed using the nucleotide sequences for each coding region instead of the peptide coding regions to allow direct comparison between the microarray studies and the BSR analysis (peptide comparisons were also performed, and the data for the peptides were similar to data for the nucleotide comparisons). For each of the predicted CDS in E. coli CFT073, a BLASTN raw score was obtained for the alignment of the CDS against itself (REF_SCORE) and the most similar CDS (QUE_SCORE) in each of the genomes listed in Table 1. These scores were then normalized by dividing the QUE_SCORE obtained for each query genome CDS by the REF_SCORE. CDS with a normalized ratio of <0.4 were considered to be nonhomologous and scored as "absent" in this data set. A normalized BSR of 0.4 is generally similar to two CDS being 30% identical over 30% of the CDS. A normalized BSR of >0.8 indicates that the CDS are highly conserved and were scored as "present" in the study. This value represents more than 85 to 90% nucleotide identity over 90% of the reference sequence, indicative of a highly conserved sequence. CDS labeled as divergent have BSR values between these two extremes and represent genes that have diverged but still show significant levels of similarity such that they can be identified as homologs.
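The BSR calculation and the three-way classification described above can be sketched as follows. The function names are hypothetical, the raw scores in the example are invented, and the handling of values exactly equal to 0.4 or 0.8 (treated here as divergent) is an assumption, since the text only specifies <0.4 and >0.8.

```python
def blast_score_ratio(que_score, ref_score):
    """Normalize a query raw score by the reference self-alignment score."""
    if ref_score <= 0:
        raise ValueError("REF_SCORE must be positive")
    return que_score / ref_score


def classify_bsr(bsr, absent_below=0.4, present_above=0.8):
    """Classify a CDS as 'absent', 'divergent', or 'present' from its BSR.

    Thresholds follow the text; boundary values fall into 'divergent'
    (an assumption, as the text does not state how ties are handled).
    """
    if bsr < absent_below:
        return "absent"
    if bsr > present_above:
        return "present"
    return "divergent"


# Invented raw scores for one reference CDS against three query genomes.
ref_score = 1000
labels = [classify_bsr(blast_score_ratio(q, ref_score)) for q in (950, 600, 100)]
print(labels)
```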
TABLE 1. Sequenced E. coli strains used for BSR analysis against CFT073
TABLE 2. Characteristics of E. coli strains used in this study
Of the 4,025 CFT073 CDS identified in K-12 by genome alignments, 531 of these CDS are not annotated in K-12. Microarray data confirmed the presence of 461 of these genes (87%) in the K-12 genome sequence. Many of the genes that are present in K-12 but appeared to be absent by microarray were either truncated genes or contained divergent nucleotide sequences that would have affected DNA hybridization to the CGH arrays. The difference in the number of genes shared between K-12 and CFT073 by genome alignment versus array data was 147 genes, indicating that only 2.7% of the genes in the array could be misclassified by CGH as absent when they were present (i.e., false negative results). Thus, 97.3% of genes were classified correctly, validating the microarray for determination of gene content among strains. In silico BSR analysis of the K-12 and CFT073 CDS revealed a similar number (3,933) of the CDS classified as either present (3,381) or divergent (552) using a conservative threshold.
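The validation percentages quoted above follow directly from the reported counts. The short check below reproduces them; the variable names are hypothetical, and the array size of 5,379 CDS is taken from the microarray description in Materials and Methods.

```python
array_cds = 5379            # annotated CDS on the CFT073 array
unannotated_in_k12 = 531    # CFT073 CDS found in K-12 but not annotated there
confirmed_by_array = 461    # of those, confirmed present by microarray
discordant = 147            # alignment-versus-array difference in shared genes

confirmation_pct = round(100 * confirmed_by_array / unannotated_in_k12)
false_negative_pct = round(100 * discordant / array_cds, 1)
correct_pct = round(100 - false_negative_pct, 1)

# BSR counts for K-12 versus CFT073: present plus divergent CDS.
bsr_total = 3381 + 552

print(confirmation_pct, false_negative_pct, correct_pct, bsr_total)
```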
Comparative genomic hybridization of E. coli CFT073 with uropathogenic and fecal or commensal E. coli strains.
The number of genes that each E. coli strain had in common with CFT073, based upon microarray data, is shown in Table 3. Pyelonephritis and cystitis isolates (UPEC strains) contained similar numbers of CFT073 genes, whereas the fecal or commensal strains had 100 fewer genes than the UPEC isolates; the laboratory-adapted fecal or commensal strain K-12 had approximately 300 fewer genes than the UPEC isolates. Although the UPEC isolates tended to contain more CFT073 genes than did the fecal or commensal strains, this difference was not statistically significant (P > 0.05). The number of genes that were common to all 10 E. coli strains was 2,820, representing 52.4% of the E. coli CFT073 genome.
TABLE 3. Number of CFT073 genes present in UPEC and fecal/commensal E. coli strains based on CGH microarrays
FIG. 1. Graphical display of CGH microarray data. Each row corresponds to the annotated CDS of E. coli CFT073, from thrL (c0001) at the top, to lasT (c5379) at the bottom. The columns represent the 10 E. coli strains hybridized against the CFT073 microarray, grouped by clinical presentation (CFT204, CFT269, and CFT325 strains, pyelonephritis; F3, F11, F24, and F54 strains, cystitis; EFC4 and EFC9, fecal/commensal strains; and K-12, E. coli K-12 MG1655). Based on the microarray data, blue indicates CDS identified as present and yellow indicates CDS identified as absent. The 10 genomic islands and three phage regions identified in Table 4 are shown on the right.
TABLE 4. Genomic islands and phage regions of >30 kb identified in E. coli CFT073 using CGH
-CFT073-gene name, as described above. Eight of the 10 genomic islands (80%) were associated with a tRNA locus, and the majority of islands contained a phage integrase, transposase, or insertion sequence at one or both boundaries of the island. The size of the islands ranged from 32 to 123 kb (median size of 54 kb) and 8 of the 10 (80%) islands had G+C contents that differed from that of CFT073 (50.5%) (104). Seven of the genomic islands contained one or more genes with predicted or established roles in virulence (PAI-CFT073-pheV [formerly designated PAI ICFT073], PAI-CFT073-pheU [PAI IICFT073], PAI-CFT073-aspV [PAI IIICFT073], PAI-CFT073-serX, PAI-CFT073-icdA, PAI-CFT073-metV, and PAI-CFT073-asnT [HPICFT073]), while six (φ-CFT073-b0847, φ-CFT073-potB, GI-CFT073-asnW, GI-CFT073-cobU, φ-CFT073-smpB, and GI-CFT073-selC) contained no known virulence genes. However, all of the genomic islands contained a high number of CDS with hypothetical or putative functions (Table 4), and thus, additional virulence factors may exist within these islands. Studies are currently under way in our laboratory to elucidate the function of these genes. Phage DNA sequence is common in E. coli CFT073; indeed, five cryptic prophage genomes have been identified in this strain, although they do not contain sufficient genetic information to produce viable phage (104). Islands φ-CFT073-b0847, φ-CFT073-potB, PAI-CFT073-icdA, and φ-CFT073-smpB are particularly phage-rich regions of sequence. The position of each genomic island relative to the CFT073 genome sequence is shown in Fig. 2.
FIG. 2. Ten genomic islands in E. coli CFT073. The 10 genomic islands and three phage regions of E. coli CFT073 (Table 4) are shown relative to the CFT073 genome sequence. The three previously identified PAIs of CFT073 (PAI-CFT073-pheV [PAI ICFT073], PAI-CFT073-pheU [PAI IICFT073], PAI-CFT073-asnT [HPICFT073]) are shown in white boxes, and the 10 novel genomic islands (including three phage islands) identified by CGH analysis are shown in black boxes.
The presence of these 10 genomic and 3 phage islands in nine other sequenced bacterial strains was examined using coliBASE genome alignments. Eleven of the CFT073 genomic islands are not present in any of the strains available for analysis by coliBASE. φ-CFT073-b0847 and PAI-CFT073-asnT (HPICFT073) are present in other strains to various degrees. φ-CFT073-b0847 is present in E. coli E2348/69 (EPEC), Salmonella enterica serovar Typhi TY2, and Salmonella enterica serovar Typhimurium LT2, although differences were observed at CDS c0933, c0944 to c0946, and c0967 to c0970. CDS c0963 to c0968 of φ-CFT073-b0847 are inverted in E. coli E2348/69 (EPEC) relative to the CFT073 genome. Otherwise, the gene order is conserved between strains in the genomic island regions. PAI-CFT073-asnT (HPICFT073) was also identified in E. coli O42 (EAEC) and Yersinia pestis CO92, although a minor difference was observed at CDS c2425, and CDS c2424 to c2429 were annotated differently in the strains. In Y. pestis CO92, the corresponding region of sequence from c2424 to c2429 in CFT073 is annotated as irp2 and irp1, and the same CDS have been predicted in E. coli O42 (EAEC) by using Glimmer (86). The irp1 and irp2 genes encode iron-repressible yersiniabactin biosynthesis proteins, which, along with fyuA (yersiniabactin receptor), are part of the high-pathogenicity island (HPI) in Yersinia species (91).
UPEC-specific genes. Using CGH analysis, we identified 2,820 genes that were common to all of the UPEC and fecal or commensal strains studied. To estimate the number of these genes that could be considered UPEC specific, we asked how many genes were present in at least a certain number of UPEC strains but not present in any of the fecal or commensal strains, including strain K-12 MG1655. For example, there were 743 such genes present in at least one of the UPEC strains studied by CGH, 590 in at least two strains, and so on (Fig. 3). In our most conservative assessment, there were 173 UPEC-specific CDS that were considered present in all eight UPEC strains (including CFT073) but absent in the fecal or commensal strains. Although UPEC strains are members of the ExPEC family of E. coli, and many genes referred to here as UPEC specific may actually be ExPEC specific, we refrain from making this assumption since no other members of the ExPEC family were tested in this study. In order to answer this question, it would be necessary to examine the presence of these genes in bacterial meningitis E. coli, septicemia isolates, and avian pathogenic E. coli, rather than extrapolate data based upon UPEC isolates alone.
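The counting scheme described above (genes present in at least k UPEC strains and absent from every fecal or commensal strain) can be sketched as follows. This is an illustrative sketch only: the function name, the strain labels, and the toy presence calls are hypothetical and do not reproduce the study's data.

```python
def upec_specific_counts(presence, upec, commensal):
    """For each k, count genes present in at least k UPEC strains and in
    none of the commensal strains.

    presence: dict mapping gene name -> set of strains in which the gene
    was called present.
    """
    counts = {}
    for k in range(1, len(upec) + 1):
        counts[k] = sum(
            1
            for strains in presence.values()
            if not (strains & commensal) and len(strains & upec) >= k
        )
    return counts


# Toy example with three UPEC strains and one commensal strain.
upec = {"U1", "U2", "U3"}
commensal = {"C1"}
presence = {
    "geneA": {"U1", "U2", "U3"},   # in every UPEC strain, no commensal
    "geneB": {"U1", "C1"},         # excluded: also in a commensal strain
    "geneC": {"U2"},
    "geneD": {"U1", "U3"},
}
counts = upec_specific_counts(presence, upec, commensal)
print(counts)
```

The counts decrease as k increases, which mirrors the progression reported in the text from genes found in at least one UPEC strain down to the most conservative set found in all of them.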
FIG. 3. Identification of UPEC-specific genes in 10 UPEC strains using CGH and genomic analyses. CGH analysis of seven UPEC strains and three fecal/commensal strains in reference to strain CFT073 identified 173 genes as present in all seven UPEC isolates that were not present in any fecal/commensal strain. These 173 genes were then compared to the sequenced UPEC genomes UTI89, 536, and F11, and the fecal/commensal strain HS. Genes present in the seven UPEC strains by CGH and in one, two, or all three strains UTI89, 536, and F11 but absent from HS were determined; 131 genes were identified as UPEC specific. Of these, 37 genes are within genomic islands of >30 kb and 61 genes are annotated as encoding hypothetical proteins.
TABLE 5. 131 UPEC-specific genes identified using CGH and in silico BSR analysis of ten UPEC and four fecal/commensal strains
3 genes with no more than one missing gene (range, 3 to 12 genes; median, 5 genes), with an additional six 2-gene clusters, and 25 individual genes. CDS with hypothetical functions comprise approximately half (61/131 genes) of these UPEC-specific genes. The UPEC-specific group also contains seven CDS predicted to be involved in transcriptional regulation, 12 CDS for ABC transport systems, and the chu gene cluster involved in heme/hemoglobin utilization (103). Relative expression of these 131 genes in vivo and their upregulation in vivo relative to in vitro expression are provided from our previous study (95) (Table 5). More than half (78) of these genes had an in vivo/in vitro ratio of >1, suggesting that they are synthesized as well in vivo as in vitro. Thirty-eight of 131 genes were upregulated more than twofold in vivo.
Virulence-associated genes in UPEC strains. While surveys of virulence factors in large strain collections have been conducted previously by us and others (2), we were nevertheless able to make unique observations using this approach. The prevalence of 11 virulence-associated genes or operons from CFT073 (sat, picU, tsh, iha, iroN, sitABCD, fyuA, iucABCD/iutA, chuSA, hlyA, and usp) was assessed in the eight UPEC and three fecal or commensal isolates (Table 6). Pyelonephritis strain CFT204 contains 8 of the 11 virulence-associated genes and appears most closely related to CFT073 in terms of gene content and presence of genomic islands (Table 6 and Fig. 1). The pyelonephritis strains generally contained the most established virulence factors (mean, 6.3), cystitis isolates contained a mean of 4.8 virulence factors, and fecal/commensal strains contained a mean of only 1.3 virulence factors (with none present in E. coli K-12). Both pyelonephritis and cystitis isolates contain significantly more virulence factors than fecal strains do (P < 0.05).
TABLE 6. Presence of virulence-associated genes in uropathogenic and fecal/commensal E. coli strains
As many as 12 putative fimbrial gene clusters have been identified in CFT073 (95, 104): 10 encoding chaperone-usher family fimbriae and two encoding type IV pili. Several of these chaperone-usher pathway fimbrial gene clusters were found to be UPEC specific by CGH, including the yad/htr/ecp genes (c0166 to c0172) and CDS c4207 to c4214. In each case, the chaperone-usher genes were the most highly conserved, the adhesive tip protein was the least conserved, and the minor structural subunits showed various degrees of conservation between strains.
Type IV pili, a feature of many gram-negative bacteria, are involved in twitching motility (14, 40, 52). Type IV pili have been associated with adhesion to epithelial cells (30, 43, 100), and the extension and retraction of type IV pili have been shown to directly mediate cell movement (61, 62, 94). The type IV pilin genes c2394 and c2395 were present in all three pyelonephritis isolates and one of four cystitis isolates (F11), but not in fecal/commensal strains by CGH (data not shown). In silico BSR analysis revealed that the type IV pilin genes are present only in UPEC strains 536 and F11.
For iron acquisition, enterobactin (79), also known as enterochelin (68), functions as a catecholate siderophore in E. coli that sequesters iron from the environment and provides it in a soluble form able to be utilized by the organism. The enterobactin gene cluster (ent/fep genes) was present in all 10 E. coli strains analyzed by CGH and all 14 sequenced E. coli strains by in silico BSR analysis (data not shown). The entire iroNEDCB gene cluster, encoding the related enterobactin-like system, was found in three cystitis isolates (F3, F11, and F24) and one fecal/commensal strain (EFC9) by CGH (data not shown). The enterobactin-like gene cluster (iroNEDCB) was identified by in silico BSR analysis as present in only UTI89, 536, and F11, not in 11 other E. coli strains, and therefore, it appears to be UPEC specific.
The yersiniabactin receptor, encoded by the fyuA gene in Yersinia pestis CO92 (76), is 99.9% identical to CFT073 gene c2436 at the nucleotide level. c2436 contains no apparent premature stop codons with reference to Y. pestis. The c2436 gene, annotated as a putative pesticin receptor precursor, is present in all seven UPEC isolates but none of the fecal/commensal strains analyzed by CGH. The in silico BSR screen reveals that gene c2436 is present in the UPEC strains UTI89, 536, and F11 as well as EPEC strain E110019 and EAEC strain O42. The sitABCD operon is an iron transport system in CFT073 that was present in all three pyelonephritis isolates, three of four cystitis isolates, and one fecal/commensal strain by CGH. In addition, as determined by BSR analysis, the three UPEC strains and the EAEC strain O42 contain the sitABCD operon, while this iron transport system was absent from 10 other E. coli strains. The chuS (c4307) and chuA genes (c4308), involved in heme/hemoglobin transport and binding, respectively, are present in all seven UPEC strains (three pyelonephritis and four cystitis isolates) but none of the fecal/commensal strains as examined by CGH. The chuSA genes are also present in the UPEC strains UTI89, 536, and F11, the EHEC strains EDL933 and Sakai, and EAEC strain O42 as measured by BSR. However, these genes are absent from all other E. coli strains examined, including the commensal strain HS and the laboratory-adapted commensal strain K-12 MG1655.
All genera within the family Enterobacteriaceae are capable of producing a layer of strain-specific, surface-associated polysaccharides known as capsule (88). Some forms of capsule have been strongly associated with extraintestinal E. coli infections, including urinary tract infection (45, 53, 72). The kpsMT genes of CFT073 encode the ATP-binding cassette (ABC) transporter components of the group II capsule gene locus (10), have been associated with virulence in UPEC (29), and were present in a single pyelonephritis strain (CFT204) in the CGH analysis. Based upon DNA hybridization to the arrays, the capsule genes varied among both UPEC and fecal/commensal E. coli strains. This likely indicates that different strains express different capsular types.
The autotransporter genes sat and picU, found in CFT073, were identified in only the pyelonephritis strains CFT325 and CFT204, respectively. In contrast, the autotransporter tsh (also referred to as vat or hemoglobin protease) (39) was present in all pyelonephritis isolates and three cystitis isolates (F3, F11, and F24).
The uropathogenic specific protein (usp) is encoded by gene c0133 in CFT073 and was identified in one pyelonephritis isolate and one cystitis isolate but in none of the fecal/commensal strains by CGH (data not shown). Furthermore, BSR analysis supported the results of previous studies showing that the usp gene is UPEC specific. The usp gene was present in all three UPEC strains but none of the EHEC, ETEC, EPEC, rabbit enteropathogenic Escherichia coli, EAEC, or fecal/commensal E. coli isolates (data not shown).
This is the first study to use CGH microarray analysis to compare a collection of uropathogenic and fecal/commensal E. coli isolates. This approach permits the identification of genomic islands and genes specific to UPEC isolates. Genomic DNA from three pyelonephritis isolates, four cystitis isolates, and three fecal/commensal E. coli strains was hybridized to an E. coli CFT073 whole-genome microarray. Seven new genomic islands were delineated and characterized in CFT073, the details of the two previously known PAIs in this strain were revised, and a third PAI of CFT073 was analyzed in greater detail than that previously published. The prevalence of established or putative virulence factors of UPEC was analyzed across the 11 E. coli strains. Furthermore, this study has demonstrated that unrelated UPEC isolates hybridize to the CFT073 microarray, with approximately 77% of the CFT073 genes present in other UPEC isolates.
Genes found to be conserved in all UPEC strains but not found in fecal or commensal strains (Table 5) are perhaps not those that would be expected. When one considers urovirulent strains, specific virulence determinants come to mind, including P, S, or F1C fimbriae, hemolysin, cytotoxic necrotizing factor, autotransporters, and certain iron acquisition systems. Despite being characteristic of uropathogenic strains, genes encoding these virulence factors are not found in every strain. On the contrary, half of the UPEC-specific genes identified in this study encode predicted proteins with no known homologs or function. This indicates that these strains possess invariant genes for which the role in uropathogenesis is not known. Our findings also suggest that ABC transport of several unknown substrates may be critical and that the Chu heme uptake system is important. Implication as a virulence factor, however, requires the testing of complementable mutations in the murine model of UTI. Finally, the identification of the target genes of seven predicted transcriptional regulators may reveal more about the mechanisms of urovirulence. Finding that 106 of the 131 genes present in all UPEC strains are found within 22 gene clusters of 2 or more genes indicates that UPEC-specific genes are not randomly distributed in these strains; rather, operons likely encode systems that contribute to pathogenesis or survival in the urinary tract.
Three PAIs, PAI-CFT073-pheV (PAI ICFT073), PAI-CFT073-pheU (PAI IICFT073), and PAI-CFT073-asnT (HPICFT073) have previously been identified in UPEC strain CFT073 (32, 70, 81). Some confusion, however, currently exists in the literature as to the correct annotation of PAIs in CFT073, as subsequent analysis showed that the original annotation of PAI-CFT073-pheV (PAI ICFT073) and PAI-CFT073-pheU (PAI IICFT073) contained errors. We are now able to clarify and expand on these findings based upon our CGH data by using the CFT073 whole-genome microarray. Nomenclature for these new PAIs and GIs has been proposed based upon the existing PAIs in this and other UPEC strains (25, 66) (Table 4). PAI-CFT073-pheV (PAI ICFT073) (32) was originally reported to be 58.0 kb and to contain a pap operon, hemolysin, and iron-regulated genes. Our current study has shown that PAI-CFT073-pheV (PAI ICFT073) is 123 kb in length, has a G+C content of 47%, is located at the pheV tRNA locus and contains hlyA (c3570), the first CFT073 pap operon (c3582-c3593), iha (c3610), sat (c3619), iutA (c3623), iucDCBA (c3624-c3628), antigen 43 precursor (c3655), and kpsTM (c3697-c3698). The original annotation of PAI-CFT073-pheU (PAI IICFT073) (81) contained errors related to rearrangements in cosmid clones, resulting in incorrectly assembled sequence from distinct regions of the genome. This PAI was annotated as being at least 71.7 kb, although insertion sites at the boundary of the island were never identified. We have shown that PAI-CFT073-pheU (PAI IICFT073) is 52 kb in length, has a G+C content of 48%, is located at the pheU tRNA locus, and contains the pap_2 operon (c5179-c5189).
Parham and colleagues (75) recently identified a 100-kb PAI in CFT073 which they reported to be the correctly annotated PAI-CFT073-pheU (PAI IICFT073) originally identified by Rasko et al. (81). However, previous studies of PAI-CFT073-pheU (PAI IICFT073) (8, 81, 96, 104) consistently mention the presence of the second pap operon of CFT073 (c5179-c5189) within this island. The PAI identified by Parham and colleagues is identical to genomic island 1 identified in this study (c0253-c0368). However, since this PAI does not contain the pap_2 operon, it cannot be referred to as PAI-CFT073-pheU (PAI IICFT073); we therefore propose that this PAI be renamed PAI-CFT073-aspV (PAI IIICFT073).
The HPI of Yersinia pestis encodes the yersiniabactin iron acquisition system (18). The HPI has been identified in members of the Enterobacteriaceae family that are pathogenic to humans (91) and was present in 71% of E. coli urine isolates and 75% of E. coli blood isolates (90). Although the HPI has been documented in E. coli CFT073 previously (70), it was not well characterized. The HPI of CFT073 has subsequently been reported to contain premature stop codons in several genes, corresponding to an absence of detectable yersiniabactin production (16). The HPI of E. coli CFT073 (PAI-CFT073-asnT [HPICFT073]) has been further examined in Table 4. In this study, the PAI-CFT073-asnT (HPICFT073) was present in 100% of pyelonephritis and cystitis isolates but none of the fecal/commensal strains. In three pathogenic Yersinia species, the HPI was inserted at one of three asn tRNA genes (17) and the PAI-CFT073-asnT (HPICFT073) is also located at an asn tRNA gene in CFT073. It should be noted that data from CGH microarrays indicate only the presence of DNA sequences and do not indicate the functionality of a CDS.
These genomic islands are frequently associated with tRNA genes, generally have G+C contents that differ from that of CFT073, and contain integrases, transposases, and phage sequences, all of which are common characteristics of bacterial PAIs (33, 34). In each of the genomic and phage islands, the majority of CDS predict hypothetical or putative functions, which is highly suggestive of additional genes with potential roles in virulence. For example, PAI-CFT073-aspV (PAI IIICFT073) is 100 kb in length and contains 99 hypothetical or putative CDS, GI-CFT073-selC is 68 kb and contains 76 CDS with hypothetical or putative functions, and even the most well-characterized PAI of CFT073, PAI-CFT073-pheV (PAI ICFT073), is 123 kb and contains 86 uncharacterized CDS.
Sequence alignments revealed that two of the genomic islands in strain CFT073 were found in five sequenced bacterial genomes.
-CFT073-b0847 was identified in E. coli E2348/69 (EPEC), Salmonella enterica serovar Typhi TY2, and Salmonella enterica serovar Typhimurium LT2, whereas PAI-CFT073-asnT (HPICFT073) was present in E. coli O42 (EAEC) and Yersinia pestis CO92. The 11 remaining PAIs were not identified in their entirety in any of the strains analyzed. Some of the PAIs appeared to have been composed of smaller genomic islands, indicated by internal insertion sequences and differences in gene content between strains. Over time, these smaller genomic regions may have become parts of larger islands that acquire virulence genes and are mobilized together between strains as PAIs. Alternatively, these regions may be remnants of larger islands that have been lost over time in these isolates. PAIs frequently have a mosaic-like structure which has been generated by a multistep process of genomic acquisition, loss, and rearrangement (34).
In a comparative analysis of newly sequenced E. coli 536, Brzuszkiewicz and colleagues noted that the primary difference between strains 536 and CFT073 was restricted to large PAIs that were unique to 536 or CFT073 (16). Indeed, they were able to predict or confirm the presence of six islands in CFT073 associated with aspV, serX, selC, aspV, pheV, and pheU. In our study, we precisely delineated these islands along with additional islands. Interestingly, they noted that 432 genes were present in both 536 and CFT073. Using additional strains in our analysis, we restricted the number of genes common to UPEC to 131.
The eight UPEC isolates were of the common UPEC-associated O serogroups (O1, O2, O4, O6, O7, O8, O16, O18, O25, and O75) (48, 71). Several serotypes were found in more than one UPEC isolate, confirming that randomly selected UPEC isolates demonstrate similarities in serotypes. Two of the pyelonephritis strains had the serotype O6:H1, two cystitis isolates were O18:H7, and the representative cystitis isolate F11 (97) had the same serotype as another well-characterized UPEC isolate, E. coli 536 (O6:H31) (35).
A strong correlation between the production of class III P-fimbrial adhesin (papG allele III), α-hemolysin (hly), S-fimbrial adhesin (sfa), and cytotoxic necrotizing factor 1 (cnf1) has been shown by Mitsumori and colleagues (63), with 87% of UPEC strains analyzed containing these four genes. Similarly, 75% of the cystitis isolates in this study were positive for cnf1, papG allele III, sfa and hly, whereas this profile was not observed for any of the pyelonephritis or fecal/commensal E. coli strains (Table 2). The class III papG allele is predominantly found in cystitis isolates (1, 63, 99). As shown in Table 2, 75% of the cystitis isolates contained papG allele III, whereas 80% of the pyelonephritis isolates contained papG allele I.
One of the most striking findings of this study was the high prevalence of iron acquisition systems in UPEC isolates and the obvious importance of iron sequestration and transport in the urinary tract. An analysis of the enterobactin (ent/fep), enterobactin-like (iro), aerobactin (iuc/iut), yersiniabactin (fyu), iron transport (sit), and heme (chu) systems clearly illustrated the importance of iron for the survival of UPEC in the urinary tract. All seven UPEC isolates, in addition to CFT073, contained between three and five of these iron acquisition systems, with an average of four per strain. The fecal/commensal strains contained two or three iron-related operons, while K-12 contained only the enterobactin system, which was present in all 10 E. coli strains examined. The enterobactin-like genes were predominantly found in cystitis strains, whereas the aerobactin system was more prevalent in pyelonephritis isolates. In contrast, the heme/hemoglobin gene cluster (chu) was found in almost all UPEC isolates but was absent in the fecal/commensal strains. Torres and colleagues showed that an isogenic chuA mutant of CFT073 was significantly outcompeted by the wild-type strain in both the bladders and kidneys of mice (103), and the chu locus has been shown to be associated with ExPEC isolates causing neonatal meningitis (12). Heme/hemoglobin utilization may be more important in the later stages of a UTI, where heme and hemoglobin are released following the lysis of host cells. The prototypic pyelonephritogenic isolate CFT073 contains all six iron systems mentioned above, five of which were highly upregulated in vivo (95). This redundancy in iron acquisition systems may provide a competitive advantage to UPEC in vivo in terms of growth and survival over E. coli strains lacking these alternative iron acquisition systems.
CGH analysis does have limitations and may not accurately represent genes with divergent sequences at the nucleotide level. The array signal is dependent upon DNA hybridization to the probes, and low sequence identity results in poor recognition of the probe sequence. The use of at least 17 probes for every CDS in CFT073 partially compensates for this, as regions of minor sequence divergence generally do not adversely affect overall hybridization. In contrast, substantial divergence across the entire gene sequence results in low normalized signal intensities from the array. Genes, operons, or genomic islands that are absent are generally evident, resulting in regions with very low normalized data signals. Similarly, genes that are clearly present give high normalized signals. However, genes that have divergent nucleotide sequences tend to give values close to the cutoff value, often with some CDS in an operon appearing present and others appearing absent. This was observed with the fimbrial genes of CFT073, which showed substantial sequence divergence and consequently hybridized poorly to the microarray. PCR analysis revealed that all UPEC strains contained either papG allele I or papG allele III, and yet the microarray suggested that the pap operon was absent from or only partially present in all strains. Although the type 1, P, and S/F1C operons gave variable results, there were clear trends within these data. The adhesin genes were the least conserved between strains, whereas the chaperone-usher genes were most conserved. Chaperone-usher genes perform very similar functions in strains and, therefore, require a similar structure, as sequence divergence would only reduce their efficiency. In contrast, it is beneficial for bacterial pathogens to differ in their adhesin moieties, which may provide an advantage for survival in different niches.
It has been proposed that the three papG alleles in UPEC confer differences in receptor binding specificity, resulting in differences in host range (59, 101) or clinical presentation (46, 73), and a similar argument can be made for other adhesins. The only exception to this observation was the fimH adhesin of type 1 fimbriae, which was highly conserved among all E. coli strains analyzed and was present in 100% of strains by microarray (data not shown). Type 1 fimbriae are found in more than 90% of uropathogenic and commensal E. coli strains (4, 36, 47, 106) but nevertheless contribute significantly to the virulence of UPEC isolates (21, 31, 55). A recent CGH study comparing 11 ExPEC isolates, all E. coli K1 strains from the cerebrospinal fluid of patients with meningitis, showed sequence divergence in the adhesin gene of F1C fimbriae and of another gene, hek, identified only as an adhesin/virulence factor (107). These findings support the hypothesis that adhesin genes are not highly conserved between E. coli strains.
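The cutoff behavior described above for CGH calls can be illustrated with a short sketch. It assumes that each CDS is summarized by the median of its probe-level normalized signals and that a single cutoff with a surrounding margin separates present, absent, and ambiguous calls. The function name `call_cds`, the cutoff, the margin, and all signal values are hypothetical and are not the parameters used in this study; the toy examples also use far fewer probes than the 17 or more per CDS on the CFT073 array.

```python
from statistics import median
from typing import List


def call_cds(probe_signals: List[float],
             cutoff: float = 0.5,
             margin: float = 0.1) -> str:
    """Classify one CDS from its probe-level normalized signals.

    The median is used so that a few poorly hybridizing probes (minor,
    local sequence divergence) do not change the call for the whole CDS.
    Cutoff and margin are assumed values for illustration only.
    """
    summary = median(probe_signals)
    if summary >= cutoff + margin:
        return "present"
    if summary <= cutoff - margin:
        return "absent"
    return "ambiguous"  # near the cutoff: divergent sequence is one explanation


# Hypothetical probe signals for four kinds of CDS.
examples = {
    "conserved": [0.90, 0.92, 0.88, 0.91, 0.95],
    "locally_divergent": [0.90, 0.92, 0.10, 0.88, 0.91],  # one probe fails
    "divergent_throughout": [0.45, 0.55, 0.50, 0.48, 0.52],
    "missing": [0.05, 0.10, 0.08, 0.04, 0.09],
}
for name, signals in examples.items():
    print(name, call_cds(signals))
```

In this sketch a CDS that is divergent along its whole length lands in the ambiguous band, which mirrors the situation described for the fimbrial operons, where PCR was needed to resolve calls the array could not make reliably.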
The hybridization of genomic DNA to conventional microarrays is a powerful approach to studying the genomic content of multiple bacterial strains, and comparisons between pathogenic and commensal isolates permit the identification of novel virulence factors. One weakness of the CGH approach is that only genes present in the array strain can be analyzed in other strains. Nevertheless, sequenced bacterial genomes are generally based upon a representative isolate from a specific disease or clinical syndrome and, therefore, will contain numerous virulence factors, including strain- or subtype-specific genes or PAIs. Whole-genome analysis provides data on a scale that cannot be compared to any other technique, allowing insight into the genomic content of an entire organism(s) and the ability to identify trends across strains.
The E. coli CFT073 genome contains 5,379 CDS, and therefore, an analysis of these genes across multiple strains provides a much broader and more extensive understanding of UPEC isolates and how the gene content compares to fecal/commensal E. coli strains. This is the first study using both experimental and bioinformatic approaches to compare the genomic content of a collection of uropathogenic and fecal/commensal E. coli isolates. One of the most significant findings was the identification and characterization of seven additional genomic islands in strain CFT073, opening the way for subsequent studies of the many CDS that have been annotated with hypothetical or putative functions as well as closer comparisons between the CFT073 PAIs and other well-characterized UPEC PAIs, such as strains 536 (25) and J96 (102).
Funding for this research was provided by Public Health Service grants AI043363 and AI059722 from the National Institutes of Health.
Published ahead of print on 23 February 2007.