Journal of Bacteriology, January 2009, p. 347-354, Vol. 191, No. 1
0021-9193/09/$08.00+0 doi:10.1128/JB.01238-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.
Division of Bioenvironmental Science, Frontier Science Research Center,1 Division of Microbiology, Department of Infectious Diseases, Faculty of Medicine, University of Miyazaki, Miyazaki, Japan,3 Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom,2 School of Immunity and Infection, University of Birmingham, Birmingham, United Kingdom,4 Department of Biological Information, School and Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Kanagawa, Japan,5 Institute of Cell and Molecular Biosciences, University of Newcastle, Newcastle upon Tyne, United Kingdom,6 Centre for Molecular Microbiology and Infection, Division of Cell and Molecular Biology, Imperial College London, London, United Kingdom7
Received 5 September 2008/ Accepted 15 October 2008
Whole-genome sequencing approaches have revealed that E. coli has a conserved core of genes common to both commensal and pathogenic strains. The conserved genome framework is decorated with genomic islands and small clusters of genes that have been acquired by horizontal gene transfer and that in pathogenic strains are often associated with virulence (for a review, see reference 32). EPEC strains provide a striking example of a pathovar highly adapted to virulence in the human intestine (8, 15), but until now no EPEC strain has been fully sequenced.
EPEC was the first pathovar of E. coli to be implicated in human disease (4) and remains a leading cause of infantile diarrhea in developing countries (for a review, see reference 6). However, because EPEC strains were found not to invade cells or release diffusible toxins, doubts about their pathogenic potential were raised in the 1960s and 1970s. Induction of diarrhea in human volunteers (21) subsequently provided the decisive evidence that EPEC is a true human pathogen. As a result of this study, one of the strains tested, E2348/69 (serotype O127:H6), isolated in Taunton, United Kingdom, in 1969, became the prototype strain used globally to study EPEC biology and disease. Indeed, E2348/69 is probably the most-studied pathogenic E. coli strain, and until now it was impossible to place the vast amount of biological data in a genomic context.
Typical EPEC strains, which belong to a limited number of O serogroups, contain the EPEC adherence factor plasmid that encodes the bundle-forming pilus (BFP) (10) and also contain the gene regulator locus per (for a review, see reference 6). Typical EPEC strains are further divided into four distinct lineages, EPEC lineages 1 to 4 (18); E2348/69 belongs to EPEC lineage 1 and to the B2 phylogroup.
The hallmark of EPEC infection is formation of distinct attaching and effacing (A/E) lesions, which are characterized by effacement of the brush border microvilli and intimate bacterial attachment (for reviews, see references 6 and 8). The ability to induce A/E lesions is encoded on a pathogenicity island termed the locus of enterocyte effacement (LEE), which is also present in O157 and non-O157 EHEC strains and the mouse pathogen Citrobacter rodentium (24, 25; for a review, see reference 9). The LEE encodes the adhesin intimin, the structural components of a type III secretion system (T3SS) involved in translocation of effector proteins into the mammalian host cell, gene regulators, chaperones, translocators, and seven effector proteins (EspB, EspF, EspG, EspH, EspZ, Map, and Tir) (for a review, see reference 9). Recent studies have shown that the EPEC strain E2348/69 genome encodes several additional non-LEE effectors, including EspJ (23), EspG2, and EspI/NleA, as well as NleB, NleC, NleD, NleE, and NleH (for a review, see reference 9). Additional putative virulence factors include the autotransporter protein EspC (26), lymphostatin (LifA) (17), and several fimbrial operons. Here we report the genome sequence of E2348/69, describe a bioinformatics survey of this strain's virulence factors, and present the results of comprehensive comparative studies performed with commensal and other pathogenic E. coli strains.
The whole genome was sequenced to a depth of 8x coverage by using pUC19 (insert size, 2.8 to 5 kb) and pMAQ1b (insert size, 5.5 to 10 kb) small-insert libraries and dye terminator chemistry with ABI3700 automated sequencers. End sequences from larger-insert plasmid (pBACe3.6 [insert size, 20 to 30 kb]) libraries were used as a scaffold. The sequence was assembled and finished as described previously (33).
Gene prediction and annotation and comparative analysis. Protein-encoding sequences (CDSs) were identified using GeneHacker (43), followed by manual inspection of start codons and ribosome binding sequences of each CDS. Intergenic regions that were >150 bp long were further reviewed to determine the presence of small CDSs encoding proteins with significant homology to known proteins. Functional annotation of the CDSs was performed on the basis of the results of homology searches with the public nonredundant protein database (http://www.ncbi.nlm.nih.gov/) using BLASTP. Genes for tRNAs, transfer-messenger RNA, rRNAs, and other small RNAs were identified by using the Rfam database (11) at the Rfam website (http://www.sanger.ac.uk/Software/Rfam/index.shtml). We also searched the E2348/69 genome for all the RNA genes that have been identified in K-12 and Sakai by using BLASTN. Figure S1 in the supplemental material shows the methods used for cluster analysis of the E2348/69 CDSs and for genomic comparison with eight E. coli genomes.
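The intergenic review step described above can be illustrated with a minimal sketch. It assumes CDS coordinates are available as 1-based inclusive (start, end) pairs on a linearized chromosome; the function name `intergenic_gaps` and the coordinate convention are hypothetical and are not taken from the annotation pipeline used in this study.

```python
from typing import List, Tuple


def intergenic_gaps(cds: List[Tuple[int, int]],
                    min_len: int = 150) -> List[Tuple[int, int]]:
    """Return (start, end) of intergenic regions longer than min_len bp.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    The region before the first CDS and the wrap-around region of a
    circular chromosome are ignored for simplicity.
    """
    gaps = []
    prev_end = 0  # 0 means "no CDS seen yet"
    for start, end in sorted(cds):
        gap_len = start - prev_end - 1
        if prev_end > 0 and gap_len > min_len:
            gaps.append((prev_end + 1, start - 1))
        # Overlapping or nested CDSs must not move the boundary backwards.
        prev_end = max(prev_end, end)
    return gaps


if __name__ == "__main__":
    toy_cds = [(1, 100), (300, 500), (450, 510), (520, 900)]
    print(intergenic_gaps(toy_cds))
```

Regions returned by such a scan would then be candidates for the manual search for small CDSs; the search itself is not modeled here.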
Nucleotide sequence accession numbers. The annotated genome sequences of E2348/69 have been deposited in public databases under accession number FM180568 for the complete genome and under accession numbers FM180569 and FM180570 for EPEC strain E2348/69 plasmids pMAR2 and pE2348-2, respectively.
FIG. 1. Circular maps of the chromosome and plasmids of EPEC strain E2348/69. (A) EPEC strain E2348/69 chromosome. From the outside in, the first circle shows the locations of PPs and IEs (purple, lambda-like PPs; light blue, other PPs; green, IEs and the LEE element), the second circle shows the nucleotide sequence positions (in Mbp), the third and fourth circles show CDSs transcribed clockwise and anticlockwise, respectively (gray, conserved in all eight other sequenced E. coli strains; red, conserved only in the B2 phylogroup; yellow, variable distribution; blue, E2348/69 specific), the fifth circle shows the tRNA genes (red), the sixth circle shows the rRNA operons (blue), the seventh circle shows the G+C content, and the eighth circle shows the GC skew. (B) EPEC strain E2348/69 plasmids. The boxes in the outer and inner circles represent CDSs transcribed clockwise and anticlockwise, respectively. Pseudogenes are indicated by black boxes, and other CDSs are indicated by the colors described above for panel A.
TABLE 1. Comparison of general genome features of EPEC strain E2348/69 and eight other sequenced E. coli strains
FIG. 2. Dot plot presentation of DNA sequence homologies between the chromosomes of E2348/69 and eight sequenced E. coli strains. The chromosome sequence of E2348/69 was compared with the chromosome sequences of eight sequenced E. coli strains. The locations of PPs and IEs on the E2348/69 chromosome are indicated.
FIG. 3. Genome organizations of four PPs and five IEs carrying E2348/69 virulence-related genes. The four phages are lambda-like phages. Homologous genes in the lambda (accession no. NC_001416) and four PP genomes are indicated by gray shading. T3SS effectors are encoded on three lambda-like phages (PP2, PP4, and PP6) and four IEs (IE2, IE5, IE6, and LEE). Since the NleE family gene on IE2 contains an in-frame 168-bp deletion, it may be nonfunctional.
We also identified a total of 117 IS elements or fragments of IS elements in E2348/69, which were classified into 41 types based on sequence similarity (see Table S3 in the supplemental material). The most abundant IS element is ISEc13 (a total of 30 copies). ISEc21 is a newly identified IS element belonging to the IS110 family. E2348/69 contains six copies of ISEc21.
Genomic comparison with commensal and other pathogenic E. coli strains. We performed an all-against-all reciprocal BLASTP comparison of the complete gene sets of E2348/69 and the eight previously sequenced E. coli strains. However, large variations in gene predictions confounded this analysis. In order to avoid biases introduced by differences in gene prediction, we compared the E2348/69 gene set with the genomes of the other E. coli strains using one-way comparisons with TBLASTN.
We first identified how many of the E2348/69 genes were unique and how many belonged to paralogous gene families. To this end, we performed a cluster analysis of the 4,656 E2348/69 CDSs (pseudogenes were excluded from this analysis) using BLASTP, which yielded 4,419 unique genes or singlets and 69 gene families containing more than one member. We then performed a TBLASTN analysis using the unique genes and one representative gene from each of the 69 gene families for a comparison with the eight E. coli genomes (see Fig. S1 in the supplemental material).
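The grouping of CDSs into singlets and multimember families can be sketched as follows. This is an illustration only: it assumes that pairwise BLASTP matches passing some similarity cutoff are already available as gene pairs, and it uses single-linkage grouping with a union-find structure, which is an assumption and not necessarily the clustering procedure shown in Fig. S1. The function name `cluster_genes` is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def cluster_genes(genes: List[str],
                  hits: Iterable[Tuple[str, str]]):
    """Group genes linked by pairwise hits (single linkage, assumed here).

    Returns (singlets, families): genes with no partner, and groups
    containing more than one member.
    """
    parent = {g: g for g in genes}

    def find(g: str) -> str:
        # Follow parent links to the root, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for g in genes:
        groups[find(g)].append(g)

    singlets = [m[0] for m in groups.values() if len(m) == 1]
    families = [sorted(m) for m in groups.values() if len(m) > 1]
    return singlets, families


if __name__ == "__main__":
    s, f = cluster_genes(["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c")])
    # One representative per family plus all singlets form the query set.
    print(len(s) + len(f))
```

With the counts reported above, taking one representative from each family gives 4,419 + 69 = 4,488 query sequences for the TBLASTN comparison.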
Figure 4 shows that more E2348/69 genes are conserved (as defined by ≥90% identity and ≥60% overlap) in the four strains belonging to phylogroup B2 than in strains belonging to other phylogroups. Of the 4,488 E2348/69 genes or gene families, 3,141 (70%) (which include no IS transposase genes) are conserved in all nine strains examined. With one exception, all of the 3,141 genes were found to be on the chromosomal backbone outside PPs and IEs (the exception, E2348_C_1118, which encodes a predicted protein, was on IE2). In contrast, the majority of the E2348/69-specific genes (349/424) were found to be in PPs and IEs (319 genes) or plasmids (30 genes). The remaining 75 E2348/69-specific genes on the chromosome backbone include genes for O127 antigen biosynthesis and two restriction-modification systems, a D-arabitol utilization operon (alt), a retron element, and three fimbrial biosynthesis operons (see Table S4 in the supplemental material). The E2348/69-specific BFP fimbrial operon was found to be in the pMAR2 plasmid. In addition, two E2348/69-specific genes for tRNAAsn (codon AAC) and tRNAThr (codon ACA) were found to be "cargo" on PP8 (see Fig. S4 in the supplemental material).
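The conservation criterion can be expressed as a simple filter over search hits. The sketch below treats the 90% identity and 60% overlap values as minimum thresholds and defines overlap as the aligned fraction of the query length; both points are assumptions of this illustration, as are the `Hit` record and the function names, none of which come from the original analysis.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Hit:
    query: str        # E2348/69 gene identifier
    identity: float   # percent identity of the alignment
    aligned_len: int  # number of query residues aligned
    query_len: int    # full length of the query protein


def is_conserved(hit: Hit,
                 min_identity: float = 90.0,
                 min_overlap: float = 60.0) -> bool:
    """True if the hit meets both thresholds (treated as minimums)."""
    overlap = 100.0 * hit.aligned_len / hit.query_len
    return hit.identity >= min_identity and overlap >= min_overlap


def conserved_in_all(hits_by_strain: Dict[str, List[Hit]],
                     genes: Iterable[str]) -> Set[str]:
    """Genes with a qualifying hit in every compared strain."""
    conserved = set(genes)
    for hits in hits_by_strain.values():
        conserved &= {h.query for h in hits if is_conserved(h)}
    return conserved


if __name__ == "__main__":
    toy = {
        "strainA": [Hit("g1", 98.0, 300, 310), Hit("g2", 95.0, 100, 400)],
        "strainB": [Hit("g1", 91.0, 200, 310), Hit("g2", 99.0, 390, 400)],
    }
    print(conserved_in_all(toy, ["g1", "g2"]))
```

In the toy input, g2 fails in strainA because only 25% of the query is aligned, so only g1 is reported as conserved in both strains.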
FIG. 4. Conservation of E2348/69 genes in the eight sequenced E. coli strains. The numbers of E2348/69 genes conserved in each of the eight sequenced E. coli strains are indicated. Of the 4,492 genes (or multicopy gene families) identified in E2348/69, 3,141 (indicated by gray) were conserved in all the E. coli strains. Other genes were classified into three groups according to their locations in the E2348/69 genome (in plasmids, in PPs and IEs, and in other chromosome regions).
Although not conserved across the entire B2 phylogroup, 119 genes were found exclusively in strains belonging to this group (see Table S5 in the supplemental material). Since most of these genes are carried on PPs, IEs, or plasmids, they are unlikely to be true orthologues present in the last common phylogroup B2 ancestor. However, there were a few notable exceptions, including the operon for sucrose utilization that occurs in the same chromosomal context in E2348/69 and UPEC strain 536.
Interestingly, of the four non-phylogroup B2 strains, EPEC strain E2348/69 shares the highest number of genes with EHEC strain Sakai (Fig. 4; see Fig. S1 in the supplemental material). Most of the shared genes are carried on PPs and IEs, suggesting that there was independent acquisition through horizontal gene transfer rather than inheritance through vertical descent.
Unexpectedly, we found that all of the genes encoding (pdu) and regulating (pocR) the coenzyme B12-dependent degradation of 1,2-propanediol are present only in E2348/69 and ETEC strain E24377A (see Fig. S6 in the supplemental material). The pdu operon and pocR are found in the same genetic context, alongside the cobTSU genes encoding parts I and III of the cobalamin biosynthetic pathway, as in Salmonella (19). In contrast to Salmonella, the cbi operon, encoding the endogenous biosynthesis of coenzyme B12, has been deleted in the two E. coli strains and the cob-pdu locus is highly divergent in E. coli (see Fig. S6 in the supplemental material). The significance of this diversity is not clear, but the locus was most likely inherited from an E. coli-Salmonella common ancestor and has undergone extensive deletion and rearrangement in multiple lineages of E. coli.
Functional gene loss. E2348/69 possesses 168 pseudogenes that have frameshifts or premature stop codons or are remnants of genes present in other bacteria (see Table S6 in the supplemental material). Pseudogenes occur about three times more frequently in plasmids, PPs, or IEs (64/869 CDSs) than in the chromosome backbone (101/3,965 CDSs). Pseudogenes found in accessory chromosome regions and in the plasmids were largely remnants of IS element insertion events or were genes related to phage functions (see Table S6 in the supplemental material). However, several of the pseudogenes in PPs are associated with virulence, including genes encoding multiple T3SS effector proteins. Likewise, several of the pseudogenes found in the chromosome backbone are associated with survival in the host, including genes whose inactivation disrupts four fimbrial operons, the dmsA gene required for the anaerobic use of dimethyl sulfoxide as a terminal electron acceptor, and a remnant of the gene encoding hemolysin E (hlyE). Interestingly, hlyE is intact in MG1655 and EHEC strain Sakai but is inactivated in the other six E. coli strains compared. The deletion in the E2348/69 hlyE gene is identical to those in the other phylogroup B2 strains. Furthermore, the E2348/69 dsdA gene, which encodes the D-serine deaminase involved in D-serine catabolism and has been shown to be involved in UPEC virulence (39), contains a frameshift mutation. In addition to point mutations and deletions, there is also evidence of metabolic streamlining through insertional inactivation; two genes involved in ethanolamine utilization, eutC and eutA, have been inactivated by insertion of a Mu-like phage (PP9) and ISEc13, respectively. Likewise, the gene encoding the L-arabinose transporter (araH) and dgoD, required for use of galactonate as a carbon source, have also been disrupted by ISEc13 elements.
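The "about three times" comparison of pseudogene frequencies follows directly from the counts given above and can be checked with a short calculation. The variable and function names below are illustrative only.

```python
def pseudogene_fraction(pseudogenes: int, cds: int) -> float:
    """Fraction of CDSs in a genome compartment that are pseudogenes."""
    return pseudogenes / cds


# Counts as reported in the text.
accessory = pseudogene_fraction(64, 869)    # plasmids, PPs, and IEs
backbone = pseudogene_fraction(101, 3965)   # chromosome backbone
ratio = accessory / backbone

if __name__ == "__main__":
    print(f"accessory: {accessory:.1%}")
    print(f"backbone:  {backbone:.1%}")
    print(f"ratio:     {ratio:.2f}")
```

The accessory regions carry pseudogenes at roughly 7.4% of CDSs versus roughly 2.5% for the backbone, a ratio of approximately 2.9.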
Virulence determinants. Since its discovery (23), it has been apparent that the LEE plays a major role in EPEC pathogenesis. Use of random and targeted discovery programs led to identification of several additional chromosomal and plasmid-encoded virulence factors. When these results were supplemented with the results of homology searches, the genome project data revealed an E2348/69 virulence repertoire that consists of the following elements.
(i) Afimbrial adhesins and outer membrane proteins. LifA (17) and intimin (14) are major E2348/69 afimbrial adhesins. We also found a second LifA homologue and a Saa autoagglutinating adhesin-like protein. E2348/69 Saa is almost identical to the homologues found in EHEC O157 strain Sakai, UPEC strain UTI89, and APEC (95, 97, and 97% amino acid sequence identity, respectively), but it is only distantly related (30% sequence identity) to Saa found in a LEE-negative Shiga toxigenic E. coli serotype O113:H21 strain (34). E2348/69 also has five intact genes and four pseudogenes encoding afimbrial adhesin homologues or adhesin-like proteins (see Table S7 in the supplemental material). The E2348/69 gene C_2704 encodes an adhesin-like autotransporter that is 70% similar to Salmonella enterica ShdA (16). Upstream of C_2704 is C_2705, which encodes a glycosyltransferase of the type associated with glycosylation of a subset of autotransporter proteins (28). Although E2348/69 is thought to be confined to the intestinal lumen, it encodes five Lom (also known as Ail, OmpX, or PagC) family proteins, which are outer membrane proteins implicated in cell adhesion, resistance to complement-dependent killing, and survival in macrophages (1), and a homologue of SfpA, a porin implicated in survival of Yersinia enterocolitica during systemic infection (27).
(ii) Fimbrial adhesins. E2348/69 has eight intact and five incomplete fimbrial operons. The complete fimbrial operon repertoire of E2348/69 in the context of the sequenced E. coli strains is shown in Table S8 in the supplemental material. Among the intact operon products, BFP play a role in microcolony formation in vitro (10) and in diarrhea in human volunteers (2), while long polar fimbriae play no obvious role in cell adhesion in vitro (7). The functions of the other fimbrial operons have not yet been elucidated.
(iii) T2SS, iron uptake systems, and virulence gene regulators. The sequenced E. coli strains each encode one or two type II secretion systems (T2SS) (see Fig. S7 in the supplemental material). E2348/69 contains a single T2SS, encoded in the gspM-yghJ locus; although its substrate remains unknown, the heat-labile enterotoxin is secreted by this T2SS in ETEC strain H10407 (40). E2348/69 contains six iron uptake systems (see Table S9 in the supplemental material). These systems include three systems that are not present in K-12 but are largely conserved in UPEC, APEC, and EHEC strain Sakai.
Regulation of virulence genes and coordinate regulation of virulence and housekeeping genes are required for virulence. Major regulators for the E2348/69 virulence genes are encoded on the LEE (Ler, GrlA, and GrlR) and on pMAR2 (Per). In addition, the RcsBCD two-component system (TCS) is known to regulate production of the exopolysaccharide colanic acid, and two TCSs (CpxAR and EvgAS) regulate LEE and BFP gene expression (22, 29, 30). All other TCSs that are widely distributed in E. coli strains are conserved in E2348/69; these include several TCSs (e.g., QseEF [36]) that have been shown to be involved in virulence in other pathogenic E. coli strains (see Table S10 in the supplemental material).
(iv) T3SSs and their effectors. E2348/69 encodes 21 T3SS effectors and contains 6 effector pseudogenes, while EHEC strain Sakai encodes 50 effectors and contains 12 effector pseudogenes (41) (Table 2). Recently, we have shown that EPEC lineage 2 strain B171 (serotype O111:NM) encodes at least 28 effectors and contains 12 pseudogenes (31). Twelve of the E2348/69 effectors are encoded by PPs (PP2, PP4, and PP6), seven are encoded by IEs (IE2, IE5, and IE6), and seven are encoded by the LEE (Fig. 3; see Table S2 in the supplemental material). As in EHEC strain Sakai, the three effector-transducing phages are lambda-like, and the effectors are encoded in the regions called exchangeable effector loci downstream of tail fiber genes (Fig. 3).
TABLE 2. Comparison of T3SSs and effectors of 10 E. coli strains
Concluding remarks. In this study we identified 424 E2348/69-specific genes, most of which are carried on PPs, IEs, or plasmids. We also identified a number of genetic traits that are specific for the phylogroup B2 strains irrespective of the pathotype, including the absence of the ETT2-related T3SS, which is present in E. coli strains belonging to all other phylogroups.
Interestingly, we found that the T3SS of E2348/69 is much simpler than the T3SSs of EHEC strain Sakai and EPEC strain B171. The LEE of E2348/69 is the smallest LEE and consists only of the core 41 CDSs. Moreover, compared with the genomes of EPEC strains B171 (not complete [35]; accession no. AAJX00000000) (31), E22 (not complete [35]; accession no. AAJV00000000), and E110019 (not complete [35]; accession no. AAJW00000000), which encode at least 28 (plus 12 pseudogenes), 40 (plus 6 pseudogenes), and 24 (plus 13 pseudogenes) known effectors, respectively, EPEC E2348/69 has a smaller effector repertoire (only 21 intact effectors) but nonetheless an effector repertoire which is sufficient for colonization and human disease. Importantly, we cannot exclude the possibility that any of the EPEC strains encode novel, yet-to-be-characterized T3SS effectors. Nonetheless, the simplicity of the virulence gene set of E2348/69 provides the first opportunity to fully dissect the entire virulence strategy of A/E pathogens in the genomic context.
This work was supported by The Wellcome Trust and by grants-in-aid for scientific research [Kiban (B) and Priority Areas "Applied Genomics"] from the Ministry of Education, Science, and Technology of Japan.
Published ahead of print on 24 October 2008.
Supplemental material for this article may be found at http://jb.asm.org/.