Scott A. Beatson,1 Julian Parkhill,2 and Mark J. Pallen1*
Bacterial Pathogenesis and Genomics Unit, Division of Immunity and Infection, Medical School, University of Birmingham, Birmingham,1 The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom2
Received 2 September 2004/ Accepted 15 October 2004
ABSTRACT
INTRODUCTION
Curiously, one area of bacteriology in which E. coli K-12 has been eclipsed as a model organism is the study of flagellar biosynthesis, assembly, and regulation. In this area, Salmonella enterica serovar Typhimurium strain LT2 has been the most commonly used model organism (34, 35). Nonetheless, it has been assumed that the genetics and physiology of flagellar systems are essentially the same in E. coli and S. enterica; minor differences include a tap receptor gene in E. coli but not in S. enterica and an fliB flagellar methylase gene and a phase 2 locus in S. enterica but not in E. coli (12, 34). In a similar vein, we have recently concluded from comparative sequence analysis that the S. enterica-E. coli model of flagellar function holds up surprisingly well even when it is generalized to bacteria that are only distantly related to E. coli (43). However, there are at least two challenges to this paradigm. First, unlike the E. coli-S. enterica archetype, many flagellar systems rely on the alternative sigma factor RpoN as a key facet of gene regulation (11, 27, 28, 30, 55). Second, some gamma-proteobacterial species (Aeromonas hydrophila and Vibrio parahaemolyticus) have been shown to utilize two distinct flagellar systems for motility, a polar system for swimming in the liquid phase and a lateral system for swarming over solid surfaces (31, 32, 37).
Studies initially with uropathogenic E. coli (UPEC) and later with other pathotypes suggested that E. coli strains often acquire new complex pathogenic phenotypes in a single step by the acquisition of pathogenicity islands, which contain virulence genes clustered on the chromosome and which are acquired en bloc by horizontal gene transfer (21, 22). More recently, the island concept has been generalized to encompass almost any horizontally acquired gene cluster or even any region in which there is a difference between two genomes. The most striking example of the latter expansion of use occurred when the first genome sequence of a pathogenic strain, enterohemorrhagic E. coli O157:H7, was compared to the K-12 genome sequence (47).
Recently, uncritical adoption of the island concept in genome annotation has faced several challenges. It is now clear that in the K-12-O157 comparison, neither the presumed polarity of change (i.e., the presumption that an O-island is an insertion in O157 relative to the K-12 backbone) nor the ancestral status of K-12 can be justified in all cases. For example, we described a striking case in which O-island 115 is part of a much larger gene cluster, ETT2, associated with type III secretion, and in which the essential difference between O157 and K-12 at this locus is a deletion in K-12 rather than an insertion into the O157 genome (i.e., O157 reflects the ancestral state better than K-12) (48). Furthermore, some so-called pathogenicity islands, even in UPEC, are more fluid than first thought (54). Thus, rather than a fixed core of housekeeping genes supplemented by a limited set of optional islands, the E. coli genomes are perhaps better viewed as frequently redrafted palimpsests, subject to repeated rounds of insertion, deletion, and rearrangement.
In addition to within-species alignments, comparison of the E. coli K-12 genome with the genome of S. enterica LT2 might be seen as a way of defining the E. coli backbone and the E. coli genomic islands (39). Such a comparison reveals a small but puzzling difference in the flagellar gene repertoires. K-12 possesses an additional pair of divergent, promoterless genes, fhiA-mbhA, an apparent flagellar islet that is absent from S. enterica (39). These genes appear to encode incomplete homologs of FlhA and MotB. We therefore examined the genomic context of the mbhA-fhiA genes in 11 different genome sequences from Escherichia/Shigella strains. We were surprised to discover that these genes are a remnant of an ancestral gene cluster, present in around one-fifth of E. coli strains, that could potentially encode a novel flagellar system previously overlooked in this intensively studied model organism.
MATERIALS AND METHODS
Sequence analysis. The Flag-2 gene cluster was initially identified in the unfinished genome of EAEC strain 042 by using BLASTP searches with E. coli K-12 flagellar protein sequences against E. coli 042 GLIMMER (14)-predicted coding sequences (CDSs) (available at http://vge.ac.uk/; genome sequence data downloaded from http://www.sanger.ac.uk/ on 10 December 2003). Systematic gene names are those provided by the Sanger Institute for the complete genome of E. coli 042 (http://www.sanger.ac.uk/Projects/Escherichia_Shigella/). Subsequent BLASTP and PSI-BLAST searches of the nonredundant protein and nucleotide databases (http://www.ncbi.nlm.nih.gov/) and unfinished microbial genomes (http://vge.ac.uk/) resulted in identification of equivalent gene clusters in the complete genomes of Vibrio parahaemolyticus, Yersinia pestis strains KIM and CO92, and Chromobacterium violaceum (10, 15, 36, 44), in the complete but unannotated genome of Yersinia pseudotuberculosis (ftp://bbrp.llnl.gov/pub/cbnp/y.pseudotuberculosis/), in the incomplete genome of Citrobacter rodentium (http://www.sanger.ac.uk/Projects/C_rodentium/), and in previously annotated clusters from Aeromonas species (19, 41).
When possible, comparative analyses of the regions surrounding and containing Flag-2 gene clusters were performed and visualized by using the coliBASE server (http://colibase.bham.ac.uk) (13), and these analyses covered the complete or nearly complete genome sequences of 12 Escherichia/Shigella strains, 11 Salmonella strains, and selected other bacterial pathogens; the organisms used included the laboratory strains E. coli K-12 strain MG1655, E. coli K-12 strain W3110, and E. coli strain DH10B; UPEC strain CFT073; enterohemorrhagic E. coli O157:H7 strains EDL933 and RIMD 0509952 (= Sakai); EAEC strain 042; enteropathogenic E. coli strain E2348/69; Shigella flexneri 2a strains 2457T and 301; Shigella dysenteriae strain M131649 (= M131); Shigella sonnei strain 53G; Salmonella enterica serovar Typhi strains CT18 and Ty2; S. enterica serovar Typhimurium strains LT2, DT104, and SL1344; Salmonella enteritidis strains LK5 and PT4; the lesser known salmonellae Salmonella bongori, Salmonella dublin, Salmonella gallinarum strain 287/91, and Salmonella pullorum; Yersinia pestis strains CO92 and KIM; Yersinia enterocolitica strain 8081; and C. rodentium (9, 15, 16, 24, 29, 40, 44, 46, 47, 53, 54).
Detailed analyses of the Flag-2 clusters of E. coli 042, V. parahaemolyticus, Y. pestis strains KIM, CO92, and 91001, Y. pseudotuberculosis, C. violaceum, and C. rodentium were carried out by using stand-alone BLAST (4) to confirm the presence of positional orthologs and CLUSTALW (51) to align orthologous protein sequences. When appropriate, E. coli 042 GLIMMER-predicted CDSs were shortened to more closely match the gene lengths of corresponding orthologs from V. parahaemolyticus and Y. pestis genomes and to minimize the overlap between adjacent genes. SEAVIEW (18) was used to visualize multiple alignments, and ARTEMIS (49) was used to annotate the E. coli 042 Flag-2 region. Promoter and sigma factor binding sequences were predicted by using promscan (http://www.promscan.uklinux.net/). All sequence analyses were carried out with a Macintosh G5 computer.
Strains. The ECOR strain collection was kindly provided by Thomas Whittam and has been described elsewhere (http://foodsafe.msu.edu/whittam/ECOR). Representatives of other pathotypes, including NMEC (E. coli associated with neonatal meningitis) strain RS218, EAEC strain 042, enterotoxigenic E. coli strain H10407, EAEC strain EAEC25, UPEC strain CFT073, and E. coli strain K-12 were kindly provided by Ian Henderson (University of Birmingham), while an isogenic nontoxigenic derivative of the E. coli O157:H7 Sakai strain was a kind gift from Chihiro Sasakawa (University of Tokyo).
PCR. Genomic DNA from each strain was extracted with a Puregene isolation kit (Flowgen, Ashby-de-la-Zouch, United Kingdom) and was stored at 4°C. Primers were designed by using the Primer3 software on the coliBASE server (http://colibase.bham.ac.uk). Primer sequences are listed in Table 1. For short PCRs, each 20-µl reaction mixture contained 1 U of Taq polymerase (Invitrogen, Renfrew, United Kingdom) in the buffer supplied by the manufacturer, 20 ng of genomic DNA, and each deoxynucleoside triphosphate at a concentration of 250 µM. The short PCR conditions were 30 cycles of 30 s at 94°C, 30 s at 62°C, and 30 s at 72°C, followed by a 7-min extension at 72°C. Long PCRs were performed by using TaKaRa LA Taq (Cambrex Bio Science, Wokingham, United Kingdom) in the buffer supplied by the manufacturer. Each 25-µl long PCR mixture contained 60 ng of genomic DNA, 5 pmol of each primer, each deoxynucleoside triphosphate at a concentration of 250 µM, and 1 U of TaKaRa LA Taq; the reaction conditions were 30 cycles of a two-step program consisting of 20 s at 96°C and 10 min at 69°C, followed by a 10-min extension at 72°C. The long PCR fragments were analyzed by electrophoresis on a 0.5% agarose gel, and the short PCR products were analyzed on a 1.0% agarose gel.
~5-kb fragments spanning the whole ~35-kb cluster, with each fragment overlapping its neighbors by a few hundred base pairs (Table 1 and Fig. 1C). Any negative results obtained by long PCR were followed up by deletion-scanning long PCRs as described previously (48).
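The long-PCR tiling described above amounts to laying overlapping windows along the cluster. The Python sketch below computes such windows; the cluster length (35,000 bp), amplicon length (5,000 bp), and 300-bp overlap are illustrative assumptions standing in for the approximate values in the text, and `tile_amplicons` is a hypothetical helper, not part of Primer3 or coliBASE. Real primer positions depend on where acceptable primers can be found, so the actual fragment coordinates (Table 1) will differ.

```python
def tile_amplicons(cluster_len: int, amplicon_len: int, overlap: int) -> list:
    """Return half-open (start, end) coordinates of overlapping amplicons
    covering [0, cluster_len); the last amplicon is clipped at the end."""
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be >= 0 and smaller than the amplicon length")
    step = amplicon_len - overlap  # advance so that neighbours share `overlap` bp
    tiles = []
    start = 0
    while True:
        end = min(start + amplicon_len, cluster_len)
        tiles.append((start, end))
        if end >= cluster_len:
            break
        start += step
    return tiles


# Illustrative (assumed) values: 35,000-bp cluster, 5,000-bp amplicons, 300-bp overlaps.
tiles = tile_amplicons(35_000, 5_000, 300)
for i, (s, e) in enumerate(tiles, 1):
    print(f"fragment {i}: {s + 1}-{e} ({e - s} bp)")
```

With these illustrative values, eight fragments cover the cluster and the last one is clipped to 2,100 bp; in a real design the final primer pair would more likely be placed so that the last amplicon is also full length.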
~1-kb PCR products (Table 1). The amplicons were purified by using a PCR purification kit (QIAGEN, Crawley, United Kingdom) and were sequenced directly with nested primers.
Nucleotide sequence accession number. The sequence and annotation of the Flag-2 cluster from E. coli 042 have been deposited in the EMBL database under accession number CR753847.
RESULTS
BLAST searches of the nonredundant GenBank databases showed that E. coli 042 Flag-2 genes are more similar to lateral flagellar genes found in V. parahaemolyticus and Y. pestis than to the conventional Flag-1 genes found in other E. coli strains and E. coli 042 itself. V. parahaemolyticus has been shown to have a lateral flagellar system encoded by (at least) 37 genes in five operons arranged in two clusters, in addition to a polar flagellar system (50). E. coli 042 Flag-2 contains positional orthologs of all V. parahaemolyticus lateral flagellar genes except motY. In addition, there is a conserved operon structure, but there is one salient difference: the genes form a single cluster in E. coli 042 but form two well-separated clusters in V. parahaemolyticus (Fig. 1B).
Although it might be premature to assume that the Flag-2 flagellar system produces lateral flagella, we adopted a lateral flagellar nomenclature for Flag-2 that is easily comparable to that used for the well-established Flag-1 flagellar system and is compatible with the requirements of genome annotation (Table 2). Genes with fli, flg, and flh prefixes in the Flag-1 system have prefixes of lfi, lfg, and lfh, respectively, in Flag-2-like and lateral flagellar systems (e.g., LfiM is the Flag-2 or lateral flagellar homolog of FliM). This nomenclature is consistent with that used for a previous V. parahaemolyticus GenBank submission (accession no. U51896), as well as a recent submission by Stewart et al. (accession no. AY225128). In the latter case a different nomenclature was used in the accompanying publication (genes were designated with reference to the polar flagellar nomenclature, so that fliML was equivalent to the polar flagellar gene fliM) (50), but this system is not easily adapted for use in standard sequence file formats. Importantly, the homologs of Flag-1 genes fliCDSTKLA and motAB are designated lafABCDEFSTU, which is consistent with the first description of lateral flagellar genes (38) (accession no. gb:U52957) and subsequent descriptions of Aeromonas lateral flagellar clusters (32). Additional genes that were found to be associated with Flag-2 loci but not with Flag-1 loci are referred to as laf genes, as with the regulatory protein LafK encoded by the V. parahaemolyticus Flag-2 system (50).
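The renaming rule is mechanical enough to express as a small lookup. In the sketch below, `flag2_name` is a hypothetical helper; it assumes that the letters in fliCDSTKLA/motAB and lafABCDEFSTU pair up in order (fliC to lafA, fliD to lafB, and so on through motB to lafU), which is consistent with LafA being the flagellin homolog and LafU the MotB homolog. Genes without a Flag-1 counterpart, such as lafK, fall outside the rule.

```python
# Prefix rule from the text: fli -> lfi, flg -> lfg, flh -> lfh (e.g. fliM -> lfiM).
PREFIX_MAP = {"fli": "lfi", "flg": "lfg", "flh": "lfh"}

# Special cases: fliCDSTKLA and motAB -> lafABCDEFSTU.
# ASSUMPTION: the two letter series correspond position by position.
LAF_MAP = dict(zip(
    ["fliC", "fliD", "fliS", "fliT", "fliK", "fliL", "fliA", "motA", "motB"],
    ["lafA", "lafB", "lafC", "lafD", "lafE", "lafF", "lafS", "lafT", "lafU"],
))


def flag2_name(flag1_gene: str) -> str:
    """Return the Flag-2 (lateral) name for a Flag-1 gene name."""
    if flag1_gene in LAF_MAP:          # laf special cases take priority
        return LAF_MAP[flag1_gene]
    prefix, suffix = flag1_gene[:3], flag1_gene[3:]
    if prefix in PREFIX_MAP:           # generic fli/flg/flh prefix swap
        return PREFIX_MAP[prefix] + suffix
    raise KeyError(f"no Flag-2 naming rule for {flag1_gene!r}")


for gene in ["fliM", "flgC", "flhA", "fliC", "motB"]:
    print(gene, "->", flag2_name(gene))
```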
Both K-12 strain MG1655 lfhA and lafU appear to have been truncated such that the first 391 and 189 nucleotides of E. coli 042 lfhA and lafU, respectively, do not match sequences anywhere in the K-12 genome. This places the predicted site of deletion of the Flag-2 cluster at nucleotide 250060 of the K-12 strain MG1655 genome. The same deletion point is found in all other available Escherichia/Shigella genome sequences: E. coli K-12 strain W3110, enteropathogenic E. coli strain E2348/69, strain DH10B, O157:H7 strains EDL933 and RIMD 0509952 (= Sakai), UPEC strain CFT073, S. flexneri 2a strains 301 and 2457T, and S. sonnei 53G. Strain DH10B also has a frameshift mutation within lfhA. These results suggest that the entire Flag-2 locus was originally present in the last common ancestor of the species. It is not clear why the deletion occurred. However, the ends of the deletion correspond exactly to the 17- or 25-bp direct repeat GTNGATNNTCANCAGNNTNA(N)TAAA, which does not appear anywhere else in the E. coli 042 genome, and recombination between the repeats may have been responsible for the deletion.
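To search other sequences for the deletion-flanking repeat, the IUPAC-style motif can be turned into a regular expression in which each N matches any single base and the bracketed (N) an optional base. This is a minimal sketch that handles only A, C, G, T, N and (N) and scans a single strand; it is not the procedure used to find the repeats. As written, the motif specifies 17 bases and spans 24 or 25 bp depending on the optional position, which the code also computes. The test sequence is invented.

```python
import re


def iupac_n_to_regex(motif: str) -> str:
    """Convert a motif using A/C/G/T, N (any base) and (N) (optional base) to a regex."""
    parts = []
    i = 0
    while i < len(motif):
        if motif.startswith("(N)", i):
            parts.append("[ACGT]?")   # optional single base
            i += 3
        elif motif[i] == "N":
            parts.append("[ACGT]")    # any single base
            i += 1
        elif motif[i] in "ACGT":
            parts.append(motif[i])
            i += 1
        else:
            raise ValueError(f"unsupported symbol {motif[i]!r}")
    return "".join(parts)


REPEAT = "GTNGATNNTCANCAGNNTNA(N)TAAA"
REPEAT_RE = re.compile(iupac_n_to_regex(REPEAT))

specified_bases = sum(ch in "ACGT" for ch in REPEAT)  # bases fixed by the motif
min_span = len(REPEAT.replace("(N)", ""))             # span without the optional base
max_span = min_span + 1                               # span with the optional base


def find_repeats(seq: str) -> list:
    """Return (start, length) of non-overlapping matches on the given strand."""
    return [(m.start(), len(m.group())) for m in REPEAT_RE.finditer(seq.upper())]


# Invented test sequence with one 25-bp instance starting at position 3.
print(find_repeats("CCC" + "GTAGATCCTCAGCAGAATCAG" + "TAAA" + "CCC"))  # [(3, 25)]
```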
All 11 S. enterica genomes surveyed lack counterparts of the lfhA and lafU genes (Fig. 1B). The S. enterica genomes also lack yafL and yafM, but they do contain full-length, divergently transcribed dinP and yafK genes (Fig. 1B). This genetic arrangement suggests that, unlike the sequenced Escherichia/Shigella strains, S. enterica probably never possessed a Flag-2 locus. The hypothesis that Escherichia/Shigella acquired Flag-2 by lateral gene transfer after divergence from S. enterica is supported by the observation that yafM encodes a protein with significant similarity to transposases and their inactivated derivatives (data not shown).
The E. coli 042 Flag-2 cluster is very similar to the V. parahaemolyticus lateral flagellar system.
The majority of E. coli 042 Flag-2 protein sequences exhibit 25 to 58% amino acid identity with the orthologous proteins in the V. parahaemolyticus lateral flagellar system (36, 50), and the average level of identity is around 40%. In contrast, LfgA, LfgM, and LfgN (the P-ring addition, anti-σ28, and chaperone proteins, respectively) are more divergent, exhibiting <25% identity. FlgM and FlgN show similar high variability and compositional bias across the full range of flagellar diversity. LfiM and LfiJ (export and assembly proteins) and LafD and LafF (a chaperone protein and a protein with an unknown function, respectively) were not detected in TBLASTN searches with V. parahaemolyticus proteins against the E. coli 042 Flag-2 region, but positional orthologs were identified by using PSI-BLAST.
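Percent identity figures such as those above depend on how the denominator is defined. The sketch below computes identity from a pair of pre-aligned sequences (for example, two rows of a CLUSTALW alignment), counting only columns in which neither sequence has a gap. This is one convention among several (BLAST, for instance, reports identities over the full alignment length, gaps included), so values from this sketch will not necessarily match those reported here. The toy alignment is invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared


# Invented toy alignment: 4 identical residues in 5 ungapped columns.
print(percent_identity("MKV-LA", "MRVGLA"))  # 80.0
```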
lfgC is a pseudogene in E. coli 042. In most cases, the GLIMMER-predicted CDSs in the E. coli 042 sequence closely matched the lengths of their counterparts in V. parahaemolyticus, although there are minor discrepancies in the predicted start codons for lfiG, lfgA, lfgB, lfgG, lfgL, and lafU (data not shown). However, the lfgC gene from 042 is over 40 codons shorter than its homologs in other systems. A TBLASTN search of the E. coli 042 Flag-2 region with V. parahaemolyticus FlgC indicated that there is a frameshift mutation in E. coli 042 lfgC. A run of three GC dinucleotide repeats is present upstream of the 042 lfgC open reading frame, but if one repeat is removed, a full-length lfgC gene is discernible, and it exhibits a high level of identity with V. parahaemolyticus lfgC over its entire length. The lfgC gene encodes a FlgC-like proximal rod protein and so is likely to be essential for the production of Flag-2 flagella by E. coli 042. In other words, and consistent with our inability to elicit swarming motility in E. coli 042 (data not shown), this frameshift probably inactivated the Flag-2 system in this strain.
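Why removing a single GC repeat restores the lfgC reading frame comes down to arithmetic: an extra dinucleotide shifts every downstream codon by 2 nt, and because 2 is not a multiple of 3, the downstream sequence is read out of frame until a stop codon is reached. The toy Python sketch below uses invented sequences, not the real lfgC; it translates a "mutant" carrying three GC repeats and the same sequence with one repeat removed, using the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ..., GGG).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(seq: str) -> str:
    """Translate from the first base up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented toy sequences (NOT the real lfgC), differing by one GC dinucleotide.
downstream = "TAAACTGGCTTAAGTGA"
ancestral = "ATG" + "GCGC" + downstream         # two GC repeats
mutant = "ATG" + "GCGCGC" + downstream          # three GC repeats: +2 nt shifts the frame
repaired = mutant.replace("GCGCGC", "GCGC", 1)  # remove one repeat

print(translate_orf(mutant))    # MAR (premature stop)
print(translate_orf(repaired))  # MALNWLK (same as ancestral)
```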
Evidence that the Flag-2 system might be RpoN regulated.
V. parahaemolyticus contains an RpoN-dependent regulator, LafK, that is required for the expression of lateral flagellar early genes (50). RpoN has not previously been shown to be required for E. coli flagellar systems, so we were interested to see if there was any evidence that the Flag-2 system might be regulated in this manner. As expected, the LafK homolog in Flag-2 contains a full-length Pfam:Sigma54_activat domain (9.2e-113). Furthermore, we identified consensus σ54 sites (TGGCAC-N5-TTGC) upstream of both lfgB and lafB translation start codons, as found in V. parahaemolyticus (50). Together, these findings suggest that the Flag-2 system is RpoN dependent.
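The σ54 consensus quoted above can be searched with a simple pattern. This sketch reports exact matches to TGGCAC-N5-TTGC on the supplied strand only; it is not the promscan method, and because functional σ54 promoters often deviate from the consensus at some positions, an exact scan like this will miss degenerate sites. The test sequence is invented.

```python
import re

# Exact consensus TGGCAC-N5-TTGC; the zero-width lookahead also reports overlapping hits.
SIGMA54_CONSENSUS = re.compile(r"(?=TGGCAC[ACGT]{5}TTGC)")


def sigma54_sites(upstream: str) -> list:
    """Return 0-based start positions of exact consensus matches on this strand."""
    return [m.start() for m in SIGMA54_CONSENSUS.finditer(upstream.upper())]


# Invented upstream sequence containing one consensus site.
upstream = "AAAA" + "TGGCAC" + "GTTGA" + "TTGC" + "AAAA"
print(sigma54_sites(upstream))  # [4]
```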
In V. parahaemolyticus the central chemosensory elements are shared by both polar and lateral flagellar systems (50). Similarly, E. coli 042 Flag-2 does not encode the complement of chemotaxis proteins normally associated with polar flagellar systems, so it is likely that it too shares chemosensory functions with the Flag-1 flagellar system. Despite these similarities, the regulation of Flag-2-like flagellar gene expression appears to differ substantially between V. parahaemolyticus and E. coli 042. In V. parahaemolyticus there is a consensus σ54 site upstream of the lateral flagellar motY gene; however, this gene is not present in E. coli 042 Flag-2, and no consensus σ54 site was identified in this region. Interestingly, V. parahaemolyticus LafK contains a CheY-like receiver domain at its N terminus, whereas E. coli 042 LafK (like the Y. pestis LafK homologs) has no such domain and consequently is 100 amino acids shorter.
The E. coli 042 Flag-2 locus contains nonflagellar genes. Two additional CDSs that are not found in the V. parahaemolyticus or Y. pestis genomes are located between lfiJ and lfgN in the E. coli 042 Flag-2 cluster (Fig. 1B). The predicted coding sequence immediately adjacent to lfiJ (Ec042-0259) encodes a 135-residue homolog of cytidylyltransferase (residues 5 to 130 match Pfam:CTP_transf_2; 1.00e-6). In BLASTP searches of the nonredundant protein database of the National Center for Biotechnology Information (nr) with Ec042-0259, known and putative glycerol-3-phosphate cytidylyltransferases were found with high significance (gi 46914301; 59% identity; 1e-39). Ec042-0260 encodes an 823-residue protein with an N-terminal glycosyl transferase domain (residues 44 to 217 match Pfam:Glycos_transf_2; 1.40e-13) and a C-terminal glycerolphosphotransferase domain (residues 647 to 835 match Pfam:glyphos_transf; 4.60e-3). This domain organization is shared by more than 20 other proteins, including the minor teichoic acid biosynthesis protein encoded by ggaB of Bacillus subtilis and the teichoic acid biosynthesis protein encoded by tagF of Staphylococcus epidermidis. Homologs of Ec042-0259 and Ec042-0260 are often found together in capsular polysaccharide biosynthesis gene clusters. These genes may be responsible for posttranslational modification of the flagellar proteins, such as the glycosylation demonstrated to occur on the Aeromonas lateral flagella (20).
Two further predicted coding sequences with no counterparts in the V. parahaemolyticus lateral flagellar system are found between lfgL and lafA in the E. coli 042 Flag-2 cluster. Ec042-0277 encodes a 115-residue protein with similarity to other bacterial proteins with an unknown function (COG4683). Ec042-0278 encodes a 100-residue protein that contains a helix-turn-helix domain (residues 32 to 86 match Pfam:HTH_3; 5.20e-12) and exhibits high amino acid identity (~50%) with several other putative transcriptional regulators.
E. coli 042 Flag-2 contains three genes that are conserved in other similar flagellar gene clusters. Situated between Ec042-0260 and lfgN is a gene predicted to encode a 323-residue protein that, in BLASTP searches against nr, shows significant similarity to a hypothetical protein from Y. pestis KIM (E value, 1e-4) and to two S. enterica FliB proteins (E value, 2e-4). FliB is a lysine-N-methylase that is required for posttranslational methylation of lysine residues in the flagellin of S. enterica but is not found in E. coli (12). The fliB gene is normally found adjacent to fliA, and in some S. enterica strains FliB has been found to be encoded by two adjacent genes (fliUV) (17). Interestingly, Aeromonas punctata has been reported to have a fliU-like gene (gb AAK57643) in a cluster with lafA1, lafA2, and lafB, although loss of this gene did not noticeably affect swarming or swimming motility (32). No such homolog could be identified in the lateral flagellar clusters of V. parahaemolyticus or Y. pestis, although a homolog was found in the C. rodentium Flag-2 cluster (see below). We predict that a FliB homolog is a novel component of some Flag-2-like flagellar systems and propose the designation lafV for the gene that encodes it.
Immediately downstream and in the same orientation as lfgL is a gene predicted to encode a 325-residue protein with significant similarity to hypothetical proteins encoded in Flag-2-like flagellar gene clusters in Y. pestis, Y. pseudotuberculosis, C. violaceum, and V. parahaemolyticus. Interestingly, the homolog from V. parahaemolyticus is annotated as a putative flagellin, and a PSI-BLAST search with this sequence did indeed find flagellin proteins, albeit with low significance. This protein is therefore conserved in several Flag-2-like flagellar systems, and we propose the designation lafW for the gene that encodes it. Given that FlgL and FliC are paralogous, the low-significance matches to flagellin suggest that LafW may represent a novel hook-associated protein, like that encoded by the adjacent lfgL gene.
Upstream and divergent from lafA is a predicted coding sequence for a 298-residue protein with an N-terminal transcriptional regulator domain (residues 40 to 115; Pfam:trans_reg_C; expect value, 8.10e-10) and a predicted membrane-spanning region (residues 160 to 182). In BLASTP searches against nr, several putative transcriptional regulators were found, along with several regulators of known function (notably, Vibrio cholerae ToxR). Intriguingly, homologs were found in syntenic locations in Y. pestis KIM and CO92, Y. pseudotuberculosis, and C. rodentium. We propose that this putative transmembrane transcriptional regulator could be involved in Flag-2 gene expression, and we designate the corresponding gene lafZ.
Sequenced strains of Y. pestis each have a single Flag-2-like flagellar gene cluster.
Both annotated Y. pestis genomes (CO92 [46] and KIM [15]) and the completed but unpublished Y. pestis biovar Mediaevalis strain 91001 genome (gb:NC_005810) have predicted Flag-2-like flagellar gene clusters (Fig. 1B and Table 2). As in E. coli 042 and in contrast to V. parahaemolyticus, all three Y. pestis genomes encode the Flag-2 system in a single locus. However, this locus is present at a different chromosomal location than its equivalent in E. coli 042. In nearly all cases, the amino acid identity between Flag-2 orthologs is ~10% higher between Y. pestis and E. coli 042 than between V. parahaemolyticus and E. coli 042, suggesting that there is less evolutionary distance between the Y. pestis and E. coli 042 Flag-2 clusters than between the V. parahaemolyticus and E. coli 042 clusters. Ec042-0259, Ec042-0260, Ec042-0277, Ec042-0278, and lafV are not found in either KIM or CO92, but both strains have copies of lafW and lafZ in the appropriate locations. Interestingly, Y. pestis CO92 has three sequential copies of lafA that appear to have arisen via gene duplication. None of the Y. pestis genomes appears to have a functional Flag-2 cluster (Fig. 1B); all three contain a frameshift mutation in lfhA that should truncate LfhA to 432 residues. KIM also has a 14.9-kb deletion that includes the entire lafBCDEFSTU locus and several non-Flag-2 genes between lafA and a pair of transposase genes, y3440 and y3439; CO92 has a small insertion of a few hundred nucleotides in the middle of lfgF; and 91001 has a frameshift mutation in lfgL that should lead to a severe truncation of LfgL.
Flag-2-like flagellar loci in C. violaceum, C. rodentium, and Y. pseudotuberculosis. We were interested to see if Flag-2-like flagellar genes could be identified in any additional bacterial genome sequences. We identified Flag-2-like flagellar loci using two broad criteria: (i) higher sequence identity with the E. coli 042 Flag-2 genes than with the E. coli K-12 Flag-1 genes and (ii) a conserved genetic organization, with five operons showing a conserved gene order arranged in one or two gene clusters (in contrast to the typical four or five clusters encoding Flag-1). In addition, we used two more focused criteria, absence of an fliO homolog and presence of lafV, lafY, and lafZ homologs.
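The identification criteria above can be written down as explicit checks. In this sketch, `CandidateLocus` and its fields are hypothetical: the text does not say how identity was summarized across genes, so averaging over orthologs is an assumption, and the example numbers are invented. As a design choice of this sketch, the two broad criteria are combined into one yes/no answer, while the focused criteria are reported individually for inspection.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateLocus:
    """Hypothetical summary of one candidate flagellar locus."""
    identity_to_042_flag2: float   # mean % aa identity to E. coli 042 Flag-2 orthologs (assumed summary)
    identity_to_k12_flag1: float   # mean % aa identity to E. coli K-12 Flag-1 orthologs (assumed summary)
    n_gene_clusters: int           # number of clusters holding the five operons
    operon_order_conserved: bool   # gene order within the operons matches Flag-2
    has_flio_homolog: bool
    genes: set = field(default_factory=set)


def meets_broad_criteria(locus: CandidateLocus) -> bool:
    """(i) closer to Flag-2 than to Flag-1; (ii) conserved order in one or two clusters."""
    closer_to_flag2 = locus.identity_to_042_flag2 > locus.identity_to_k12_flag1
    flag2_layout = locus.operon_order_conserved and locus.n_gene_clusters in (1, 2)
    return closer_to_flag2 and flag2_layout


def focused_checks(locus: CandidateLocus) -> dict:
    """Report each focused criterion separately."""
    checks = {"no fliO homolog": not locus.has_flio_homolog}
    for gene in ("lafV", "lafY", "lafZ"):
        checks[f"{gene} present"] = gene in locus.genes
    return checks


# Invented example values, for illustration only.
example = CandidateLocus(45.0, 30.0, 1, True, False, {"lafW", "lafZ"})
print(meets_broad_criteria(example), focused_checks(example))
```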
We identified Flag-2-like flagellar loci in the genomes of C. violaceum, C. rodentium, and Y. pseudotuberculosis. Each genome also encoded a complete Flag-1-like system, and there were some minor differences between the Flag-2 systems; the most extreme of these differences was that the C. violaceum system did not appear to be RpoN dependent. C. rodentium is a close relative of E. coli and has been used as a model to study type III secretion (reviewed in reference 33). Although the genome sequence is not yet complete, the content and organization of the C. rodentium Flag-2 cluster are indistinguishable from those of the E. coli 042 cluster, and there are high levels of nucleotide identity (80 to 90%) across the entire cluster. Furthermore, the cluster is located in the same position relative to the genomic backbone, suggesting that the Flag-2 cluster was acquired by a common ancestor prior to the divergence of the Escherichia and Citrobacter clades. Interestingly, the C. rodentium Flag-2 cluster lacks the inactivating frameshift mutation in lfgC found in the 042 Flag-2 cluster.
Likewise, the Y. pseudotuberculosis Flag-2 cluster closely resembles the Y. pestis cluster and occurs at the same chromosomal location, which is not surprising because Y. pestis is a recently derived clone of Y. pseudotuberculosis (1). Intriguingly, Y. pseudotuberculosis possesses full-length copies of lfhA, lfgF, and lfgL, suggesting that the Flag-2 system may be functional in this organism. Curiously, the Y. enterocolitica 8081 genome does not appear to encode a Flag-2 system; this system appears to have been lost due to a ~100-kb deletion (data not shown).
The Flag-2 cluster is present in around 20% of E. coli strains.
Next, we wished to determine the distribution of the Flag-2 gene cluster among a larger collection of Escherichia strains. PCR across the fhiA-mbhA boundary was positive for 58 strains (~80%) from the well-characterized ECOR collection, showing that they all possessed the same two-gene scar seen in K-12. Fifteen ECOR strains (~20%) were negative in this PCR (ECOR-1, -3, -4, -5, -12, -17, -24, -35, -36, -48, -49, -50, -64, -65, and -67) (Table 3), suggesting that they might harbor the full Flag-2 cluster at this site (Fig. 2A). A second round of PCRs targeting pairs of genes at either end of the full Flag-2 cluster provided complementary results (i.e., negative for the 58 strains with the K-12 Flag-2 genotype and positive for the 15 strains with the 042 Flag-2 genotype) (Fig. 2B). These results were consistent with the hypothesis that the full Flag-2 cluster was present in the last common ancestor of all E. coli strains and, although it has been lost from most strains, it has been retained in a sizable minority (around one-fifth) of E. coli isolates. Curiously, similar short PCR surveys applied to four Escherichia species other than E. coli (Escherichia blattae, Escherichia fergusonii, Escherichia hermannii, and Escherichia vulneris) showed that all four possessed the K-12-like genotype (data not shown).
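Because the two PCR screens are complementary, each strain can be assigned a genotype from the pair of results, with any other combination flagged for follow-up. The sketch below encodes that logic; `call_flag2_genotype` is a hypothetical helper, the ECOR numbers are the 15 strains listed above, and the extra "strain X" result is invented to show a discordant call.

```python
from collections import Counter


def call_flag2_genotype(scar_pcr_positive: bool, end_pcrs_positive: bool) -> str:
    """Combine the boundary (scar) PCR with the PCRs at either end of the cluster."""
    if scar_pcr_positive and not end_pcrs_positive:
        return "K-12-like (fhiA-mbhA scar)"
    if end_pcrs_positive and not scar_pcr_positive:
        return "042-like (intact Flag-2)"
    # Both positive or both negative: consistent with neither simple genotype.
    return "discordant: follow up"


# ECOR strains reported negative for the scar PCR and positive at the cluster ends.
intact = [1, 3, 4, 5, 12, 17, 24, 35, 36, 48, 49, 50, 64, 65, 67]
results = {f"ECOR-{n}": (False, True) for n in intact}
results["strain X"] = (True, True)  # invented result showing a discordant call

calls = {strain: call_flag2_genotype(*pcr) for strain, pcr in results.items()}
print(Counter(calls.values()))
```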
~35-kb cluster. Most PCRs were positive for all 15 strains (Fig. 1C). Any negative results were followed up by deletion-scanning PCRs with the primers flanking the negative regions (Fig. 1C and Fig. 3). Surprisingly, in contrast to our experience with the ETT2 gene cluster, we could not detect any large-scale insertions, deletions, or rearrangements in any of the Flag-2 clusters from the 15 ECOR strains compared to the 042 genotype (Fig. 1C).
DISCUSSION
These observations are relevant to the study of model organisms; they cast doubt on the wisdom of ever considering a single strain, such as K-12, as the archetype for a whole species. Furthermore, they emphasize the need to adopt a historical and comparative viewpoint when genomes are annotated, so that genes such as fhiA and mbhA that represent remnants of larger gene clusters can be recognized as such and appropriately annotated as pseudogenes.
Several points spring to mind about the evolution of this gene cluster. The locus occurs at the same location in the E. coli and C. rodentium genomes, suggesting that it was present in the ancestor of both species. However, its absence from Salmonella suggests that it was acquired after these two species diverged from Salmonella. This, combined with the presence of a Flag-2-like locus in Yersinia at an entirely different site in the genome, suggests that the cluster was acquired independently at least twice by lateral gene transfer. This suggestion is supported by a striking property of the cluster: unlike the Flag-1 system, all the components for the Flag-2 system appear to be encoded in a single large gene cluster, so that the entire self-contained Flag-2 flagellar system could be acquired in a single step. However, the mechanism of lateral gene transfer remains unclear, although the similarities between YafM, encoded by a gene at one end of the cluster, and transposases might provide a clue.
Curiously, the majority of E. coli strains lack almost all Flag-2 genes and possess an identical fusion (fhiA-mbhA) between the remnants of genes (lfhA and lafU) from either end of the cluster. This might suggest that the cluster was deleted once, in the ancestor of all such strains. However, a finding that does not support this idea is the lack of congruence between the distribution of intact or deleted Flag-2 clusters and the accepted phylogenetic structure of E. coli, as defined by multilocus enzyme electrophoresis (25) (i.e., division into the A, B1, B2, and D clades). Intact Flag-2 clusters are more common in the A clade but are nonetheless scattered throughout all four subdivisions, scuppering any attempt to link the Flag-2 genotype with the lines of phylogenetic descent. One possible explanation is that recombination has occurred between strains at this locus, purging most of them of the Flag-2 cluster and overwriting any phylogenetic signal.
So far, we have not detected a phenotype in enteroaggregative E. coli strain 042, whose genome has been sequenced, that could be ascribed to the Flag-2 cluster; for example, this strain did not show swarming behavior in our hands (data not shown). This is not too surprising, as we discovered a frameshift in one important gene, lfgC, that almost certainly inactivated the system. However, we did not detect any other inactivating mutations in the Flag-2 cluster in this strain, suggesting that the Flag-2 system was active in the recent past. Indeed, we cannot rule out adaptation to the laboratory environment as the cause of loss of Flag-2 function in this strain.
The fact that the Flag-2 system has been inactivated in all Escherichia/Shigella strains whose genomes have been sequenced, whether through a large deletion or, in the case of 042, through a frameshift mutation, provoked comparison with other pathogens that have lost motility functions (e.g., loss of Flag-1 function in Shigella, loss of both Flag-1 and Flag-2-like systems in Y. pestis, and loss of flagellar motility in Bordetella pertussis) (2, 45, 46). It is tempting to speculate that similar immune selective pressures provide a common explanation, given that flagellin is so highly visible to the innate immune system through its interactions with Toll-like receptor 5 (23). However, the frequent loss of the Flag-2 system may simply reflect the energy costs of producing a large multiprotein organelle in niches where it provides no selective advantage.
Although we have yet to find a strain with an active Flag-2 system, a number of pertinent structural and functional predictions can be made about the system upon scrutiny of the gene cluster. By analogy with related systems in Vibrio and Aeromonas, one could anticipate some features that distinguish Flag-2 from the conventional Flag-1 system; for example, its smaller flagellin could assemble into filaments thinner than conventional flagella (7, 32). One could also expect the Flag-2 system to mediate swarming motility and to be activated under high-viscosity conditions (5, 32). Other roles might include biofilm formation, cell-cell linkage, surface colonization, and adhesion to and invasion of eukaryotic cells. One might even anticipate a role in gut colonization and/or virulence.
The distribution of the Flag-2 filaments is likely to differ from the peritrichous distribution seen with the Flag-1 system. All previously characterized Flag-2-like systems are lateral flagellar systems (7, 32). However, all known lateral systems occur together with polar flagella rather than with the peritrichous flagella characteristic of Flag-1, so it is perhaps premature to assume that the Flag-2 system in E. coli necessarily produces flagella with a lateral distribution. For this reason, we adopted the Flag-2 designation rather than simply calling this system the E. coli lateral flagellar system (even though that is what it might turn out to be). By analogy with the Vibrio lateral flagellar system, Flag-2 is likely to be proton driven (6). This suggests an intriguing difference between the Flag-1-Flag-2 combination in E. coli and the lateral-polar combination in Vibrio: in E. coli strains that possess an intact Flag-2 system, two proton-driven flagellar systems might coexist in the same cell, whereas in the lateral-polar arrangement, one system is proton driven and the other is driven by sodium ions (6).
Another striking feature of the Flag-2 system is the lack of variability in the sequence of its flagellin, LafA; this distinguishes it from Flag-1, in which there are numerous antigenically distinct H types associated with sequence polymorphisms in the surface-exposed D2 and D3 domains of the Flag-1 flagellin, FliC (52). Indeed, it is interesting that identical Flag-2 lafA sequences are distributed among ECOR strains with various H types (Table 3). This hints at differences in the selective pressures exerted on the two systems by the acquired immune system or by other pressures driving flagellar diversity.
Analysis of the Flag-2 gene cluster allows several conclusions to be drawn about the regulation of Flag-2 gene expression and biosynthesis of the second flagellar system in E. coli. The most remarkable inference, based on the presence of LafK and some RpoN consensus-binding sites, is that this system is likely to be RpoN dependent. The Flag-1 system appears to be increasingly unusual among bacterial flagellar systems in its lack of dependence on RpoN. If proven, the RpoN dependence of Flag-2 would therefore provide a fundamental link between flagellar biosynthesis in E. coli and many other RpoN-dependent flagellar systems. Furthermore, it might enable molecular dissection of the role of RpoN in flagellar gene expression in a tractable host.
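As an illustration only, and not as the method used in this study, the inference about RpoN consensus-binding sites can be made concrete with a minimal motif scan. The sketch assumes a simplified, commonly cited σ54 (RpoN) promoter consensus, TGGCACG-N4-TTGCW (the -24 and -12 elements separated by a 4-nt spacer), and requires exact matches; the function names and the test sequences are hypothetical.

```python
import re

# Simplified sigma-54 (RpoN) promoter consensus (assumption): the -24 element
# TGGCACG, a 4-nt spacer, then the -12 element TTGC followed by A or T (W).
# Exact matches only; real searches usually allow mismatches or use a
# position weight matrix. The lookahead lets overlapping hits be reported.
SIGMA54_CONSENSUS = re.compile(r"(?=(TGGCACG[ACGT]{4}TTGC[AT]))")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sigma54_sites(seq: str) -> list[tuple[int, str, str]]:
    """Report (start, strand, site) for consensus hits on both strands.

    Start positions are 0-based coordinates on the forward strand;
    'site' is the matched sequence read 5'->3' on its own strand.
    """
    seq = seq.upper()
    hits = []
    for m in SIGMA54_CONSENSUS.finditer(seq):
        hits.append((m.start(), "+", m.group(1)))
    rc = reverse_complement(seq)
    n = len(seq)
    for m in SIGMA54_CONSENSUS.finditer(rc):
        site = m.group(1)
        # Map the reverse-strand start back onto forward-strand coordinates.
        hits.append((n - m.start() - len(site), "-", site))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence carrying one forward-strand consensus site.
    print(find_sigma54_sites("AAAATGGCACGAAAATTGCACCCC"))
```

A hit of this kind is only suggestive; as the text notes, the RpoN-dependence inference also rests on the presence of the enhancer-binding protein LafK, and the prediction would still need experimental confirmation.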
Another prediction is that, like the lateral flagellar systems of Vibrio and Aeromonas, the E. coli Flag-2 system utilizes its own flagellar sigma-antisigma pair, LafS and LfgM, which are homologues of FliA and FlgM, respectively. How regulation of Flag-2 gene expression is coupled to global gene regulation is less clear. It is likely to be coordinately regulated with Flag-1 and may exploit the same chemotaxis apparatus (it appears to have none of its own). However, it may well be independent of FlhCD, the high-level regulators of the Flag-1 system, as these regulators are absent from species that contain functional Flag-2-like lateral flagellar systems (data not shown).
The Flag-2 gene cluster encodes all the components of a flagellar type III secretion system (Table 2). If the Flag-2 gene cluster does indeed encode a functioning flagellar system in some strains, this would increase the number of distinct type III secretion systems in Escherichia/Shigella to five. It is now well established that there is regulatory cross talk between some of these systems (48). The discovery of the Flag-2 cluster in 20% of E. coli strains increases the potential for this phenomenon, particularly as some ECOR strains possess ETT2, Flag-2, and Flag-1 genes (Table 3). Furthermore, given our recent discovery that regulatory influences can outlive the decay of structural genes in the ETT2 gene cluster (the "Cheshire cat" effect), it is possible that the Flag-2 locus might exert regulatory effects even in strains in which it is no longer capable of encoding a fully functional flagellar system (48, 56).
The fact that the Flag-2 gene cluster was not discovered in the first 10 Escherichia/Shigella genome sequences obtained emphasizes the importance of maintaining an energetic program of genome sequencing in this taxonomic group. The hunt is now on for a functional Flag-2 system and for any phenotypes associated with it in E. coli strains. However, the presence of similar, potentially functional gene clusters in C. rodentium and Y. pseudotuberculosis should also focus the spotlight on motility in these bacteria. In addition, one might anticipate fresh insights into Flag-2-like systems to emerge from the soon-to-be-completed Aeromonas hydrophila ATCC 7966 genome sequence.
In conclusion, the obvious similarities of Flag-2 genes to related genes in C. rodentium and Y. pseudotuberculosis and to lateral flagellar systems in Aeromonas and Vibrio illustrate a recent dictum (8), "... if we are willing to think in terms of an idealized E. coli, we can include a great deal of well-studied biology of closely related enteric bacteria," while the discovery of the Flag-2 locus in E. coli demonstrates how much there is still to learn about motility in this intensively studied model organism.
| ACKNOWLEDGMENTS |
|---|
M.J.P. thanks Arshad Khan for systems administration. We thank the BBRP Sequencing Group at LLNL for making the unfinished Y. pseudotuberculosis genome sequence data publicly available.
| FOOTNOTES |
|---|
C.-P.R. and S.A.B. contributed equally to this study.
| REFERENCES |
|---|
σ54 and the transcriptional regulator FleQ of Legionella pneumophila, which are both involved in the regulation cascade of flagellar gene expression. J. Bacteriol. 186:2540-2547.
σ54 controls motility, biofilm formation, luminescence, and colonization. Appl. Environ. Microbiol. 70:2520-2524.