ABSTRACT
Shigella strains are nonmotile. The master operon of flagellar synthesis, flhDC, was analyzed for genetic damage in 46 Shigella strains representing all known serotypes. In 11 strains (B1, B3, B6, B8, B10, B18, D5, F1B, D10, F3A, and F3C) the flhDC operon was completely deleted. PCR and sequence analysis of the flhDC region of the remaining 35 strains revealed many insertions or deletions associated with insertion sequences, and the majority of the strains were found to be defective in their flhDC genes. As these genes also play a role in regulation of nonflagellar genes, the loss may have other consequences or be driven by selection pressures other than those against flagellar motility. It has been suggested that Shigella strains fall mostly into three clusters within Escherichia coli, with five outlier strains, four of which are also within E. coli (G. M. Pupo, R. Lan, and P. R. Reeves, Proc. Natl. Acad. Sci. USA 97:10567-10572, 2000). The distribution of genetic changes in the flhDC region correlated very well with the three clusters and outlier strains found using housekeeping gene DNA sequences, enabling us to follow the sequence of mutational change in the flhDC locus. Two cluster 2 strains were found to have unique flhDC sequences, which are most probably due to recombination during the exchange of the adjacent O-antigen gene clusters.
Shigella is a pathogen that causes bacillary dysentery or shigellosis in humans. Historically, Shigella was distinguished from Escherichia coli by its biochemical characteristics and nonmotility (6), but in recent years it has been suggested that Shigella strains are so close to E. coli as to be effectively members of the same species. Our recent study (18) suggested that with the exception of Shigella boydii 13, Shigella strains fall into three main clusters and several outliers within E. coli (note that we prefer that Shigella and the four species names not be italicized since we contend that they are in effect forms of E. coli; however, this is not consistent with this journal’s policy).
E. coli strains are generally motile and produce flagella, with over 50 genes involved. These genes are located in three regions of the chromosome and organized in at least 12 operons. These operons are grouped into three classes, reflecting their transcriptional hierarchy (3). The single class 1 operon, comprising the flhDC genes, is the master operon of the flagellar transcriptional hierarchy (11). Class 2 operons encode proteins for the components and assembly of the hook-basal body and transcriptional regulator, including the flagellum-specific sigma factor gene fliA. FliA is indispensable for the expression of class 3 genes, including the flagellin gene, fliC. Consequently, mutations in the flhDC genes have a decisive effect on expression of other flagellar operons, resulting in the loss of flagella and motility. Although Shigella strains are characteristically nonmotile, they have been shown to have the components for synthesis of flagella. In some strains cryptic flagellin genes have been detected and characterized (23, 24) at the fliC locus, and when 43 forms of the E. coli fliC gene were sequenced, it could be seen that these cryptic Shigella genes were each typical of one of the known E. coli H-antigen genes (26). In other studies it has been shown that in the strains examined the cause of loss of flagellation in Shigella varies and can be due to insertion sequence (IS)-insertion mutations in the flhDC operon or deletion in region III flagellar operons (1, 2).
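The dependency chain described above can be illustrated with a toy model. The following Python sketch is an illustration only: the names `REQUIRES` and `expressed_classes` are hypothetical, and the three-node chain is a deliberate simplification of the regulon rather than a description of every operon.

```python
# Toy model of the three-tier flagellar transcriptional hierarchy.
# Each class lists the classes whose products it needs for expression.
REQUIRES = {
    "class1": [],          # flhDC master operon
    "class2": ["class1"],  # hook-basal body genes and fliA depend on FlhDC
    "class3": ["class2"],  # fliC and other late genes depend on FliA
}


def expressed_classes(inactivated):
    """Return the set of classes still expressed given inactivated classes."""
    expressed = set()
    for cls in ("class1", "class2", "class3"):  # walk down the hierarchy
        deps_ok = all(dep in expressed for dep in REQUIRES[cls])
        if cls not in inactivated and deps_ok:
            expressed.add(cls)
    return expressed


print(expressed_classes(set()))       # all three classes
print(expressed_classes({"class1"}))  # nothing is expressed
```

In this simplified picture, inactivating class 1 silences everything downstream, whereas inactivating class 2 leaves only class 1 expressed.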
In a previous study, it was suggested that Shigella strains are derived from E. coli as several apparently independently derived groups or strains (18), with the loss of motility presumably occurring independently in different lineages. In this study we analyzed the flhDC operon and the genetic organization of the flhDC regions of Shigella strains representing all 46 serotypes. IS elements mediated the majority of the damage, and the changes revealed support our previous finding of independent origins of the Shigella clusters.
Strains.
Forty-six Shigella strains that represent the known serotypes were used; details are given in Pupo et al. (18). The symbols D for S. dysenteriae, F for S. flexneri, and B for S. boydii are prefixed to each serotype, and such serotype designations are used in place of strain names in this paper. For S. sonnei, SS was used. E. coli K-12 wild-type strain (P801) was used as the control.
PCR amplification of the flhDC region.
A recent study (2) showed that in some Shigella strains the flhDC operon carries IS elements or suffered a deletion. To determine the difference among the Shigella strains in the genetic organization of the flhDC operon, genomic DNA was prepared by methods described previously (22), and an 8.1-kb region containing motBA, flhCD, yecG, otsAB, and araH genes was analyzed by PCR using primers shown in Fig. 1. Using primers 5047 and 5048, which encompass the flhDC genes, we amplified PCR products from 30 of the 46 strains. In 20 strains (B2, B4, B12, B14, B17, D1, D3, D4, D6, D7, D8, D9, D11, D12, D13, F4A, F5, F6, F6A, and SS) the flhDC region was amplified as a 1.8-kb fragment, as for E. coli K-12, and for 10 strains (B7, B9, B11, B15, B16, D2, F1A, F2B, F3B, and F4B) fragment sizes ranged from 2.4 kb to 3.3 kb in length. For a further four strains (B5, F2A, FX, and FY) which gave no PCR product with the above primers, the flhDC region was amplified by using primers 5047 and 5062 as a fragment ranging from 2.0 kb to 3.4 kb, shorter than the predicted size (3.5 kb) in E. coli K-12, suggesting that the region containing the primer 5048 sequence is deleted. In the B13 strain, the flhDC region was obtained only as a 3.5-kb fragment by using primers 5047 and 5062. Because this strain is very divergent (18), the failure of PCR was likely due to sequence differences affecting primer binding. Regions upstream or downstream of flhDC were amplified using primer pairs 5130 and 5068 or 5067 and 5064, respectively. The upstream or downstream regions of the flhDC genes from 16 strains (B2, B4, B12, B14, B15, D1, D3, D4, D6, D9, D11, D12, D13, F1A, F6, and F6A) or 1 strain (D8), respectively, were amplified as larger fragments than that predicted in E. coli K-12, and from 3 (B5, B11, and B13) or 7 (B2, B4, B13, B14, D1, F6, and F6A) strains, respectively, PCR products were not obtained. The others gave products of a size similar to that in K-12.
Schematic presentation of the cheA-araG regions of Shigella strains. The top diagram is the genetic organization of the region based on E. coli K-12. Oligonucleotide primers used for PCR are indicated by arrowheads with numbers (sequences given below). Below are representations of Shigella strains grouped by clusters based on the study of Pupo et al. (18). Unamplifiable segments are shown as blank. In three strains (D10, F3A, and F3C), the whole region could not be amplified and is not shown. Probe A was used for Southern analysis. The flag positions are insertion sites of each IS element (shown at the bottom), and its orientation indicates the transcriptional direction of its transposase gene. The ISSfl4 with an asterisk has a deletion of the right half. The sequences of the PCR primers used are as follows: 5047, 5′-AACCGCCGAAAACTGTACCGAGA-3′; 5048, 5′-GACGCAATCCCAACTCGGTCAAAC-3′; 5062, 5′-TACGTGGGCCTCTTTTAACC-3′; 5064, 5′-GCTCTCAACACGCTTGCTGA-3′; 5066, 5′-TCAGCAAGCGTGTTGAGAGC-3′; 5067, 5′-AGATGCGTGGTTTCCTGCA-3′; 5068, 5′-TCAACATCAGTGCCAGAC-3′; 5099, 5′-GGTTAAAAGAGGCCCACGTA-3′; 5130, 5′-GTTTGACCGAGTTGGGAT-3′; and 5152, 5′-CACGTCAGAGTAGCGGAATA-3′. Primers were based on the sequence of the E. coli K-12 flhDC region (9).
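A quick consistency check on the primer list suggests that several primers anneal at the same site on opposite strands. The Python sketch below uses only the sequences printed above; the helper name `reverse_complement` and the dictionary `PRIMERS` are illustrative, and the check says nothing about primer positions beyond what the sequences themselves show.

```python
# Primer sequences (5' to 3') copied from the list above.
PRIMERS = {
    "5048": "GACGCAATCCCAACTCGGTCAAAC",
    "5062": "TACGTGGGCCTCTTTTAACC",
    "5064": "GCTCTCAACACGCTTGCTGA",
    "5066": "TCAGCAAGCGTGTTGAGAGC",
    "5099": "GGTTAAAAGAGGCCCACGTA",
    "5130": "GTTTGACCGAGTTGGGAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# 5066 and 5099 are exact reverse complements of 5064 and 5062.
print(reverse_complement(PRIMERS["5064"]) == PRIMERS["5066"])  # True
print(reverse_complement(PRIMERS["5062"]) == PRIMERS["5099"])  # True
# 5130 matches the first 18 bases of the reverse complement of 5048.
print(reverse_complement(PRIMERS["5048"]).startswith(PRIMERS["5130"]))  # True
```

These relationships are consistent with the use of 5130 and 5066 as outward-facing partners of 5048 and 5064 for amplifying the flanking regions.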
For three strains (D10, F3A, and F3C) no PCR product was amplified using any combination of primers covering the region from cheA to araG. Another eight strains (B1, B3, B6, B8, B10, B18, D5, and F1B) had no PCR product for the flhDC genes but gave a PCR product from the otsB-araG region upstream of flhDC. In these 11 strains, the presence of flhDC genes in the genome was examined by Southern hybridization at low stringency with a probe including the flhDC genes (probe A, shown in Fig. 1). No bands were observed (data not shown), indicating that the flhDC genes are completely deleted in these Shigella strains.
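The strain counts reported for the PCR and Southern analyses can be cross-checked with a short tally. The Python sketch below copies the strain lists from the text; the group labels in `PCR_GROUPS` are illustrative names, not terminology from the study.

```python
# Strain lists as given in the text, keyed by illustrative group labels.
PCR_GROUPS = {
    "k12_sized_product": (
        "B2 B4 B12 B14 B17 D1 D3 D4 D6 D7 D8 D9 D11 D12 D13 "
        "F4A F5 F6 F6A SS"
    ).split(),
    "larger_product": "B7 B9 B11 B15 B16 D2 F1A F2B F3B F4B".split(),
    "amplified_with_5047_5062": "B5 F2A FX FY".split(),
    "divergent_b13": ["B13"],
    "no_product_whole_region": "D10 F3A F3C".split(),
    "no_flhDC_product": "B1 B3 B6 B8 B10 B18 D5 F1B".split(),
}

all_strains = [s for group in PCR_GROUPS.values() for s in group]
# No strain should appear in more than one group.
assert len(all_strains) == len(set(all_strains))

total = len(all_strains)
deleted = (len(PCR_GROUPS["no_product_whole_region"])
           + len(PCR_GROUPS["no_flhDC_product"]))
sequenced = total - deleted

print(total, deleted, sequenced)  # 46 11 35
```

The tally accounts for all 46 strains: 11 with flhDC deleted and 35 with an amplifiable flhDC region.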
History of change in the flhDC region.
Amplified fragments containing the flhDC genes from the 35 strains were sequenced as described previously (12). Nucleotide and amino acid sequences were analyzed using GENETYX information processing software (Software Development Co., Tokyo, Japan). We also sequenced regions immediately upstream of flhCD for strains where the size of the PCR product differs from that found in K-12.
Several IS elements were observed and are depicted in Fig. 1. Sequence variations including small deletions, additions, and base substitutions in the flhDC genes are presented in Fig. 2. In the discussion that follows the substitutions and other changes are referred to by their base positions as indicated in Fig. 2. The 35 strains have been previously determined to belong to three clusters (clusters 1 to 3) and five outliers by using housekeeping gene sequences (18). The data from the flhDC region give support to the separation of the clusters, with four sites (438, 796, 1360, and 1432), eight sites (76, 357, 375, 414, 435, 785, 1193, and 1289), and one site (863) common to strains within clusters 1, 2, and 3, respectively.
Polymorphic nucleotide sites in the flhDC region (nucleotide positions 1 to 1708). A plus above the base number indicates the position for a nucleotide addition or IS insertion. IS elements are shown by a number with key at the bottom of the figure. Long deletions (over 200 bases) are shown by nucleotide positions in brackets. A 33-base deletion (nucleotide positions 549 to 581), common to B14 and D1, is not shown. Note that B13, which is very divergent, has an additional 285 singular polymorphic sites not shown.
The variation observed in this study allows us to look at the relationships within each cluster, giving further support to, and in some cases further resolution of, the branching order (Fig. 3). An evolutionary tree was constructed based on changes observed in this region only (data not shown) for the strains belonging to the three clusters. Except for the two strains B16 and B17, the tree is consistent with the housekeeping gene tree previously determined (18) but has much lower resolution. Therefore, we describe and discuss the changes in the flhDC region with reference to the relationships based on housekeeping genes. The changes discussed below are all indicated in Fig. 3, and provided that we treat the multiple base substitutions in B16 and B17 as due to single recombination events involving the whole flhDC region, there are no reverse or parallel changes, giving us confidence in the branching order and sequence of events.
Relationships of strains within the three Shigella clusters. A, cluster 1; B, cluster 2; C, cluster 3. The tree was based on housekeeping genes (18) with additional data for cluster 1 from tna genes (20) and changes in genetic organization and sequence variation of the flhDC region observed in this study. Events are marked at the nodes: flh (above branches) for changes in the flhDC region (see Fig. 2), HK (below branches) for housekeeping genes (18), and tna (below branches) for tna genes (20). The numbers of substitutions are indicated within brackets. However, the numbers prefixed with + or − indicate numbers of nucleotide additions or deletions, respectively. flh(r) and flh(Δ all) denote a recombinational change and a deletion of the whole flhDC region, respectively. Insertion of IS elements is prefixed with the gene or IS name. An insertion (ins) of unknown nature is indicated with its size. The branch length is arbitrary and does not reflect genetic distance.
(i) Cluster 1.
Cluster 1 has been further divided into three subclusters (20), as shown in Fig. 3. An IS911 (16) insertion bracketed by a 3-bp direct repeat is present at the same site in the otsA gene in all 12 cluster 1a and 1b strains. This IS911 insertion must have occurred before the subcluster 1a and 1b clade diverged from the other cluster 1 strains. In addition, in two closely related strains (B2 and B4) the IS911 element itself had an IS1SB element (8) inserted. As the whole region is deleted in all cluster 1c strains, it is not known whether the IS911 insertion had occurred before the divergence of subcluster 1c. D7 is the only cluster 1 strain with no IS911 element in otsA.
All seven subcluster 1a strains have a 5-bp deletion (bp 710 to 714) in the flhD gene, which changes the five C-terminal amino acids of FlhD and generates an in-frame fusion to the N-terminal 66 amino acids of FlhC. They also share two base substitutions (sites 438 and 496) in the flhC gene, one of which (site 496) generates a stop codon in flhC. All five subcluster 1b strains have a 12-bp deletion (bp 645 to 656) in the flhC gene. All seven subcluster 1c strains have a deletion of the cheA-flhDC-otsA region (Fig. 1), which is not shown in Fig. 2. D7, which does not belong to any of the subclusters observed using housekeeping genes, is again shown to be an outlier within cluster 1.
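Whether a small deletion shifts the reading frame depends on whether its length is a multiple of three. The Python sketch below applies that rule to the deletion coordinates quoted in the text (Fig. 2 numbering); the function name `deletion_effect` is hypothetical, and the classification considers only length, not the exact position of the deletion within a codon.

```python
def deletion_effect(first_bp, last_bp):
    """Classify a deletion given its inclusive first and last base positions."""
    length = last_bp - first_bp + 1  # coordinates are inclusive
    effect = "in-frame" if length % 3 == 0 else "frameshift"
    return length, effect


# Subcluster 1a deletion (bp 710 to 714).
print(deletion_effect(710, 714))  # (5, 'frameshift')
# Subcluster 1b deletion (bp 645 to 656): removes the equivalent of 4 codons.
print(deletion_effect(645, 656))  # (12, 'in-frame')
# Deletion shared by B14 and D1 (bp 549 to 581): equivalent of 11 codons.
print(deletion_effect(549, 581))  # (33, 'in-frame')
```

The frameshift for the 5-bp deletion is consistent with the altered C terminus and gene fusion described for subcluster 1a, while the 12-bp and 33-bp deletions remove whole numbers of codons.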
(ii) Cluster 2.
An ISSfl4 element (27) was found at the same site (site 1290) in the flhDC operon promoter region in the six cluster 2 strains. In addition, in B5 a part of the ISSfl4 sequence (643 bp) was subsequently deleted together with 1,630 bp upstream from its insertion site, and in B11 a single base (G) was added at the insertion site.
IS600 elements (27) were present in the otsA gene in B15 and in the flhD gene in B16, in both cases with a 3-bp direct repeat. The IS600 element itself suffered an additional insertion with an IS30 element (5) in B15.
Four related strains (B7, B9, B15, and D2) share a single base substitution at site 118 in the flhC gene, which also converted the termination codon to a leucine codon, resulting in the C-terminal addition of 12 amino acids.
There are exceptions to the correlation of changes in the flhCD region and clustering of strains determined using housekeeping genes. Two cluster 2 strains (B16 and B17) have flhDC region sequences dissimilar to those of all other Shigella strains (Fig. 2), but within the range for E. coli. These two strains are most likely to have undergone recombination. A recent study of E. coli (14) revealed two large hypervariable regions of the chromosome that include the O-antigen and restriction-modification genes. The hypervariability is inferred to be due to lateral transfer of O-antigen or restriction-modification genes together with large flanking regions. The flhDC region (42.6 min) is contained in the hypervariable region around the O-antigen gene cluster, centered near 45 min. The O antigens of B16 and B17 are reported to be related to those of other E. coli strains (6), and the flhDC region of these two strains probably transferred from other E. coli strains at the time that they gained their current O-antigen genes. The changes in these two strains are treated as single recombination events in Fig. 3.
(iii) Cluster 3.
The changes observed in this study grouped 7 of the 13 cluster 3 strains together, as all have an IS1F element (25) present at the same site (+902) in flhD (Fig. 2). There are 224-bp, 1,923-bp, or 2,327-bp deletions upstream from the insertion site in four (F1A, F2B, F3B, and F4B), one (FX), or two (F2A and FY) strains, respectively, leading to the deletion of the 5′ 156 bp of flhD in all cases and variably affecting the neighboring genes.
The F3A and F3C strains have the whole region deleted, and the F1B strain has most of it deleted (Fig. 1). The arrangement of the strains in the tree (Fig. 3) indicates that the IS1F insertion occurred before these strains diverged but the insertion was lost in the deletion events. From the housekeeping gene relationship, it is clear that the deletion of the region in F1B is independent of the deletion of the region in F3A and F3C, which we treat as a single event that occurred before the two diverged. The other three strains (B12, F4A, and F5) diverged at the base of the cluster 3 branch and share two base substitutions (at sites 970 and 979) in flhD, one of which (at site 970) changes the 25th glutamine codon (CAG) to a termination codon (TAG), resulting in truncation of FlhD. Since these two sites fall into the region deleted in all other cluster 3 strains, it is not known whether the two changes were present in the other strains prior to the deletion.
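The effect of the site 970 substitution can be shown with a minimal check against the stop codons of the standard genetic code. The Python sketch below is illustrative: the helper names `substitute` and `is_nonsense_mutation` are hypothetical, and the codon is written in the coding-strand orientation used in the text.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def substitute(codon, position, base):
    """Return the codon with one base replaced (position is 0-based)."""
    return codon[:position] + base + codon[position + 1:]


def is_nonsense_mutation(before, after):
    """True if a sense codon has been converted to a stop codon."""
    return before not in STOP_CODONS and after in STOP_CODONS


wild_type = "CAG"                        # glutamine codon
mutant = substitute(wild_type, 0, "T")   # C-to-T change at the first base
print(mutant)                                   # TAG
print(is_nonsense_mutation(wild_type, mutant))  # True
```

A single C-to-T transition at the first codon position is therefore sufficient to terminate translation at this point in flhD.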
There are also IS1F elements in the yecG gene in F1A and the otsA gene in B12, with a 9-bp direct repeat at the ends in both cases.
(iv) Outliers.
Five strains (B13, D1, D8, D10, and SS) are outside the clusters identified using housekeeping gene sequences (18). Sequences in the flhDC region are largely consistent with this observation. The B13 flhDC operon was very divergent from those of other Shigella strains and E. coli K-12, as expected given the divergence of housekeeping genes (18). However, unexpectedly, D1 shares sequence similarity with cluster 3 strains at the 5′ end of the flhDC genes (Fig. 2), indicating a recombination event. There are several substitutions common to cluster 3 strains but absent in D1, suggesting that the recombination occurred a long time ago with an ancestral cluster 3 strain. D1 also shares a 33-bp deletion in the flhD gene with B14. This deletion is flanked by a 6-bp direct repeat, and as such repeats facilitate deletions by slippage during DNA replication (28), it is most likely that the deletions in the two strains were due to independent events.
Basis of loss of motility.
Almost all strains studied had defects in the flhDC genes or promoter region, suggesting that this has at least contributed to loss of motility. All cluster 1 strains other than D7 are defective in the flhDC genes, with the defect being different for each subcluster. Subcluster 1a strains have a 5-bp deletion in flhD, subcluster 1b strains have a 12-bp deletion in flhC, while subcluster 1c strains have the entire region deleted. D7 has no major damage, having only two amino acid changes (A43V and D85N) in FlhD and one (S135C) in FlhC. It remains to be determined whether these changes will render the protein nonfunctional, but the loss of motility could be due to damage in other flagellar synthesis genes. It is also possible that the deletion of four amino acids in subcluster 1b does not affect function, with loss of motility due to damage in other genes.
The flhDC genes in cluster 2 are themselves intact, but all except the B16 and B17 strains have an ISSfl4 insertion in the flhDC promoter region, between the −35 sequence and the catabolite activator protein binding site consensus sequence (21). As the cyclic AMP-catabolite activator protein complex positively regulates expression of the flhDC operon by interaction with RNA polymerase (21), this insertion presumably results in a shutdown of flhDC expression. B16 has an IS600 element inserted in flhD.
The majority of cluster 3 strains have a truncated FlhD (see above), and the others (F1B, F3A, and F3C) had the cheA-flhDC region deleted, so are probably all defective in transcriptional activation for class 2 flagellar genes.
For the five outlier strains, three (B13, D8, and SS) have both FlhD and FlhC intact. D8 and SS also have identical FlhDC amino acid sequences to those of K-12. It has been reported that in another S. sonnei strain, genes in the fli operon coding for flagellar structural proteins were defective (1). In this instance the loss of flagella is not due to a defect in the flhDC genes. D10 has the whole region deleted, while D1 has a deletion of 11 amino acid residues in FlhD, which may or may not affect its function.
Shigella is in part defined by loss of motility, and we find that in almost all strains there is major damage to the flhDC operon. Giron (7) in a recent study of the expression of flagella and motility of 31 Shigella strains found that 27 of 28 strains tested produced one or occasionally two to three polar flagella and 14 strains were also motile on 0.2% soft agar, with more strains showing motility at lower percentages of agar. The strains used included S. flexneri serotypes 1 to 6, S. sonnei, and S. dysenteriae and S. boydii strains of unknown serotypes. All strains used in our study were therefore tested for motility according to the method described by Giron (7), using glass vials at 37°C, with inocula observed every 24 h for 72 h. However, none of our strains representing all Shigella serotypes showed any motility in this test, although it was not determined if any produce a polar flagellum. For those strains with defects in the flhDC operon, it is possible that there are unknown regulators other than the FlhDC regulator that allow the production of flagella, although in E. coli K-12 the expression of the flhDC genes has been shown to be absolutely required for the expression of other genes of the flagellar regulon (11). We have no explanation for the different results in the two studies but note that strain 2457T, the S. flexneri 2A strain used by Giron (7), has had its genome sequenced (27) and has the same IS insertion and associated deletion as the S. flexneri 2A strain we used.
We looked more closely at the genomes of the F2A strains (2457T and 301), for which complete genome sequences have been reported (10, 27). There are 11 common dysfunctional flagellar genes in these strains. Three genes, flgC, flgK, and flhA, each have a single base deletion, three have small deletions (4 bp in flgF and 11 bp each in flhL and flhB), and two (flhD and fliJ) have large deletions of 155 and 99 bp, respectively. The deletion in flhD is associated with the insertion of an IS1. For the remaining three dysfunctional genes, flhE has an IS3411 insertion, fliA has a stop codon (sites 217 to 219) leading to truncation of the protein, and fliE has a 6-base deletion as well as a 7-base insertion. There are also two genes inactivated in one strain but not the other. A flgDE gene fusion occurred in strain 301, and there was a single base insertion in fliY plus a 28-base insertion in fliP in strain 2457T. The mutations common to both genomes must be in all stocks of 2457T, and the extensive damage that these mutations involve makes it quite unlikely that the flagella observed by Giron are assembled by the well-worked system that we are studying. However, it remains possible that the flagellin of these polar flagella is encoded by fliC, as this gene seems to be intact in at least the genome strains and some others that have been examined (23, 24). The serology undertaken by Giron (7) was not consistent in this regard. Antisera prepared by one of us (A.T.) against FliC of S. sonnei or S. flexneri, expressed in E. coli K-12, were more active against flagella prepared from S. sonnei or S. flexneri, respectively, than against flagella prepared from the other (7), indicating that fliC was involved, but this was not confirmed by the antisera prepared against the flagellin extracted from the Shigella strains themselves. There are no reports of other flagellar systems in Shigella strains, and the alternate flagellar system recently found in many E. coli strains (19) is not functional in the two F2A genomes discussed above or the D1 or S. sonnei genome (19), and further work will be needed to determine the genetic basis and role of the flagella observed by Giron (7).
Concluding comments.
Flagellar motility is an important trait for E. coli living in the intestinal environment. This property was lost in all Shigella strains, although there is a recent report of motility under special conditions (7). Normal motility at least presumably became redundant and perhaps deleterious with the adoption of a new niche involving invasion of epithelial cells and gain of genes for actin-based movement within cells. Flagellar synthesis in E. coli involves a complex set of genes located in several regions of the chromosome, with the master operon consisting of two genes, flhC and flhD. In this study we examined the flhDC region and revealed extensive damage in the two genes as well as deletions often extending into neighboring genes not associated with flagellar synthesis. Decay of the flhDC genes appears to have occurred independently many times. Each of the three clusters of Shigella strains is so closely related that we did not observe any recombination in housekeeping genes. The mutations and insertions observed in the flhDC region can be located on the tree based on housekeeping genes, and one can see that the decay of these genes appears to be ongoing once the synthesis system is inactivated.
The Shigella genome has undergone considerable degradation, as shown by comparison with the K-12 genome (10, 27). The loss of functions in Shigella strains is presumably due to niche adaptation, as they occupy a very different niche than commensal E. coli and are expected to shed functions that are not needed for their particular niche (13) as a trade-off for optimal fitness (4). This loss seems to be mediated by IS insertions and deletions in many instances, including the inactivation of the tna operon for the utilization of tryptophan in many Shigella strains (20). The IS content in the S. flexneri 2A genome is more than sevenfold greater than that of the nonpathogenic E. coli K-12 and enterohemorrhagic E. coli O157:H7 (10, 27). The increase of IS content leading to inactivation of genes is likely to be driven by this adaptation process. This is in sharp contrast to loss of function in Salmonella enterica serovar Typhi, which also has undergone considerable loss of function, but which is in many cases due to base substitutions or mutation in homopolynucleotide tracts due to strand slippage (15). A distinctive difference in consequence of IS-targeted inactivation is that the inactivation event is often followed by deletion of neighboring genes, as seen in this study.
A recent study revealed that FlhDC is involved in regulation of nonflagellar systems, including anaerobic respiration and carbon and nitrogen metabolism as a positive regulator (17), so the finding that damage in the flhDC genes is widespread in Shigella strains raises the possibility that inactivation of flhDC genes offers Shigella strains benefits other than the effect on motility. It seems clear that there is still much to be learned regarding the impact of loss of gene function in Shigella.
Nucleotide sequence accession numbers.
The GenBank accession numbers for the nucleotide sequences determined in this study are AB168054 to AB168088.
ACKNOWLEDGMENTS
This research is supported by a grant from the National Health and Medical Research Council of Australia. A.T. was a visiting research fellow at the University of Sydney, supported by Monbukagakusho (Japan).
FOOTNOTES
- Received 17 October 2004.
- Accepted 14 March 2005.
- Copyright © 2005 American Society for Microbiology