Sequence of the flaA (cheC) locus of Escherichia coli and discovery of a new gene

The flaA (cheC) locus from Escherichia coli is important in controlling the rotational direction of flagella during chemotaxis. The locus was sequenced, and a site of transcriptional initiation was determined. Two reading frames, flaAI and flaAII, span the locus. flaAII corresponds to certain flaA and cheC mutations, and has some unusual features in the predicted secondary structure. flaAI, however, has not been identified previously, but a flaAI deletion, which produced a truncated FlaAI peptide in minicells, clearly identified the FlaAI protein.

Mutations in the flaA (cheC) locus exhibit a number of phenotypes. Some E. coli and Salmonella typhimurium mutants, including null mutants, lack flagella (17,35,36), while others exhibit paralyzed flagella (17,34,36), suggesting that flaA has roles in both flagellar structure and motility. There are also chemotaxis mutants (designated cheC) with strong biases toward smooth swimming or tumbling (3,17,25,28,36,37), including a mutant with inverted response to attractants and repellents (12,31). Although FlaA is not an integral membrane protein (7), the phenotype of flaA mutants is retained in spheroplasts which lack cytoplasmic material (29), suggesting that FlaA is membrane-associated. The cumulative evidence suggests that FlaA protein is part of a molecular switch at the motor controlling flagellar rotation during chemotaxis (12,27,28,31).
Recently, the flaA locus from E. coli has been subcloned, and the effects of its overproduction on chemotaxis have been studied (7). Deletion mutants and two-dimensional gels of cheC mutants correlate flaA with a 37-kilodalton (kDa) protein. Overproduction of this 37-kDa protein affects the motility of cells by slowing the rate of swimming and reducing the frequency of tumbling.
We have sequenced the flaA locus from E. coli and determined that two open reading frames, designated flaAI and flaAII, span the region. flaAI has not been identified previously, whereas flaAII corresponds to certain flaA and cheC mutations. Evidence is presented that flaAI is expressed in minicells.

MATERIALS AND METHODS
Bacterial strains. Only E. coli K-12 strains were used in this study. RP437 [F- thr-1(Am) leuB6 his-4 metF159(Am) eda-50 rpsL136 (thi-1 ara-14 lacY1 mtl-1 xyl-5 fhuA31 tsx-78)], which is wild type for chemotaxis, was from J. S. Parkinson (26).
Both chemical cleavage and dideoxy chain termination (32) were used to determine the sequence of both DNA strands from plasmid pCK210 (7). Chemical cleavage was used to determine the HindIII-BamHI sequence on both strands. The rest of the insert was sequenced by shotgun subcloning into M13mp8 (24) and by using [35S]dATPαS and dideoxynucleotides as described previously (5).
Sequences were analyzed by programs supplied by H. Martinez, University of California, San Francisco. Hydropathy was calculated by the algorithm of Kyte and Doolittle (15). Secondary structure was predicted by the algorithm of Chou and Fasman (6), using the updated data set (2).
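The hydropathy calculation can be illustrated with a minimal sketch, which is not the analysis program used in this study. It assumes the standard Kyte-Doolittle hydropathy index, the seven-residue averaging window and the 1.6 threshold for strongly hydrophobic regions given in the legend to Fig. 3. The function names and the example peptide are hypothetical; the peptide is not taken from FlaAI or FlaAII.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average hydropathy over the whole sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return sum(scores) / len(scores)


def hydropathy_profile(seq, window=7):
    """Return (center_position, window_mean) pairs; positions are 1-based."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        window_mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, window_mean))
    return profile


def hydrophobic_centers(seq, window=7, threshold=1.6):
    """Window centers whose mean hydropathy reaches the threshold."""
    return [pos for pos, value in hydropathy_profile(seq, window)
            if value >= threshold]


if __name__ == "__main__":
    peptide = "MKLLIVALLAGDDKRESTPQ"  # hypothetical example peptide
    print("mean hydropathy:", round(mean_hydropathy(peptide), 2))
    print("strongly hydrophobic window centers:", hydrophobic_centers(peptide))
```

Comparing the whole-sequence mean with the -0.26 average for globular proteins, and scanning for window means above 1.6, corresponds to the two criteria applied to FlaAI and FlaAII in Results.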
Preparative techniques. Restriction enzymes were used as recommended by the supplier. DNA manipulations were generally carried out as described by Maniatis et al. (19). Minicells were purified and labeled as described previously (22), except that L-[35S]methionine, at 50 μCi/ml in Difco methionine assay medium, was used to label minicells at an optical density (650 nm) of 1. Protein gels were done by the methods of Laemmli (16) and Ames (1). Molecular weight calibrations for protein gels were performed with the Pharmacia calibration kit.
S1 nuclease mapping. RNA minilysates were prepared from mid-log-phase cells grown at 30°C by a modification of the lysozyme-lysis procedure of Davis et al. (8) in which the diethyl oxydiformate and RNase A treatments were omitted. RNA content was estimated spectrophotometrically, with the bulk of the minilysate nucleic acid assumed to be RNA. By standard methodology (19), the HindIII-BamHI probe from pCK210 was 5' labeled with T4 polynucleotide kinase, and strands were separated on a native 5% polyacrylamide gel. Strands were identified by Maxam-Gilbert sequencing (21). S1 nuclease protection mapping was done by the procedure of Gilman and Chamberlin (9). All fragments were run on an 8% polyacrylamide gel containing 8.3 M urea.

Fig. 1. The sequence of the flaA (cheC) locus was determined by using both Maxam-Gilbert and Sanger dideoxy sequencing on both strands, as described in Materials and Methods and in the legend to Fig. 2. Two open reading frames span the insert, and the predicted amino acid sequences are shown. The site of transcriptional initiation, labeled as +1, was determined by S1 nuclease digestion as shown in Fig. 4. A portion of λ sequence from the original λfla36Δ7 clone is also indicated.

RESULTS
Sequence of the flaA locus. Previously, Clegg and Koshland subcloned the Escherichia coli flaA (cheC) locus from a λ clone into pBR322, constructing pCK210 (7). The DNA sequence of this flaA subclone is shown in Fig. 1, and the sequencing strategy is shown in Fig. 2. Two open reading frames, flaAI and flaAII, span the region and encode polypeptides with predicted molecular masses of 17 and 38 kDa, respectively. Mutants in flaAI have yet to be identified, but there already exists evidence that flaAII corresponds to flaA and cheC. Previous deletion studies by Clegg and Koshland correlate FlaA with a 37-kDa protein that is truncated by a deletion spanning the PvuII sites of pCK210 (7). Since the flaAII gene should encode a 38-kDa polypeptide, and since it spans a PvuII site, flaAII corresponds to the gene identified by certain flaA and cheC mutants. Immediately following flaAII on pCK210 is a portion of λ sequence from the original lambda subclone, λfla36Δ7 (28), which confirms previous restriction mapping of λfla36 deletions (7).
The predicted polypeptides were analyzed for hydropathy (Fig. 3). FlaAI had a strongly hydrophobic amino-terminal region and an overall hydropathy that was higher than average (0.03 versus -0.26 for globular proteins), suggesting that it is membrane associated. Whether or not FlaAI inserts into the inner membrane has yet to be determined. FlaAII, on the other hand, exhibited an average overall hydropathy.
It had two strongly hydrophilic regions and only one strongly hydrophobic region. Secondary structure predictions were performed by using the algorithm of Chou and Fasman. Two notable regions of FlaAII are predicted to be α-helical and are included in Fig. 3. These regions are very amphiphilic, which strengthens the Chou-Fasman prediction of α-helicity for these two regions (11,33). The amphiphilicity of the second helical region, Asp-248 through Leu-276, is particularly striking, since most of the hydrophobic residues are arranged in a strip along one side of the helix and there are strong possibilities of salt bridges between ionized residues on adjacent turns of the helix. If this region is a single, continuous α-helix, the sequence Asp-248 through Leu-276 is predicted to produce a [...]
[...] (7), this fragment was used to map the 5' end of the transcript by protection from S1 nuclease digestion. Transcripts were isolated from bacteria transformed with pCK210. The protected fragment comigrated with a sequencing ladder fragment 80 nucleotides in length (Fig. 4). Assuming the protected fragment is 80 nucleotides long, transcription starts at an adenine 31 nucleotides upstream of flaAI, which is identified as +1 in Fig. 1. The region 10 nucleotides upstream of this adenine also provides the best match to the consensus sequence for RNA polymerase-binding sites (30). The -35 region showed a poorer match to the consensus sequence.
Since transcription is consistent with flaAI expression, we attempted to identify the 17-kDa FlaAI polypeptide. Because of the high background labeling in bands of this molecular mass range, Clegg and Koshland (7) could not clearly observe a polypeptide from flaAI in minicells transformed with pCK210. To eliminate such ambiguity, a truncated version of FlaAI was constructed by a 96-base-pair in-frame deletion fusing Ile-16 to Arg-49 (Fig. 5).
Protein expression from both the wild-type flaAI gene and the flaAI deletion was examined in minicells produced by aberrant division of minB mutants (22). Expressed proteins were labeled with [35S]methionine and resolved on polyacrylamide gels (Fig. 5). Both plasmids express the FlaAII polypeptide at 37 kDa, as reported previously (7). Wild-type flaAI expressed a polypeptide with an apparent molecular mass of 18.3 kDa rather than the predicted 17 kDa. Such discrepancies between predicted and apparent molecular masses are not uncommon and have been observed for other proteins (e.g., reference 10). [...] polypeptide was observed, clearly identifying the FlaAI polypeptide. The difference of 3.3 kDa in apparent molecular weights is consistent with a loss of 32 amino acids (96-base-pair deletion).

Fig. 3. The hydropathy of the two reading frames in the flaA locus was determined by the algorithm of Kyte and Doolittle (15) with an averaging window of seven amino acids. The position numbers of amino acids at the center of each averaging window are given on the abscissas. Baseline is at -0.26, the average hydropathy of globular proteins, and the dashed line at 1.6 is the threshold for strongly hydrophobic regions. The secondary structures predicted by the algorithm of Chou and Fasman (2, 6) are shown below the hydropathy plots. Sequences Val-47 through Arg-75 and Asp-248 through Leu-276 are predicted to be α-helical.

DISCUSSION
Sequence determination of the flaA (cheC) locus revealed two open reading frames which span the locus. The second reading frame, flaAII, was identified previously by physical mapping as the coding sequence for the 37-kDa FlaA protein. A previously unknown gene, designated flaAI, encodes a 17-kDa polypeptide and precedes flaAII. The discovery of flaAI has also been made by J. Malakooti and P. Matsumura (personal communication).
Since flaAI was missed in previous genetic studies, it was important to show that flaAI was actually transcribed and expressed. S1 nuclease digestion identified the site of transcriptional initiation that is consistent with complete transcription of flaAI. Furthermore, the -10 region matches very well with the bacterial consensus promoter sequence (30), but the -35 region is not as clear a match. Protein gels of minicells containing flaAI plasmids showed a polypeptide with an apparent molecular weight of 18.3 kDa that became 3.3 kDa smaller when a 96-base-pair in-frame deletion was constructed in flaAI. This result unambiguously identifies the FlaAI polypeptide.
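The arithmetic behind the deletion argument can be checked with a short sketch. The residue numbers (Ile-16 fused to Arg-49) and the 96-base-pair deletion are from the text. The average residue mass of 110 Da is an assumption introduced here as a rule of thumb, not a value from this study, and the function names are hypothetical.

```python
# Assumption: roughly 110 Da per amino acid residue (rule of thumb).
AVG_RESIDUE_MASS_DA = 110.0


def residues_removed(last_kept_before, first_kept_after):
    """Number of residues lost when two residues are fused directly."""
    return first_kept_after - last_kept_before - 1


def is_in_frame(deleted_bp):
    """A deletion keeps the reading frame if it removes whole codons."""
    return deleted_bp % 3 == 0


def expected_mass_loss_kda(n_residues, avg_mass_da=AVG_RESIDUE_MASS_DA):
    """Approximate mass removed, in kilodaltons."""
    return n_residues * avg_mass_da / 1000.0


if __name__ == "__main__":
    lost = residues_removed(16, 49)   # Ile-16 fused to Arg-49
    print("residues removed:", lost)
    print("base pairs removed:", lost * 3)
    print("in frame:", is_in_frame(lost * 3))
    print("approximate mass loss (kDa):", expected_mass_loss_kda(lost))
```

Under the 110-Da assumption the expected loss is about 3.5 kDa, which is close to the 3.3-kDa shift in apparent molecular mass observed on the gels.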
In light of the discovery of a new gene in the flaA locus, the designation of cheC mutations and their interpretations must be reexamined. Both smooth swimming and tumbling phenotypes in E. coli (3,25,28) and S. typhimurium (17,36,37), as well as an inverted response phenotype (12,31), have been observed for cheC. Since these genetic studies were conducted before the existence of flaAI was known, there is some doubt now whether all of these phenotypes are due to lesions in a single gene. To date, there is no evidence that a gene corresponding to E. coli flaAI exists in S. typhimurium.
In the only work correlating FlaA protein with cheC, flaA locus clones were generated by homologous recombination (7) of λfla36 (13) with two E. coli flaA (cheC) mutants. The mutant loci were further subcloned into pBR322 and examined for protein expression on two-dimensional gels. The smooth swimming cheC497 allele from AW674 (3) is clearly associated with flaAII because the mutation alters the pI of its 37-kDa polypeptide. However, assigning the tumbling allele of cheC from AW675 (14) to flaAII is less clear cut, as no change in pI for the 37-kDa protein was observed in this strain compared with its parent. Until further studies are done on AW675, the possibility of a dominant tumbling mutation outside flaAII still remains.

Fig. 4. S1 nuclease mapping of the start of transcription. Probe is the 307-nucleotide BamHI-HindIII fragment of the promoter region of the flaA locus labeled at the 5' end of the BamHI site. RNA was isolated from RP437 (wild-type chemotaxis) transformed with pCK210. Lanes 1 and 2, Sequencing ladder generated by Maxam-Gilbert chemical cleavage (A+G and C+T, respectively) of the probe region labeled 3' at the BamHI site with the Klenow fragment of DNA polymerase. Note that the Maxam-Gilbert fragments are one nucleotide out of register from enzymatically produced fragments. Lane 3, S1 nuclease digestion of probe alone (no RNA added). Lane 4, Full-length probe (no S1 nuclease digestion). Lanes 5 through 7, RNA-protected fragments after increasing S1 nuclease digestion at 37°C for 30 min with 120, 200, and 580 U (P-L Biochemicals units), respectively.

Chou-Fasman analysis of the FlaAII sequence produces predictions of α-helicity at Val-47 through Arg-75 and Asp-248 through Leu-276. The plausibility of the prediction for the second region is greatly enhanced by both amphiphilicity and possibilities of salt bridges. Recent genetic studies suggest a particular function for these two α-helical regions. Yamaguchi et al. have isolated several S. typhimurium flaQ (cheC) mutants and used deletions to map the sites of mutations (36). Of [...] all 15 of the clockwise (tumbling bias) mutations map to two segments, 5 and 18. Assuming equal segment lengths (see reference 36) and a one-to-one correspondence between the E. coli flaAII and S. typhimurium flaQ genes, these two segments correspond exactly to portions of the two α-helical regions predicted in flaAII. Of the five counterclockwise (smooth-swimming) cheC mutants, only one mutation maps to a segment corresponding to the first α-helical region of FlaAII. Clearly, the sequences of these S. typhimurium mutants will aid the exploration of this intriguing correlation between these predicted α-helical regions and clockwise rotation.
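The two structural arguments made for the Asp-248 through Leu-276 region, a hydrophobic strip along one side of the helix and possible salt bridges between adjacent turns, can be illustrated with a minimal sketch. It is not the analysis performed in this study. It assumes an ideal α-helix with 100° of rotation per residue, a simple hydrophobic/charged classification of residues, and spacings of three or four residues for side chains on adjacent turns. The function names and test peptides are hypothetical and are not FlaAII sequence.

```python
import math

# Assumption: a simple residue classification chosen for illustration.
HYDROPHOBIC = set("AILMFVWC")
ACIDIC = set("DE")
BASIC = set("KR")


def hydrophobic_face_strength(seq, degrees_per_residue=100.0):
    """Length (0 to 1) of the mean direction vector of hydrophobic
    residues on a helical wheel; values near 1 mean one-sided clustering."""
    angles = [math.radians(i * degrees_per_residue)
              for i, aa in enumerate(seq.upper()) if aa in HYDROPHOBIC]
    if not angles:
        return 0.0
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(x, y)


def salt_bridge_candidates(seq, spacings=(3, 4)):
    """Oppositely charged residue pairs spaced about one helical turn apart.
    Returns (position_i, residue_i, position_j, residue_j), 1-based."""
    seq = seq.upper()
    pairs = []
    for i, a in enumerate(seq):
        for k in spacings:
            j = i + k
            if j >= len(seq):
                continue
            b = seq[j]
            if (a in ACIDIC and b in BASIC) or (a in BASIC and b in ACIDIC):
                pairs.append((i + 1, a, j + 1, b))
    return pairs


if __name__ == "__main__":
    peptide = "LEELLKKLLEELLKKL"  # hypothetical designed helix
    print("hydrophobic face strength:",
          round(hydrophobic_face_strength(peptide), 2))
    print("salt bridge candidates:", salt_bridge_candidates(peptide))
```

A high face strength together with several candidate pairs would be consistent with the kind of amphiphilic, salt-bridge-stabilized helix proposed for this region, although it would not demonstrate that the helix forms.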