Cloning and nucleotide sequence of a gene (ompS) encoding the major outer membrane protein of Legionella pneumophila

The major outer membrane protein of Legionella pneumophila is composed of 28- and 31-kDa subunits cross-linked by interchain disulfide bonds. The oligomer is covalently anchored to the underlying peptidoglycan via the 31-kDa subunit. We have cloned the structural gene ompS encoding both proteins. Oligonucleotide probes synthesized from the codons of the N-terminal amino acid sequence of purified 28- and 31-kDa subunits were used to identify cloned sequences. A 2.9-kb HindIII fragment cloned into pBluescript (clone H151) contained the ompS gene. Nucleotide sequence analysis revealed an open reading frame of 891 bp encoding a polypeptide of 297 amino acids. A leader sequence of 21 amino acids was identified, and the mature protein contained 276 amino acids. The deduced amino acid sequence of OmpS matched the experimentally determined amino acid sequence (32 amino acids), with the exception of two cysteine residues. The deduced amino acid sequence was rich in glycine and aromatic amino acids and contained four cysteine residues, two in the amino terminus and two in the carboxy region. Primer extension analysis (total RNA from L. pneumophila) identified the transcription start at 96 to 98 bp upstream of the translation start, but no Escherichia coli-like promoter sequences were evident. While an mRNA transcript from clone H151 was detected, no cross-reactive protein was detected by immunoblotting with either monoclonal or polyclonal antibody. Attempts to subclone the gene in the absence of the putative promoter region (i.e., under the control of the lac promoter) proved unsuccessful, possibly because of overproduction lethality in E. coli. The ompS DNA sequence was highly conserved among the serogroups of L. pneumophila, and related species also exhibited homology in Southern blot analysis at a moderately high stringency. Evidence is presented to suggest that this gene may be environmentally regulated in L. pneumophila.

Legionella pneumophila and related species are a diverse group of environmental microorganisms capable of producing severe lobar pneumonia in humans. These facultative intracellular parasites can invade and colonize a wide range of eukaryotic host cells, including aquatic amoebae (40,49,51), alveolar macrophages (28), and vertebrate cell lines (13,33,36). The bacteria reside in the phagosomes, in which they abrogate phagolysosomal fusion, a process that does not require de novo protein synthesis (29). These observations suggest that preexisting surface proteins may participate in the pathogenesis process. One surface protein, named Mip (macrophage invasion potentiator), is a 24-kDa protein exhibiting a pI of 9.3 (14). Null mutations in mip result in attenuated virulence for macrophage cell lines and for guinea pigs (10,11). The most abundant surface protein, referred to as the major outer membrane protein (MOMP), is a porin composed of subunits of 28 and 31 kDa (5,6,17,20,21). Studies by Payne and Horwitz (37) have demonstrated that the MOMP porin binds the C3b and C3bi factors of the complement system, which mediate phagocytosis via the macrophage integrin receptors CR1 and CR3. Further, complement enhances the binding of MOMP-containing liposomes to macrophages (3). Studies by Quinn et al. (38) also suggest that the MOMP may be directly involved in attachment in a HeLa cell model and also may be a protective immunogen in the guinea pig model. While human and guinea pig convalescent-phase sera contain little or no antibody to the MOMP (42), guinea pigs mount a significant cellular immune response to this antigen (23). Preliminary reports suggest that this antigen may be protective against lethal challenge in the guinea pig model (48).
The MOMP is novel in that one of the subunits is covalently bound to peptidoglycan via a peptide bond, probably to diaminopimelic acid (5). This subunit is covalently cross-linked to 28-kDa subunits via interchain disulfide bonds (5,27). Progress in assessing the role of this protein in pathogenesis as well as in characterizing the novel structure of this putative porin has been hindered by an inability to clone the structural gene. A number of laboratories, including our laboratory, have been unsuccessful in cloning this gene in expression vectors. The expression of cloned porin genes, particularly from nonenteric pathogens, in Escherichia coli is often inhibitory for growth (2,19). In some cases, gene fusions in vectors such as λgt11 have been used to identify porin sequences in expression libraries (19,45). Such approaches have been used to clone the 40-kDa MOMP gene (omp1L2) from Chlamydia trachomatis (46).
Our approach to cloning the ompS gene involved the use of oligonucleotide probes generated from amino acid sequence information from the 28- and 31-kDa MOMP subunits. We found that both subunits had a common N-terminal amino acid sequence (27). On the basis of this sequence (GTMGPVWT), a series of oligonucleotides was generated, one of which hybridized to a single site in restricted genomic DNA, as judged by Southern blot analysis. In the present study, we report on the use of this oligonucleotide as a probe for identifying clones containing the putative gene coding for the L. pneumophila MOMP subunits. On the basis of the nucleotide sequence and the deduced amino acid sequence, we have confirmed that an open reading frame (ORF) of 891 bp encodes a polypeptide of 297 amino acids. The gene is poorly expressed under its own promoter in E. coli and may be down-regulated during the early stages of invasion.
(A preliminary account of this work was presented at the 1991 American Society for Microbiology general meeting [26].)

MATERIALS AND METHODS

Bacterial strains, protein sequence, and oligonucleotide reagents. A streptomycin-resistant strain (SVir) of L. pneumophila Philadelphia 1 (serogroup 1) was used as the source of the 28- and 31-kDa proteins and chromosomal DNA as described elsewhere (24,27). Other serogroups of L. pneumophila and various Legionella species used in this study included L. pneumophila serogroups 1, 2, 3, 4, 5, and 7; L. micdadei (HEBA and Tatlock); L. jordanis serogroup 1; and L. oakridgensis OR10. Preparation of genomic DNA from these strains has been described previously (24). Purification of the 31- and 28-kDa proteins, peptide maps, and peptide sequencing were as described elsewhere (5,27). Oligonucleotides used in this study were synthesized at the Molecular Resource Center, Department of Microbiology and Immunology, University of Tennessee, Memphis, and at the Molecular Gene Probe Laboratory, Dalhousie University. Two oligonucleotides (24b and H25 [complement]) used as probes in this study had the following respective sequences: 5'-GGTACTATGGGTCCAGTATGGAC-3' and 5'-GTCCATACTGGACCCATAGTACC-3'. These sequences were based on the amino acid sequence GTMGPVWT.
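The relationship between the peptide GTMGPVWT and the two probes can be checked with a few lines of code. The sketch below, which uses a reduced standard-genetic-code table containing only the six amino acids involved (the table and function names are illustrative, not from the original work), computes how many DNA sequences could encode the peptide, translates the complete codons of 24b, and confirms that H25 is the reverse complement of 24b. The degeneracy figure shows the size of the codon space from which a series of candidate oligonucleotides would have to be drawn.

```python
# Sketch: back-translation degeneracy and probe checks for GTMGPVWT.
# Reduced standard genetic code covering only the residues needed here.
SYNONYMOUS_CODONS = {
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "M": ["ATG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
    "W": ["TGG"],
}
# Invert the table: codon -> one-letter amino acid.
CODON_TO_AA = {c: aa for aa, codons in SYNONYMOUS_CODONS.items() for c in codons}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide."""
    n = 1
    for aa in peptide:
        n *= len(SYNONYMOUS_CODONS[aa])
    return n


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse to read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


probe_24b = "GGTACTATGGGTCCAGTATGGAC"
probe_h25 = "GTCCATACTGGACCCATAGTACC"

print(degeneracy("GTMGPVWT"))                      # 4096
print(translate(probe_24b))                        # GTMGPVW
print(reverse_complement(probe_24b) == probe_h25)  # True
```

The 23-mer 24b covers seven complete codons plus the first two bases of the final threonine codon, which is why the translation stops at W.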
Cloning and nucleotide sequencing of the ompS gene. Chromosomal DNA from L. pneumophila restricted with EcoRI, HindIII, or a combination of HindIII and PstI was subjected to electrophoresis in a 0.8% low-melting-temperature agarose gel (SeaPlaque; FMC Bioproducts, Rockland, Maine). DNA fragments in the size ranges previously identified by Southern blotting were excised and ligated into pBluescript (Stratagene) that had been similarly restricted and purified from low-melting-temperature agarose. The resulting colonies were screened with [32P]ATP-end-labeled oligonucleotide H25 (reverse primer) as generally described by Sambrook et al. (41). Restriction enzymes, buffers, and procedures were as described by the various manufacturers. Restricted DNAs from clones H151 (2.9-kb HindIII fragment) and HP246 (900-bp HindIII-PstI fragment) were subcloned into M13 vectors for sequencing. Sequencing was performed by the dideoxy chain termination method with [35S]dATP (NEN-DuPont, Toronto, Ontario, Canada) and Sequenase (U.S. Biochemical Corp., Cleveland, Ohio) as suggested by the manufacturers. Both DNA strands were sequenced, and the sequences were assembled and analyzed with the Wisconsin Genetics Computer Group sequence analysis programs (Genetics Computer Group, Inc., University of Wisconsin Biotechnology Center, Madison) (12).
Southern and Northern (RNA) blot analyses and primer extension. DNA hybridizations were done essentially as described by Southern (44). Chromosomal DNAs from the various Legionella serogroups and species were restricted with EcoRI or HindIII. The DNA probe was radiolabeled by the polymerase chain reaction (PCR). Oligonucleotide primers 24b and R3 (reverse primer hybridizing to an internal region of the ORF) were used to amplify an approximately 500-bp segment of ompS. [32P]dCTP (65 µCi in a 50-µl reaction mixture) was incorporated into the PCR fragments by decreasing the carrier dCTP concentration in the deoxynucleoside triphosphate reaction mixture from 200 to 50 µM. The PCR fragments were purified through Sephadex G-50 as described by Silhavy et al. (43). Approximately 8 × 10⁶ cpm was added to the hybridization mixture following denaturation. Hybridizations and washes were done at moderately high stringency (15% mismatch in the duplex) and at high stringency (<5% mismatch in the duplex) as described previously (24).
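Stringency expressed as "percent mismatch in the duplex" is usually translated into a temperature by estimating the duplex melting temperature (Tm) and lowering it by roughly 1°C per 1% mismatch. The sketch below assumes the widely used Meinkoth-Wahl approximation for long DNA-DNA hybrids; reference 24 may have used a different formula, and the salt, G+C, and formamide values in the example are placeholders rather than the conditions actually used. Only the ~500-bp probe length is taken from the text.

```python
import math


def duplex_tm(na_molar: float, percent_gc: float,
              percent_formamide: float, length_bp: int) -> float:
    """Approximate Tm (deg C) of a long DNA-DNA hybrid (Meinkoth-Wahl form).

    Assumption: this formula is illustrative and is not taken from the paper.
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / length_bp)


def wash_temperature(tm: float, percent_mismatch: float) -> float:
    """Rule of thumb: each 1% mismatch lowers the duplex Tm by about 1 deg C."""
    return tm - percent_mismatch


# Hypothetical hybridization conditions (placeholders, not from the text).
tm = duplex_tm(na_molar=1.0, percent_gc=40, percent_formamide=50, length_bp=500)
moderate = wash_temperature(tm, 15)  # tolerates roughly 15% mismatch
high = wash_temperature(tm, 5)       # tolerates roughly 5% mismatch
print(f"Tm ~ {tm:.1f} C; 15% mismatch ~ {moderate:.1f} C; 5% mismatch ~ {high:.1f} C")
```

The higher-stringency wash sits closer to the Tm, so only near-perfect duplexes survive it, which is the behavior described for the serogroup and species blots.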
Total RNA was prepared by a hot sodium dodecyl sulfate-phenol procedure (27). Primer extension was performed by hybridizing probe H25 to approximately 25 µg of total RNA. The oligonucleotide probe was end labeled with [32P]ATP (125 µCi; NEN-DuPont) and T4 polynucleotide kinase (New England BioLabs, Inc., Beverly, Mass.) as described previously (27). The probes were used either directly or after gel purification on a 20% polyacrylamide-8 M urea gel. The extension reaction was run at 42°C for 1 h in the presence of actinomycin D. Following RNase treatment, phenol extraction, and ethanol precipitation, the cDNA was taken up in loading buffer and 2 to 5 µl was loaded onto the sequencing gel. The extension product was compared with the simultaneously run sequencing reaction products generated from M13 containing a HindIII-PstI fragment encoding the 5' region of the ompS gene and H25 as a primer.
Nucleotide sequence accession number. The nucleotide sequence accession number (GenBank) for ompS is M76178.

RESULTS
Cloning strategy. We have assumed, on the basis of various strategies aimed at cloning the MOMP gene into expression vectors, either that the expression of the gene was toxic or that the gene was not expressed in the E. coli genetic background. By use of a DNA hybridization approach, the requirement for gene expression could be eliminated and, in the case of toxic gene expression, unexpressed fragments of the desired gene might be cloned. In a previous study, we reported that an oligonucleotide synthesized from the N-terminal amino acid sequence GTMGPVWT hybridized to single sites in genomic DNA restricted with various restriction enzymes (27). Since Southern hybridization analysis provided information on the relative sizes of the desired restriction fragments, we first attempted to clone these sized fragments into pBluescript. We were unsuccessful in obtaining clones of a 1.5-kb EcoRI fragment in either pBluescript or M13 phage. Because recombinants were obtained (white versus blue colonies or plaques scored with isopropyl-β-D-thiogalactopyranoside and 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside), this failure did not appear to reflect a technical difficulty. We then screened clones containing DNA fragments obtained by HindIII cleavage and by a combination of HindIII and PstI restriction. Four colonies from a total of 280 white colonies were strongly positive by colony blot hybridization. Restriction endonuclease analysis of cloned sequences showed that two clones contained a 2.9-kb HindIII fragment (H151 and H157) and that two contained a 900-bp HindIII-PstI DNA fragment (HP243 and HP246). Restriction maps for H151 and HP246 are presented in Fig. 1. Interestingly, the 1.5-kb EcoRI fragment was contained within the 2.9-kb HindIII fragment. Attempts to subclone the EcoRI fragment proved unsuccessful, suggesting that an ORF, designated ompS, may lie within the EcoRI restriction fragment and that the expression of this gene may produce a toxic product in E. coli.
Screening of the E. coli clones for antigen expression with monoclonal and polyclonal antibodies reactive with both the 28- and the 31-kDa subunits was also negative (data not shown). The possibility that a transcriptional defect might account for the lack of expression was examined by Northern blot analysis of clone H151. A low level of transcription was observed for H151 probed with an internal PCR-generated DNA fragment of the ompS gene (Fig. 2). The molecular size of the H151 mRNA was similar to that detected in total RNA preparations from L. pneumophila SVir and from an isogenic protease-deficient strain, PRT8. These results suggest that the ompS promoter region is poorly recognized in E. coli and that a translational defect might also account for the lack of detectable MOMP.

FIG. 1. Restriction endonuclease maps of H151 and HP246 in pBluescript. A 2.9-kb HindIII fragment was cloned into pBluescript. The ORF was localized to the 1.5-kb EcoRI region. Abbreviations: H, HindIII; P, PstI; E, EcoRI. The map distances are given in base pairs. The lac promoter is in the left arm of pBluescript and would permit the transcription of ompS.
Sequence analysis. The nucleotide sequence of the structural gene ompS and flanking sequences, together with the deduced amino acid sequence, is shown in Fig. 3. The ORF begins approximately 800 bp from the leftward HindIII site (Fig. 1) and 270 bp into the depicted sequence. A putative ribosome-binding site (TGGAG) is located 9 bases upstream of the initiation codon. The 891-bp ORF encodes a polypeptide of 297 amino acids, of which the first 21 are presumably involved in protein export, since the processed protein begins at amino acid 22 (PheAla/GlyThrMet ...; the slash indicates the processing site). With the exception of two cysteine residues, there was good agreement between the deduced and experimentally determined amino acid sequences (32 amino acids), as indicated by the underlined N-terminal region of the sequence in Fig. 3. The N-terminal amino acid sequence of a 19-kDa peptide generated by cyanogen bromide cleavage of the 31-kDa protein was also located in the deduced sequence; however, the differences noted between this sequence and the deduced sequence are most likely due to the poor quality of the amino acid sequence obtained for this peptide. Downstream of the termination codon (TAA) is a palindromic sequence resembling a rho-independent transcription terminator. No obvious E. coli-like promoter sequence was identified upstream of the translation start. The possibility that the gene was part of an operon was resolved by primer extension analysis, which revealed that transcription begins within a CCC sequence (bp 172 to 174 in Fig. 3) that is flanked on both sides by AT-rich sequences and lies 90 to 100 bp upstream of the translation start (Fig. 4).

FIG. 2. The mRNA was probed with a PCR-generated 32P-labeled DNA fragment corresponding to an internal region of the ompS gene (oligonucleotide primers 24b and R3). The hybridization noted in lane B at a high molecular weight probably represents contaminating plasmid sequences of H151.
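The coordinates given above are internally consistent, and the arithmetic can be made explicit. The sketch below uses only numbers stated in the text: the 891-bp ORF (counted, as in the text, as 297 codons; the text does not say whether the stop codon is included), the 21-residue leader, the translation start at about position 270 of Fig. 3, and the CCC at positions 172 to 174. The variable names are illustrative.

```python
def codon_count(orf_bp: int) -> int:
    """Number of codons in an ORF whose length is a multiple of 3."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3


ORF_BP = 891                     # ORF length from the text
LEADER_AA = 21                   # signal sequence length from the text
ATG_POSITION = 270               # translation start in Fig. 3 numbering (approximate)
TSS_POSITIONS = (172, 173, 174)  # the CCC within which transcription begins

precursor_aa = codon_count(ORF_BP)
mature_aa = precursor_aa - LEADER_AA
# Distance from each candidate transcription start to the translation start.
leader_distances = [ATG_POSITION - tss for tss in TSS_POSITIONS]

print(precursor_aa, mature_aa, leader_distances)  # 297 276 [98, 97, 96]
```

The 96- to 98-bp untranslated leader obtained this way matches the range given in the abstract and falls within the 90- to 100-bp estimate from the primer extension gel.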
In a previous study, we predicted that the MOMP contained three cysteine residues on the basis of amino acid composition analysis (5). The deduced amino acid sequence, however, contains four cysteine residues: two in the N-terminal region (amino acids 7 and 16 of the processed polypeptide) and two in the carboxyl region (amino acids 194 and 197). The polypeptide also contains seven methionine residues rather than the three previously reported on the basis of amino acid composition analysis. The protein is rich in glycine (11%) and in the aromatic amino acids phenylalanine (7.4%), tyrosine (6%), and tryptophan (3.4%). The processed polypeptide has an acidic pI of 4.59, and its amino acid composition, hydrophilicity characteristics, and lack of long stretches of alpha-helical or beta-sheet sequence are typical of porin proteins. A search of the protein database revealed that OmpS was not closely related to other porin proteins, with the exception of a 69-kDa outer membrane protein of Bordetella pertussis, which exhibited 45% similarity and 21% identity (8). The sequence also exhibited 22% identity with a repeat in the CR1 or CR3 integrin, a complement receptor of macrophages (30).
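Composition percentages and an isoelectric point like those quoted above are typically computed from the deduced sequence by sequence-analysis software (here, the GCG package). The sketch below shows one common approach: it sums Henderson-Hasselbalch charges for the termini and ionizable side chains and finds the pH of zero net charge by bisection. The pKa table is an assumption (EMBOSS-style values); other tables give somewhat different pI values, and cysteines engaged in disulfide bonds would not ionize. Because the OmpS sequence itself is not reproduced in this text, the example runs on toy peptides only.

```python
# Assumed pKa values (EMBOSS-style); not necessarily those used by GCG.
PKA = {
    "Nterm": 8.6, "Cterm": 3.6,
    "K": 10.8, "R": 12.5, "H": 6.5,          # positively charged when protonated
    "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1,  # negatively charged when deprotonated
}
POSITIVE = ("K", "R", "H")
NEGATIVE = ("D", "E", "C", "Y")


def composition(seq: str) -> dict:
    """Percentage of each residue present in the sequence."""
    return {aa: 100.0 * seq.count(aa) / len(seq) for aa in sorted(set(seq))}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge at a given pH."""
    pos = 1.0 / (1.0 + 10 ** (ph - PKA["Nterm"]))
    neg = 1.0 / (1.0 + 10 ** (PKA["Cterm"] - ph))
    for aa in seq:
        if aa in POSITIVE:
            pos += 1.0 / (1.0 + 10 ** (ph - PKA[aa]))
        elif aa in NEGATIVE:
            neg += 1.0 / (1.0 + 10 ** (PKA[aa] - ph))
    return pos - neg


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisection: net charge falls monotonically with pH, so find its zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid   # still positive: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(composition("GGAW"))
print(round(isoelectric_point("GG"), 2))     # termini only
print(round(isoelectric_point("GDDDG"), 2))  # acidic toy peptide
```

An excess of aspartate and glutamate over lysine and arginine pulls the computed pI below the termini-only value, which is the direction of the acidic pI reported for processed OmpS.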
Distribution of ompS in legionellae. Previous studies in our laboratory have demonstrated that several Legionella species express disulfide-cross-linked and peptidoglycan-bound outer membrane proteins (5). Furthermore, on the basis of immunoblot studies with polyclonal anti-Legionella MOMP serum, we reported that the MOMPs of the various species may share common epitopes (6). To address the possibility that the genes encoding these proteins from the various species might be genetically related, we probed chromosomal digests from selected Legionella species and serogroups with an internal PCR-radiolabeled fragment of the ompS gene. Figure 5A depicts the results of Southern hybridization analysis performed at a moderate stringency (15% mismatch in the duplex). At this stringency, all Legionella species examined exhibited related sequences. The weakest signal was noted for L. jordanis, which does not express a 28-kDa MOMP (5). The strongest signals were observed for the serogroups of L. pneumophila. However, multiple bands were seen with the serogroups, suggesting that there might be either related genes encoding outer membrane proteins or possibly cryptic genes. At higher stringency for genomic DNA restricted with HindIII (5% mismatch in the duplex), the hybridizations noted with the other Legionella species disappeared (Fig. 5B), along with the high signal intensity of the multiple hybridizations noted for the serogroups. Southern blot analysis of L. pneumophila serogroup 1 DNA restricted with HindIII and EcoRI identified a single fragment of 1.5 kb, confirming that ompS exists in a single copy (data not shown).

DISCUSSION
We have cloned and sequenced the structural gene encoding the 28- and 31-kDa subunits of the L. pneumophila MOMP oligomer. The MOMP is the most abundant protein synthesized by L. pneumophila, and recent studies have implicated this protein in pathogenesis (3,37,38) and immunity (23,48). DNA sequence analysis revealed an 891-bp ORF encoding a polypeptide of 297 amino acids. The polypeptide contained a 21-amino-acid signal sequence, and the mature protein contained 276 amino acids. The deduced amino acid sequence of the MOMP exhibited four cysteine residues and in general was rich in glycine and aromatic amino acids. With the exception of cysteine residues at positions 7 and 16 of the processed polypeptide, there was good correlation between the deduced amino acid sequence and that obtained by direct sequencing of the purified MOMP subunits (27). In this study, we confirmed the results of earlier work regarding the number of cysteine residues in the monomers. Two cysteine residues were found in the amino terminus, while two were found in the carboxyl region of the molecule. A search of nucleic acid and protein databases revealed no proteins with strong amino acid sequence homology to OmpS. There was no homology with the Chlamydia sp. 40-kDa MOMP, which contains nine cysteine residues and participates in inter- and intramolecular disulfide bonding (46). Interestingly, some homology was noted with the 69-kDa outer membrane protein of B. pertussis and with the macrophage integrin protein CR1, a receptor for complement (3,37). The homology with CR1 appeared to lie within a variable-repeat region (34).

FIG. 4. The primer was extended with 50 U of avian myeloblastosis virus reverse transcriptase, and the cDNA was suspended in sample buffer and resolved on a polyacrylamide gel beside a sequencing ladder generated from the same oligonucleotide primer with a HindIII-PstI DNA fragment in M13 as the template. The transcription start is located in a CCC sequence flanked by AT-rich sequences.
While the observed similarities might be merely coincidence, it should be noted here that while C3 components of complement bind to the L. pneumophila MOMP, the specific binding site has not been identified.
Disulfide-cross-linked and peptidoglycan-bound outer membrane proteins are a characteristic feature of many members of the genus Legionella (5). A commercial monoclonal antibody diagnostic test for legionellosis recognizes an epitope on the MOMPs shared by all serogroups of L. pneumophila (18). Moreover, a study by Butler et al. (6) suggested that the MOMPs of other Legionella species might also contain genus-common epitopes. Southern blot analysis confirmed that all of the serogroups examined in this study contained highly conserved sequences, although some restriction polymorphism was evident. Under moderate-stringency conditions, DNA-DNA hybridization was also noted with several other Legionella species. A number of other genes that contain genus-common sequences have been described for the legionellae; these include mip (14), htpAB (25,38), and a gene encoding a 19-kDa peptidoglycan-associated protein (15,22,32). In contrast, the gene encoding the cytotoxic metalloprotease is found only in the species L. pneumophila (39).

FIG. 5. In panel A, the hybridization was done at a decreased stringency (15% mismatch in the DNA duplex) to detect relatedness among the different species. The stringency conditions for the hybridization depicted in panel B were set at a 5% mismatch in the DNA duplex. The DNA probe used in these experiments was generated by PCR with oligonucleotides 24b and R3 as primers, and the amplified PCR fragment was radiolabeled as described in the text.
Since the MOMP is the most abundant protein synthesized by L. pneumophila, information regarding the expression of the gene is of particular interest. Although primer extension analysis identified the transcription start at 97 bp upstream of the translation start, no E. coli-like promoter sequences were seen in the -10 (TAATAAAAT) and -35 (TCAATGAG) regions. These promoter sequences differ from those noted for other cloned L. pneumophila genes expressed in E. coli, including mip (14), htpAB (25), the protease gene (4), and recA (52). An unusual tandem promoter sequence has been reported for the omp1L2 gene of C. trachomatis (47). The omp1L2 gene is developmentally regulated, and its product, the MOMP, is the most abundant protein synthesized by C. trachomatis. In contrast, Northern blot analysis of L. pneumophila RNA revealed a single transcript of approximately 1 kb (27), consistent with a single promoter and a monocistronic gene structure. Interestingly, secondary-structure predictions for the mRNA showed a substantial capacity of the 5' region to form loops, which may function to stabilize the message in L. pneumophila but possibly affect translation in E. coli (9). The ompS gene, like the chlamydial omp1L2 gene, may be environmentally regulated. We have observed that little radiolabel is incorporated into the 28-kDa MOMP subunit of virulent L. pneumophila cells during the early stages of invasion of HeLa cell or L-cell monolayers (1 to 3 h postinfection) (16). In contrast, radiolabel is incorporated into stress proteins, a phenomenon that can also be reproduced in tissue culture medium in the absence of host cells (25). When mRNA levels were monitored, the ompS message was decreased relative to the htpAB message. The observation that avirulent isogenic strains of L. pneumophila neither regulate MOMP levels nor show decreased mRNA levels when cells are placed in tissue culture medium implies that these cells no longer sense changes in the environment or perhaps no longer produce the necessary regulatory factors. Environmentally regulated transcription has been well characterized for the ompF-ompC porin genes of E. coli (34) and for the vir (bvg) genes encoding many of the membrane-associated proteins of B. pertussis (1). We are presently constructing gene fusions to begin addressing the regulatory aspects of ompS gene expression as well as putative regulatory differences between virulent and avirulent isogenic strains.
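One crude way to compare the upstream regions cited above with E. coli promoters is to slide the sigma-70 consensus hexamers (TTGACA at -35, TATAAT at -10) along each region and record the closest match by Hamming distance. The sketch below does only that: it ignores the length of the spacer between the two boxes and any sequence outside the quoted regions, so mismatch counts alone say nothing definitive about promoter activity. Function names are illustrative.

```python
SIGMA70_MINUS10 = "TATAAT"  # E. coli sigma-70 consensus, -10 box
SIGMA70_MINUS35 = "TTGACA"  # E. coli sigma-70 consensus, -35 box


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_window(region: str, consensus: str) -> tuple:
    """Closest consensus-length window in the region: (mismatches, window)."""
    k = len(consensus)
    windows = [region[i:i + k] for i in range(len(region) - k + 1)]
    best = min(windows, key=lambda w: hamming(w, consensus))
    return hamming(best, consensus), best


# Regions quoted in the text for ompS.
print(best_window("TAATAAAAT", SIGMA70_MINUS10))
print(best_window("TCAATGAG", SIGMA70_MINUS35))
```

A fuller comparison would score both boxes jointly and penalize spacer lengths that depart from the usual 16 to 18 bp.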
The L. pneumophila MOMP (putative porin) (17) differs from the porins of other bacteria in that it is covalently bound to peptidoglycan and its subunits are cross-linked via interchain disulfide bonds (5,27). On the basis of the deduced amino acid sequence, we found two cysteine residues in the N-terminal amino acid sequence (positions 7 and 16 of the processed polypeptide) and two at positions 194 and 197 in the carboxyl region. The first two cysteine residues are found in an amino acid sequence of high β-sheet-forming potential, suggesting that at least one of the cysteine residues may be within the outer membrane lipid bilayer, while the other might be either external or near the outer surface. These cysteine residues are separated by an essentially hydrophobic stretch of eight amino acids, and the cysteine residue at position 16 is followed by the charged amino acids glutamate and arginine. Cysteine residue 194 is preceded by the charged amino acids aspartate and asparagine and separated from cysteine residue 197 by two hydrophobic amino acids. While all cysteine residues could potentially participate in interchain disulfide bonding among subunits, the minimum number of bonds in which each subunit could participate and still maintain the trimeric form would be two. It is conceivable that subtle changes in the conformation of the putative porin might result from different combinations of disulfide bonding among the cysteine residues. Such combinations might also be envisioned to change the pore size or other physical characteristics of the molecule. Variable pore sizes noted for Pseudomonas aeruginosa OprF have been attributed to different patterns of disulfide bond cross-linkages (35). A prominent difference between virulent and avirulent isogenic strains of L. pneumophila is in the tolerance of sodium chloride (7).
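The residue spacings cited above follow directly from the cysteine positions in the processed polypeptide, and adding the 21-residue leader gives the corresponding positions in the 297-residue precursor. The small sketch below makes that bookkeeping explicit; it uses only positions stated in the text, and the helper names are illustrative.

```python
LEADER_AA = 21                  # signal sequence removed during export
MATURE_CYS = [7, 16, 194, 197]  # cysteine positions in the processed protein


def to_precursor(mature_pos: int, leader: int = LEADER_AA) -> int:
    """Convert a mature-protein position to its position in the precursor."""
    return mature_pos + leader


def residues_between(a: int, b: int) -> int:
    """Count residues strictly between positions a and b (a < b)."""
    return b - a - 1


precursor_cys = [to_precursor(p) for p in MATURE_CYS]
gaps = [residues_between(a, b) for a, b in zip(MATURE_CYS, MATURE_CYS[1:])]
print(precursor_cys)  # [28, 37, 215, 218]
print(gaps)           # [8, 177, 2]
```

The first and last gaps correspond to the eight-residue hydrophobic stretch between Cys-7 and Cys-16 and the two residues separating Cys-194 from Cys-197.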
While the porin molecule has been shown to exhibit an anion preference (17), changes in conformation could affect the magnitude of this selectivity. It is perhaps noteworthy that the amino-terminal cysteine residues are in a region substantially rich in proline. Recently, the Mip protein has been shown to be highly homologous with a class of proteins known as peptidylprolyl cis-trans isomerases (50). These enzymes are capable of changing proline residues from one isomer to the other (31). Such changes might affect the secondary and perhaps, in the case of interchain disulfide bonds, the quaternary structures of the porin. The role of chaperone proteins in porin assembly has only recently been addressed (25), and continued study of these mechanisms might lead to new insight into assembly processes external to the cytoplasmic membrane. It will be important in future studies to address the possibility that conformational changes, perhaps mediated by the mechanisms described here, might be partially responsible for the phenomenon of avirulence acquired through the selection for high salt tolerance under laboratory conditions.