Adriana Oliveira Stahl,² Sonja-V. Albers,¹ Jessica C. Kissinger,² Arnold J. M. Driessen,¹ and Mechthild Pohlschröder³*
¹Department of Molecular Microbiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands
²Center for Tropical and Emerging Global Diseases and Department of Genetics, University of Georgia, C210 Life Sciences, Athens, Georgia 30602-7223
³Biology Department, University of Pennsylvania, 415 University Avenue, Philadelphia, Pennsylvania 19104
Received 4 October 2006/Accepted 8 November 2006
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
Interestingly, analysis of the predicted Sulfolobus solfataricus secretome revealed that certain membrane-bound substrate binding proteins (SBPs) of this crenarchaeon are also synthesized as preproteins with class III signal peptides (2). Consistent with this observation, the S. solfataricus prepilin peptidase homolog (PibD) could cleave both the flagellin subunit and the precursor of the glucose binding protein (4). While the biological roles of class III signal peptides associated with binding proteins are still unclear, it has been proposed that, similar to archaeal flagellins, these proteins also assemble into a cell surface structure (bindosome) upon secretion and signal peptide cleavage (1, 5). A function of the bindosome might be to locally increase the concentration of sugars for more efficient transport into the cell (5). Proteins with putative class III signal peptides were also observed in the Natronomonas pharaonis and Thermoplasma volcanium genomes (6, 17). Because class III signal peptides have thus far been shown to be associated only with subunits of cell surface structures (e.g., bacterial pili and archaeal flagella), the identification of archaeal nonflagellin proteins carrying them suggests that archaea possess a diverse set of cell surface structures.
In this study, a PERL program (FlaFind [http://signalfind.org/]) was developed to screen archaeal genomes for proteins with class III signal peptides. In silico and in vivo analyses of FlaFind positives revealed a diverse set of proteins with class III signal peptides, including a subset of pilin-like proteins that are specifically cleaved by a novel prepilin peptidase. The colocalization of genes encoding these FlaFind positives with homologs of bacterial type IV pilin assembly genes, as well as the structural resemblance of many FlaFind positives to homologs of bacterial pilin-like substrates, suggests that they may be subunits of archaeal cell surface structures. The identification of distinct classes of subunits of putative extracytoplasmic structures provides valuable data for future molecular and cell-biological investigations of archaeal cell surface structures, such as archaeal pili, which thus far have not been described in molecular detail.
| MATERIALS AND METHODS |
|---|
Sequence analyses. The FlaFind-positive set was characterized using Pfam v19.0 with the default parameters and the Pfam_fs database (10). DUF361-like sequences were identified among FlaFind positives using a modified FlaFind program in which the motif was [KR][GA][Q][X][STA][X][DE], where X is any amino acid. Pfam v19.0 was also used to scan the 22 genomes for the DUF361 domain.
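Read position by position relative to the putative cleavage site, the modified motif is −2 [KR], −1 [GA], +1 Q, +2 any, +3 [STA], +4 any, +5 [DE]. The Python sketch below shows how such a pattern might be scanned with a regular expression. It is not the FlaFind PERL code: the function names, the optional `max_plus1` N-terminal window, and the toy sequences are hypothetical.

```python
import re

# Relaxed DUF361-like motif, positions -2 to +5 relative to the putative
# cleavage site: -2 [KR], -1 [GA], +1 Q, +2 any, +3 [STA], +4 any, +5 [DE].
# The zero-width lookahead lets overlapping candidate sites be reported.
DUF361_LIKE = re.compile(r"(?=[KR][GA](Q.[STA].[DE]))")


def find_duf361_like_sites(seq, max_plus1=None):
    """Return 1-based positions of residue +1 for every motif match in seq.

    max_plus1 optionally restricts hits to the N terminus (hypothetical
    parameter; the window FlaFind actually uses is not stated here).
    """
    sites = []
    for match in DUF361_LIKE.finditer(seq.upper()):
        plus1 = match.start(1) + 1  # 1-based position of the +1 glutamine
        if max_plus1 is None or plus1 <= max_plus1:
            sites.append(plus1)
    return sites


def split_at_site(seq, plus1):
    """Split seq into (signal peptide, mature protein) before residue +1."""
    return seq[: plus1 - 1], seq[plus1 - 1 :]


if __name__ == "__main__":
    # Toy sequences for illustration only, not real archaeal proteins.
    for s in ["MKGQISVELLAAV", "MRAQLTADIVG", "MKGASIIEL"]:
        print(s, find_duf361_like_sites(s, max_plus1=30))
```

Cleavage is modeled as occurring between positions −1 and +1, so the returned position marks the first residue of the mature protein.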
Orthologs were identified with OrthoMCL version 1.2, with the inflation parameter set to 1.01 and the WU-BLAST P-value cutoff set to 1e-10. OrthoMCL clusters orthologous and paralogous protein sequences and, together with the Pfam analyses, can help clarify the functions and relationships of the various FlaFind positives.
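Ortholog clustering tools such as OrthoMCL start from pairs of proteins in different genomes that are each other's best BLAST hit and that pass a significance cutoff. The sketch below illustrates only that reciprocal-best-hit step under the 1e-10 P-value cutoff. It is a simplified stand-in, not OrthoMCL: it omits paralog detection, weight normalization, and Markov clustering (MCL) with the inflation parameter, and the tuple input format and function name are hypothetical.

```python
def reciprocal_best_hits(hits, genome_of, p_cutoff=1e-10):
    """Return cross-genome pairs that are each other's best hit.

    hits: iterable of (query, subject, p_value, bit_score) tuples from an
    all-versus-all BLAST (hypothetical input format).
    genome_of: dict mapping protein ID -> genome ID.
    """
    # Best-scoring hit of each query in every other genome.
    best = {}  # (query, target genome) -> (bit score, subject)
    for query, subject, p_value, score in hits:
        if p_value > p_cutoff:
            continue  # fails the P-value cutoff
        target = genome_of[subject]
        if target == genome_of[query]:
            continue  # within-genome hits (putative paralogs) ignored here
        key = (query, target)
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)

    # Keep a pair only if the best hit is reciprocal.
    pairs = set()
    for (query, _target), (_score, subject) in best.items():
        reverse = best.get((subject, genome_of[query]))
        if reverse is not None and reverse[1] == query:
            pairs.add(frozenset((query, subject)))
    return pairs
```

In OrthoMCL proper, such pairs become weighted graph edges that MCL then partitions into clusters.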
The chromosomal environment of FlaFind-positive genes was determined using Genomapper (http://www-archbac.u-psud.fr/Genomap/GenomapBrowser.html). This tool displays the location, frame, and direction of transcription of a given gene in completely sequenced microbial genomes. Genes in the genomic environment of FlaFind substrate genes were considered linked (i.e., in an operon) if the intergenic distance was less than 100 base pairs and the genes were transcribed in the same direction. An exception was made for a substrate gene that met the distance criterion but was transcribed in the opposite direction to the remainder of the gene cluster; such a gene was still considered linked. Putative operons were screened for the presence of genes encoding additional FlaFind substrates and/or type IV pilus biogenesis protein homologs, i.e., TadA-like ATPases, TadC-like membrane proteins, and type IV pilin-like signal peptidases.
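The linkage rule above (intergenic distance below 100 bp, same transcriptional direction, with an exception for opposite-strand substrate genes) can be expressed as a single pass over genes sorted by position. The Python sketch below is illustrative only: the `Gene` record, the coordinates, and the choice to take a cluster's strand from its first gene are assumptions, not details of Genomapper or of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate (bp)
    end: int     # rightmost coordinate (bp)
    strand: str  # "+" or "-"
    is_substrate: bool = False  # True for a FlaFind-positive gene


def predict_operons(genes, max_gap=100):
    """Group neighbouring genes on one replicon into putative operons.

    A gene joins the current cluster when its distance to the previous gene
    is below max_gap bp and it lies on the cluster's strand. A substrate gene
    meeting the distance rule joins even on the opposite strand (the stated
    exception). The cluster's strand is taken from its first gene, which is
    a simplification.
    """
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            cluster = operons[-1]
            gap = gene.start - cluster[-1].end - 1  # bp between the genes
            same_strand = gene.strand == cluster[0].strand
            if gap < max_gap and (same_strand or gene.is_substrate):
                cluster.append(gene)
                continue
        operons.append([gene])
    return operons
```

The resulting clusters could then be screened for tadA-, tadC-, or peptidase-like members, as described in the text.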
To visualize sequence conservation in selected FlaFind substrates, N-terminal sequences were aligned manually (up to residue +30 relative to the cleavage site), using the putative signal peptidase cleavage site as the reference, and analyzed using the Weblogo server (14). To quantify local sequence conservation in an alignment, a PERL script was written that reads the ClustalW consensus symbol output using an adjustable window size. The script counts "identical" and "conserved" consensus symbols and produces two output files, one containing the number of identical amino acids per window and one with identical plus similar amino acids. The results, using a window size of 10, were plotted as percent sequence conservation versus amino acid position (see Fig. S1 in the supplemental material).
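ClustalW marks each alignment column in its consensus line with "*" (fully conserved), ":" (strongly similar), "." (weakly similar), or a space. The Python sketch below computes windowed percent conservation from such a line. It is not the authors' PERL script: using a sliding (rather than nonoverlapping) window and counting both ":" and "." as "similar" are assumptions, exposed as parameters.

```python
def window_conservation(consensus, window=10, similar_symbols=":."):
    """Percent conservation in sliding windows over a ClustalW consensus line.

    '*' columns count as identical; columns whose symbol is in
    similar_symbols count as similar (which symbols qualify is an
    assumption). Returns (identical %, identical + similar %) lists with one
    value per window; window i covers alignment columns i+1 .. i+window.
    """
    identical, conserved = [], []
    for i in range(len(consensus) - window + 1):
        chunk = consensus[i:i + window]
        n_identical = chunk.count("*")
        n_similar = sum(chunk.count(sym) for sym in similar_symbols)
        identical.append(100.0 * n_identical / window)
        conserved.append(100.0 * (n_identical + n_similar) / window)
    return identical, conserved
```

Plotting the two lists against window position gives curves of the kind described for the supplemental figure.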
Plasmid construction. Genomic DNA of Methanococcus maripaludis S2 was a gift from John Leigh (University of Washington). The plasmids used in this study are listed in Table S3 in the supplemental material. MMP0233/epdA, MMP0237/epdC, and MMP1667/flaB2 open reading frames were amplified by PCR from M. maripaludis genomic DNA, with appropriate restriction sites in the primers and with the native stop codons deleted. The PCR fragments were cloned into NcoI/BamHI-cut pZA7 (39), which added a C-terminal hemagglutinin epitope tag, resulting in pZA10, pZA11, and pZA12, respectively. The MMP0232/eppA and MMP0555/flaK genes were amplified in a similar way and cloned into pSA4 (4), yielding pZA13 and pZA14, respectively. Precursor genes including epitope tags were transferred as NcoI/HindIII fragments into pBAD/Myc-His A (Invitrogen, Breda, The Netherlands). Plasmids suitable for coexpression of substrates and peptidases were constructed as follows. First, an NcoI/HindIII fragment of pZA13 or a BglII/HindIII fragment of pZA14 was transferred into the corresponding restriction sites of pUC18-pibD (unpublished data), a construct that contained an SphI cassette including a T7 promoter, the pibD open reading frame, a C-terminal six-histidine tag, and a T7 terminator. In this way, the pibD gene was replaced by the respective peptidase genes. From the resulting plasmids, the SphI cassette was transferred into the unique SphI restriction site of the pBAD/Myc-His A precursor constructs, resulting in coexpression plasmids with all combinations of precursor and peptidase genes (see Table S3 in the supplemental material).
Growth conditions and preparation of E. coli crude membranes. E. coli BL21(DE3)(pLysS) was used in all overexpression studies. Bacterial strains were grown to an optical density at 600 nm of 0.6 to 0.8. Then, expression of the precursor genes was induced by addition of L-arabinose for 2 h. Full induction of the araBAD promoter often resulted in strong overexpression of substrate genes, leading to protein degradation. Therefore, the induction conditions were optimized, and L-arabinose was added to final concentrations of 0.2% (constructs containing epdA), 0.004% (epdC), and 0.001% (flaB2). Subsequently, peptidase genes were induced with 0.1 mM IPTG (isopropyl-β-D-thiogalactopyranoside) for 2 h. The culture was harvested by centrifugation, and the cell pellets were resuspended in 2 ml of buffer (50 mM Tris-HCl, pH 7.5, 1 mM EDTA). Crude membranes were isolated as described previously (4) and resuspended in 50 mM Tris-HCl, pH 7.5. Cleavage of substrates was determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and Western immunoblot analysis of 5 µg (EpdC and FlaB2 membranes) or 10 µg (EpdA membranes) of crude membranes. Substrate proteins were detected using monoclonal anti-hemagglutinin antibodies (Sigma).
| RESULTS |
|---|
In silico analysis identifies different classes of archaeal proteins with class III signal peptides. (i) Overview of FlaFind output. FlaFind identified 388 proteins in 22 archaeal genomes, 102 of which were annotated as homologs of proteins with predicted functions (Table 1; see Table S1 in the supplemental material). Of these, 77 belonged to classes that had previously been shown to contain class III signal peptides, including 44 flagellins and 33 substrate binding proteins. The majority of the remaining 25 substrates belonged to different classes of extracytoplasmic proteins, including proteases and redox proteins. Only 4 of the 102 substrates were likely cytoplasmic proteins, suggesting that the rate of detection of proteins lacking a signal sequence is low (3.9%).
(ii) Chromosomal localization of FlaFind positives. Bacterial type IV pilin-like structures consist of one major and several minor subunits. Genes encoding these subunits are often found in the same transcriptional unit, which also encodes proteins involved in the biosynthesis of bacterial type IV pilin-like structures (38). Consistent with a role as subunits of cell surface structures, 120 FlaFind positives were predicted to be coregulated with additional FlaFind positives and/or genes coding for a TadA, TadC, and/or type IV pilin peptidase homolog (Table 1 and Fig. 2; see Fig. S1 and Table S2 in the supplemental material). Moreover, in several cases, the structure of operons encoding homologs of FlaFind positives was conserved among different organisms (Fig. 2 and Fig. 3). For example, the genes encoding the S. solfataricus FlaFind positives SSO0117 and SSO0118 are in an operon with tadA and tadC homologs, and this feature is conserved in the genomes of the three sequenced Sulfolobus species (Fig. 3). Interestingly, these operons, as well as an operon in each of the sequenced Pyrococcus strains that contained at least two FlaFind positives, were coregulated with an Lhr-like DNA helicase homolog, raising the possibility that these small substrates might be involved in DNA uptake or transfer (Fig. 3).
(iv) Pfam analysis. Unlike OrthoMCL, which clusters whole protein sequences, Pfam identifies small, highly conserved domains within a protein (see Materials and Methods). To identify possible common themes among the large number of hypothetical proteins, all FlaFind positives were analyzed using Pfam (10). Consistent with the hypothesis that a diverse set of archaeal SBPs contains class III signal peptides, Pfam classified eight additional FlaFind positives as SBPs. Thus, in 14 of the 22 genomes, FlaFind identified at least one SBP, including (among others) sugar, dipeptide, and phosphate binding proteins (see Table S1 in the supplemental material).
Most strikingly, Pfam identified 19 euryarchaeal proteins with a domain of unknown function (DUF361), which consists of the amino acid motif QXSXEXXXL, where Q is at the +1 position of the putative cleavage site of these proteins. Genes encoding DUF361-containing proteins were frequently present in the same operon as genes encoding FlaFind positives with a slightly different version of this domain. In these "DUF361-like" domains, the serine was replaced by threonine or alanine, the glutamate was replaced by aspartate, and/or the leucine was replaced by a different hydrophobic amino acid (see Fig. S1 in the supplemental material). All FlaFind positives were screened for the presence of a DUF361-like domain with the modified FlaFind motif −2[KR] −1[GA] +1[Q] +2[X] +3[STA] +4[X] +5[DE]. This analysis revealed an additional 16 proteins, most of which were associated with the DUF361-containing proteins (Fig. 2A; see Table S1 in the supplemental material).
Interestingly, several genes encoding proteins with this conserved domain were found in operon structures, together with a gene encoding a novel subclass of euryarchaeal type IV prepilin peptidases, EppA. EppA, while homologous to FlaK, is substantially larger due to the presence of four additional predicted transmembrane segments (Fig. 2B). The chromosomal localization of eppA homologs and the fact that homologs of this prepilin peptidase were identified only in the eight euryarchaea that encoded DUF361-containing FlaFind positives strongly suggest a role as a specific signal peptidase for this class of preproteins (Table 1; see below).
EppA specifically cleaves proteins with DUF361-like domains. FlaFind identified 14 substrates in M. maripaludis, including 3 flagellins and 10 DUF361-containing proteins. Three of these DUF361-containing proteins were coregulated with eppA (Fig. 2A). A similar operon is found in the genome of Methanocaldococcus jannaschii, although there it is split and contains additional species-specific genes (Fig. 2A). To determine whether the M. maripaludis EppA homolog specifically cleaves DUF361-containing proteins, one of two genes encoding proteins with this conserved domain (MMP0233 or MMP0237) or a flagellin gene (MMP1667/flaB2) was coexpressed in E. coli with eppA from inducible promoters. In addition, the preproteins were coexpressed with flaK, the gene encoding the previously characterized M. maripaludis preflagellin peptidase (8). Processing of either of the two DUF361-containing proteins in E. coli was observed only in cells that coexpressed EppA (Fig. 2C). Conversely, the novel peptidase was not able to cleave the flagellin subunit, strongly suggesting that the requirements for substrate recognition by the two subclasses of type IV pilin peptidases are distinct. We therefore refer to MMP0233 and MMP0237 as EppA-dependent proteins (EpdA and EpdC, respectively).
The distinct substrate recognition characteristics of M. maripaludis FlaK and EppA were further demonstrated by an experiment in which the amino acids at positions −2 to +2 of EpdA and EpdC were replaced with those of FlaB, and vice versa (Fig. 2D). EppA processed the modified FlaB(RGQI) but did not cleave EpdA(KGAS) or EpdC(KGAS), indicating that it requires the conserved glutamine at position +1. FlaK, in contrast, still cleaved FlaB(RGQI), suggesting that it has a much broader cleavage site recognition capability and that its inability to cleave EpdA and EpdC is due to a distinct substrate recognition pattern. Consistent with this, FlaK was unable to process either the EpdA(KGAS) or the EpdC(KGAS) signal peptide despite the presence of the potential FlaK-processing site.
| DISCUSSION |
|---|
FlaFind, the program developed as part of this study, effectively identifies archaeal substrates containing class III signal peptides: it identified 41 of the 42 predicted archaeal flagellins and 19 of the 20 archaeal proteins containing a DUF361 domain, both classes of proteins that have been shown to be processed by a type IV prepilin-like peptidase (4, 8). In fact, all but one of the 14 M. maripaludis FlaFind positives were flagellins or DUF361-like substrates (3 and 10, respectively), suggesting that, at least in this archaeon, the program successfully distinguishes class III signal peptides from other N-terminal signal peptides or transmembrane segments.
Consistent with the correct identification of class III signal peptide-containing proteins, the majority of nonflagellin FlaFind positives with annotated functions were SBPs, three of which had been shown experimentally to contain this class of signal peptide (3, 16). Moreover, only four of the annotated proteins were predicted cytoplasmic proteins, indicating that the rate of detection of proteins lacking a signal sequence is less than 4%. While most FlaFind positives (75%) were annotated as hypothetical proteins, the chromosomal localization of many of the genes encoding these proteins, as well as the results of sequence homology and pattern searches among the FlaFind positives, strongly supports the identification of many of these proteins as class III signal peptide-containing proteins. It should be noted that FlaFind identified eight substrates in Picrophilus torridus, although this archaeon lacks any apparent TadA, TadC, or pilin peptidase homologs. These eight hits are likely false positives: in the absence of such a peptidase, there is no selective pressure against cleavage site-like patterns in secretory signal sequences. Moreover, distinct consensus cleavage sequences may have evolved in different organisms. For example, in Sulfolobus species, the +2 position is almost exclusively serine, while in other archaea this position seems less important. Thus, it is unlikely that a perfect "global" consensus sequence can be defined for all archaeal class III signal peptides. However, our systematic genomic approach, in concert with additional in silico and in vivo analyses, has yielded valuable information about the diversity of predicted archaeal cell surface structures. To facilitate future studies on newly released genomes, the interactive version of FlaFind allows modification of the search pattern.
The substrates identified by FlaFind clustered into several distinct groups, including flagellins, SBPs, DUF361-containing proteins, and orthologous groups of small proteins, such as the Sulfolobus or Pyrococcus FlaFind positives that colocalized with a helicase (Fig. 3). The latter observation is particularly intriguing, as UV-induced exchange of genetic material between cells by a yet-unknown conjugational mechanism has been observed in Sulfolobus acidocaldarius and Haloferax volcanii (18, 26, 35).
Our data not only imply that the majority of FlaFind positives are indeed secreted proteins with N-terminal class III signal peptides but are also consistent with the hypothesis that substrates with these signal peptides are subunits of cell surface structures: (i) reminiscent of the coregulation of major and minor bacterial pilins and pseudopilins, genes encoding FlaFind positives were frequently cotranscribed with genes encoding additional FlaFind positives, and (ii) a significant number of FlaFind positives were encoded by genes located in operons with homologs of genes encoding the pilin assembly components TadA and TadC. Moreover, a large number of FlaFind positives contain a negative charge at position +5, including 35 hypothetical proteins in which this charged amino acid is part of a conserved Pfam domain of unknown function, DUF361. The negative charge in DUF361-like proteins is embedded in a characteristic motif with the consensus sequence [RK][GA]QhShE (amino acid positions −2 to +5, with h being a hydrophobic residue). Similarly, a short characteristic motif was identified at the N terminus of a subclass of type IVb pilins (20). A negative charge at position +5 is a typical feature of bacterial type IV pilin-like subunits and is required for pilus assembly in Pseudomonas aeruginosa (30, 37). However, the absence of the +5 charge in many FlaFind positives does not necessarily mean that these proteins cannot form structures, as most archaeal flagellins lack a charge at this position. Unlike the poorly conserved hydrophobic stretches of Tat and class I/II Sec signal sequences, the hydrophobic stretch in the signal sequences of archaeal flagellins is highly conserved. This has also been observed for type IV pilins (12) and presumably allows optimal subunit-subunit interaction (11, 29).
Consistent with the requirement for a highly conserved N-terminal hydrophobic "assembly domain," several orthologous groups of FlaFind positives and substrates encoded by genes that cluster together share substantial sequence homology at their N termini (see Fig. S1 in the supplemental material).
Finally, our in vivo processing studies clearly demonstrated that DUF361-containing FlaFind positives were specifically cleaved by the novel subclass of prepilin peptidases, EppA, and that part of DUF361, including the conserved glutamine at position +1, is required for substrate recognition. While the involvement of FlaK in flagellum biogenesis has been demonstrated (7), the specific function of EppA in the assembly of putative extracytoplasmic structures remains to be determined. However, it is tempting to speculate that the additional membrane-spanning segments in EppA are required for a function other than substrate cleavage, such as interaction with proteins involved in pilus assembly. Alternatively, the enzyme might have an additional activity similar to that of bacterial prepilin peptidases, which methylate the N terminus of the cleaved substrate. The colocalization of eppA and substrate genes also suggests coregulation. Recently, a second prepilin peptidase (FppA) was described in Pseudomonas aeruginosa (15). FppA is specific for a subclass of type IV pilins from the same organism and does not cleave the major pilin PilA, which in turn is a substrate of PilD. It is intriguing that two distantly related organisms apparently developed similar strategies to distinguish between various classes of pilin-like substrates.
The study described here opens many new directions for structural and genetic studies of archaeal extracytoplasmic structures. Future in vivo studies in the native archaeal hosts should validate additional substrates (allowing refinement of the program) and identify additional archaeal components involved in the biosynthesis of extracellular structures, such as the novel prepilin peptidase EppA. The data presented here also raise many intriguing questions, including the following. (i) What is the advantage of assembling SBPs into proposed cell surface structures? (ii) What are the functions of the FlaFind-positive hypothetical proteins? (iii) How do the distinct structures assemble? (iv) What is the significance of having two subclasses of prepilin peptidases if (as proposed earlier) cleavage of substrates occurs independently of their assembly?
| ACKNOWLEDGMENTS |
|---|
Support for this work was provided to M.P. by a National Science Foundation grant (reference no. MCB-0239215), to J.C.K. by an NIH/Fogarty International Center grant (reference no. 5D43TW007012-03), to S.-V.A. by VENI and VIDI grants from the Dutch Organization for Scientific Research (NWO), and to A.J.M.D. by the Van der Leeuw Programme of the Earth and Life Sciences Foundation (ALW), which is subsidized by NWO.
| FOOTNOTES |
|---|
Published ahead of print on 17 November 2006.
Supplemental material for this article may be found at http://jb.asm.org.
Z.S. and A.O.S. contributed equally to this work.
| REFERENCES |
|---|