Identification of Functional Tat Signal Sequences in Mycobacterium tuberculosis Proteins

ABSTRACT The twin-arginine translocation (Tat) pathway is a system used by some bacteria to export proteins out from the cytosol to the cell surface or extracellular environment. A functional Tat pathway exists in the important human pathogen Mycobacterium tuberculosis. Identification of the substrates exported by the Tat pathway can help define the role that this pathway plays in the physiology and pathogenesis of M. tuberculosis. Here we used a reporter of Tat export, a truncated β-lactamase, ′BlaC, to experimentally identify M. tuberculosis proteins with functional Tat signal sequences. Of the 13 proteins identified, one lacks the hallmark of a Tat-exported substrate, the twin-arginine dipeptide, and another is not predicted by in silico analysis of the annotated M. tuberculosis genome. Full-length versions of a subset of these proteins were tested to determine if the native proteins are Tat exported. For three proteins, expression in a Δtat mutant of Mycobacterium smegmatis revealed a defect in precursor processing compared to expression in the wild type, indicating Tat export of the full-length proteins. Conversely, two proteins showed no obvious Tat export in M. smegmatis. One of this latter group of proteins was the M. tuberculosis virulence factor phospholipase C (PlcB). Importantly, when tested in M. tuberculosis a different result was obtained and PlcB was exported in a twin-arginine-dependent manner. This suggests the existence of an M. tuberculosis-specific factor(s) for Tat export of a proven virulence protein. It also emphasizes the importance of domains beyond the Tat signal sequence and bacterium-specific factors in determining if a given protein is Tat exported.

In bacteria, protein export across the cytoplasmic membrane represents the first step in the delivery of proteins to the cell envelope or extracellular space. Two conserved systems are responsible for the majority of this protein export: the general secretion (Sec) and the twin-arginine translocation (Tat) pathways (for reviews, see references 31 and 35). Both systems export proteins that are synthesized as precursors with amino-terminal signal sequences. In both cases, the signal sequences have a tripartite structure: a charged amino-terminal region, a hydrophobic region, and a carboxy-terminal region containing a signal peptidase cleavage site (31,62). With most exported proteins, the signal sequence is cleaved from the precursor during or immediately after translocation, which liberates the mature exported substrate.
A feature that distinguishes Tat signal sequences from Sec signal sequences is the presence of a consensus twin-arginine motif, which is defined as S/T-R-R-x-F-L-K (5). The arginine dipeptide, RR, in the motif is a major targeting determinant of the signal sequence as shown by conservative replacement of "RR" with a lysine pair, KK, preventing Tat export of proteins (13,14,25,54). Computational Tat signal sequence prediction programs, based on sequence and structural conservation, have been developed (4,45,52). These programs are valuable for identifying Tat substrates that adhere to the consensus motif; however, they cannot account for species-specific differences, unless modified to do so, and it remains to be established how useful they are for comprehensive identification of Tat-exported proteins. There are two Tat-exported proteins known to exist in nature that lack the twin arginines (26,27). These exceptions may be members of a larger group of yet-to-be identified Tat proteins that rely on features other than the twin-arginine motif for Tat export.
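The motif described above can be illustrated with a short scan of the amino-terminal region of a protein sequence. The following is a minimal sketch, not the method used by any of the cited prediction programs: it assumes a relaxed pattern that requires only S/T followed by the twin arginines, and the function names, the 35-residue window, and the example sequence are hypothetical choices made for illustration (the sequence is invented and is not an M. tuberculosis protein).

```python
import re

# Relaxed core of the consensus S/T-R-R-x-F-L-K. Only S/T and the twin
# arginines are required; the remaining positions are left open.
# This relaxation is an assumption of the sketch.
TAT_CORE = re.compile(r"[ST]RR")


def find_tat_motifs(protein_seq, window=35):
    """Return (position, 7-residue context) for each [ST]RR hit
    within the first `window` residues of the sequence."""
    n_term = protein_seq[:window].upper()
    return [
        (m.start(), n_term[m.start():m.start() + 7])
        for m in TAT_CORE.finditer(n_term)
    ]


def rr_to_kk(protein_seq, position):
    """Replace the RR that follows the S/T at `position` with KK,
    mimicking the conservative substitution used to block Tat export."""
    return protein_seq[:position + 1] + "KK" + protein_seq[position + 3:]


# Hypothetical, invented N-terminal sequence used only as an example.
example = "MNDKLSRRSFLKGAAAVALG"
hits = find_tat_motifs(example)
mutant = rr_to_kk(example, hits[0][0])
```

A scan of this kind finds only sequences that carry the twin arginines, so by construction it would miss signal sequences that depart from the dipeptide.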
Another distinguishing feature of the Tat export pathway is that Tat substrates are translocated across the membrane in a folded state, with folding being a prerequisite for Tat export (15). Some Tat substrates require cytoplasmic chaperones for export. These chaperones may be specific to one Tat substrate, or they can have a more general effect (23,28,38-40). In these cases, it is thought that chaperones function in folding substrates or targeting them to the membrane-localized Tat translocase complex once folding is complete. The Tat translocase is composed of TatA, TatB, and TatC proteins, although not all bacteria with functional Tat export systems have TatB.
The Tat pathway is present in many, but not all, bacteria. In several bacterial pathogens, the Tat pathway plays an important role in exporting virulence factors (9,10,17,30,36,42,46,60). Mycobacterium tuberculosis is the bacterial pathogen responsible for tuberculosis, which kills 1.8 million people a year (64). Mycobacteria have a functional Tat pathway. In the fast-growing nonpathogenic species Mycobacterium smegmatis, mutants lacking tatA, tatB, or tatC genes have multiple phenotypes including slow growth on agar and sensitivity to β-lactam antibiotics (34,41). The latter phenotype is attributed to a failure to export the chromosomally encoded β-lactamase. In both M. smegmatis and M. tuberculosis the endogenous β-lactamases possess Tat signal sequences. β-Lactamases, which destroy cell wall-targeting β-lactam antibiotics, must be exported to protect bacteria from the drugs. In M. tuberculosis it has not been possible to construct tat mutants (47). This indicates that, in pathogenic M. tuberculosis, the Tat pathway is essential under standard laboratory conditions. Without an M. tuberculosis tat mutant, there are fewer approaches available for identifying Tat-exported proteins and studying the significance of Tat export in this pathogen.
In this study, we used the M. tuberculosis β-lactamase (BlaC) as a reporter to identify M. tuberculosis proteins that possess functional Tat signal sequences. A truncated ′BlaC, lacking its endogenous signal sequence, is not exported and is unable to protect a mycobacterial β-lactam-sensitive mutant (M. smegmatis ΔblaS or M. tuberculosis ΔblaC strain) from the β-lactam antibiotic carbenicillin (34). When a signal sequence from a Tat-exported M. tuberculosis protein is fused to ′BlaC, the hybrid protein is exported and confers carbenicillin resistance on Δbla mutant mycobacteria. Exported ′BlaC fusion proteins can be identified by direct selection of drug-resistant colonies on agar containing carbenicillin. Importantly, the ′BlaC reporter is Tat specific. It works only when fused to Tat signal sequences and requires both the twin-arginine motif and a functional Tat pathway (34).
Using an M. tuberculosis genomic library constructed upstream of the ′blaC reporter, we identified signal sequences capable of exporting ′BlaC in a Tat-dependent manner. In addition to the demonstrated virulence factor phospholipase C (PlcB) (29,43), we identified proteins with potential roles in carbohydrate and lipid metabolism, copper homeostasis, cell envelope maintenance, and nutrient import. The proteins identified included one lacking a twin-arginine dipeptide and one not predicted by in silico analysis. We also investigated full-length versions of a subset of the proteins identified. Importantly, full-length PlcB was exported and was twin arginine dependent when expressed in its native host, M. tuberculosis. However, when expressed in M. smegmatis this M. tuberculosis protein did not appear to be exported. This suggests the existence of an M. tuberculosis-specific factor(s) that is required for Tat export of a proven virulence protein.

MATERIALS AND METHODS
Bacterial strains and culture methods. Bacterial strains used during this work are listed in Table 1. Luria-Bertani (LB) medium (Fisher) was used for culturing of Escherichia coli. Middlebrook 7H9 or 7H10 medium (Difco; BD Biosciences) was used for the culturing of M. smegmatis and M. tuberculosis. For M. smegmatis, Middlebrook medium was supplemented with 0.5% glycerol and 0.2% dextrose. For M. tuberculosis, Middlebrook medium was supplemented with 0.5% glycerol and 1× ADS (0.5% bovine serum albumin, fraction V [Roche]; 0.2% dextrose; and 0.85% NaCl). When necessary, medium was supplemented with 0.05 to 0.1% Tween 80 (Fisher). As required, antibiotics were added to Middlebrook media at the indicated concentrations: hygromycin B (Roche Applied Science), 50 μg/ml; carbenicillin (Sigma), 50 μg/ml; and kanamycin (Acros Chemicals), 20 μg/ml.
L-Lysine at 40 and 80 μg/ml was added to agar and liquid media, respectively, for growth of M. smegmatis strains PM759 and JM578.
Molecular biology procedures. Standard molecular biology techniques were employed (48). The Expand High-Fidelity PCR system (Roche) was used in all PCRs, and 5.0% dimethyl sulfoxide was included in select reactions. DNA sequencing was performed either by the UNC-CH Automated DNA Sequencing Facility (Chapel Hill, NC) or by Eton Bioscience Inc. (San Diego, CA).
Construction of BlaC reporter libraries. All plasmids used in this study are listed in Table 2. Two library plasmids were constructed with a truncated version of M. tuberculosis blaC (referred to as ′blaC), amplified from pJM106 by PCR using the primers Lib′blaCfor and Lib′blaCrev (see Table S1 in the supplemental material).
(i) Library 1. The resulting ′blaC amplicon was ligated into the multicopy mycobacterial shuttle vector pMV206.hyg, which had been digested with ClaI and NcoI. The final plasmid was pJES113 (Fig. 1A). Genomic DNA was isolated from M. tuberculosis strain H37Rv as previously described (6) and partially digested with AciI and HpaII, and digests with DNA fragments between 0.5 and 5.0 kbp were selected. The genomic digest was cloned into the unique ClaI site immediately upstream of ′blaC in pJES113. The resulting ligation reaction mixture was transformed into E. coli XL1-Blue (Stratagene). Approximately 1 × 10^6 hygromycin-resistant E. coli transformants were pooled for plasmid DNA isolation (Qiagen).
(ii) Library 2. The second library vector, pJM157, was constructed to carry a mycobacterial promoter upstream of the unique ClaI site. A fragment carrying the promoter and the +1 transcriptional start site from M. tuberculosis hsp60 was amplified by PCR from pMV261.hyg using the primers Hsp60for-BstBI and Hsp60rev2-ClaI (see Table S1 in the supplemental material). The resulting PCR product was ligated into pJES113, which had been linearized with ClaI, to produce pJM157 (Fig. 1B). For construction of library 2, genomic DNA was isolated from M. tuberculosis strain PM638 (ΔblaC). Digestion of genomic DNA and ligation into the single ClaI site upstream of ′blaC in pJM157 were conducted as described above. After electroporation into E. coli DH5α (Invitrogen), approximately 8 × 10^5 CFU were pooled and used to isolate plasmid DNA (Qiagen).
Selection of exported BlaC fusions. Library DNA was electroporated into M. smegmatis ΔblaS strain PM759 (6,18). The resulting transformants were plated onto 7H10 agar medium without Tween, containing 40 μg/ml lysine (Fisher Scientific), 50 μg/ml hygromycin, and carbenicillin at concentrations that ranged from 35 to 75 μg/ml, and incubated at 37°C for a minimum of 4 days. The drug resistance of colonies that grew on 7H10 agar with hygromycin and carbenicillin was confirmed by spot test analysis on 7H10 agar plates containing (i) both 50 μg/ml hygromycin and 45 μg/ml carbenicillin and (ii) 50 μg/ml hygromycin only. Strains were further analyzed if spots revealed confluent growth on plates containing hygromycin and carbenicillin in comparison to the negative-control strain (ΔblaS M. smegmatis with plasmid pJM113, which carries a promoterless ′blaC gene) (34).
Recovery of blaC fusion plasmids. Plasmid DNA was transferred from carbenicillin- and hygromycin-resistant ΔblaS M. smegmatis to E. coli DH5α by electroduction (2). A small amount of M. smegmatis was transferred from a [...] Table S1 in the supplemental material).
Anti-BlaC antibody. A six-histidine-tagged copy of M. tuberculosis BlaC was expressed from Y49 pTrcHisB-BlaC in E. coli DH5α (provided by Doug Kernodle) and purified by nickel affinity chromatography using HIS-Select nickel affinity gel (Sigma) as described previously (59). Purified BlaC was eluted from the nickel column with 300 mM imidazole at a concentration of 1.3 mg/ml and used to immunize rabbits together with TiterMax Gold adjuvant (Sigma). Rabbit immunizations and polyclonal antiserum collection were carried out by the Bio-Source Custom Immunology Department (Hopkinton, MA).
Immunoblot analysis. Whole-cell lysates of M. smegmatis and whole-cell lysates of formalin-killed M. tuberculosis were prepared as described previously (7,19,33) and analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and immunoblotting. Polyclonal BlaC antiserum was used in immunoblot analysis at a dilution of 1:10,000, and polyclonal 19-kDa antiserum (provided by Douglas Young) was used at a dilution of 1:40,000. Anti-rabbit peroxidase-conjugated antibodies (Bio-Rad) were used as secondary antibodies for both anti-BlaC and anti-19-kDa antiserum. Both monoclonal hemagglutinin (HA) antiserum (Covance) and monoclonal GroEL HAT5/IT-64 antiserum (World Health Organization collection) were used at a dilution of 1:20,000, and anti-mouse peroxidase-conjugated antibodies were used as secondary antibodies (Bio-Rad).

Construction of specific M. tuberculosis-BlaC fusions.
The nucleotides encoding the signal sequences of the Rv2041c and Rv2525c genes were PCR amplified using the primers rv2041ssF and 2041ssR or 2525F and 2525R, respectively (see Table S1 in the supplemental material). Both the forward and reverse primers encoded a BstBI site, and the forward primers also included the Shine-Dalgarno sequence from M. tuberculosis hsp60. The amplified fragments were ligated into pCR2.1 (Invitrogen), digested with BstBI, and then ligated into ClaI-digested pJM157, upstream of ′blaC.
Construction of expression constructs for HA-tagged M. tuberculosis proteins. (i) hsp60 promoter-driven HA fusions: Rv0774c-HA, ΔssRv0774c-HA, Rv2843-HA, and ΔssPlcB-HA. Oligonucleotide primers were designed to amplify full-length or truncated genes from M. tuberculosis genomic DNA. Forward and reverse primers were designed each with 5′ extension sequences carrying MscI and HindIII restriction sites, respectively (see Table S1 in the supplemental material). The resulting PCR product was first cloned into pCR2.1 (Invitrogen) and sequenced. The cloned gene was then digested with MscI and HindIII, and the appropriate fragment was isolated and ligated into the mycobacterial shuttle vector pJSC77 (21), which was digested with MscI and HindIII and which carries the hsp60 promoter and multiple cloning site upstream of a C-terminal HA tag (Table 2). Due to an MscI site within the Rv2843 gene, the strategy was revised in this instance and an NruI site was included instead of MscI on the forward primer (see Table S1 in the supplemental material).
(ii) Native promoter-driven HA fusions: PlcB-HA and Rv0315-HA. Oligonucleotide primers were designed to amplify a fragment of M. tuberculosis genomic DNA which included the full-length gene of interest and upstream sequence containing the putative native promoter. Forward and reverse primers were designed each with 5′ extension sequences carrying XbaI and HindIII restriction sites, respectively (see Table S1 in the supplemental material). The resulting PCR product was first cloned into pCR2.1 (Invitrogen) and sequenced. The cloned gene and promoter were then digested with XbaI and HindIII, and the appropriate fragment was isolated and ligated into the mycobacterial shuttle vector pJSC77, which had been digested with XbaI and HindIII, which removes the hsp60 promoter (Table 2).
(iii) PlcB(KK)-HA. The construct in which the codons for the twin-arginine pair of PlcB were mutated to encode lysines (KK) was generated as follows. The PCR primers plcBKKfor2 and plcBKKrev3 overlapped at the site of mutation and were used in inverse PCR to generate a product from pJM199. The resulting PCR product was then end repaired using the EndIt kit (Epicentre) following the manufacturer's instructions and self-ligated to create pJM216.
Subcellular fractionation. Cell wall, membrane, and soluble fractions were prepared by differential ultracentrifugation as described previously (19,44). Briefly, 100-ml cultures of M. tuberculosis were harvested by centrifugation at 3,000 × g. Cell pellets were sterilized by gamma irradiation in a JL Shephard Mark I 137Cs irradiator (Department of Radiobiology, University of North Carolina at Chapel Hill) with a dose of 2.4 megarads. All subsequent steps were performed at 4°C. Pellets were resuspended in 4 ml of breaking buffer (phosphate-buffered saline, 1 mM phenylmethylsulfonyl fluoride, 0.6 μg/ml each of DNase and RNase, and a cocktail of protease inhibitors [2 μg/ml each of aprotinin, E-64, leupeptin, and pepstatin A and 100 μg/ml Pefabloc SC]) and then lysed in a French pressure cell. Unbroken cells were pelleted at 3,000 × g for 20 min to generate a clarified whole-cell lysate, which was centrifuged at 27,000 × g for 30 min to pellet the cell wall. The supernatant was centrifuged at 100,000 × g for 2 h to separate the membrane fraction from the soluble fraction. The cell wall and membrane fractions were washed once and then resuspended in phosphate-buffered saline.
Bioinformatic identification of putative twin-arginine signal sequences. [...] and selected only those proteins that had a predicted tripartite signal peptide with a Tat motif according to the prediction program. We also reviewed the list of predicted Tat-exported proteins of M. tuberculosis defined by the TigrFAM motif (TIGR01409) (52). This list was obtained directly from the TIGR website (http://cmr.tigr.org/tigr-scripts/CMR/GenomePage.cgi?database=ntmt02).

RESULTS
Selection of exported M. tuberculosis ORF-BlaC fusions.
A truncated M. tuberculosis β-lactamase ′BlaC, lacking its native signal sequence, can be used in mycobacteria as a reporter of export (34). A special feature of the ′BlaC reporter is that it works only when exported by the Tat pathway. In a β-lactam-sensitive background, such as the M. smegmatis ΔblaS strain, exported ′BlaC fusion proteins can be detected by their ability to promote growth in the presence of the β-lactam antibiotic carbenicillin.
To experimentally identify M. tuberculosis ORFs with functional Tat signal sequences, we constructed genomic M. tuberculosis libraries upstream of the truncated ′blaC reporter. Two libraries were constructed in multicopy vectors, pJES113 and pJM157, each of which carries truncated ′blaC immediately downstream of a unique ClaI cloning site (Fig. 1). The difference between the vectors is that pJM157 contains the mycobacterial hsp60 promoter upstream of the ClaI site to drive expression from genomic fragments that lack a promoter. The hsp60 sequence in pJM157, however, does not include a Shine-Dalgarno site or start codon; these elements must be provided by the genomic insert. For library 1, M. tuberculosis genomic DNA was prepared from wild-type strain H37Rv, and for library 2, the genomic DNA was prepared from the ΔblaC mutant of M. tuberculosis. In both cases the genomic DNA was cut with ClaI-compatible endonucleases for ligation into the vectors.
The libraries were electroporated into the ΔblaS mutant of M. smegmatis and directly plated on 7H10 medium containing carbenicillin. Plasmids expressing exported fusion proteins were selected by their ability to promote growth in the presence of carbenicillin. For library 1, 101 carbenicillin-resistant [...] With one notable exception, all plasmids sequenced revealed an in-frame fusion with ′blaC. The exception was plasmids in which the full-length M. tuberculosis blaC gene was cloned. In these cases, the blaC insert did not need to be cloned in frame with the reporter sequence on the plasmid. BlaC was identified in 50% of the carbenicillin-resistant clones from library 1. In library 2 this problem was avoided by using genomic DNA from ΔblaC M. tuberculosis. From the two libraries, we identified amino-terminal sequences of 10 unique M. tuberculosis proteins that promote export of the ′BlaC reporter (Table 3). To confirm that the fusion proteins identified were exported in a Tat-dependent manner, we rescued the fusion plasmids and electroporated them into a double ΔblaS ΔtatA M. smegmatis mutant. This allowed us to test for export in the absence of a functional Tat pathway. All 10 fusions that conferred carbenicillin resistance in a ΔblaS mutant background failed to confer carbenicillin resistance in the ΔblaS ΔtatA strain. This indicated that all the fusion proteins identified require the Tat pathway to export functional ′BlaC.
Direct testing of candidate M. tuberculosis Tat signal sequences. The M. tuberculosis sequences fused to ′BlaC in the 10 active fusions were all predicted to contain signal sequences, according to the Signal P 3.0 prediction algorithm (Fig. 2A) (3). Evaluation of the exported fusions revealed that the junction with ′BlaC always occurred close to the predicted signal sequence cleavage site of the M. tuberculosis protein. The greatest distance that we observed between a predicted cleavage site and the ′BlaC fusion junction was 34 amino acids. This revealed a requirement for an appropriately positioned restriction enzyme site near the cleavage site in order to identify a Tat signal sequence in our libraries. It also suggested that some proteins may have been missed for this reason. For example, in previous work we showed that the PlcA signal sequence promotes Tat export of the ′BlaC reporter (34), but PlcA was not identified in the libraries. From genome gazing and bioinformatic predictions (discussed below), we selected Rv2041c and Rv2525c as candidate Tat substrates that may have been missed. Construction and direct testing of ssRv2041c-′BlaC and ssRv2525c-′BlaC fusion proteins revealed that both confer resistance to β-lactam in a Tat-dependent manner. This provides experimental validation of these additional M. tuberculosis Tat signal sequences (Fig. 2B).
Alignment of functional Tat signal sequences identified with the BlaC reporter. From studies largely conducted with E. coli, the generally accepted twin-arginine consensus motif is S/T-R-R-x-F-L-K (x is any polar amino acid). The twin arginines are nearly always invariant, and the frequency of occurrence of the other amino acids is reported to exceed 50% (5,31,39). Amino acid alignment of the M. tuberculosis signal sequences that we identified revealed that all but one contained the twin-arginine dipeptide (Fig. 2). The exception was the Rv0063 signal sequence, which has a glutamine in the position where the second arginine would be (R-Q-T-F-L). In terms of the other amino acids in the consensus, all were present in ≥40% of the sequences except for the final lysine (K). The alignment also revealed conservation of a hydrophobic residue just prior to the S/T amino acid in the consensus.
Comparison to in silico-predicted Tat signal sequences. Multiple Tat signal prediction programs exist, but their ability to accurately and comprehensively identify Tat substrates within mycobacteria is unresolved. We applied two of these web-based programs, TatP v.1.0 (4) and TATFIND v1.4 (45), to the M. tuberculosis H37Rv genome sequence and compared the output to Tat signal sequences predicted by a third prediction program devised by The Institute for Genomic Research (TIGR) (TIGR01409) (52). Of the 4,056 ORFs in the M. tuberculosis H37Rv genome (11), 95 were predicted to encode proteins with Tat signal sequences by at least one of the prediction programs (see Table S2 in the supplemental material). There is surprisingly limited overlap between the algorithms, with only 11 proteins being predicted by all three programs (Fig. 3A).
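The comparison of prediction programs described above amounts to counting, for each ORF, how many programs predict it. The following is a minimal sketch of that bookkeeping, assuming each program's output has already been reduced to a set of ORF identifiers; the function name and the placeholder identifiers (orfA to orfD) are hypothetical and do not correspond to any actual prediction.

```python
def group_by_support(predictions):
    """Group ORF identifiers by the number of programs predicting them.

    `predictions` maps a program name to the set of ORF identifiers it
    predicts to carry a Tat signal sequence.
    Returns a dict: number of supporting programs -> set of ORFs.
    """
    all_orfs = set().union(*predictions.values())
    by_support = {}
    for orf in all_orfs:
        n_programs = sum(orf in hits for hits in predictions.values())
        by_support.setdefault(n_programs, set()).add(orf)
    return by_support


# Placeholder identifiers only; these are not real predictions.
toy_predictions = {
    "TatP": {"orfA", "orfB", "orfC"},
    "TATFIND": {"orfA", "orfB"},
    "TIGR01409": {"orfA", "orfD"},
}
support = group_by_support(toy_predictions)
```

In this toy input, the set under key 3 corresponds to the central region of a three-way Venn diagram, and the union of all sets corresponds to ORFs predicted by at least one program.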
We next compared the signal sequences that we identified experimentally (Table 3; Fig. 2) to the in silico predictions of the annotated M. tuberculosis genome. Eight of the proteins that we identified were predicted by all three programs, and three were predicted by two programs (Fig. 3B). Of the remaining two signal sequences, one was Rv0063, which lacks the twin arginine and was identified only by the TigrFAM prediction program, and the other was LprQ, which was not predicted by any program.
Assessment of Tat dependence for full-length proteins with functional Tat signal sequences. In addition to there being a requirement for a Tat signal sequence, the mature domain also plays a role in whether a protein is a Tat substrate. For this reason, we tested full-length versions of a subset of the proteins that we identified to see if they were Tat dependent.
Signal sequence cleavage is commonly used as an indicator of protein export (37). To establish the utility of this approach for monitoring export of M. tuberculosis Tat substrates expressed in M. smegmatis, we first assayed signal sequence cleavage of full-length BlaC. Polyclonal antibodies were raised against BlaC and used to detect BlaC expressed in whole-cell lysates of ΔblaS M. smegmatis by immunoblotting (Fig. 4). In the presence of a functional Tat apparatus, BlaC is observed as a predominant band running at 30 kDa, which is the predicted size of the exported and processed mature species. Expression of BlaC(KK), which has a KK substitution for the RR dipeptide, produced a slower-migrating species, which is consistent with a lack of export and accumulation of full-length unprocessed BlaC precursor (Fig. 4). In the absence of a functional Tat pathway (in a ΔblaS ΔtatA mutant), a larger presumptive precursor species was observed, although some smaller mature protein was detected as well. These experiments indicated that the protein species observed for each of these strains was a good indicator of Tat- and twin-arginine-dependent export.
We then tested full-length versions of four additional proteins identified with the ′BlaC reporter in wild-type and ΔtatC M. smegmatis. Proteins were expressed as a C-terminal fusion to the HA epitope tag, and immunoblot analysis was performed on whole-cell lysates. For Rv0315 and Rv2843, Tat-dependent processing was observed in M. smegmatis, which is indicative of the full-length proteins being Tat exported. A lower-molecular-weight and presumably processed form of the protein was seen when expressed in the wild type, but this species was significantly reduced in abundance in the ΔtatC mutant (Fig. 5A). Three bands were observed for Rv0315-HA expressed in wild-type M. smegmatis. The highest-molecular-mass band is similar in size (~34 kDa) to that of the predicted Rv0315 precursor and is present in both wild-type and ΔtatC strains. The two smaller protein species are absent or greatly reduced in the ΔtatC mutant background. This suggests that Rv0315 is subject to multiple processing events. For the other two full-length proteins tested, no obvious Tat dependence was observed in immunoblots of M. smegmatis whole-cell lysates (Fig. 5B). One of these proteins was the virulence factor PlcB-HA. The protein species seen in the wild-type and ΔtatC M. smegmatis strains migrated at the predicted precursor size of 57 kDa. Moreover, compared to a truncated form of PlcB-HA lacking the predicted signal sequence (ΔssPlcB-HA), the full-length product that we expressed migrated slower than the expected mature product did (Fig. 5B). Similar results were obtained with Rv0774c-HA and a truncated ΔssRv0774c-HA protein expressed in M. smegmatis. This suggested that these two predicted Tat substrates were not being processed or exported when expressed in M. smegmatis.
Full-length PlcB is exported by the Tat pathway when expressed in its native host, M. tuberculosis. Since M. smegmatis does not have phospholipase C homologs, we considered the possibility that PlcB-HA can be exported only by its native host, M. tuberculosis. To test this idea, we expressed full-length PlcB-HA in M. tuberculosis H37Rv and assayed whole-cell lysates by immunoblot analysis. Unlike what was observed in M. smegmatis, immunoblot assays for PlcB-HA in M. tuberculosis whole-cell lysates yielded two products: a larger species that migrated like the full-length precursor seen in M. smegmatis and a smaller species that migrated like the expected mature ΔssPlcB-HA product (Fig. 6A). This suggested that in M. tuberculosis, PlcB-HA is exported and processed. Subcellular fractions prepared from the M. tuberculosis strain were used to show that the observed faster-migrating product was exported. Soluble, membrane, and cell wall fractions were prepared from whole-cell lysates of H37Rv expressing PlcB-HA and analyzed by immunoblotting (Fig. 6B). Of the two protein species seen in the whole-cell lysate, the larger product, the presumptive nonexported precursor, was in the soluble cytosolic fraction and the smaller product was primarily in the cell wall. Thus, when expressed in M. tuberculosis the PlcB-HA protein was processed and exported to the cell wall. As controls for the fractionation, we showed that the GroEL protein and the 19-kDa lipoprotein were enriched in the soluble and cell envelope (cell wall and membrane) fractions, respectively.
The essential nature of the Tat pathway in M. tuberculosis precludes testing full-length PlcB-HA in an M. tuberculosis Δtat mutant. To address whether full-length M. tuberculosis PlcB is a Tat substrate, the RR pair in the PlcB signal sequence was replaced with KK. When the PlcB(KK)-HA protein was expressed in M. tuberculosis, a single precursor-sized species was observed (Fig. 6C). Together, these experiments demonstrated that PlcB is a Tat substrate exported to the cell wall in M. tuberculosis but not in M. smegmatis.

DISCUSSION
The Tat pathway is important to the virulence and physiology of several bacterial pathogens. Because it is absent in mammals, it has been considered a target for development of new antibacterial agents (8,31). Here we used ′BlaC as a reporter to identify proteins with functional Tat signal sequences and to begin understanding the role that Tat export plays in M. tuberculosis. This approach allows for the experimental investigation of Tat export in a system where obtaining a Δtat mutant is not feasible. ′BlaC is not the only recognized Tat-specific reporter (14,63), but it is the only such reporter that works in a direct selection, as opposed to a screen. This property of ′BlaC allowed us to exploit it on a genome-wide level. Previously, Tat reporters have been used only to validate preselected candidates (49,57,58,63).
All active ′BlaC fusions obtained from our libraries were in frame with an ORF, had predicted signal sequences, and were confirmed to be exported in a Tat-dependent manner. This attests to the power of the ′BlaC reporter. Our objective was to comprehensively identify the Tat signal sequences of M. tuberculosis. Our two libraries were composed of ~2 × 10^6 plasmids combined, which is sufficiently complex to have >99% probability of every M. tuberculosis ORF being represented as a single in-frame ′blaC fusion (48). However, the unanticipated requirement for a fusion junction to be positioned just beyond the signal sequence cleavage site means that, despite this level of complexity, some proteins were not picked up. This restriction most likely reflects extended sequences of a mature protein preventing ′BlaC folding into an active conformation compatible with Tat export. Future libraries could be improved by increasing the complexity and using smaller randomly sheared genomic DNA inserts to overcome limitations that include the lack of available restriction enzyme cleavage sites.
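Library representation estimates of this kind are commonly made with a simple sampling model, P = 1 - (1 - 1/n)^N, where N is the number of independent clones and n is the number of equally likely distinct outcomes (for example, distinct junction positions, orientations, and reading frames). The sketch below illustrates that model only; it is an assumption that the estimate cited above (48) was of this form, the function and parameter names are hypothetical, and the uniform-sampling assumption is not strictly met by a library built from restriction digests.

```python
import math


def prob_represented(n_clones, n_possible):
    """Probability that one specific outcome appears at least once among
    n_clones independent draws, each uniform over n_possible outcomes."""
    # log1p/expm1 keep precision when 1/n_possible is very small.
    return -math.expm1(n_clones * math.log1p(-1.0 / n_possible))


def clones_needed(p_target, n_possible):
    """Smallest number of clones giving at least p_target probability
    that one specific outcome is represented (same uniform model)."""
    return math.ceil(math.log(1.0 - p_target) / math.log1p(-1.0 / n_possible))


# Illustrative numbers only: 100 equally likely outcomes, 99% target.
n_required = clones_needed(0.99, 100)
```

Under this model, restricting the acceptable fusion junctions to a short stretch near the cleavage site lowers the number of qualifying outcomes per ORF, which is consistent with some proteins being missed despite high nominal complexity.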
Of the 13 Tat signal sequences identified with the reporter, many are in proteins (Rv0063, Rv0315, Rv0774c, Rv2525c, BlaC, and PlcB) previously shown to be secreted or cell wall associated by proteomics (22,24,32,43,47,51). Six of them (Rv0846c, Rv2041c, Rv2843, BlaC, LprQ, and UgpB) are predicted lipoproteins with a lipobox motif in their signal sequence (Fig. 2A), which predicts cell envelope localization (56). Recently, in the archaeon Haloferax volcanii it was shown that lipoproteins can be Tat substrates (20). BlaC, PlcA, and PlcB are the only proteins that we identified that have demonstrated functions (β-lactamase and phospholipase C activities, respectively) (29,43,59). The Plc proteins are also shown to function in M. tuberculosis pathogenesis. Simultaneous deletion of multiple plc genes results in attenuated virulence of M. tuberculosis in a mouse model of infection (43).
Much less is known about the remaining Tat signal sequence-containing proteins that we identified (Table 3). Sequence analysis suggests that these proteins have diverse functions as lipases (Rv0519c and Rv0774c), a copper oxidase (Rv0846c), a glycosyl hydrolase (Rv0315), an oxidoreductase (Rv0063), and substrate-binding proteins of sugar uptake systems (UgpB and Rv2041c). Rv2525c is one of the proteins identified with no predicted function. On the basis of bioinformatic predictions, this protein was previously hypothesized to be an M. tuberculosis Tat substrate and, because of coregulation with other genes, was suggested to have a role in cell wall biogenesis (47). Both UgpB and Rv2041c are predicted by transposon saturation mutagenesis to be essential to M. tuberculosis (50), providing a potential clue as to why the Tat pathway cannot be inactivated in M. tuberculosis.
An advantage of using a genetic reporter to directly identify functional Tat signal sequences is that there is no imposed requirement that conserved sequence elements, as defined by other studies, need be present. In this regard, the functional Tat signal sequences that we identified should be useful for refining Tat prediction algorithms. Most of the sequences that we experimentally identified were predicted by at least two of three Tat signal sequence prediction programs consulted. This suggests that the common core elements of these programs are the best predictors. The signal sequence of Rv0063, which possesses RQ in the Tat motif, was the only one of our experimentally identified sequences lacking a twin-arginine dipeptide. Our identification of Rv0063 is consistent with the recent demonstration that its signal sequence can direct Tat export of an agarase reporter when expressed in Streptomyces lividans (26,27). In addition, an RQ substitution in the Tat signal sequence of the E. coli TorA protein is able to promote Tat export of a green fluorescent protein reporter (14). Interestingly, the TigrFAM program predicted Rv0063 to have a Tat signal sequence (52). We also identified one protein, LprQ, which was not identified by any of the three prediction programs. LprQ was likely missed due to its extended N terminus upstream of the twin-arginine motif (Fig. 2A). However, it is also possible that the true start codon of LprQ is incorrectly annotated, as there are additional GTG codons between the annotated start and the twin-arginine motif.
In addition to an amino-terminal Tat signal sequence, features of the mature domain of a protein, which must be folded, determine whether it can be translocated by the Tat pathway (15). Recently, some putative Tat signal sequences of E. coli were shown to be able to promote export of a fused reporter through the Sec or Tat pathway, depending on the unfolded or folded nature of the reporter (58). These apparently promiscuous signal sequences highlight the importance of the mature domain of a protein. The basic question of how often a functional Tat signal sequence is present on a bona fide Tat substrate has only begun to be investigated (58). For this reason, we examined full-length versions of a subset of the proteins that we identified in wild-type and Δtat strains of M. smegmatis. Three proteins (BlaC, Rv0315, and Rv2843) showed a Tat-dependent effect, which indicates that the native proteins are subject to Tat export. For Rv0315-HA, the immunoblot analysis revealed three protein species in whole-cell lysate of wild-type M. smegmatis and two species in lysate of tat mutant M. smegmatis (Fig. 5A). A homologous β-glucanase of Bacillus circulans is subject to progressive proteolytic processing postexport (61). Similar proteolytic processing of Rv0315 would explain the multiple species seen by immunoblotting.
For two other M. tuberculosis proteins tested, there was no difference in the protein species seen in whole-cell lysates from wild-type and Δtat strains of M. smegmatis (Fig. 5B). In these cases, the full-length protein species observed was larger than the predicted mature protein. Thus, no signal sequence cleavage or obvious export occurred when full-length versions of these proteins were expressed in M. smegmatis.
The fact that PlcB-HA was unaffected in an M. smegmatis tat mutant was surprising, since the Plc proteins have been demonstrated to be exported to cell wall fractions of M. tuberculosis (43). Moreover, virtually all phospholipase C orthologs in a wide variety of organisms have predicted Tat signal sequences (16), and there are bacterial Plc enzymes proven to be exported by the Tat pathway (9,36,46,60). In contrast to what was seen in M. smegmatis, when PlcB-HA was expressed in M. tuberculosis two species were evident by immunoblot analysis. The faster-migrating of the two species was exported to the cell wall and migrated at the same molecular weight as did ΔssPlcB-HA, the expected size of processed mature PlcB. A KK substitution for the twin-arginine motif of full-length PlcB prevented its processing in M. tuberculosis. These results demonstrate that in its native host PlcB is an authentic twin-arginine-dependent protein. These data provide an important link between the Tat pathway and virulence factor export in M. tuberculosis.
The above result is also interesting in suggesting that Tat export of an M. tuberculosis virulence factor requires a pathogen-specific component. There are a small number of Tat substrates shown to require dedicated chaperones for export (28,38,39). There are also examples of chaperones, such as DnaK, that work with a broader collection of Tat substrates (23,40). Perhaps the pathogen-specific factor is a chaperone for PlcB that is present in M. tuberculosis but absent in M. smegmatis. Although the exact function of these chaperones remains to be discerned, proposals for how they work include promoting proper folding, protecting the Tat substrate from degradation or delivery to the translocon before folding is complete, and delivering the substrate to the translocase (39). Some of these functions relate to the mature domain of the protein, which could account for why an ssPlcB-′BlaC protein was exported by M. smegmatis but the full-length PlcB-HA was not. Notably, PlcH of P. aeruginosa requires PlcR chaperones for its export (12). However, no obvious PlcR homologs exist in M. tuberculosis. Another possibility is that a conserved component of the M. smegmatis and M. tuberculosis Tat systems differs in its substrate recognition abilities.
The work presented here demonstrates the power of using a Tat-specific reporter to identify functional Tat signal sequences without any preconceived bias regarding the features that define them. By examining full-length versions of proteins in wild-type and Δtat strains of M. smegmatis, we showed three of the proteins identified to be true Tat substrates. The results with the other two proteins tested emphasize the importance of domains beyond the Tat signal sequence and bacterium-specific factors in defining a true Tat substrate. In our quest to prove that the virulence factor PlcB is a true Tat substrate, we discovered that this protein is exported by the Tat pathway only in its native host, indicating the existence of, and requirement for, pathogen-specific host factors in its Tat export.