Journal of Bacteriology, July 2003, p. 4057-4065, Vol. 185, No. 14
0021-9193/03/$08.00+0 DOI: 10.1128/JB.185.14.4057-4065.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
School of Biological Sciences, University of Auckland, Auckland, New Zealand,1 St. Vincent's Institute of Medical Research, Fitzroy, Victoria 3065, Australia2
Received 7 February 2003/ Accepted 17 April 2003
60% for which functional annotations have been made are, however, of imperfectly described or uncertain function (for example, described simply as putative dehydrogenases), and some are likely to be wrong because functional annotations are in most cases derived by inference rather than by experiment, through the observation of some level of sequence identity in a gene product with a characterized gene product from another organism. We present here the structural analysis of a gene product from Mycobacterium tuberculosis which indicates that the annotated function in this and other bacterial genomes is likely to be wrong. The complete genome sequence for M. tuberculosis strain H37Rv was reported in 1998 (6). The global significance of this pathogen is immense. As the cause of tuberculosis, it kills two to three million people around the world each year, more than any other single infectious agent. It is further estimated that around one-third of the world's population is infected as a result of the ability of the organism to persist for many years inside activated macrophages in a semidormant or latent form (2, 3, 31). Although effective drugs are available, treatment regimens are long and difficult and multidrug resistance is rising (2, 3, 31). This has resulted in a resurgence of interest in the biology of the organism.
One initiative in worldwide efforts to understand the biology of tuberculosis and to characterize potential new drug targets has been the formation of the Tuberculosis Structural Genomics Consortium (http://www.doe-mbi.ucla.edu/TB), a group of collaborating laboratories in a number of countries whose aim is to coordinate and facilitate the determination of the three-dimensional structures of large numbers of proteins from M. tuberculosis.
Menaquinone (vitamin K) is an essential vitamin that is an obligatory component of the anaerobic electron transfer pathways that operate not only in strict anaerobes but also in aerobic gram-positive bacteria, including M. tuberculosis (25, 26). This function may be particularly important in M. tuberculosis under conditions of low oxygen and may thus play a role in the persistence of the bacteria within activated macrophages. Coupled with the observation that menaquinone is an essential nutrient that is not synthesized in animals, this makes enzymes of the menaquinone biosynthetic pathway attractive drug targets.
In Escherichia coli, the menaquinone biosynthetic pathway involves either seven or eight enzymes (26). Sequence comparisons have found homologues for seven of these enzymes in the M. tuberculosis H37Rv strain. One of these proteins, MenG, was identified with the open reading frame annotated Rv3853, based on sequence similarity with a gene product from the E. coli genome. The E. coli enzyme in turn had been annotated as MenG on the basis of its position adjacent to the menA gene in the genome and apparent sequence similarities with S-adenosylmethionine (SAM)-dependent methyltransferases (25); MenG was proposed to be the SAM-dependent methyltransferase that transfers a methyl group to demethylmenaquinone in the final step of the pathway. Intriguingly, the MenG sequence, which comprises 157 amino acid residues and represents a polypeptide of 16.2 kDa, has none of the common methyltransferase motifs, and the M. tuberculosis genome encodes another protein, identified as UbiE (Rv0558), that could also catalyze this final step (22).
In order to clarify its function by revealing possible homologies that cannot be seen at the sequence level and to provide a template for possible drug design, determination of the structure of the Rv3853 gene product was undertaken in the context of the tuberculosis structural genomics initiative. While this work was in progress, the structure of the E. coli homolog of Rv3853 was independently determined (J. D. Robertus, personal communication); the two proteins were found to have essentially identical structures, in both cases suggesting an incorrect functional annotation.
Protein refolding and purification. The N-terminally His-tagged fusion protein was denatured by cell lysis in a phosphate-Tris buffer containing 9 M urea at pH 8.0 and purified by Ni2+ affinity chromatography. The protein was then refolded by dialysis at room temperature through sequential transfers, first into refolding buffer (100 mM L-arginine, 100 mM sucrose, 50 mM morpholineethanesulfonic acid [MES], 10 mM NaCl, 0.4 mM KCl, 1 mM EDTA, 1 mM dithiothreitol) and then into storage buffer (50 mM Tris-HCl [pH 8.0], 50 mM NaCl, 1 mM EDTA). The refolded protein was purified by size exclusion chromatography (Superdex 200; Pharmacia) and then further purified by anion exchange chromatography (Mono Q; Pharmacia). Light-scattering data for the final protein solution (2 mg of Rv3853 per ml) showed the protein to be a monodisperse solution of trimeric protein. The molecular mass calculated from the hydrodynamic radius was 64.5 kDa, compared with a monomer molecular mass for the His-tagged protein of 19.4 kDa.
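The trimer assignment rests on the ratio of the light-scattering mass to the monomer mass. A minimal sketch of that arithmetic follows; the function name and the nearest-integer rounding rule are illustrative assumptions, not part of the reported analysis.

```python
def estimate_oligomeric_state(observed_mass_kda: float, monomer_mass_kda: float) -> int:
    """Return the nearest whole number of subunits implied by the two masses.

    Hypothetical helper: simple nearest-integer rounding of the mass ratio.
    """
    if observed_mass_kda <= 0 or monomer_mass_kda <= 0:
        raise ValueError("masses must be positive")
    ratio = observed_mass_kda / monomer_mass_kda
    return max(1, round(ratio))


# Values quoted in the text: 64.5 kDa from light scattering,
# 19.4 kDa for the His-tagged monomer.
ratio = 64.5 / 19.4
subunits = estimate_oligomeric_state(64.5, 19.4)
print(f"mass ratio = {ratio:.2f}, nearest oligomer = {subunits}")
```

The ratio of about 3.3 is closest to three subunits, in line with the trimer described in the text. A mass derived from a hydrodynamic radius depends on molecular shape, so an exact integer ratio is not expected.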
Crystallization and soaking experiments. Rv3853 crystals were grown by using hanging drops at 18°C by mixing 4 to 5 µl of protein solution (50 mM Tris-HCl, 140 mM NaCl [pH 8.0], 2 mg of Rv3853 per ml) with 1 µl of precipitant solution (0.45 M potassium-sodium tartrate). Hexagonal blocks typically emerged after 2 to 3 days and grew larger over several weeks. The crystals were hexagonal, space group P63, with cell dimensions a = b = 102.5 Å and c = 117.5 Å. Three molecules occupy the asymmetric unit, corresponding to a solvent content of 63.9% and a Matthews coefficient of 3.43 Å³/Da.
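The Matthews coefficient and solvent content quoted above follow from the unit cell volume, the number of molecules in the cell, and the monomer mass. The sketch below shows the standard calculation for a hexagonal cell. The function names are hypothetical, and the monomer mass actually used by the authors is not stated in this text, so the 17.3 kDa value in the usage lines is an assumption chosen only because it reproduces the reported coefficient.

```python
import math

# Conventional constant (Å^3/Da) relating Vm to the protein volume fraction.
PROTEIN_VOLUME_FACTOR = 1.23


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Volume (Å^3) of a hexagonal cell with a = b and gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume: float, molecules_per_asu: int,
                         asu_per_cell: int, monomer_mass_da: float) -> float:
    """Vm (Å^3/Da): cell volume divided by total protein mass in the cell."""
    return cell_volume / (molecules_per_asu * asu_per_cell * monomer_mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction from Vm, using the 1.23 Å^3/Da convention."""
    return 1.0 - PROTEIN_VOLUME_FACTOR / vm


# Cell dimensions and 3 molecules per asymmetric unit are from the text;
# space group P63 has 6 asymmetric units per cell.
volume = hexagonal_cell_volume(102.5, 117.5)
vm = matthews_coefficient(volume, molecules_per_asu=3, asu_per_cell=6,
                          monomer_mass_da=17300.0)  # ASSUMED mass, see lead-in
print(f"cell volume = {volume:.0f} Å^3")
print(f"Vm = {vm:.2f} Å^3/Da, solvent fraction = {solvent_fraction(vm):.3f}")
```

With the assumed mass the sketch gives a coefficient close to the reported 3.43 Å³/Da and a solvent fraction near 64%. Using the 19.4 kDa His-tagged mass instead would give a lower coefficient, which is why the mass is left as an explicit parameter.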
Soaking experiments with heavy atom compounds and other ligands were conducted by soaking crystals at room temperature in an artificial mother liquor comprising 0.4 M potassium-sodium tartrate to which the appropriate compound was added. For heavy atom derivative preparation, crystals were soaked in 1 mM mercuric acetate for 9 days. Other soaking experiments were carried out with 1 mM and 10 mM SAM, 1 mM L-methionine, 1 mM ATP (chosen because of its adenosyl moiety), 1 mM menadione (equivalent to the product menaquinone but lacking the isoprenyl tail), and 1 mM Zwitergent 3-12, a detergent with a dodecyl group that might approximate the isoprenyl tail.
Data collection and processing. Data collection was done at 110 K with crystals that had been soaked in cryoprotectant (mother liquor plus 35% glycerol) immediately prior to freezing in a stream of cold N2 gas. Native Rv3853 data and Hg derivative data were collected with CuKα radiation (λ = 1.5418 Å) from a Rigaku RU-H3R X-ray generator equipped with focusing mirrors and a Mar 345 imaging plate detector (Table 1). Subsequently, a high-resolution native data set was collected with synchrotron radiation (λ = 0.8452 Å) at DESY Hamburg, beamline BW7V. The raw data were processed with DENZO (30) and subsequently scaled with Scalepack (30).
TABLE 1. Data collection and processing
FIG. 1. Stereo views showing the electron density for the two small molecules bound to each of the Rv3853 monomers, the putative tartrate molecule (a) and the putative glyoxalate molecule (b). Electron density is from a 2Fo-Fc electron density map, contoured at 1.0σ. In b, the red and blue colors indicate adjacent monomers. Figure drawn with Pymol (8).
TABLE 2. Refinement and model details
Atomic coordinates. Atomic coordinates have been deposited with the Protein Data Bank, with accession code 1nxj.
Monomer fold. The monomer is folded into a single domain that can be described as a three-layer β/β/α structure (Fig. 2). The first layer consists of a four-stranded antiparallel β-sheet (strands S1, S12, S11, and S3) that sits adjacent to a two-stranded β-ribbon (S9 and S10). This layer packs against a central six-stranded, mostly parallel β-sheet (S8, S4, S5, S6, S7, and S2) that forms the second layer. The third layer of the "sandwich" comprises three parallel α-helices (H2, H3, and H4) that provide the S4-S5, S5-S6, and S6-S7 connections. A large extended loop region, comprising 20 residues, finishing with the short strand S8, wraps around layers 2 and 3 and leads back to layer 1. Located between the two β-sheet regions (layers 1 and 2) is a hydrophobic groove, which is "capped" by the loop region connecting strands S2 and S3 of the two sheets. Outside the main β/β/α domain, the N-terminal α-helix H1 packs against the first β-sheet and also forms an important part of the monomer-monomer interface in the trimer.
FIG. 2. (a) Topology diagram for the Rv3853 monomer. The three layers in this β/β/α structure are shown in blue, red, and yellow. (b) Fold of the monomer, with β-strands shown as orange arrows and α-helices as yellow coils. The two bound ligands, tartrate (lower) and a putative glyoxalate (upper), are shown in stick mode.
pairs. The fold shared by Rv3853 and this phosphohistidine domain is described in SCOP (27) as a "swiveling" β/β/α fold and in CATH (29) as a three-layer β/β/α sandwich. Other structures classified under this fold and also recognized as being related to Rv3853 by DALI (15) include the phosphohistidine domain of enzyme I of the E. coli phosphoenolpyruvate:sugar phosphotransferase system (23), the small subunit of carbamoyl phosphate synthase (36), and a domain from aconitase (20).

Quaternary structure. The Rv3853 trimer (Fig. 3) is donut shaped with a large hole (diameter approximately 8 to 10 Å) through the middle. At each monomer-monomer interface, the N-terminal helix (residues 7 to 16), the following H1-S2 connection (residues 21 to 27), and the C-terminal S11-S12 loop (residues 151 to 152) of one monomer pack into a cleft in the neighboring monomer that is formed between residues 115 to 121 of the extended loop joining S7 to S8 (Fig. 2) and the loops that connect the β-strands of layer 2 with their respective helices (the S4-H2, S5-H3, and S6-H4 loops). The total surface area buried at each of the three monomer-monomer interfaces is 530 Å², meaning that 9.6% of the surface area of each monomer is buried. Trimer formation is stabilized by a number of hydrogen bonds and four salt bridges (Asp13-Arg100, Asp13-Lys121, Asp23-His73, and Asp23-Arg120). A striking feature of the trimer, highlighted by a GRASP (28) plot (Fig. 3b), is a groove that runs the whole length of each monomer-monomer interface, incorporating both the tartrate and glyoxalate binding sites (see below), and a canyon of negative charge in which are found Asp10, Asp13, Asp150, and Asp152.
FIG. 3. Rv3853 trimer, shown (a) as a ribbon diagram, drawn with Pymol (8), and (b) in a surface representation, drawn with GRASP (28), showing the distribution of surface charge. In both diagrams, the putative tartrate (A) and glyoxalate (B) molecules are shown in stick representation, bound to each of the three monomers. Adjacent to the tartrate binding site is a prominent, negatively charged canyon at the subunit interface that contains several residues conserved in all Rv3853 homologs.
FIG. 4. Sequence alignment of 13 representative Rv3853 homologs chosen from both bacteria and plants, including Mycobacterium leprae, Shewanella oneidensis, Arabidopsis thaliana, Oryza sativa, Ralstonia solanacearum, Pseudomonas fluorescens, Xanthomonas campestris, Escherichia coli, Haemophilus influenzae, Thermobifida fusca, Vibrio cholerae, and Corynebacterium glutamicum. Fully conserved residues in these 13 sequences are indicated below each alignment, as are the locations of secondary-structure elements. The alignment was generated with FarOut.
Small-molecule binding sites. None of the soaking experiments with ligands related to the presumed substrate (menaquinone) or cofactor (SAM) showed any evidence of binding. However, every electron density map, for either native or soaked crystals, showed two well-defined pieces of nonprotein density that must represent bound small-molecule ligands. Both were present for all three independent monomers in the asymmetric unit of the crystal.
The first (Fig. 5a) occupies a shallow pocket between the N terminus of helix H3 and a portion of the long S7-S8 loop (Fig. 2) near the monomer-monomer interface. This density was interpreted as a bound tartrate ion on the basis of the excellent fit of tartrate to the density (Fig. 1a) and the presence of 0.45 M tartrate in the crystallization medium. Refinement supported this assignment, as all atoms assumed B factors that were similar to each other and similar to atoms in the surrounding protein structure (15 to 25 Å²). One carboxylate group is nicely positioned at the N terminus of helix H3, hydrogen bonded to the free peptide NH groups of residues 78 and 81 and to a conserved, well-defined water molecule that bridges to Arg100 and Asp101 (Fig. 5a), which are located in the loop region connecting S6 to H4. The other carboxylate group receives hydrogen bonds from Asn48 ND2, the peptide NH of Ser122, and the amino group of Lys124. On the other hand, the two tartrate hydroxyl groups make few or no hydrogen bonds. One is hydrogen bonded to Asn48 ND2 and (in two out of three monomers) to a water molecule that bridges to Lys52 NZ. The other makes no hydrogen-bonded interactions.
FIG. 5. Stereo views of the two binding sites for small-molecule ligands. In a, the binding site for the tartrate ion is shown, with its protein ligands and a conserved water molecule found for all three monomers, and in b that for the tentatively assigned glyoxalate molecule is shown. In each case, hydrogen bonds are shown with broken yellow lines. In b, the two adjacent monomers that form the binding site are shown in blue and yellow, respectively.
In silico analysis. Nine binding sites were found by the SiteID analysis (Tripos Inc.), distributed symmetrically around the trimer, three per monomer. Each set of three sites was found to be located in the groove at the monomer-monomer interface (Fig. 3b). The largest (site 1, volume 28 Å³) is at the inner end of the interface, adjacent to the hole through the center of the trimer. In the crystal structure, this pocket is occluded by the glyoxalate molecule, which sits in the site entrance, leaving the majority of the volume unoccupied. This site is bounded by residues 22 to 28, 113, 117, and 151 to 154. Site 2 (volume 16 Å³) corresponds to the acidic canyon, with contributing residues including Phe6, Asp10, Gln34, Asp101, Ala102, Ala103, Asp150, Asp151, and Asp152. Site 3 (volume 14 Å³) is adjacent to site 2 and is completely filled by the tartrate ion described above.
When the regions around the above three sites were screened against our in-house database of potential ligands, there was a strong preference for planar, fused-ring systems, most containing at least one nitrogen atom, such as indole or nucleoside base derivatives. The best 50 hits for the region around site 1 gave ScreenScore values (34) from -39.7 down to -35.7, suggesting affinities in the submicromolar range. All of the binding orientations bury a substantial region of the ligand within the deep site 1 pocket, although a proportion extend beyond the pocket and interact with residues in site 2 as well. Hits in the region around sites 2 and 3 suggest a lower affinity (ScreenScore values of -37.6 to -27.9), but still high enough to suggest significant in vitro affinity of some ligands. Again there is a preference for planar, fused-ring compounds. Interestingly, the tartrate pocket (site 3) is poorly filled by many of the compounds, which prefer to bind beneath the pocket in a continuation of the groove between the monomers, where it wraps under the protein.
Two families of SAM-dependent methyltransferases have been characterized structurally. The predominant family has a conserved α/β fold whose defining feature is a seven-stranded β-sheet that has six parallel strands and one antiparallel and carries the SAM binding site (24). Although sequence identity is very low across the whole superfamily, structure-based alignment of the sequences of 28 family members shows that conserved amino acid sequence patterns are associated with SAM binding (24). These methyltransferases act on a wide variety of substrates, including nucleic acids, proteins, lipids, and small molecules, with diverse binding sites being created by a variety of extrusions from the canonical seven-stranded β-sheet. The Rv3853 gene product has neither the fold nor the sequence patterns that are characteristic of this superfamily of methyltransferases.
The second family of SAM-dependent methyltransferases acts on histones, methylating lysine residues, and is defined by a conserved domain called the SET domain (37, 38). This is a small domain (~130 residues) with several small antiparallel β-sheets, a relatively exposed SAM binding site, and several conserved sequence motifs (37, 38). Again, neither fold nor sequence motifs are shared by the Rv3853 gene product. Other SAM-binding proteins, such as the C-terminal domain of methionine synthase (11) and the cobalt precorrin-4 methyltransferase CbiF (33), also have folds very distinct from that of the Rv3853 gene product.
Bioinformatic analysis of the M. tuberculosis genome further suggests that Rv3853 is not MenG, the terminal enzyme in menaquinone biosynthesis. Homologs of five of the other enzymes from this biosynthetic pathway (MenA, MenB, MenC, MenD, and MenE) can be found clustered in close proximity in the genome, between Rv0534 and Rv0555, far removed from Rv3853. Also in this portion of the genome is a gene (Rv0558) that is annotated as ubiE, encoding a SAM-dependent methyltransferase that is the terminal enzyme in ubiquinone biosynthesis. Given that it has been shown experimentally in several bacterial species that UbiE can carry out the MenG reaction (17, 22) and that no other ubiquinone biosynthetic enzymes can be found in the M. tuberculosis genome, it is highly probable that it is the gene product of Rv0558 that performs this final methyl transfer step in menaquinone biosynthesis. Unlike Rv3853, Rv0558 does contain SAM-dependent methyltransferase sequence motifs and has homologs in several bacterial species that are actually annotated as MenG, with functional support (17, 32).
What, then, is the biochemical and cellular function of Rv3853 and its homologs in other organisms? Searches of the current structural database show that the closest structural relationships with Rv3853 involve the phosphohistidine domains of several proteins involved in phosphate transfer. In these proteins, a phosphate group is transferred from one substrate to another (the substrates being either small molecules or entire protein domains) via an active-site histidine residue that is transiently phosphorylated. Other, weaker matches are found with the transferrin receptor apical domain (z = 4.0) (21), part of the thermosome (z = 3.6) (10), subdomain 4 of the AICAR (5-aminoimidazole-4-carboxamide-ribonucleotide) transferase domain of AICAR transformylase, which is involved in purine biosynthesis (z = 3.2) (13), the substrate binding domain of D-2-hydroxyisocaproate dehydrogenase (z = 3.2) (9), and the apical domain of GroEL (z = 3.0) (4).
In the two closest matches, the phosphohistidine domains of pyruvate phosphate dikinase (14) and enzyme I of the E. coli phosphoenolpyruvate:sugar phosphotransferase system (23), the active-site histidine is at the N terminus of a helix corresponding to H4 in Rv3853 and is preceded by a loop that contains several conserved residues. Rv3853 does not have a histidine residue in this position, precluding a similar histidine-mediated phosphoryl transfer. It is intriguing to note, however, that in Rv3853 the equivalent loop carries Arg100, which is one of the few residues that is totally conserved in all the homologous sequences that we have been able to examine. Moreover, this arginine lies between sites 2 and 3 at the monomer-monomer interface. Site 2 also contains two fully conserved residues, both aspartate, suggesting a functional importance for these two pockets and for Arg100 between them.
Considering the structure as a whole, the trimer exhibits a potential ligand-binding groove that encompasses most of each monomer-monomer interface. Within this groove, a series of obvious pockets exists which may represent the primary ligand-binding sites on the protein. Two of these sites are occupied in the crystal structure: a small pocket on the outside of the protein is filled by a molecule of tartrate, and the largest pocket on the protein, on the inside of the ring, is occluded by a single small molecule that is tentatively modeled as glyoxalate. These bound ligands must come from the purification and crystallization media, since the urea denaturation that preceded refolding of the protein should have dislodged any cell-derived metabolites or cofactors.
In silico screening of the three sites on the protein identified by SiteID analysis indicated that the larger pocket (site 1) has a predilection for heterocyclic fused 5,6 ring systems such as indoles and purines. This may indicate that the protein function is related to nucleotide manipulation, with this site being involved in base binding, and if this is the case, the positive charge at its opening is potentially relevant. The tartrate pocket is apparently a fairly poor binding site, with 78% of hits in the region including the pocket actually binding elsewhere.
The location of the binding sites at the subunit interfaces of the trimer, coupled with its apparent stable association, suggests that this is the physiologically relevant entity. The extended grooves at each interface and probable binding pocket in the internal face of the ring suggest the possibility that the protein binds to its ligand in a manner analogous to the interaction of a sliding clamp with a nucleic acid. The hole through the center of the trimer is large enough (diameter ~8 to 10 Å) to accept a single-stranded (but not double-stranded) polynucleotide or a polypeptide strand. These observations suggest that it could be part of a larger system, again analogous to the multimeric DNA polymerase complex that includes a sliding clamp. Alternatively, it may be part of a system for binding linear peptide sequences, with the peptide lying along the groove and some specificity for aromatic side chains, possibly tryptophan, arising from interactions in the internal pocket.
This work was supported by the New Economy Research Fund of New Zealand, the Health Research Council of New Zealand, and the Foundation for Research, Science and Technology for the award of a Bright Futures Scholarship to J.M.J.