Journal of Bacteriology, November 2004, p. 7134-7140, Vol. 186, No. 21
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.21.7134-7140.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, and National Institute of Standards and Technology, Rockville, Maryland,1 Center for Synchrotron Radiation Research and Instrumentation, Biological, Chemical and Physical Sciences Department, Illinois Institute of Technology, Chicago, Illinois2
Received 6 May 2004/ Accepted 9 August 2004
The regulation of this dynamic metabolic system is complex and not completely understood. Since the GCS provides C1 units for the synthesis of amino acids like serine, methionine, and formylmethionine, it is subject to regulation by the general amino acid control system (34). In Escherichia coli, the components of the GCS are encoded by the genes of the gcvTHP operon, which is regulated by the global regulatory proteins Lrp, PurR, and cAMP receptor protein and by the gcv-specific transcriptional regulators GcvA and GcvR (28, 33). Together, these proteins modulate transcription of the operon in response to the levels of glycine and purines. The repressor capability of GcvR is realized through complex formation with GcvA, whereas the other proteins bind directly to the gcv control region (7).
Closely related to the GCS is the glycine betaine pathway that involves three demethylation steps catalyzed consecutively by betaine-homocysteine methyltransferase, dimethylglycine dehydrogenase (oxidase) (DMGO), and sarcosine dehydrogenase (oxidase) to produce glycine and C1 units (16). Oxidation of the methyl groups of betaine by these dehydrogenases generates formaldehyde, which is efficiently channeled into the folate-C1 pathways by the formation of 5,10-CH2-THF in the presence of THF. In some bacteria, such as Arthrobacter and Corynebacterium spp., the genes encoding SHMT (glyA), heterotetrameric sarcosine dehydrogenase (oxidase) (soxBDAG), 10-CHO-THF hydrolase (purU), DMGO (dmg), formiminotransferase/cyclodeaminase (fic), and 5,10-CH2-THF dehydrogenase (folD) form an operon and are subject to regulation by unidentified proteins (20). At the same time some of these enzymes may also play a regulatory role. For instance, 10-CHO-THF hydrolase functions to balance the pools of 10-CHO-THF and THF in response to the changes in the methionine/glycine ratio (24).
Identification of the regulators of C1 metabolism is important for understanding various cellular processes and ultimately may have implications for drug discovery. A number of human diseases and disorders, including neural tube defects, epithelial cancers, and cardiovascular disease, are associated with folate deficiency and disturbances in folate-mediated reactions (2, 15).
The protein described in this report, which is proposed to have a role in C1 metabolism, has emerged as a target in a structural genomics project aimed at the functional assignment of proteins through three-dimensional structure determination (5). The ygfZ gene of E. coli encodes an uncharacterized protein with a molecular mass of 36 kDa. Homologs of this protein are present in most bacteria and eukaryotes, but not in archaea. Based on marginal (15%) sequence identity to the T-protein of the GCS, the YgfZ protein has been annotated as a putative aminomethyltransferase. The crystal structure of YgfZ revealed a three-domain ring-like protein molecule with a deep hydrophobic cavity and no apparent active site. When the structure of DMGO from Arthrobacter globiformis (14) became available, it became apparent that the two structures are strikingly similar. DMGO is an enzyme of the betaine pathway and a homolog of the T-protein (26% sequence identity). Based on structural superposition, a folate-binding site was identified in the central cavity of YgfZ, and the ability of YgfZ to bind folate derivatives was confirmed experimentally. However, in contrast to DMGO and T-protein, the YgfZ family lacks amino acid conservation at the folate site, which implies that YgfZ is probably not an aminomethyltransferase, and may not be an enzyme at all, but rather a folate-dependent regulatory protein involved in C1 metabolism.
Crystallization and structure determination. YgfZ crystals were grown at room temperature by vapor diffusion in hanging drops by mixing 1.5 µl of a 15-mg/ml protein solution with 1.5 µl of a reservoir solution containing 30% saturated ammonium sulfate and 0.1 M sodium acetate (pH 4.5). The crystals reached the maximum size (0.2 mm) in a few days. YgfZ crystals belong to space group P3121 with the unit cell parameters a = b = 151.0 Å and c = 68.0 Å. There is one protein molecule in the asymmetric unit, which gives a specific volume of 6.2 Å³/Da. Crystals were frozen in the mother liquor supplemented with 50% saturated lithium formate. X-ray data were collected at 100 K at the IMCA-CAT beamline 17-BM at the Advanced Photon Source (Argonne, Ill.) equipped with a Mar CCD (165-mm) detector, and they were processed with HKL2000 (29).
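The specific volume quoted above can be checked directly from the cell parameters. The following Python sketch is an illustration only, under several assumptions that are not spelled out in the text: a hexagonal-axes cell (gamma = 120°), six asymmetric units per cell for space group P3121, the 36-kDa molecular mass given for YgfZ, and the conventional factor of 1.23 Å³/Da for converting a specific volume into an approximate solvent fraction. The function names are hypothetical.

```python
import math


def trigonal_cell_volume(a, c):
    """Volume (cubic angstroms) of a cell with a = b and gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def specific_volume(cell_volume, n_asym_units, mol_mass_da, n_mol_per_asu=1):
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return cell_volume / (n_asym_units * n_mol_per_asu * mol_mass_da)


def solvent_fraction(vm, protein_factor=1.23):
    """Approximate solvent fraction; 1.23 is an assumed conventional factor."""
    return 1.0 - protein_factor / vm


# Cell parameters and molecular mass as stated in the text.
volume = trigonal_cell_volume(151.0, 68.0)
vm = specific_volume(volume, n_asym_units=6, mol_mass_da=36000.0)
print(f"Vm = {vm:.1f} A^3/Da, solvent = {solvent_fraction(vm):.0%}")
# prints: Vm = 6.2 A^3/Da, solvent = 80%
```

Under these assumptions the sketch reproduces the 6.2 Å³/Da value and gives a solvent content of about 80%.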
The structure was solved by the multiwavelength anomalous diffraction method by using one SeMet protein crystal. X-ray data at 2.8-Å resolution were measured at four wavelengths (Table 1). Six selenium sites were located by the shake-and-bake method (38) and were used for phasing with MLPHARE/DM (3). The atomic model was built by using O (11) and was refined with REFMAC (22) by using the low-energy data set (Table 1). The final model contains 325 amino acid residues. The remains of the His tag (three residues at the N terminus) and the C-terminal Glu326 are not visible in the electron density map. Pro157 is in the cis conformation. The atomic B-factors are high (64 Å² on average), and the value corresponds to the mean B-factor calculated from the Wilson plot (Table 1). The polypeptide tracing is unambiguous, but the positional errors for many side chains may be quite high due to the low resolution of the data.
TABLE 1. X-ray data and refinement statistics
Fluorescence measurements. Fluorescence intensity was measured at 25°C by using a Fluoromax-2 spectrofluorometer (Jobin Yvon) in a 200-µl quartz cell. Tryptophan residues in the protein were excited at 280 nm, and emission was monitored at 335 nm with a slit width of 5 nm. The reaction mixture contained 1 ml of a 100 nM protein solution in 20 mM Tris-HCl (pH 7.5)-100 mM NaCl. It was titrated with increasing amounts of each ligand by sequential addition of 1 to 2 µl of stock solutions, so that the total dilution of the initial solution did not exceed 2.5%. Because both folic acid and THF display significant fluorescence with the maximum emission spectra at 355 nm, their final concentrations were limited to 3 and 0.3 µM, respectively, to maintain the ligand contribution to the measured fluorescence below 25%. All readings were corrected for background emission of the buffer and free ligand solutions (folic acid in water and THF in 0.1% mercaptoethanol [to prevent oxidation]).
The dissociation constant (Kd) was determined by using the following equation: ΔF = ΔFmax[L]/([L] + Kd), where [L] is the free ligand concentration. The fraction of the protein containing a bound ligand molecule (ΔF/ΔFmax) was defined as the fraction of the total quenchable tryptophan fluorescence that was quenched at each point of the titration. ΔFmax was estimated by extrapolation at high [L] values of a plot of 1/ΔF versus 1/[L].
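The double-reciprocal extrapolation can be sketched in a few lines of Python. Taking reciprocals of the hyperbolic binding equation gives 1/ΔF = 1/ΔFmax + (Kd/ΔFmax)(1/[L]), so a straight-line fit of 1/ΔF against 1/[L] has intercept 1/ΔFmax and slope Kd/ΔFmax. The code below is a minimal illustration of that idea, not the authors' procedure: the function names and the synthetic titration values are hypothetical, and it assumes that the free ligand concentration is known at each point.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_double_reciprocal(free_ligand, delta_f):
    """Estimate (Kd, dFmax) from a plot of 1/dF versus 1/[L]."""
    inv_l = [1.0 / conc for conc in free_ligand]
    inv_f = [1.0 / quench for quench in delta_f]
    slope, intercept = linear_fit(inv_l, inv_f)
    delta_f_max = 1.0 / intercept   # intercept = 1/dFmax
    kd = slope / intercept          # slope = Kd/dFmax
    return kd, delta_f_max


# Synthetic, noise-free titration (hypothetical values, not the paper's data).
true_kd, true_fmax = 0.5, 100.0
ligand = [0.1, 0.2, 0.5, 1.0, 2.0, 3.0]
signal = [true_fmax * conc / (conc + true_kd) for conc in ligand]

kd, fmax = fit_double_reciprocal(ligand, signal)
print(f"Kd = {kd:.3f}, dFmax = {fmax:.1f}")
```

With noise-free input the fit recovers the values used to generate the data. With real measurements the reciprocal transform amplifies the error of the low-concentration points, so a direct nonlinear fit of the hyperbola is often a useful cross-check.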
Accession code of the structure. The atomic coordinates and structure factors of the YgfZ protein have been deposited in the Protein Data Bank under accession code 1NRK.
The YgfZ homologs are widely represented in bacteria and in eukaryotes, including fungi, plants, insects, and mammals, but not in archaea. Besides the fingerprint motif, very few residues are conserved throughout the entire family (Fig. 1). These residues include the basic amino acids Arg and Lys in positions 68, 237, and 245 that must have functional importance and several glycine residues that typically play a structural role.
FIG. 1. Structure-based sequence alignment of the protein families represented by YgfZ from E. coli (YGFZ), glycine cleavage system T-protein from E. coli (GCST), and dimethylglycine oxidase from A. globiformis (DMGO). Residues strictly conserved in each protein family are in black boxes, and residues conserved in 90% of family members are in open boxes. The catalytic Asp in T-protein and DMGO is indicated by a star, and the folate-anchoring Glu is indicated by a triangle. The alignment was prepared with ESPRIPT (8).
Description of the structure. The crystal structure of YgfZ was determined by the multiwavelength anomalous diffraction method by using a selenomethionine protein. The protein molecule has a globular shape, and the dimensions are 60 by 50 by 30 Å. The molecule consists of three domains arranged in a ring-like structure with a narrow central channel (Fig. 2). Domain A includes residues 1 to 26 and 114 to 196. Domain B includes residues 27 to 113 and may be considered an insertion in domain A. Both of these domains have a ferredoxin-like fold (i.e., an α+β sandwich with an antiparallel β-sheet and two α-helices on one side of the sheet). In a deviation from a canonical four-strand β-sheet, domain B has an additional β-strand, and domain A has two additional β-strands. The ferredoxin fold is common in many functionally unrelated and nonhomologous proteins, such as metallochaperones, protease propeptides, and RNA-binding domains of various enzymes (23). Despite the topological similarity of domains A and B, no sequence homology between them was detected.
FIG. 2. Ribbon diagram (upper diagram) and electrostatic surface potential diagram (lower diagram) (blue, positive; red, negative) of the YgfZ molecule. The linker between domains A and C is shown in green. Cys228 indicates the location of the fingerprint sequence K226GCYTGQE233. The THF molecule (not observed in the crystal structure) indicates the folate-binding site. The diagrams were produced with MOLSCRIPT (13), RASTER3D (19), and GRASP (25).
and in the gamma subunit of the initiation factor eIF2. These domains are involved in binding of the TΨC loop of tRNA (26).
A 50-residue segment comprising residues 197 to 245 cannot be attributed to either domain. It is sandwiched between domains B and C and, with the exception of two short α-helices, lacks secondary structure elements.
The interfaces between the domains are extensive. The surface areas of domains A, B, and C buried in these interfaces are 1,400, 2,000, and 1,000 Å², respectively. Compared to the total surface area of each domain (about 6,000 Å²), these values indicate tight domain packing in the protein molecule. On the other hand, the three-domain structure provides sufficient conformational flexibility that may be utilized for, e.g., altering the shape of the central channel. Domain C is probably the least restricted part of the structure; it has the smallest interface and a significantly higher average B-factor (96 Å²) than domains A (45 Å²) and B (57 Å²). High B-factor values reflect loose packing of the protein in crystals that contain 80% solvent.
Structural similarity to DMGO. The structure of YgfZ is strikingly similar to that of DMGO from A. globiformis (14), an enzyme of the betaine catabolism pathway. DMGO is formed by fusion of two subunits that catalyze separate half-reactions, the flavin adenine dinucleotide-dependent amine oxidation of dimethylglycine and the THF-dependent conversion of the iminium intermediate to sarcosine. The two active sites are connected by an internal cavity that enables sequestration of the reactive iminium intermediate and avoids formation of toxic formaldehyde. YgfZ matches the THF-binding subunit of DMGO, which resides in the C-terminal region of the sequence.
A DALI (9) similarity search yielded DMGO as a top hit, with a Z score of 23.6, which corresponds to the root-mean-square deviation (RMSD) of 3.0 Å for 296 common Cα atoms. The overall superposition of YgfZ and DMGO shows that the three domains composing the protein molecule have slightly different relative orientations in these structures. The domains can be individually superimposed with RMSD of 2.2, 1.2, and 1.6 Å for domains A, B, and C, respectively. The somewhat higher value for domain A is due to the shift of the helical segments.
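For readers unfamiliar with the RMSD values quoted above, the quantity is the square root of the mean squared distance between matched atom pairs. The short Python sketch below shows only that final calculation. It assumes that the two coordinate sets have already been superimposed and that the pairing of atoms is known; the function name and the toy coordinates are hypothetical and do not come from the YgfZ or DMGO structures.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that the i-th atom
    of one list is matched to the i-th atom of the other.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: three atoms, each displaced by 3 angstroms along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0), (0.0, 1.0, 3.0)]
print(rmsd(set_a, set_b))  # prints: 3.0
```

Programs such as DALI also perform the alignment and optimal superposition steps, which are not reproduced here.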
Folate-binding site. The crystal complexes of DMGO with folic acid and 5-CHO-THF (folinic acid) indicate that the binding site of the cofactor is in the central channel of the ring structure (14). Folate is bound in a kinked conformation, with the pterin group deeply imbedded in the protein (Fig. 2). The binding pocket is predominantly hydrophobic, and the interacting residues come from domains A and B and a central portion of the linker. The carboxyl group of Glu658 forms a bidentate hydrogen bond with the pteroyl amino groups N2 and N3. From a comparison of the DMGO structures with different ligands it was noted that binding of THF is accompanied by conformational changes in the enzyme, most of which result in improved hydrophobic packing with the folate ring system (14).
Based on the structural similarity to DMGO, it was possible to locate the putative folate-binding site in YgfZ and to model the cofactor binding. Overall, the binding pocket is hydrophobic and has a large fraction of aromatic amino acids, which corresponds to the nature of interactions observed in DMGO (Fig. 3). Leu162 occupies a position equivalent to Glu658 in DMGO, implying that the hydrogen bonds to N2 and N3 of pterin are not preserved in YgfZ. Although some adjustments of the surrounding residues are needed to accommodate a folate molecule, the modeling study suggests that YgfZ is capable of binding folate derivatives.
FIG. 3. Central channel entrance in YgfZ. Leu162 occupies the position of the folate anchor Glu658 in DMGO, and Asn72 is present in place of the catalytic Asp552 residue.
FIG. 4. Quenching of tryptophan fluorescence upon addition of folic acid (A) and THF (B). The protein concentration was 100 nM. The graphs were prepared with PSI-Plot (Poly Software).
It should be noted that an aspartic acid occupies the position equivalent to Asp552 of DMGO (position 72 in E. coli YgfZ numbering) in all eukaryotic family members and in the representatives of the alpha division of the Proteobacteria. In all other members of the YgfZ family, neither Asp nor Glu is present at this position. Thus, it is possible that the family includes two subfamilies, one with a potential for enzymatic activity and the other without such a potential. The phylogenetic distribution among the YgfZ homologs implies that this division may be related to the evolution of this protein family.
The only fragment that bears residues conserved in the entire YgfZ family is the octapeptide fingerprint motif located in the interdomain linker on the surface of the protein. All residues of the motif except Tyr229 are exposed to solvent and are not in contact with each other or with the rest of the molecule. The strict conservation of these residues suggests their functional importance. As a theoretical possibility, one might consider a structural rearrangement that would bring the fingerprint loop to the folate-binding site. Although such a conformational transition cannot be ruled out, it seems to be unlikely given its massive scale.
As an alternative hypothesis, we suggest that YgfZ may function as a signal transducer by sensing certain folate derivatives (e.g., THF) for which it has a high affinity. Folate binding in the central cavity would trigger the conformational changes in the protein, such as the relative movement of the domains. The domain mobility in YgfZ can be predicted from the difference in the average B-factors and from superposition on the DMGO structure, as discussed above. Some adjustment of the hydrophobic pocket was required to accommodate even structurally similar ligands in DMGO (14), suggesting that the transfer from the free state to the bound state may cause significant conformational changes. The interdomain linker must be sensitive to the domain movements, and hence it would be an ideal location for the interaction with the target molecule. The conserved sequence motif would ensure specific target recognition. The presence of two glycine residues in the motif indicates the conformational flexibility of the polypeptide fragment that may be important for signal transduction.
The central residue of the motif, Cys228, bears a reactive thiol group. In the present structure, Cys228 forms an unusual intermolecular disulfide bridge with the equivalent residue from another YgfZ molecule. The protein is thus a symmetric dimer sitting on the crystallographic twofold axis. The interface is not particularly extensive and covers about 7% of the monomer surface. There are primarily nonspecific van der Waals interactions between the two molecules. With the exception of Cys228, none of the invariant residues is involved in the interaction, suggesting that the dimer is not functionally relevant. If a possible recognition role of the fingerprint motif is considered, Cys228 may be a key anchor in the molecular interaction.
Next to the putative recognition site is a concave positively charged surface between domains B and C (Fig. 2). Importantly, the basic character of the residues covering the surface is conserved in the YgfZ family. The charge and the shape of the surface suggest that it may bind a nucleic acid. The tRNA-binding role of the topologically similar domains in translation factors supports this suggestion (26). It is tempting to speculate that YgfZ may act as a transcriptional regulator. One such uncharacterized protein has recently been shown to recognize a conserved CATCN7CTTCTT motif present in the promoter regions of the yeast gcv genes (10). The formation of the complex is responsive to THF, indicating that glycine-specific control may be mediated via folate derivatives. The CATCN7CTTCTT motif is also found in the promoter of the DFR1 gene encoding dihydrofolate reductase, which catalyzes de novo synthesis of THF.
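The CATCN7CTTCTT consensus mentioned above, in which N7 stands for any seven nucleotides, can be expressed as a regular expression. The Python sketch below is an illustration of scanning a sequence for this pattern. The function name and the example sequence are hypothetical, and the scan covers only the strand given (no reverse complement).

```python
import re

# CATC, then any seven nucleotides, then CTTCTT. The lookahead wrapper lets
# the scan report overlapping occurrences as well.
MOTIF = re.compile(r"(?=(CATC[ACGT]{7}CTTCTT))")


def find_motif(sequence):
    """Return (start_index, matched_text) for each motif occurrence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq)]


# Hypothetical promoter fragment, written only to demonstrate the scan.
fragment = "ttgaCATCagctagcCTTCTTgg"
print(find_motif(fragment))  # prints: [(4, 'CATCAGCTAGCCTTCTT')]
```

Start positions are zero-based. A fuller search would also scan the reverse complement and tolerate ambiguity codes in the input.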
Transcription factors typically bind DNA with a pair of helix-turn-helix motifs that interact with nucleotide bases in the major groove (21). However, the concave surface in YgfZ seems to favor nonspecific DNA binding and does not reveal any pattern for recognizing a particular nucleotide sequence. On the other hand, the presence of two recognition surfaces, one for targeting a protein and the other for nucleic acid binding, suggests that YgfZ may be part of a nucleoprotein complex. Targeting a transcription factor is a definite possibility.
Taking into account the sensitivity to folates, one may hypothesize that YgfZ is a folate-dependent regulatory protein that may affect the expression of the proteins in response to changes in the environment. Consistent with this hypothesis is the fact that the yeast homolog of YgfZ, CAF17 (CCR4-associated factor), is a component of the transcriptional regulatory complex. The data were obtained from yeast two-hybrid whole-genome screening (36). The CCR4 regulator is an evolutionarily conserved transcriptional regulator involved in controlling mRNA initiation, elongation, and degradation (4).
Another observation related to the putative DNA-binding function of YgfZ is particularly interesting because it involves a THF-dependent enzyme. A single-stranded DNA-binding activity has been reported for C1-THF synthase from different organisms (37). This protein does not bind double-stranded DNA or RNA but does bind single-stranded DNA with high affinity and in a sequence-independent fashion. Among various possibilities, we believe that C1-THF synthase might regulate gene expression of one of the enzymes involved in C1 metabolism.
Analysis of the genomic context indicates that ygfZ does not belong to any particular gene string, although in some enterobacteria, including E. coli, Yersinia pestis, and Salmonella enterica serovar Typhimurium, the ygfZ gene is located upstream of the gcv operon coding for the GCS proteins. This proximity may reflect involvement of YgfZ in the regulation of C1 pools related to the GCS. Further experiments should identify the molecular partners of YgfZ and reveal its role in C1 metabolism. The three-dimensional structure of YgfZ helps to establish a new protein family widely represented in bacteria and eukaryotes whose members are topologically similar to, but functionally different from, the T-protein-like enzymes.
This work was supported by National Institutes of Health grant P01-GM57890. Use of the APS was supported by the U.S. Department of Energy Basic Energy Sciences Office of Science under contract W-31-109-Eng-38.
Certain commercial materials, instruments, and equipment are identified in this paper in order to specify the experimental procedure as completely as possible. In no case does such identification imply a recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials, instruments, or equipment identified are necessarily the best available for the purpose.