John Toedt,1,
Michael Y. Galperin,2 and
Gary L. Gilliland1,
Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute and National Institute of Standards and Technology, Rockville, Maryland,1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland2
Received 28 February 2005/ Accepted 11 May 2005
| INTRODUCTION |
|---|
Sia catabolism in bacteria involves cleavage of cell surface glycoconjugates by sialidases, transport of free Sia molecules through the membrane, and degradation of the molecules to N-acetylmannosamine and pyruvate through the action of Neu5Ac aldolase (lyase) (59). N-Acetylmannosamine is then phosphorylated by a specific kinase and isomerized to N-acetylglucosamine 6-phosphate, which enters the amino sugar metabolic pathways. Sia thus can serve as the sole carbon or nitrogen source in bacteria and as a source of amino sugars for cell wall synthesis (45).
In many bacteria the genes involved in Sia catabolism form an operon (59, 60). In Escherichia coli the operon includes the nanATEK-yhcH genes coding for the aldolase, the transporter, the epimerase, the kinase, and a protein with an unknown function, respectively. Expression of the operon is controlled by a repressor protein encoded by the upstream gene nanR (31). Since the discovery of Sia catabolism in E. coli (60), the pathway has also been described in Clostridium perfringens (61) and Haemophilus influenzae (58). The complete bacterial genomic DNA sequences revealed that nan systems are present in diverse species, including gamma-proteobacteria, clostridia, streptococci, staphylococci, and fusobacteria (59).
In this study, we focused on the uncharacterized protein encoded by the yhcH gene of H. influenzae. This protein emerged as a target in a structural genomics project aimed at the functional assignment of proteins through determination of their three-dimensional structures (17). It is highly expressed in H. influenzae and E. coli cells growing in rich medium (33, 34). The YhcH homologs are present in gram-negative and gram-positive bacteria but not in archaea or eukaryotes. No functional information for this protein family has been available, other than that one of its members, E. coli EbgC, showed up as a subunit of an experimentally evolved beta-galactosidase (18). Although this observation did not provide any direct clue to the protein function in vivo, it has been used for functional assignment in the Swiss-Prot and COG databases (5, 52). In contrast, the Pfam database (6) lists YhcH homologs as members of the Domain of Unknown Function (DUF386) protein family.
The YhcH protein was cloned and expressed, and the crystal structure was determined at 2.2-Å resolution. Analysis of the structure and of the genome context suggested that YhcH may function as a copper-dependent sugar isomerase. A possible role in Sia catabolism may involve the processing of exogenous glycolated neuraminic acid.
| MATERIALS AND METHODS |
|---|
Equilibrium sedimentation. The oligomeric state of the protein was investigated by equilibrium sedimentation ultracentrifugation. The data were collected at 25°C and 4°C in 50 mM Tris-HCl, pH 7.5, 0.1 mM dithiothreitol, 0.1 mM EDTA buffer at a range of concentrations (0.1 to 2.2 mg/ml) and rotor speeds. The data were fitted to an ideal single-species model. No attempt to model a mixture of oligomeric states was made.
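As an illustration of what fitting to an ideal single-species model involves, the sketch below evaluates the standard textbook equilibrium concentration profile, c(r) = c(r0) exp[M(1 - v̄ρ)ω²(r² - r0²)/(2RT)]. The partial specific volume, solvent density, rotor speed, and radii are not given in the text and are assumed values chosen only for illustration; the function name is hypothetical.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol K)


def equilibrium_concentration(r_m, r0_m, c0, molar_mass_kg_per_mol, rpm, temp_k,
                              partial_specific_volume=0.73e-3,  # m^3/kg, assumed typical protein value
                              solvent_density=1000.0):          # kg/m^3, assumed (dilute buffer)
    """Concentration at radius r_m for one ideal species at sedimentation equilibrium.

    All quantities are in SI units; c0 is the concentration at the reference radius r0_m.
    """
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity, rad/s
    buoyant_mass = molar_mass_kg_per_mol * (1.0 - partial_specific_volume * solvent_density)
    exponent = buoyant_mass * omega ** 2 * (r_m ** 2 - r0_m ** 2) / (2.0 * R_GAS * temp_k)
    return c0 * math.exp(exponent)


# Hypothetical comparison: a heavier species gives a steeper gradient at the same speed.
light = equilibrium_concentration(0.070, 0.065, 1.0, 17.9, 20000, 298.15)
heavy = equilibrium_concentration(0.070, 0.065, 1.0, 35.8, 20000, 298.15)
```

In a fit of this kind the molar mass is the adjustable parameter, so the oligomeric state follows from comparing the fitted mass with the mass of one chain.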
Crystallization and structure determination. YhcH crystals were grown by the vapor diffusion hanging drop method at room temperature from 0.1 M HEPES, pH 7.5, 25% polyethylene glycol 4000, 1 M sodium acetate. These crystals belong to space group P21 with the following unit cell parameters: a = 41.9 Å, b = 153.9 Å, c = 53.8 Å, and β = 112.9°. There are four polypeptide chains in the asymmetric unit with a solvent content of 45%. For X-ray data collection, the crystals were flash-frozen in liquid propane in the crystallization solution.
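The relation between the unit cell, the number of chains, and the solvent content can be checked with a short calculation. The sketch below computes the monoclinic cell volume from the reported parameters and then uses the common Matthews approximation (solvent fraction ≈ 1 - 1.23/Vm) to back out the chain mass implied by 45% solvent. The chain mass is not stated in this section, so it is derived here rather than assumed; the function names are hypothetical.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (cubic angstroms) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_from_solvent(solvent_fraction, protein_term=1.23):
    """Matthews coefficient Vm (A^3/Da) implied by a solvent fraction,
    using the usual approximation solvent = 1 - 1.23 / Vm."""
    return protein_term / (1.0 - solvent_fraction)


volume = monoclinic_cell_volume(41.9, 153.9, 53.8, 112.9)
chains_per_cell = 2 * 4  # P21 has 2 asymmetric units per cell; 4 chains in each
vm = matthews_from_solvent(0.45)
implied_chain_mass = volume / (chains_per_cell * vm)  # Da per chain

print(f"V = {volume:.0f} A^3, Vm = {vm:.2f} A^3/Da, chain mass = {implied_chain_mass:.0f} Da")
```

Under these assumptions the implied mass is on the order of 18 kDa per chain, which is consistent with four chains of a small protein in the asymmetric unit.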
The structure was solved by the two-wavelength anomalous diffraction method (MAD) using a mercury derivative. Crystals were soaked in 2 mM KHgSCN overnight and appeared to be nonisomorphous compared to the native crystals (R-merge = 44.3%) with unit cell deviations of up to 3% (a = 43.1 Å, b = 152.3 Å, c = 53.4 Å, β = 113.6°). The 2.6-Å diffraction data for the derivative and the 2.2-Å data for the native crystal (Table 1) were collected on the IMCA-CAT beamline at the Advanced Photon Source (Argonne, IL) equipped with a MAR charge-coupled device detector. The following programs were used: HKL2000 (43) for data processing; SnB (37), MLPHARE (42), and DM (13) for phasing; O (30) for model building; and REFMAC (40) for refinement. The atomic model was built into the MAD-phased electron density and refined against the Hg derivative data. It was then used for further refinement against the native data at 2.2-Å resolution. No noncrystallographic symmetry restraints were applied to the four independent protein molecules. The refinement statistics are shown in Table 1. Water molecules were added at the (Fo-Fc) electron density peaks using a cutoff level of 3σ. The same native crystal was used for an X-ray fluorescence absorption experiment at the copper edge (wavelength, 1.3804 Å). A complete data set was collected at the peak wavelength (1.3766 Å) and used for anomalous Fourier calculations. Programs from the CCP4 suite (12) were used for crystallographic calculations, CLUSTALW (53) and ESPRIPT (26) were used for sequence alignment, and MOLSCRIPT (35) and RASTER3D (36) were used for ribbon diagrams.
| RESULTS AND DISCUSSION |
|---|
atoms. The maximum deviations do not exceed 1.3 Å. Clearly, the structure is not influenced by crystal contacts. The YhcH structure is composed of two antiparallel β-sheets consisting of six β-strands each (Fig. 1A). The β-sheets form a sandwich of a jelly roll type. One of the β-sheets is twisted by almost 180°, as measured between the β-strands at the opposite ends of the sheet. This leaves the β-sandwich open at one end, so that the overall shape of the β-structure resembles a funnel.
Cupins represent one of the most functionally diverse protein superfamilies (16, 32). 2-Oxoglutarate and Fe2+-dependent dioxygenases constitute perhaps the largest group both in terms of the range of species and in terms of the substrates. Bacterial antibiotic synthases are the best-characterized members of this group. Another group of cupins, according to the SCOP database (41), includes germin-like seed storage proteins (62, 63), oxalate decarboxylase (3), phosphoglucose and phosphomannose isomerases (7, 11, 51), dTDP-4-dehydrorhamnose 3,5-epimerase RmlC (10, 14, 24), and dioxygenases acting on homogentisate (54), acireductone (46), and quercetin (21). These are the proteins that exhibit the greatest structural similarity to YhcH, as revealed by a DALI (27) search. Quercetin 2,3-dioxygenase (QDO) is ranked first, with a Z-score of 6.4. The r.m.s. deviation between the superimposed structures is 1.9 Å for 85 common Cα atoms when the "catalytic" N-terminal domain of QDO is used. Curiously, the inactive metal-free C-terminal domain of QDO fits YhcH better, with an r.m.s. deviation of 1.6 Å for the same 85 Cα atoms. The largest deviations between the structures occur in the two loops that are partially disordered in YhcH.
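For readers unfamiliar with the r.m.s. deviation quoted above, the following minimal sketch shows how the value is obtained once two sets of equivalent atoms have already been superimposed. It does not perform the superposition itself and is not the DALI procedure; the function name and the coordinates in the example are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    coordinates that are assumed to be already superimposed and paired."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical example: every atom displaced by the vector (1, 2, 2), i.e. by 3 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 1.0, y + 2.0, z + 2.0) for x, y, z in reference]
example_rmsd = rmsd(reference, shifted)
```

Because the squared distances are averaged, a few poorly matching loop residues can dominate the value, which is why the number of atoms compared is reported alongside it.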
Oligomeric structure. Cupins typically form dimers that may further assemble into hexamers. The dimer consists either of two separate polypeptide chains or of topologically identical domains within a single polypeptide. A common theme in dimer formation is the incorporation of an N-terminal segment of one subunit in the β-sheet of the other subunit. Such an arrangement yields a symmetrical dimer with an extensive interface. The active site of the cupin protein is located in the crevice between the β-sheets. In the dimer, both active sites remain accessible, although in some proteins (e.g., RmlC) the N-terminal protrusion from the other subunit forms part of the substrate binding site (24). There have been no reports on the cooperativity of substrate binding in these oligomeric enzymes.
YhcH also exists in a dimeric form according to the equilibrium sedimentation data collected at 25°C and 4°C at pH 7.5. The crystal structure reveals two tightly associated dimers in the asymmetric part of the unit cell. The solvent-accessible area buried upon dimerization is over 2,000 Å², which is one-quarter of the total surface area of the monomer. However, the association of monomers in the YhcH dimer differs from that in the typical cupin dimer. The interface is formed by β-strands β1 and β9 at the narrow end of the β-funnel and their symmetry-related equivalents in the other molecule (Fig. 1A). The twofold molecular symmetry yields a continuous β-sandwich spanning the dimer. Besides the main chain hydrogen bonds between the β-strands, there are a few other contacts that include residues of the loop following β1. YhcH dimerization leaves the putative active sites accessible from the opposite ends of the dimer.
Metal binding. The YhcH molecule contains a cation binding site at the opening of the β-funnel. The ion in the native crystal was identified as copper by using X-ray fluorescence spectroscopy. Scanning the crystal in the appropriate X-ray energy range revealed an absorption edge at 8,982 eV (1.3804 Å), which corresponds to the value for copper. The anomalous signal from the data collected at a peak wavelength of 1.3766 Å confirmed the presence of the Cu ion in the structure. Since copper was not added to the protein during purification and crystallization, this result suggests that copper is the physiological metal for YhcH.
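The correspondence between the edge energy and the wavelength quoted above follows from E = hc/λ. The short sketch below performs the conversion, using hc ≈ 12398.42 eV·Å; the function name is hypothetical.

```python
HC_EV_ANGSTROM = 12398.4198  # Planck constant times speed of light, in eV * angstrom


def wavelength_to_energy_ev(wavelength_angstrom):
    """Photon energy in eV for an X-ray wavelength given in angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_ANGSTROM / wavelength_angstrom


edge_energy = wavelength_to_energy_ev(1.3804)  # reported absorption edge
peak_energy = wavelength_to_energy_ev(1.3766)  # reported peak wavelength

print(f"edge: {edge_energy:.0f} eV, peak: {peak_energy:.0f} eV")
```

The shorter peak wavelength corresponds to an energy slightly above the edge, which is where the anomalous signal from the absorbing element is expected to be strongest.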
The coordination of the Cu ion is different in the crystallographically independent molecules. In molecule B four residues (Glu63, His65, Asp70, and His130) and a water molecule are involved in the metal coordination (Fig. 1B). Both carboxylate groups are monodentate ligands so that the geometry can be described as a distorted square pyramid. The bond lengths are in the range from 1.9 to 2.2 Å for all ligands except Glu63, which is 2.7 Å from Cu. In molecule A the electron density at the solvent position is great enough to accommodate a four-atom molecule. The ion was modeled as an acetate ion because it was present at a high concentration in the crystallization solution. The Cu-O distances for the acetate are 2.1 and 2.9 Å. On the other hand, Glu63 in molecule A is farther away from the metal, so that the Cu geometry is close to tetrahedral (Fig. 1C). In molecules C and D, the electron density is not so well defined. It was modeled with the solvent position occupied by water and Glu63 oriented away from the Cu ion. The observed flexibility of Glu63 may have functional importance, as discussed below. It should be noted, however, that the difference in Cu coordination may reflect the effect of partial chelation by EDTA during protein purification. The structural differences between the four subunits are primarily restricted to the coordination sphere of the metal. There are no significant differences in the rest of the protein structure.
In proteins, copper has been observed in one of the two oxidation states, Cu+ or Cu2+ (28). While Cu+ is preferably complexed by cysteine and methionine residues, Cu2+ is ligated mostly by histidine, hydroxyl groups of serine, threonine, or tyrosine residues, and water. From this point of view, the likely species of the metal in YhcH is Cu2+.
Most cupins contain metal ions bound at the site observed in YhcH. Typically, two or three amino acid ligands (one or two histidines and a carboxylic acid) are located in a short stretch of the sequence that matches strands β4 and β5. Another ligand, which is invariably a histidine, may be separated by up to 150 residues in the sequence but spatially comes from a β-strand next to β4 (β11 in YhcH). The wide diversity of metal ions and their coordination geometries in the cupins contribute to the variety of reactions catalyzed by these enzymes. Interestingly, QDO (21) is the only Cu-dependent enzyme in this structural superfamily.
Comparison of the metal-binding sites in QDO and YhcH revealed remarkable similarity between the two proteins. First, the geometry of the site is the same. The amino acid ligands of Cu2+ in QDO, His66, His68, Glu73, and His112, match the YhcH ligands Glu63, His65, Asp70, and His130, respectively. These ligands are associated with the same secondary structural elements in both proteins. Second, QDO is the only known protein with carboxylate ligation of a Cu ion. Therefore, YhcH is possibly the first example of double carboxylate ligation. Third, the alternate conformations of the glutamate ligand have been observed in both structures. In apo-QDO, the metal is predominantly bound in a tetrahedral geometry by three histidines and a water molecule (21). In complexes with substrates and substrate analogs, Cu2+ is pentacoordinated with Glu73 bound to both the metal and the substrate (50). In YhcH different coordination states are observed in one crystal. In the apo form represented by molecule B, Cu2+ is bound by all four protein groups and a water molecule. When an acetate ion replaces a water ligand, Glu63 leaves the coordination sphere of Cu2+. Thus, in both proteins the metal coordination is sensitive to the presence of an exogenous molecule, and the glutamate ligand follows this rearrangement, albeit in opposite ways.
Amino acid sequence analysis. A BLAST (2) search in combination with a PROSITE (4) search using Cu-coordinating residues as a template identified over 40 YhcH homologs. These homologs are widely represented in gamma-proteobacteria as well as in streptococci, clostridia, and Mollicutes. No homologs have been found in archaea and eukaryotes. The levels of amino acid identity in the family range from 88% (between Salmonella enterica serovar Typhi STY4129 and Klebsiella oxytoca YiaL) to 18% (between S. enterica serovar Typhi STY4129 and Mycoplasma pulmonis MYPU6600). Three groups of highly conserved residues can be identified from the sequence alignment (Fig. 2). One group includes Glu63, His65, Asp70, and His130 involved in Cu2+ coordination. Another group includes residues that are likely important for the stability of the three-dimensional structure. These residues are Gly37 preceding β2, Gly77 in the loop between β5 and β6, Asp101 H bonded to the amino groups of the β4-β5 loop, and Pro126 in the β10-β11 loop. All of them are located in loops providing the necessary conformational flexibility (glycine) or rigidity (proline) at the sharp turns of the polypeptide chain. Asp101 stabilizes the reverse turn between β4 and β5 through hydrogen bonds to the main chain amino groups. The proper fold of this fragment is particularly important as it supports the conformation of the metal binding site.
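The identity levels quoted above depend on how the value is computed from an alignment. The sketch below shows one common convention, counting identical residues over the aligned columns in which neither sequence has a gap. This convention is an assumption; the text does not state which one was used. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Only columns in which both sequences have a residue are compared
    (an assumed convention; other denominators are also in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy example: 4 ungapped columns, 3 identical residues.
toy_identity = percent_identity("ACDE-", "ACDFG")
```

Using the shorter sequence length or the full alignment length as the denominator gives lower values for the same alignment, so identity figures from different sources are only comparable when the convention matches.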
There are three other strictly conserved residues (Glu79, Lys145, and Lys149) that are located close to the metal binding site and may therefore be functionally important. Together with Gln72 they form a network of H-bonded side chains that connects the Cu2+-bound carboxylate of Asp70 with the solvent-inaccessible carboxylate of Glu79 located deep in the active site cavity (Fig. 1B). Gln72 is replaced by a histidine in some members of the family, while it retains the ability to be part of the network. The buried position of Glu79 surrounded by hydrophobic residues implies its basic character and suggests that the network may function as a relay system.
Some bacteria possess several genes coding for the YhcH homologs. E. coli, for instance, has three such paralogs (YhcH, YiaL, and YjgK), and the levels of sequence identity between them are around 30%. Each of these three proteins belongs to a separate subfamily, the members of which are characterized by higher levels of sequence similarity to each other than to the proteins belonging to the other subfamilies. Thus, the entire family is usually referred to as YhcH/YiaL/YjgK. The three subfamilies must have the same fold but may differ in substrate specificity or regulation.
The YhcH crystal structure indicates three residues that may define the substrate specificity of the group of proteins from proteobacteria. Asn48, Met50, and Lys60 are located at the rim of the active site entrance (Fig. 1B) and may directly interact with a substrate bound close to the Cu ion. Their conservation in proteobacteria (Fig. 2) suggests a common substrate for this group of proteins (e.g., an amino sugar with a particular substituent). The same positions in the YjgK subfamily are occupied by Leu, Ser, and Arg, which are also highly conserved in the sequences. The lack of a conservation pattern in the YiaL subfamily may reflect broader substrate specificity among the members of this subfamily.
Genome context. In bacteria, metabolism of Sia can proceed by either of two routes; the molecule can be catabolized to GlcNAc and eventually enter glycolysis, or it can be used for sialylation of the surface lipopolysaccharide (57, 58). Besides these routes, pathogenic bacteria have developed a pathway for Sia biosynthesis from GlcNAc that includes GlcNAc phosphorylation and epimerization (siaA, neuC, or nnaA) and consecutive synthesis of Neu5Ac (siaC, neuB, or nnaB) and CMP-Neu5Ac (siaB, neuA, or nnaC), which is incorporated into the polysaccharide by a specific transferase (neuS or siaD) (19, 20, 23). The corresponding genes are part of an operon that is present in pathogens such as Campylobacter jejuni, Neisseria meningitidis, Fusobacterium nucleatum, and E. coli K1. However, the operon is missing from nonpathogenic strains of E. coli K-12 and H. influenzae KW20, suggesting that the gene products of the nan and nna operons have nonoverlapping functions despite the similar reactions catalyzed by the enzymes.
Analysis of the genome context that relies on characteristics such as conserved gene neighborhoods, phylogenetic patterns, and coexpression in microarray experiments may provide certain clues to the function of a "hypothetical" protein (22, 44). Complete genome sequences are available for all members of the YhcH/YiaL/YjgK family. Although the H. influenzae HI0227 gene itself does not belong to any apparent gene string, its homologs in many other organisms are part of the nan operon that encodes the enzymes of the Sia degradation pathway (33, 59). The three-dimensional structure of the protein suggests an isomerase (epimerase) function, as is typical for cupins. However, an epimerase, which catalyzes the interconversion of N-acetylmannosamine 6-phosphate and GlcNAc-6-phosphate, is encoded by the nanE gene. As an epimerase, YhcH may have different substrate specificities depending on the tolerance of the nanT and nanA gene products involved in the first steps of Sia uptake. Neu5Ac aldolase (NanA) is specific to Neu5Ac as the most ubiquitous Sia in host organisms. However, the original study using the E. coli deletion strains indicated that the nature of the C-5 amino substituent in Sia does not affect transport or degradation (60). The aldolases from C. perfringens and E. coli are capable of cleaving a range of neuraminic acid derivatives with different substituents at C-5, including formyl, succinyl, and glycolyl neuraminic acids (1, 48). Regarding the NanT permease, it has been established that many bacteria, both gram negative and gram positive, exhibit an active proton symporter-type mechanism (59). Since it is highly specific for Sias, the NanT transporter can bind a range of neuraminic acid derivatives. For instance, inhibition studies with Pasteurella hemolytica revealed that N-glycolylneuraminic acid, Neu5Ac methyl ester, and 2,3-dihydro-2-deoxy-Neu5Ac may be taken up by a common transport system (49).
The cupin superfamily includes dioxygenases, isomerases/epimerases, and sugar binding proteins lacking any enzymatic activity (16, 32). Given that the YhcH protein is encoded in the nan operons of several strictly anaerobic bacteria, such as C. perfringens and F. nucleatum (33, 59), it is very unlikely that it could function as a dioxygenase. We suggest that YhcH may be an epimerase specific to neuraminic acid derivatives other than Neu5Ac, so that its activity would be complementary to NanE. This would allow utilization by YhcH-encoding bacterial pathogens of alternatively substituted neuraminic acids, such as those found in blood (9). One possible candidate for the YhcH substrate is a hydroxylated form of Neu5Ac, N-glycolylneuraminic acid. This molecule is one of the two major Sias on the surfaces of most primate cell types (29, 39).
In Haemophilus ducreyi, the neu gene locus is part of a larger cluster that also includes rmlBACD genes responsible for the synthesis of L-rhamnose for incorporation into lipopolysaccharide (25). The rmlC gene product, dTDP-4-keto-6-deoxy-D-glucose 3,5-epimerase, catalyzes the third step of the pathway. This enzyme belongs to the cupin structural superfamily, although unlike most cupins, it is metal independent. Assuming that YhcH may also be involved in sugar processing, this structural similarity between YhcH and RmlC may be a case of protein fold accommodation for different but structurally similar substrates. Such cases, which often occur within a single pathway, have been observed for many functionally related proteins (55). For instance, in amino sugar metabolism, the gene products of neuA (Neu5Ac cytidylyltransferase) and glmU (GlcN-1P uridyltransferase) have a common fold (8, 38).
An alternative evolutionary path, adaptation of structurally unrelated proteins for the same biochemical activity, has also been documented. For sugar isomerization, there is the case of phosphoglucose isomerase (PGI), which is represented by two distinct protein families. In most organisms, the enzyme is a homodimer of 60- to 70-kDa subunits with an αβα sandwich topology. The general acid-base catalysis by PGI is metal independent (47). PGI of the second type has been found in some Euryarchaeota species. This type forms dimers of 21-kDa subunits with the cupin fold and catalyzes Glc-6P isomerization in a metal (presumably Fe2+)-dependent manner (7). If YhcH is a sugar epimerase, this PGI may represent the closest analog in terms of the reaction mechanism.
| ACKNOWLEDGMENTS |
|---|
This work was supported by National Institutes of Health grant P01-GM57890. The use of the Advanced Photon Source was supported by the U.S. Department of Energy, Basic Energy Sciences, Office of Science, under contract W-31-109-Eng-38.
Certain commercial materials, instruments, and equipment are identified in this paper in order to specify the experimental procedure as completely as possible. In no case does such identification imply a recommendation or endorsement by the National Institute of Standards and Technology or the National Institutes of Health, nor does it imply that the materials, instruments, or equipment identified is necessarily the best available for the purpose.
| FOOTNOTES |
|---|
Present address: National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Md.
Present address: Department of Physical Science, Eastern Connecticut State University, Willimantic, Conn.
Present address: Centocor Inc., Radnor, Pa.
| REFERENCES |
|---|