Journal of Bacteriology, May 2009, p. 3068-3075, Vol. 191, No. 9
0021-9193/09/$08.00+0 doi:10.1128/JB.01838-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.
Department of Life Sciences, Ben Gurion University, Beersheva 84105, Israel
Received 31 December 2008/ Accepted 14 February 2009
Recently, an annotated version of the H. volcanii genome has become available (20; http://archaea.ucsc.edu/cgi-bin/hgGateway?db=haloVolc1). Accordingly, in an effort to identify novel components of the H. volcanii N-glycosylation process not identified through earlier homology-based searches, open reading frames (ORFs) found adjacent to those genes known to participate in this posttranslational modification were investigated. Such an approach had earlier served to identify aglG, which is situated between the oligosaccharide transferase-encoding aglB and the aglI genes and is a homologue of Campylobacter jejuni pglI, the product of which adds a glucose branch to the lipid-linked polysaccharide structure that is ultimately attached to protein targets in this bacterium (15). Encouraged by this finding, in the present study we have examined the region of the H. volcanii genome between aglJ (HVO_1517) and aglB (HVO_1530). This region also includes aglG (HVO_1529), aglI (HVO_1528), and aglF (HVO_1527). Analysis of the genome region between aglJ and aglF in the hope of identifying novel N-glycosylation genes is hampered by the fact that the automated annotation of the H. volcanii genome currently available inadvertently includes at least one error, which became evident only upon subsequent manual annotation efforts. In those studies, it was shown that aglE, encoding a protein involved in adding the fourth subunit of the pentasaccharide decorating the H. volcanii S-layer glycoprotein, corresponds to the 5' portion of HVO_1523 as well as most of the nonannotated region between HVO_1523 and HVO_1524 (3).
In the present report, computer-based approaches and visual inspection were employed to reannotate ORFs in the stretch of the genome between nucleotide 1382311, i.e., the start of aglJ, and nucleotide 1398898, i.e., the start of aglB (2), with the aim of identifying novel H. volcanii N-glycosylation pathway genes, including those not previously identified as a result of misannotation. Such reassessment of the H. volcanii agl gene cluster, followed by studies conducted at the RNA and protein levels, has served to reveal two novel gene sequences, annotated as aglQ and aglR, and to confirm that aglP is a true gene, in agreement with earlier results (12). The proximity and cotranscription of these genes with sequences known to be involved in N glycosylation point to aglP, aglQ, and aglR as also participating in this posttranslational modification.
RT-PCR. Reverse transcriptase PCR (RT-PCR) was performed as described previously (1). Briefly, specific forward and reverse oligonucleotide primers were designed for each H. volcanii gene under consideration (see Table S1 in the supplemental material). RNA isolation was carried out using an Easy-spin RNA extraction kit (Intron Biotechnology, Kyungki-Do, Korea) according to the manufacturer's instructions. The RNA concentration was determined spectrophotometrically. After contaminating DNA was eliminated with a DNAFree kit (Ambion, Austin, TX), single-stranded cDNA was prepared for each sequence from the corresponding RNA (2 µg) using random hexamers (150 ng) in a SuperScript III first-strand synthesis system for RT-PCR (Invitrogen, Carlsbad, CA). The cDNA was then used for PCR amplification, together with appropriate forward and reverse primer pairs. cDNA amplification was monitored by electrophoresis in 1% agarose gels. The sequences of the PCR products were determined to confirm their identity. In control experiments designed to exclude any contribution from contaminating DNA, PCR amplification was performed on total RNA prior to cDNA preparation.
Generation and detection of GFP fusion proteins. To generate constructs encoding the putative protein products of the ORF of interest fused to green fluorescent protein (GFP), DNA sequences corresponding to HVO_1518, HVO_1521, HVO_1522, HVO_1526, ORF 2, ORF 3, ORF 5, ORF 6, ORF 9, ManHv1, or ManHv2, together with the 200-bp region located directly preceding the 5' end of each sequence, were PCR amplified using forward and reverse primers (see Table S1 in the supplemental material) designed to introduce XbaI and BglII restriction sites at the 5' and 3' ends of the fragments, respectively. The amplified fragments were digested with XbaI and BglII, purified by electrophoresis in 1% agarose gels, and ligated into plasmid pJAM1020 (19), which was predigested with the appropriate restriction enzymes to yield plasmids encoding the various fusion proteins, with the promoter normally found in the plasmid being replaced by the 200-bp region preceding the 5' end of each ORF being considered.
To detect the expression of GFP fusion proteins, proteins were electrotransferred from sodium dodecyl sulfate-polyacrylamide gels to nitrocellulose membranes (0.45 µm; Schleicher & Schuell, Dassel, Germany) and incubated with anti-GFP antibodies (1:1,000; Roche) and then with horseradish peroxidase-conjugated goat anti-mouse antibodies (1:2,500; KPL, Gaithersburg, MD). Detection of antibody binding was achieved using ECL Western blotting detection reagent (Amersham, Bucks., United Kingdom).
Nucleotide sequence accession numbers. The sequences of H. volcanii aglP, aglQ, and aglR have been deposited in the EMBL/GenBank/DDBJ databases and assigned accession numbers FM955369, FM955370, and FM955371, respectively.
In several instances, as reflected in Fig. 1 and Table 1, the two algorithms identified identical ORFs. Specifically, the originally annotated HVO_1517, HVO_1522, HVO_1527, HVO_1528, HVO_1529, and HVO_1530 sequences correspond to the ORF 1, ORF 4, ORF 11, ORF 12, ORF 13, and ORF 14 sequences identified by ORF Finder, respectively. Of these sequences, HVO_1517 (ORF 1) corresponds to aglJ (M. Abu-Qarn et al., unpublished data); HVO_1527 (ORF 11), HVO_1528 (ORF 12), and HVO_1529 (ORF 13) correspond to aglF, aglI, and aglG, respectively (23). HVO_1530 (ORF 14) corresponds to aglB (2). In contrast, although each algorithm reported the presence of nine ORFs between nucleotides 1382311, i.e., the start of aglJ, and 1393000, i.e., the start of aglF, the two algorithms did not concur on the boundaries of the ORFs in this region, apart from the stretch between nucleotides 1385553 and 1386272, simultaneously annotated as HVO_1522 and ORF 4. In addition, HVO_1526 and ORF 10 can be distinguished by the presence of an additional ATG codon at the start of ORF 10 not found in HVO_1526, while HVO_1524 and ORF 7 overlap apart from an additional 51 nucleotides found at the start of ORF 7. Moreover, ORF 3 and ORF 8 differ by only 12 nucleotides, 11 of which do translate into differences at the deduced amino acid level. The remaining sequences of the original annotation, i.e., HVO_1518, HVO_1519, HVO_1520, HVO_1521, HVO_1523, and HVO_1525 are not repeated in the set comprising ORF 2, ORF 3, ORF 5, ORF 6, ORF 8, and ORF 9. Indeed, the two sets of ORFs can be further distinguished by the differential assignment of set members to the positive and negative strands of the genome (Table 1). Finally, ORF 6 has been shown to correspond to aglE (3).
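The comparison of the two annotation sets described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the procedure used in the study. The interval for HVO_1522/ORF 4 (1385553 to 1386272) is taken from the text; the other two entries (`HVO_x`, `ORF y`) and the function name `identical_calls` are hypothetical placeholders, since the Table 1 coordinates are not reproduced here. A fuller comparison would also match strand assignments.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive genome positions


def identical_calls(a: Dict[str, Interval], b: Dict[str, Interval]) -> List[Tuple[str, str]]:
    """Pair names from two annotation sets whose boundaries match exactly."""
    by_interval = {interval: name for name, interval in b.items()}
    return sorted(
        (name, by_interval[interval])
        for name, interval in a.items()
        if interval in by_interval
    )


# HVO_1522 / ORF 4 coordinates come from the text; HVO_x and ORF y are
# made-up entries that share a start but differ at the other boundary.
glimmer = {"HVO_1522": (1385553, 1386272), "HVO_x": (100, 400)}
orf_finder = {"ORF 4": (1385553, 1386272), "ORF y": (100, 451)}

print(identical_calls(glimmer, orf_finder))
```

Only calls with identical boundaries are paired, so near matches that differ in their predicted start, as reported for HVO_1524 and ORF 7, are left for manual inspection.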
FIG. 1. Schematic depiction of the agl cluster of the annotated H. volcanii genome. Shown are annotations automatically generated by the Glimmer (http://archaea.ucsc.edu/cgi-bin/hgGateway?db=haloVolc1; listed as HVO numbers) or ORF Finder (listed as ORF numbers) algorithm, as well as by visual inspection. Functional annotations are reflected as agl gene names.
TABLE 1. ORFs recognized in the H. volcanii agl cluster
FIG. 2. Schematic depiction of the transcripts generated by RT-PCR. +, sequences that generated transcripts; –, sequences that failed to generate transcripts. Primers raised against either the proposed 5' and 3' regions of the gene of interest or sequences spanning the 3' region of a sequence and the 5' region of the downstream sequence, designed to address cotranscription of the sequences of interest and listed in Table S1 in the supplemental material, were employed.
Visual inspection reveals novel ORFs in the agl cluster. As a PCR product was obtained when DNA spanning the end of HVO_1524 (a region shared by ORF 7) and the beginning of ORF 6 (aglE) served as the template, whereas no product was generated when either HVO_1524 or ORF 7 served as the PCR template, the possibility exists that the automated annotation algorithms described above failed to recognize the true start site of the gene erroneously annotated as HVO_1524 or ORF 7 within this section of the genome. Moreover, the finding that a PCR product only 131 bp long was generated when HVO_1521 served as the template raised the question of whether here too the automated annotation had correctly identified the start site of the ORF in question. Therefore, the region downstream of the predicted start site of HVO_1524 and ORF 7 (corresponding to largely overlapping sequences, both found on the reverse strand) and the region upstream of HVO_1521 (found on the forward strand) were manually analyzed to identify alternative potential in-frame start sites. To then confirm whether any of these potential start sites are indeed employed, RT-PCR was performed using appropriate primers.
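The search for alternative in-frame start sites was done by visual inspection in this study. As a rough computational counterpart, the following Python sketch walks upstream from an annotated start, one codon at a time, and records in-frame start codons until an in-frame stop codon is met. The function name, the toy sequence, and the restriction to ATG are assumptions made for illustration; a reverse-strand sequence would first need to be reverse complemented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_inframe_starts(seq, annotated_start, start_codons=("ATG",)):
    """Return 0-based indices of in-frame start codons upstream of annotated_start.

    The walk moves back three nucleotides at a time and stops at the first
    in-frame stop codon or at the beginning of the sequence.
    """
    seq = seq.upper()
    hits = []
    pos = annotated_start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in start_codons:
            hits.append(pos)
        pos -= 3
    return hits


# Toy sequence (hypothetical): the annotated start codon sits at index 15.
toy = "TAAATGCCCATGGGGATGAAA"
print(upstream_inframe_starts(toy, 15))
```

Candidates found this way are only predictions; as in the text, each would still need to be tested experimentally, for example by RT-PCR with a forward primer placed at the candidate site.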
To manually analyze that region of the H. volcanii genome automatically annotated as either HVO_1524 (predicted to begin at position 1389884) or ORF 7 (predicted to begin at position 1389935), RT-PCR was performed using a forward primer encompassing those nucleotides surrounding position 1389689, together with the same reverse primer used in unsuccessful attempts to amplify HVO_1524 and ORF 7. The use of this new primer pair now yielded a PCR product. Hence, the amplified sequence, spanning the region between positions 1388454 and 1389689 identified through visual inspection, was annotated as ManHv2 (Table 1).
The sequence automatically annotated as HVO_1521 is proposed to begin at position 1385335. When RT-PCR was performed with a forward primer that encompassed those nucleotides surrounding position 1385200 (containing the only other start codon in the region between HVO_1520 and HVO_1521 and which could yield an in-frame product) together with the same reverse primer as used in the earlier amplification of the shorter sequence, a PCR product was generated. Therefore, the region spanning positions 1385200 to 1385463, a 264-bp sequence putatively encoding an 88-residue polypeptide, also now identified through visual inspection, was annotated as ManHv1 (Table 1).
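The coordinate arithmetic behind the ManHv1 and ManHv2 assignments can be checked with a short Python sketch. All positions below are those quoted in the text; the helper names are hypothetical. The codon count is simply the span length divided by three, and whether the quoted residue number includes the stop codon is not stated in the source.

```python
def span_length(start, end):
    """Length in bp of a span given 1-based inclusive coordinates."""
    return abs(end - start) + 1


def codon_count(start, end):
    """Number of complete codons in a span; raises if the length is not a multiple of 3."""
    length = span_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"span of {length} bp is not a whole number of codons")
    return length // 3


def same_frame(pos_a, pos_b):
    """True if two start positions on the same strand share a reading frame."""
    return abs(pos_a - pos_b) % 3 == 0


# ManHv1: positions 1385200 to 1385463
print(span_length(1385200, 1385463), codon_count(1385200, 1385463))
# The ManHv1 start is in frame with the annotated HVO_1521 start (1385335)
print(same_frame(1385200, 1385335))
# ManHv2 (start near 1389689) versus the HVO_1524 and ORF 7 starts
print(same_frame(1389689, 1389884), same_frame(1389689, 1389935))
```

The same helpers reproduce the 51-nucleotide offset between the predicted starts of HVO_1524 (1389884) and ORF 7 (1389935).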
Bioinformatics analysis of deduced ORF protein products. As a next step toward identifying ORFs that encode N-glycosylation-related proteins in that portion of the agl gene cluster between aglJ and aglF, protein expression was considered. Initially, the deduced amino acid sequences of the products of HVO_1518-HVO_1526, ORFs 2 to 5, ORFs 7 to 10, and ManHv1 and -2 were analyzed. Apart from ORF 9, all of the ORFs under consideration can be translated into amino acid sequences expected for true proteins. In the case of ORF 9, 50% of the protein is deduced as corresponding to Ser and Thr residues, casting doubt as to whether this DNA sequence indeed represents a real gene.
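A composition check of the kind that flagged ORF 9 can be sketched as follows. The function name and the example peptide are hypothetical; the ORF 9 sequence itself is not reproduced here, so the example only demonstrates the calculation.

```python
def ser_thr_fraction(protein):
    """Fraction of residues that are Ser (S) or Thr (T) in a one-letter protein sequence."""
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    if not protein:
        raise ValueError("empty sequence")
    return sum(protein.count(residue) for residue in "ST") / len(protein)


# Made-up peptide in which half of the residues are Ser or Thr
print(ser_thr_fraction("STSTAAAA"))
```

A value near 0.5, as reported for ORF 9, is far above what is typical for most proteins and is one reason to question whether such a sequence is a real gene.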
The isoelectric points of the amino acid sequences deduced from the various ORFs under study were next considered. Haloarchaeal proteins are generally characterized by an enhanced acidic amino acid content, a property related to the ability of such proteins to remain properly folded in hypersaline surroundings (10, 13). In contrast to the acidic pI values predicted for many of the sequences examined (Table 2), the highly basic pI values of HVO_1524 (pH 9.0), ORF 3 (pH 10.2), ORF 7 (pH 8.7), ORF 8 (pH 9.9), and ORF 9 (pH 12.1) raise the question of whether these sequences indeed encode true H. volcanii proteins.
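The source does not state which tool was used to predict pI values. The following Python sketch shows one common approach, assumed here for illustration only: the net charge is computed from the Henderson-Hasselbalch relationship and the pH at which it crosses zero is located by bisection. The pKa values are illustrative; different programs use different sets, so absolute values will differ somewhat from those in Table 2.

```python
# Illustrative pKa values (an assumption; prediction tools use differing sets).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def net_charge(protein, ph):
    """Approximate net charge of a protein at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # free carboxyl terminus
    for residue, pka in POSITIVE_PKA.items():
        charge += protein.count(residue) / (1.0 + 10 ** (ph - pka))
    for residue, pka in NEGATIVE_PKA.items():
        charge -= protein.count(residue) / (1.0 + 10 ** (pka - ph))
    return charge


def estimate_pi(protein, tol=1e-4):
    """Estimate the isoelectric point by bisection on the net charge between pH 0 and 14."""
    protein = protein.upper()
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(protein, mid) > 0:
            lo = mid  # still positively charged, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


# Made-up peptides: one rich in Asp/Glu, one rich in Lys/Arg
print(round(estimate_pi("DDDDEEEE"), 1), round(estimate_pi("KKKKRRRR"), 1))
```

With such an estimate, sequences rich in Asp and Glu fall in the acidic range expected for haloarchaeal proteins, whereas sequences rich in Lys and Arg yield the basic values that prompted doubt in the text.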
TABLE 2. Bioinformatics analysis of nonannotated ORFs in the H. volcanii agl cluster
Confirmation of ORF identification at the protein expression level. While offering some information on the H. volcanii sequences under investigation, the various computer-based approaches enlisted and discussed in the previous section are not sufficient to determine whether a given ORF encodes a true protein or not. To this end, a more direct strategy designed to assess protein translation was adopted. In this approach, H. volcanii cells were transformed to express constructs in which the ORF under study and a region containing the preceding 200 bp upstream (possibly containing the promoter of that ORF) were fused in frame (at the 3' end of the ORF) to DNA encoding a version of GFP designed to work in the hypersaline environment of this haloarchaeon (19). Expression of fusion proteins containing the C-terminal GFP tag was then revealed by immunoblotting using anti-GFP antibodies.
When cells were transformed to express either GFP-tagged ORF 5 or ORF 6 (aglE), which overlap by 3 nucleotides, protein bands of 66 and 58 kDa, respectively, in addition to GFP-sized bands (26 kDa) and intermediate-sized breakdown products, were labeled by the anti-GFP antibodies (Fig. 3). As reported above, both of these H. volcanii ORFs generated transcripts, with the ORF 6 product having been previously shown to correspond to AglE, a known component of the H. volcanii N-glycosylation apparatus (3). A construct encoding a GFP-tagged version of HVO_1522 (ORF 4) also generated a protein band migrating at a slightly higher mass than the combined predicted molecular masses of the H. volcanii ORF product and the GFP moiety (approximately 24 and 26 kDa, respectively), in keeping with the detection of a transcript for this H. volcanii sequence (see above). In this case, no intermediate breakdown products or GFP-sized bands were detected. Indeed, a previous proteomics-based study had also reported that HVO_1522 (ORF 4) encodes a true protein (12). That study did not, however, consider either ORF 5 or ORF 6 (aglE).
FIG. 3. The appearance of GFP-containing fusion proteins reveals genes that are expressed. As described in Materials and Methods, H. volcanii cells were transformed with plasmids carrying a sequence of interest fused at its 3' end to DNA encoding GFP. This construct was preceded by the 200-bp region immediately preceding the sequence of interest. In a control experiment, plasmid pJAM1020 (19), directing the expression of GFP alone, was employed. Expression of the GFP-containing chimeras was then assessed by immunoblot analysis using antibodies against GFP. The positions of molecular mass markers (100, 70, 55, 40, 35, and 25 kDa) are shown on the left.
Based on these studies, it can be concluded that the H. volcanii portions of the full-length, C-terminally GFP-tagged fusion proteins detected, i.e., HVO_1522 (ORF 4), ORF 5, and ORF 6 (aglE), indeed encode expressed proteins. An earlier proteomic study also confirmed the translation of HVO_1522 (ORF 4) (12). In the cases of ManHv1, ManHv2, ORF 2, ORF 3, and ORF 9, sequences for which RNA was transcribed, it is not clear whether only the GFP portion of each fusion protein was expressed or whether degradation of the expressed full-length chimera had occurred. Similarly, in the case of the transcribed HVO_1518, HVO_1521, and HVO_1526 (or the highly similar ORF 10) sequences, one cannot distinguish between failure of expression (possibly in response to the growth conditions employed) and instability of the translated polypeptides. Thus, one cannot discount the possibility that these latter two groups of genes also encode protein products.
In other halophilic archaea, agl gene homologues are also found in a gene island. In addition to that of H. volcanii, the genomes of four other halophilic archaea, namely, Halobacterium sp. strain NRC-1 (Halobacterium salinarum) (18), Haloarcula marismortui (4), Haloquadratum walsbyi (5), and Natronomonas pharaonis (9), are publicly available. The detection of agl gene homologues and the proven or likely presence of S-layer glycoproteins in these species suggest that they too perform N glycosylation. Accordingly, efforts were directed at addressing the distribution of H. volcanii agl gene homologues in these other haloarchaea. As schematically depicted in Fig. 4, gene islands that include homologues of H. volcanii agl genes, in addition to other ORFs annotated as encoding products involved in glycosylation, are found in other haloarchaeal genomes, with the homologue of H. volcanii aglB, encoding the sole component of the archaeal oligosaccharide transferase (1, 6, 11), serving as the island-defining component. In Halobacterium sp. strain NRC-1, the gene region between VNG1053G and VNG1069C includes homologues of H. volcanii aglB, aglE, aglF, aglG, aglI, and aglJ, as well as of H. volcanii HVO_1522 (ORF 4) and HVO_1524, ORF 7, or ManHv2. Like H. volcanii, Halobacterium sp. strain NRC-1 has also been experimentally confirmed as performing N glycosylation (8, 14). The Halobacterium sp. strain NRC-1 genome also includes VNG0318G, a homologue of H. volcanii aglD, the product of which is involved in adding the final subunit to the pentasaccharide decorating the H. volcanii S-layer glycoprotein. In both haloarchaea, aglD is found outside the agl cluster described here. The Haloarcula marismortui genome also contains a gene cluster that includes homologues of H. volcanii aglB, aglF, aglI, and aglJ, together with homologues of H. volcanii HVO_1522 (ORF 4) and HVO_1524, ORF 7, or ManHv2. The Haloquadratum walsbyi gene island that includes a homologue of H. volcanii aglB also includes homologues of H. volcanii aglE, aglF, aglG, and aglJ. The same region also includes other ORFs annotated as a glycosyltransferase and nucleoside diphosphate sugar epimerases. In contrast, the N. pharaonis homologue of H. volcanii aglB (i.e., NP3720) is quite distant from the cluster situated between NP4654A and NP4680A that includes homologues of H. volcanii aglF and aglJ, as well as other sequences annotated as UDP-glucose dehydrogenases, phosphohexomutases, and nucleoside diphosphate epimerases (not shown).
FIG. 4. Schematic representation showing the presence of gene clusters containing H. volcanii agl gene homologues in other haloarchaea. Regions of the genomes of Halobacterium sp. strain NRC-1, Haloarcula marismortui, and Haloquadratum walsbyi that include H. volcanii agl gene homologues, homologues to other previously annotated sequences found within the H. volcanii agl gene-containing island, homologues to H. volcanii ORF sequences defined in this study, or homologues to other sequences automatically annotated as participating in protein glycosylation are shown (black arrows). The gray arrows in each gene cluster represent sequences automatically annotated as hypothetical proteins or proteins assigned functions apparently unrelated to protein glycosylation. The arrows representing each ORF do not reflect the relative length of a given sequence; instead, they are of arbitrary but equal length. The number above each arrow corresponds to the identification number of that ORF in the genome sequence of that haloarchaeon.
Of the other H. volcanii sequences considered in this study, no transcripts were detected in the cases of HVO_1519, HVO_1520, HVO_1524, HVO_1525, and ORF 7. The reannotation efforts of the present study explain, in some instances, why this was the case. HVO_1519 was reannotated to encompass ORFs 2 and 3, while HVO_1525 was reannotated to encompass ORFs 8 and 9. All of these novel ORFs were transcribed. Visual inspection revealed that HVO_1524 and ORF 7 in fact contain ManHv2, which was transcribed. No GFP-ManHv2 protein product was detected, however. Similarly, despite the reannotation of HVO_1521 as the longer ManHv1 (both of which were transcribed), no GFP-tagged ManHv1 protein could be generated. Moreover, no protein products were translated from HVO_1518, HVO_1521, ORF 2, ORF 3 (or the highly similar ORF 8), ORF 9, or HVO_1526, despite the observed transcription of these sequences. It is thus possible that appropriate conditions for the transcription and/or expression of at least some of the sequences listed above were not realized in the present study. Therefore, gene products for some of the sequences could appear under suitable growth conditions. Indeed, differential expression of proteins involved in mediating H. volcanii N glycosylation could be related to the proposed adaptive role of this posttranslational modification in this organism (1, 23). Of course, the possibility remains that the set of sequences for which no transcription was obtained also includes ORFs that do not correspond to true genes.
Within the H. volcanii agl gene cluster, several sequences are cotranscribed. In addition to a previous report showing aglF and aglI to be simultaneously transcribed (22), the present study reveals cotranscription of at least two additional pairs of genes. A single transcript encoding both HVO_1522 (ORF 4) and ORF 5 was observed, while in the case of aglE and ManHv2, a single transcript corresponding to regions extending some 100 bp into each partner sequence, together with the stretch linking these sequences, could be generated. Hence, given their proximities to genes known to participate in N glycosylation, BLAST-predicted glycosylation-related roles in the case of HVO_1522 (ORF 4) and ManHv2 (Table 2), and, in the case of ManHv2, cotranscription with aglE, a known N-glycosylation pathway component (3), it is reasonable to assume that HVO_1522 (ORF 4), ORF 5, and ManHv2 also participate in protein N glycosylation. Accordingly, HVO_1522 (ORF 4) is now renamed aglP, ORF 5 is now renamed aglQ, and ManHv2 is now renamed aglR (Fig. 5). In keeping with this reannotation, mass spectrometry studies have verified the participation of AglP and AglQ in S-layer glycoprotein N glycosylation (S. Yurist-Doutsch et al., unpublished data).
FIG. 5. Revised version of the H. volcanii agl gene cluster. Numbers along the horizontal bar denote nucleotide positions in the genome. Proven or putative N-glycosylation genes are termed agl (archaeal glycosylation) sequences. For genes not assigned roles in N glycosylation, proposed functions are given below each sequence.
In conclusion, genes involved in most of the steps of H. volcanii N glycosylation have now been identified. Future efforts can now determine the precise roles of products of these genes, including the newly delineated sequences reported here, in H. volcanii N glycosylation.
J.E. is supported by grants from the Israel Science Foundation (grant 30/07) and the U.S. Air Force Office for Scientific Research (grant FA9550-07-10057). S.Y.-D. is the recipient of a Negev-Faran Associates Scholarship.
Published ahead of print on 27 February 2009.
Supplemental material for this article is available at http://jb.asm.org/.