Journal of Bacteriology, June 2007, p. 4510-4519, Vol. 189, No. 12
0021-9193/07/$08.00+0 doi:10.1128/JB.01896-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
M. Gaillard,1,
D. Flament,1*
K. Rouault,1,2
M. Le Romancer,1
D. Prieur,1 and
G. Erauso1
Laboratoire de Microbiologie des Environnements Extrêmes, IUEM, UBO, CNRS, IFREMER, UMR 6197, Technopôle Brest-Iroise, Place Nicolas Copernic, 29280 Plouzané, France,1 Laboratoire de Génétique Moléculaire et Génétique Épidémiologique, INSERM U613, Brest 29220, France2
Received 15 December 2006/ Accepted 12 April 2007
The vast majority of the hyperthermophilic viruses were isolated from the Crenarchaeota phylum and found to infect members of the genera Sulfolobus, Thermoproteus, Acidianus, and Pyrobaculum. The unusual morphological and genomic properties of these viruses have led to the definition of seven new families (Fuselloviridae, Lipothrixviridae, Rudiviridae, Guttaviridae, Globuloviridae, Bicaudaviridae, and Ampullaviridae) encompassing the 20 representatives identified to date (3, 4, 8, 22-25, 28, 34, 43, 56, 59). They all carry double-stranded DNA (dsDNA) genomes, either circular or linear ranging from 15 to 75 kbp in size. Sequence similarities between genes of the different crenarchaeal viral families are generally limited, and most predicted genes have homologues only in other members of the same family (46).
In the euryarchaeal phylum, which mainly includes extreme halophiles, methanogens, and hyperthermophilic sulfur reducers represented by the Thermococcales order, almost all the viruses described were isolated from mesophilic hosts. Most of the euryarchaeal viruses possess a linear dsDNA genome of 14.9 to 230 kbp (47, 51) with two exceptions: the A3 virus-like particle (VLP), isolated from Methanococcus voltae, has a circular dsDNA genome (23 kb) and can integrate into the host chromosome (like fuselloviruses) (61), and φCh1, isolated from Natrialba magadii, strikingly contains RNA in addition to DNA and may also integrate into the host chromosome (60). The majority of the characterized viruses display the "classical" head and tail morphology and are distributed between the two well-known families Myoviridae and Siphoviridae, which are mostly represented in the Bacteria domain. Four viruses, His1, His2, and SH1 isolated from Haloarcula hispanica and the particle A3 VLP, however, are exceptions. SH1 is a polyhedral virus, A3 VLP is oblate, and His1 and His2 are lemon-shaped viruses (6, 15, 42, 61). Direct electron microscopy observations of hypersaline waters have shown that lemon-shaped and round VLPs are not an exception but the predominant morphotypes in this particular environment, while head and tail particles are less common but represent the majority of reported haloviruses (15, 38).
Lemon-shaped viruses are widespread and have been isolated from a broad host range in both archaeal phyla. The best-studied lemon-shaped viruses were isolated from the hyperthermophilic genus Sulfolobus of the Crenarchaeota phylum. Sulfolobus spindle-shaped viruses all have a circular dsDNA genome of about 15 kbp with approximately 34 open reading frames (ORFs). Their genomes integrate into the host tRNA genes. These Fusellovirus ORFs show little or no similarity to genes in the public databases. In contrast, 18 ORFs are common to all and may represent the minimal set defining this viral group (59).
Only one VLP has been described so far from hyperthermophilic euryarchaeotes. PAV1, a lemon-shaped VLP (120 nm x 80 nm), was isolated from Pyrococcus abyssi strain GE23, a deep-sea isolate previously described in our laboratory (18). We found that host cells spontaneously release few PAV1 particles without lysis in the growth cycle with a maximum reached in stationary phase. PAV1 contains a circular dsDNA of 18 kb which is present at a high copy number and in a "plasmidic" form within the host cytoplasm (no chromosomal integration detected) (20).
The resemblance of PAV1 and His1 viruses to spindle-shaped virus SSV1 in morphology and genome size first led to the proposal that His1 and PAV1 be included in the Fuselloviridae family. The isolation and characteristics of a second spindle-shaped halovirus, His2, showed that His1 and His2 are distantly related to each other but are not related to members of the Fuselloviridae. Analysis showed that these viruses have a lytic life cycle; their linear genomes replicate by a protein-primed DNA synthesis and encode a DNA polymerase. All these features differ fundamentally from those of the fuselloviruses, which led to the classification of His1 and His2 into the genus Salterprovirus (6). How should PAV1 be classified? Here, we present the results of analysis of the complete double-stranded sequence of PAV1 genome with its transcriptional map. We show that the PAV1 genome notably shares little similarity with those of other archaeal viruses. We discuss the putative functions of proteins encoded by its genome, and finally we propose that PAV1 be the first member of a new virus genus or family.
To determine the cccDNA copy number of PAV1 virus, total DNA from exponentially growing cultures of the Pyrococcus abyssi GE23 host strain was cleaved successively with HindIII and NaeI. Appropriate dilutions of the digested DNA were run on a 0.8% agarose gel and transferred to a nylon membrane (Hybond N+; Amersham). Southern hybridization was performed with an equimolar mixture of two fluorescein-labeled probes, a 1.7-kbp HindIII fragment of the PAV1 genome and a 0.9-kbp NaeI cloned fragment of the 16S rRNA gene of strain GE23. The probes were labeled using ECF random-prime labeling kit (Amersham). Hybridization and detection were carried out following the ECF system procedure. Hybridization signals were recorded and quantified using a Typhoon scan imager (Amersham). PAV1 cccDNA copy number was calculated by measuring the average ratio of the PAV1-specific signal to the 16S rRNA gene signal corrected for the fragment length difference.
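The copy number calculation described above can be sketched as follows. This is only an illustration, assuming that each hybridization signal scales linearly with the length of the hybridizing fragment and that the host chromosome carries a known number of 16S rRNA gene copies (set to 1 here as an assumption). The function names, parameter names, and example signal values are hypothetical and are not taken from the study.

```python
def copies_per_chromosome(virus_signal, reference_signal,
                          virus_fragment_bp=1700, reference_fragment_bp=900,
                          reference_copies_per_chromosome=1):
    """Estimate viral genome copies per host chromosome from two band signals.

    Each signal is divided by the length of its hybridizing fragment so that
    the ratio compares molecules rather than total hybridizing DNA.
    """
    if virus_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    virus_per_bp = virus_signal / virus_fragment_bp
    reference_per_bp = reference_signal / reference_fragment_bp
    return reference_copies_per_chromosome * virus_per_bp / reference_per_bp


def mean_copy_number(signal_pairs, **kwargs):
    """Average the estimate over several (virus_signal, reference_signal) pairs,
    for example one pair per dilution of the digested DNA."""
    ratios = [copies_per_chromosome(v, r, **kwargs) for v, r in signal_pairs]
    return sum(ratios) / len(ratios)


# Hypothetical signal values in arbitrary units, for illustration only.
example = mean_copy_number([(102000, 900), (51000, 900)])
```

Averaging over several dilutions, as in the last line, mirrors the "average ratio" mentioned in the text; whether the original quantification weighted the dilutions differently is not stated.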
Detection of putative single-stranded DNA (ssDNA) intermediates of replication was carried out as previously described (17). The different forms present in the native PAV1 cccDNA preparation were separated by electrophoresis on 0.8% agarose gels containing 0.5 µg/ml of ethidium bromide. To detect ssDNA only, DNA in the gel was directly transferred in 10x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate) to a nylon membrane (Hybond N; Amersham), omitting the denaturation and neutralization steps of the standard protocol (49). Other gels were transferred under standard denaturing conditions. After transfer, DNA was immobilized on the membranes by UV cross-linking. RNA probes specific to each of the PAV1 DNA strands were prepared. For that purpose, a PCR product of 1.1 kb of the PAV1 genome was cloned into the pGEM-T vector (Promega); the cloning site is flanked by a T7 RNA polymerase promoter on one side and an SP6 RNA polymerase promoter on the other side. The plasmid linearized by cutting either with SalI (for the T7 promoter) or SphI (for the SP6 promoter) was used as a template for runoff transcription using either the T7 or SP6 RNA polymerase and the digoxigenin-labeled UTP ribonucleotide mix (Roche) for probe labeling. Hybridization and detection were carried out following the digoxigenin-labeled UTP procedure according to the manufacturer's instructions (Roche).
Determination of the nucleotide sequence. Purified PAV1 cccDNA was completely digested with HindIII and BamHI, and all the fragments obtained were cloned in the corresponding sites of pUC28 to obtain an overlapping clone library of the PAV1 genome. Sequencing reactions were carried out with the BigDye terminator kit (Applied Biosystems) and analyzed on an ABI PRISM 377 automatic sequencer (Applied Biosystems). Each insert was sequenced from both ends using the M13 forward and M13 reverse universal primers. Gaps in the sequence were filled by using specific primers either directly for sequencing on library clones or to sequence PCR amplicons obtained with PAV1 cccDNA as the template. The sequences were trimmed and assembled using the SeqMan II program (Lasergene, Inc., Madison, WI) with both strands completely sequenced and with a minimum threefold coverage.
Sequence analysis and annotation. GLIMMER (14) and RBS finder (53) were used to find ORFs. Each ORF was submitted to sequence similarity searches (BLASTN, BLASTP, and BLASTX) against the NCBI nonredundant protein and nucleic acid databases (October 2006). ORFs were also analyzed with the library of hidden Markov models pfam version 14.0 with the HMMER package (16) and by the fold recognition method GenTHREADER (36). Membrane-spanning regions in ORFs were predicted by the TMHMM program (30), and protein membrane topology was predicted using the TMpred program (26). Cumulative GC skew analysis was performed using GC Skew Tool (http://bioinformatics.upmc.edu/SKEW/index.html). The skew was calculated according to the formula (G − C)/(G + C), using a sliding window of 20 nucleotides (nt). The helical stability was computed with the nearest-neighbor thermodynamics algorithm (27) using the web-based program WEB-THERMODYN (http://wings.buffalo.edu/gsa/dna/dk/WEBTHERMODYN/). Finally, the similarity between sequences was calculated using the GAP program (Wisconsin package version 10.3, Accelrys).
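For readers who wish to reproduce the skew calculation, a minimal sketch of a cumulative GC skew is given here. It is not the GC Skew Tool itself: the function names are hypothetical, and the step between windows (equal to the window size by default, i.e., non-overlapping windows) is an assumption, since the step used by the original tool is not stated.

```python
from itertools import accumulate


def gc_skew_windows(seq, window=20, step=None):
    """Return (G - C)/(G + C) for successive windows along a sequence.

    step defaults to the window size (non-overlapping windows); this is an
    assumption, not a documented setting of the tool used in the study.
    """
    seq = seq.upper()
    step = window if step is None else step
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # A window without G or C carries no skew information.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_gc_skew(seq, window=20, step=None):
    """Running sum of the per-window skew values."""
    return list(accumulate(gc_skew_windows(seq, window, step)))
```

In a cumulative plot of this kind, changes in the direction of the curve mark positions where the base composition bias switches strand, which is why such inflections are examined as candidate replication origins or termini.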
RT-PCR. Total RNA was isolated using the TRIzol reagent and the procedure described by Invitrogen from 100-ml batch cultures of P. abyssi strain GE23 arrested in the late exponential phase (3 x 10⁸ cells ml⁻¹). RNA pellets (50 to 80 µg) were resuspended in 50 µl of 5 mM Tris-HCl (pH 8.0). The concentration and purity of the RNA were determined by measuring the absorbance at 260 and 280 nm, and the RNA quality was estimated by electrophoresis of an aliquot on a 1.2% agarose gel under denaturing conditions (48). Contaminant DNA was removed by treatment with RNase-free DNase (Promega) according to the manufacturer's instructions. After deproteinization by phenol-chloroform extraction, RNA in the aqueous phase was ethanol precipitated, washed with 75% ethanol, air dried, and finally resuspended in 10 mM Tris-HCl (pH 8.0). Reverse transcription-PCR (RT-PCR) was performed by using the Superscript RT-PCR kit (Invitrogen). Aliquots of 1 µg RNA were used for each RT-PCR in a 50-µl reaction volume containing 0.5 U of RNase inhibitor (Promega) and the appropriate primers at a final concentration of 0.2 µM. The following program was run on a PCR machine: cDNA synthesis at 48°C for 25 min, predenaturation at 94°C for 3 min, and then 30 cycles consisting of denaturation at 94°C for 30 s, annealing at 48°C for 30 s, and elongation at 70°C for 1 min. This was followed by one final extension step at 70°C for 7 min. Negative controls were obtained by replacing the reverse transcriptase/Taq mixture with the same unit amount of Taq. The products were separated on a 1.5% agarose gel.
A list of primers used for mapping transcripts and agarose gels showing the results of RT-PCR assays are given in the supplementary material.
Isolation and identification of a major protein of PAV1. PAV1 VLPs were purified as previously described (20). P. abyssi strains GE23 and GE9 (closely related to GE23 but VLP-free) were used to distinguish residual cellular proteins from viral proteins. Cultures of the two strains were centrifuged at 6°C for 15 min at 3,000 x g. The supernatant was filtered through Acrodisc PF 0.8/0.2-µm filters (Pall Gelman Laboratory) and concentrated by another centrifugation at 6°C for 3 h at 100,000 x g. The pellet was resuspended in 50 µl of TE buffer (10 mM Tris-HCl, 0.1 mM EDTA [pH 8]). Equal volumes of sample and loading buffer (187.5 mM Tris-HCl [pH 6.8], 30% glycerol, 6% sodium dodecyl sulfate [SDS], 15% β-mercaptoethanol, 0.15% bromophenol blue) were mixed in microtubes and boiled at 95°C for 4 min before loading on 15% acrylamide-SDS gels. Proteins in the sample were analyzed by the method of Sambrook et al. (48). Gels were stained with Coomassie blue. The major band was excised, and the N-terminal region of the protein in the band was sequenced by the Edman technique using the facilities of the genomic platform at INRA, Jouy-en-Josas, France.
Nucleotide accession number. The complete PAV1 genome sequence was deposited in GenBank under accession number EF071488.
Sequence analysis in the six possible frames allowed us to identify 25 putative ORFs encoding at least 50 amino acids which cover 95% of the total sequence. The positions of the ORFs are indicated in Fig. 1, and their main features are listed in Table 1. The majority of the putative genes (22 ORFs) have the same orientation, and only three are present on the complementary strand. The average G+C content of the coding regions is 46%, which is similar to the overall G+C contents of the PAV1 genome (47.15%) and host genome (42.8%).
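The idea of a six-frame scan with a 50-amino-acid cutoff can be illustrated with a deliberately naive sketch. It is not GLIMMER and does not reproduce the annotation reported here: it treats the sequence as linear (the PAV1 genome is circular), accepts only ATG as a start codon, and ignores ribosome binding sites. All names in the code are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50):
    """Scan the six reading frames for ATG...stop ORFs of at least min_aa codons.

    Returns tuples (strand, frame, start, end, n_aa). Coordinates are 0-based
    and end-exclusive on the scanned strand; the end includes the stop codon,
    and n_aa counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3
                    if n_aa >= min_aa:
                        orfs.append((strand, frame, start, i + 3, n_aa))
                    start = None
    return orfs
```

A gene finder such as GLIMMER additionally scores coding potential, which is why a scan of this kind typically reports more candidate ORFs than are retained in a curated annotation.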
FIG. 1. PAV1 genome map. Predicted genes are represented by thick arrows; light gray shading indicates ORFs with no similarity and no assigned function, and dark gray shading indicates either conserved hypothetical ORFs or ORFs with a hypothetical function; hatching indicates ORFs encoding a putative membrane-associated protein. Transcript locations mapped by using RT-PCR (T1 to T6) are shown by arrows inside the circular genome map. The approximate location of the origin (Ori) of replication as predicted by cumulative GC skew is also indicated. Protein motif or domain names: Lam G, laminin G; Leu zip, leucine zipper; P-loop, nucleoside triphosphate binding site or Walker motif A; wHTH, winged helix or winged helix-turn-helix.
TABLE 1. General features of the predicted genes (ORFs and operons) of PAV1 virus
Hypothetical origin of replication. In an attempt to identify the origin of replication, a cumulative GC skew analysis was performed on the PAV1 genome (Fig. 2A). Although the diagram reveals a strand asymmetry between the leading and lagging strands, it does not allow us to identify a clear origin of replication. Yet, two inflections can be observed, indicating a change in base composition bias. The inflection between positions 16000 and 17386 bp corresponds to the end of the putative membrane protein operon (see below); the genes in this region are tightly packed, and no repeated domains could be detected. In contrast, the DNA sequence between positions 17386 and 200 bp contains the two largest intergenic regions, 140 bp and 111 bp, respectively. Furthermore, helical stability analysis indicates that positions 17987 to 1 correspond to the lowest helical stability region, suggesting that this region contains a DNA unwinding element (DUE). It has been shown that this element corresponds to the DNA sequence that first unwinds during the initiation of genome replication in different species as well as in certain viruses and phages (27). In particular, two DUEs have also been identified within the oriC region that corresponds to the origin of replication of P. abyssi (35). In addition, this portion of the DNA sequence also contains two large inverted repeats, as well as six copies of an irregular 12- to 16-bp repeat that has the ability to form a stem-loop structure (Fig. 2B).
FIG. 2. Structure of the putative DNA replication origin region. (A) Cumulative GC skew (window size of 20 bp). (B) Identification of a DNA unwinding element (DUE) and nucleotide repeats (indicated by arrows). The longer inverted repeats are indicated by the thick black arrows.
Sequence similarities of predicted proteins from PAV1. Half of the ORFs (12 of 25) have been predicted to have transmembrane regions.
To assign hypothetical functions to the predicted proteins of PAV1, their amino acid sequences were compared to those in the public sequence databases (see Materials and Methods). Sixty-five percent of the predicted proteins had no significant similarity with any protein sequences stored in the public databases. Among them, nine ORFs have been predicted to have membrane-spanning regions.
ORF 59 encodes a small protein of 59 amino acids. Multiple sequence alignments of a 41-amino-acid domain of ORF 59 with various members of the CopG family, as well as secondary structure prediction, suggest that the ORF 59 gene product has a ribbon-helix-helix (RHH) arrangement (E value of 3.6e-05). The RHH domain proteins are known to be transcription regulators.
ORF 180a encodes a putative protein of 22.1 kDa that is 31% identical to ORF 181 of plasmid pRT1 from Pyrococcus sp. strain JT1 (E value of 4e-17) (58). It was found that ORF 181 has a level of identity of approximately 50%, on a 35-amino-acid region at the C terminus, with ORF 80 of the Sulfolobus plasmid family pRN (32).
The overall organization of the gene products ORF 676 and ORF 678 is similar. The predicted peptides are composed of a typical signal peptide and two transmembrane segments at the carboxy-terminal regions. The prediction of the membrane topology suggests that both peptides are exposed to the surface of the enveloped virus and anchored to the membrane by the C-terminal transmembrane regions.
After two iterations of PSI-BLAST search, ORFs 676 and 678 show significant similarities with very large proteins that have not been assigned a clear function but seem to be involved in adhesion. This set mainly includes VCBS protein (IPR 010221) and laminin G-like jellyroll fold (LamGL) containing proteins (IPR 006558). This result is in accordance with the fact that ORF 676 and ORF 678 contain one and two occurrences, respectively, of the LamGL domain (Fig. 3). ORF 678 also presents similarities with ORF 175 of the bacteriophage S-PM2, a T4-type bacteriophage that infects the marine photosynthetic bacteria Synechococcus spp. (33). ORF 175 of S-PM2 also contains two occurrences of sequence similar to the LamGL domain. This domain is a member of the concanavalin-like lectin/glucanase superfamily. ORF 175 has been assigned a putative function in host recognition on the basis of both the primary structure of the protein and the genomic environment of the encoding gene (33).
FIG. 3. ORF 676 and ORF 678 possess domains related to laminin G-like jellyroll fold. Amino acid sequence alignment of the internal repeats of ORFs 676 and 678 with various members of the concanavalin A/glucanase structural family (human serum amyloid P component [1sac_A], hypothetical protein from bacteriophage S-PM2 [gi 58532986], VCBS from Pelodictyon luteolum [gi 78186255], hypothetical protein from Rhodopirellula baltica [gi 32471540], VCBS from Prosthecochloris vibrioformis [gi 71481241], and S-layer protein from Clostridium thermocellum [gi 67915998]) are shown. The numbers in brackets indicate the positions in the sequences. The secondary elements of the human serum amyloid P component are displayed above the alignment.
The putative gene product of ORF 375 presents a predicted ATP binding site of the canonical form (GX4GK[T/S]) and is thought to be an ATP/GTP binding protein.
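The canonical motif quoted above, GX4GK[T/S] (glycine, any four residues, glycine, lysine, then threonine or serine), translates directly into a regular expression. The following sketch shows how a protein sequence could be screened for it; the function name and the test peptide are hypothetical, and a match alone does not establish nucleotide binding.

```python
import re

# G, any four residues, G, K, then T or S. The lookahead lets overlapping
# occurrences be reported as well.
WALKER_A = re.compile(r"(?=(G[A-Z]{4}GK[TS]))")


def find_walker_a(protein):
    """Return (1-based start position, matched residues) for each motif hit."""
    protein = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(protein)]


# Hypothetical peptide used only to illustrate the call.
hits = find_walker_a("MKGAAAAGKTLV")
```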
ORF 153 had no sequence similarity to sequences in public databases; however, it displays a conspicuous domain organization and location. It is indeed located at the putative origin and contains a coiled-coil domain (positions 13 to 42) followed by a winged helix DNA binding domain (positions 86 to 148; Table 1). The alpha-helical representation of the N-terminal coiled-coil domain shows a clear distribution of hydrophobic residues (L, I, and V) in a leucine zipper region (positions 13 to 42). This domain organization might indicate that the leucine zipper region could pack into a coiled coil to form a homodimer and that the C-terminal region could have DNA binding properties via the winged helix DNA binding domain.
ORF 528 also contains a "winged helix" DNA binding domain (positions 352 to 414).
Polycistronic mRNA analysis and potential transcription signals. A transcription map was made using RT-PCR assays. Primers were designed on two colinear genes, so that cotranscribed ORFs could be amplified. This procedure was repeated step by step all along the genome, and the resulting transcript map is shown in Fig. 1.
By this approach, we found that all predicted genes of PAV1 were transcribed in six mRNAs whose approximate sizes (T1 [380 nt], T2 [2,530 nt], T3 [580 nt], T4 [440 nt], T5 [13,860 nt], and T6 [470 nt]) were estimated by the identification of potential transcriptional signals upstream and downstream of the putative transcription unit (see below) (Table 1). The largest one, T5, is a polycistronic messenger which covers about 75% of the entire genome and 16 ORFs. Most of these ORFs have been predicted to have membrane-spanning regions, suggesting that this large operon might encode polypeptides inserted into the membrane of the enveloped particle. T1 corresponds to ORF 59 which carries the RHH motif (CopG) and to ORF 52a located just downstream of the putative replication origin. T2 is composed of three ORFs: 87, 52b, and 528, which contains a "winged helix" DNA binding domain. T3 contains only ORF 180a. T4 and T6 are in the opposite direction compared to other transcripts. T4 overlaps ORFs 82 and 62. T6 mRNA corresponds to ORF 153, which is located at the putative origin and contains a leucine zipper motif and "winged helix" DNA binding domain. Sequences resembling the consensus promoter signal of Pyrococcus genes (A/G)AAAT(A/T)(A/T)(A/T)A were found in front of all transcripts, although the two putative promoters corresponding to the transcripts on the complementary strand (T4 and T6) loosely fit the consensus. Most of these putative promoters are centered 35 to 40 nt upstream of the start codon of the first gene in the operon, as previously observed for Pyrococcus genes. CT-rich sequence stretches, typical of archaeal terminators, were also found at the end of most of the transcripts (Table 1).
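A search for sequences matching the Pyrococcus promoter consensus (A/G)AAAT(A/T)(A/T)(A/T)A on both strands can be sketched as follows. This is an exact-match scan only; the loosely fitting promoters mentioned for T4 and T6 would require a mismatch-tolerant search, which is not shown. The function names are hypothetical.

```python
import re

# (A/G)AAAT(A/T)(A/T)(A/T)A written as a regular expression; the lookahead
# allows overlapping matches to be reported.
PROMOTER = re.compile(r"(?=([AG]AAAT[AT][AT][AT]A))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_promoter_like(seq):
    """Return sorted (position, strand, motif) tuples for exact consensus matches.

    position is the 1-based coordinate, on the given (top) strand, of the
    leftmost base covered by the match; motif is read 5' to 3' on the strand
    where it was found.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start() + 1, "+", m.group(1)) for m in PROMOTER.finditer(seq)]
    for m in PROMOTER.finditer(reverse_complement(seq)):
        motif = m.group(1)
        left = n - (m.start() + len(motif)) + 1  # map back to top-strand coordinates
        hits.append((left, "-", motif))
    return sorted(hits)
```

Restricting the scan to the region upstream of each transcript start, rather than the whole genome, would limit chance matches in an A+T-rich sequence.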
Experimental identification of the most abundant protein of PAV1. In a previous study, three major proteins (6, 13, and 36 kDa) were observed by SDS-polyacrylamide gel electrophoresis of purified VLPs (20). These proteins were visible by silver staining but in quantities too low to permit identification by matrix-assisted laser desorption ionization-time of flight analysis. Novel preparations were performed and have shown only the 13-kDa band after staining with Coomassie blue (not shown). The 13-kDa band was cut off the gel for microsequencing. Its N-terminal sequence (MMDALEDV) was found to correspond to part of the N-terminal region of the deduced amino acid sequence of ORF 121 of the PAV1 genome. The calculated size of ORF 121, 13.4 kDa, fits well with that of the major protein observed by SDS-polyacrylamide gel electrophoresis. However, the predicted peptide extends 26 amino acid residues upstream of the N terminus determined experimentally on the 13-kDa protein.
Altogether, the structure of the largest intergenic regions, the identification of a DUE, and the presence of an ORF similar to an ORF of a member of the CopG family downstream of the DUE support the hypothesis that a replication origin is located between ORF 153 and ORF 59. Despite the presence of a CopG homologue, a copy number regulator frequently found in rolling-circle (RC) replicons, and the general trend of the GC skew graph, we showed that PAV1 replication most likely does not progress via the RC mechanism. Moreover, as most of the ORFs are oriented in the same direction, it is tempting to hypothesize that replication is unidirectional and proceeds by strand displacement or by a theta mode mechanism.
Half of the ORFs (12 of 25) have been predicted to have transmembrane regions. This proportion is comparable to the content of membrane proteins predicted for the Sulfolobus shibatae SSV1 virus, where 11 out of the 32 ORFs are potentially membrane proteins (40). Strikingly, the genomic distributions of the putative membrane proteins are similar for the PAV1 and SSV1 genomes. In both genomes, most of the predicted genes encoding membrane proteins are clustered on one half of the genome. In the case of PAV1, all but two of these predicted genes are part of the larger transcript T5, including ORF 121, encoding the major VLP protein of 13 kDa, which is the first gene of the operon. Such organization suggests that the T5 transcript encodes all the structural proteins of PAV1.
As found in other sequenced genomes of archaeal viruses, a majority of the ORFs in the PAV1 genome do not have a known function. Only six of the predicted ORFs have been assigned a putative function.
A small protein (ORF 59) contains a domain similar to the RHH motif. This domain is found in CopG proteins (21). CopG proteins have been reported to be responsible for the regulation of RC plasmid copy number. CopG binds to the cop-rep promoter and controls synthesis of the plasmid replication initiator protein RepB (1). However, the search for sequence similarities failed to identify any ORFs homologous to the RepB protein in the genome of PAV1. The RHH domain was also found in the Arc repressor from Salmonella bacteriophage P22 (12) and in the methionine repressor MetJ (52). A recent study showed that the most common gene products in crenarchaeal viruses are small proteins containing the RHH domain (47). The authors of that study noted that no RHH domain proteins were detected in the available genomes of euryarchaeal viruses infecting mesophilic or moderately thermophilic hosts, suggesting that these small and compact proteins are particularly proficient for transcription regulation in hyperthermophiles. Our study seems to confirm this hypothesis, since PAV1 was isolated from a hyperthermophilic euryarchaeote.
The genome encodes two other proteins with DNA binding domains, ORF 153 and ORF 528. Both proteins display a winged helix DNA binding domain. Many different proteins with diverse biological functions possess this domain, including hypothetical transcriptional factors, such as ORF 93 of the Fusellovirus SSV1 (29). In addition, the domain organization of ORF 153 is similar to that of the basic region-leucine zipper (b/Zip) family of eukaryotic transcription factors (57). Therefore, it is tempting to hypothesize that the products of these genes could be involved in transcription regulation.
A putative protein of 22.1 kDa (ORF 180a) is 31% identical to ORF 181 of plasmid pRT1 from Pyrococcus sp. strain JT1 (58). It has been shown that ORF 181 is 50% identical to ORF 80 of the Sulfolobus plasmid family pRN in a 35-amino-acid region at the C terminus. The ubiquity and high degree of sequence conservation of ORF 80 proteins in Sulfolobales plasmids suggest they have an important function, although their precise physiological role remains obscure (31, 32). Yet, it has been shown that these proteins are sequence-specific dsDNA binding proteins and that the basic C-terminal portions of these proteins are involved in DNA binding. The binding sites of ORF 80 members have been defined and consist of two palindromic sequences separated by 65 bp (31). Whereas the canonical TTAAN7TTAA motif was not identified upstream of the ORF 180a gene, it is worth noting that two 14-bp inverted repeats of TATAACCAAAATTG with about the same spacing (68 bp) were present in the region upstream of the gene (positions 2554 to 2567 and 2625 to 2638), suggesting that this domain of ORF 180a could participate in the binding of this protein to DNA to achieve structural or regulatory functions.
ORF 676 and ORF 678 contain one and two occurrences, respectively, of a LamGL domain that belongs to the structural superfamily of concanavalin A-like lectin/glucanase. In addition, they display similarities to large proteins likely involved in adhesion that also contain several occurrences of this LamGL domain.
The taxonomy report of the BLAST search displays a curious distribution that is similar to the species distribution of the LamGL domain, as already reported in the SMART database (accession number SM00560). Indeed, the significantly similar sequences of ORFs 676 and 678 are found in prokaryotes and metazoans. In addition, the vast majority of the prokaryotic species have been isolated from marine or aquatic environments.
The LamGL domain displays binding activity to complex carbohydrates, either in the context of storage and transport of carbohydrates (50), catalysis of glucans (37), or cell recognition and adhesion (13, 55). In particular, this domain is present, as a pair or in a single module, in numerous extracellular matrix proteins in eukaryotes, where it has been shown to be involved in interaction with extracellular sulfate ligands like heparin (54). In a comparative study of sulfated polysaccharides from marine angiosperms, Aquino et al. (2) suggest that a convergent adaptation due to environmental pressure may explain the occurrence of high concentrations of sulfated polysaccharides in many marine organisms; that is, the occurrence of sulfated polysaccharides appears to be an adaptation to marine life. It is thus tempting to hypothesize that the LamGL domain of ORFs 676 and 678 could also be involved in interaction with sulfated carbohydrates in the aquatic environment. In this context, the species distribution of sequences similar to ORFs 676 and 678 could reflect an adaptation of this domain to the binding of sulfated ligands that are abundant both in the marine environment and in the extracellular matrix of metazoans.
Up to now, very little is known concerning the composition and modification of the surface layer proteins from Thermococcales species; however, it has been reported that S-layer proteins of hyperthermophilic archaea have more charged residues than their mesophilic counterparts and that S-layer glycoproteins in archaeal halobacteria contain sulfated glucuronic acid residues (10). Altogether, this suggests that ORFs 676 and 678 are exposed to the surface of the enveloped virion and might be implicated in host recognition and attachment via the binding capacity of the LamGL domain to sulfated glycoproteins exposed to the surface of P. abyssi.
This hypothesis seems to be corroborated by the fact that ORF 678 presents similarities with ORF 175 of the bacteriophage S-PM2, which also contains two occurrences similar to the concanavalin A-like domain (33). The genome of S-PM2, a T4-type bacteriophage that infects marine photosynthetic bacteria Synechococcus spp., was analyzed. This analysis revealed that ORF 175 is part of a cluster potentially involved in recognition and attachment of virus to its host.
The adhesion activity of ORF 676 and ORF 678 and the composition of the S-layer of the host remain to be determined in order to better understand virus-host interactions at the cellular level.
Transcription analysis performed by an RT-PCR method suggests that all predicted genes are actually transcribed in five polycistronic mRNAs (T1 to T5) and a small monocistronic mRNA (T6). The longest transcript, T5, is remarkably long, as it covers about 75% of the genome. As generally observed in other archaeal viral genomes, the PAV1 promoters identified in the upstream sequence of each transcript resemble those of its host, P. abyssi, carrying a TATA-like box and a transcription factor B-responsive element. Most of the predicted PAV1 genes harbor a typical Shine-Dalgarno motif (in particular all the genes carried by the largest transcript T5) as previously observed in Pyrococcus genomes. However, at this point, several precautions must be taken concerning the interpretation of these results. The method we used (RT-PCR) did not allow estimation of the relative abundance of the different transcripts or detection of shorter transcripts that terminated before the predicted terminator signal or that started from internal promoters. Therefore, we cannot rule out the possibilities that additional transcripts may exist and that the longest version of the T5 transcript represents only a minor fraction of the mRNA produced. Classical Northern experiments are commonly used to estimate both the size and relative abundance of major transcripts but are not reliable for detecting transcripts of low abundance (7). Indeed, initial experiments to detect PAV1 transcripts by Northern analysis gave very weak and nonreproducible hybridization signals (not shown), probably because of the very low level of PAV1 transcripts under the culture conditions tested. This relatively low abundance of PAV1 transcripts may be surprising when compared to the high copy number of "plasmidic" PAV1 DNA detected in the host cells (ca. 60 copies per host genome), but there is no indication that all these copies are actively transcribed at the same time.
Analysis of the protein composition of the PAV1 virion identified a major protein of 13 kDa. Its N-terminal sequence was compared with the whole viral sequence and found to correspond to ORF 121 of PAV1. The N-terminal sequence did not start at the predicted initiator methionine, which is located 26 residues upstream, suggesting either that the ORF 121 gene was not properly annotated or that the protein is produced as a preprotein that is further processed after being transported to the membrane. In support of the latter hypothesis, ORF 121 is predicted to have two membrane-spanning regions. In addition, the length of the putative signal peptide and the presence of an alanine at the -1 position are consistent with the features of archaeal signal peptides (5). Furthermore, some viral proteins were shown to be N-terminally processed, as in the methanophage ψM2 (41). Such proteolytic cleavage has already been shown for head proteins (9). In conclusion, we assume that the ORF 121 gene product constitutes the main protein of the PAV1 VLP coat, to which it is specifically sent and integrated after N-terminal processing. The ORF 121 protein shows no sequence similarity to VP1 or VP2, the coat proteins of fuselloviruses.
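The comparison described above, in which an experimentally determined N-terminal sequence is located within a predicted ORF product, can be sketched in a few lines of Python. This is only an illustrative sketch and not the procedure used in this study. The sequences are invented placeholders (they are not the ORF 121 product), and the function name `locate_n_terminus` is hypothetical. The sketch reports how many residues lie upstream of the sequenced N terminus and which residue occupies the position immediately before it.

```python
# Illustrative sketch only. All sequences are invented placeholders,
# NOT the real ORF 121 product; names are hypothetical.
HYPOTHETICAL_PRESEQUENCE = "MKKILSLVAIALLVGLSSPVAANAQA"  # placeholder leader ending in alanine
HYPOTHETICAL_MATURE = "TPEDLSKGWVNYTDE"                  # placeholder mature N-terminal region
HYPOTHETICAL_ORF = HYPOTHETICAL_PRESEQUENCE + HYPOTHETICAL_MATURE


def locate_n_terminus(orf_protein, sequenced_n_terminus):
    """Locate a sequenced N-terminal peptide inside a predicted ORF product.

    Returns None if the peptide is absent; otherwise a dict with the number
    of residues upstream of the peptide and the residue at the position
    immediately before it (None if the peptide starts the ORF product).
    """
    offset = orf_protein.find(sequenced_n_terminus)
    if offset < 0:
        return None
    residue_before = orf_protein[offset - 1] if offset > 0 else None
    return {
        "residues_upstream": offset,
        "residue_before_n_terminus": residue_before,
    }


result = locate_n_terminus(HYPOTHETICAL_ORF, "TPEDLSKGWV")
print(result)
```

A nonzero `residues_upstream` value is the kind of observation that raises the two alternatives discussed above, namely a misannotated start codon or N-terminal processing of a preprotein. The sketch itself cannot distinguish between them.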
In spite of its lemon shape, which could assign PAV1 to the Fuselloviridae family, the genomic properties of PAV1 most likely suggest instead that it represents the first archaeal member of a novel virus genus or family. Indeed, the genome of PAV1 displays unique features at the nucleotide and protein levels compared to the genomes of archaeal viruses indexed in the public databases. Thorough comparison with other archaeal viral genomes, in particular with those of the lemon-shaped viruses His1 and His2 from Haloarcula and the spindle-shaped viruses from Sulfolobus, did not reveal any sequence similarities to the PAV1 genome. The uniqueness of PAV1 is undoubtedly a consequence of its individual evolutionary history. PAV1 was isolated from a deep-sea hydrothermal vent, whereas fuselloviruses commonly inhabit acidic hot terrestrial springs. Consequently, the unique features of the PAV1 genome could be the result of its geographic isolation and an adaptation to the particular features of the hydrothermal environments. Indeed, given the extreme nature of hydrothermal environments, namely, high temperature, high hydrostatic pressure, and the specific microbial diversity (mainly methanogens, sulfate reducers, and sulfur reducers) in deep-sea vents, we hypothesize that there would be a severe barrier to gene flow from organisms, reflecting a long evolution of host and virus in a relatively closed gene pool.
Published ahead of print on 20 April 2007.
Supplemental material for this article may be found at http://jb.asm.org/.
Claire Geslin and Mélusine Gaillard contributed equally to this work.