Journal of Bacteriology, May 2004, p. 2810-2817, Vol. 186, No. 9
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.9.2810-2817.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
School of Veterinary Science,1 Department of Microbiology and Immunology, University of Melbourne, Victoria 3010,3 CSIRO Health Sciences and Nutrition and CRC for Diagnostics, Victoria 3052, Australia2
Received 13 December 2003/Accepted 30 January 2004
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
About 204 prokaryotic viruses have been fully sequenced (National Center for Biotechnology Information [NCBI] database, December 2003), and many are related (for example, the dairy bacteriophages and the lambdoid phages). Comparison of related viruses is important as it can shed light on their evolution and structure-function relationships, but to adequately assess the breadth of virus diversity and the degree of genetic exchange between viruses, novel viruses from across the two prokaryotic domains need to be isolated and sequenced (13). While thousands of bacteriophages are known, it is unfortunate that only a small number of archaeal viruses have been isolated, and few of these have had their genomes completely sequenced (20, 24, 33-35, 38). Most of these genome sequences are very different from each other (and from those of other viruses), but some are sufficiently close that comparisons can be made (e.g., viruses of Sulfolobus [36] and Methanobacterium [24]), and these have given some indication of how archaeal viruses have evolved.
The first haloarchaeal virus to be discovered, Hs1, was described in 1974 (41). Thirty years later, only about 15 haloviruses have been described, with most having head-tail morphologies and linear double-stranded DNA (dsDNA) genomes and being specific for Halobacterium salinarum (reviewed in references 12 and 48). Of these, the best studied at the molecular level include φH (see reference 37 and references therein), φCh1 (20), and HF2 (38). There have also been some elegant studies of halovirus ecology (7, 43, 44), the presence of high concentrations of virus-like particles in natural hypersaline waters (14, 31), and restriction systems (8).
Haloviruses HF1 and HF2 were isolated from the same pond at the same time, and although they were initially thought to be distinct, they have subsequently been found to be closely related (30). They have identical head-tail morphologies, contractile tails, and similar protein profiles by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, and their genomes showed a cross-hybridization of at least 80%. However, they differed in their sensitivity to inactivation by chloroform and have significantly different and nonoverlapping host ranges (29, 30). Unlike other haloviruses, HF1 has a very broad host range, including genetically tractable species such as Halobacterium salinarum and Haloferax volcanii. The genomes of both of these species are either complete (28) or near completion (C. Daniels, personal communication), and the combination of a fully sequenced halovirus with experimentally convenient hosts would provide a valuable system for the study of archaeal viruses and their genes. Here, we report the complete HF1 DNA sequence, an analysis of its predicted open reading frames (ORFs), a reappraisal of the previously published restriction map, and a comparison between the HF1 and HF2 genomes. This represents the first comparison of two related haloviruses for which both genome sequences are complete.
| MATERIALS AND METHODS |
|---|
Halovirus HF1 DNA isolation.
A 50-µl sample of filtered (0.22-µm-pore-size filter) virus lysate (10⁸ PFU/ml) was diluted 1/10 with water, and proteinase K and sodium dodecyl sulfate were added to final concentrations of 50 µg/ml and 0.1%, respectively. The sample was mixed, incubated at 37°C for 1 h, and then extracted once with phenol-chloroform-isoamyl alcohol (25:24:1, vol/vol) and a further one or two times with chloroform-isoamyl alcohol (24:1, vol/vol). After centrifugation to separate the phases, the aqueous layer was removed to a clean plastic tube. DNA was precipitated by adding sodium acetate (to 0.3 M) and ethanol (2 volumes) and incubating on ice for 15 min. The precipitate was collected by centrifugation (16,060 × g, 15 min, room temperature), washed with 70% ethanol (1 ml), dried under vacuum, and redissolved in 50 µl of pure water.
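The reagent additions above follow standard C1V1 = C2V2 dilution arithmetic, with the small complication that each added stock also increases the total volume. A minimal Python sketch that solves for all stock volumes at once; the stock concentrations used here (proteinase K at 20 mg/ml, SDS at 10%) are hypothetical, since the text gives only the final concentrations.

```python
def stock_volumes(sample_ul, additions):
    """Volumes of stock solutions to add so that, in the final mixture, each
    component reaches its final concentration (C1*V1 = C2*V2 on the final volume).

    additions: mapping name -> (stock_conc, final_conc), same units per entry.
    Returns (volumes in ul per stock, total volume in ul).
    """
    # Each stock occupies final/stock of the final volume, so
    # total = sample / (1 - sum(final / stock)).
    fraction = sum(final / stock for stock, final in additions.values())
    if fraction >= 1:
        raise ValueError("stocks are too dilute to reach the requested concentrations")
    total_ul = sample_ul / (1 - fraction)
    volumes = {name: final / stock * total_ul
               for name, (stock, final) in additions.items()}
    return volumes, total_ul


# 50 ul of lysate diluted 1/10 with water gives 500 ul.
diluted_ul = 50 * 10
# Hypothetical stocks: proteinase K 20 mg/ml (20,000 ug/ml) and SDS 10% (wt/vol).
vols, total = stock_volumes(diluted_ul, {
    "proteinase K": (20_000, 50),  # ug/ml
    "SDS": (10.0, 0.1),            # percent
})
for name, v in vols.items():
    print(f"{name}: {v:.2f} ul")
print(f"total: {total:.1f} ul")
```

Solving for the final volume directly, rather than adding each stock to the original 500 µl, keeps both final concentrations exact; at these dilutions the difference is only about 1%.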
Sequencing of HF1 genomic DNA. Short genomic fragments were amplified from HF1 DNA preparations (described above) by PCR (Hot-Star Taq kit; Qiagen, Hilden, Germany) under high-stringency annealing conditions. Many of the primers originally designed for HF2 sequencing could be used to amplify and sequence HF1 DNA fragments. Where this was not possible, primer walking was performed with newly designed primers. The primer sequences are available upon request. The quality of each PCR fragment was first examined by agarose gel electrophoresis and, if it was satisfactory, sequencing reactions were performed using the purified DNA fragments as templates and specific primers. Automated dsDNA sequencing was carried out on an Applied Biosystems model 373A DNA sequencing system, using a PRISM terminator cycle sequencing kit, according to the manufacturer's instructions. The raw sequences were trimmed and assembled, and errors were corrected manually, using the program Sequencher version 3.01 (Gene Codes Corp.). More than 99% of the sequence was determined twice, independently. Only a few bases, all with clear-cut chromatogram traces, were determined once.
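The statement that more than 99% of the sequence was determined twice is a per-base coverage criterion. A minimal sketch of how such a figure can be computed, assuming each sequencing read is represented as a hypothetical half-open interval of assembled genome coordinates (Sequencher's own reports are not modeled here):

```python
def fraction_covered(genome_len, reads, min_depth=2):
    """Fraction of positions covered by at least `min_depth` reads.

    reads: iterable of (start, end) half-open intervals, 0-based.
    Depth is accumulated with a difference array in O(genome_len + reads).
    """
    diff = [0] * (genome_len + 1)
    for start, end in reads:
        start, end = max(0, start), min(genome_len, end)
        if start < end:
            diff[start] += 1  # read begins covering here
            diff[end] -= 1    # and stops covering here
    depth = covered = 0
    for i in range(genome_len):
        depth += diff[i]
        if depth >= min_depth:
            covered += 1
    return covered / genome_len


# Hypothetical reads on a 1,000-bp toy sequence; the last 10 bp are read only once.
reads = [(0, 600), (0, 500), (450, 1000), (520, 990)]
print(f"{fraction_covered(1000, reads):.1%} of positions read at least twice")
```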
Bioinformatics analysis. Sequence similarity comparisons (BLASTN, BLASTP, and BLASTX) were carried out against the NCBI nonredundant protein and nucleotide databases during December 2003 (http://www.ncbi.nlm.nih.gov/). Searching of the Clusters of Orthologous Groups databases of protein sequence families was performed with COGnitor, which is available at the NCBI website (http://www.ncbi.nlm.nih.gov/COG/). Analysis, ORF determination, and annotation of the HF1 genome sequence were performed with the software Sequin (NCBI website) and Glimmer2 (http://www.tigr.org/software/) (9). ORFs with fewer than 50 codons were included if they (i) overlapped with or were closely adjacent to flanking ORFs, (ii) showed a pI of less than 7, or (iii) showed predicted amino acid similarity to an HF2 or GenBank database sequence. HF1 ORFs were numbered from left to right, using the same genome orientation as described for HF2 (38).
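The 50-codon cutoff and its three exceptions can be expressed as a filter over candidate ORFs. The sketch below uses a simple six-frame ATG-to-stop scan, not Glimmer2's gene model; the 20-nt threshold for "closely adjacent" is a hypothetical choice, and the pI and homology flags are assumed to come from separate tools (for example, an ExPASy pI calculation and BLASTP).

```python
from dataclasses import dataclass

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Orf:
    start: int   # 0-based start on the forward strand
    end: int     # exclusive end on the forward strand (includes the stop codon)
    strand: str  # "+" or "-"

    @property
    def codons(self):
        return (self.end - self.start) // 3


def scan_strand(seq, strand):
    """ATG-to-stop ORFs in the three frames of `seq`, reported in forward-strand
    coordinates. Only the first ATG after a stop opens an ORF (longest ORF)."""
    n = len(seq)
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, n - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                end = i + 3
                if strand == "+":
                    orfs.append(Orf(start, end, "+"))
                else:  # map reverse-complement coordinates back to the forward strand
                    orfs.append(Orf(n - end, n - start, "-"))
                start = None
    return orfs


def find_orfs(seq):
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    return sorted(scan_strand(seq, "+") + scan_strand(rc, "-"), key=lambda o: o.start)


def keep_orf(orf, all_orfs, pi=None, has_homolog=False, max_gap=20):
    """Keep ORFs of >= 50 codons; keep shorter ones only if (i) they overlap or lie
    within `max_gap` nt (hypothetical threshold) of another ORF, (ii) their
    predicted pI is below 7, or (iii) they match an HF2 or database sequence."""
    if orf.codons >= 50:
        return True
    near_neighbor = any(
        other is not orf
        and other.start - max_gap < orf.end
        and orf.start < other.end + max_gap
        for other in all_orfs
    )
    return near_neighbor or (pi is not None and pi < 7) or has_homolog


orfs = find_orfs("ATGAAATAG")
print(orfs, [keep_orf(o, orfs) for o in orfs])
```

The pI and homology evidence are passed in rather than computed so that the filter mirrors the three criteria as stated, independent of which tools supply them.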
Global and dot-plot alignments between HF1 ORFs and HF2 ORFs were performed with Lasergene software (DNAstar Inc.). The methods used included the Wilbur-Lipman method, the Martinez-NW method (26), and the Lipman-Pearson method (46). For tRNA searching, tRNAscan (23) (http://www.genetics.wustl.edu/eddy/tRNAscan-SE/) was used to scan the viral genome for tRNA genes, and the Mfold program (27) (http://bioinfo.math.rpi.edu/mfold/rna) was used to predict tRNA folding.
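A dot plot marks every position pair at which two sequences share an identical word. Colinear matches form diagonals, and shifts between diagonals indicate insertions or deletions. A minimal word-match sketch, a simplification of what Lasergene's dot-plot tools display (their scoring options are not reproduced), with the word length as a hypothetical parameter:

```python
from collections import Counter


def dot_plot(seq_a, seq_b, word=11):
    """All (i, j) such that seq_a[i:i+word] == seq_b[j:j+word]."""
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    return [(i, j)
            for i in range(len(seq_a) - word + 1)
            for j in index.get(seq_a[i:i + word], [])]


def diagonal_counts(hits):
    """Hits per diagonal (i - j); a long colinear match piles up on one diagonal."""
    return Counter(i - j for i, j in hits)


a = "ACGTACGTTT"
hits = dot_plot(a, a, word=4)
print(diagonal_counts(hits).most_common(1))
```

The off-diagonal hits at (0, 4) and (4, 0) come from the repeated ACGT word; comparisons of real ORFs would use longer words, or allow mismatches, to suppress such noise.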
Protein secondary structure predictions, protein molecular weight determinations, amino acid composition determinations, and similar analyses were performed with DNA Strider (25), Lasergene, GeneticLab, or proteomic programs available at the ExPASy mirror site (http://au.expasy.org) of the Swiss Institute of Bioinformatics. Protein domain searches used the InterProScan program available at the ExPASy site.
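Predicted protein masses, such as those quoted in kilodaltons in the Results, are typically sums of average residue masses plus one water molecule. A minimal sketch of that calculation; the residue masses below are standard average values, and whether each program listed above uses exactly these constants is an assumption.

```python
# Average residue masses (Da): amino acid mass minus one water molecule.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N and C termini


def molecular_weight(protein):
    """Average molecular mass (Da) of an unmodified polypeptide chain."""
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


# A hypothetical glycine-rich fragment, reported in kDa.
print(f"{molecular_weight('MGGGGSGGGG') / 1000:.3f} kDa")
```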
Electron microscopy. Purified virus preparations were applied to Formvar-coated copper grids and allowed to adsorb for 15 min, and then the grids were blotted dry and treated with 0.1% glutaraldehyde for 5 min at room temperature. Negative staining was with 2% (wt/vol) uranyl acetate. Grids were examined on a Hitachi H300 electron microscope at machine magnifications of ×20,300. Catalase crystals, negatively stained with 2% uranyl acetate, were used as an internal calibration standard (47).
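With catalase as an internal standard, particle dimensions are converted from image units to nanometers using the known lattice spacing (8.75 nm for the major catalase spacing, as given in the Results). A minimal sketch, assuming the catalase crystal and the particles are measured in pixels on the same digitized negative; all pixel values are hypothetical.

```python
from statistics import mean, stdev

CATALASE_SPACING_NM = 8.75  # major lattice spacing of negatively stained catalase


def nm_per_pixel(span_px, n_periods):
    """Scale from a catalase crystal on the same micrograph: `span_px` is the
    distance in pixels across `n_periods` lattice repeats."""
    return CATALASE_SPACING_NM * n_periods / span_px


def calibrated_stats(lengths_px, scale_nm_per_px):
    """Mean and sample standard deviation of particle dimensions in nm."""
    values_nm = [length * scale_nm_per_px for length in lengths_px]
    return mean(values_nm), stdev(values_nm)


scale = nm_per_pixel(span_px=350, n_periods=10)  # hypothetical catalase measurement
head_mean, head_sd = calibrated_stats([270, 272, 268], scale)
print(f"{head_mean:.1f} ± {head_sd:.1f} nm")
```

Measuring across many lattice periods rather than one spreads the error of locating the endpoints over a longer distance, which tightens the scale factor.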
Nucleotide sequence accession number. The complete HF1 genome sequence has been deposited in the GenBank database (accession number AY190604).
| RESULTS |
|---|
Detailed genomic comparison between HF1 and HF2 ORFs after kb 48. The significant differences between HF1 and HF2 occur between kb 48 and 73 (of HF1) and are detailed in Table 1. The predicted ORFs in this region vary in similarity from 30 to 100% (at both the nucleotide and amino acid sequence levels). The major differences were tightly clustered within the two MDRs, while ORFs adjacent to MDRs were highly conserved (Fig. 1). Only ORFs that differ significantly between the two viruses are described below.
(i) Unique ORFs. Homologues of HF2 ORFs 86 and 93 were not present in HF1. Neither ORF had database matches; both were relatively long (112 and 311 codons) and had predicted pIs of less than 4.5. No transcript corresponding to HF2 ORF 86 was detected by Northern blot hybridization (38), and the transcriptional direction of this ORF would be rightward, unlike that of most other late genes (Table 1). It is embedded in a larger ORF (HF2 ORF 87) on the complementary strand, and this larger ORF differs significantly in its nucleotide and predicted protein sequences from its HF1 homologue, ORF 86 (see below). The second unique ORF, HF2 ORF 93, was shown to be transcribed (38), and it is oriented in the same direction as other late genes.
(ii) HF1 ORFs with less than 70% amino acid identity to their HF2 homologues. The HF1 ORFs with less than 70% amino acid identity to their HF2 homologues included HF1 ORFs 85, 86, 87, and 92. They are all clustered within the two MDRs. Although these ORFs share significant protein sequence similarity (31.6 to 67%) with their corresponding HF2 ORFs and are clearly homologues, the number, even distribution, and range of mutations (Table 1) indicate that they have undergone extensive evolutionary change.
HF1 ORF 86 (33.1 kDa) showed only 29.4% amino acid similarity to its HF2 homologue, ORF 87 (29 kDa). There were 69 synonymous and 190 nonsynonymous nucleotide changes, as well as several insertions and deletions. Changes occurred evenly throughout the two sequences. The predicted proteins of these ORFs are rich in glycine residues, which occur in strings of four or five. They also differ significantly in size.
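Counts of synonymous and nonsynonymous changes come from comparing aligned coding sequences codon by codon. The sketch below is a simplified per-site tally, not a formal method such as Nei-Gojobori: each differing site is judged by changing only that site in the reference codon. It assumes gap-free, in-frame aligned sequences (so insertions and deletions like those noted above must be removed first) and uses the standard codon assignments, which NCBI translation table 11 for archaea shares.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_changes(ref, alt):
    """Return (synonymous, nonsynonymous) single-nucleotide differences between
    two aligned, in-frame, gap-free coding sequences of equal length."""
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must be aligned codon for codon")
    syn = nonsyn = 0
    for k in range(0, len(ref), 3):
        r, a = ref[k:k + 3].upper(), alt[k:k + 3].upper()
        for pos in range(3):
            if r[pos] != a[pos]:
                # Change only this site in the reference codon and compare residues.
                mutant = r[:pos] + a[pos] + r[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[r]:
                    syn += 1
                else:
                    nonsyn += 1
    return syn, nonsyn


# CTG -> CTA keeps leucine; AAA -> AGA changes lysine to arginine.
print(classify_changes("CTGAAA", "CTAAGA"))
```

Codons with two or three differences are tallied site by site relative to the reference codon, which is a simplification; formal methods average over the possible mutational pathways.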
HF1 ORF 87 (52.9 kDa) is a large protein and corresponds to HF2 ORF 88 (61.8 kDa). They show 49.6% amino acid identity and also differ in size, but the first 70 amino acids (aa) are almost identical. After this, they rapidly diverge. The size difference is largely due to the HF1 ORF terminating earlier than the corresponding HF2 ORF. In addition, the nucleotide sequence shows numerous small repeat sequences within the HF1 ORF 87 that have a similar core but vary in length, i.e., CCX(A or G)CC, CCX(A or G)CCX(A or G)CC, or CCX(A or G)CCX(A or G)CCX(A or G)CC. The repetitions are in various reading frames, and the predicted protein does not contain highly repeated amino acids.
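The CCX(A or G)CC family of repeats can be located with a single regular expression: "CC" followed by one or more units of any base, a purine, and "CC". A minimal sketch, assuming X stands for any of the four bases and scanning only the strand supplied:

```python
import re

# "CC" then one or more tandem units of X + (A or G) + "CC".
# Greedy matching reports each maximal run once.
MOTIF = re.compile(r"CC(?:[ACGT][AG]CC)+")


def find_cc_repeats(seq):
    """Return (start, match, units) for each non-overlapping maximal run."""
    hits = []
    for m in MOTIF.finditer(seq.upper()):
        units = (len(m.group()) - 2) // 4  # each unit adds 4 nt to the leading CC
        hits.append((m.start(), m.group(), units))
    return hits


print(find_cc_repeats("TTCCAACCTTCCTGCCAGCCTT"))
```

Because the pattern is defined on nucleotides, it finds the repeats regardless of reading frame, consistent with the observation that the repetitions fall in various frames.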
HF1 ORF 91 is larger than its corresponding HF2 ORF 92 because the latter ORF lacks the first 37 aa. The remaining sequence is almost the same as that of HF2 ORF 92. The nucleotide sequence upstream of HF2 ORF 92 was about 87% similar to HF1 over the region of the missing 37 aa. Close scrutiny of the nucleotide sequence showed that there was a frameshift (relative to the HF1 ORF) caused by an insertion of a few bases (at nt 55231 in HF2), producing a nonsense codon (TGA, nt 55246 to 55248) in the upstream sequence of HF2 ORF 92. No transcripts in this region were detected in HF2 (38).
HF1 ORF 92 (15.4 kDa) corresponds to part of HF2 ORF 94 (34.3 kDa). A stop codon terminates HF1 ORF 92 at about half the length of the HF2 ORF. However, a comparison of the surrounding nucleotide sequences indicates that this truncation was not due to a base change creating a nonsense codon but to a deletion (relative to HF2) that removed the remainder of the HF1 ORF 92 coding sequence (Table 1).
HF1 ORF 85 (14.0 kDa) corresponded to HF2 ORF 85 (14.5 kDa). They are similar in size but have an amino acid sequence similarity of only 47.5%, with the C-terminal half displaying a higher similarity than the N-terminal half. HF1 ORF 85 terminates eight codons earlier than HF2 ORF 85.
(iii) HF1 ORFs that are split in HF2. Two ORFs found in HF1 have corresponding HF2 ORFs that are split into two smaller ORFs (Table 1). The sequencing data for these regions of both viruses have been carefully checked. In the first case, one half of HF1 ORF 103 is homologous to HF2 ORF 106 (98.6% amino acid identity), and the other half is homologous to the adjacent HF2 ORF 105 (99.1% amino acid identity). The other case is HF1 ORF 104, in which the first third is similar to HF2 ORF 108 (91.1%) and the remainder corresponds to HF2 ORF 107 (98.5%). In both cases, the two smaller ORFs in HF2 (corresponding to the larger HF1 ORF) are within 2 nt of each other and are likely to be transcribed together. The region of the HF2 genome corresponding to these HF1 ORFs was shown to be transcribed (38).
(iv) Predicted and observed restriction fragments of the HF1 genome. Restriction digest patterns of HF1 DNA have been published previously (30), and these were compared to restriction fragments predicted from the sequence, assuming a linear genome like that of HF2. Certain restriction digests clearly contained one additional fragment not predicted by the sequence (Fig. 2), and the terminal fragments were underrepresented. For example, DraI digestion produced all of the predicted fragments as well as a 6.2-kb band that was not predicted, and a HindIII digest was predicted to give 11 fragments but was observed to give 12, with the extra fragment being about 14 kb. The extra bands were discrete (not smeared), and their ethidium bromide staining intensities indicated a relative molarity that was about half of that of the other fragments in each digest (Fig. 2), excluding the terminal fragments. To check whether the stock of virus used in the present study had changed from that used earlier, we obtained a stock of HF1 that had been stored at 4°C since 1993 and sequenced the regions encompassing all 10 predicted HindIII sites. No changes were observed.
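Predicted fragments for a linear genome, and the single extra fragment expected if some molecules have their two ends joined head to tail, follow directly from the cut positions. A minimal sketch; the site positions below are hypothetical, since the actual DraI and HindIII coordinates are not listed in this section.

```python
def linear_fragments(sites, genome_len):
    """Fragment lengths (bp) from cutting a linear genome at the given positions."""
    cuts = [0] + sorted(sites) + [genome_len]
    return [right - left for left, right in zip(cuts, cuts[1:])]


def joined_end_fragment(sites, genome_len):
    """Length of the extra fragment expected if the left and right genome ends
    are joined head to tail: the two terminal fragments fuse into one."""
    fragments = linear_fragments(sites, genome_len)
    return fragments[0] + fragments[-1]


# Hypothetical example: 3 sites in a 100-bp molecule.
sites = [10, 30, 60]
print(linear_fragments(sites, 100))     # [10, 20, 30, 40]
print(joined_end_fragment(sites, 100))  # 50
```

If only a fraction of molecules have joined ends, the fused band appears alongside the normal terminal fragments and both are substoichiometric, the pattern described above (an extra band at roughly half molarity with underrepresented terminal fragments).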
In sequencing the HF1 termini, terminal restriction fragments were first treated with T4 DNA polymerase (to blunt end any overhangs) before cloning, a process that can remove 3' overhangs and fill in 5' overhangs. Primer walking near the termini of HF1 DNA (using DNA polymerase) was also used, but this can check the length of only the 5' end and not that of the 3' end. We believe that the genome sequence of HF1 is complete, since (i) the 5' ends of (free) terminal repeats (TRs) have been sequenced, (ii) the sequence across TR borders (with internal genomic sequence) has been determined, and (iii) concatemer junctions derived from HF1-infected cells have been PCR amplified and sequenced (unpublished data). All give the same sequence, which is identical to that of the TR of HF2. However, our observations would be consistent with the presence of single-stranded cohesive termini, possibly 5' overhangs.
(v) Head size of virus particles.
Our original description of HF1 and HF2 (30) reported head diameters of 58 nm, but the genomes of these viruses are unusually large for isometric heads of this size and would give DNA densities far higher than those of structurally well-studied bacteriophages such as T7 and lambda (e.g., 0.45 g/cm3 for T7 [5]). The head size of HF1 was reexamined. Preparations of HF1 were studied by negative-stain electron microscopy (Fig. 3), and particle dimensions were measured from digitized photographic negatives. Catalase crystals were used as an internal calibration standard (47) (major lattice spacing of 8.75 nm). The head diameter was found to be 67.8 ± 3 nm, which is significantly larger than previously determined, and would give an estimated DNA density (0.51 g/cm3) close to those of bacteriophages such as lambda, Mu, and T7 and to that of halovirus φH. The tail length of HF1 (excluding the connector) was found to be 90 ± 2 nm.
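The DNA density estimate is packaged DNA mass divided by head volume. The text does not state the volume model or the mass per base pair used, so the sketch below assumes a sphere of the measured outer diameter and 660 Da per bp; under those assumptions it back-calculates the genome length implied by 0.51 g/cm3 and shows how the earlier 58-nm estimate inflates the density.

```python
import math

AVOGADRO = 6.02214076e23  # per mol
DA_PER_BP = 660.0         # assumed average mass of one base pair (g/mol)
NM3_TO_CM3 = 1e-21        # 1 nm^3 = 1e-21 cm^3


def head_volume_cm3(diameter_nm):
    """Volume of a sphere of the given diameter (crude model of an isometric head)."""
    r = diameter_nm / 2
    return 4 / 3 * math.pi * r ** 3 * NM3_TO_CM3


def dna_density(genome_bp, diameter_nm):
    """Packaged DNA mass per head volume, in g/cm^3."""
    mass_g = genome_bp * DA_PER_BP / AVOGADRO
    return mass_g / head_volume_cm3(diameter_nm)


def implied_genome_bp(density, diameter_nm):
    """Genome length (bp) that would give `density` in a head of this diameter."""
    return density * head_volume_cm3(diameter_nm) * AVOGADRO / DA_PER_BP


genome = implied_genome_bp(0.51, 67.8)
print(f"implied genome: {genome:,.0f} bp")
print(f"density with a 58-nm head: {dna_density(genome, 58):.2f} g/cm3")
```

Under these assumptions, 0.51 g/cm3 in a 67.8-nm head corresponds to a genome of roughly 76 kb, and packing the same DNA into the originally reported 58-nm head would raise the density to about 0.81 g/cm3, well above the 0.45 g/cm3 quoted for T7. Density scales with the cube of the diameter ratio, so the diameter measurement dominates the estimate.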
| DISCUSSION |
|---|
Assuming a recent, common origin of the left ends of HF1 and HF2, the extensive differences seen in the region from 48 kb to the right end speak of a much longer history of evolutionary change. The most likely scenario for this lopsided pattern of divergence is that a recent recombination event has swapped most or all of the right end of one of the two viruses for that of a third, HF-like virus. More than one recombinatorial cross (within the right end) may have occurred before they were isolated in the laboratory, but the near identity of the left 48 kb favors a very recent divergence. Recombination events between viruses are common, both in nature and in the laboratory, and genetic crosses between lytic coliphages such as T4 were an important genetic tool in early molecular biology. Recent comparative sequence studies of natural T4-like viruses show high levels of recombination, sometimes of large genomic segments (10), although a natural recombination event of the same apparent magnitude as observed in this study is difficult to find in the literature. While homologous recombination in haloarchaea is well documented (6), experimental crosses between archaeal viruses have not yet been demonstrated.
In a recent study very relevant to this one, Pajunen et al. (32) proposed that phage T3 was probably the result of a natural recombinatorial cross between a T7-like phage and a yersiniophage. Like HF1 and HF2, these viruses have different hosts (and their genomes have direct TRs). The authors showed that such crosses were possible by crossing T7 and φYeO3-12, after first constructing an Escherichia coli strain that expressed Yersinia O antigens in order to infect it with both viruses simultaneously. Recombinants were mainly T7 derived and were all the result of double-crossover events (so maintaining the T7 TRs).
The two MDRs found in HF1 and HF2 appear to be recombinational hot spots that have undergone considerable evolutionary change. The high conservation of genes surrounding the MDRs may reflect functional constraints; i.e., these genes cannot easily be replaced by genes with equivalent function but different sequence. This would be consistent with the nature of the late region, as it probably encodes all of the structural proteins and these must assemble into multimeric components that fit together precisely and function as a whole virus particle. One structural protein that is, by contrast, highly variable in sequence is the tail fiber, which carries the cell adhesin domain enabling the virus to bind to specific host cells (16). HF1 and HF2 have different host ranges and show small differences in their major structural proteins (30), but the identity of the tail fiber adhesin has not been experimentally determined.
The only other pair of haloarchaeal viruses that have substantial similarity and for which sequence data are available are the temperate viruses φH and φCh1 (20). The φCh1 genome is complete, but only 60% of the φH genome has been determined. They share 50 to 95% nucleotide sequence identity over the regions for which both sequences are available, and their gene arrangements are very similar. Klein and colleagues (20) detected one inversion and a likely gene deletion. In summary, the two viruses have diverged substantially, yet they share extensive homology over much of their genomes. In addition, a 1982 study comparing the restriction fragments of φH and halovirus Hs1 DNAs indicated that they share significant sequence similarity (40), suggesting that this virus group may be common.
HF1 genome termini. Previous studies on halovirus HF2 showed conclusively that its dsDNA genome was packaged into virions as linear molecules with blunt-ended termini (29). These termini occurred at the left or right end of the 306-nt direct TR found at each end of the linear genome. From the type of termini, their presence in single copy between genomes in concatemeric DNA in infected cells, and sequence characteristics in and around the TR, it was speculated that the replication and packaging strategy may resemble that of the T7 group of bacteriophages (19, 29, 32). However, in at least a portion of HF1 DNA, the termini were bound to each other even though the TR and flanking sequences are identical to those of HF2. How the termini bind to each other has not been determined, but the join must be head to tail only, as restriction fragments arising from head-head or tail-tail fusions were not seen.
Although HF1 DNA may readily be able to circularize once injected into a host cell, the evidence we have so far indicates that HF1 is strictly lytic. No prophage state has been found, despite repeated attempts in this laboratory. Instead, both HF1 and HF2 can form unstable carrier states, with low levels of continuous virus production, but cells are not immune from superinfection and these laboratory cultures eventually collapse with complete lysis (unpublished data). HF1 and HF2 show a notable difference in the state of their DNA in infected cells. For HF1-infected cultures, Southern blot studies showed that TRs were not detected on free ends but were detected only as fragments where the TR was joined (at both ends) to viral genomic DNA (unpublished data), suggesting that the viral DNA is largely circular (or perhaps in very long concatemers). However, in HF2-infected cells, TRs at the ends of the linear viral genome (or concatemers) are readily detected, in addition to joined (concatemeric or circular) TRs (see Fig. 6 of reference 29). Concatemers have been directly observed in HF2-infected cells (29). Together, these results point to some difference in the processing of the termini of HF1 and HF2, a difference that could be either virus or cell specific.
Late gene organization and host range. The late regions of bacteriophages include genes for structural proteins, packaging DNA, and cell lysis. Commonly, the genes are closely spaced, sometimes overlapping, and are transcribed from the same strand. The gene order (synteny) is remarkably conserved across a wide variety of examples (2), so that once the location of one or more highly conserved genes within a late gene region of a newly sequenced genome is discovered, the functions of neighboring ORFs can be tentatively predicted. In the case of HF1, two highly conserved genes in the late region include the terminase (ORF 110) and COG3299 (ORF 89; Mu gp47-like family), and these are usually situated at the beginning (terminase) and about two-thirds (COG3299) of the length of the late region. The positions and orientations of these two "landmark" ORFs, and the close spacing of the ORFs between them, provide a tentative framework. For example, the portal protein ORF is usually large, and just downstream (in the transcriptional sense) of the terminase ORF, and HF1 ORF 109 is of an appropriate size and position for a portal protein. Major tail and head protein genes are usually next (and some of these structural proteins are known to vary between HF1 and HF2). A tail fiber gene(s) is usually not far downstream of the COG3299 gene, and the tape measure gene is usually just upstream of it.
The distinct host ranges of HF1 and HF2 probably reflect different cell adhesins carried on virus particles. By analogy to bacteriophages with head-tail morphology and contractile tails, their adhesins are likely to be tail fiber proteins. Well-preserved HF1 and HF2 particles show a compact baseplate and small (approximately 20- to 25-nm) filamentous structures extending from the edges of the baseplate (30). If these are the tail fibers, then calculations based on a straight fiber structure that is largely alpha-helical would indicate a protein size of at least 150 aa. The tail tape measure protein forms a helical fiber that determines the tail length of tailed phages. If an alpha-helical protein spanned the 90 nm, it would be at least 600 aa long (assuming 3.6 aa per turn and a pitch of 0.54 nm). There are only two proteins in the late region with sizes of 600 aa or more; one is 605 aa (ORF 95) and the other is 1,037 aa (ORF 102). Future studies to map all of the virus structural proteins to their corresponding genes will require the isolation and analysis of individual proteins from purified virus particles.
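The size estimates above rest on the rise per residue of an alpha-helix. A short sketch of that arithmetic, using the 3.6 residues per turn and 0.54-nm pitch stated in the text and assuming a straight, fully helical protein:

```python
def helix_residues(length_nm, residues_per_turn=3.6, pitch_nm=0.54):
    """Residues needed for a straight alpha-helix of the given length.
    Rise per residue = pitch / residues per turn = 0.54 / 3.6 = 0.15 nm."""
    rise_nm = pitch_nm / residues_per_turn
    return length_nm / rise_nm


print(f"tape measure, 90 nm: {helix_residues(90):.0f} aa")
for fiber_nm in (20, 25):
    print(f"tail fiber, {fiber_nm} nm: {helix_residues(fiber_nm):.0f} aa")
```

A 90-nm helix needs about 600 residues, and the 20- to 25-nm fibers correspond to roughly 133 to 167 residues; any nonhelical segments or bends would raise these minimum sizes.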
Relationship of HF1 to other haloviruses. HF1 and HF2 are clearly related to each other to the extent that they should be classified as members of the same virus group. At a morphological level they are remarkably similar to head-tail bacteriophages, and this is supported by the general features of their genomes, including the makeups and organizations of their genes. However, they share only a very distant relationship to other sequenced viruses and should be classified as a novel genus within the Myoviridae. Among other haloviruses being studied in this laboratory, they represent one of at least three broad morphological groups, the others being spindle-shaped haloviruses, such as His1 (1), and round viruses with an internal lipid layer, such as SH1 (12). The known diversity of haloviruses can be expected to increase with improvements in the ability to culture a wider variety of haloarchaea, particularly those species that are dominant members of their natural environment but have not yet been cultured, e.g., members of the SHOW group (square haloarchaea of Walsby) (12, 45).
| ACKNOWLEDGMENTS |
|---|
We thank Helen Camakaris for critical reading of the manuscript and C. Bath for technical assistance. Students of the 2001 environmental project practical class assisted with analyzing the early-passage HF1 culture.
| REFERENCES |
|---|
φCh1: first complete nucleotide sequence and functional organization of a virus infecting a haloalkaliphilic archaeon. Mol. Microbiol. 45:851-863.
ψM100 encodes the lytic enzyme responsible for autolysis of Methanothermobacter wolfeii. J. Bacteriol. 183:5788-5792.
ψM2. Mol. Microbiol. 30:233-244.
Hmore than promoters. Syst. Appl. Microbiol. 16:591-596.