Journal of Bacteriology, November 2006, p. 7922-7931, Vol. 188, No. 22
0021-9193/06/$08.00+0 doi:10.1128/JB.00810-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Center of Marine Biotechnology, University of Maryland Biotechnology Institute, Columbus Center, Suite 236, 701 E. Pratt St., Baltimore, Maryland 21202 (1); Microbial Genomics, DOE Joint Genome Institute, 2800 Mitchell Drive, B400, Walnut Creek, California 94598 (2); DOE Joint Genome Institute, Los Alamos National Laboratory, Los Alamos, New Mexico 87545 (3); University of Illinois, Department of Microbiology, B103 Chemical and Life Sciences Laboratory, 601 S. Goodwin Avenue, Urbana, Illinois 61801 (4)
Received 7 June 2006/ Accepted 5 September 2006
The genus Methanosarcina includes the most metabolically diverse species of methanogenic archaea. Whereas most methanogenic species grow by obligate CO2 reduction with H2, methyl reduction with H2, aceticlastic fermentation of acetate, or methylotrophic catabolism of methanol, methylated amines, and dimethylsulfide, most Methanosarcina spp. can grow by all four catabolic pathways (49). Methanosarcina acetivorans was recently reported to grow also nonmethanogenically with CO (35). In addition to their appetency for all known methanogenic substrates, most Methanosarcina spp. can grow in a minimal mineral medium and fix molecular nitrogen (6, 26). They can adapt to intracellular solute concentrations ranging from freshwater to three times that found in seawater (38) by osmoregulatory mechanisms that enable them to synthesize or accumulate osmoprotectants and modify their outer cell envelope (41). This metabolic diversity is reflected in the relatively large genome sizes of Methanosarcina acetivorans (5.8 Mb) and Methanosarcina mazei (4.1 Mb) and the relatively large number of putative coding sequences, 4,524 and 3,371, respectively, compared with those of other methanogenic archaea (13, 17). The adaptive success of these species is further evidenced by the occurrence of multiple paralogs in the genomes, including multiple catabolic methyltransferases and carbon monoxide dehydrogenases, all three known types of nitrogenases, and all four known chaperoning systems (8, 13, 17).
Methanosarcina barkeri Fusaro was isolated from sediment from Lago del Fusaro, a freshwater coastal lagoon west of Naples, Italy (22). In contrast, M. acetivorans was isolated from marine sediments (40) and M. mazei was isolated from sewage sludge (12). M. barkeri utilizes all four methanogenic pathways described above and exhibits a dichotomous morphology. When grown in freshwater medium, this species grows as large multicellular aggregates embedded in a heteropolysaccharide matrix (Fig. 1A) composed primarily of D-galactosamine and D-glucuronic acid, termed methanochondroitin (24), whereas in marine medium it grows as individual cells surrounded only by a protein cell surface layer (S layer) (38). This isolate has been one of the methanosarcinal strains most frequently studied for the physiology, biochemistry, and bioenergetics of methanogenesis (39). The development of a tractable methanosarcinal gene transfer system has led to a number of recent reports on the mechanisms of methanogenesis using genetic approaches (36).
FIG. 1. Thin-section electron micrographs of M. barkeri Fusaro. Cells cultured in low-saline medium (A) grow as multicellular aggregates embedded in a methanochondroitin matrix (mc). Cells cultured in marine medium (B) grow as single cells without the methanochondroitin outer layer. When grown with hydrogen, gas vesicles (gv) are observed in some cells. Bar, 1.0 µm.
Genome sequencing, assembly, and finishing. Genomic DNA was isolated from M. barkeri Fusaro as described previously (5). The genome of M. barkeri was sequenced at the Joint Genome Institute (JGI) by using a combination of 3-kb, 8-kb, and 40-kb (fosmid) DNA libraries. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov/. Draft assemblies were based on 89,216 total reads. All three libraries provided 13-fold sequence coverage of the genome. The Phred/Phrap/Consed software package (http://www.phrap.com) was used for sequence assembly and quality assessment (15, 16, 19). After the shotgun stage, reads were assembled with parallel Phrap (High Performance Software, LLC). Possible misassemblies were corrected with Dupfinisher (21) or transposon bombing of bridging clones (EZ-Tn5 <R6Kyori/KAN-2>Tnp transposome kit; Epicenter Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, custom primer walk, or PCR amplification (Roche Applied Science, Indianapolis, IN). A total of 2,389 additional reactions were necessary to close gaps and to raise the quality of the finished sequence. The completed genome sequences of M. barkeri contain 85,812 reads, achieving an average of 12-fold sequence coverage per base, with an error rate of less than 1 in 100,000. The sequences of M. barkeri include a chromosome and plasmid and can be accessed by GenBank accession numbers CP000099 and CP000098, respectively, or from the JGI IMG site (http://img.jgi.doe.gov) as taxon identification no. 623520000.
Annotation and analysis. Genes were predicted with a combination of GLIMMER and CRITICA (2, 11). These gene predictions were then run through a pipeline that identifies gene overlaps, missed genes, and incorrect start sites (29). The gene predictions were then manually curated. Functional predictions were generated automatically based on the presence of hits to COG (47), Pfam (3), and Interpro (32) families.
Whole-genome alignment and analysis. Chromosome sequences (Table 1) in FASTA format were used to build single-sequence BLAST databases, which served as the subject sequences for comprehensive WuBlast (W. Gish, 1996-2004, http://BLAST.wustl.edu), BLASTN, and TBLASTX paired comparisons both as whole sequences and as segmented comparisons using the following parameters: span2, noseqs, filter = none, hspmax = 10,000, gspmax = 10,000. Similarly, all coding sequence features were built into databases and a BLAST search was done to generate a comprehensive set of pairwise comparisons. BLASTN outputs were captured into a database of high-scoring segment pair (HSP) features cross-referenced to a sequence and sequence feature database. Outputs were also directly parsed by Cross (D. Maeder, 1998-2006, http://bigm.umbi.umd.edu/materials/software/Cross.pub/) for display and interactive examination of comparative features.
TABLE 1. Comparison of genome features among Methanosarcina spp.
Cumulative skew analysis was performed using skew (D. Maeder, 2001, http://bigm.umbi.umd.edu/materials/software/skew/), which implements the algorithm of Grigoriev (20). Repeat analysis emerged directly from unfiltered BLAST and was confirmed using MUMmer (11). Putative origins of replication were explored by examining regions with locally separated inverted repeats in close upstream proximity to the orc1 and cdc6 genes.
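The cumulative skew analysis described above can be illustrated with a minimal sketch. It is not the authors' skew program: the function names, the window size, and the choice of a windowed (G - C)/(G + C) statistic are assumptions made here for illustration. Extrema of the cumulative curve are commonly treated as candidate origin or terminus regions, which is the general idea attributed to Grigoriev (20).

```python
def cumulative_gc_skew(seq, window=1000):
    """Return (window_start, cumulative_skew) pairs for a DNA sequence.

    Hypothetical helper: sums the per-window GC skew (G - C) / (G + C)
    along the sequence in non-overlapping windows.
    """
    seq = seq.upper()
    total = 0.0
    points = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows containing neither G nor C contribute nothing to the sum.
        if g + c:
            total += (g - c) / (g + c)
        points.append((start, total))
    return points


def skew_extrema(points):
    """Return the window starts of the minimum and maximum cumulative skew."""
    lowest = min(points, key=lambda p: p[1])
    highest = max(points, key=lambda p: p[1])
    return lowest[0], highest[0]


if __name__ == "__main__":
    # Toy sequence: a C-rich half followed by a G-rich half.
    toy = "C" * 40 + "G" * 40
    pts = cumulative_gc_skew(toy, window=10)
    print(skew_extrema(pts))
```

On a real chromosome the sequence would be read from a FASTA file and the window would be much larger; the toy input only shows that the curve turns where the strand composition changes.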
Chromosomal sequence similarity was calculated as a distance derived from BLASTN comparisons in the GRIT database by using Perl script cross match.pl, which generates distance matrices in MEGA2 format based on an equation in which n is the length of the genome and HSP.ID is the maximal fractional identity at position n of sequence x, counted only where HSP.ID exceeds a threshold of, e.g., 0.67. The mean distance, D, is calculated independently for each axis. This measure of distance is comparable with hybridization techniques, as it yields a fractional nucleotide similarity between organisms that considers stringency.
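One possible reading of this distance measure is sketched below. The exact form of the equation is an assumption: here the distance is taken as one minus the mean, over all n positions of one genome, of the best HSP identity covering each position, with HSPs at or below the threshold ignored. The function name and the (start, end, identity) tuple layout are hypothetical and are not taken from the authors' Perl script.

```python
def mean_identity_distance(genome_length, hsps, threshold=0.67):
    """Distance for one axis of a pairwise BLASTN comparison.

    hsps: iterable of (start, end, identity) with 0-based, half-open
    coordinates on this genome and identity as a fraction (0 to 1).
    Assumed form: D = 1 - (1/n) * sum of the best identity per position.
    """
    best = [0.0] * genome_length
    for start, end, identity in hsps:
        # Only HSPs whose identity exceeds the threshold are counted.
        if identity <= threshold:
            continue
        for pos in range(max(start, 0), min(end, genome_length)):
            if identity > best[pos]:
                best[pos] = identity
    similarity = sum(best) / genome_length
    return 1.0 - similarity


if __name__ == "__main__":
    toy_hsps = [(0, 5, 0.9), (3, 8, 0.8), (0, 10, 0.5)]
    print(round(mean_identity_distance(10, toy_hsps), 2))
```

Because the calculation is done per axis, swapping query and subject generally gives a different value, which matches the statement that D is calculated independently for each axis.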
Synteny of any gene was measured by comparing the order of the gene's left and right neighbors with those of their best matched homologous genes in the comparable genome. Downstream synteny (SI) is expressed as the ratio of the ordinal distance between a gene, G, and its downstream neighbor, R (which is always 1), and the distance between a corresponding orthologous gene, G', and the ortholog of R, R'. This may be calculated as follows:
$$SI = \frac{|G - R|}{|G' - R'|} = \frac{1}{|G' - R'|},$$
with $0 < SI \le 1$. Cumulative deviations from the mean of SI were calculated for intelligible display. Intergenic interval was calculated in the same manner.
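The downstream synteny measure and its cumulative display can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: gene identifiers, the dictionary of ortholog positions, and the decision to skip genes without a matched ortholog are all hypothetical choices, and the wrap-around of a circular chromosome is ignored.

```python
def downstream_synteny(gene_order, ortholog_position):
    """Return (gene, SI) pairs for genes in chromosomal order.

    gene_order: gene ids in their order on the reference chromosome.
    ortholog_position: gene id -> ordinal position of its best-matched
    ortholog in the comparison genome (assumed to be precomputed).
    SI = 1 / |pos(G') - pos(R')|, where R is the downstream neighbor of G.
    """
    result = []
    for gene, neighbor in zip(gene_order, gene_order[1:]):
        if gene not in ortholog_position or neighbor not in ortholog_position:
            continue  # no ortholog for G or R, so SI is not defined here
        separation = abs(ortholog_position[neighbor] - ortholog_position[gene])
        if separation == 0:
            continue  # both genes matched the same ortholog
        result.append((gene, 1.0 / separation))
    return result


def cumulative_deviation(values):
    """Running sum of deviations from the mean, as used for display."""
    mean = sum(values) / len(values)
    running = 0.0
    out = []
    for value in values:
        running += value - mean
        out.append(running)
    return out


if __name__ == "__main__":
    order = ["g1", "g2", "g3", "g4"]
    positions = {"g1": 10, "g2": 11, "g3": 15, "g4": 14}
    si = downstream_synteny(order, positions)
    print(si)
    print(cumulative_deviation([s for _, s in si]))
```

A run of syntenic genes gives SI values of 1 and a rising cumulative curve, while rearranged regions give small SI values and a falling curve.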
Microscopy. For thin-section electron micrographs, cells were fixed with 2% glutaraldehyde and 2% osmium tetroxide and dehydrated in a graded series of ethanol mixtures. Cells were embedded in Epon resin, sectioned, and then poststained with uranyl acetate and lead citrate as described previously (42). A JEOL JEM-1200 EX II transmission electron microscope at 80 kV was used to generate thin-section micrographs.
Methanosarcina barkeri chromosome structure and content. A total of 3,680 putative protein-coding genes longer than 200 bp, which together cover 70% of the genome, were identified (Table 1). The average protein-coding region of M. barkeri, at 921 bp, is within 2% of M. acetivorans and M. mazei coding regions, while its average intergenic region, at 393 bp, is considerably larger than those of M. acetivorans (328 bp) and M. mazei (303 bp). A further 71 RNA features were identified, including three sets of ribosomal RNAs (5S, 16S, and 23S) and 62 tRNAs covering all amino acids and pyrrolysine, which is encoded by the UAG codon in methylamine methyltransferase genes. One thousand seven hundred eighty hypothetical protein open reading frames (ORFs) accounted for nearly half of all protein features, with 1,837 putative functional protein assignments based on similarity to identified protein sequences in public databases. Of hypothetical protein genes conserved at the 80% nucleotide level, 289 were shared with M. acetivorans and 249 with M. mazei, of which 105 were common to both and should be considered highly conserved unidentified genes.
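The gene count, mean coding length, mean intergenic length, and coding fraction quoted above can be cross-checked with simple arithmetic. The sketch below is only a consistency check under simplifying assumptions (no overlapping genes, RNA features ignored, one intergenic region per gene); the function names are hypothetical.

```python
GENE_COUNT = 3680        # putative protein-coding genes
MEAN_CODING_BP = 921     # average protein-coding region
MEAN_INTERGENIC_BP = 393 # average intergenic region
CODING_FRACTION = 0.70   # share of the genome covered by coding genes


def total_coding_bp():
    """Total coding sequence implied by gene count and mean length."""
    return GENE_COUNT * MEAN_CODING_BP


def genome_size_from_fraction():
    """Genome size implied by the 70% coding fraction."""
    return total_coding_bp() / CODING_FRACTION


def genome_size_from_intervals():
    """Genome size implied by coding plus intergenic lengths."""
    return GENE_COUNT * (MEAN_CODING_BP + MEAN_INTERGENIC_BP)


if __name__ == "__main__":
    print(total_coding_bp())                    # 3389280
    print(round(genome_size_from_fraction()))   # 4841829
    print(genome_size_from_intervals())         # 4835520
```

Both estimates come to roughly 4.8 Mb, which agrees with the description elsewhere in the text of the M. acetivorans genome (5.8 Mb) as about 1 Mb larger.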
Gene annotation. There were 128 ORFs with sequence identities greater than 67% to genes in the NCBI sequence database but without sequence identity to other methanosarcinal genomes (http://bigm.umbi.umd.edu/materials/Methanosarcina/) (also see the supplemental material). Some of these features are highlighted below.
The M. barkeri genome included the full complement of genes encoding enzymes in the CO2 and methyl reduction with H2, methylotrophic, and aceticlastic pathways (13, 17). In addition to these, a complete formate dehydrogenase operon (MbarA 1561 to 1562), fdhAB, with high sequence identity to catabolic formate dehydrogenase from several formate-utilizing methanogens, was detected. Methanosarcina spp. have never been reported to utilize formate for growth, and fdhAB has not been detected previously in this genus (7). Attempts to grow M. barkeri on 50 mM formate in this study were unsuccessful, and the addition of sodium formate to cultures containing trimethylamine or hydrogen did not enhance growth, which suggests that either the operon is not expressed under the conditions tested or it does not have a catabolic role. M. barkeri lacks genes encoding a two-subunit nucleoside diphosphate-forming acetyl-coenzyme A (CoA) synthetase (acdAB) that is found in M. acetivorans (MA3168 and MA3602) and M. mazei (MM0358 and MM0493) but has a remnant of this enzyme, pseudogene MbarA 3662. The sequence adjacent to the 5' end includes the same order of gene orthologs found in M. acetivorans and M. mazei, but the 3' end is adjacent to a sequence inversion, which further suggests that it is a truncated acdA sequence. This enzyme catalyzes one of two pathways for generating acetyl-CoA; the other is the CO dehydrogenase/acetyl-coenzyme A synthase that catalyzes aceticlastic catabolism in Methanosarcina spp. There are no ORFs encoding nucleoside diphosphate-forming acetyl-CoA synthetases close to characterized acetyl-CoA synthetases, but there are nucleoside monophosphate-forming acyl-CoA synthetases with unknown functions (MbarA 267, 2172, and 2821) that could potentially function as acetyl-CoA synthetases. Alternatively, it is also possible that the CO dehydrogenase/acetyl-coenzyme A synthase fulfills the function of both enzymes in M. barkeri.
Among genes encoding biosynthetic functions, a group of 14 sequential ORFs encodes predicted gas vesicle proteins with highest identity to gvpANOFGJKLM (MbarA 326 to 339) in the haloarchaea, which includes the minimal gene set for expression of vesicles in Haloferax volcanii (34). Although there are no prior reports of gas vesicles in M. barkeri Fusaro, gas vesicles have been reported in another strain of M. barkeri, FR-1, and in Methanosarcina vacuolata, which has a DNA-DNA reassociation value of 61% with the type strain of M. barkeri (1, 51, 52). Interestingly, M. barkeri has three sequential copies of gvpA that encode the ribs of the vesicle wall and influence the strength and width of the vesicles (4). The 33.5-kb region that includes the gvp operon may have been acquired from vesicle-synthesizing strains, as it is flanked by transposons. Gas vesicles have been proposed to be an early organelle of prokaryote motility, and they are often regulated by light and oxygen partial pressure (45, 48). In contrast to the other methanosarcinal strains that express gas vesicles with methanol and acetate, M. barkeri gas vesicles were observed only in cells grown with H2-CO2 in liquid and on solidified medium (Fig. 1B), which suggests that they might be expressed as part of a chemotactic mechanism in response to hydrogen gradients. M. barkeri possesses a full complement of chemotaxis genes, but unlike M. acetivorans and M. mazei it has only a single copy of the chemotaxis genes (with the exception of cheY) instead of two and lacks a cheC homolog. The functional role of these chemotaxis genes in Methanosarcina spp. is currently unknown, and additional types of motility, such as flagellar motility, have not been observed for these species. Osmoregulatory genes detected in the M. barkeri genome, including kefC (MbarA 671) for potassium uptake at low solute concentrations and ablAB (MbarA 669 to 670) for Nε-acetyl-β-lysine synthesis at high solute concentrations, indicate that this strain adapts to changes in extracellular solute concentrations (43) by mechanisms similar to those for other methanosarcinal species. Interestingly, M. barkeri also has ORFs (MbarA 22 to 23) with high identity to two enzymes required for N-acetylmuramic acid synthesis, which is unique among the sequenced archaea. Prior analysis of the cell wall composition of M. barkeri Fusaro failed to detect muramic acid (22). In addition, the ORFs and flanking ORFs MbarA 20 to 21 and MbarA 24 to 26 encode proteins with high sequence identity to the proteobacteria, which suggests that this DNA fragment was acquired by lateral gene transfer.
Another unique feature of the M. barkeri genome is the detection of a putative operon encoding a bacterial P450-specific ferredoxin reductase (MbarA 1947 to 1945). The family of heme protein monooxygenases known as cytochrome P450 plays a critical role in the synthesis and degradation of many xenobiotics and physiologically important compounds (37, 50). All known P450s are multicenter enzymes consisting of a heme, or P450, component with associated reductase components. The gene encoding the putative cytochrome P450 in M. barkeri is flanked immediately upstream by genes encoding a ferredoxin and ferredoxin reductase, which is typical of bacterial class I three-component systems. For catalytic activity, cytochrome P450 must be associated with the electron donor partner proteins ferredoxin/ferredoxin reductase complex (46). Cytochrome P450 has not been detected previously in the archaea. All three predicted proteins encoded by the putative operon have 54 to 62% sequence identity with cytochrome P450 from Myxococcus spp., and proteins encoded by genes immediately flanking the operon have high sequence identity to methanosarcinal genes. This suggests that this operon was acquired by M. barkeri through a lateral gene transfer event. Another putative operon encoding oxygen-dependent cytochrome d oxidase cydAB was also identified in the genome of M. barkeri and the other two methanosarcinal genomes. The presence of these oxygen-dependent genes along with one catalase and two superoxide dismutase genes suggests that these proteins protect methanosarcinal species from oxygen or that they may support microaerophilic growth by a currently undescribed mechanism. As cytochrome P450 catalyzes an oxygen-requiring reaction and has not been detected previously in an anaerobe, the detection of this gene in M. barkeri raises intriguing questions about the function of this gene product in this obligately anaerobic methanogen.
A comparison of gene role categories among the three species is shown in Table 2. The genomes were analyzed also for classes underrepresented or missing in the M. barkeri genome compared with the 1-Mb-larger M. acetivorans genome. Most of the genes absent from M. barkeri were unidentified ORFs, but identified genes included primarily ORFs encoding transporter proteins, sensory proteins, cell surface proteins, and polysaccharide synthesis proteins. All essential biosynthetic and catabolic genes were conserved in M. barkeri, including multiple copies of confirmed methyltransferases, but several hypothetical methyltransferases of unknown function were not present (17). As reported for M. mazei, which has a genome 1.7 Mb smaller than that of M. acetivorans, M. barkeri lacks also two multigene operons proposed to be linked to energy conservation in M. acetivorans during growth on acetate. The lack of mrpABCDEFG operon (MA4572 to MA4566) H+/Na+ antiporter and rnfABCDGE Na+ transporting NADH oxidoreductase in both M. barkeri and M. mazei supports the hypothesis that these gene products replace the function of the Ech hydrogenase, which is absent in M. acetivorans, by generating a transmembrane ion gradient for ATP synthesis during growth on acetate (25). Genes glnP and glnQ, encoding glutamine transporter proteins, were absent from M. barkeri but present in the other two methanosarcinal species. Finally, M. barkeri also lacks a low-affinity phosphate transporter (MA2935), suggesting it originated in a phosphate-poor environment. Two other transporters are missing in M. barkeri, a gluconate transporter (MA0021) and a dicarboxylate transporter (MA2961). This suggests that M. barkeri has less ability to take up organic compounds than the other two Methanosarcina spp.
TABLE 2. Gene role categories and numbers of genes for three Methanosarcina spp.a
Features revealed by whole-genome comparison. Sets of gene features shared between genomes were determined (see the supplemental material) and organized as sets of paralogous genes. This approach was pursued at several different levels of identity. The data for the 80% identity level are presented in Fig. 2. When excess paralogs (the difference between the number of paralog clusters and the genes they contain) are expressed as a fraction of total features in the paralog set, M. acetivorans has the highest fraction, at 14% to 15%, and M. mazei the lowest, at 10% to 11%, with M. barkeri intermediate. This correlation between genome size and paralogy suggests a model of genome growth driven by gene duplication and is consistent with our previous observation of high levels of paralogy in the heat shock proteins of M. acetivorans (8).
FIG. 2. Venn diagram for three Methanosarcina sp. genomes, indicating the numbers of genome features with at least 80% nucleotide identity. Numbers in parentheses are the numbers of common paralog groups, and adjacent numbers are the gene counts for the contributing organisms. Values for M. barkeri are given in bold type, values for M. mazei are given in italics, and values for M. acetivorans are underlined.
At the 80% three-way identity level, M. acetivorans, M. barkeri, and M. mazei have 924, 893, and 881 genes, respectively, falling into 785 paralogous clusters with similarity to the other Methanosarcina spp. However, when their respective transposase contributions of 68, 50, and 49 are discounted, the residual differences in relative paralog counts are small. With about 50% of all paralogs being transposase, it is difficult to identify gene duplication events that may not have been driven by transposition-mediated duplication. Chromosome extension in all three organisms must be affected by transposition, but such effects are not uniformly distributed in M. barkeri (Fig. 3). Whole-genome distances (Table 3) based on maximal local alignments indicate that the genomes are quite similar in overall content but that M. acetivorans and M. mazei are marginally more closely related. This is in qualitative agreement with DNA-DNA hybridization experiments (44), which showed reassociation values of 28% between M. acetivorans and M. mazei and 18% between these species and M. barkeri. This result underscores the comparability of these sequences, with the exception of the plasmid sequence.
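The excess-paralog fraction can be recomputed from the counts given at the 80% identity level. The sketch below assumes that the excess is the number of genes in paralog clusters minus the number of clusters, and that the transposase counts are a subset of that excess; both are interpretations of the text, and the function names are hypothetical.

```python
CLUSTERS = 785  # paralogous clusters at the 80% three-way identity level

# species -> (genes in paralog clusters, transposase genes among them)
COUNTS = {
    "M. acetivorans": (924, 68),
    "M. barkeri": (893, 50),
    "M. mazei": (881, 49),
}


def excess_paralog_fraction(genes, clusters=CLUSTERS):
    """Excess paralogs as a fraction of all genes in the paralog set."""
    return (genes - clusters) / genes


def transposase_share(genes, transposases, clusters=CLUSTERS):
    """Share of the excess paralogs attributed to transposases."""
    return transposases / (genes - clusters)


if __name__ == "__main__":
    for species, (genes, transposases) in COUNTS.items():
        print(
            species,
            round(excess_paralog_fraction(genes), 3),
            round(transposase_share(genes, transposases), 3),
        )
```

Under these assumptions the fractions come to about 15%, 12%, and 11% for M. acetivorans, M. barkeri, and M. mazei, and transposases account for roughly half of the excess in each genome, in line with the figures quoted in the text.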
FIG. 3. Asymmetric fragmentation in M. barkeri. The top panel shows cumulative deviations from the mean in the M. barkeri genome for synteny with respect to M. acetivorans (a), M. mazei (b), or intergenic interval (c). The cumulative transposon count is superimposed (d). The bottom panel shows uniformly scaled BLASTN cross plots of the M. barkeri chromosome with those of M. mazei and M. acetivorans, with the origin regions circled.
TABLE 3. Intergenomic distancesa
1e44 in ORI A and E
1e8 in ORI B) and show only weak similarity between ORI A and ORI B. They are extremely AT rich (
70%) and may show unconserved inverted-repeat structures.
TABLE 4. Identification of conserved features for chromosomal origins of replication in Methanosarcina spp.a
FIG. 4. Sequence alignment of the M. barkeri ORI A self-complementary region (1190360 to 1191059) using BLASTN. Numbers at ends of the bottom sequences are genomic locations.
FIG. 5. Proposed mechanism for conserved repetitive sequence to provide bubbled-out repeat motifs for initiation of replication. Quasi-stable bubbles might occur in pairs at arbitrary locations on opposite strands. All motifs are essentially identical.
FIG. 6. M. acetivorans is elongated due to distributed duplication events. Uniformly scaled, comparable 2-Mbp BLASTN cross plot comparisons reveal deviation from the dotted 45° slope of identity in M. acetivorans, with respect to M. barkeri (upper left) and M. mazei (lower right). The anticipated orthogonal relationship is observed in the M. barkeri/M. mazei comparison (upper right). The self-comparison of M. acetivorans (lower left) reveals multiple nonidentical repeat sequences characteristic of the multiple transposase genes found in all three genomes. M. barkeri comparisons show multiple strand inversions and transpositions relative to the other two genomes.
What might cause this wasteland effect? One possibility, given the symmetry with respect to the origin, is an accumulation of strand exchange failures in the replication process and subsequent "gene rot" of broken genes. The cross effect of random strand inversion noted by Eisen et al. (14) gives way to a shotgun effect. Another possibility is infiltration by transposons with transposase-mediated damage. Certainly there is an increased frequency of transposon genes in this area (Fig. 3, trace d), but this may be either causative or opportunistic, with the organism tolerating infiltration of already-dysfunctional sections of the chromosome. The possibility that CRISPRs (clustered regularly interspaced short palindromic repeats) might be involved in large-scale rearrangements was also investigated. All four known CRISPR-associated, or cas, genes typically found in association with the DNA repeats have been detected previously in the M. barkeri genome (18). In M. barkeri, CRISPR sequences are found in six distinct localities. None of these coincides with the margins of rearrangement with respect to the other Methanosarcina spp.; instead, they are found in intergenic regions in structurally unconserved regions. This contrasts with the genomes of Thermotoga spp., where large-scale DNA rearrangements appear to be associated with CRISPR DNA repeats and/or tRNA genes (10). Although CRISPRs might be involved in homologous recombination, their immediate environment is not strongly conserved, and so it is impossible to say on that basis whether they are better accepted in nondeleterious locations or whether there is a localized deterioration of the immediate environment. In two locations, CRISPR elements are adjoined by colinear sequences, implying an insertion event in M. barkeri.
Conclusions. Of the 3,680 open reading frames in M. barkeri, 746 had orthologs with better than 80% similarity to both M. acetivorans and M. mazei while 240 were unique (nonorthologous) among these species. An etiology for genome rearrangement is revealed by whole-genome comparison of three species of the genus Methanosarcina. The inverse correlation of intergenic size and synteny demonstrates a mechanism for the development of genome plasticity, which involves replication-associated inversion with concomitant gene damage and colonization by transposable elements. Gene duplication is also observed as a mechanism for genome extension. The organization of M. barkeri is well conserved with respect to the other Methanosarcina spp. in the region proximal to the origin of replication, with interspecies gene similarities as high as 95%. In the half genome most distant from the origin, it is however disordered and marked by increased transposase frequency and decreased gene synteny and gene density. Furthermore, we have observed a highly conserved double origin of replication, which suggests a mechanism for replication which allows a double start with pass-through, enabling the origin itself to be replicated. The apparent genome plasticity likely contributed to these species' ability to adapt to a broad range of environments as a result of genome elongation and enrichment for favorable phenotypes.
K.R.S. was supported in part by NSF MCB Division of Cellular and Bioscience grant no. MCB0110762 and by DOE Energy Biosciences Program grant no. DE-FG02-93-ER20106. W.W.M. was supported in part by NSF MCB Division of Cellular and Biosciences grant no. MCB12466 and by DOE Energy Biosciences Program grant no. DE-FG02-02-ER15296.
Published ahead of print on 15 September 2006.
Supplemental material for this article may be found at http://jb.asm.org/.
Nε-acetyl-β-lysine, α-glutamate, glycine betaine, and K+ as compatible solutes for osmotic adaptation. Appl. Environ. Microbiol. 61:4382-4388.