Journal of Bacteriology, March 2009, p. 1974-1978, Vol. 191, No. 6
0021-9193/09/$08.00+0 doi:10.1128/JB.01448-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.
Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada,1 UMR CNRS 7138 Systématique, Adaptation et Evolution, Université Pierre et Marie Curie, Paris, France,2 Centre for Geobiology, University of Bergen, Bergen, Norway,3 Department of Process Engineering and Applied Science, Dalhousie University, Halifax, Nova Scotia B3J 1Z1, Canada,4 Department of Biology, University of Bergen, Bergen, Norway5
Received 15 October 2008/ Accepted 29 December 2008
The complete genome sequence of this strain was determined by the conventional whole-genome shotgun strategy. Genomic libraries containing 1- to 4-kb and 40-kb fragments were constructed, and sequence chromatograms were produced using a MegaBACE 1000 capillary DNA sequencer (GE Healthcare). Nucleotide skews were computed as described previously (11). Automated open reading frame (ORF) identification and annotation were performed using the annotation software Manatee made available by TIGR (23). Pseudogenes were identified by running BLAST searches of neighboring ORFs with the same or similar annotations and by using the program Psi-phi (9, 10). Clustered regularly interspaced short palindromic repeat loci (CRISPRs) were identified using the web site http://crispr.u-psud.fr/crispr/CRISPRHomePage.php with the default parameters (6). Maximum-likelihood (ML) trees (WAG+Γ+I model, four categories) were constructed from protein-coding ORFs using PHYML and the PhyloGenie package (5). Recently, several Thermotogales genomes have become available in GenBank. As these genomes had not yet been published, we did not include them in any "genome-scale" analyses (i.e., the phylogenetic analyses). We did, however, include them in the BLAST analyses of mobile Thermosipho africanus genes.
The genome of Thermosipho africanus strain TCF52B is a single circular chromosome consisting of 2,016,657 bp with an average G+C content of 30.8%. Strand asymmetries, such as GC skew and tetramer skews, are pronounced and show two clear singularity points, located at roughly 8 kb and 1,033 kb from the +1 site (see Fig. S1 in the supplemental material). Since these two points are diametrically opposed on the circular chromosome, dividing it into two halves with opposite compositional skews, they make good candidates for the putative origin and termination of replication. The 1,033-kb region is likely to harbor the origin, since GC skew becomes positive past this location, as in most bacterial genomes with a known origin.
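The skew-based reasoning above can be illustrated with a minimal sketch. The skew method actually used is the one cited in reference 11 and is not reproduced here; the windowed (G - C)/(G + C) definition, the function names, and the toy sequence are all assumptions chosen for illustration.

```python
def gc_skew(seq, window):
    """Return (start, skew) pairs, with skew = (G - C) / (G + C)
    computed over non-overlapping windows of the given size."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        out.append((start, skew))
    return out


def skew_flips(skews):
    """Find windows where the skew changes sign relative to the
    previous window, treating the sequence as circular."""
    flips = []
    for i in range(len(skews)):
        _, prev = skews[i - 1]  # i = 0 wraps around to the last window
        pos, cur = skews[i]
        if prev < 0 < cur:
            flips.append((pos, "becomes positive"))
        elif cur < 0 < prev:
            flips.append((pos, "becomes negative"))
    return flips


# Hypothetical toy "chromosome": a C-rich half followed by a G-rich half.
toy = "ACCTCCAC" * 50 + "AGGTGGAG" * 50
print(skew_flips(gc_skew(toy, window=40)))
```

On this toy circle the two sign changes fall half a chromosome apart, mirroring the two diametrically opposed singularity points described above; under the convention stated in the text, the point where the skew becomes positive would be the origin candidate.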
The genome contains 2,000 potential coding sequences, of which 1,913 are putative protein-coding ORFs, 30 are putatively assigned as pseudogenes, and 57 encode RNA. A comparison to the genome of Thermotoga maritima is given in Table 1. The Thermosipho africanus genome is about 156 kb larger than the Thermotoga maritima genome and carries 36 more ORFs. The genome contains duplicated regions comprising paralogous gene copies, CRISPRs, and mobile genetic elements, which collectively provide considerable indirect evidence for genomic instability and acquisition of exogenous genetic information.
TABLE 1. General features of the Thermosipho africanus genome, with a comparison to Thermotoga maritima
FIG. 1. Distribution of CRISPR loci and mobile elements along the Thermosipho africanus genome, as well as phylogenetic "affiliation" of genes along the chromosome and the GC contents of genes. Outer circle, phylogenetic affiliation of the sister of Thermosipho africanus in phylogenetic trees estimated from predicted ORFs. The following color coding for the sister in the phylogenetic tree was used: green, self; red, Thermotogales; yellow, Firmicutes; blue, Archaea; orange, "others" as defined in Fig. 2; pink, complex; gray, complex including Thermotogales; light blue, no tree. Second and third circles, distribution of the mobile elements along the Thermosipho africanus chromosome. Mobile elements in forward orientation are indicated in red, and mobile elements in reverse orientation are indicated in blue. Fourth circle, distribution of CRISPRs and Cas genes along the genome. CRISPR repeats are in green, and Cas genes are in purple. Innermost circle, distribution of gene GC content. Genes having a GC content above the mean are in red, while those with a GC content below the mean are in green. The three spikes in GC content correspond to rRNA operons.
We attempted to calculate ML phylogenetic trees from each of the 1,913 ORFs and obtained trees from 1,578 (82%), using the PhyloGenie package. The distribution of the "immediate sisters" (nearest neighbors) of Thermosipho africanus in the trees is shown in Fig. 2. In 60% of the trees the sister was another Thermotogales bacterium, in most cases Thermotoga maritima, since this was the only other complete Thermotogales genome included in the analysis. For 9% of the treeable ORFs, the sister gene originated from within its own genome.
FIG. 2. Distribution of Thermosipho africanus sister taxon or clade in 1,578 phylogenetic trees for potentially protein-coding ORFs. "Other group" means that the organism(s) in the sister group belonged to a taxonomic group that was not Thermotogales, Firmicutes, or Archaea. "Complex" means that the sister clade was composed of organisms from several different taxonomic groups, and "complex including Thermotogales" means that another Thermotogales sequence was included in this clade.
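The sister categories used in Fig. 2 can be made concrete with a small sketch. It assumes rooted trees written as nested tuples, hypothetical leaf labels, and a hypothetical leaf-to-group lookup; it does not reproduce how PhyloGenie or the authors actually assigned sisters, and how a clade mixing "self" with other groups should be scored is likewise an assumption.

```python
def leaves(tree):
    """Flatten a nested-tuple tree into a list of leaf labels."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def sister_leaves(tree, query):
    """Return the leaves of the clade(s) sister to `query`,
    or None if `query` is not found below this node."""
    if isinstance(tree, str):
        return None
    if query in tree:  # query is a direct child of this node
        sisters = []
        for child in tree:
            if child != query:
                sisters.extend(leaves(child))
        return sisters
    for child in tree:
        found = sister_leaves(child, query)
        if found is not None:
            return found
    return None


def classify_sister(tree, query, group_of):
    """Map the sister clade of `query` to one Fig. 2 style category."""
    sisters = sister_leaves(tree, query)
    if not sisters:
        return "no tree"
    groups = {group_of[leaf] for leaf in sisters}
    if len(groups) == 1:
        return groups.pop()
    if "Thermotogales" in groups:
        return "complex including Thermotogales"
    return "complex"


# Hypothetical labels and group assignments, for illustration only.
group_of = {"Tma_1": "Thermotogales", "Bsu_1": "Firmicutes",
            "Afu_1": "Archaea", "Pfu_1": "Archaea"}
tree_a = ((("Taf_q", "Tma_1"), "Bsu_1"), ("Afu_1", "Pfu_1"))
tree_b = (("Taf_q", ("Bsu_1", "Afu_1")), "Pfu_1")
print(classify_sister(tree_a, "Taf_q", group_of))
print(classify_sister(tree_b, "Taf_q", group_of))
```

In the first toy tree the query gene pairs with a single Thermotogales sequence; in the second its sister clade mixes Firmicutes and Archaea and is therefore scored as "complex".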
We therefore visually inspected each of the trees in order to also obtain information on lateral gene transfer (LGT) events that predate the split between Thermosipho and Thermotoga (see Fig. S2 in the supplemental material). This also allowed us to detect transfers where the genes involved have later been duplicated in the Thermosipho africanus genome (so that the sister in the tree was another Thermosipho africanus gene). This analysis suggested that a total of 202 ORFs (13%) have been involved in LGT with Archaea (including both ancient and recent events). Among these, 125 (62%) also involve Thermotoga maritima, while 77 (38%) have no close homolog in Thermotoga maritima. This latter number is of course an overestimate of the number of potential recent transfers, as many of the transferred genes might have been lost by Thermotoga maritima MSB8, but these numbers do suggest that LGT between the Thermotogales and the Archaea is still an ongoing process. Thermophilic Archaea such as members of the genera Archaeoglobus (2) and Thermococcus (3, 14) are among the few other organisms considered to be native to oil reservoirs, the habitat from which this strain was isolated (4). Moreover, a recent reanalysis of the Thermotoga maritima genome reported 11.3% archaeal genes in this genome, consistent with our findings (20).
A large proportion of the ORFs have a close phylogenetic relationship with Firmicutes, with 8% of the ORFs having Firmicutes as sister in the tree (Fig. 2). This connection has also been observed earlier in phylogenetic analyses (17, 19, 20). To further investigate this, we performed the same analysis of the trees in which Thermosipho africanus clusters with Firmicutes as we did for Archaea (see Fig. S3 in the supplemental material). In total there are 417 (26%) trees that suggest LGT between these lineages. For 244 (58.5%) of these trees the LGT predated the Thermosipho/Thermotoga split, as there was also a close homolog in Thermotoga maritima MSB8, while there was no close Thermotoga maritima homolog in 173 (41.5%) of the trees. Moreover, Thermotogales and Firmicutes were sisters, rather than nested one within the other, in 62 (3.9%) of the trees. One could interpret this as evidence that these two phyla are indeed sisters or that there has been substantial transfer between them, though the true phylogenetic position of the Thermotogales is elsewhere (likely deeper) in the tree. Alternatively, of course, the notion of a unique "true" phylogenetic position could be questioned.
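The proportions reported in the two preceding paragraphs can be rechecked from the stated counts. The sketch below assumes that the overall percentages are relative to the 1,578 ORFs that yielded trees and that the subset percentages are relative to each lineage's LGT total; these denominators are inferred from the numbers rather than stated explicitly, and the variable names are hypothetical.

```python
TREES = 1578  # ORFs for which an ML tree was obtained

# Counts as reported in the text.
counts = {
    "Archaea": {"total": 202, "with_tma": 125, "without_tma": 77},
    "Firmicutes": {"total": 417, "with_tma": 244, "without_tma": 173},
}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal."""
    return round(100 * part / whole, 1)


for lineage, c in counts.items():
    # The two subsets should partition the lineage total.
    assert c["with_tma"] + c["without_tma"] == c["total"]
    print(lineage,
          pct(c["total"], TREES),
          pct(c["with_tma"], c["total"]),
          pct(c["without_tma"], c["total"]))

# Trees in which Thermotogales and Firmicutes are sisters.
print("sister phyla", pct(62, TREES))
```

Under these assumptions the computed values (12.8, 61.9, and 38.1 for Archaea; 26.4, 58.5, and 41.5 for Firmicutes; 3.9 for the sister-phyla trees) agree with the rounded percentages given in the text.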
A high level of LGT between Thermotogales and Firmicutes might in any case be expected, since some members of the Firmicutes, e.g., the Thermoanaerobales, frequently cohabit with Thermotogales in natural environments. For instance, Thermotogales and the Firmicutes genera Thermoanaerobacter and Desulfotomaculum are the only bacteria thought to be indigenous to oil reservoirs (4, 12, 18). Moreover, most of the mobile elements found scattered in the Thermosipho africanus genome seem to have recently originated from Firmicutes, further supporting the importance of LGT between these lineages.
Nucleotide sequence accession number. The genome sequence of Thermosipho africanus strain TCF52B has been submitted to GenBank under accession number CP001185.
Sequencing and assembly were performed at The Atlantic Genome Centre (Halifax, Canada). We thank TIGR (now JCVI) for providing the TIGR Annotation Service, which provided us with automatic annotation data and the manual annotation tool Manatee. We also thank Peter Cordes and Sebastien Halary for help with the data analysis and Angie Lewis for help with sequencing and assembly.
Published ahead of print on 5 January 2009.
Supplemental material for this article may be found at http://jb.asm.org/.