Journal of Bacteriology, July 2005, p. 4935-4944, Vol. 187, No. 14
0021-9193/05/$08.00+0 doi:10.1128/JB.187.14.4935-4944.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland¹; Department of Microbiology, University of Regensburg, D-93053 Regensburg, Germany²
Received 24 January 2005/ Accepted 8 April 2005
Subsequent to the completion of the T. maritima strain MSB8 genome sequence, Nesbo and coworkers presented a series of studies that investigated potential LGT events in the Thermotoga lineage (14-16). The first of these studies examined the patterns of acquisition of two "archaeal-like" genes in the order Thermotogales, and its results lent additional support to the movement of the predicted "archaeal-like" genes across the domains. Suppression subtractive hybridization (SSH) (1) was subsequently used to compare the genome of the sequenced strain MSB8 with that of Thermotoga sp. strain RQ2 (99.7% identity in the small-subunit rRNA sequence), which was isolated from the geothermally heated sea floor at Ribeira Quente, the Azores. This SSH study allowed a partial identification of strain-specific sequences and yielded approximately 48 kb of strain RQ2-specific DNA. Based on this finding, it was estimated that 20% of the strain RQ2 genome was not present in the genome of strain MSB8. Most recently, Nesbo and coworkers screened lambda libraries created from strain RQ2 DNA for five regions that are absent from the MSB8 genome (14). Among the gene clusters found to be unique to strain RQ2 were an archaeal-type ATPase, a rhamnose biosynthesis operon, and an arabinosidase island.
With the advent of whole-genome sequencing, many new examples of gene transfer between archaea and bacteria have come to light (2, 3). It is now evident that there is a high level of genetic exchange in the Thermotoga lineage. However, the study of LGT "is still in its adolescence" (7, 8), and the mechanism(s) of the exchange, its direction, and the degree to which it occurs remain unknown. To gain further insight into gene transfer in the Thermotoga lineage, we initiated a comparative genome hybridization (CGH) study. This study compared the sequenced reference genome of strain MSB8 with nine Thermotoga strains (including strain RQ2) that have been isolated from different locations throughout the world.
TABLE 1. Thermotoga strains used in the comparative genomic hybridization study
Tagged image file format images of the hybridized arrays were analyzed using TIGR Spotfinder software (http://www.tigr.org/software/). The data set was normalized by applying the linear regression algorithm of the MIDAS software (http://www.tigr.org/software/), and the values were then averaged to determine the final ratio (R = MSB8/test strain) reported for each ORF. For each comparison, at least two flip-dye experiments (four hybridizations) were performed. Statistical analysis was performed on log2-transformed signal ratios (log2 [test strain/MSB8]) by using GACK analysis software (6) (http://falkow.stanford.edu/whatwedo/software/software.html). GACK provides an estimate of the probability (%EPP) that any given gene is present in the test strain compared to the control. The %EPP ranges from -0.5 (high likelihood of divergence) to 0.5 (high likelihood of presence). It was then transformed to an estimated probability of divergence (%EPD), ranging from 0 to 100 (highest likelihood of divergence). Based on the CGH data and the GACK analysis, a gene was considered shared between T. maritima MSB8 and the test strain when its signal ratio was less than 3, and divergent when its signal ratio was greater than 10. Hierarchical clustering of the CGH data, as presented in Fig. 1B, was performed using the TIGR-MEV package (http://www.tigr.org/software/).
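The ratio thresholds and probability scales above can be summarized in a short Python sketch. It is not the Spotfinder, MIDAS, or GACK implementation. The helper names (`fig2_colour`, `classify_ratio`, `gack_log_ratio`, `epp_to_epd`) are hypothetical. The placement of ratios that fall exactly on a bin edge is an assumption. The %EPP-to-%EPD conversion is assumed to be a linear rescaling, because the text gives only the two ranges.

```python
import math

# R = MSB8 signal / test-strain signal, averaged over the flip-dye replicates.
# Bin edges follow the Fig. 2 colour key. Treating R = 3 as "not shared" and
# R = 10 as "not divergent" is an assumption of this sketch.


def fig2_colour(r: float) -> str:
    """Colour bin used in Fig. 2 for a hybridization ratio R."""
    if r < 3:
        return "gray"      # genes shared
    if r <= 5:
        return "yellow"
    if r <= 7:
        return "orange"
    if r <= 10:
        return "red"
    return "brown"         # genes divergent


def classify_ratio(r: float) -> str:
    """Shared (R < 3), divergent (R > 10), or intermediate otherwise."""
    colour = fig2_colour(r)
    if colour == "gray":
        return "shared"
    if colour == "brown":
        return "divergent"
    return "intermediate"


def gack_log_ratio(r: float) -> float:
    """GACK works on log2(test/MSB8), which is -log2(R)."""
    return -math.log2(r)


def epp_to_epd(epp: float) -> float:
    """ASSUMED linear map: %EPP -0.5 -> %EPD 100, %EPP 0.5 -> %EPD 0."""
    if not -0.5 <= epp <= 0.5:
        raise ValueError("%EPP is expected to lie between -0.5 and 0.5")
    return (0.5 - epp) * 100.0
```

Under these rules, a gene with R = 12 is called divergent, a gene with R = 1.2 is called shared, and a gene with R = 6 falls in the intermediate orange bin.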
FIG. 1. Relatedness of the Thermotoga strains used in this study. (A) Small-subunit rRNA gene phylogeny. *, strains that were compared in the CGH study. (B) CGH results and hierarchical clustering based on the CGH data.
χ² analysis was performed as previously described (9).
Nucleotide sequence accession numbers. The nucleotide sequences of the strain RQ2 regions presented in Fig. 3 and Tables S1 and S3 have been deposited in GenBank under the following accession numbers: DQ073429, DQ073430, DQ073431, DQ073432, DQ073433, DQ073434, DQ073435, and DQ073436.
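The χ² analysis of G+C content follows ref. 9, and its details are not given in this section. The Python sketch below shows one simple form of such a test, under stated assumptions. Each window is compared with the genome-wide G+C fraction using a two-category (G+C vs. A+T) goodness-of-fit statistic. The window and step sizes are hypothetical, and ref. 9 may use a different composition measure, so this should not be read as the original procedure.

```python
def gc_count(seq: str) -> int:
    """Number of G or C bases (case-insensitive)."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def chi2_gc(window: str, genome_gc_fraction: float) -> float:
    """Two-category (G+C vs. A+T) chi-square statistic for one window.

    Any non-G/C character (including N) is counted with A+T in this sketch.
    """
    n = len(window)
    observed_gc = gc_count(window)
    observed_at = n - observed_gc
    expected_gc = n * genome_gc_fraction
    expected_at = n * (1.0 - genome_gc_fraction)
    return ((observed_gc - expected_gc) ** 2 / expected_gc
            + (observed_at - expected_at) ** 2 / expected_at)


def sliding_chi2(genome: str, window: int = 2000, step: int = 500):
    """Yield (start, chi2) for each complete window (window/step are hypothetical)."""
    genome_gc_fraction = gc_count(genome) / len(genome)
    if not 0.0 < genome_gc_fraction < 1.0:
        raise ValueError("genome must contain both G+C and A+T bases")
    for start in range(0, len(genome) - window + 1, step):
        yield start, chi2_gc(genome[start:start + window], genome_gc_fraction)
```

Windows with high values have a G+C content that departs strongly from the genome average. Such windows are the kind of signal plotted as the orange line in Fig. 3.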
FIG. 3. Analysis of the sequenced RQ2 regions compared to their MSB8 counterparts. For each RQ2 region that was sequenced to closure and analyzed, the blue line (top) represents the PCR product obtained for RQ2, whereas the black line (bottom) represents the same region in MSB8. Red lines delimited by two red filled circles indicate the lengths of notable sequences in the MSB8 genome. The coordinates refer to the start of the PCR product for RQ2 and to the whole genome for MSB8. MSB8 genes (above the black line) and RQ2 genes (underneath the blue line) are color coded according to their best match: green, best match to bacterial species; brown, best match to archaeal species; black, hypothetical proteins (no match). The orange line represents the χ² analysis of the G+C content of the MSB8 region (a scale is shown on the right). The brown areas represent the nucleotide identities (values in dark brown) between the MSB8 and RQ2 sequences. Finally, notable features are shown, such as predicted "archaeal islands" in the MSB8 genome (brown boxes) and repeats (pink boxes).
The results of the CGH analysis agreed with the phylogenetic reconstruction based on the 16S rDNA sequences presented above. Hierarchical clustering of the CGH data for the different Thermotoga strains compared to the reference genome, MSB8 (Fig. 1B), revealed two main clusters. Strains NE2x/L8B (99.84% of the genes shared with MSB8), RQ2 (93.08% shared), S1/L12B (92.02% shared), NE7/L9B (93.42% shared), and PB1platt (90.03% shared) all clustered together. The other isolates (LA10, 23.43%; LA4, 16.57%; RQ7, 25.73%; and VMA1/L12B, 30.62%) share lower levels of similarity with the reference genome, MSB8. They appear to be more closely related to T. neapolitana NS-E (data not presented). Strains NE7/L9B, S1/L12B, and NE2x/L8B were isolated from the same location as strain NS-E, i.e., Naples, Italy (Table 1), and clustered together in both the CGH and the 16S rDNA analyses (Fig. 1). Strains LA4 and LA10 were isolated from Lake Abbe, Djibouti (Table 1), and clustered together in the CGH analysis (Fig. 1B). In contrast, the two strains isolated from the Azores (Table 1), RQ2 and RQ7, clustered separately (Fig. 1B).
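The two-cluster split can be illustrated with a toy agglomerative (average-linkage) clustering in Python. This is only an illustration. The study clustered the full gene-by-strain CGH data with TIGR-MEV. Here, each strain is reduced to the single percentage of shared genes reported above, the distance is assumed to be the absolute difference between percentages, and the helper names are hypothetical.

```python
from itertools import combinations

# Percentage of MSB8 genes called "shared" for each strain, as reported in the text.
SHARED_PERCENT = {
    "NE2x/L8B": 99.84, "RQ2": 93.08, "S1/L12B": 92.02, "NE7/L9B": 93.42,
    "PB1platt": 90.03, "LA10": 23.43, "LA4": 16.57, "RQ7": 25.73,
    "VMA1/L12B": 30.62,
}


def average_linkage(a, b, values):
    """Mean absolute difference in % shared over all cross-cluster strain pairs."""
    total = sum(abs(values[x] - values[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def cluster(values, k=2):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [frozenset([name]) for name in values]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], values))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return clusters


for group in cluster(SHARED_PERCENT):
    print(sorted(group))
```

Even on this single summary value, the strains separate into the two groups described in the text. Finer structure, such as the separate placement of RQ2 and RQ7, depends on the full gene-level data.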
These two patterns of hybridization become more evident when the CGH data are mapped onto a circular representation of the T. maritima MSB8 genome (Fig. 2). More remarkably, most of the genes that are absent or divergent relative to strain MSB8 are not distributed randomly over the chromosome. Instead, they group to form large islands. For example, in the strain RQ2 genome, 106 of the estimated 129 divergent ORFs occur in islands ranging in size from 2 to 38 kb. Similarly, for strain S1/L12B, 15 islands larger than 2 kb were found within the set of 149 divergent ORFs.
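Grouping divergent ORFs into islands can be sketched as a scan along the reference chromosome. The rules here are assumptions, not the authors' procedure:

- An island is a run of consecutive divergent ORFs in genome order.
- Its size is the span from the first ORF start to the last ORF end.
- Only runs of at least 2 kb are kept.
- Wrap-around across the origin of the circular chromosome is ignored.

The locus tags in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Orf:
    locus: str        # hypothetical locus tag
    start: int        # 1-based start on the MSB8 chromosome
    end: int          # 1-based inclusive end
    divergent: bool   # CGH call for the test strain


def find_islands(orfs: List[Orf], min_size: int = 2000) -> List[Tuple[str, str, int]]:
    """Return (first locus, last locus, span in bp) for each island >= min_size.

    An island is taken to be a run of consecutive divergent ORFs in genome
    order; a single shared ORF ends the run.
    """
    islands: List[Tuple[str, str, int]] = []
    run: List[Orf] = []

    def close_run() -> None:
        if run:
            span = run[-1].end - run[0].start + 1
            if span >= min_size:
                islands.append((run[0].locus, run[-1].locus, span))
            run.clear()

    for orf in sorted(orfs, key=lambda o: o.start):
        if orf.divergent:
            run.append(orf)
        else:
            close_run()
    close_run()
    return islands


example = [
    Orf("orf1", 1, 900, True),
    Orf("orf2", 1001, 2500, True),
    Orf("orf3", 2601, 3000, False),
    Orf("orf4", 3101, 3500, True),
    Orf("orf5", 3601, 6000, True),
    Orf("orf6", 6101, 6200, True),
    Orf("orf7", 6301, 6700, False),
    Orf("orf8", 6801, 7300, True),   # isolated 500-bp divergent ORF: no island
]
print(find_islands(example))
```

Isolated divergent ORFs, like `orf8` here, stay outside any island. They correspond to the "single ORFs" counted separately for strain S1/L12B.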
FIG. 2. Circular representation of the T. maritima MSB8 genome and of the microarray-based CGH results. Outer circle and second circle, predicted coding regions on the plus and the minus strand, respectively, color coded by role categories (10). Third circle, atypical nucleotide composition. Fourth circle, "archaeal islands" as predicted by Nelson et al. (9). Fifth circle, repeats associated with CRISPR elements (red; CRISPR locus numbers from I to VIII are listed), CRISPR-associated sequence loci (half-sized ticks, orange), transposons (blue), and repeat elements (three-quarter-sized ticks, cyan). Sixth circle, CGH regions R1 to R12 (see Tables S1 and S3). Circles 7 to 24 show the CGH results (comparison to MSB8) for strains RQ2 (circles 7 and 8), S1/L12B (circles 9 and 10), PB1platt (circles 11 and 12), NE7/L9B (circles 13 and 14), NE2x/L8B (circles 15 and 16), LA10 (circles 17 and 18), LA4 (circles 19 and 20), RQ7 (circles 21 and 22), and VMA1/L12B (circles 23 and 24). For each strain, the outer circle of the pair represents the hybridization ratio, R (see Materials and Methods): gray, R < 3 (genes shared); yellow, 3 < R < 5; orange, 5 < R < 7; red, 7 < R < 10; brown, R > 10 (genes divergent). The inner, blue circle displays the %EPD for each gene compared to MSB8. The %EPD could not be calculated for strains LA10 (circle 18), LA4 (circle 20), RQ7 (circle 22), and VMA1/L12B (circle 24).
Strains S1/L12B and NE7/L9B have essentially identical patterns of hybridization to the Thermotoga MSB8 array. In total, 149 (8%) of the ORFs in strain MSB8 do not have homologs in the strain S1/L12B genome. Of these, 37 occur as single ORFs, and the remainder occur in 15 islands larger than 2 kb. In addition, 6.9% of these ORFs are devoted to transport. The only apparent differences between strains S1/L12B and NE7/L9B are the following two regions, both absent from strain NE7/L9B:

- a chemotaxis and flagellar biosynthesis operon (TM0698 to TM0705);
- a large section of contiguous genes (TM0966 to TM1005) that encodes only hypothetical and conserved hypothetical proteins.
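The ORF counts and percentages given for strains S1/L12B and RQ2 can be cross-checked with simple arithmetic. This assumes that both percentages are computed over the same set of MSB8 ORFs represented on the array, which the text does not state. Under that assumption, dividing the number of divergent ORFs by the fraction of genes not shared gives the implied number of arrayed ORFs. The helper name is hypothetical.

```python
def implied_orf_total(divergent_orfs: int, percent_shared: float) -> float:
    """Total ORFs implied if `divergent_orfs` make up (100 - percent_shared)%."""
    fraction_divergent = 1.0 - percent_shared / 100.0
    return divergent_orfs / fraction_divergent


# Values taken from the text: S1/L12B (149 divergent ORFs, 92.02% shared)
# and RQ2 (an estimated 129 divergent ORFs, 93.08% shared).
for strain, divergent, shared in [("S1/L12B", 149, 92.02), ("RQ2", 129, 93.08)]:
    print(f"{strain}: ~{implied_orf_total(divergent, shared):.0f} ORFs")
```

Both strains imply roughly 1,865 arrayed ORFs. The counts and percentages are therefore mutually consistent, and 149 of about 1,865 ORFs is indeed about 8%.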
In addition to the regions described above, the lipopolysaccharide biosynthesis operon and its surrounding regions (TM0611 through TM0653 in the reference strain MSB8 genome) show variable levels of hybridization in all the strains tested. This includes strain NE2x/L8B, which is otherwise nearly identical to strain MSB8. A number of single genes scattered across the reference genome also appear to be absent from the genome of NE2x/L8B. These include genes encoding an indole-3-glycerol phosphate synthase, an orotate phosphoribosyltransferase, a threonine dehydratase, a fructokinase, a xylose repressor, a cold shock protein, and a number of conserved hypothetical proteins. Because these are isolated single-gene differences, they may instead represent genes that are evolving at a higher rate and have diverged beyond what the array can detect.
Finally, Thermotoga strains LA10, LA4, RQ7, and VMA1/L12B appear to be divergent from the reference strain MSB8. They share only 23.43%, 16.57%, 25.73%, and 30.62% of their genes, respectively, with T. maritima MSB8 (Fig. 1B). This low level of detected sharing on the MSB8 microarray most likely reflects gene sequences that are too divergent from their MSB8 counterparts to hybridize, rather than a total absence of these genes. Nevertheless, most of the genes that are conserved between these four strains and MSB8 are grouped in three large islands, ranging in size from approximately 13 kb to 81 kb (Fig. 2). Surprisingly, these conserved genes mostly encode hypothetical proteins. These three regions also contain genes for the following proteins:

- a putative polysaccharide export protein (TM0638);
- a putative NH3-dependent NAD+ synthetase (TM0645);
- a glutamine synthetase (TM0943);
- four subunits of a ribose ABC transporter (TM0955, TM0956, TM0958, and TM0959).

Thermotoga strains LA10, LA4, RQ7, and VMA1/L12B are closely related to T. neapolitana NS-E. A whole-genome alignment between T. neapolitana NS-E and T. maritima MSB8 revealed that these two genomes are, on average, 80.4% identical and surprisingly syntenic, with only a few insertions/deletions and inversions (data not shown). The identity cutoff for hybridization against the MSB8 array is approximately 85% at the nucleotide level. Strain NS-E, as well as LA10, LA4, RQ7, and VMA1/L12B, would therefore be expected to hybridize poorly to the MSB8 array. However, the three regions described above were found to be highly conserved between NS-E and MSB8, with an average identity well above 90%. These regions are highly conserved even though they consist mostly of hypothetical genes. They therefore likely carry functions that are essential for these species but have yet to be characterized.
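The hybridization cutoff argument can be made concrete with a small Python sketch. It computes percent identity from a pairwise alignment and compares it with the approximate 85% cutoff quoted in the text. Two choices here are assumptions: identity is computed only over alignment columns in which neither sequence has a gap (other conventions exist), and a hard threshold is used. The function names are hypothetical.

```python
APPROX_ARRAY_CUTOFF = 85.0  # approximate nucleotide identity cutoff quoted in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns in which neither sequence has a gap.

    Excluding gapped columns is one of several conventions; it is an assumption here.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def expected_to_hybridize(identity: float, cutoff: float = APPROX_ARRAY_CUTOFF) -> bool:
    """Crude prediction: a gene is assumed to hybridize at or above the cutoff."""
    return identity >= cutoff
```

Applied to the averages in the text, the genome-wide 80.4% identity between NS-E and MSB8 falls below the cutoff, whereas the three conserved regions (above 90%) fall above it. Because the cutoff is only approximate, identities close to 85% should be treated as borderline.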
Analysis of the genomic islands divergent in Thermotoga sp. strain RQ2. A more detailed analysis was performed for nine regions that were predicted to be absent or highly divergent in strain RQ2. These regions were PCR amplified using primers designed for the flanking genes in strain MSB8. The PCR products were subsequently sequenced and assembled to closure (results presented in Table S1 in the supplemental material). In all cases, the CGH predictions of gene absence or variability were correct. In addition, this analysis revealed at least three major types of gene transfer events.
The first type comprises events in which large regions (up to 12 kb in size) present in the reference strain MSB8 genome are missing in their entirety from the genome of strain RQ2 (regions RQ2-R2, RQ2-R9, and RQ2-R11 in Fig. 3). These appear to be large gene insertion events in strain MSB8 that have all occurred in intergenic regions. The genes flanking these regions remain highly conserved between strains RQ2 and MSB8. In region RQ2-R2 (corresponding to TM0411 to TM0423), for example, strain RQ2 lacks genes coding for the transport and metabolism of tagatose, a putative alpha-glucosidase, and three sugar ABC transporters. In strain MSB8, one subset of these genes (TM0417 to TM0422) was predicted to be an "archaeal island," whereas another subset (TM0411 through TM0416) aligns best with genes from bacterial species (data not shown). This entire region is also absent from strain PB1platt. It is present in all the other Thermotoga strains that are closely related to MSB8, i.e., S1/L12B, NE2x/L8B, and NE7/L9B. It therefore appears that strain MSB8 acquired this entire region in two independent events, from bacterial and archaeal donors. Similarly, region RQ2-R9 is a 10-kb stretch comprising nine genes (TM1063 through TM1071) that is missing from strain RQ2 compared to MSB8.
These nine genes code for five oligopeptide ABC transporter subunits (TM1063 to TM1067), two proteins involved in sugar metabolism (TM1068 and TM1071), one transcriptional regulator of the DeoR family (sugar catabolism), and one hypothetical protein. These genes are entirely conserved in the genomes of strains S1/L12B, NE2x/L8B, and NE7/L9B, whereas one of the oligopeptide transporter subunits (TM1067) is absent from strain PB1platt. This variable region (TM1063 to TM1071, corresponding to RQ2-R9) encodes an "oligopeptide transporter" that may in fact be a sugar transporter. The transporter appears to be part of an operon that also comprises an alpha-glucosidase and a DeoR family transcriptional regulator (TM1069). In strain RQ2, this region is completely absent, and in its place there is a 100-bp piece of "unique" DNA. Finally, strain RQ2 lacks a region (TM1261 to TM1271) that encodes a phosphate transport system. It also lacks one of the two DNA mismatch repair proteins of the reference strain MSB8 (these regions were not amplified).
The second type of gene transfer event involves major rearrangements within individual genes, rather than deletions of entire operons or large contiguous cassettes of genes. This is evident in regions RQ2-R5, RQ2-R7, RQ2-R8, and RQ2-R10 (Fig. 3), all of which appear to have undergone complex rearrangement, insertion, and deletion events. In RQ2-R5, for example, two small regions of the MSB8 genome (2.02 kb and 346 bp) (Fig. 3 and Tables S1 and S3 in the supplemental material) have been replaced in the strain RQ2 genome by an 864-bp and a 724-bp unique sequence, respectively. What makes these two rearrangements remarkable is that in each case they occurred within predicted ORFs, not in intergenic regions. The N-terminal ends of TM0756 and RQ2-R5-3 are conserved, but their C-terminal ends differ, leading to two different predicted proteins. TM0756 codes for a galactosyltransferase, whereas RQ2-R5-3 codes for a glycosyltransferase-fusion protein (Table S3 in the supplemental material). Similarly, TM0758 and RQ2-R5-5 have conserved N- and C-terminal ends, but the middle portions of the two genes are completely different. Although both genes appear to encode the same flagellin, any differences in their activity remain to be determined. Another example is region RQ2-R7, where a 244-bp region of the MSB8 genome has been replaced in the RQ2 genome by a 1.06-kb unique sequence. The C termini of RQ2-R7-1 and TM0969 are conserved. However, RQ2-R7-1 codes for a much larger protein than TM0969, with a unique N terminus that gives it a different function from its MSB8 ortholog: TM0969 is a small hypothetical protein, whereas RQ2-R7-1 is a putative archaeal ATPase.
Downstream of TM0969, strain MSB8 lacks a small 249-bp region that is present in RQ2. In the same area, a 2.62-kb sequence in MSB8 has been replaced by a unique 2.11-kb region in RQ2. This results in three different predicted genes in RQ2: a putative methyl-accepting chemotaxis protein (RQ2-R7-3), an HD domain protein (RQ2-R7-4), and a hypothetical protein (RQ2-R7-5). In regions RQ2-R8 and RQ2-R10, MSB8 segments of 5.46 kb and 2.85 kb, respectively, encode hypothetical proteins (TM0992 and TM1125 to TM1127). In strain RQ2, these segments have been replaced by segments of 568 bp and 6.38 kb, in which all of the RQ2 ORFs also encode hypothetical proteins. Elsewhere in region RQ2-R8, RQ2 lacks two small regions of 958 bp and 1.1 kb that are present in MSB8. It also lacks four hypothetical proteins (TM0999 through TM1002) and one transposase-related protein (TM1003). Again, almost the entire RQ2-R8 region had originally been predicted to be an "archaeal island" in MSB8.
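The replacements described for regions RQ2-R5, RQ2-R7, RQ2-R8, and RQ2-R10 can be tabulated to show the net change in length of the affected segments in RQ2 relative to MSB8. This sketch has several limitations:

- It uses only the rounded segment sizes quoted in the text (for example, 1.06 kb is taken as 1,060 bp).
- It ignores the rest of each region; the full sequences are summarized in Table S1.
- A segment absent from one strain is entered as 0 bp.

The names `EVENTS` and `net_change` are hypothetical.

```python
# (MSB8 segment bp, RQ2 segment bp) for each event described in the text;
# 0 marks a segment absent from that strain. Sizes are the rounded values given.
EVENTS = {
    "RQ2-R5": [(2020, 864), (346, 724)],
    "RQ2-R7": [(244, 1060), (0, 249), (2620, 2110)],
    "RQ2-R8": [(5460, 568), (958, 0), (1100, 0)],
    "RQ2-R10": [(2850, 6380)],
}


def net_change(events):
    """Net length change in RQ2 (positive = RQ2 longer) over the listed events."""
    return sum(rq2 - msb8 for msb8, rq2 in events)


for region, events in EVENTS.items():
    print(region, net_change(events))
```

By this tally, the described events shorten RQ2-R5 and RQ2-R8 in RQ2 and lengthen RQ2-R7 and RQ2-R10. This fits the picture of replacements rather than simple one-directional deletions.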
Finally, the third type of gene transfer event is represented by RQ2-R12. On the array, this region in strain RQ2 appears to be divergent from MSB8. However, sequencing and annotation showed that most of the genes in this region of strain MSB8 (TM1165 to TM1172) are conserved in RQ2. These genes have nonetheless diverged enough that they no longer give a positive result by microarray hybridization. For example, TM1166 shares only 85.6% identity with its RQ2 counterpart (RQ2-R12-2). Three of the RQ2-R12 ORFs are not only divergent from their MSB8 homologs but also shorter. RQ2-R12-5, RQ2-R12-6, and RQ2-R12-7 cover only 85.69%, 78.02%, and 90.58% of the length of TM1169, TM1170, and TM1171, respectively (Fig. 3; Table S3 in the supplemental material). This truncation does not seem to affect the predicted functions of RQ2-R12-5 and RQ2-R12-7 compared with TM1169 and TM1171. However, RQ2-R12-6 is predicted to be a putative response regulator/HD domain protein, whereas its homolog TM1170 in strain MSB8 was annotated as an ABC transporter. In addition, RQ2 lacks a 1.92-kb region that is present in MSB8, along with four genes (TM1173 to TM1176).
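The length coverage values above are simple ratios of ORF lengths. The minimal sketch below assumes coverage is the RQ2 ORF length divided by the MSB8 ORF length. Whether lengths were measured in nucleotides or amino acids is not stated, and the lengths in the test are hypothetical, not data from Table S3.

```python
def length_coverage(rq2_length: int, msb8_length: int) -> float:
    """Percentage of the MSB8 ORF length covered by its shorter RQ2 homolog."""
    if msb8_length <= 0:
        raise ValueError("MSB8 ORF length must be positive")
    return 100.0 * rq2_length / msb8_length
```

An RQ2 ORF of full length gives 100%. Values such as the 78.02% reported for RQ2-R12-6 indicate a substantially truncated homolog.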
CRISPR sequences across the Thermotoga strains. Region R1 was selected for analysis across all of the strains because the CGH comparisons suggested that it was divergent in all of them. In MSB8, region R1 comprises two long DNA repeats, LR1 and RPT5A (Fig. S1 in the supplemental material), separated by a reiterated 30-bp repeat and unique intervening sequences. These repeat features are hallmarks of CRISPR (10, 18), and region R1 is one of the eight CRISPR loci found in the genome of MSB8 (Table 2). The role of CRISPR in microbial genomes is still not known (5, 20). We hypothesize that the variable presence and absence of CRISPR elements in microbial lineages results from gene transfer events as well as intrachromosomal recombination.
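Counting shared spacers between CRISPR loci, as summarized in Tables 2 and 3, can be sketched in Python. The sketch makes three assumptions:

- The repeat is known and occurs as an exact string.
- A spacer is the sequence between two consecutive repeat copies.
- Spacers are compared by exact identity.

Real CRISPR arrays contain degenerate repeats that this exact-match scan would miss. The 8-bp repeat in the test is a hypothetical stand-in, not the 30-bp Thermotoga repeat.

```python
from itertools import combinations


def extract_spacers(array_seq: str, repeat: str) -> list:
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    if not repeat:
        raise ValueError("repeat must be non-empty")
    positions = []
    start = array_seq.find(repeat)
    while start != -1:
        positions.append(start)
        # Continue the search after this copy, so copies never overlap.
        start = array_seq.find(repeat, start + len(repeat))
    return [array_seq[a + len(repeat):b] for a, b in zip(positions, positions[1:])]


def shared_spacer_counts(loci: dict, repeat: str) -> dict:
    """Number of identical spacers for every pair of loci (names sorted)."""
    spacer_sets = {name: set(extract_spacers(seq, repeat)) for name, seq in loci.items()}
    return {(a, b): len(spacer_sets[a] & spacer_sets[b])
            for a, b in combinations(sorted(spacer_sets), 2)}
```

In practice, the locus sequences would be the CRISPR arrays of the different loci or strains. Each pairwise count corresponds to one cell of a shared-spacer table.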
TABLE 2. Number of shared spacer sequences between the eight CRISPR loci within the MSB8 genome
TABLE 3. Number of shared spacer sequences in the CRISPR locus I of various Thermotoga strains
The whole-genome CGH study presented here reveals patterns in the genes that vary relative to the reference strain MSB8. For example, among the more closely related strains, strain PB1platt appears similar to strain MSB8 on the array. It nonetheless appears to be the most divergent with respect to carbohydrate metabolism. In contrast to T. maritima strain MSB8, strain PB1platt has one of two possible explanations for its hybridization pattern. Either it does not use the plant polymers pectin and xylan, or glycerol, maltose, tagatose, and cellobiose, for energy. Or it uses systems that are divergent from those employed by strain MSB8 and therefore could not be detected with the microarrays. Unlike the other Thermotoga strains used in this study, strain PB1platt was isolated from a unique environment: the produced fluids (oil-water-gas mixtures) rising from the Prudhoe Bay oil fields. These geothermally heated reservoirs may harbor isolated pockets of microbial communities situated deep below permafrost soil that is hostile to hyperthermophilic life. Microorganisms such as PB1platt could therefore be survivors from the time when this crude oil formed. Alternatively, they may have colonized this hot biotope only recently, during secondary oil recovery. In that procedure, seawater (which may harbor dormant hyperthermophiles originating from submarine vents) is pumped down into the oil reservoirs. Under either hypothesis, strain PB1platt had to adapt to an environment in which sugars and plant polymers are not (or are no longer) available. Gene transfer and genome plasticity may therefore be important features of the evolution of the Thermotogales, ensuring adaptability to changing environments.
Also significant are the changes within the gene sequences of the isolates, which have most likely conferred some selective advantage in terms of the efficiency of the resulting proteins. This was seen in at least two cases in the comparison between strains MSB8 and RQ2 (the galactosyltransferase/glycosyltransferase and the flagellins). It has most likely occurred in many other genes as well, which a complete genome sequence of strain RQ2, for example, would reveal. Studies have shown that the nature of the carboxyl-terminal domain of bacterial topoisomerases strongly determines their DNA binding and cleavage efficiency (21). In another study, the 3'-to-5' exonuclease domain of Thermus aquaticus DNA polymerase was exchanged with the homologous domains of the mesophile Escherichia coli DNA polymerase I and of T. neapolitana DNA polymerase. The resulting chimeras combined characteristics of the parental polymerases in varying ways, including temperature characteristics, high polymerase activity, processivity, 3'-to-5' exonuclease activity, and proofreading function (22). A study by Tsoka and Ouzounis also revealed that metabolic enzymes seem to exhibit a much higher tendency to participate in multiple gene fusion events than any other proteins (19). This suggests that the changes within genes in the order Thermotogales have conferred a selective advantage on the species that acquired them, allowing them to increase their metabolic fitness.
Lateral gene transfer is clearly a powerful evolutionary force that has played a significant role in the evolution of microbial species. Codon analysis and conservation of gene order based on the complete genome sequence gave the initial clues to the promiscuity of the Thermotogales. It has been argued that Thermotoga is a deep-branching bacterium and that some of its archaeal-like genes are ancestral genes since lost in other bacterial lineages (15). However, many of the genes inferred in this study to have been lost or gained appear to be associated with particular biological processes. They may therefore reflect the environmental niche in which each strain resides. The Thermotoga analysis suggests that substrate availability may be one of the main drivers of the loss and gain of genetic material. The most striking example is strain PB1platt, which was isolated from the most distant location, an oil field at Prudhoe Bay, Alaska. In the CGH analysis, it shares a high level of genome similarity with MSB8. Based on 16S rDNA phylogeny, however, it does not group with the other T. maritima or T. neapolitana strains included in this study. It instead appears to be more closely related to strain RKU, which was isolated from the same type of environment, i.e., a deep oil field.
It is also possible that regulatory elements/promoters shared among hyperthermophilic species have enabled the efficient expression of acquired genes. Alternatively, regulatory elements from other locations in the chromosome could be recruited to regulate these acquired genes and/or pathways, allowing these transfer events to succeed. The regions that have been subject to gene transfer events are scattered over the chromosome. There does not appear to be any bias toward gene loss or gain in particular regions of the chromosome.
This study highlights the value of CGH in analyzing potential gene transfer events. Our results suggest that the SSH study of Nesbo and colleagues (15) may have overestimated the number of genes unique to strain RQ2, because the CGH analysis does not indicate such a high level of diversity between the two strains. Ultimately, however, whole-genome comparisons remain the most reliable way to detect the subtle genetic differences between closely related species.
We thank Bruce Weaver and Joanne Emerson for support with various aspects of this work.
Supplemental material for this article may be found at http://jb.asm.org/.