Journal of Bacteriology, April 2006, p. 2364-2374, Vol. 188, No. 7
0021-9193/06/$08.00+0 doi:10.1128/JB.188.7.2364-2374.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850
Received 18 October 2005/ Accepted 16 January 2006
One of the main advantages of sequencing the genomes of multiple strains from the same species, as well as genomes of closely related species, is that DNA shuffling within the genome, and lateral gene transfer (LGT) events, can be readily identified at the level of a chromosomal replicon. A comparison of closely related genomes therefore enables the identification of chromosomal segments that have undergone DNA rearrangements sometime after the lineages diverged from a common ancestor (6, 7, 17, 19, 32, 39). A computational technique that is commonly used to reveal similarities and differences in the gene order of two microbial genomes is to calculate the pair-wise alignment of their DNA sequences or their translated peptide sequences. The genome alignment can then be visualized as a dot plot in which each point (x, y) marks a sequence at position x in one chromosome that matches a sequence at position y in the other, so that a perfect alignment between two chromosomes would appear as a diagonal line in which f(x) = x.
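The dot plot idea can be illustrated with a minimal Python sketch. It is not the alignment method used in this study: it simply reports exact k-mers that occur once in each sequence, on either strand of the second sequence, and the function names and the choice of k are assumptions made for illustration. Forward matches fall on a line of positive slope, and matches to an inverted segment fall on a line of negative slope.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_positions(seq, k):
    """Map each k-mer to the list of its start positions in seq."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def dot_plot_points(genome_a, genome_b, k):
    """Return sorted (x, y, strand) points for k-mers unique in both genomes.

    Simplifying assumptions: only the forward strand of genome_a is indexed,
    matches must be exact, and y is the leftmost forward-strand coordinate
    of the match in genome_b, even for reverse-strand matches.
    """
    pos_a = kmer_positions(genome_a, k)
    pos_b_fwd = kmer_positions(genome_b, k)
    pos_b_rev = kmer_positions(reverse_complement(genome_b), k)
    points = []
    for kmer, hits_a in pos_a.items():
        if len(hits_a) != 1:
            continue  # not unique in genome_a
        fwd = pos_b_fwd.get(kmer, [])
        rev = pos_b_rev.get(kmer, [])
        if len(fwd) + len(rev) != 1:
            continue  # absent from, or repeated in, genome_b
        if fwd:
            points.append((hits_a[0], fwd[0], "+"))
        else:
            # Convert a position on the reverse strand to a forward coordinate.
            y = len(genome_b) - rev[0] - k
            points.append((hits_a[0], y, "-"))
    return sorted(points)
```

Comparing a sequence with itself yields only points with x = y, whereas comparing it with its own reverse complement yields points for which x + y is constant, which is the signature of an inversion in the plot.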
Comparative genomics of closely related microbial species has revealed an abundance of large-scale genomic changes in the evolution of some species. For example, whole-genome alignments of some closely related species display an "X-shaped" alignment that likely results from numerous chromosomal inversions that pivot around the origin and terminus (9). Whole-genome alignment has also revealed shuffling of chromosomal segments within the same replichore (the half chromosome divided by the replication axis) (39). In the present study, the features examined for two members of the Thermotogales are associated with numerous rearrangements within the same replichore.
In the initial analysis of the genome of the hyperthermophilic bacterium Thermotoga maritima MSB8, there was evidence that members of this lineage undergo extensive gene transfer, particularly with members of the archaeal domain (24, 27-29). In a more recent study (23), we validated this hypothesis using a comparative genome hybridization (CGH) approach to investigate genome plasticity and LGT in the Thermotogales. In that study, numerous gene loss and gain events that have contributed to the metabolic diversity in the members of this species were seen, but neither mobile elements nor remarkable genomic features such as repeated sequences that could be associated with these genomic rearrangements could be identified. However, our analysis, along with studies of the whole genome, has demonstrated the presence on the chromosome of eight distinct CRISPRs (clustered regularly interspaced short palindromic repeats) (16) that consist of a 30-bp repeat element interspersed with a unique sequence of approximately the same length. These CRISPR elements and their associated group of putative protein-encoding genes (CRISPR-associated sequences [cas genes]) have been identified in the genomes of a broad range of microbial species and have been theorized to be involved in the mobilization of DNA (3, 13, 16). More recently, the intervening spacer sequences in CRISPRs have been shown to have a possible origin from preexisting chromosomal sequences and sometimes from transmissible elements such as bacteriophage and conjugative plasmids. It was also found that these transmissible elements do not reside in cells that carry virus-specific CRISPR spacer sequences but could be found within closely related strains that did not carry these sequences. Thus, a role for CRISPRs in immunity to foreign DNA was proposed (3, 22, 31).
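The repeat-spacer layout of a CRISPR locus described above can be sketched in a few lines of Python. This is an illustration only: it assumes exact copies of a known repeat, whereas real repeats can vary slightly, and the function name and the short repeat used in the usage note are hypothetical rather than taken from the Thermotoga genomes.

```python
def extract_spacers(sequence, repeat):
    """Return the sequences lying between consecutive exact copies of repeat.

    Assumption for illustration: repeats are exact and non-overlapping.
    """
    starts = []
    pos = sequence.find(repeat)
    while pos != -1:
        starts.append(pos)
        # Resume the search just past the repeat that was found.
        pos = sequence.find(repeat, pos + len(repeat))
    spacers = []
    for left, right in zip(starts, starts[1:]):
        spacers.append(sequence[left + len(repeat):right])
    return spacers
```

For a toy locus built as repeat + "AAAC" + repeat + "CCGT" + repeat with the made-up repeat "GTTTCAAT", the function returns the two spacers "AAAC" and "CCGT".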
Thermotoga neapolitana strain NS-E, isolated from the Bay of Naples, in Italy (15), has recently been the subject of whole-genome sequencing (Nelson et al., unpublished data). The availability of this additional genome from this lineage of hyperthermophiles has enabled a comprehensive comparative analysis of the chromosomal architecture of the Thermotogales. In this report we present a detailed analysis of chromosomal variation and address the features that have contributed to these differences during the evolution of this lineage.
DNA joint assignments. The approximate ends of the rearranged chromosomal segments are evident in the scatterplot of the whole-genome alignment between the two Thermotoga species (Fig. 1). Homologous open reading frames (ORFs) at the ends of each rearranged chromosomal segment were identified in the two species by BLASTP analysis (2). The sequence between the putative protein-encoding regions of homologous ORFs at the ends of two adjoining DNA segments is referred to as a "DNA joint." Each of the 15 observed DNA joints was assigned a roman numeral between I and XV (Fig. 1; Table 1) that is based on the order of its appearance in the T. neapolitana genome. Also, as depicted in Fig. 1, the DNA joint nomenclature uses a roman numeral (such as X) at one end of a particular DNA segment and a primed roman numeral (such as X') at the corresponding end of its adjoining DNA segment. The exact position of each rearrangement within these DNA joints was not discerned, because of the relatively low DNA sequence similarity within the intergenic regions of the two species. The DNA joint sequences vary in size from 41 bp to 5,792 bp. Larger DNA joints correspond to regions that encode ORFs that are absent from one of the two species.
FIG. 1. Whole-genome amino acid alignments between T. maritima strain MSB8 and T. neapolitana strain NS-E. The Promer algorithm was used to calculate and plot the amino acid percentage identity of maximally unique matching subsequences of at least 5 amino acids between the two genomes. A point (x,y) indicates a sequence that occurs once within each genome, at location x in one genome and at location y in the other genome. The matching sequences may occur on either the forward or the reverse strand; in either case, the locations indicate the 5' end of the sequences. The point 0,0 corresponds to the putative origin of replication for each genome. Panel A shows the alignment at the whole-genome level. Two regions of interest, boxes B and C, are shown in more detail in panels B and C, respectively.
TABLE 1. ORF pairs, which are found at the DNA joints that connect shuffled chromosomal segments in Thermotoga spp., and their associated DNA features
Sequence composition analysis. For the cumulative GC skew analysis, the GC skew using a 1-kb window over the entire length of the chromosomes (Fig. 2A and B), or a window of 100 bp for particular T. neapolitana and T. maritima subsequences (Fig. 2E and F), was quantified with the formula (G − C)/(G + C), where G and C are the numbers of guanine and cytosine residues in the window. A cumulative GC skew was calculated by summing the skew over successive windows from the beginning to the end of each sequence, and the value of the cumulative GC skew (y axis) was then plotted at its corresponding position on each DNA molecule (x axis).
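The calculation can be sketched as follows. This is a minimal illustration, not the software used in the study; the function names are hypothetical, and the use of non-overlapping windows with any trailing partial window dropped is an assumption.

```python
def gc_skew_windows(sequence, window):
    """Return (G - C)/(G + C) for each successive non-overlapping window.

    Assumptions: windows do not overlap, a trailing partial window is
    dropped, and a window with no G or C is given a skew of 0.
    """
    skews = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_gc_skew(sequence, window):
    """Return (window start, running sum of GC skew) pairs for plotting."""
    total = 0.0
    points = []
    for index, skew in enumerate(gc_skew_windows(sequence, window)):
        total += skew
        points.append((index * window, total))
    return points
```

With a window of 1000 this corresponds to the whole-chromosome plots, and with a window of 100 to the expanded plots. A change in the sign of the slope of the cumulative curve marks a switch in strand composition, as expected at an origin, a terminus, or the boundary of an inverted segment.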
FIG. 2. Cumulative GC skew and ORF orientation in the T. maritima strain MSB8 and T. neapolitana strain NS-E genomes. (A and B) Plots of cumulative GC skew calculated with 1-kb windows. (C and D) Plots of the running sum of ORF orientation. (E and F) Expanded plots of GC skew for the pink regions displayed in panels A and B and calculated with a 100-bp window. The four regions in T. maritima (E) and T. neapolitana (F) are putative inverted segments revealed by the whole-genome alignment (Fig. 1). The asterisks in panels A and B correspond to the putative origin of DNA replication, as described by Lopez and coworkers (21). The roman numerals correspond to the nomenclature of the different chromosomal regions displayed in Fig. 1.
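A running sum of ORF orientation such as the one plotted in panels C and D can be sketched as below. The scoring is an assumption, since the text does not state it: each ORF on the forward strand adds 1 and each ORF on the reverse strand subtracts 1, with ORFs taken in chromosomal order. The function name is hypothetical.

```python
from itertools import accumulate


def orf_orientation_running_sum(strands):
    """Return the running sum of ORF orientation.

    strands: iterable of "+" or "-" in chromosomal order. Forward ORFs
    score +1 and reverse ORFs score -1 (an assumed convention).
    """
    steps = []
    for strand in strands:
        if strand == "+":
            steps.append(1)
        elif strand == "-":
            steps.append(-1)
        else:
            raise ValueError(f"unexpected strand symbol: {strand!r}")
    return list(accumulate(steps))
```

A run of ORFs on one strand produces a steady rise or fall, so an inverted block of genes appears as a local reversal in the slope of the curve.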
PCR assays. PCR assays were used to compare the sizes and structures of the 15 DNA joints (described above) for five strains of T. neapolitana that were isolated from different geographical locations (23). The strains used in the present study included the following: T. neapolitana strain NS-E, isolated from a shallow submarine hot spring in Naples, Italy; T. neapolitana strains LA4 and LA10, isolated from the shore of Lac Abbe, Djibouti; Thermotoga sp. strain RQ7, isolated from a geothermal heated seafloor, Ribeira Quente, the Azores; and Thermotoga sp. strain VMA1/L2B, isolated from Vulcano Island, Italy. Although several of these were not previously designated T. neapolitana strains, their patterns of hybridization in the CGH study of Mongodin and colleagues (23), as well as phylogenetic analyses of their 16S rRNA sequences (23), suggest that they are in fact closely related to T. neapolitana. Genomic DNA for these strains was provided by Karl Stetter and Robert Huber from the University of Regensburg, Germany.
PCR primer pairs were designed from the sequences of ORFs flanking each DNA joint (Fig. 1; Table 1) and are as follows for the 15 respective regions: I, ACATGCCCTGTTATCAACTTCAGG; I', ATCTGCGATTTCCTTTCTTCTTGC; II, CTGCCTGTGAGTTTCAGAAAAACG; II', GTTCGTCTTGACCAGTTCGTATCC; III, CTTTTCTGTGATCATCGCTTTTGG; III', TTTCATTCCTTTCAGTGGTTCAGC; IV, GGTACAACGGTTTGATGAACTTGC; IV', ACGGCAGAGAGTACACTTTTGTGG; V, AATTTCACTTGAATGGGGAGAAGC; V', GTCCTGTACCTCCCGTTTATTTCC; VI, CCGGAAAAAGAAGCAATTAAGACG; VI', TTTTCCTACGGCATAGAAACATGG; VII, GGCAGAAAGATCTTCAACATCACC; VII', CTGATTTCATGGCAAAAGATCACC; VIII, GAACACGGTTTACAACACGAAACG; VIII', TGCGTACGGATGATATAAGGAAGG; IX, ATGGTGTGCTTCTTCATGATCTCC; IX', ATACGTCCCCTCAAGAACAAGACC; X, GGAACGTTGAACTCCTCAAGAACC; X', CCTTGCTTTTCAGCAATTCTTTCC; XI, GTCCTTTGTGATGAATCCATAGCC; XI', TCTGTGAACATCATTTCCCTACCG; XII, GGTGTTCAAAAAGACGGAAAGAGG; XII', GGAAGTTCTGGTGAATGGAGAACC; XIII, CTTTGTTTTCAGAAACGGGAATGG; XIII', GATCTTTTCGGAATTTGTCGAAGG; XIV, AATTTCACTTGAATGGGGAGAAGC; XIV', GTCCTGTACCTCCCGTTTATTTCC; XV, AATCTCTTTCCGTACCCACTTTCG; XV', GATCTCAGACGACTCAACGTCTCC. In addition, an internal primer pair was designed for walking the relatively large insert of DNA joint XIII: XIIIb, GCACCAGCACACTTTTCTCATAGC; XIIIb', AAACCGCACACTTAGCCTCTAACC. PCR amplification was performed with TaKaRa Taq polymerase, according to the manufacturer's instructions (Chemicon International, Temecula, CA), with the following cycle profile: 98°C for 20 s, 55°C for 20 s, and 68°C for 60 s per kb, for 30 cycles. The resulting PCR products were visually checked by agarose gel electrophoresis and sequenced by walking directly on the PCR product, and the sequences were used for comparative analyses.
Nucleotide sequence accession numbers. The nucleotide sequences of the CRISPR regions that were amplified in the different strains have been deposited in GenBank under accession numbers DQ352545 to DQ352560 and are listed in Table 2. For each group of strains sharing identical CRISPR spacer sequences, a single sequence was deposited in GenBank. One exception is VMA1/L2B region XII, which has its own accession number because it differs in sequence from the other members of the region XII group which also lack a spacer sequence (strains NS-E, LA10, and LA4).
TABLE 2. CRISPR spacer consensus sequences found in the DNA joints of five T. neapolitana strains
A large region, delimited by T. maritima ORFs TM0939 and TM1016 and covering approximately 80 kb (Fig. 1A, yellow line around coordinate 1000000), is highly conserved between MSB8 and NS-E, with an average percentage of identity of 99.6% between the two strains (compared to 83.6% for the entire proteome). Of the 67 ORFs in this region, 23 encode conserved hypothetical proteins, 5 encode proteins of unknown function, 5 encode proteins involved in ribose metabolism and transport, 1 encodes a protein involved in fucose metabolism, 2 encode putative lipoproteins, 1 encodes a putative membrane protein, and 3 encode putative transcription regulators. This may suggest that some environmental pressure exists to retain some of these ORFs and may explain the preference for the utilization of sugars other than glucose in both species (5). An alternative explanation for the high degree of similarity between the two strains has been proposed recently by Nesbo and coworkers (26). These authors suggest that the high similarity in the TM0939-to-TM1016 region is due to a recent transfer or recombination event between the T. maritima and the T. neapolitana lineages.
Two chromosomal regions, of approximately 40 kb and 500 kb (Fig. 1B and C), display relatively large rearrangements, including combinations of inverted DNA segments, which have a negative slope (e.g., the segment III-II' in Fig. 1B), and translocated DNA segments, which have a positive slope and are offset from the otherwise diagonal line (e.g., segment I'-II in Fig. 1B). In total, 15 distinct DNA segments that are rearranged in the chromosome of T. neapolitana relative to that of T. maritima were identified. The DNA joints that connect these rearranged chromosomal segments, numbered from I to XV in Fig. 1B and C, were assigned as described in Materials and Methods.
Features associated with chromosomal rearrangements. Two predominant types of chromosomal features were identified in the DNA joints between the shuffled DNA segments (Fig. 1B and C; Table 1). In the chromosomes of both species, four DNA joints (VI-VI', VIII-VIII', X-X', and XI-XI' for T. neapolitana and VI'-XI', XIV-VIII', X'-V', and VI-XIV' for T. maritima; Table 1 and Fig. 1B and C) contain one or more tRNA genes. Six DNA joints (I-I', III-III', V-V', X-X', XII-XII', and XIII-XIII' for T. neapolitana and I-III, II'-I', II-III', X-VII, XII-XIII', and XII'-XV' for T. maritima) contain one or more copies of a 30-bp DNA repeat (Fig. 3) belonging to a CRISPR element (16). The remaining five DNA joints do not display obvious chromosomal features, with the exception of T. maritima TM1339, a conserved hypothetical protein that is duplicated (100% similarity) in T. neapolitana (TM1366 and TM1516) where the ORFs are associated with independent, rearranged DNA segments (Table 1). The obvious presence of tRNA genes and CRISPR elements in the DNA joints between shuffled chromosomal segments is unlikely to happen by chance. Statistical analysis, as described in Materials and Methods, shows it to be highly likely that there is an association between these chromosomal features and rearrangements. The Fisher exact test produced a P value of <0.001, given the null hypothesis that the 15 observed rearrangements are not associated with intergenic spaces containing tRNA genes and/or CRISPRs.
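The logic of such a test can be sketched with a one-sided Fisher exact test written with the Python standard library. This is an illustration, not the analysis performed in this study: the 2x2 table layout is an assumption, and the background counts in the usage note are placeholders, not values from the paper.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for enrichment in a 2x2 table.

    Assumed layout:
        a = rearranged joints with a tRNA gene or CRISPR
        b = rearranged joints without such a feature
        c = other intergenic regions with such a feature
        d = other intergenic regions without such a feature
    Returns P(X >= a) under the hypergeometric null hypothesis of no
    association, with all row and column totals held fixed.
    """
    n_total = a + b + c + d
    n_joints = a + b        # size of the rearranged-joint row
    n_feature = a + c       # size of the feature-containing column
    denominator = comb(n_total, n_joints)
    p_value = 0.0
    for k in range(a, min(n_joints, n_feature) + 1):
        p_value += comb(n_feature, k) * comb(n_total - n_feature, n_joints - k) / denominator
    return p_value
```

As a usage example, the text reports features in 10 of the 15 DNA joints, so a call might look like `fisher_exact_greater(10, 5, c, d)`, where `c` and `d` would have to be counted from the genome annotation and are not given here.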
FIG. 3. CRISPR motifs in the genomes of T. maritima strain MSB8 and T. neapolitana strain NS-E. The CRISPR sequences were generated from three different multiple sequence alignments that were compiled by DNA HMM searches. Note the overlapping identity between the two 30-mers in the different Thermotoga species. Also, all occurrences of the 29-mer are located in a single region in T. neapolitana which is absent from T. maritima.
One obvious exception to the expected trends in the contours of GC skew is the region between 1.3 Mb and 1.42 Mb in T. maritima and T. neapolitana, which was not identified as an inversion in the whole-genome alignment but does display an inverted GC skew. This observation can be explained if these lineages share chromosomal inversions that occurred before the split between T. maritima and T. neapolitana. The locations of these potential rearrangements appear as common "peaks and valleys" in the contours of the plots in Fig. 2A and B. These two genomes also display a strong correlation between the contours of cumulative GC skew and ORF orientation (Fig. 2C and D). That is, GC content closely reflects ORF orientation, sometimes more so than the position of the origin. One possibility is that many large-scale DNA inversions have shuffled the ORFs with respect to the origin of DNA replication and that the GC content of these displaced ORFs has not had time to ameliorate in their new locations in the chromosome.
Characterization of DNA joints among five strains of T. neapolitana. To investigate the prevalence of these chromosomal shuffling events beyond the two completely sequenced strains of T. maritima and T. neapolitana, and to assess whether or not related but different isolates share these particular DNA joints, PCR assays of four T. neapolitana strains from different geographic locales (LA10, LA4, RQ7, and VMA1/L2B, which are described in Materials and Methods) were performed using primer pairs designed to bridge the 15 DNA joints that connect the shuffled chromosomal segments in T. neapolitana strain NS-E. In the majority of the strains, PCR products were obtained for all 15 of the DNA joints (Fig. 4). Also, for at least eight DNA joints (regions I, III, VI, VII, X, XII, XIII, and XV), the sizes of the PCR products appear to vary between strains, with strains RQ7 and VMA1/L2B most often associated with size changes or the absence of a PCR product (see below). Despite these differences in size, perhaps the most significant result is that 15 PCR products were obtained for all of the T. neapolitana test strains, with the exception of three DNA joints (regions V, IX, and XIV) in strain VMA1/L2B. At region XIV, for example, PCR products of varying abundance are present for strains NS-E, LA10, LA4, and RQ7 but not for VMA1/L2B, which might have failed to produce PCR products because of sequence divergence at the corresponding primer binding sites. Taken together, these results suggest that the architecture of the shuffled chromosomal segments was established in a common ancestor of the thermophilic T. neapolitana lineages that were isolated from disparate geographical locations.
FIG. 4. An ethidium bromide-stained agarose gel of PCR products for each of the T. neapolitana strains at the 15 DNA joints described in Materials and Methods.
Global genomic rearrangements, such as duplications, inversions, and translocations, contribute significantly to the evolution of species. Pair-wise comparisons of organisms such as Helicobacter, Chlamydia, Mycobacterium, Vibrio cholerae, Escherichia coli, and Pyrococcus (9, 39) have shown that genome rearrangements occurred mainly via replication-directed translocation across an axis defined by the origin and the terminus of replication. Since matching sequences tend to occur at the same distance from the origin (but not necessarily on the same side of the origin), whole-genome alignments display "X-shaped" patterns that are symmetric about the origin of replication of the two genomes being compared (9, 39). The origin of replication of T. maritima still remains unknown, mostly because the classical approaches, such as GC ratio, GC skew (20), and asymmetric distribution of oligomers along the genome (35), have failed to unambiguously detect it. In the 1999 publication of the T. maritima genome, Nelson and coworkers (24) assigned bp 1 of the genome to the beginning of the longest stretch (2.6 kb) of 30-bp repeats, which was characterized later as one of the eight CRISPR loci present in the chromosome. In a 2000 publication, Lopez and colleagues (21) used tetramer skews and subsequent identification of DNA repeats having similarity to DnaA boxes to predict that the origin of DNA replication is located between coordinates 156960 and 157518. Although the typical features of bacterial origins of replication, such as a local minimum in a plot of cumulative GC skew, seem to be in agreement with this prediction, it has not been experimentally confirmed.
Early genetic studies of chromosomal inversions in Salmonella enterica (36) and E. coli (33) found that these rearrangements can use endpoints encompassing the origin of DNA replication or endpoints contained within a replichore. Different explanations are proposed for the constraints observed at some chromosomal segments (reviewed in reference 34). The results of the present study suggest that Thermotogales species, including T. maritima and T. neapolitana, favor inversion/translocation events within a replichore. In the example below, we propose a model in which a succession of simple inversion events produces the mosaic of chromosomal rearrangements displayed in the whole-genome alignment of the two Thermotoga species. From the cumulative GC skew analysis presented in Fig. 2, DNA segment II'-III of T. neapolitana appears to be inverted, and from the whole-genome alignment, the adjoining DNA segment, I'-II, appears to be translocated. In this simple scenario, the T. maritima sequence can be represented as the DNA string ABCD and the T. neapolitana sequence can be represented as the DNA string ACB'D, where C is the translocated segment I'-II and B' is the inverted segment II'-III (Fig. 5). In a series of two inversion events, the T. maritima sequence can be rearranged into the T. neapolitana sequence. In one possible path (Fig. 5, green arrows), the segment BC flips once, producing the sequence AC'B'D. A second inversion occurs, flipping the segment C' to produce the final T. neapolitana sequence ACB'D. An alternative way (Fig. 5, red arrows) of producing the same result would be to flip segment C, producing the sequence ABC'D, and then flip the segment BC' to produce the final T. neapolitana sequence ACB'D. Although less straightforward, a similar transformation via a series of inversion events might be responsible for the more complex pattern of rearrangements observed in the 500-kb region between 1.3 Mb and the end of the chromosome for these two strains.
FIG. 5. Two-step model of successive inversions to produce an inversion of one DNA segment (B to B') and a concomitant translocation of an adjoining DNA segment (C). Two alternative pathways, which differ in the order of the larger and smaller inversions, are proposed: the green path begins with a large inversion, and the red path ends with a large inversion. In both pathways, segment C is inverted in the first step, and it is proposed that there was positive selective pressure favoring a subsequent cell population in which segment C is restored to its original orientation after the second step.
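The two pathways of the model can be checked with a short Python sketch in which a chromosome is a list of oriented segments. The representation and the function names are assumptions made for illustration; an inversion reverses the order of the affected segments and flips the orientation of each one.

```python
def invert(genome, start, end):
    """Invert genome[start:end]: reverse segment order and flip orientations."""
    flipped = [(name, -sign) for name, sign in reversed(genome[start:end])]
    return genome[:start] + flipped + genome[end:]


def show(genome):
    """Render a genome as a string, marking inverted segments with a prime."""
    return "".join(name + ("'" if sign < 0 else "") for name, sign in genome)


# T. maritima arrangement ABCD, with every segment in its original orientation.
maritima = [("A", 1), ("B", 1), ("C", 1), ("D", 1)]

# Green path: the large inversion of BC first, then the small inversion of C'.
green_step1 = invert(maritima, 1, 3)
green_final = invert(green_step1, 1, 2)

# Red path: the small inversion of C first, then the large inversion of BC'.
red_step1 = invert(maritima, 2, 3)
red_final = invert(red_step1, 1, 3)

print(show(green_step1), "->", show(green_final))
print(show(red_step1), "->", show(red_final))
```

Both paths end at the same arrangement, ACB'D, which is why the order of the two inversions cannot be inferred from the final arrangement alone.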
The importance of the involvement of CRISPR elements and tRNA genes in these chromosomal rearrangements is currently unknown. Our study shows an association of these sequence features with the location of DNA joints for the potential DNA inversions. Unfortunately, as modeled by the alternative pathways described above, the precise sequence of DNA inversions is also unknown. It is therefore difficult to discern which particular CRISPR and tRNA genes might be involved. In the above two-step inversion example that represents the transformation of ABCD to ACB'D (Fig. 5), the DNA joint adjoining either segment A or segment D might be involved in both steps, depending on which path (green or red arrows) is used. Thus, it is possible that the biologically important sequence feature resides at one or the other location.
The targeted PCR and sequence analysis of additional strains provided information that validates a previous relatedness study of these two members of the Thermotogales (23). First and foremost, the five strains of T. neapolitana examined here have remarkably similar gross chromosome architectures; thus, their common rearrangements appear to have occurred before strain diversification. Even the apparent lack of 3 of 15 PCR products for strain VMA1/L2B might be explained by sequence divergence of the PCR primer sites. That is, VMA1/L2B might have the same chromosome architecture as the other four strains of T. neapolitana at all 15 DNA joints. Alternatively, the chromosome architecture of strain VMA1/L2B might differ at three DNA joints that did not produce PCR products, which might suggest that a VMA1/L2B-like ancestor diverged relatively early, before subsequent chromosomal rearrangements gave rise to an ancestor of the other T. neapolitana strains having all 15 DNA joints.
Based on a more detailed sequence analysis of the short CRISPR spacer sequences at five loci, a prediction of this study is that the five strains of T. neapolitana can be clustered into three different groups: strain RQ7, strain VMA1/L2B, and the group composed of strains NS-E, LA10, and LA4. This grouping agrees with the previous hierarchical clustering of CGH data for T. neapolitana strains compared to T. maritima strain MSB8 and differs somewhat from a phylogenetic comparison of 16S rRNA genes for the same strains (23). Thus, CRISPR spacer sequence analysis appears to add information to 16S rRNA analysis for reconstructing the relatedness of strains. From a comparison of their spacer sequences, strain RQ7 appears to share components of two DNA joints with strain VMA1/L2B and two DNA joints with the group of NS-E, LA10, and LA4 strains. Thus, an RQ7-like ancestor appears to be the common link between these three T. neapolitana strain groups.
The rich diversity of CRISPR spacer sequences in the Thermotogales examined so far hints that a treasure trove of horizontally transferred genetic elements exists in the extreme environment these organisms live in. Previous tallies of CRISPR sequences in T. maritima identified 105 unique spacer sequences in strain MSB8 and 39 more sequences in locus I of additional strains (23). In this study, 49 unique spacer sequences were identified in the five CRISPR regions examined in the five strains of T. neapolitana. Mojica and colleagues (22) suggested that CRISPR spacer sequences could be involved in conferring specific immunity against foreign DNA, such as plasmids and phages; e.g., CRISPR spacers get added in response to foreign DNA. However, with the exception of the relatively small pRQ7-like plasmids (1, 14), these types of horizontally transferred genetic elements have not been identified in the Thermotogales. Thus, there is potentially a significant cache of genetic elements awaiting isolation and study. Some of the CRISPR spacer sequences in the five strains of T. neapolitana were observed to vary by geographic locale. However, other spacer sequences were found to be shared at four loci by members of the NS-E, LA10, and LA4 strain group, suggesting that these particular CRISPR spacer sequences might have originated in a common ancestor and the three lineages subsequently migrated to their current locales. Future studies examining strains from more diverse locations, and more strains at the same locations, might answer the question of whether or not there is a correlation between CRISPR spacer sequences and geographic location.
This project was supported by U.S. Department of Energy Office of Biological Energy Research Co-Operative Agreements DE-FC02-95ER61962 and DE-FG02-01ER63133.