Journal of Bacteriology, March 2007, p. 1914-1921, Vol. 189, No. 5
0021-9193/07/$08.00+0 doi:10.1128/JB.01498-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

Department of Microbiology and Molecular Genetics and Computational Genomic Section, Human Genetics Center, The University of Texas Health Science Center, Houston, Texas 77030
Received 22 September 2006/Accepted 11 December 2006
Rhodobacter sphaeroides belongs to the α-3 subgroup of the Proteobacteria (38). A number of strains with common morphological, physiological, and biochemical characteristics have been identified as representatives of this species (36); these strains were originally collected from Delft, Holland, and California from a variety of enrichment cultures. Together with other members of the α subgroup of the Proteobacteria (12, 18), members of R. sphaeroides exhibit substantial metabolic versatility (6) and genomic complexity, including the existence of two chromosomes (23, 31, 32). Twenty-five strains of R. sphaeroides, including the three strains examined in the present study, have been previously investigated for their macro-restriction length polymorphisms by using pulsed-field gel electrophoresis. Multiple genetic markers derived from R. sphaeroides 2.4.1 were used to examine the genome complexity of the different strains of R. sphaeroides. Identification of diagnostic gene loci located on specific macro-restriction fragments belonging to either chromosome I (CI) or CII demonstrated the wide divergence between the two chromosomes among the different strains of R. sphaeroides. For example, the number of rrn operons varied from two to five among the different strains (23). The genome of R. sphaeroides 2.4.1 has been completely sequenced and "fully" annotated. Preliminary analysis of the genome of R. sphaeroides 2.4.1 facilitated our understanding of its genome organization (19). DNA duplication analysis revealed an abundance of exact DNA sequence duplications in the genome of R. sphaeroides 2.4.1, which further demonstrated that both CI and CII have coexisted in the R. sphaeroides genome (3), and this long association may have been established prior to the derivation of R. sphaeroides as a species. Optical mapping data (reference 41 and unpublished data) obtained from three strains, 2.4.1, ATCC 17029, and ATCC 17025, indicated that the sizes of CI of these strains were similar while the sizes of CII varied. In addition to that of R. sphaeroides 2.4.1, genomic sequences of strains ATCC 17029 and ATCC 17025 are now available, and therefore genome analyses of these strains provide a powerful approach to the examination of the differential evolution of CI and CII.
In this study, two independent approaches, global DNA sequence alignment and exact DNA duplication analysis, were used to identify the extent of DNA sequence conservation among the three R. sphaeroides genomes. Chromosomal sequences of the three strains were aligned in order to identify the common backbone regions and the degree of their DNA sequence similarities. Also, we examined the distribution of exact DNA sequence duplications within and between the genome(s) of these three strains. Results of both the global sequence alignment and the exact DNA duplication analysis revealed that the genome of ATCC 17025 is more highly diverged from those of the other two strains (2.4.1 and ATCC 17029), whose genomes are more closely related. Comparison of CI- and CII-specific DNA sequences from these three strains provided evidence that CII has evolved at a higher rate than CI and suggested that the rapid evolution of CII of R. sphaeroides is mediated by either acquiring new genetic material through horizontal DNA transfer or rapidly generating genetic rearrangements. Thus, the resulting new genetic variants may play differential metabolic roles in allowing the different strains to utilize different niches of diverse nutritional resources.
Global DNA sequence alignment. Concatenated chromosomal DNA sequences of the three strains of R. sphaeroides were aligned using Mauve 1.0 (4). This method utilizes pairwise or multiple alignments of conserved genomic sequences of whole genomes, with modest computational requirements without compromising the alignment quality (35). Local alignments were performed in order to identify multiple maximal unique matches (multi-MUMs), which were subsequently used to calculate a guide for phylogenetic tree constructions. A subset of multi-MUMs were then used as anchors, which were divided into local collinear blocks (LCBs). Using multi-MUMs lowers anchoring sensitivity in conserved repetitive regions, such as rRNA operons and prophages. Each LCB is a homologous DNA region of multi-MUMs, which lacks any sequence rearrangements and which is shared by two or more of the genomes under analysis. The sequence alignment identifies the number of common LCBs by using the length of the total conserved regions and the overall nucleotide identity between chromosomal sequences for each pair of strains. The conserved backbone sequence was extracted from the alignment.
DNA sequences in any given genome are aligned once to each of the other genomes, and therefore Mauve detects only orthologous sequences. The number of matching nucleotides and the number of insertions and deletions within each LCB over the complete aligned regions were extracted from the alignment output. The identity between DNA sequences was calculated as a ratio of the number of matching nucleotides to the number of total nucleotides spanning all aligned regions interrupted with many unaligned DNA strings. For example, a nucleotide identity of 0.95 means that 95 out of the total 100 nucleotides of the matching regions are identical.
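The identity calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' script: the function names and the input format (pairs of equal-length aligned strings with `-` for gaps, one pair per LCB) are assumptions, and the choice to leave gap columns out of the denominator is also an assumption, since the text does not say how gaps were counted.

```python
from typing import Iterable, Tuple


def count_matches(aligned_a: str, aligned_b: str) -> Tuple[int, int]:
    """Return (identical columns, compared columns) for one aligned block.

    Columns with a gap ('-') in either sequence are skipped (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches, compared


def identity_over_blocks(blocks: Iterable[Tuple[str, str]]) -> float:
    """Pool all blocks: total matching nucleotides / total compared nucleotides."""
    total_matches = 0
    total_compared = 0
    for aligned_a, aligned_b in blocks:
        m, c = count_matches(aligned_a, aligned_b)
        total_matches += m
        total_compared += c
    if total_compared == 0:
        raise ValueError("no ungapped aligned columns to compare")
    return total_matches / total_compared
```

Pooling the counts over all blocks before dividing, rather than averaging per-block identities, weights each block by its length, which matches the description of a single ratio over all aligned regions.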
DNA duplication analysis. Since the Mauve global alignment identifies only orthologous DNA sequences, MUMmer 3.0 (5) was used to identify the exact DNA sequence duplications within and between the genomes of the three strains of R. sphaeroides. This method also includes paralogous duplicated regions. We used 20 nucleotides as the minimum cutoff length of the DNA sequence that perfectly matches elsewhere within or between genomes. The MUMmer output results show the coordinates of each sequence pair of identical matches and the length of each exactly duplicated DNA sequence.
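To make the idea of exact duplications with a 20-nucleotide cutoff concrete, the sketch below finds maximal exact repeats within a single sequence. It is not MUMmer and does not use MUMmer's suffix-tree approach; it is a simple k-mer index followed by rightward extension, it scans the forward strand only, and the function name and output tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Tuple


def exact_duplications(seq: str, min_len: int = 20) -> List[Tuple[int, int, int]]:
    """Return (start_1, start_2, length) for maximal exact repeats >= min_len.

    Coordinates are 0-based; only the forward strand is examined (assumption).
    """
    seq = seq.upper()
    # Index every k-mer (k = min_len) by its start positions.
    index = defaultdict(list)
    for i in range(len(seq) - min_len + 1):
        index[seq[i:i + min_len]].append(i)

    hits = []
    for positions in index.values():
        for i, j in combinations(positions, 2):  # i < j by construction
            # If the preceding bases also match, a longer repeat starting
            # further left covers this pair, so skip it here.
            if i > 0 and seq[i - 1] == seq[j - 1]:
                continue
            # Extend the match to the right as far as the bases agree.
            length = min_len
            while j + length < len(seq) and seq[i + length] == seq[j + length]:
                length += 1
            hits.append((i, j, length))
    return sorted(hits)
```

For intergenomic comparisons the same idea applies with one index built from the first genome and queried with k-mers from the second. On genomes of several megabases a dedicated tool is the practical choice; this version is meant only to show what is being counted.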
A global alignment of the concatenated chromosomal DNA sequences of the three strains of Rhodobacter sphaeroides is shown in Fig. 1, and the results of the alignment are described in Table 1. One of the important criteria for the sequence alignment is the minimum weight of LCBs, which measures the confidence in distinguishing between a real genome rearrangement and a false match. For example, a minimum weight of 45 refers to three times the sequence length of 15 nucleotides, which was used during the initial search for the multi-MUMs. Multiple alignments of concatenated chromosomal sequences at a minimum weight of 45 displayed 382 common LCBs comprising ~3.35 Mb of shared DNA sequence with 76.1% overall nucleotide identity. The total numbers of LCBs identified in CI and CII were 263 and 119, respectively. The CIs of the three strains of R. sphaeroides shared ~90% of their DNA sequences, while the CIIs of these strains shared only ~50% of their DNA sequences. In addition, the nucleotide identity between CII-specific sequences was lower (by up to 5%) than the nucleotide identity observed between CI-specific sequences of the three strains of R. sphaeroides, as shown in Table 1. Approximately 10% of CI- and ~50% of CII-specific DNA sequences of the three strains of R. sphaeroides remain unaligned; these are often repetitive, paralogous, or strain-specific genomic regions acquired by horizontal DNA transfer from other organisms. The role of the unaligned DNA sequences with no homology among the three strains will be discussed below.
FIG. 1. Mauve representation of the total of 382 LCBs observed between concatenated chromosomal sequences of the three strains of Rhodobacter sphaeroides, 2.4.1, ATCC 17029, and ATCC 17025, at a minimum weight of 45. Black vertical bars indicate concatenated chromosomal boundaries. The R. sphaeroides 2.4.1 DNA sequence given on the forward strand is the reference against which the sequences of the other two strains were aligned and compared. LCBs placed under the vertical bars represent the reverse complement of the reference DNA sequence. The Mauve display window provides viewers the ability to zoom in on regions of interest and examine the local rearrangements of DNA sequences, where connecting lines between genomes identify the locations of each orthologous LCB in all three genomes. Unmatched regions within an LCB indicate the presence of strain-specific sequence. Each sequential colored block represents homologous backbone DNA sequence without rearrangements. ATCC 17025 has undergone more chromosomal rearrangements than the other two strains.
TABLE 1. Comparison of genomic alignments among three strains of R. sphaeroides
The numbers of insertion sites in strains ATCC 17029 and ATCC 17025 were calculated as the number of locations where the nucleotides were missing from the reference strain 2.4.1 but the insertion of a nucleotide(s) was present in either of the two strains. Conversely, the number of nucleotide deletions in either of the two strains was calculated as the number of locations where the nucleotides were present in the reference strain 2.4.1 and the nucleotides were missing from either of the two strains. The genomes of the three strains of R. sphaeroides, as shown in Table 1, demonstrated a varied number of deletions and insertions. The total amounts of nucleotide deletions and insertions in 2.4.1, ATCC 17029, and ATCC 17025 were 135,867, 145,540, and 248,296 base pairs of DNA, respectively. A similar DNA deletion pattern was observed in CI in all three strains of R. sphaeroides; however, the pattern of nucleotide deletions in CII of the three strains differed significantly. The numbers of insertions and deletions and their corresponding nucleotide lengths in the R. sphaeroides strains are provided in Table 2. The numbers of total DNA deletions and insertions occurring in strain ATCC 17029 were 990 and 837, respectively, representing ~62 and ~42 kb of DNA, respectively. In contrast, the numbers of total DNA deletions and insertions in strain ATCC 17025 were 7,506 and 5,671, respectively, comprising ~255 and ~105 kb of DNA, respectively. The high frequencies of both nucleotide insertions and deletions found in strain ATCC 17025 suggest that this strain separated from the reference strain 2.4.1 earlier than ATCC 17029 did, which allowed it to accumulate more insertions and deletions.
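The site-counting rule given at the start of this passage (an insertion is a location where the reference has no nucleotides but the other strain does; a deletion is the reverse) can be sketched as follows. This is an illustrative reading of that rule, not the authors' code: the function name, the dictionary keys, and the input format (two equal-length aligned strings with `-` for gaps) are assumptions.

```python
from typing import Dict


def indel_summary(ref_aln: str, qry_aln: str) -> Dict[str, int]:
    """Count indel sites and nucleotides in a query relative to a reference.

    A run of consecutive gap columns of the same kind counts as one site.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    summary = {
        "insertion_sites": 0,
        "inserted_nt": 0,
        "deletion_sites": 0,
        "deleted_nt": 0,
    }
    previous = None
    for r, q in zip(ref_aln, qry_aln):
        if r == "-" and q != "-":
            state = "insertion"   # absent from reference, present in query
        elif r != "-" and q == "-":
            state = "deletion"    # present in reference, absent from query
        else:
            state = None          # aligned column (or gap in both)
        if state == "insertion":
            summary["inserted_nt"] += 1
            if previous != "insertion":
                summary["insertion_sites"] += 1
        elif state == "deletion":
            summary["deleted_nt"] += 1
            if previous != "deletion":
                summary["deletion_sites"] += 1
        previous = state
    return summary
```

Keeping site counts and nucleotide totals separate mirrors the way the text reports both the number of events and the kilobases of DNA they represent.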
TABLE 2. Numbers of insertions and deletions
DNA sequence divergence between CI and CII. Aside from the variation in size of CI and CII, the number of LCBs identified in the two chromosomes is related to the extent of their DNA sequence conservation; while a low number of LCBs reflects the sequence conservation over a longer DNA length, a higher number of LCBs indicates sequence conservation over small segments of DNA. All pairwise comparisons revealed that CI contained fewer LCBs than expected based on its relative size, as shown in Table 1. In contrast, CII displayed more than the expected number of LCBs, as estimated from the varied sizes of CII as determined by optical mapping (Tim Donohue, personal communication) of the three different strains.
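As a rough worked example of an expected LCB count based on relative chromosome size, the sketch below splits the 382 LCBs of the three-way alignment in proportion to chromosome length and compares the result with the observed 263 (CI) and 119 (CII). It uses the approximate CI size (3.0 Mb) and the optically mapped CII size of strain 2.4.1 (0.94 Mb) reported elsewhere in this article as stand-ins; treating a single strain's sizes as representative, and applying the calculation to the three-way rather than the pairwise alignments, are simplifying assumptions, and the function name is hypothetical.

```python
from typing import Dict


def expected_lcbs(total_lcbs: int, sizes_mb: Dict[str, float]) -> Dict[str, float]:
    """Split total_lcbs across chromosomes in proportion to their sizes."""
    genome_size = sum(sizes_mb.values())
    return {name: total_lcbs * size / genome_size for name, size in sizes_mb.items()}


# Values taken from the text; sizes are for strain 2.4.1 (assumed representative).
sizes = {"CI": 3.0, "CII": 0.94}
observed = {"CI": 263, "CII": 119}
expected = expected_lcbs(sum(observed.values()), sizes)

for name in sizes:
    print(f"{name}: observed {observed[name]}, expected {expected[name]:.1f}")
```

Under these assumptions the proportional expectation is about 291 LCBs for CI and about 91 for CII, so CI falls below and CII above its size-based share, in line with the pattern described above.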
As mentioned earlier, the total numbers of contigs in the sequence assemblies of ATCC 17029 and ATCC 17025 are 20 and 88, respectively. The numbers of contigs present in the final genome assemblies of these two strains affect the numbers of LCBs in their genome comparisons. However, most of the DNA sequences remained in longer contigs in both ATCC 17029 and ATCC 17025, with a majority of smaller contigs from the ATCC 17025 genome assembly appearing as one LCB when aligned to another genome. Thus, the analysis of the genome comparisons and the validity of the results are not compromised. Moreover, when the number of existing contigs was included in the analysis, the numbers of LCBs identified in the genome alignments of strains 2.4.1 and ATCC 17029; 2.4.1 and ATCC 17025; ATCC 17029 and ATCC 17025; and 2.4.1, ATCC 17029, and ATCC 17025 were 110, 354, 372, and 274, respectively. Therefore, the patterns observed after taking the number of contigs into consideration do not differ from those seen when the contigs were not considered. This finding suggested that CI of R. sphaeroides has been more conserved than CII. Seemingly, at the same minimum weight of 45, the pairwise alignment of the chromosomal sequences of 2.4.1 and ATCC 17029 identified fewer LCBs than were identified in sequence alignments of either of the two pairs of strains 2.4.1 and ATCC 17025 or ATCC 17029 and ATCC 17025.
In addition, the data revealed that the genomes of strains 2.4.1 and ATCC 17029 shared extensive DNA homology (~3.96 Mb), with a very high degree of sequence conservation (95.6% nucleotide identity), whereas the sequence alignments of either of the two other pairs of strains, ATCC 17029 and ATCC 17025 or 2.4.1 and ATCC 17025, revealed less DNA homology (~3.4 Mb) as well as a lower level of sequence conservation (~77% nucleotide identity).
The DNA sequence divergence between each pair of the three strains of R. sphaeroides was separately measured for CI and CII, as shown in Table 3. Nucleotide identities were determined over all LCBs extracted from the various pairwise alignments, which were performed at different minimum weights, such as 30, 45, 60, 500, and 5,000. The minimum weight refers to three times the size of the initial DNA sequence length used for searching the multi-MUMs. A high minimum weight identifies genome rearrangements that are more likely to exist at a large scale, while a low minimum weight deals with sensitivity to smaller genome rearrangements. Alignments at low minimum weight (such as 30 and 45) identified higher numbers of LCBs with a greater degree of DNA sequence conservation. In contrast, sequence alignments at high minimum weight (such as 60, 500, and 5,000) revealed lower numbers of LCBs with a lesser degree of DNA sequence conservation. However, all pairwise genome comparisons revealed on average low nucleotide identity between CII-specific orthologous sequences, and therefore a high DNA sequence divergence, as shown in Table 3, and the difference in sequence divergence between CI and CII was highly significant (chi-square test, P = 10^-6). Thus, the CII sequences of each strain have diverged more than the CI sequences.
TABLE 3. Nucleotide identity among three strains of R. sphaeroides
~2% lower coding capabilities and relatively longer intergenic lengths. The longer intergenic lengths in CII may allow more intrachromosomal recombination and access to horizontal gene transfer. These two factors will influence the genetic divergence of CII among the strains of R. sphaeroides.
TABLE 4. Percent coding capabilities and average intergenic lengths in CI and CII
~8% lower than the number of DNA sequence duplications shown for strain 2.4.1. The genome of R. sphaeroides 2.4.1 revealed ~158 kb (~3.4% of its genome) of exactly duplicated DNA sequences. The genome of R. sphaeroides ATCC 17029 similarly displayed ~100 kb (~2.8% of its genome) of duplicated DNA regions. However, R. sphaeroides ATCC 17025 contained ~205 kb (~5.9% of its genome) of exactly duplicated DNA sequences, which was approximately twice the amount of exactly duplicated DNA sequences found in the other two strains, 2.4.1 and ATCC 17029. Although the majority of DNA sequence duplications in all three strains were small (<100 nucleotides), as shown in Table 5, the genome of R. sphaeroides ATCC 17025 possesses a higher frequency of DNA duplications for relatively longer DNA sequences (100 to 1,000 nucleotides). The genome of ATCC 17025 showed twice as many of these longer duplications as the genome of either 2.4.1 or ATCC 17029. Frequent gene duplications have also been reported to exist in many bacterial species, including Enterococcus faecalis and Lactobacillus johnsonii, which represent relatively large and small genomes, respectively (2). Thus, different selective constraints might explain the varied degree of sequence amplification among strains or closely related bacterial species.
TABLE 5. Distribution of intra- and intergenomic DNA duplications in R. sphaeroides
~10 times more than the number for the intragenomic DNA duplications found in each of the three strains alone, as shown in Table 5. In addition, the total number of DNA duplications identified between strains 2.4.1 and ATCC 17029 was ~7% higher than the number of DNA duplications between either the genomes of strains 2.4.1 and ATCC 17025 or those of strains ATCC 17029 and ATCC 17025.
The total content of exact DNA duplications between strains 2.4.1 and ATCC 17029 was 3.82 Mb, which was ~7% higher than the amount of exact DNA duplications found between either strains 2.4.1 and ATCC 17025 or strains ATCC 17029 and ATCC 17025. These results revealed that the majority of the genomic DNA duplications between any two strains were small, but ~50% of the total DNA duplications between 2.4.1 and ATCC 17029 were of longer DNA sequences than the exact DNA duplications identified between either 2.4.1 and ATCC 17025 or ATCC 17029 and ATCC 17025. Thus, the frequency and the amount of total intergenomic DNA duplications demonstrated that the genome of strain ATCC 17025 diverged more from each of the other two strains and possibly separated before the separation of the latter two strains, 2.4.1 and ATCC 17029, which share ~85% of exactly duplicated DNA sequences.
~3.95 Mb of DNA), with a high degree of conservation (95.6% nucleotide identity), than that (~3.4 Mb of DNA) between either of the other two pairs of strains, 2.4.1 and ATCC 17025 or ATCC 17029 and ATCC 17025. Also, the DNA homologies of the latter two pairs of strains revealed a lower average level of sequence conservation (~77% nucleotide identity). Similarly, DNA duplication analysis revealed more exact DNA sequence duplications between the genomes of 2.4.1 and ATCC 17029 than between the genomes of either 2.4.1 and ATCC 17025 or ATCC 17029 and ATCC 17025. Thus, two different and independent analyses yielded similar results, which demonstrated a closer phylogenetic relationship between the genomes of R. sphaeroides 2.4.1 and ATCC 17029 and showed that the genome of ATCC 17025 has diverged more from each of the other two strains. The high degree of sequence conservation among the three strains of R. sphaeroides is in agreement with the level of DNA sequence conservation found among genomes of other bacterial species. Recently, a genome comparison of two strains of Francisella tularensis showed that these two strains share 97.6% of their genomes and that the nucleotides within these regions are 98.9% identical. However, the major difference between the two strains is the level of genomic rearrangements in gene order (27). Similarly, among nine different representative strains of Vibrio cholerae, only 1% difference exists in their gene contents (34), but there was an extensive level of genomic rearrangements. Thus, the high degree of intraspecies DNA sequence conservation suggests that the differences in lifestyles or virulence types are not due to large differences in the genetic contents of the strains.
R. sphaeroides ATCC 17025 harbors ~3 times more nucleotide deletions and insertions than either 2.4.1 or ATCC 17029 (Tables 1 and 2). If the rates of deletion formation were the same in all three strains of R. sphaeroides, the cumulative number of deletions would correlate with the relative time of separation of each strain from its common lineage. The higher number of deletions and insertions in ATCC 17025 suggests its earlier separation from the common lineage. In addition to the divergence of the CI-specific sequences of the three strains, the number of deletions and insertions in CI of these strains also correlates with the relative separation times of the three strains. However, CII conserved regions reflected about the same number of deletions in all three strains. Thus, the divergence of CI-specific DNA sequences among the strains is the best indicator of a phylogenetic relationship among different strains of R. sphaeroides.
The use of nonfunctional DNA sequences, such as pseudogenes and intergenic sequences, is essential for evaluating the role of spontaneous mutations, including insertions and deletions. The nonfunctional DNA sequences accumulate random mutations, and the occurrence of such mutations would not be affected by natural selection (15). The spectrum of insertions/deletions was first used as a parameter of genome size evolution in the study of mammalian pseudogenes, and it was subsequently shown that DNA loss was estimated to be faster in rodents than in humans (8), possibly resulting in smaller rodent genomes. The pattern of DNA insertions and deletions in strains of R. sphaeroides revealed more nucleotide deletions than insertions, which corroborates the earlier finding of a deletional bias as a major force that shapes bacterial genomes (21). Mutational analyses of the genomes of the three strains are currently in progress, and the results of such analyses should further our understanding of the genome size variation in R. sphaeroides.
Rapid divergence of CII and evolution of strain-specific genomic rearrangements. Genome analysis using a combination of criteria, such as the analyses of the restriction patterns with AseI- and CeuI-generated DNA fragments and the localization of various genes on these restriction fragments by using strain 2.4.1 as the prototype, demonstrated the similar size of CI (~3.0 Mb) in many R. sphaeroides strains, but the size of CII of R. sphaeroides varies (23). Also, optical mapping revealed that CII of R. sphaeroides 2.4.1, ATCC 17029, and ATCC 17025 consisted of 0.94, 1.23, and 0.91 Mb of DNA, respectively (41; T. J. Donohue, personal communication).
Genome comparison of Vibrio cholerae and Vibrio parahaemolyticus revealed a similar observation, i.e., that chromosomes I of these two species do not differ greatly in size (3.0 and 3.3 Mb, respectively) but chromosome II is much larger in V. parahaemolyticus than in V. cholerae (33). Furthermore, the global transcriptional pattern of in vitro-grown cells of V. cholerae shows the highest levels of expression for genes located on CI, while bacterial growth in the intestine shows the highest levels of expression for genes that are located on CII (39). Thus, the diverse size, genetic content, and pattern of expression of genes of CII suggest that CII maintains the genetic reservoir required for species adaptation in specialized environments (34).
The rapid divergence of CII of R. sphaeroides includes a high degree of nucleotide differences between orthologues, rearranged DNA sequences, duplicated genes, and/or newly acquired genetic elements on CII. These genetic divergences revealed a faster evolution of CII, which could be attributed to different evolutionary forces. Since the two chromosomes (CI and CII) of R. sphaeroides appear to have coexisted, possibly prior to the formation of R. sphaeroides as a species (3), the origin of the two-chromosome genome architecture must have occurred before the diversification of R. sphaeroides strains. An ancient association of the two chromosomes in R. sphaeroides 2.4.1 is further supported by the fact that there is no difference in DNA parameters, such as percent G+C content, di- and trinucleotide frequencies, and codon preferences of CI and CII, in all three strains of R. sphaeroides (reference 19 and unpublished observations). However, some of the CII DNA sequences of R. sphaeroides might have been recently acquired from other bacteria with similar or different genetic backgrounds. In a number of instances, longer DNA insertions identified in R. sphaeroides differ slightly in percent G+C composition (~2% lower G+C composition) as well as in nucleotide repeat patterns from the complete chromosome or genome and encode many phage-related functions. A large number of prophages (lambda-like) have also been found in Escherichia coli genomes (25) and are suspected to be involved in the diversification of different strains of E. coli. These newly acquired genetic variations of CII were selected for strain-specific adaptations, but their DNA had not yet drifted towards the genome average. Therefore, the rapid evolution of CII could be attributed to more recent horizontal DNA transfers from bacteria with similar genetic backgrounds, as these newly acquired insertions do not reflect a drastic difference from their overall genome composition.
The susceptibility of CII for rapid chromosomal changes may be due to the fact that CII has a relatively low coding capacity and relatively long intergenic sequences (Table 4), which together may make CII more prone to accumulating genetic variants. Although the division of the genome into two replicons would be advantageous for rapid DNA replication as observed in V. parahaemolyticus (40), the difference in their copy numbers might amplify the level of gene expression in changing environmental conditions. However, the existence of a single chromosome in species closely related to R. sphaeroides, such as Rhodobacter capsulatus and Rhodopseudomonas palustris (14), does not appear to be disadvantageous. Seemingly, a genome analysis of R. sphaeroides suggests that the closest relative of R. sphaeroides is Silicibacter pomeroyi, a member of the marine Roseobacter clade (Chris Mackenzie, personal communication), and members of the Roseobacter clade are widely distributed over diverse hydrographical regions of the ocean. The genome of S. pomeroyi consists of a main chromosome and a megaplasmid, and its genome sequence is fully equipped to take advantage of transient high-nutrient niches within a low-nutrient marine environment (22). Thus, a detailed analysis of CII-specific sequences is required to substantiate the hypothesis that the possession of multiple chromosomes in bacteria has some adaptive advantages.
Published ahead of print on 15 December 2006.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2009 by the American Society for Microbiology. For an alternate route to Journals.ASM.org, visit: http://intl-journals.asm.org | More Info»