Journal of Bacteriology, April 2004, p. 2019-2027, Vol. 186, No. 7
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.7.2019-2027.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Department of Microbiology and Molecular Genetics,(1) Computational Genomic Section, Human Genetics Center,(2) The University of Texas Health Science Center, Houston, Texas 77030
Received 5 September 2003/ Accepted 18 December 2003
|
|
|---|
2.7% of the total chromosomal DNA. The chromosomal DNA sequence duplications were aligned to each other by using MUMmer. Frequency and size distribution analyses of the exact DNA duplications revealed that the interchromosomal duplications occurred prior to the intrachromosomal duplications. Most of the DNA sequence duplications in the R. sphaeroides genome occurred early in species history, whereas more recent sequence duplications are rarely found. To uncover the history of gene duplications in the R. sphaeroides genome, 44 gene duplications were sampled and then analyzed for DNA sequence similarity against orthologous DNA sequences. Phylogenetic analysis revealed that approximately 80% of the total gene duplications examined displayed type A phylogenetic relationships; i.e., one copy of each member of a duplicate pair was more similar to its orthologue, found in a species closely related to R. sphaeroides, than to its duplicate counterpart allele. The data reported here demonstrate that a massive level of gene duplications occurred prior to the origin of the R. sphaeroides 2.4.1 lineage. These findings lead to the conclusion that there is an ancient partnership between CI and CII of R. sphaeroides 2.4.1. |
|
|---|
α-3 subgroup of the Proteobacteria (40, 41). This species, along with other members of the class Proteobacteria, represents one of the largest divisions within the prokaryotes (41) and comprises a large number of gram-negative bacteria. R. sphaeroides is also one of the most metabolically versatile and diverse members of the α-3 subgroup of the Proteobacteria, which includes Rhizobium, Agrobacterium, Caulobacter, Brucella, and Rickettsia (40, 41). A few examples of the metabolic diversity are the diversity in assembly and regulation of the light-harvesting apparatus, in nitrogen fixation, in carbon dioxide fixation, in hydrogen metabolism, in electron transport, in oxyanion reduction, and in tetrapyrrole biosynthesis (9, 19).
R. sphaeroides 2.4.1 contains one of the most complex genomes found in members of the Proteobacteria (4, 18, 21). This species was the first bacterial species shown to possess a complex genome consisting of two circular chromosomes, one approximately 3.0 Mbp long (chromosome I [CI]) and one approximately 0.9 Mbp long (chromosome II [CII]), and five additional endogenous replicons (36, 37). The existence of multiple chromosomes in bacteria is no longer an exception and has been instrumental in setting aside the long-held dogma that all bacterial species have a single circular chromosome.
Currently there is an extensive list of prokaryotic genomes that have been sequenced, including that of R. sphaeroides 2.4.1. As a result, R. sphaeroides is an ideal system for the study of genome complexity because the genome has been assembled and annotated (www.rhodobacter.org). CI and CII are 3,188,631 and 943,022 bp long, respectively, and contain approximately 3,106 and 874 open reading frames, respectively. Preliminary genome analyses (5, 6, 25, 26) have revealed that a wide variety of essential and housekeeping genes are present on both CI and CII.
Gene duplication followed by DNA sequence divergence plays a major role in genome evolution (33). Besides generating the genetic diversity that allows a species a wider spectrum of metabolic capabilities, gene duplication also contributes to the production of biodiversity by promoting genome divergence and further speciation events (24). The R. sphaeroides genome possesses a high degree of gene duplication (25, 26). Studies involving the genetic and biochemical characterization of a number of gene duplications in R. sphaeroides have been conducted previously (8, 13, 14, 28, 29, 30).
Duplications can arise from single-gene duplications, duplication of short chromosomal fragments, duplication of an entire chromosome, or duplication of the whole genome; these events are thought to be major sources of evolutionary novelties (33). In order to uncover both the nature and the amount of exact DNA sequence duplication within and between the two chromosomes, we aligned the CI and CII DNA sequences to each other and also against their own sequences. To examine the evolutionary history of gene duplications and to further understand the relationship between gene duplication events and the separation of the R. sphaeroides lineage from its ancestor, we compared the DNA sequences for each duplicate gene pair with orthologues from species and genera closely related to R. sphaeroides. On the basis of the inferred phylogeny of each set of duplicate genes and their orthologues, a relative age for CI and CII was derived.
|
|
|---|
NUCmer was used to cluster all the matches found by MUMmer and to construct larger regions of alignment. Clusters of these matches were subsequently grouped into larger sequence blocks. The NUCmer output file contains multiple columns that show several features of the sequence match, including the coordinates of the matching segments, the lengths of the matching segments, and the level of identity of a match between the two sequences.
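As a rough illustration of what an exact-match search involves, the following Python sketch reports maximal exact matches between two sequences. It is not MUMmer's suffix-tree algorithm and does not reproduce NUCmer's clustering; the function name `exact_matches` and its parameters are hypothetical, only the forward strand is searched, and the default minimum length of 20 is taken from the perfect 20-nucleotide criterion reported in the Results.

```python
from collections import defaultdict


def exact_matches(seq_a, seq_b, min_len=20):
    """Return maximal exact matches as (start_a, start_b, length) tuples.

    Simplified stand-in for an exact-match search: forward strand only,
    no uniqueness requirement, no clustering of neighbouring matches.
    """
    # Index every min_len-mer of seq_b by its start position.
    index = defaultdict(list)
    for j in range(len(seq_b) - min_len + 1):
        index[seq_b[j:j + min_len]].append(j)

    matches = []
    for i in range(len(seq_a) - min_len + 1):
        for j in index.get(seq_a[i:i + min_len], ()):
            # A seed that can be extended to the left lies inside a longer
            # match that was already reported from its leftmost seed.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            # Extend the seed to the right while the bases keep agreeing.
            length = min_len
            while (i + length < len(seq_a) and j + length < len(seq_b)
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches
```

For an intrachromosomal comparison (a sequence against itself), the trivial matches with `start_a == start_b` would also need to be discarded.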
Identification of gene duplications and orthologous sequences. Gene duplications were identified as described previously (26), and similarity searches were carried out by using BLASTP (1). The amino acid sequences of the duplicated gene pairs were searched against the databases to identify orthologues. The DNA sequences of the gene duplications and their orthologues were obtained from closely related species and genera, such as Rhodobacter capsulatus, Paracoccus denitrificans, Sinorhizobium meliloti, Bradyrhizobium japonicum, Agrobacterium tumefaciens, and Caulobacter crescentus, as these sequences are available in the National Center for Biotechnology Information database. The DNA and protein sequences of all duplicate gene pairs and their orthologues were saved in a local sequence file, and then these sequences were used for multiple alignment with CLUSTALW (38).
Phylogenetic tree construction. Phylogenetic relationships were analyzed by using each pair of duplicated alleles and the orthologous DNA sequences from several species and genera closely related to R. sphaeroides. Phylogenetic and molecular analyses were conducted by using MEGA, version 2.1 (22). Phylogenetic tree construction was performed by using the neighbor-joining (NJ) method (35) because of its known accuracy. The distances between DNA sequences used for building the NJ tree were computed by using Jukes-Cantor corrections (17). The NJ method produced a unique final tree based on the assumption of minimum evolution with the correct tree topology. Bootstrap values for the consensus tree were calculated by using 1,000 replications.
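The Jukes-Cantor correction mentioned above converts the observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below is a minimal stand-alone version for two aligned DNA sequences, not the MEGA implementation; the function names are hypothetical, and the choice to skip alignment columns containing gaps or ambiguous bases is an assumption.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with gaps or ambiguous bases are ignored.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq1, seq2)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site.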
|
|
|---|
FIG. 1. Schematic representation of chromosomal duplications within and between CI and CII. CI and CII are depicted as horizontal bars from left (5' DNA end) to right (3' DNA end). Connecting vertical lines represent the locations on the chromosome(s) where the sequence matches perfectly. Genes involved in flagellum biosynthesis (fl-), electron transport (nuo), chemotaxis (che), and carbon assimilation (cbb) are duplicated as gene clusters, and they are indicated by different colors.
|
2.7% of the total chromosomal content. The amounts of CI-CI and CII-CII sequence duplications were 39 and 18 kb, respectively, which does not reflect the relative sizes of the chromosomes, as CI is approximately three times larger than CII. The amount of interchromosomal sequence duplication was 55 kb. The degree of sequence duplication was identified by using a high-stringency criterion for exact matches, a perfect 20-nucleotide match. The criterion used in this analysis was more stringent than the criterion used in analyses in which DNA sequences that are >100 nucleotides long with 50% mismatches are used. Thus, the criterion applied in this study provided only a conservative estimate of the amount of DNA sequence duplication. |
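The proportions quoted above can be checked with a few lines of arithmetic. This sketch uses only figures stated in the text (chromosome lengths of 3,188,631 and 943,022 bp; duplication amounts of 39, 18, and 55 kb) and assumes that 1 kb is 1,000 bp and that the kilobase values are rounded; the variable names are illustrative.

```python
# Chromosome lengths in base pairs, as given in the text.
ci_len = 3_188_631
cii_len = 943_022

# Exact-duplication amounts in base pairs (assuming 1 kb = 1,000 bp).
ci_ci = 39_000    # intrachromosomal, CI
cii_cii = 18_000  # intrachromosomal, CII
ci_cii = 55_000   # interchromosomal

total_dup = ci_ci + cii_cii + ci_cii
fraction = total_dup / (ci_len + cii_len)
print(f"duplicated fraction of both chromosomes: {100 * fraction:.1f}%")

# Intrachromosomal duplication relative to each chromosome's own length.
ci_density = ci_ci / ci_len
cii_density = cii_cii / cii_len
print(f"size ratio CI/CII: {ci_len / cii_len:.1f}")
print(f"CI-CI density: {100 * ci_density:.2f}%, "
      f"CII-CII density: {100 * cii_density:.2f}%")
```

With these inputs the total comes to about 2.7% of the two chromosomes, and CII carries proportionally more intrachromosomal duplication than CI even though CI is more than three times longer.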
TABLE 1. DNA exact duplication in R. sphaeroides 2.4.1
|
|
TABLE 2. Frequency distribution of sequence duplications for CI and CII of R. sphaeroides 2.4.1
|
72%) were 20 to 25 nucleotides long. Most of the intrachromosomal duplications were also small, but the most common of them were 26 to 50 nucleotides long, suggesting that these duplications are more recent than the interchromosomal duplications.
FIG. 2. Frequency distribution of the number of intra- and interchromosomal DNA sequence duplications. The two panels have different scales for the x and y axes.
|
5.5 kb long and encode rRNA operons (rrnA, rrnB, and rrnC, respectively). This finding has been published previously (8). |
TABLE 3. Large duplicated regions of CI and CII
|
Phylogenetic relationship between duplicate gene copies and their orthologues. In addition to a computational analysis performed by using the sizes and frequencies of exact DNA sequence duplications within and between CI and CII, an independent, phylogenetic analysis was used to infer the evolutionary origins of CI and CII. The DNA sequences of a number of duplicated gene pairs and their orthologous DNA sequences from closely related organisms were used for phylogenetic tree construction to determine which of the alternative gene sequences they most closely matched, the other sequence of the duplicate pair (type B tree) or an orthologous sequence (type A tree). Two types of phylogenetic relationships, type A and type B, were expected based on the assumptions adopted from a previous study (23). The derived amino acid similarity for each duplicate gene pair (homologue) and the similarity to the orthologous sequence are shown in Table 4. The species with which a copy of the duplicate gene showed the best match is also listed. The tree topology and the bootstrap value for each consensus tree are indicated in Table 4. It was found that the bootstrap values varied for different gene trees. In general, the bootstrap value is a function of the sequence length and the divergence time for the two DNA sequences, which in this case was strongly affected by the timing of gene duplication.
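The type A versus type B distinction can be illustrated with a simple three-way distance comparison. The sketch below is a deliberate simplification and not the procedure used in the study, which relied on neighbor-joining trees with several orthologues and bootstrap support; the function name `classify_pair`, its arguments, and the handling of ties are all assumptions made for illustration.

```python
def classify_pair(d_dup, d_copy1_orth, d_copy2_orth):
    """Classify a duplicate gene pair from three pairwise distances.

    d_dup         distance between the two duplicate copies
    d_copy1_orth  distance between copy 1 and the orthologue
    d_copy2_orth  distance between copy 2 and the orthologue

    Returns "B" when the duplicates are closer to each other than either
    is to the orthologue, "A" when at least one copy is closer to the
    orthologue than to its duplicate, and "unresolved" on an exact tie.
    """
    closest_orth = min(d_copy1_orth, d_copy2_orth)
    if d_dup < closest_orth:
        return "B"
    if d_dup > closest_orth:
        return "A"
    return "unresolved"


# Hypothetical distances, for illustration only.
print(classify_pair(0.10, 0.40, 0.45))  # duplicates closest to each other
print(classify_pair(0.60, 0.25, 0.70))  # copy 1 closest to the orthologue
```

The distances could come from any corrected measure, such as the Jukes-Cantor distances used for the neighbor-joining trees.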
|
TABLE 4. Similarity analysis of gene paralogues and orthologues
|
80%) of the 44 gene duplications shown in Table 4 represent type A relationships with the orthologous sequences. The inferred phylogeny from a type A tree shows that the duplicate alleles are less similar to each other than to an orthologous sequence from a species closely related to R. sphaeroides. In contrast, in type B phylogenetic relationships there is greater DNA sequence similarity between the duplicated alleles than between the alleles and their orthologues. Nine of 44 duplicated genes showed a type B phylogenetic relationship. These nine gene pairs were cbbGI/cbbGII, cbbPI/cbbPII, flgB1/flgB2, flgF1/flgF2, flhB1/flhB2, fliF1/fliF2, fliQ1/fliQ2, hemN/hemZ, and pucB1/pucB2. Three of these duplications, cbbGI/cbbGII, cbbPI/cbbPII, and pucB1/pucB2, also showed a high level of genetic identity (>80%) both between the duplicated alleles and with the orthologous sequences, as shown in Table 4.
FIG. 3. Phylogenetic relationships of duplicated gene paralogues of R. sphaeroides and the orthologous sequences from closely related species or genera. As examples, consensus phylogenetic trees representing four gene pairs, rdxA/rdxB, hemA/hemT, pucB1/pucB2, and flhB1/flhB2, and their orthologous sequences are shown. The relationships reflect the two types of topology (type A and type B), and the strength of branching support is indicated by the bootstrap values at the nodes. Scale bars represent genetic distances.
|
|
|
|---|
CI-CII duplications are older than CI-CI duplications. Genomes of most prokaryotic (20) and eukaryotic (34) species examined to date show a high degree of gene duplication, which is an ongoing process. Recently, analysis of the Arabidopsis genome revealed that this genome contains extensive duplications (3) and has gone through several successive rounds of duplication (44) that resulted in different types of duplications. The recent duplications are the least altered in the present genome. The oldest duplications have undergone repeated modifications by a number of DNA-modifying events during evolution of the genome, leading to shortened remnants of the original duplicated sequence blocks. Therefore, an older duplication event results in a higher frequency of small stretches of perfect DNA sequence matches. In contrast, more recent duplication events result in perfect DNA sequence identity over longer lengths of the DNA since the duplications have had less opportunity to be modified.
The rarity of exact large duplications (>1 kb) in the R. sphaeroides 2.4.1 genome validates the assumption that most sequence duplications found in the genome are older duplications. Therefore, most large duplications within or between CI and CII occurred a long time ago, during the evolution and derivation of R. sphaeroides 2.4.1 as a species. The high frequency of the smallest duplications (20 to 25 nucleotides) between CI and CII suggests that a major interchromosomal duplication appeared as a single event, which occurred earlier than most of the intrachromosomal duplications. Hence, CI and CII have existed together over a very long period of evolutionary time.
Both chromosomes are integral to species formation. Gene duplication is common in plants, animals, and microorganisms (27, 39, 42). Based on the inferred phylogeny of each set of gene duplications in several closely related species, the relative timing of these gene duplications can be estimated. It has been shown that the yeast genome duplication occurred as a single event before separation of the Saccharomyces cerevisiae lineage from its ancestor (23). Two possible phylogenetic trees, type A and type B, predict two different outcomes in time, describing gene duplication events prior to or after speciation. The relationship expected in the type A trees predicts that the gene duplication occurred before the formation of the R. sphaeroides 2.4.1 lineage. In contrast, the relationship shown in the type B trees indicates that the gene duplication occurred after separation of the R. sphaeroides 2.4.1 lineage from its ancestor.
Approximately 80% of all the gene duplications sampled showed a type A phylogenetic relationship. The other nine gene duplications displayed a type B relationship, as indicated in Table 4 (also see Fig. S1 in the supplemental material). If gene duplication occurred after species formation, the duplicated gene pair should exhibit a high level of genetic identity, unless the duplicate copies have diverged rapidly. cbbGI/cbbGII, cbbPI/cbbPII, and pucBA1/pucBA2 are reflective of a type B phylogenetic tree, and the duplicated protein sequences have >80% amino acid sequence identity. Six gene duplications, fliQ1/fliQ2, flgB1/flgB2, flgE1/flgE2, fliF1/fliF2, flhB1/flhB2, and hemN/hemZ, also displayed a type B phylogenetic relationship, but the levels of genetic identity between the amino acid sequences encoded by the corresponding duplicated alleles were lower (<50%). This result might have been possible if the gene duplications occurred after the formation of the new lineage, followed by rapid DNA sequence divergence.
To gain some quantitative insights into gene divergence, we can determine the bootstrap value. The bootstrap value signifies the phylogenetic topology (type A or type B), as indicated in Table 4; however, it might be affected by the timing of the gene duplication event. If gene duplication occurred long before speciation, the observed type A topology would have a relatively high bootstrap value (70 or 80). Similarly, there would be a high bootstrap value for the type B topology if gene duplication occurred long after speciation.
Thirty-four (77%) of the 44 gene duplications exhibited either a type A or type B phylogenetic relationship with a bootstrap value of >70. Ten of the gene duplications reflected either a type A or type B relationship with a bootstrap value of <70. If we exclude the 10 gene duplications with low bootstrap values, there are 34 definitive phylogenetic trees, and 29 (85%) of these trees exhibited a type A topology with a high bootstrap value (>70). Furthermore, 27 (60%) of the gene duplications exhibited a more definitive tree topology with a bootstrap value of >80, and 92% of these trees exhibited a type A topology. Therefore, the majority (80 to 92%) of the definitive and more robust phylogenetic trees had a type A topology, which suggests that a copy of the duplicate pair is more related to its orthologue than to its homologue. This indicates that these duplications are very old and likely occurred prior to the development of the R. sphaeroides lineage.
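The tallies in this section can be recomputed directly from the counts given in the text (44 sampled pairs, 9 of them type B, 34 trees with bootstrap values of >70 of which 29 are type A, and 27 trees with bootstrap values of >80). The sketch below only redoes that arithmetic; the variable names and the helper `pct` are illustrative.

```python
def pct(part, whole):
    """Percentage of part in whole."""
    return 100.0 * part / whole


total_pairs = 44
type_b = 9
type_a = total_pairs - type_b  # remaining pairs show a type A topology

bootstrap_gt70 = 34     # trees with bootstrap support > 70
type_a_gt70 = 29        # type A trees among those
bootstrap_gt80 = 27     # trees with bootstrap support > 80

print(f"type A overall:        {pct(type_a, total_pairs):.1f}%")
print(f"bootstrap > 70:        {pct(bootstrap_gt70, total_pairs):.1f}%")
print(f"type A, bootstrap >70: {pct(type_a_gt70, bootstrap_gt70):.1f}%")
print(f"bootstrap > 80:        {pct(bootstrap_gt80, total_pairs):.1f}%")
```

The first three values round to the approximately 80%, 77%, and 85% quoted in the text; 27 of 44 works out closer to 61% than to the 60% stated.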
In summary, two different methods were used to decipher the evolutionary relationship of CI and CII. In the first method we analyzed the length and the frequency distribution of exact DNA sequence duplications in the R. sphaeroides 2.4.1 genome. In the other independent approach we performed a phylogenetic analysis of the duplicated gene pairs and orthologues from closely related species. Tree topology was used to predict the relative timing of intra- and interchromosomal gene duplications. The data from both analyses yielded similar interpretations, that interchromosomal DNA sequence duplications are older than intrachromosomal duplications. Therefore, CI and CII have existed together for a very long time, even before the appearance of R. sphaeroides or a distantly related species. Some of the duplicated genes present in the R. sphaeroides genome are also duplicated in the chromosome and the megaplasmid in other closely related genera, such as Sinorhizobium (data not shown). In contrast, many duplicate genes in R. sphaeroides exist as single copies in the genome of Brucella melitensis, which also possesses two chromosomes. In R. capsulatus, a species reported to be closely related to R. sphaeroides, gene duplications for a number of the gene loci described for R. sphaeroides are not observed. Therefore, the distributions of gene duplications in other related organisms appear to be independent from each other and from the distribution in R. sphaeroides, suggesting that the origins of the complex genomes were independent. However, a more detailed analysis is required.
On the basis of a phylogenetic analysis of several photosynthesis genes, the Proteobacteria emerged as the earliest lineage among the photosynthetic prokaryotes (43). However, phylogenies based on several genes from widely different metabolic pathways provide evidence that the cyanobacteria constitute one of the earliest prokaryotic lineages (11), having evolved about 2.5 billion years ago (12). If we subscribe to the hypothesis that the anaerobic photosynthetic bacteria existed prior to the oxygen-evolving cyanobacteria, then the heterotrophic purple bacteria may have arisen before the cyanobacteria. If this is true, CI and CII have been together for an extended period of evolutionary time. Therefore, we concluded that CI and CII have been partners in the R. sphaeroides genome since it separated from its ancestral lineage.
We thank Steven L. Salzberg, The Institute for Genomic Research, for the MUMmer analysis. We also thank Haipeng Li and Xiaoming Liu, Human Genetics Center, University of Texas School of Public Health, for providing help with the phylogenetic analysis.
Supplemental material for this article may be found at http://jb.asm.org.