Journal of Bacteriology, October 2008, p. 6718-6725, Vol. 190, No. 20
0021-9193/08/$08.00+0     doi:10.1128/JB.00682-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Role of Premature Stop Codons in Bacterial Evolution

Tit-Yee Wong,1* Sanjit Fernandes,1 Naby Sankhon,1 Patrick P. Leong,2 Jimmy Kuo,3 and Jong-Kang Liu4

Department of Biology, Bioinformatics Program, The University of Memphis, Memphis, Tennessee,1 Gigamon Systems, Milpitas, California,2 Department of Planning and Research, National Museum of Marine Biology and Aquarium, Pingtung, Taiwan, Republic of China,3 Department of Biological Sciences, National Sun Yat-sen University, Kaohsiung, Taiwan, Republic of China4

Received 14 May 2008/ Accepted 4 August 2008


ABSTRACT
 
When the stop codons TGA, TAA, and TAG are found in the second and third reading frames of a protein-encoding gene, they are considered premature stop codons (PSC). Deinococcus radiodurans disproportionately favored TGA more than the other two triplets as a PSC. The TGA triplet was also found more often in noncoding regions and as a stop codon, though the bias was less pronounced. We investigated this phenomenon in 72 bacterial species with widely differing chromosomal GC contents. Although TGA and TAG were compositionally similar, we found a great variation in use of TGA but a very limited range of use of TAG. The frequency of use of TGA in the gene sequences generally increased with the GC content of the chromosome, while the frequency of use of TAG, like that of TAA, was inversely proportional to the GC content of the chromosome. The patterns of use of TAA, TGA and TAG as real stop codons were less biased and less influenced by the GC content of the chromosome. Bacteria with higher chromosomal GC contents often contained fewer PSC trimers in their genes. Phylogenetically related bacteria often exhibited similar PSC ratios. In addition, metabolically versatile bacteria have significantly fewer PSC trimers in their genes. The bias toward TGA but against TAG as a PSC could not be explained either by the preferential usage of specific codons or by the GC contents of individual chromosomes. We proposed that the quantity and the quality of the PSC in the genome might be important in bacterial evolution.


INTRODUCTION
 
The universal genetic code categorized TGA, TAA, and TAG as stop codons for the termination of protein translation. The preferential use of a particular stop codon is highly biased among different organisms and different genes (1). The differences among patterns of stop codon usage are likely due to environmental adaptation (1, 14, 16, 23, 25). Most textbooks state that TAA is the preferred stop codon in bacteria (17). Clarke and Miller noticed many TGA, TAA, and TAG trimers in the second and third reading frames of a protein-coding gene (8) and referred to them as premature stop codons (PSC). PSC do not seem to serve any immediate physiological role but could prevent off-frame reading of gene sequences. Clarke suggested that, in Escherichia coli, the occurrence of each type of PSC might be related to the GC content of the bacterial chromosome (7). During a routine nucleotide sequence analysis of a gene from the extremely radioresistant bacterium Deinococcus radiodurans, we noticed that this gene, which contained more than 3,000 nucleotides, had 24 TGA trimers but not a single TAA or TAG trimer in its entire coding region. This interesting phenomenon led us to investigate the PSC profiles of the whole deinococcal genome and, subsequently, other bacterial genomes. In this report, we describe the relationships between PSC frequencies and real stop codon usages in bacteria with different GC contents. We propose that PSC might be important in directing bacterial evolution.


MATERIALS AND METHODS
 
The bacterial genomic sequences in FASTA Nucleic Acid (FNA) format were downloaded from the Comprehensive Microbial Resource database (http://cmr.tigr.org). The results presented in this report are based on 72 bacterial genomes with widely different chromosomal GC contents. Bacterial species, together with their corresponding GenBank accession numbers and GC contents, are listed online at https://umdrive.memphis.edu/tywong/public/Appendix. Bacterial chromosomal DNA sequences were downloaded from the Entrez Genome Project website (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). In this paper, we use the term "gene" to refer to a protein-coding sequence synonymous with an open reading frame (ORF) listed on an FNA file. Non-protein-coding genes and the complementary sequences of protein-coding genes were treated as nongenic sequences. Bacterial codon usage tables were directly downloaded from the Codon Usage Database (http://www.kazusa.or.jp/codon/).

The occurrences of TAA, TAG, and TGA trimers within the protein reading frames in the FNA file were counted by a program written in C language (available upon request) and designated genic counts. Similarly, the occurrences of TGA, TAA, and TAG trimers in the forward and reverse strands of the chromosome were counted and designated total counts. The numbers of TGA, TAA, and TAG trimers of the nongenic fraction were calculated as the differences between the total counts and the numbers of TGA, TAA, and TAG trimers of the genic counts, respectively. The resulting counts, together with the GenBank accession numbers and GC contents of the organism, were transferred to an Excel spreadsheet for subsequent calculations. The types and numbers of real stop codons (which were the last three nucleotides of an ORF) in a genome were also recorded.
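The authors' counting program was written in C and is not reproduced in the paper. As a rough illustration only, the genic count could be sketched in Python as follows. The sketch assumes that the first reading frame starts at the first nucleotide of the ORF, so that the second and third frames start one and two nucleotides later; the function names `count_psc`, `real_stop`, and `nongenic_counts` and the toy sequence are hypothetical.

```python
from collections import Counter

STOP_TRIMERS = ("TAA", "TAG", "TGA")


def count_psc(orf):
    """Count stop trimers in reading frames 2 and 3 of one ORF.

    Assumption: frame 1 begins at index 0 of the ORF, so frames 2 and 3
    begin at indices 1 and 2.
    """
    seq = orf.upper()
    counts = {2: Counter(), 3: Counter()}
    for frame in (2, 3):
        # Step through the sequence three bases at a time in this frame.
        for i in range(frame - 1, len(seq) - 2, 3):
            trimer = seq[i:i + 3]
            if trimer in STOP_TRIMERS:
                counts[frame][trimer] += 1
    return counts


def real_stop(orf):
    """Return the last three nucleotides of the ORF (the real stop codon)."""
    return orf[-3:].upper()


def nongenic_counts(total, genic):
    """Nongenic count = total count minus genic count, per trimer."""
    return {t: total[t] - genic[t] for t in STOP_TRIMERS}


# Toy ORF (not from any genome): ATG ATA ACT GAG TAA
toy = "ATGATAACTGAGTAA"
psc = count_psc(toy)
print(dict(psc[2]), dict(psc[3]), real_stop(toy))
```

Keeping the two frames separate in the result mirrors the later analysis, in which PSC formation in the second and third reading frames is examined separately.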

We classified codons into four groups, based on their GC nucleotide contents. Codons without any G or C (such as AAA) were designated GC-free codons (zero-G/C codons). Codons with one G or C (such as GAA) were designated low-GC codons (one-G/C codons). Codons with two G or C nucleotides (such as TGG) were designated moderate-GC codons (two-G/C codons), while codons composed entirely of G and/or C (such as GCC) were designated high-GC codons (three-G/C codons). SigmaPlot 9.0 (Systat Software) was used for statistical calculations and plotting.
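The four-way classification described above amounts to counting G and C nucleotides in each codon. A minimal sketch, with the hypothetical helper names `gc_count` and `gc_group`, might look like this:

```python
from itertools import product

# Group labels follow the text: zero-, one-, two-, and three-G/C codons.
GROUP_NAMES = {0: "GC-free", 1: "low-GC", 2: "moderate-GC", 3: "high-GC"}


def gc_count(codon):
    """Number of G or C nucleotides in a codon."""
    return sum(base in "GC" for base in codon.upper())


def gc_group(codon):
    """Name of the GC group to which a codon belongs."""
    return GROUP_NAMES[gc_count(codon)]


# Tally how many of the 64 possible triplets fall into each group.
ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
group_sizes = {name: 0 for name in GROUP_NAMES.values()}
for codon in ALL_CODONS:
    group_sizes[gc_group(codon)] += 1

# The four example codons named in the text.
for example in ("AAA", "GAA", "TGG", "GCC"):
    print(example, gc_group(example))
print(group_sizes)
```

The tally covers all 64 triplets, including the three stop triplets, so it describes the composition of the triplet space rather than of sense codons alone.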


RESULTS
 
Distributions of TAA, TAG, and TGA in D. radiodurans. The genome of Deinococcus radiodurans contains two chromosomes, a megaplasmid (MP1), and a small plasmid (CP1). Table 1 summarizes the frequencies of TGA, TAA, and TAG trimers in each piece of DNA. TGA is much more frequent as a PSC (86% overall) than TAA (7.3% overall) and TAG (6.5% overall). In fact, in chromosome 1 alone, there are more than 900 genes containing only TGA as PSC. The order of abundance of the trimers is TGA > TAA > TAG. This does not reflect the relatively high GC content (66%) of the deinococcal chromosomes: the trimer TAG, which contains the same percentage of G as TGA, is the least frequent PSC.



 
TABLE 1. Occurrences of TAA, TAG, and TGA trimers in Deinococcus radiodurans

We also calculated the frequencies of these trimers in the nongenic sequences and in real stop codons in D. radiodurans. Distributions of TGA, TAA, and TAG in the nongenic portion of the chromosome are more random, but the occurrence of TGA is still very high. The overall order of abundance (TGA [75.8%] > TAG [15.74%] > TAA [8.4%]) in the nongenic fragments is more in line with the high GC content of the deinococcal chromosomes. Interestingly, for real stop codon usage, while TGA is the most commonly used codon for protein termination (83%), the use of TAA for a stop codon (12%) is almost double the frequency of the use of TAA as a PSC (7.3%). TAG is the least favored stop codon (5%) in D. radiodurans. This unusually high frequency of TGA and low frequency of TAG as PSC in the deinococcal genome led us to examine the PSC distributions in other bacterial genomes.

Distributions of TGA, TAA, and TAG trimers in other bacterial genomes. To assess the relative abundances of the three PSC more broadly, and to evaluate the effect of chromosomal GC content in forming these trimers, we analyzed 72 bacterial genomes with widely differing chromosomal GC contents for their frequencies of TGA, TAA, and TAG as PSC. Figure 1 shows a three-dimensional plot displaying the relative percentages of TGA, TAA, and TAG as PSC in each genome. If the use of the three types of PSC was determined by random events, one would expect that the data points should be distributed near the center of the triangular graph, with each type of PSC equal to roughly one-third (33%) of the total PSC. We found instead that the occurrence of each type of PSC is far from random. The frequency of TGA is the most variable and has the widest span across these species. In Nocardia farcinica, 93% of the PSC are TGA triplets, higher than any other organism investigated. In contrast, only 25% of all PSC are in the form of TGA in the genome of Fusobacterium nucleatum. The occurrence of TAA as a PSC is more restricted. The genome of Borrelia garinii has the highest percentage of TAA trimers (about 51%), while as little as 4% of all PSC trimers are TAA in Mycobacterium avium subsp. paratuberculosis. The use of TAG as a PSC is the least frequent and most restricted, its use ranging from 32% in Chlamydia trachomatis to 4% in Silicibacter pomeroyi.



 
FIG. 1. Ternary plot showing the percent distributions of PSC TGA, TAA, and TAG in 72 bacterial genomes. Also shown are their corresponding real stop codons. The forward-tilting dashed grid line (i.e., from left to right) represents a genome's corresponding percentage of TAA. The backward-tilting solid grid line (i.e., from right to left) represents its corresponding percentage of TGA. The horizontal dotted grid line represents the corresponding percentage of TAG. The Rickettsiales are circled as clade A, the Chlamydia/Chlamydophila group is circled as clade B, and the enteric organisms are circled as clade C.
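Each point in the ternary plot is a triple of percentages that sums to 100. As a sketch of how such a triple could be derived from raw trimer counts, the hypothetical function `psc_ratio` below returns the percentages in the TAA/TAG/TGA order used for the PSC ratios in the text; the counts in the example are invented for illustration and are not taken from any genome.

```python
def psc_ratio(counts):
    """Return the (TAA, TAG, TGA) percentages of all PSC trimers."""
    order = ("TAA", "TAG", "TGA")
    total = sum(counts[t] for t in order)
    if total == 0:
        raise ValueError("no PSC trimers counted")
    return tuple(100.0 * counts[t] / total for t in order)


# Invented counts, for illustration only.
hypothetical = {"TAA": 315, "TAG": 106, "TGA": 579}
ratio = psc_ratio(hypothetical)
print(":".join(f"{p:.1f}" for p in ratio))

# If the three trimers were equally likely, each would be about 33%.
uniform = psc_ratio({"TAA": 1, "TAG": 1, "TGA": 1})
```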

More importantly, the triplets as PSC fall roughly on a line. Bacterial species are not distributed evenly along this line. Along the line, there are many clusters of genomes with similar PSC ratios (TAA/TAG/TGA). Species within each cluster are often phylogenetically related. For example, all the Chlamydia/Chlamydophila species, including Chlamydia abortus (PSC ratio = 34.8:30.9:34.3), Chlamydia muridarum (PSC ratio = 35.0:30.8:34.2), Chlamydia trachomatis (PSC ratio = 33.2:32.1:34.8), and Chlamydophila pneumoniae (PSC ratio = 33.9:30.7:35.4) are clustered near the center of the graph (Fig. 1, circle A). Members of the genus Rickettsia, which includes R. typhi, R. conorii, and R. prowazekii, form a cluster and have a PSC ratio of about 50:23:26 (Fig. 1, circle B). "Candidatus Pelagibacter ubique" is a free-living phototrophic bacterium. This organism has the smallest genome of any cell known to replicate independently in nature (10, 24, 27). This bacterium is placed in the order Rickettsiales based on the similarity of the ribosomal gene sequence to those of members of that order (20). The PSC ratio of the free-living "Candidatus Pelagibacter" (50:22:28) closely resembles the PSC ratios of its parasitic counterparts, despite the different lifestyles of these organisms. The enteric clade also forms a cluster with a PSC ratio of about 31:11:58 (Fig. 1, circle C). Included in this cluster are Erwinia species (one species), Shigella species (one species), Yersinia species (three species), Escherichia coli (two strains), and Salmonella species (two species).

The PSC profiles and the size of the genomes are not related. For example, the genome of the pathogenic E. coli O157:H7 EDL933 (5.52 Mb, with 5,679 ORFs) is considerably larger than the genome of the laboratory strain E. coli K-12 (4.63 Mb, with 4,289 ORFs). However, their corresponding PSC ratios (31.94:10.39:57.67 and 31.50:10.62:57.87) are almost identical. Thus, evolutionary history seems to be a major factor affecting the PSC profile of an individual organism.

The results shown in Fig. 1 suggested that our initial observation of the high frequency of TGA trimers in the Deinococcus genome was not an isolated incident. In fact, many bacterial genes have TGA as 100% of their PSC. In one extreme case, the PSC trimers in 3,561 genes (out of 5,686 genes) in the Nocardia farcinica genome are all TGA.

Bacterial use of TGA, TAA, and TAG as real stop codons is more random (Fig. 1). However, the frequency of TAG is still noticeably less than the frequency of TAA or TGA stop codons. The variations in usage of TGA, TAA, and TAG as stop codons among these species range from 8% to 81%, 9% to 76%, and 8% to 50%, respectively. Unlike the linear relationships of PSC, the real stop codon usages among these bacteria appear to cluster. The largest cluster shows a high-TAA preference, centered on 23% TGA, 20% TAG, and 58% TAA. Another cluster shows a high-TGA preference, centered on 68% TGA, 15% TAG, and 16% TAA. There are other bacteria with elevated TAG. However, their populations are too small to be analyzed.

Relationship between PSC and chromosomal GC content. The G nucleotide present in TGA and TAG triplets suggests that their frequencies should increase relative to that of the TAA triplet as the GC content of the chromosome increases. As predicted, the proportion of PSC represented by TGA (Fig. 2) increases linearly with the GC content of the bacterial chromosome, while that represented by TAA drops (Fig. 2). However, much to our surprise, we found that the percentage of TAG in the genome, like TAA, was inversely proportional to the GC content of the chromosome (Fig. 2). Furthermore, the ratios of the percentages of TAG and TAA in a genome follow the equations %TAG = (98 − %TGA) × 0.34 and %TAA = (98 − %TGA) × 0.66.



 
FIG. 2. Correlation between chromosomal GC content and the percentages of TGA, TAA, and TAG premature stop codon trimers in the genomes of 72 bacterial species. The percentage of TGA in the genomes was positively correlated with the chromosomal GC content, while the percentages of TAA and TAG in the genomes were inversely proportional to the chromosomal GC contents. Lines represent the linear regression of each data set.

Applying data from the 72 bacterial genomes to the equations above showed a standard deviation of less than 3.6. Figure 2 also suggests that the percentage of TGA reaches a lower limit of approximately 25%. This lower limit of 25% TGA may be linked to the GC content of a chromosome (about 27%). However, the TGA upper limit of 94% is substantially higher than the highest chromosomal GC content, approximately 70%.
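The two empirical equations relating %TAG and %TAA to %TGA can be written as a short function. The sketch below uses the hypothetical name `predict_taa_tag` and takes the overall D. radiodurans TGA percentage reported earlier (86%) as its input; the observed values quoted in the comment are likewise those reported in the Results.

```python
def predict_taa_tag(tga_pct):
    """Apply %TAG = (98 - %TGA) x 0.34 and %TAA = (98 - %TGA) x 0.66."""
    remainder = 98.0 - tga_pct
    return {"TAA": remainder * 0.66, "TAG": remainder * 0.34}


# D. radiodurans: TGA is 86% of all PSC; observed TAA = 7.3%, TAG = 6.5%.
pred = predict_taa_tag(86.0)
print(round(pred["TAA"], 2), round(pred["TAG"], 2))
```

Because the two coefficients sum to 1, the predicted percentages of TAA, TAG, and TGA always add up to 98 rather than 100, so the equations should be read as an empirical fit and not as an exact identity.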

Average numbers of PSC trimers per gene in different bacteria. The numbers of PSC trimers in the genes, even within the same organism, ranged from zero to more than 150. We found that bacteria with higher GC content tend to contain significantly fewer PSC in their genes than those bacteria with lower GC content. Data on the average numbers of PSC per gene of the 72 bacterial species stratified according to their GC content are presented in Fig. 3 (the insert shows the pairwise comparison of GC content and the average number of PSC per gene). Among these organisms, Fusobacterium nucleatum has the lowest GC content, 27.1%, while Thermus thermophilus has the highest GC content, 69.4%. Although the GC contents of these two organisms differ by only about 40 percentage points, the number of PSC per gene in F. nucleatum (76.4 PSC/gene) is more than sevenfold higher than the number of PSC per gene in T. thermophilus (10.4 PSC/gene). Our results further revealed that, with few exceptions (such as that concerning Staphylococcus aureus; see below), the average number of PSC per gene in a genome is inversely proportional to the GC content of that genome. This inverse relationship between PSC number and GC content holds true for all three types of PSC (data not shown). Interestingly, most metabolically versatile bacteria have fewer PSC in their genes. For example, the genes of the Deinococcus, Pseudomonas, Salinibacter, Azoarcus, and Klebsiella spp., as well as those of the low-GC-content bacterium Staphylococcus, all contain relatively few PSC (fewer than 25 PSC/gene). On the other hand, bacteria that have a very high number of PSC in their genes often are symbionts. For example, Fusobacterium is associated with human oral cavities, Rickettsia is an obligate parasite of ticks and humans, and Borrelia forms a symbiosis with ticks. There is no indication that the genes of those symbionts are proportionally longer than the genes of the free-living bacteria. Therefore, gene length cannot be used to explain the disparity between PSC numbers of the high- and low-GC-content groups.



 
FIG. 3. Relationships between the average number of PSC per gene in a genome and chromosomal GC content in 72 bacterial genomes. These bacteria are placed according to the ranking of their chromosomal GC contents, from low to high. The insert shows the pairwise comparison of GC content versus PSC per gene and the linear regression line.

Relationship between chromosomal GC content and real stop codon usage. A somewhat different pattern appears when the relative use of real stop codons is compared with the bacterial chromosomal GC content (Fig. 4). As for PSC frequency, increased chromosomal GC content is associated with an increase in the frequency of use of TGA and a decrease in the use of TAA. However, organisms with similar GC contents show substantially more variation in the relative frequency of use of TGA and TAA as real stop codons than in their use of the same triplets as PSC (compare Fig. 2 and 4). Furthermore, the relationship between GC content and stop codon usage for TAA and TGA does not appear to be strictly linear: the mean frequencies change little as GC content varies from 24 to 40% and then apparently shift abruptly at around 50 to 60%. Even more dramatically, the relative frequency of TAG as a stop codon is nearly independent of GC content, remaining close to 20%. In contrast, its use as a PSC decreases steadily from 25% to 5% as GC content increases (Fig. 2). Although TAG is often the least used stop codon in most bacteria, more than 50% of the genes in Anaplasma marginale St. Maries, a relatively low-GC-content bacterium, use TAG as a stop codon. The use of TAG as a PSC never rises above 32% (Fig. 1 and 2). This result suggests that the factors influencing the selection of real stop codons and the formation of PSC are different.



 
FIG. 4. Correlation between stop codon usages and the GC contents of 72 bacterial chromosomes. TAA as a real stop codon decreases as a function of increasing chromosomal GC content of the organism. TGA as a real stop codon increases as a function of increasing GC content of the organism. The use of TAG as a real stop codon remains at about 20% regardless of changes in chromosomal GC content. The regression lines are second-order polynomials.

Correlations between codon usages and PSC formations. The Pareto principle of statistical analysis (12) was employed to evaluate the relationships between the codon usages and PSC frequency. We considered the GC content and the numbers of TAA, TAG, and TGA PSC to be signals for a genome. Eight bacterial genomes having the strongest signals were selected. Two genomes (those of Thermus thermophilus and D. radiodurans) were selected to represent the very-high-GC-content group. Two genomes (those of F. nucleatum and Borrelia afzelii) were selected to represent the very-low-GC-content group. Additionally, the genomes of Borrelia garinii, Chlamydia trachomatis, and Nocardia farcinica were included because they had the highest PSC counts of TAA, TAG, and TGA, respectively. The low-GC-content bacterium Staphylococcus aureus, which has relatively few PSC in its genes, was also included in this study. Since the mechanisms leading to the formation of PSC in the second and third reading frames are quite different, formations of PSC in each reading frame were analyzed separately.

The formation of a TAA PSC in the second reading frame results from the juxtaposition of two codons characterized by the sequence [X]TA-A[X][X], where [X] represents any nucleotide. Likewise, formation of a TAG PSC in the second reading frame of a gene is dictated by the two sequential [X]TA-G[X][X] codons. We postulated that, since TAA and TAG PSC were rare in the high-GC-content bacteria, the usage of [X]TA codons should be rare, because if these [X]TA codons were followed by any of the A[X][X] or G[X][X] codons, they would become TAA and TAG PSC, respectively. Conversely, since TGA was very common in the high-GC-content genome, the usage of [X]TG codons should occur frequently. The opposite should be true for the low-GC-content bacteria because these organisms contain many TAA trimers but only a few TGA trimers in their coding sequences.
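The juxtaposition rule for the second reading frame can be made concrete with a small sketch. The helper names `frame2_trimer` and `frame2_psc` are hypothetical, and the codon pairs in the usage lines are chosen only to illustrate the [X]TA-A[X][X], [X]TA-G[X][X], and [X]TG-A[X][X] cases.

```python
STOP_TRIMERS = {"TAA", "TAG", "TGA"}


def frame2_trimer(first, second):
    """Trimer read in frame 2: last two bases of one codon plus the
    first base of the next codon."""
    return (first[1:] + second[0]).upper()


def frame2_psc(first, second):
    """Return the PSC formed by two adjacent codons in frame 2, or None."""
    trimer = frame2_trimer(first, second)
    return trimer if trimer in STOP_TRIMERS else None


# The [X]TA and [X]TG codons discussed in the text.
xta_codons = [base + "TA" for base in "ACGT"]
xtg_codons = [base + "TG" for base in "ACGT"]

print(frame2_psc("TTA", "AAA"))  # [X]TA followed by A[X][X]
print(frame2_psc("GTA", "GCC"))  # [X]TA followed by G[X][X]
print(frame2_psc("CTG", "ACC"))  # [X]TG followed by A[X][X]
print(frame2_psc("CTG", "GCC"))  # [X]TG followed by G[X][X]: no PSC
```

Only an [X]TA or [X]TG codon in the first position can give rise to a second-frame PSC, which is why the usage of these two codon families is examined in Table 2.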

Table 2 shows the frequencies of [X]TA and [X]TG codons in the eight bacterial genomes. As predicted, bacterial genomes with high frequencies of PSC TAA/TAG use the [X]TA codons frequently. This group of bacteria prefers ATA for isoleucine, TTA for leucine, and GTA for valine. They use the [X]TG codons sparingly. Bacterial genomes with a high frequency of PSC TGA show just the reverse. This group of bacteria prefers CTG for leucine and GTG for valine and uses the [X]TA codons sparingly. While this result suggested a strong correlation between codon usages and the formation of PSC TAA and TGA in the second reading frame, these correlations between codon usage and [X]TA/[X]TG codon frequencies are more likely due to the GC content of the chromosome. When all the synonymous codons are considered, the preferential use of a particular set of synonymous codons is always related to the relative GC contents of the codons and the chromosomal GC content of that particular species. We found several examples of this for the low-GC-content group of bacteria. (i) There are three synonymous codons for isoleucine; the two zero-G/C codons (ATA and ATT) are the most frequently used codons, while the one-G/C codon (ATC) is used sparingly. (ii) Even though both TTA and CTA are [X]TA codons for leucine, the zero-G/C codon TTA is the preferred codon, but not the one-G/C codon CTA. (iii) There are four synonymous codons for valine, and the one-G/C codons GTA and GTT are used more frequently than the two-G/C codon GTC. (iv) The [X]TG codons, such as CTG (for leucine) and GTG (for valine), are used sparingly, likely because synonymous codons with lower GC contents are available. The opposite is true when the synonymous codons of the high-GC-content group of bacteria are considered. Bacteria with high GC contents often prefer codons rich in G and C.



 
TABLE 2. Bacterial codon usages

Formation of TAA, TAG, or TGA trimers in the third reading frame requires a juxtaposition of an [X][X]T codon followed by an AA[X], AG[X], or GA[X] codon, respectively. Careful examination of the codon usage table revealed that the preferential codon usage could not affect the formation of TAA or TGA in the third reading frame. This is because the amino acids encoded by the AA[X] or GA[X] codons each have only two synonymous codons, and the two codons of each pair differ only in the third base. For example, the two codons coding for asparagine (AAC and AAT) are both AA[X] codons. Thus, either codon would potentially become TAA. Similarly, the two codons coding for lysine (AAG and AAA) are both AA[X] codons. Likewise, the glutamate (GAA and GAG) and aspartate (GAC and GAT) codons are all GA[X] codons. Therefore, any codon coding for asparagine, lysine, glutamate, or aspartate could potentially create TAA and TGA trimers in the third reading frame, regardless of the codon preference of the individual organism.

A juxtaposition of [X][X]T-AG[X] codons would lead to the formation of TAG in the third reading frame. The AG[X] codons are used by two amino acids, namely, arginine and serine. Both of these amino acids have six synonymous codons. Again, the GC content of the chromosome seems to dictate codon usage in these bacteria. The arginine codon AGA (a one-G/C codon) is more commonly used by the low-GC-content group, but all high-GC-content bacteria favor the three-G/C codon (CGC). Similarly, there are six synonymous codons for serine: AGC, AGT, TCG, TCA, TCT, and TCC. The low-GC-content group (which has relatively more PSC TAG) generally prefers one-G/C codons (AGT, TCA, and TCT), while the high-GC group (which has less PSC TAG) generally prefers the two-G/C codons (AGC, TCG, and TCC).
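The third-frame argument can be checked mechanically. The sketch below uses the standard synonymous codon sets for the six amino acids named in the text; the helper names `frame3_psc` and `avoidable_by_synonym` are hypothetical, and `GGT` merely stands in for any [X][X]T codon.

```python
STOP_TRIMERS = {"TAA", "TAG", "TGA"}

# Synonymous codons of the amino acids encoded by AA[X], GA[X], and AG[X].
SYNONYMS = {
    "Asn": ["AAC", "AAT"],
    "Lys": ["AAA", "AAG"],
    "Asp": ["GAC", "GAT"],
    "Glu": ["GAA", "GAG"],
    "Arg": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGT"],
    "Ser": ["AGC", "AGT", "TCA", "TCC", "TCG", "TCT"],
}


def frame3_psc(first, second):
    """Return the PSC read in frame 3 (last base of one codon plus the
    first two bases of the next codon), or None."""
    trimer = (first[2] + second[:2]).upper()
    return trimer if trimer in STOP_TRIMERS else None


def avoidable_by_synonym(amino_acid, preceding="GGT"):
    """True if at least one synonymous codon forms no frame-3 PSC when it
    follows an [X][X]T codon (GGT is an arbitrary stand-in)."""
    return any(frame3_psc(preceding, codon) is None
               for codon in SYNONYMS[amino_acid])


for aa in SYNONYMS:
    print(aa, avoidable_by_synonym(aa))
```

In this sketch only arginine and serine have synonymous codons that escape the third-frame trimer, which matches the text's point that codon preference can influence TAG formation but not TAA or TGA formation in this frame.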

Some bacteria show a very strong preference for a particular type of PSC. However, there is no evidence that such a strong preference for a particular type of PSC is affected by codon usage. For example, Nocardia farcinica has the greatest percentage of PSC TGA in its genes. However, its frequency of use of the [X]TG codon CTG (60.65/1,000) is only slightly higher than the frequency of CTG codon usage in D. radiodurans (59.83/1,000). B. garinii has the highest percentage of PSC TAA in its genes, and Chlamydia trachomatis has the highest percentage of PSC TAG. However, the frequencies of [X]TA codons of these two bacteria are generally comparable to those of other low-GC-content bacteria. Staphylococcus aureus is a low-GC-content bacterium. It has significantly fewer PSC in its genes, but the usages of [X]TA and [X]TG codons by Staphylococcus aureus are not significantly less frequent than the usages of these codons by the rest of the low-GC-content group.


DISCUSSION
 
In this study, we showed that there is a great disparity of PSC in different bacterial genomes (Fig. 1 to 3). There were several important findings. (i) PSC formations were not totally correlated with the chromosomal GC content of a species: the higher the GC content in a chromosome, the fewer the TAG triplets that could be found in the genome. (ii) The PSC profile of a species might be related to the natural history of the species. (iii) The formation of a particular type of PSC was not necessarily related to the codon preferences of the organism. (iv) Organisms with higher chromosomal GC contents often contained significantly fewer PSC in their genes. (v) Unlike the real stop codon usages, the percentages of PSC TGA formed a linear function with the bacterial chromosomal GC content. (vi) Finally, metabolically versatile bacteria generally contained fewer PSC in their genes. These observations suggested that the various types of PSC in a genome were the result of certain selective pressure imposed on the bacterial genomes. Herein, we propose that both the quantity and quality of PSC may be important for bacterial genome evolution.

The Ohno theory of evolution by genome expansion is widely accepted (4, 15, 18). Genome expansion by DNA recombinations would create a repertoire of redundant genes. The extra copies of duplicated genes are free substrates for the evolution of proteins through base substitutions (6). Many details regarding the mechanism of genome evolution remain unclear. The environment of some bacteria may change rapidly, and the species would become extinct if they failed to respond in time. However, genomic change through base substitutions is a slow process (11). Fusion of the surplus redundant genes could allow some bacteria to leap forward rapidly. Gene recombination has two patterns: in-frame recombination and off-frame recombination. Obviously, in-frame concatenation of two gene fragments would allow a gene to elongate, but because the amino acids encoded by the newly formed gene are essentially identical to the original parental sequences, the topology (and thus the function) of a newly formed fusion gene product might not differ significantly from that of its parental proteins. This new gene would require a long period of continual modification to evolve into a functionally different protein. Therefore, genes formed by in-frame recombination might not change fast enough to allow adaptation to a rapidly changing environment. The most effective way to create a new protein significantly different from the parental proteins is by off-frame recombination. Because of the frameshift effect, off-frame recombination would instantaneously create a protein with a different topology (and possibly a new function). If this new protein could enhance the survival of the cell, the species would survive. Thus, the success of off-frame recombinations in creating a new functional gene would be influenced by the quantity of PSC trimers in the DNA: the fewer the PSC trimers in the gene fragment, the more likely a longer DNA fragment could be inserted into an existing gene without truncation.

In addition to the number of PSC trimers in the genes, the quality of each type of PSC trimer might also be important for the success of an off-frame gene recombination event. The efficiencies of TGA, TAA, and TAG as translational stop codons are quite different. In Salmonella enterica serovar Typhimurium, the leakiness of the TGA codon occurs at a frequency of at least 10⁻² to 10⁻³ (21), while TAG and TAA are more error-proof, at about 7 × 10⁻³ to 1.1 × 10⁻⁴ (5) and 9 × 10⁻⁴ to 1 × 10⁻⁵ (22), respectively. The mechanism of termination at the TGA site is complex. In many cases, TGA is the site of a programmed frameshift. A programmed frameshift allows the ribosome to pass over the UGA stop signal and continue reading the mRNA. Programmed frameshifting is a regulatory mechanism for controlling the gene expression of many proteins (13, 19). Furthermore, protein termination at the bacterial UGA site is not straightforward. Successful termination at the UGA site is influenced not only by the concentration of the release factor RF2 (25) but also by the ribonucleotide sequences before and after the UGA site on the mRNA (26). In fact, the expression of the bacterial RF2 gene is itself governed by a programmed frameshift (25). TGA also serves as the code for selenocysteine (16), the 21st amino acid, in a wide range of organisms but codes for tryptophan in other organisms and organelles (14). Interestingly, mutations at the TGA site in various genes are very rare (21). All this evidence suggests that TGA might not serve solely as a terminator for protein translation. Perhaps, in the event of an off-frame gene fusion, the presence of these leaky TGAs might allow some off-frame reading, at least temporarily, until better DNA repair (such as base substitution) could occur. Thus, the bias toward a leaky TGA might provide immediate and provisional relief to a bacterium facing a rapidly changing environment. The use of TGA as a "switching point" might be analogous to the use of the ATA codon in many proteins. For example, Barrai et al. (2, 3) showed that ATA could stimulate polymerase attachment inside coding sequences. In fibrinogen, the relative abundance of the ATA codons might permit the production of pieces of the fibrinogen molecule still functional in a stoppage network and benefit the organism.

The strategy of creating new genes by off-frame recombination might benefit only some organisms and might be harmful to others, depending on the ecophysiological status of the individual organism. The metabolic versatility of a bacterium is often correlated with its environment (9). This might explain why many bacteria, such as Deinococcus, Pseudomonas, Salinibacter, Azoarcus, Klebsiella spp., and the skin-associated Staphylococcus, all known for their metabolic versatility, contain very few PSC trimers in their genes (Fig. 3). Symbionts, such as Fusobacterium, Rickettsia, and Borrelia, often contain large numbers of PSC trimers in their genes. Symbionts live in a stable environment. A sudden change by frameshift recombination would create an undesirable product and disrupt their long-established relationship with the host, leading to the collapse of the symbiotic relationship. This might explain why their genes contain many more PSC serving as check marks and why they use the more error-proof TAA and TAG as PSC. We should emphasize that, although the number of PSC trimers in the genes and the GC content of a chromosome are generally inversely related, and although the stop codons are intrinsically AT rich, the notion that AT-rich genomes should contain more PSC is not always true. For example, the genome of Staphylococcus aureus is AT rich (GC content = 32.8%), but the number of PSC in its genes is also quite low (fewer than 25 per gene). This versatile pathogen, which is well known for its resistance to antibiotics, is commonly found on the skin. Unlike the environment of intracellular parasites, the environment of the skin changes rapidly. Having fewer PSC in its genome might enhance the survival of S. aureus.
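The two quantities compared in this paragraph, the number of PSC trimers per gene and the GC content, can be illustrated with a minimal Python sketch. It assumes that a PSC is any TGA, TAA, or TAG trimer read in the second or third reading frame of a coding sequence, as defined in the abstract. The function names and the toy sequence are hypothetical and are not taken from the authors' software or from any real genome.

```python
# Illustrative sketch only: names and the toy sequence are assumptions.
STOPS = ("TGA", "TAA", "TAG")


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def count_psc(cds):
    """Count stop trimers in reading frames 2 and 3 of a coding sequence.

    Frame 1 is assumed to start at index 0, so frames 2 and 3 start at
    offsets 1 and 2, respectively.
    """
    cds = cds.upper()
    counts = {stop: 0 for stop in STOPS}
    for offset in (1, 2):
        # Step through the shifted frame one trimer at a time.
        for i in range(offset, len(cds) - 2, 3):
            trimer = cds[i:i + 3]
            if trimer in counts:
                counts[trimer] += 1
    return counts


if __name__ == "__main__":
    toy_gene = "ATGATAGCTGAATAA"  # made-up sequence for illustration
    print(count_psc(toy_gene))
    print(round(gc_content(toy_gene), 3))
```

Applying such a count to every annotated gene of a genome and averaging would give a per-gene PSC figure of the kind quoted above for S. aureus, although the authors' exact counting procedure may differ.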

The inverse relationship between the GC content and the number of TAG codons remains unexplained. Perhaps bacteria use a yet-unknown mechanism to remove TAG sequences from their genes. Systematically replacing AG[X] and [X]TA codons with other codons in the genes would eventually reduce the number of TAG triplets in the genome. Therefore, the higher the GC content, the fewer TAG codons would be found in the genes.
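The link between AG[X] and [X]TA codons and off-frame TAG triplets can be made concrete with a short Python sketch. It assumes that an off-frame TAG arises in frame 2 when an in-frame [X]TA codon is followed by a codon starting with G, and in frame 3 when a codon ending in T is followed by an AG[X] codon. The function name and the example sequences are hypothetical.

```python
# Illustrative sketch only: the function name and inputs are assumptions.
def offframe_tag_sources(cds):
    """List in-frame codon pairs that create a TAG in frame 2 or frame 3."""
    cds = cds.upper()
    # Split the sequence into complete in-frame (frame 1) codons.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    hits = []
    for left, right in zip(codons, codons[1:]):
        # Frame 2: last two bases of one codon plus first base of the next.
        if left[1:] == "TA" and right[0] == "G":
            hits.append((left, right, "frame 2"))
        # Frame 3: last base of one codon plus first two bases of the next.
        if left[2] == "T" and right[:2] == "AG":
            hits.append((left, right, "frame 3"))
    return hits


if __name__ == "__main__":
    print(offframe_tag_sources("ATGATAGCTGAATAA"))
    print(offframe_tag_sources("GCTAGC"))
```

In this toy model, swapping the [X]TA or AG[X] codon of a reported pair for a synonymous codon that lacks those bases removes the off-frame TAG without changing the encoded protein, which is the kind of replacement the paragraph speculates about.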


ACKNOWLEDGMENTS
 
Part of this research was supported by the "Aim for the Top University Plan" grant from National Sun Yat-sen University to T.-Y. Wong and by research grant NSC95-2621-B110-004 from the National Science Council, Taiwan, Republic of China, to J.-K. Liu.

We kindly acknowledge K. Gartner, M. Beck, and S. Schwartzback for their critical reviews and discussions.


FOOTNOTES
 
* Corresponding author. Mailing address: LS 523, Biology Department, The University of Memphis, Memphis, TN 38152. Phone: (901) 678-4462. Fax: (901) 678-4457. E-mail: tywong@memphis.edu

▿ Published ahead of print on 15 August 2008.


REFERENCES
 
1. Alff-Steinberger, C., and R. Epstein. 1994. Codon preference in the terminal region of E. coli genes and evolution of stop codon usage. J. Theor. Biol. 168:461-463.
2. Barrai, I., C. Scapoli, and C. Nesti. 1994. Possible identity of transcription and translation signals in early vital systems. J. Theor. Biol. 169:289-294.
3. Barrai, I., C. Scapoli, C. Nesti, G. Poli, R. Gambari, and M. Beretta. 1994. Codon usage and evolutionary rates of proteins. J. Theor. Biol. 166:331-337.
4. Beiko, R. G., and R. L. Charlebois. 2007. A simulation test bed for hypotheses of genome evolution. Bioinformatics 23:825-831.
5. Bossi, L. 1983. Context effects: translation of UAG codon by suppressor tRNA is affected by the sequence following UAG in the message. J. Mol. Biol. 164:73-87.
6. Chothia, C., J. Gough, C. Vogel, and S. A. Teichmann. 2003. Evolution of the protein repertoire. Science 300:1701-1703.
7. Clarke, C. H. 1982. Influences of bacterial DNA base-ratios and amino acid composition of the gene product on the consequences of frameshift mutations. J. Theor. Biol. 98:661-674.
8. Clarke, C. H., and P. G. Miller. 1982. Consequences of frameshift mutations in the trp A, trp B and lac I genes of Escherichia coli and in Salmonella typhimurium. J. Theor. Biol. 96:367-379.
9. Dobrindt, U., and J. Hacker. 2001. Whole genome plasticity in pathogenic bacteria. Curr. Opin. Microbiol. 4:550-557.
10. Everett, K. D., R. M. Bush, and A. A. Andersen. 1999. Emended description of the order Chlamydiales, proposal of Parachlamydiaceae fam. nov. and Simkaniaceae fam. nov., each containing one monotypic genus, revised taxonomy of the family Chlamydiaceae, including a new genus and five new species, and standards for the identification of organisms. Int. J. Syst. Bacteriol. 49:415-440.
11. Gaut, B. S., B. R. Morton, B. C. McCaig, and M. T. Clegg. 1996. Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl. Acad. Sci. USA 93:10274-10279.
12. Haaland, P. D. 1989. Experimental design in biotechnology. Marcel Dekker Inc., New York, NY.
13. Harger, J. W., and J. D. Dinman. 2004. Evidence against a direct role for the Upf proteins in frameshifting or nonsense codon readthrough. RNA 10:1721-1729.
14. Jukes, T. H., and S. Osawa. 1990. The genetic code in mitochondria and chloroplasts. Experientia 46:1117-1126.
15. Kim, J., J. Nietfeldt, J. Ju, J. Wise, N. Fegan, P. Desmarchelier, and A. K. Benson. 2001. Ancestral divergence, genome diversification, and phylogeographic variation in subpopulations of sorbitol-negative, beta-glucuronidase-negative enterohemorrhagic Escherichia coli O157. J. Bacteriol. 183:6885-6897.
16. Kryukov, G. V., and V. N. Gladyshev. 2004. The prokaryotic selenoproteome. EMBO Rep. 5:538-543.
17. Lewin, B. 2004. Protein synthesis, p. 152. In J. Carlson (ed.), Gene VIII. Pearson Prentice Hall, Upper Saddle River, NJ.
18. Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Berlin, Germany.
19. Pande, S., A. Vimaladithan, H. Zhao, and P. Farabaugh. 1995. Pulling the ribosome out of frame by +1 at a programmed frameshift site by cognate binding of aminoacyl-tRNA. Mol. Cell. Biol. 15:298-304.
20. Rappé, M. S., S. A. Connon, K. L. Vergin, and S. J. Giovannoni. 2002. Cultivation of the ubiquitous SAR11 marine bacterioplankton clade. Nature 418:630-633.
21. Roth, J. R. 1970. UGA nonsense mutations in Salmonella typhimurium. J. Bacteriol. 102:467-475.
22. Rydén, S. M., and L. A. Isaksson. 1984. A temperature-sensitive mutant of Escherichia coli that shows enhanced misreading of UAG/A and increased efficiency for some tRNA nonsense suppressors. Mol. Gen. Genet. 193:38-45.
23. Schmitz, J., M. Ohme, and H. Zischler. 2000. The complete mitochondrial genome of Tupaia belangeri and the phylogenetic affiliation of scandentia to other eutherian orders. Mol. Biol. Evol. 17:1334-1343.
24. Stingl, U., R. A. Desiderio, J. C. Cho, K. L. Vergin, and S. J. Giovannoni. 2007. The SAR92 clade: an abundant coastal clade of culturable marine bacteria possessing proteorhodopsin. Appl. Environ. Microbiol. 73:2290-2296.
25. Tate, W. P., J. B. Mansell, S. A. Mannering, J. H. Irvine, L. L. Major, and D. N. Wilson. 1999. UGA: a dual signal for ‘stop’ and for recoding in protein synthesis. Biochemistry (Moscow) 64:1342-1353.
26. Uno, M., K. Ito, and Y. Nakamura. 2002. Polypeptide release at sense and noncognate stop codons by localized charge-exchange alterations in translational release factors. Proc. Natl. Acad. Sci. USA 99:1819-1824.
27. Vergin, K. L., H. J. Tripp, L. J. Wilhelm, D. R. Denver, M. S. Rappe, and S. J. Giovannoni. 2007. High intraspecific recombination rate in a native population of Candidatus pelagibacter ubique (SAR11). Environ. Microbiol. 9:2430-2440.

