Journal of Bacteriology, October 2008, p. 6718-6725, Vol. 190, No. 20
0021-9193/08/$08.00+0 doi:10.1128/JB.00682-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Department of Biology, Bioinformatics Program, The University of Memphis, Memphis, Tennessee,1 Gigamon Systems, Milpitas, California,2 Department of Planning and Research, National Museum of Marine Biology and Aquarium, Pingtung, Taiwan, Republic of China,3 Department of Biological Sciences, National Sun Yat-sen University, Kaohsiung, Taiwan, Republic of China4
Received 14 May 2008/ Accepted 4 August 2008
The occurrences of TAA, TAG, and TGA trimers within the protein reading frames in the FNA file were counted by a program written in C (available upon request) and designated genic counts. Similarly, the occurrences of TGA, TAA, and TAG trimers in the forward and reverse strands of the chromosome were counted and designated total counts. The numbers of TGA, TAA, and TAG trimers in the nongenic fraction were calculated as the differences between the total counts and the corresponding genic counts. The resulting counts, together with the GenBank accession numbers and GC contents of the organisms, were transferred to an Excel spreadsheet for subsequent calculations. The types and numbers of real stop codons (the last three nucleotides of an ORF) in each genome were also recorded.
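The counting scheme described above can be sketched in a few lines of Python. This is not the authors' C program, which is not reproduced here. It is a minimal illustration that assumes trimers are counted at every overlapping position and that gene sequences are supplied on their coding strand. The function names (`count_stop_trimers`, `genic_total_nongenic`) are hypothetical.

```python
STOP_TRIMERS = ("TAA", "TAG", "TGA")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_stop_trimers(seq):
    """Count TAA, TAG, and TGA at every overlapping position of seq."""
    seq = seq.upper()
    counts = dict.fromkeys(STOP_TRIMERS, 0)
    for i in range(len(seq) - 2):
        trimer = seq[i:i + 3]
        if trimer in counts:
            counts[trimer] += 1
    return counts


def genic_total_nongenic(chromosome, gene_seqs):
    """Return (genic, total, nongenic) trimer counts.

    genic    : summed over the supplied gene sequences (assumed coding strand)
    total    : forward strand plus reverse-complement strand of the chromosome
    nongenic : total minus genic, per trimer
    """
    genic = dict.fromkeys(STOP_TRIMERS, 0)
    for gene in gene_seqs:
        for trimer, n in count_stop_trimers(gene).items():
            genic[trimer] += n
    forward = count_stop_trimers(chromosome)
    reverse = count_stop_trimers(reverse_complement(chromosome))
    total = {t: forward[t] + reverse[t] for t in STOP_TRIMERS}
    nongenic = {t: total[t] - genic[t] for t in STOP_TRIMERS}
    return genic, total, nongenic
```

Calling `genic_total_nongenic(chromosome, genes)` with a chromosome string and a list of gene strings returns three dictionaries keyed by trimer, which correspond to the genic, total, and nongenic counts defined in the text.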
We classified codons into four groups, based on their GC nucleotide contents. Codons without any G or C (such as AAA) were designated GC-free codons (zero-G/C codons). Codons with one G or C (such as GAA) were designated low-GC codons (one-G/C codons). Codons with two G or C nucleotides (such as TGG) were designated moderate-GC codons (two-G/C codons), while codons composed entirely of G and/or C (such as GCC) were designated high-GC codons (three-G/C codons). SigmaPlot 9.0 (Systat Software) was used for statistical calculations and plotting.
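The four-group classification amounts to counting the G and C nucleotides in each codon. A minimal Python sketch follows. The function names and the label strings are illustrative choices, not part of the original analysis.

```python
GC_CLASS_LABELS = {
    0: "GC-free",      # zero-G/C codons, e.g. AAA
    1: "low-GC",       # one-G/C codons, e.g. GAA
    2: "moderate-GC",  # two-G/C codons, e.g. TGG
    3: "high-GC",      # three-G/C codons, e.g. GCC
}


def gc_count(codon):
    """Number of G or C nucleotides in a three-letter codon."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in "ACGT" for base in codon):
        raise ValueError(f"not a DNA codon: {codon!r}")
    return sum(base in "GC" for base in codon)


def gc_class(codon):
    """Map a codon to one of the four GC groups defined in the text."""
    return GC_CLASS_LABELS[gc_count(codon)]
```

For the four example codons given in the text, `gc_class` returns the GC-free, low-GC, moderate-GC, and high-GC labels for AAA, GAA, TGG, and GCC, respectively.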
TABLE 1. Occurrences of TAA, TAG, and TGA trimers in Deinococcus radiodurans
Distributions of TGA, TAA, and TAG trimers in other bacterial genomes. To assess the relative abundances of the three PSC more broadly, and to evaluate the effect of chromosomal GC content on the formation of these trimers, we analyzed 72 bacterial genomes with widely differing chromosomal GC contents for their frequencies of TGA, TAA, and TAG as PSC. Figure 1 shows a ternary plot displaying the relative percentages of TGA, TAA, and TAG as PSC in each genome. If the use of the three types of PSC were determined by random events, one would expect the data points to be distributed near the center of the triangular graph, with each type of PSC equal to roughly one-third (33%) of the total PSC. We found instead that the occurrence of each type of PSC is far from random. The frequency of TGA is the most variable and has the widest span across these species. In Nocardia farcinica, 93% of the PSC are TGA triplets, higher than in any other organism investigated. In contrast, only 25% of all PSC are in the form of TGA in the genome of Fusobacterium nucleatum. The occurrence of TAA as a PSC is more restricted. The genome of Borrelia garinii has the highest percentage of TAA trimers (about 51%), while as few as 4% of all PSC trimers are TAA in Mycobacterium avium subsp. paratuberculosis. The use of TAG as a PSC is the least frequent and most restricted, ranging from 32% in Chlamydia trachomatis to 4% in Silicibacter pomeroyi.
FIG. 1. Ternary plot showing the percent distributions of PSC TGA, TAA, and TAG in 72 bacterial genomes. Also shown are their corresponding real stop codons. The forward-tilting dashed grid line (i.e., from left to right) represents a genome's corresponding percentage of TAA. The backward-tilting solid grid line (i.e., from right to left) represents its corresponding percentage of TGA. The horizontal dotted grid line represents the corresponding percentage of TAG. The Rickettsiales are circled as clade A, the Chlamydia/Chlamydophila group is circled as clade B, and the enteric organisms are circled as clade C.
The PSC profiles and the sizes of the genomes are not related. For example, the genome of the pathogenic E. coli O157:H7 EDL933 (5.52 Mb, with 5,679 ORFs) is considerably larger than the genome of the laboratory strain E. coli K-12 (4.63 Mb, with 4,289 ORFs). However, their corresponding PSC ratios (31.94:10.39:57.67 and 31.50:10.62:57.87) are almost identical. Thus, evolutionary history seems to be a major factor affecting the PSC profile of an individual organism.
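To make "almost identical" concrete, the two E. coli profiles can be compared component by component. The sketch below is illustrative only. The helper names are hypothetical, and the largest absolute difference is used as one simple way to compare profiles, not as the measure the authors applied. The components are kept in the order in which the ratios are printed in the text.

```python
def to_percent_profile(counts):
    """Convert raw counts of the three PSC types to percentages summing to 100."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one count must be nonzero")
    return tuple(100.0 * c / total for c in counts)


def max_profile_difference(profile_a, profile_b):
    """Largest absolute difference, in percentage points, between two profiles."""
    return max(abs(a - b) for a, b in zip(profile_a, profile_b))


# PSC ratios quoted in the text, in the order given there.
ecoli_o157 = (31.94, 10.39, 57.67)
ecoli_k12 = (31.50, 10.62, 57.87)

difference = max_profile_difference(ecoli_o157, ecoli_k12)
```

With the quoted values, no component of the two profiles differs by more than about 0.44 percentage points, even though the genomes differ in size by roughly 0.9 Mb.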
The results shown in Fig. 1 suggested that our initial observation of the high frequency of TGA trimers in the Deinococcus genome was not an isolated incident. In fact, many bacterial genes have TGA as 100% of their PSC. In one extreme case, the PSC trimers in 3,561 genes (out of 5,686 genes) in the Nocardia farcinica genome are all TGA.
Bacterial use of TGA, TAA, and TAG as real stop codons is more random (Fig. 1). However, the frequency of TAG is still noticeably less than the frequency of TAA or TGA stop codons. The variations in usage of TGA, TAA, and TAG as stop codons among these species range from 8% to 81%, 9% to 76%, and 8% to 50%, respectively. Unlike the linear relationships of PSC, the real stop codon usages among these bacteria appear to cluster. The largest cluster shows a high-TAA preference, centered on 23% TGA, 20% TAG, and 58% TAA. Another cluster shows a high-TGA preference, centered on 68% TGA, 15% TAG, and 16% TAA. There are other bacteria with elevated TAG. However, their populations are too small to be analyzed.
Relationship between PSC and chromosomal GC content. The G nucleotide present in TGA and TAG triplets suggests that their frequencies should increase relative to that of the TAA triplet as the GC content of the chromosome increases. As predicted, the proportion of PSC represented by TGA (Fig. 2) increases linearly with the GC content of the bacterial chromosome, while that represented by TAA drops (Fig. 2). However, much to our surprise, we found that the percentage of TAG in the genome, like that of TAA, was inversely related to the GC content of the chromosome (Fig. 2). Furthermore, the percentages of TAG and TAA in a genome follow the equations %TAG = (98 − %TGA) × 0.34 and %TAA = (98 − %TGA) × 0.66.
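As a worked illustration of these two empirical equations, the following Python sketch computes the predicted TAG and TAA percentages from a given TGA percentage. The constants 98, 0.34, and 0.66 are taken from the text. The function name and the example input of 58% TGA are hypothetical.

```python
def predicted_tag_taa(pct_tga):
    """Apply the empirical relations from the text.

    %TAG = (98 - %TGA) * 0.34
    %TAA = (98 - %TGA) * 0.66
    """
    remainder = 98.0 - pct_tga
    return {"TAG": remainder * 0.34, "TAA": remainder * 0.66}


# Hypothetical genome in which 58% of PSC trimers are TGA.
example = predicted_tag_taa(58.0)
```

For an input of 58% TGA the equations give about 13.6% TAG and 26.4% TAA. Because the fitted constant is 98 rather than 100, the three predicted percentages sum to 98, so the relations are best read as an approximate fit, not an exact partition.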
FIG. 2. Correlation between chromosomal GC content and the percentages of TGA, TAA, and TAG premature stop codon trimers in the genomes of 72 bacterial species. The percentage of TGA in the genomes was positively correlated with the chromosomal GC content, while the percentages of TAA and TAG in the genomes were inversely related to the chromosomal GC content. Lines represent the linear regressions of each data set.
Average numbers of PSC trimers per gene in different bacteria. The numbers of PSC trimers in the genes, even within the same organism, ranged from zero to more than 150. We found that bacteria with higher GC content tend to contain significantly fewer PSC in their genes than bacteria with lower GC content. Data on the average numbers of PSC per gene of the 72 bacterial species, stratified according to their GC content, are presented in Fig. 3 (the inset shows the pairwise comparison of GC content and the average number of PSC per gene). Among these organisms, Fusobacterium nucleatum has the lowest GC content, 27.1%, while Thermus thermophilus has the highest GC content, 69.4%. Although the GC contents of these two organisms differ by only about 42 percentage points, the number of PSC per gene in F. nucleatum (76.4 PSC/gene) is more than sevenfold higher than that in T. thermophilus (10.4 PSC/gene). Our results further revealed that, with few exceptions (such as that concerning Staphylococcus aureus; see below), the average number of PSC per gene in a genome is inversely related to the GC content of that genome. This inverse relationship between PSC number and GC content holds true for all three types of PSC (data not shown). Interestingly, most metabolically versatile bacteria have fewer PSC in their genes. For example, the genes of the Deinococcus, Pseudomonas, Salinibacter, Azoarcus, and Klebsiella spp., as well as those of the low-GC-content bacterium Staphylococcus, all contain relatively few PSC (fewer than 25 PSC/gene). On the other hand, bacteria that have a very high number of PSC in their genes often are symbionts. For example, Fusobacterium is associated with human oral cavities, Rickettsia is an obligate parasite of ticks and humans, and Borrelia forms a symbiosis with ticks. There is no indication that the genes of those symbionts are proportionally longer than the genes of the free-living bacteria. Therefore, gene length cannot be used to explain the disparity between PSC numbers of the high- and low-GC-content groups.
FIG. 3. Relationships between the average number of PSC per gene in a genome and chromosomal GC content in 72 bacterial genomes. These bacteria are placed according to the ranking of their chromosomal GC contents, from low to high. The inset shows the pairwise comparison of GC content versus PSC per gene and the linear regression line.
FIG. 4. Correlation between stop codon usages and the GC contents of 72 bacterial chromosomes. The use of TAA as a real stop codon decreases as a function of increasing chromosomal GC content of the organism. The use of TGA as a real stop codon increases as a function of increasing GC content of the organism. The use of TAG as a real stop codon remains at about 20% regardless of changes in chromosomal GC content. The regression lines are second-order polynomials.
The formation of a TAA PSC in the second reading frame results from the juxtaposition of two codons characterized by the sequence [X]TA-A[X][X], where [X] represents any nucleotide. Likewise, formation of a TAG PSC in the second reading frame of a gene is dictated by the two sequential [X]TA-G[X][X] codons. We postulated that, since TAA and TAG PSC were rare in the high-GC-content bacteria, the usage of [X]TA codons should be rare, because if these [X]TA codons were followed by any of the A[X][X] or G[X][X] codons, they would become TAA and TAG PSC, respectively. Conversely, since TGA was very common in the high-GC-content genome, the usage of [X]TG codons should occur frequently. The opposite should be true for the low-GC-content bacteria because these organisms contain many TAA trimers but only a few TGA trimers in their coding sequences.
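The juxtaposition argument can be checked directly by scanning a coding sequence in its two shifted frames. The Python sketch below is a minimal illustration, not the authors' program. It assumes the input starts at the first base of the annotated frame (called frame 1 here), so that the second and third reading frames begin at offsets 1 and 2. The function name `off_frame_stops` is hypothetical.

```python
STOP_TRIMERS = {"TAA", "TAG", "TGA"}


def off_frame_stops(cds):
    """Find stop trimers in the second and third reading frames of a CDS.

    Returns a dict mapping frame number (2 or 3) to a list of
    (zero-based position, trimer) tuples.
    """
    cds = cds.upper()
    hits = {2: [], 3: []}
    for frame, offset in ((2, 1), (3, 2)):
        # Step codon by codon through the shifted frame.
        for i in range(offset, len(cds) - 2, 3):
            trimer = cds[i:i + 3]
            if trimer in STOP_TRIMERS:
                hits[frame].append((i, trimer))
    return hits


# An [X]TA codon followed by an A[X][X] codon: GTA-AAC.
second_frame_taa = off_frame_stops("GTAAAC")
# An [X]TG codon followed by an A[X][X] codon: CTG-ACC.
second_frame_tga = off_frame_stops("CTGACC")
```

In the first example the adjacent codons GTA and AAC produce a TAA in the second frame, and in the second example CTG and ACC produce a TGA, which matches the [X]TA-A[X][X] and [X]TG-A[X][X] patterns described in the text.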
Table 2 shows the frequencies of [X]TA and [X]TG codons in the eight bacterial genomes. As predicted, bacterial genomes with high frequencies of PSC TAA/TAG use the [X]TA codons frequently. This group of bacteria prefers ATA for isoleucine, TTA for leucine, and GTA for valine and uses the [X]TG codons sparingly. Bacterial genomes with a high frequency of PSC TGA show the reverse pattern. This group of bacteria prefers CTG for leucine and GTG for valine and uses the [X]TA codons sparingly. While this result suggested a strong correlation between codon usage and the formation of PSC TAA and TGA in the second reading frame, the [X]TA and [X]TG codon frequencies are more likely determined by the GC content of the chromosome. When all the synonymous codons are considered, the preferential use of a particular set of synonymous codons is always related to the relative GC contents of the codons and the chromosomal GC content of that particular species. We found several examples of this for the low-GC-content group of bacteria. (i) There are three synonymous codons for isoleucine; the two zero-G/C codons (ATA and ATT) are the most frequently used codons, while the one-G/C codon (ATC) is used sparingly. (ii) Even though both TTA and CTA are [X]TA codons for leucine, the zero-G/C codon TTA is the preferred codon, not the one-G/C codon CTA. (iii) There are four synonymous codons for valine, and the one-G/C codons GTA and GTT are used more frequently than the two-G/C codon GTC. (iv) The [X]TG codons, such as CTG (for leucine) and GTG (for valine), are used sparingly, likely because synonymous codons with lower GC contents are available. The opposite is true when the synonymous codons of the high-GC-content group of bacteria are considered. Bacteria with high GC contents often prefer codons rich in G and C.
TABLE 2. Bacterial codon usages
A juxtaposition of [X][X]T-AG[X] codons would lead to the formation of TAG in the third reading frame. The AG[X] codons are used by two amino acids, namely, arginine and serine. Both of these amino acids have six synonymous codons. Again, the GC content of the chromosome seems to dictate codon usage in these bacteria. The arginine codon AGA (a one-G/C codon) is more commonly used by the low-GC-content group, but all high-GC-content bacteria favor the three-G/C codon (CGC). Similarly, there are six synonymous codons for serine: AGC, AGT, TCG, TCA, TCT, and TCC. The low-GC-content group (which has relatively more PSC TAG) generally prefers one-G/C codons (AGT, TCA, and TCT), while the high-GC-content group (which has fewer PSC TAG) generally prefers the two-G/C codons (AGC, TCG, and TCC).
Some bacteria show a very strong preference for a particular type of PSC. However, there is no evidence that such a strong preference is affected by codon usage. For example, Nocardia farcinica has the greatest percentage of PSC TGA in its genes. However, its frequency of use of the [X]TG codon CTG (60.65/1,000) is only slightly higher than the frequency of CTG codon usage in D. radiodurans (59.83/1,000). B. garinii has the highest percentage of PSC TAA in its genes, and Chlamydia trachomatis has the highest percentage of PSC TAG. However, the frequencies of [X]TA codons in these two bacteria are generally comparable to those of other low-GC-content bacteria. Staphylococcus aureus is a low-GC-content bacterium. It has significantly fewer PSC in its genes, but the usages of [X]TA and [X]TG codons by Staphylococcus aureus are not significantly less frequent than the usages of these codons by the rest of the low-GC-content group.
The Ohno theory of evolution by genome expansion is widely accepted (4, 15, 18). Genome expansion by DNA recombination would create a repertoire of redundant genes. The extra copies of duplicated genes are free substrates for the evolution of proteins through base substitutions (6). Many details regarding the mechanism of genome evolution remain unclear. The environment of some bacteria may change rapidly, and the species would become extinct if they failed to respond in time. However, genomic change through base substitutions is a slow process (11). Fusion of the surplus redundant genes could allow some bacteria to leap forward rapidly. Gene recombination has two patterns: in-frame recombination and off-frame recombination. Obviously, in-frame concatenation of two gene fragments would allow a gene to elongate, but because the amino acids encoded by the newly formed gene are essentially identical to the original parental sequences, the topology (and thus the function) of a newly formed fusion gene product might not differ significantly from that of its parental proteins. This new gene would require a long period of continual modification to evolve into a functionally different protein. Therefore, genes formed by in-frame recombination might not change fast enough to allow adaptation to a rapidly changing environment. The most effective way to create a new protein significantly different from the parental proteins is by off-frame recombination. Because of the frameshift effect, off-frame recombination would instantaneously create a protein with a different topology (and possibly a new function). If this new protein could enhance the survival of the cell, the species would survive. Thus, the success of off-frame recombination in creating a new functional gene would be influenced by the quantity of PSC trimers in the DNA: the fewer the PSC trimers in the gene fragment, the more likely it is that a longer DNA fragment could be inserted into an existing gene without truncation.
In addition to the number of PSC trimers in the genes, the quality of each type of PSC trimer might also be important for the success of an off-frame gene recombination event. The efficiencies of TGA, TAA, and TAG as translational stop codons are quite different. In Salmonella enterica serovar Typhimurium, the leakiness of the TGA codon occurs at a frequency of at least 10⁻² to 10⁻³ (21), while TAG and TAA are more error-proof, at about 7 × 10⁻³ to 1.1 × 10⁻⁴ (5) and 9 × 10⁻⁴ to 1 × 10⁻⁵ (22), respectively. The mechanism of termination at the TGA site is complex. In many cases, TGA is the site of a programmed frameshift. A programmed frameshift allows the ribosome to pass over the UGA stop signal and continue reading the mRNA. Programmed frameshifting is a regulatory mechanism for controlling the expression of many proteins (13, 19). Furthermore, protein termination at the bacterial UGA site is not straightforward. Successful termination at the UGA site is influenced not only by the concentration of the release factor RF2 (25) but also by the ribonucleotide sequences before and after the UGA site on the mRNA (26). In fact, the expression of the bacterial RF2 gene is itself governed by a programmed frameshift (25). TGA also serves as the codon for selenocysteine (16), the 21st amino acid, in a wide range of organisms and for tryptophan in other organisms and organelles (14). Interestingly, mutations at the TGA site in various genes are very rare (21). All of this evidence suggests that TGA might not serve solely as a terminator for protein translation. Perhaps, in the event of an off-frame gene fusion, the presence of these leaky TGAs might allow some off-frame reading, at least temporarily, until better DNA repair (such as base substitution) could occur. Thus, the bias toward a leaky TGA might provide immediate and provisional relief to a bacterium facing a rapidly changing environment. The use of TGA as a "switching point" might be analogous to the use of the ATA codon in many proteins. For example, Barrai et al. (2, 3) showed that ATA could stimulate polymerase attachment inside coding sequences. In fibrinogen, the relative abundance of the ATA codons might permit the production of pieces of the fibrinogen molecule still functional in a stoppage network and benefit the organism.
The strategy of creating new genes by off-frame recombination might benefit only some organisms and might be harmful to others, depending on the ecophysiological status of the individual organism. The metabolic versatility of a bacterium is often correlated with its environment (9). This might explain why many bacteria, such as Deinococcus, Pseudomonas, Salinibacter, Azoarcus, Klebsiella spp., and the skin-associated Staphylococcus, all known for their metabolic versatility, contain very few PSC trimers in their genes (Fig. 3). Symbionts, such as Fusobacterium, Rickettsia, and Borrelia, often contain a large number of PSC trimers in their genes. Symbionts live in a stable environment. A sudden change by frameshift recombination would create an undesirable product and disrupt their long-established relationship with the host, leading to the collapse of the symbiotic relationship. This might explain why their genes contain many more PSC serving as check marks and why the more error-proof TAA and TAG are used as PSC in their genes. We should emphasize that, although the number of PSC trimers in the gene and the GC content of a chromosome are generally inversely related, and although the stop codons are intrinsically AT rich, the notion that AT-rich genomes should contain more PSC is not always true. For example, the genome of Staphylococcus aureus is AT rich (GC content = 32.8%), but the number of PSC in its genes is quite low (fewer than 25 per gene). This versatile pathogen, which is well known for its resistance to antibiotics, is commonly found on the skin. Unlike the environment of intracellular parasites, the environment of the skin changes rapidly. Having fewer PSC in its genome might enhance the survival of S. aureus.
The inverse relationship between the GC content and the number of TAG codons remains unexplained. Perhaps bacteria use a yet-unknown mechanism to remove TAG sequences from their genes. Systematically replacing AG[X] and [X]TA codons with other codons in the gene would eventually reduce the number of TAG triplets in the genome. Therefore, the higher the GC content, the fewer the TAG codons found in the genes.
We kindly acknowledge K. Gartner, M. Beck, and S. Schwartzback for their critical reviews and discussions.
Published ahead of print on 15 August 2008.