Journal of Bacteriology, October 2004, p. 6536-6543, Vol. 186, No. 19
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.19.6536-6543.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
School of Molecular and Microbial Biosciences, University of Sydney, Sydney, Australia,1 Centre for Functional Genomic Research, TEDA College, Nankai University, TEDA, Tianjin, People's Republic of China2
Received 17 March 2004/ Accepted 1 July 2004
The genes involved in O-antigen biosynthesis are generally found on the chromosome as an O-antigen gene cluster, and in E. coli, S. enterica, and C. freundii these gene clusters generally lie between the galF and gnd genes (15). We, among others, have undertaken an extensive analysis of many O-antigen gene clusters to determine the genetic basis of O-antigen evolution. More than 20 E. coli and 7 S. enterica O-antigen gene clusters have been sequenced (17), but until now no sequence data for a C. freundii O-antigen gene cluster have been reported. Overall, O-antigen gene clusters have G+C contents lower than the genome average (usually less than 40% in E. coli and S. enterica, compared to the usual 51%). This atypical G+C content seen in so many O-antigen gene clusters provides strong genetic evidence that the O-antigen gene clusters were acquired, by interspecies lateral transfer, from different bacterial species, in each case a species whose genome has an average G+C content lower than that of these gram-negative bacteria. It is presumed that the extensive diversity of O antigens is a result of this lateral gene transfer. Indeed, evidence for recombination events involving O-antigen gene clusters within and between different species has been found (5, 19, 21, 22, 28).
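The G+C comparison used in this argument is a simple base count. The following is a minimal sketch, assuming plain DNA strings as input; the function name `gc_content` and the two toy fragments are hypothetical and are not taken from the sequenced clusters.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical toy fragments for illustration only.
low_gc_fragment = "ATTATAGCATTTAAATGCAT"
high_gc_fragment = "GCCGGCATCGCGGACCTGCA"
print(f"low-G+C fragment:  {gc_content(low_gc_fragment):.1f}%")
print(f"high-G+C fragment: {gc_content(high_gc_fragment):.1f}%")
```

Applied gene by gene, a value well below the genome average of about 51% would be the kind of signal the text interprets as evidence of lateral acquisition.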
E. coli, S. enterica, and C. freundii are closely related. However, a combination of serological and structural analyses of the corresponding O antigens has revealed that they have very few structures in common. Only three structures are found in both E. coli and S. enterica; one structure is found in E. coli O111 and S. enterica O35 (8), the second structure is found in E. coli O55 and S. enterica O50 (8, 11), and the third structure is found in E. coli O157, S. enterica O30, and C. freundii F90 (3, 13, 24). Other examples include the C. freundii O35 and S. enterica O59 O antigens, which are also identical (9).
In general, we can envisage three possible explanations for the presence of a common O antigen in two different species. The simplest explanation is that the O-antigen gene cluster was present in the common ancestral species. In this case the genomic organizations of the individual gene clusters would be similar, and the level of variation between the genes would be typical of the level of variation found in housekeeping genes. Second, the O-antigen structures may have arisen as a result of acquisition by interspecies lateral transfer of the cluster since species divergence. In this scenario the genomic organizations of the gene clusters would be similar, but the level of variation between the genes would be unrelated to the level of variation expected for typical E. coli, S. enterica, and C. freundii housekeeping genes; it would be less if the variation is among the species being considered and greater if the variation is from more divergent species. Finally, the corresponding O-antigen gene clusters could have arisen independently, in which case the genomic organization of the individual gene clusters would be different and the level of variation in the genes would generally be high.
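The three scenarios can be summarized as a small decision rule. The sketch below is only an illustration of that logic: the function name `classify_origin`, its return labels, and the use of a single identity value compared against a housekeeping range are assumptions introduced here, not the authors' procedure.

```python
from typing import Tuple


def classify_origin(same_gene_order: bool,
                    identity: float,
                    housekeeping_range: Tuple[float, float]) -> str:
    """Map gene order and percent identity onto the three scenarios in the text."""
    low, high = housekeeping_range
    if not same_gene_order:
        # Different genomic organization points to independent origins.
        return "independent origin"
    if low <= identity <= high:
        # Same organization, divergence typical of housekeeping genes.
        return "present in common ancestor"
    # Same organization, divergence unrelated to housekeeping genes.
    return "lateral transfer since species divergence"


# Hypothetical identity values; the range is passed in by the caller.
print(classify_origin(True, 88.0, (86.3, 91.8)))
print(classify_origin(True, 60.0, (86.3, 91.8)))
print(classify_origin(False, 60.0, (86.3, 91.8)))
```

A rule this coarse ignores the possibility that genes in a shared ancestral cluster diverge faster than housekeeping genes, so it should be read as a restatement of the expectations rather than a test.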
The fact that there are very few O antigens common to the related and well-studied species E. coli and S. enterica shows that there has been extreme turnover of O antigens since species divergence. In this paper we examine the basis for the presence of identical O-antigen structures in these species and also C. freundii. The E. coli O111 and S. enterica O35 O-antigen gene clusters have been sequenced previously, and it has been proposed that the most likely reason for the presence of identical O antigens in these two organisms is that the two clusters diverged from a common ancestor (27). The E. coli O157 and E. coli O55 O-antigen gene clusters have also been sequenced previously (25, 26) (accession numbers AF061251 and AF461121), and the genes necessary for O-antigen biosynthesis have been identified. We report here additional sequencing of the S. enterica O30, C. freundii F90, and S. enterica O50 O-antigen gene clusters, which allowed analysis of the genetics of all O antigens common to E. coli and S. enterica. Below we comment on the basis for the identity of different O-antigen structures across species.
Bacterial strains. S. enterica O30 (lab stock number M284) and S. enterica O50 (lab stock number M290) were obtained from the Institute of Medical and Veterinary Sciences, Adelaide, Australia.
C. freundii F90 (lab stock number M1972) was obtained from N. Strockbine, Centers for Disease Control and Prevention, Atlanta, Ga. (3).
Construction of random shotgun libraries. Chromosomal DNA was prepared by using Wizard DNA preparation kits from Promega. The O-antigen clusters were PCR amplified from the chromosomal DNA by using the Expand Long Template PCR system (Roche). The primers used were 1523 (5'-ATTGTGGCTGCAGGGATCAAAGAAATC) and degenerate primer 1524 (5'-TAGTCRCGCTGNGCCTGRATYARGTTMGC) (M = A or C; N = A, C, G, or T; R = A or G; Y = C or T), which bind to the upstream galF gene and the downstream gnd gene, respectively. The PCR cycles were as follows: 94°C for 2 min; 30 cycles of 94°C for 10 s, 60°C for 30 s, and 68°C for 15 min; and then 68°C for 7 min. The PCR products were sheared by using Geneworks Hydroshear according to the manufacturer's instructions. The DNA was then purified by using a Wizard PCR DNA preparation kit (Promega) and was resuspended in 35 µl of water. Eight nanograms of DNA was subjected to T4 DNA polymerase repair and single deoxyribosyladenine tailing with a Novagen single deoxyribosyladenine tailing kit. The reaction product (85 µl) was then extracted with chloroform-isoamyl alcohol (24:1) and ligated to pGEM-T-easy (Promega) according to the manufacturer's instructions. Ligation was carried out overnight at 4°C, and the ligated DNA was precipitated and resuspended in 20 µl of water before it was electroporated into E. coli JM109 and plated on agar plates containing 5-bromo-4-chloro-3-indolyl β-D-galactopyranoside and isopropyl-β-D-1-thiogalactopyranoside (IPTG). A DNA template was prepared from the resultant colonies by using a 96-well-format Millipore plasmid DNA miniprep kit. Three microliters of DNA was sequenced with primers M13F (5'-TGTAAAACGACGGCCAGT) and M13R (5'-CAGGAAACAGCTATGAC).
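The degenerate primer 1524 stands for a pool of distinct oligonucleotides. The sketch below expands it using only the ambiguity codes defined in the text (M, N, R, Y); the helper names `degeneracy` and `expand` are hypothetical.

```python
from itertools import product
from typing import List

# Ambiguity codes limited to those defined in the text, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "Y": "CT", "N": "ACGT",
}

PRIMER_1524 = "TAGTCRCGCTGNGCCTGRATYARGTTMGC"


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for code in primer:
        count *= len(IUPAC[code])
    return count


def expand(primer: str) -> List[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[code] for code in primer)
    return ["".join(bases) for bases in product(*choices)]


variants = expand(PRIMER_1524)
print(f"degeneracy: {degeneracy(PRIMER_1524)}, variants listed: {len(variants)}")
```

Primer 1524 has five two-fold positions and one four-fold position, so the pool contains 2^5 x 4 = 128 sequences.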
Specific primers were designed to PCR amplify any regions of the DNA in which sequence was missing. Each PCR was performed in a 50-µl (total volume) mixture by using Taq polymerase (NEB) as recommended by the protocol. The PCR cycles were as follows: 94°C for 2 min; 30 cycles of 94°C for 30 s, 56°C for 30 s, and 72°C for 2 min; and 72°C for 5 min. Two microliters of the PCR products was electrophoresed on an agarose gel to check for amplified DNA and was subsequently purified by using a Wizard PCR DNA preparation kit (Promega) and resuspended in 35 µl of water. Three microliters of DNA was sequenced with the same primers used for PCR amplification.
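As a rough planning aid, the hold times of the two cycling programs described above can be summed. This is a sketch under stated assumptions: the function name `program_seconds` is hypothetical, and ramp times between temperatures are ignored, so real run times will be longer.

```python
from typing import List


def program_seconds(initial: int, cycles: int, steps: List[int], final: int) -> int:
    """Total hold time in seconds: initial step, repeated cycle steps, final extension."""
    return initial + cycles * sum(steps) + final


# Long-range amplification: 94°C 2 min; 30 x (10 s, 30 s, 15 min); 68°C 7 min.
long_range = program_seconds(initial=2 * 60, cycles=30, steps=[10, 30, 15 * 60], final=7 * 60)

# Gap-closing PCR: 94°C 2 min; 30 x (30 s, 30 s, 2 min); 72°C 5 min.
gap_closing = program_seconds(initial=2 * 60, cycles=30, steps=[30, 30, 2 * 60], final=5 * 60)

print(f"long-range:  {long_range / 3600:.2f} h")
print(f"gap-closing: {gap_closing / 60:.0f} min")
```

The 15-min extension dominates the long-range program, which is why it runs for roughly 8 h compared with about 1.6 h for the gap-closing reactions.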
PCR of upstream and downstream regions of S. enterica O50. Chromosomal DNA was prepared by using Wizard DNA preparation kits from Promega. The gne gene, which encodes N-acetylglucosamine 4-epimerase (2), was PCR amplified by using primers 5278 (5'-ACAGATTGGTGATGTTCG) and 5280 (5'-GATTTCTTTGATCCCTGCAGCCAC), which bind at the 5' end of the gne gene and in the downstream galF gene, respectively. The PCR was performed in a 50-µl (total volume) mixture by using Taq polymerase (NEB) as recommended by the protocol. The PCR cycles were as follows: 94°C for 2 min; 30 cycles of 94°C for 30 s, 56°C for 30 s, and 72°C for 2 min; and 72°C for 5 min. Two microliters of the PCR products was electrophoresed on an agarose gel to check for amplified DNA and was subsequently purified by using a Wizard PCR DNA preparation kit (Promega) and resuspended in 35 µl of water. Three microliters of DNA was separately sequenced with primers 5278 and 5280. A similar method was used for PCR amplification of the downstream regions.
Sequencing and analysis. Sequencing was performed with an Applied Biosystems 377 automated DNA sequencer. Sequence data were assembled and analyzed by using the Australian National Genomic Information Service, which incorporates several sets of programs (A. H. Reisner, C. A. Bucholtz, J. Smelt, and S. McNeil, Proc. 26th Annu. Hawaii Int. Conf. Systems Sci., 1993). BLAST and PSI-BLAST (1) were used for searching databases, including the GenBank and Pfam protein motif databases, for possible functions. Sequence alignment and comparisons were performed by using the ClustalW program (23). The TMHMM v2.0 analysis program (http://www.cbs.dtu.dk/services/TMHMM-2.0/) was used to identify potential transmembrane segments from the amino acid sequence.
Nucleotide sequence accession numbers. The DNA sequences of the three O-antigen gene clusters have been deposited in the GenBank database under accession numbers AY730592, AY730593, and AY730594.
FIG. 1. O-antigen structures. (A) E. coli O157-S. enterica O30-C. freundii F90 O-antigen structure. (B) E. coli O55-S. enterica O50 O-antigen structure.
FIG. 2. O-antigen gene clusters. (A) E. coli O157, S. enterica O30, and C. freundii F90 O-antigen gene clusters. (B) E. coli O55 and S. enterica O50 O-antigen gene clusters. Glycosyltransferase genes whose designations begin with w are indicated by the final letter of the designation (e.g., E. coli O157 wbdN is indicated by N, and E. coli O55 wbgM is indicated by M).
The orders of the 11 genes common to the E. coli O157, S. enterica O30, and C. freundii F90 O-antigen gene clusters are identical, indicating that the clusters have a common ancestor. If the O-antigen gene clusters were acquired via a lateral transfer event(s), the levels of similarity would be unrelated to the levels of similarity generally observed for housekeeping genes in these species. In order to ascertain the relationships of conserved housekeeping genes across the different species, we compared six housekeeping genes that have been sequenced previously in E. coli, S. enterica, and C. freundii. The levels of identity between equivalent genes in E. coli and S. enterica ranged from 86.3 to 91.8% (Table 1). This is in accordance with Sharp's observation that 93% of E. coli and S. enterica housekeeping genes have levels of identity between 76.3 and 100% (18). When either E. coli or S. enterica was compared to C. freundii, the levels of identity of the housekeeping genes were very similar, ranging from 84.5 to 89.7% (Table 1). Table 2 shows the levels of amino acid identity for the putative proteins encoded by the genes in each of the O-antigen gene clusters.
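Percent identity figures of this kind are computed from pairwise alignments. The following is a minimal sketch, assuming the two sequences have already been aligned (for example, by ClustalW) and that columns containing a gap are excluded from the count; the gap treatment and the function name `percent_identity` are assumptions, since the text does not state how gaps were handled.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments for illustration only.
print(f"{percent_identity('ATGGCT-AAC', 'ATGACTGAAC'):.1f}%")
```

The same function works for aligned amino acid sequences, which is the form of comparison reported for the encoded proteins.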
TABLE 1. Levels of DNA identity for six E. coli, S. enterica, and C. freundii housekeeping genes
TABLE 2. Levels of amino acid identity for the proteins encoded in the E. coli O157, S. enterica O30, and C. freundii F90 O-antigen gene clusters and G+C content of each gene in the clusters
E. coli O55 and S. enterica O50 O-antigen gene clusters. The S. enterica O50 and E. coli O55 O antigens contain D-Gal, D-Gal2NAc, D-Glc2NAc, and D-Col (Fig. 1B). The E. coli O55 O-antigen gene cluster (25) contains nine genes between galF and gnd, including the genes required for the initial stages of GDP-D-Col synthesis, namely, manB, manC, and gmm (manA is found elsewhere on the chromosome). The gene cluster is atypical in that col1 and col2, the genes required for the final stages of GDP-D-Col biosynthesis, are downstream of gnd (25), in a region that can be thought of as an extension of the typical O-antigen gene cluster. The E. coli O55 O-antigen gene cluster also contains four glycosyltransferase genes (wbgM, wbgN, wbgO, and wbgP) (25) and the O-antigen processing genes, wzx and wzy. Finally, immediately upstream of galF there is a gne gene for biosynthesis of UDP-D-Gal2NAc from UDP-D-Glc2NAc. Most O-antigen gene clusters include all of the genes between the galF and gnd genes. Clustering of genes in this manner is best explained by the selfish operon model (10), in which clustering confers selective benefit to genes that together have a function subject to lateral transfer. This is because being in a cluster facilitates lateral transfer as a group. O-antigen gene clusters are well known to undergo lateral transfer, and the E. coli O157 gene cluster is only one case that has been documented (22, 25). In the case of O antigens, lateral transfer occurs readily within a species, where it usually involves replacement of the preexisting O-antigen gene cluster by homologous recombination outside the gene cluster. The arrangement in this group of O antigens allows cotransfer but is not the most efficient arrangement and can be seen as an intermediate stage in the process of bringing all of the genes into a single cluster (25).
The S. enterica O50 O-antigen gene cluster was amplified by long-range PCR by using primers based on the galF and gnd genes, and it was sequenced as described in Materials and Methods. Genes both upstream and downstream of this region were also sequenced. The extended 21.5-kb S. enterica O50 O-antigen gene cluster has the same genes in the same order as the E. coli O55 O-antigen gene cluster (Fig. 2B), indicating that these clusters have a common ancestor. Similar to the situation seen with E. coli O157, S. enterica O30, and C. freundii F90, the level of similarity of genes within the E. coli O55 and S. enterica O50 O-antigen gene clusters is lower than the level of similarity for housekeeping genes. Furthermore, although all four S. enterica O50 glycosyltransferases and the O-antigen processing proteins are most similar to the proteins encoded by the E. coli O55 O-antigen gene cluster, the levels of similarity between the molecules are still relatively low, lower than the levels of similarity for the nucleotide-sugar biosynthesis proteins (Tables 3, 4, and 5). As with the E. coli O157-S. enterica O30-C. freundii F90 group, we discuss probable explanations for these observations below.
TABLE 3. Levels of DNA identity for the GDP-L-Fuc and GDP-D-PerNAc biosynthesis genes in the E. coli O157, S. enterica O30, and C. freundii F90 O-antigen gene clusters
TABLE 4. Levels of DNA identity for the GDP-D-Col biosynthesis genes in the E. coli O55 and S. enterica O50 O-antigen gene clusters
TABLE 5. Levels of amino acid identity for the proteins encoded by the genes in the E. coli O55 and S. enterica O50 O-antigen gene clusters and G+C content of each gene
The presence of GDP-sugar synthesis pathway genes in the O-antigen gene cluster indicates that before acquisition of col1 and col2, the ancestral gene cluster included a sugar pathway for a GDP-sugar other than GDP-D-Col. Incorporation of the two GDP-D-Col synthesis genes by lateral transfer, probably mediated by H-rpt and/or a transposase, subsequently allowed synthesis of GDP-D-Col. We suggest that the ancestral GDP-sugar was L-Fuc, which is very closely related to D-Col, because in addition to manA, manB, manC, and gmd, which are common to many GDP-sugar pathways (17), there is a remnant of the GDP-fucose-specific fcl gene situated between gmd and gmm in the S. enterica O50 O-antigen gene cluster. Except for an additional transferase gene, this is the same gene order as the order of the GDP-L-Fuc biosynthesis genes in the E. coli and S. enterica CA clusters. There is no fcl gene in the E. coli O55 O-antigen gene cluster, presumably due to more extensive deletions than in S. enterica O50, which have occurred since the acquisition of the GDP-D-Col synthesis genes. It is interesting that there is no transferase gene associated with the col genes, and presumably the transferase that originally transferred Fuc now transfers Col. The product of wbgM, upstream of gnd, has similarity to GDP-L-Fuc transferases and is the most likely candidate.
Interestingly, neither the E. coli O55 nor the S. enterica O50 O-antigen gene cluster contains a complete gmm gene, as is found in all other E. coli and S. enterica clusters whose corresponding O-antigens contain a GDP-sugar that has been synthesized from GDP-mannose. However, in both clusters there is a remnant gmm gene (57% identity to O157 gmm) between gmd and manC. In S. enterica O50 this remnant gene is situated adjacent to the remnant fcl gene, and it is possible that mutation of the S. enterica O50 gmm and fcl genes resulted from a single deletion event. Similarly, it is likely that the deletion event that resulted in the mutation of E. coli O55 gmm also caused the complete deletion of the E. coli O55 fcl gene.
We first looked at the complicating issue of an interaction between the O-antigen gene clusters and the CA gene cluster. The CA gene cluster, just upstream of galF in E. coli, S. enterica, and presumably C. freundii, contains a set of GDP-Fuc pathway genes (Fig. 3). These genes have a relatively high G+C content, which interestingly is higher in E. coli K-12 than in S. enterica LT2. Of interest is the finding that in several cases the manB gene of the O-antigen gene cluster has the G+C content of the CA manB gene and a high level of sequence similarity (7, 20). This is attributed to some form of gene rearrangement that at least has the effect of a gene conversion event, in which the gene in the O-antigen gene cluster is replaced by the equivalent gene of the CA gene cluster. The same situation has been shown to apply to some gmd genes, and in the strains discussed here the findings are applicable to gmd, fcl, gmm, and manB of S. enterica O30 and O50. Each of these genes has a high G+C content (55.4 to 61.0%) (Tables 2 and 5), and the sequence is very similar to the sequences of the corresponding S. enterica CA genes (Table 6). We concluded that there were gene conversion events in which the S. enterica O30 and O50 gmd, fcl (now a remnant in S. enterica O50), gmm, and manB genes were replaced by genes from the CA gene cluster. Indeed, examination of the DNA sequence at the 3' end of the CA and O-antigen gmm genes allowed us to identify a putative recombination site 40 bp prior to the end of the coding sequence (Fig. 4). This phenomenon also applies to the manB gene of the E. coli O157 and C. freundii gene clusters. It is interesting that the manC gene is not affected in any of the cases. The C. freundii gmd, fcl, and gmm genes may also have undergone conversion as the G+C contents are higher than those for other genes in the gene cluster; however, because of a G+C content of about 45% the situation is not as clear as it is for S. enterica, and we do not have C. freundii CA genes for sequence comparison.
FIG. 3. Region of the E. coli CA gene cluster that contains the GDP-L-Fuc biosynthesis genes.
TABLE 6. Levels of amino acid identity for the proteins encoded in the CA gene cluster and by O-antigen gene cluster GDP-L-Fuc biosynthesis genes
FIG. 4. Comparison of the DNA sequences at the 5' end of the gmm gene in the S. enterica O50 CA gene cluster and the 5' end of the gmm gene in the S. enterica O50 O-antigen gene cluster for identification of a putative recombination site.
There are several factors that could contribute to higher-than-usual levels of divergence. First, housekeeping genes are subject to random genetic drift across species. Mutations neutral to natural selection accumulate, and sequences of related species diverge. The level of fitness loss that natural selection treats as neutral depends on the effective population size (Ne). For housekeeping genes Ne is related to the whole species, although population structure has a complicating effect. In contrast, for genes in polymorphic gene clusters, like those discussed here, Ne is much smaller, greatly reducing the level of fitness required for a mutant to be treated as neutral and subject to fixation by random genetic drift. Thus, a higher proportion of mildly deleterious mutations would be liable to fixation by random genetic drift, increasing the rate of sequence divergence. A second possibility is related to the presumed origin of the genes from outside the E. coli-S. enterica-C. freundii species group, based on their generally low G+C content. There could well be ongoing selection pressure for better adaptation to the enterobacterial situation after transfer from low-G+C-content species. This would lead to fixation of mutations for better adaptation and, as there are probably many routes to such adaptation, would also increase the rate of sequence divergence. Finally, because the genes are present in only a small proportion of the species, opportunities for recombination may be affected. Strains with any given O antigen are part of the whole species in terms of chance of DNA transfer, and in most cases such DNA is from cells with a different O antigen, although some of the same genes may be present. The effects of this on the dynamics of recombination at the molecular level are not known, but it could well influence the rate of sequence divergence for an O antigen in two related species.
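The dependence of effective neutrality on Ne can be made concrete with a standard population-genetics approximation. The sketch below uses Kimura's diffusion formula for the fixation probability of a new mutation in a haploid Wright-Fisher population, with census size set equal to Ne. This model, the function name `fixation_probability`, and the parameter values are assumptions introduced for illustration; none of them come from the analysis reported here.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for a new mutation in a haploid
    population of size n with selection coefficient s (assumes Ne == n)."""
    if s == 0.0:
        return 1.0 / n  # strictly neutral mutation
    # (1 - exp(-2s)) / (1 - exp(-2ns)), written with expm1 for numerical accuracy.
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


# Hypothetical values: a mildly deleterious mutation in a small and a large population.
s = -0.001
for n in (100, 100_000):
    relative = fixation_probability(s, n) * n  # relative to the neutral rate 1/n
    print(f"Ne = {n}: fixation probability relative to neutral = {relative:.3g}")
```

Under these assumptions the same mildly deleterious mutation fixes almost as readily as a neutral one when Ne is small and almost never when Ne is large, which is the qualitative point made in the first explanation above.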
If the divergence observed for genes in the E. coli O157-S. enterica O30-C. freundii F90 and E. coli O55-S. enterica O50 groups of gene clusters is due to an increased rate of divergence of these genes during derivation from gene clusters in the common ancestor, then we have to account for the situation found in the E. coli O111-S. enterica O35 pair. These clusters have levels of divergence that were considered in 2000 (27) to be due to derivation from a gene cluster in the common ancestor. If the rate of divergence in O-antigen gene clusters is generally higher than that for housekeeping genes, then this view would have to be revised, and there is the likelihood that the gene cluster was transferred from one species to the other since species divergence.
The E. coli O157-S. enterica O30-C. freundii F90 and E. coli O55-S. enterica O50 groups of gene clusters have very similar patterns, and the levels of amino acid identity are generally between 55 and 65% for glycosyltransferase and O-antigen processing genes (69.4% for Wzx of E. coli O157 and S. enterica O30) and between 75 and 93% for nucleotide-sugar pathway genes. For the E. coli O111-S. enterica O35 pair the ranges are 75 to 83% and 87% to 93%, respectively. There is a consistent pattern of greater divergence in the glycosyltransferase and O-antigen processing genes than in the nucleotide-sugar pathway genes. This has to be accounted for regardless of the time taken for divergence, as in each case it seems clear that the gene clusters have a common ancestor. It has been observed previously that nucleotide-sugar biosynthesis genes are generally more conserved than the glycosyltransferase and O-antigen processing genes, but this could be related to the different specificity of the latter classes of genes. However, if the gene clusters discussed here originated from a common ancestor, we have to ask why the glycosyltransferase and O-antigen processing genes are diverging at a higher rate than the nucleotide-sugar pathway genes. We now have the sequences of three sets of the same gene cluster in two or three species. There is good evidence that there are different rates of divergence among the genes in these gene clusters. We can only assume that there is some difference in the pressures exerted by natural selection or drift that results in this consistent pattern of divergence in the genes of O-antigen gene clusters.
It is clear that we do not at present have sufficient evidence to determine unequivocally the origins of the gene clusters for the three structures discussed here. There is a strong possibility that two of the three clusters were in the common ancestor, in which case there has been a much higher-than-normal rate of divergence for these gene clusters. There is a need for both experimental and theoretical analyses of the phenomenon and of the reasons for different rates of divergence for different classes of genes if this is indeed the case.