Journal of Bacteriology, January 2002, p. 479-487, Vol. 184, No. 2
0021-9193/01/$04.00+0 DOI: 10.1128/JB.184.2.479-487.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
Microbial Evolution Laboratory, National Food Safety and Toxicology Center, Michigan State University, East Lansing, Michigan 48824
Received 23 July 2001/ Accepted 25 October 2001
| ABSTRACT |
|---|
, Int-ß, or Int-γ. The sequence from the O111:H9 clone consistently showed a close relationship with that from E2348/69, a distantly related strain that expresses Int-α. The results suggest that there have been multiple acquisitions of the LEE in the EHEC 2/EPEC 2 clonal lineage, with a recent turnover in either O111:H8 or its close relatives. Amino acid substitutions that alter residue charge occurred more frequently than would be expected under random substitution in the extracellular domains of intimin, suggesting that diversifying selection has promoted divergence in this region of the protein. An N-terminal domain that presumably functions in the periplasm may also be under positive selection.
| INTRODUCTION |
|---|
The acquisition of the LEE in the EPEC and EHEC clonal lineages is considered a key evolutionary event that set the stage for the parallel emergence of pathotypes (29). According to one model (38), LEE inserted into the selC site and the subsequent divergence and acquisition of different mobile virulence elements gave rise to two pathogenic lineages, EPEC 1 and EHEC 1 (40). In another ancestral clone, LEE inserted into the pheU site (32) and the EHEC 2 and EPEC 2 clones diverged through a second series of acquisition events (5). The stepwise evolutionary scenario provides a working hypothesis for the persistence of LEE in clonal lineages and the subsequent in situ divergence of genes through mutation and recombination.
Genetic variation in the intimin gene is essentially concordant with the evolutionary model. At least five antigenic types have been detected, and the allelic types are generally associated with particular clonal lineages (1). Int-α is characteristic of EPEC 1 strains, whereas Int-γ is found in EHEC 1 and closely related O55:H7 strains. The two sister groups, EPEC 2 and EHEC 2, usually carry Int-ß. However, there are puzzling exceptions to the clonal framework. For example, the EHEC 2 strains with serotype O111:H8 generally do not react with Int-ß antibodies, nor do they amplify with Int-ß specific PCR primers (28). This is an enigma because O111:H8 strains typically carry an eae homologue, detectable by either gene hybridization (4, 31) or PCR (28), and these strains can produce mucosal lesions in experimental infections (34). How can this be? One possibility is that the O111 eae sequence is a minor variant, say, of the Int-ß allele, that has diverged by point mutations. A second possibility is that it is a novel sequence resulting from intragenic recombination, such as seen in the α, ß, and γ intimin variants (23). Another possibility is that the distinct intimin of the O111:H8 clone reflects the acquisition of an entirely different LEE island into an EHEC 2 clonal frame.
The objective of this study was to infer the evolutionary history of the intimin gene in the O111:H8 clone. To accomplish this goal, we sequenced the eae gene in five O111:H8 strains originally recovered from separate cases of disease more than 40 years apart. We also included two isolates of an O111:H9 clone whose intimin type is unknown. The nucleotide sequences were used to construct gene phylogenies for all or parts of the intimin alleles, to detect recombination within the eae gene, and to assess whether natural selection has promoted or constrained the rate of amino acid change in different intimin domains.
| MATERIALS AND METHODS |
|---|
Enzymatic amplification and nucleotide sequence determination. The program Primer Designer (version 2.0) was used to design primers in the two genes flanking eae (escD and cesT) (6, 7): escD1143F, 5'-CAT TCT GAA AGG AGG CTA TGT C-3'; and cesT242F, 5'-TAT GGT TTG CAG AGA ATG GTG G-3'. The primers were used in the PCR at a final concentration of 0.5 µM with deoxynucleoside triphosphates at 0.2 mM each, 2.5 U of TaqPlus Precision (Stratagene), and 100 ng of DNA template.
The thermal cycle, which was preceded by a 3-min soak at 94°C, was run for 35 cycles on a Perkin-Elmer 9700 with the following parameters: 92°C, 40 s; 66°C, 1 min; and 72°C, 3.5 min. The resulting 3.4-kb amplicons were purified with Qiaquick PCR purification kits (Qiagen). DNA was electrophoresed in ethidium bromide-stained gels and quantified under UV illumination by comparison to a low-mass DNA ladder (Gibco BRL).
Cycle sequencing reactions were performed with CEQ dye terminator cycle sequencing kits (Beckman) with approximately 50 fmol of template and a final primer concentration of 2 µM. The thermal cycle was run for 30 cycles with the following parameters: 94°C, 20 s; 57°C, 20 s; and 60°C, 4 min. Reactions were purified with Centricep columns (Princeton Separations), and the DNA ladder was detected on a Beckman CEQ2000 capillary sequencer. Sequences were concatenated and aligned with the program SEQMAN in the computer package DNASTAR. Internal primers for sequencing were sequentially designed as data were generated.
Sequence analysis. CLUSTAL X (33) was used to produce a multiple alignment of 28 inferred amino acid sequences, which included the seven sequences determined here and 21 sequences retrieved from GenBank. Phylogenetic trees were constructed with the neighbor-joining algorithm (30) with the program MEGA version 2.0 (18). Trees were based on synonymous distance (dS) calculated by the modified Nei-Gojobori method (15, 25) with a Jukes-Cantor correction applied to account for multiple substitutions at single nucleotide sites. The genes were partitioned into three segments for phylogeny estimation. The first segment specifies the 186 residues at the N terminus, which presumably function as the periplasmic (PP) domain. The second segment includes the conserved central domain identified by McGraw et al. (23), which spans the region from the alanine residue at position 187 through the lysine residue at position 517 (Fig. 1). The third segment includes the residues at the C terminus (residues 518 to 945), which comprise the four extracellular (EC) domains identified by Luo et al. (20).
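The Jukes-Cantor correction mentioned above can be illustrated with a minimal sketch. It assumes the standard form of the correction, d = -(3/4) ln(1 - 4p/3), applied to an observed proportion of differences p; the function name `jukes_cantor` is hypothetical, and the code is not the MEGA implementation used in this study.

```python
import math


def jukes_cantor(p):
    """Convert an observed proportion of differing sites into a
    Jukes-Cantor distance (illustrative helper, not MEGA's code)."""
    # The correction is undefined once p reaches 3/4 (saturation).
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in the interval [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # The corrected distance exceeds the raw proportion because
    # multiple substitutions at one site are partly hidden.
    for p in (0.01, 0.10, 0.30):
        print(p, round(jukes_cantor(p), 4))
```

The correction grows rapidly as p approaches 0.75, which is one reason highly divergent comparisons become unreliable.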
We used two methods to detect the past action of natural selection in intimin evolution. First, the proportion of synonymous differences per synonymous site (pS) and the proportion of nonsynonymous differences per nonsynonymous site (pN) were estimated by the Nei-Gojobori method (25) with MEGA (18). Variation in functional constraint across the eae gene was examined by tabulating the average pS and pN for each of the functional domains delineated by Luo et al. (20) and for 30-codon subsets in a sliding window using a computer program called PSWIN. These quantities have been used to detect adaptive evolution (26), because the rates of evolution per site are expected to be equal for selectively neutral mutations (pS = pN), the synonymous rate exceeds the nonsynonymous rate for purifying (negative) selection (pS > pN), and the nonsynonymous rate exceeds the synonymous rate for diversifying (positive) selection (pN > pS). Second, we used the method of Hughes et al. (13) to assess whether amino acid replacements that change residue property (radical change) occur more often than chance would dictate. The method estimates the rate of radical nonsynonymous change (pNR) versus the rate of conservative nonsynonymous change (pNC). We computed pNC and pNR with radical changes defined as changes in charge or polarity for the transmembrane (TM) and EC domains. These are "per site" measures that are expected to be equal for selectively neutral mutations (13).
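The sliding-window tabulation of pS and pN can be sketched as follows. This is not the PSWIN program; it is a minimal illustration that assumes the per-codon numbers of synonymous and nonsynonymous differences and sites have already been computed (for example, by a Nei-Gojobori routine). The function name `sliding_pn_ps`, its arguments, and the step size of one codon are assumptions made for the example.

```python
def sliding_pn_ps(syn_diffs, nonsyn_diffs, syn_sites, nonsyn_sites,
                  window=30, step=1):
    """Return (start_codon, pS, pN, pN - pS) for each window.

    All four inputs are per-codon lists of equal length; the values
    are assumed to come from a separate Nei-Gojobori calculation.
    """
    n_codons = len(syn_diffs)
    if not (len(nonsyn_diffs) == len(syn_sites) == len(nonsyn_sites) == n_codons):
        raise ValueError("all per-codon lists must have the same length")

    results = []
    for start in range(0, n_codons - window + 1, step):
        end = start + window
        s_total = sum(syn_sites[start:end])
        n_total = sum(nonsyn_sites[start:end])
        # Proportions are undefined when a window has no sites of a class.
        p_s = sum(syn_diffs[start:end]) / s_total if s_total > 0 else float("nan")
        p_n = sum(nonsyn_diffs[start:end]) / n_total if n_total > 0 else float("nan")
        results.append((start, p_s, p_n, p_n - p_s))
    return results


if __name__ == "__main__":
    # Toy data for four codons and a window of two codons.
    rows = sliding_pn_ps([1, 0, 0, 0], [0, 0, 1, 1],
                         [1, 1, 1, 1], [2, 2, 2, 2], window=2)
    for row in rows:
        print(row)
```

Windows in which pN - pS is positive are candidates for diversifying selection, whereas negative values indicate purifying selection.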
Nucleotide sequence accession numbers. The nucleotide sequences determined in this study have been submitted to GenBank under accession numbers AF449414 to AF449420. The eae alignment is available from the authors' website (http://foodsafe.msu.edu/whittam/).
| RESULTS |
|---|
To determine the evolutionary relationships of the eae genes from O111 strains with other known intimin variants of other pathovars, the O111 eae sequences were aligned with 21 homologous genes retrieved from GenBank. A multiple alignment with CLUSTAL X yielded a total of 951 amino acid positions of the combined data set of 28 inferred amino acid sequences. A number of gaps were introduced in the alignment, the majority of which were in the C-terminal region of the gene. Among the 28 sequences, there were 21 unique variants at the nucleotide level. Part of the multiple alignment, comparing the predicted amino acid sequences for the intimins of the O111:H8 and O111:H9 strains to the α (EPEC O127:H6), ß (EPEC O111:H2), and γ (EHEC O157:H7) intimins, is shown in Fig. 1. The alignment reveals that the eae sequence of the O111:H8 clone is divergent from other intimins (Fig. 1). We hereafter refer to O111:H8 intimin as the "Int-θ" allele. Likewise, the sequence in O111:H9 is similar to the allele designated "ζ" (GenBank accession no. AJ298279.1) (J. Jores, K. Zehmke, L. Roumer, and L. Wieler, unpublished data), so we refer to the O111:H9 eae sequence as the Int-ζ allele class.
The level of divergence between intimins varies in different regions of the molecule (Table 2). For example, in the N-terminal region (codons 1 to 550) (Fig. 1), Int-θ is most closely related to Int-ß, differing at 3.1% of the amino acid positions. In the carboxyl end of the molecule (codons 551 to 951) (Fig. 1), however, Int-θ is most similar to Int-γ and is more than 35% different from the other intimin alleles. For the Int-ζ of the O111:H9 lineage, the primary structure is nearly identical to Int-α in the N terminus but is much more divergent (24%) in the C-terminal domains.
-intimin in complex with Tir, Luo and colleagues (20) identified five functional domains: the N-terminal membrane anchor region, which includes the PP and TM domains, and four EC domains, labeled D0 to D3 (Fig. 1). Of these five domains, the PP and TM domains were the most conserved, with 104 (19%) of the 550 amino acid positions being variable. Most of the variable amino acid positions were found in the C terminus: 237 out of 393 sites (60%) were polymorphic across the four EC domains, and D3, the domain that binds to Tir, had the greatest concentration of variable amino acid positions. At the nucleotide level, the five domains differ dramatically in the proportions of pS and pN nucleotide substitutions (Table 3). The average pS (× 100) equals 12.0 in the N-terminal domains and is more than four times greater in the EC domains. The average pN is also substantially greater in the EC domains than in the N-terminal region, with the greatest value in D3. The ratio of pS to pN, a measure of the strength of natural selection at the molecular level, ranges from the most conserved value of 4.4 in the PP + TM domains to the least conserved of 1.4 in D3 (Table 3). These ratios indicate that on average the N-terminal membrane anchor region has twice the selective constraint of the EC domains of intimin.
and Int-ζ in pathogenic O111 strains, we inferred a phylogeny for the three main regions of the eae gene: the N-terminal 186 residues that comprise the putative PP domain, the 331 residues that comprise the central conserved domain, and the 434 residues of the C-terminal EC domains (Fig. 2). The evolutionary relationship of the O111:H8 Int-θ allele with the α, ß, and γ alleles depends on the segment of the gene that was used to infer a phylogeny. The topology for the N-terminal segment places Int-θ with the Int-ß allele cluster, a cluster that includes an EPEC O111 strain and an EHEC O26 strain (Fig. 2A). In contrast, Int-θ clusters first with Int-α in the tree constructed from the conserved central domain (Fig. 2B). The tree based on the EC domains indicates a third relationship in which Int-θ is most closely related to the Int-γ alleles characteristic of the EHEC O157:H7 strains (Fig. 2C). The close connection between Int-θ and Int-γ was also obtained when individual gene trees were constructed for each EC domain separately (results not shown).
sequence for O111:H9 strains consistently showed a closer relationship with the Int-α than with either Int-ß or Int-γ, regardless of the segment that was used to construct the gene tree. Sequences for two O84 isolates (Int-ζ) also consistently clustered with Int-α and were closest to the O111:H9 eae sequences. However, the Int-ζ sequence was divergent from the four α alleles in the EC domains (Fig. 2C). Such striking differences in the phylogeny for different parts of a molecule can result from natural selection altering the rate of molecular evolution (and thus distorting the tree) or from past horizontal transfers and recombination which can create mosaic alleles from gene segments with distinct histories. In the next sections, we analyze the mosaic structure of intimin alleles and present evidence for radical amino acid change in the EC domains.
Heterogeneity and mosaic structure.
To determine points of significant heterogeneity in sequence divergence, we applied the maximum chi-square method (21) to pairs of intimin alleles. The comparison of Int-θ to Int-α disclosed seven breakpoints with significant kmax values. The notable segment is the piece between positions 34 and 182, which differs at 12% of the nucleotides and is embedded in a region of 1 to 2% sequence divergence (Fig. 3). At the 3' end, there is remarkable heterogeneity ranging from 7 to 88% sequence difference in short stretches of 50 to 100 bp of DNA. Int-θ and Int-γ also show differences in the 5' end of the gene, which encodes the PP domain. In this case, the conservation of the central domain extends through to codon 675 and includes D0. The final EC segment is 16.1% divergent in the two sequences, which reflects substantial sequence divergence but is less than half the divergence seen in the EC domains between other intimin allele classes. There were no significant breakpoints detected in the 5' end of the Int-θ and Int-ß comparison. The first breakpoint is at position 490, followed by three points marking significant heterogeneity spaced at ~100-bp intervals in the 3' end of the sequences.
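The logic of a maximum chi-square scan for a single breakpoint can be sketched in a few lines. This is a simplified illustration, not the program used in this study: it assumes two aligned sequences of equal length, treats every aligned position as informative, finds only the single best breakpoint, and assesses significance by shuffling sites. All function names are hypothetical.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def scan_breakpoint(diffs):
    """Return (k, chi2) for the cut after site k that maximizes chi-square.

    diffs is a list of 0/1 flags, where 1 marks a site that differs
    between the two sequences.
    """
    n, total = len(diffs), sum(diffs)
    best_k, best_chi = None, 0.0
    left = 0
    for k in range(1, n):
        left += diffs[k - 1]
        # Rows: left and right of the cut; columns: different, identical.
        chi = chi_square_2x2(left, k - left,
                             total - left, (n - k) - (total - left))
        if chi > best_chi:
            best_k, best_chi = k, chi
    return best_k, best_chi


def max_chi_square(seq1, seq2):
    """Scan two aligned sequences for the most heterogeneous cut point."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return scan_breakpoint([int(x != y) for x, y in zip(seq1, seq2)])


def permutation_p_value(diffs, observed, trials=1000, seed=0):
    """Fraction of site shuffles whose maximum chi-square >= observed."""
    rng = random.Random(seed)
    shuffled = list(diffs)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if scan_breakpoint(shuffled)[1] >= observed:
            hits += 1
    return hits / trials
```

A breakpoint is taken as significant when few shuffled data sets reach the observed maximum; repeating the scan within each resulting segment would be one way to look for additional breakpoints.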
with -α, -ß, and -γ. This region presumably is involved in the periplasmic functions of intimin. The second region is in D2 and D3 of the C terminus of the protein. In this case, pN > pS only in the comparison of Int-θ with Int-α and Int-ß. The comparison of pNC versus pNR shows that a greater proportion of amino acid replacements in the TM domain involves conservative substitutions. In contrast, amino acid replacements that involve charge changes occur more frequently in the EC domains than expected under random substitution (Fig. 5). However, radical changes that involve polarity occur less frequently. The results suggest that amino acid replacements that alter charge are selectively favored in the external domains of intimin.
| DISCUSSION |
|---|
Why is the intimin in the O111 strains uncharacteristic of that in the EHEC 2/EPEC 2 evolutionary lineage? One hypothesis is that recombination could have altered intimin in the two O111 clones. Alternatively, it is possible that the O111 strains have independently acquired a different copy of the LEE. To better resolve the history of the O111 sequences, we sequenced additional LEE genes from two isolates of O111:H8 (CL-37 and 3215-99) and one of O111:H9 (921-B4). We selected two genes that are known to be highly polymorphic: tir, which is in the same operon as eae, and sepZ, which is located ~10 kb upstream. The tir and sepZ sequences from the O111:H8 lineage clustered with those from EPEC 1 strain E2348/69 (results not shown), which raises the possibility that the backbone of the LEE in O111:H8 is most closely related to the Int-α-associated LEE from EPEC 1. Sequences for O111:H9 also clustered with sequences from E2348/69. The results from additional LEE genes suggest a third hypothesis: that α-LEE is ancestral and that the O111 clones have retained the ancestral copy while related strains have lost α-LEE and gained a divergent copy of the island that carries Int-ß.
Sperandio et al. (32) found that two O111:H9 strains shared with EPEC 1 the selC insertion site for the LEE. Thus, it is clear that the LEE in O111:H9 has been acquired independently of other strains in EHEC 2 and EPEC 2, in which the LEE is typically in the pheU site. It is plausible that O111:H9 has retained an ancestral copy of the LEE in the selC site; however, on the basis of a phylogenetic reconstruction of pathogenic E. coli (5), at least six parallel losses of the LEE would be required if this hypothesis were true. A greater number could be required, depending on the exact placement of the O111:H9 clone in a complete phylogeny for EHEC 2 and EPEC 2. If O111:H9 separated prior to the diversification of the rest of the group, then a single loss of α-LEE and gain of ß-LEE are required in the clone that gave rise to the rest of the group. However, if O111:H9 diverged long after the diversification of the group, then many more independent losses and gains would have to be inferred. The most parsimonious explanation (and hence the likeliest) minimizes the number of evolutionary events that are required under a given hypothesis. It is more parsimonious to assume that the O111:H9 clone has lost (or never had) the ß-LEE backbone and independently acquired an α-LEE.
Retention of an ancestral α-LEE is not a plausible explanation for O111:H8 because the selC site is not occupied and because the LEE island is presumably in the pheU site (32). Intimin in O111:H8 is a mosaic of segments that have different evolutionary histories, with sequence from one region (the PP domain) clustering with those from other EHEC 2 strains. Has recombination simply modified an Int-ß in O111:H8? This seems unlikely because the divergence of Int-θ from Int-ß at synonymous sites (dS = 0.095) in the PP domain is 30 times greater than the average divergence among Int-ß alleles (dS = 0.003); the distance between the two allele classes is considerable, as it represents about half the divergence among intimin allele classes in E. coli (Fig. 2A). Moreover, not only is the intimin in O111:H8 different from other EHEC 2 clones, but the sequences for tir and sepZ are actually more similar to those from the EPEC 1 strain that harbors Int-α. Thus, the LEE in O111:H8 is a novel mosaic made up of divergent segments that differ from those found in other E. coli strains. One implication of these results is that the LEE has turned over recently either in O111:H8 or in its sister O26 clone. It is interesting that O111:H8 has a Tir-binding domain that is γ-like, whereas Tir itself is α-like. It remains to be investigated how modified LEE genes can function in a divergent LEE backbone, how such divergence influences pathogenesis, and how each LEE backbone is regulated in different genetic backgrounds.
Evidence for selection on intimin domains. Bacterial genes encoding proteins that are secreted or exposed on the cell surface characteristically show a high level of sequence polymorphism (19, 37). One hypothesis to explain the variation in exposed proteins is that diversifying selection accelerates the rate of amino acid substitutions, thereby generating new protein variants that are not recognized by the host immune system. The observation that the outer domains of intimin are highly immunogenic and highly polymorphic raises the possibility that diversifying selection promotes evolutionary change in these domains (1, 23). Alternatively, the constraints on amino acid substitution may be relaxed in the intimin EC domains relative to the TM domain (where hydrophobicity must be maintained in order to anchor the protein in the cell membrane) so that amino acid changes accumulate more rapidly in the EC domains under neutral evolution.
If the external domains of intimin are evolving under diversifying selection, then the rate of nonsynonymous substitutions should be higher than the rate of synonymous substitutions (pN - pS > 0). In this case, amino acid substitutions are favored by selection so that replacement substitutions accumulate at a higher rate than synonymous substitutions. Alternatively, if the domains evolve under neutrality, then the rates of pN and pS should be similar (pN - pS ≈ 0). The pattern of substitution for eae (Fig. 4) suggests that purifying selection predominates in intimin evolution over time, as synonymous changes outnumber nonsynonymous changes (pN - pS < 0), even over most of the EC domain. The rate of nonsynonymous substitution is higher in the external and PP domains than in the TM domain, indicating that amino acid changes are less constrained in the two end regions. Despite the prevalence of purifying selection, some evidence for positive selection over restricted regions was apparent both in the PP and EC domains (Fig. 4).
Although we predicted that diversifying selection would be apparent in the EC domains, we did not anticipate finding evidence that amino acid changes are favored in the PP region. The boundary of the domain was not clearly defined based on crystallography data (20), nor has its function been well characterized. Here the PP domain comprises the 186 amino acids at the N terminus of the conserved central domain; the central domain was defined by McGraw et al. (23) based on amino acid conservation between intimin and invasin and was delineated by the outermost two conserved amino acid residues that did not encompass any alignment gaps. Because the function of the domain is not clear, it is difficult to speculate why amino acid substitutions would be favored. It is plausible that it must evolve to interact with different or divergent proteins in the various E. coli strains that harbor the LEE island.
Although we found evidence of diversifying selection in the EC domains, the effect was not as great as we expected. A complicating factor in detecting diversifying selection in highly divergent genes such as intimin is that sites that are free to vary may have become saturated (have undergone multiple substitutions), so that further changes are obscured. Among the intimin sequences, most pairwise comparisons involve either very closely related sequences, where there are too few changes for statistical inference, or very distantly related sequences, where sites are likely saturated. To provide additional evidence for diversifying selection, we used another test that compares two classes of nonsynonymous substitutions: those that are conservative with respect to amino acid properties and those that are radical, where the substitution alters a residue characteristic such as charge or polarity. Natural selection may act on the amino acid replacements so that substitutions are not random with respect to a residue property. When pNC > pNR, substitutions are occurring in such a way as to maintain amino acid characteristics, whereas when pNR > pNC, substitutions that change residue property are occurring more frequently than expected by chance; the implication is that natural selection resists or favors changes in residue property, respectively.
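The distinction between radical and conservative replacements with respect to charge can be illustrated with a short sketch. It is a simplification and not the per-site method of Hughes et al. (13): it only counts observed replacements between two aligned protein sequences and does not normalize by the numbers of radical and conservative sites. The charge classes used here (H, K, and R positive; D and E negative; all others neutral) are an assumption for the example, as are the function names.

```python
# Assumed charge classes for illustration only.
POSITIVE = set("HKR")
NEGATIVE = set("DE")


def charge_class(residue):
    """Return the charge class of a one-letter amino acid code."""
    residue = residue.upper()
    if residue in POSITIVE:
        return "positive"
    if residue in NEGATIVE:
        return "negative"
    return "neutral"


def count_charge_changes(prot1, prot2):
    """Count (radical, conservative) replacements between aligned proteins.

    A replacement is radical if the two residues fall in different
    charge classes; identical residues and gap positions are skipped.
    """
    if len(prot1) != len(prot2):
        raise ValueError("sequences must be aligned to the same length")
    radical = conservative = 0
    for a, b in zip(prot1, prot2):
        if a == b or "-" in (a, b):
            continue
        if charge_class(a) != charge_class(b):
            radical += 1
        else:
            conservative += 1
    return radical, conservative


if __name__ == "__main__":
    # K->E changes charge; A->G and L->I do not; the gap is ignored.
    print(count_charge_changes("KDAL-", "EDGIV"))
```

Raw counts such as these are only a first look; a formal test compares per-site proportions so that the different numbers of possible radical and conservative changes are taken into account.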
The comparison of pNR and pNC not only provides evidence for selection but can also reveal the residue property that is under selection. Residue charge is important, as it is a major determinant in protein binding. Comparison of pNR and pNC suggests that natural selection may favor residue charge changes in surface proteins of pathogens as well as in host defense molecules. For example, some comparisons of the peptide binding region of major histocompatibility complex class I molecules suggest that natural selection favors replacements that alter residue charge (12): in the products of two class I loci, HLA-A and HLA-B, pNR significantly exceeded pNC. In addition, the pattern of charge changes differed between the two loci, and there was a correspondence between charge variation in each HLA protein and the peptides that it binds (11). Residue charge changes also appear to be favored in defensins (14), which are small, antimicrobial peptides that are secreted into the gut lumen in response to microbial invasion (9). Some surface pathogen proteins also show a propensity towards charge changes. In Plasmodium falciparum, amino acid replacements that altered charge occurred more frequently than conservative changes for four of five surface proteins (but not in four nonsurface proteins) (11). The implication from these studies is that charge changes in pathogen surface proteins are favored by selection because the changes allow the pathogen to escape host defenses.
The comparison of radical and conservative amino acid replacements in intimin suggests that the EC domains of intimin are under diversifying selection (Fig. 5). If the EC domains were evolving under neutrality, then amino acid replacements would be random with respect to charge and polarity. However, changes are not random with respect to either property. When radical amino acid replacements are defined as those that alter residue charge, radical changes outnumber conservative changes (pNR > pNC), suggesting that such replacements are selectively favored. However, when radical changes involve polarity differences, pNC exceeds pNR, so alterations in this residue property are not favored (Fig. 5). The propensity for charge-changing amino acid substitutions supports the hypothesis that diversification of intimin extracellular domains is driven by natural selection, perhaps as a means to escape immune surveillance in vertebrate hosts.
| ACKNOWLEDGMENTS |
|---|
The research was supported by grants from the National Institutes of Health.
| REFERENCES |
|---|
and intimin ß from enteropathogenic Escherichia coli. Infect. Immun. 66:5643-5649.
, ß, and γ intimins of pathogenic Escherichia coli. Mol. Biol. Evol. 16:12-22.