Journal of Bacteriology, November 2003, p. 6269-6277, Vol. 185, No. 21
0021-9193/03/$08.00+0 DOI: 10.1128/JB.185.21.6269-6277.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Seong Karp Hong,1 Hae Kyung Lee,2 Mi-Yeoun Park,2 and Yoon-Hoh Kook1*
Department of Microbiology and Cancer Research Institute, Institute of Endemic Diseases, SNUMRC, Seoul National University College of Medicine, and Clinical Research Institute, Seoul National University Hospital, Seoul 110-799,1 Laboratory of Rickettsial and Zoonotic Disease, Department of Microbiology, Korean National Institute of Health, Seoul 122-701, Korea2
Received 19 May 2003/ Accepted 11 August 2003
Recently, it was reported that DotA is secreted extracellularly (24). In addition, there is significant similarity between the amino acid sequence of DotA and that of TraY of plasmid ColIb-P9 in Shigella sonnei (18, 32, 43).
As a component of a type IV transporter, TraY is involved in the conjugal transfer of the plasmid. The similarity between the Dot/Icm proteins of L. pneumophila and the Tra/Trb proteins in the ColIb-P9 plasmid of S. sonnei suggests that the dot/icm genes may have originated from such a plasmid (32). In addition, sequence homologies of the Dot/Icm system with the chromosomal sequences of Coxiella burnetii (32, 44) have also been found. C. burnetii is an intracellular pathogen which causes Q fever and is evolutionarily close to L. pneumophila. The IncI1 plasmid conjugation system of S. sonnei might have been transferred into an unknown common ancestor of Legionella and Coxiella (18). These findings suggest that the evolutionary origin of the dot/icm genes in L. pneumophila is complicated.
In a previous study, the possibility that horizontal gene transfer or intraspecies recombination had occurred in L. pneumophila was raised (17) on the basis of analysis of partial dotA sequences (360 bp). Ninety-six strains of L. pneumophila were classified into six subgroups (four subgroups in Legionella pneumophila subsp. pneumophila and two subgroups in Legionella pneumophila subsp. fraseri) on the basis of both rpoB and dotA gene sequences. However, the phylogenetic relationships between the subgroups generated from the dotA sequences differed dramatically from those for the housekeeping rpoB sequences. A similar result was mentioned in the report of Bumbaugh et al. (6), in which they compared dotA and mip gene sequences. However, the results obtained were insufficient to elucidate the molecular origin or evolution of dotA in L. pneumophila. Thus, we undertook this study to find definite evidence for horizontal gene transfer or intraspecies recombination of dotA by comparing nearly whole dotA sequences from all serogroups (SGs).
TABLE 1. Reference strains of L. pneumophila used in this study
TABLE 2. Primers and their sequences used in this study
Sequence alignment. Raw sequences were analyzed and concatenated by DNASTAR (Madison, Wis.). Multiple alignments were accomplished with amino acid sequences inferred by using CLUSTAL X (39). Amino acid sequences were deduced with the MegAlign program of DNASTAR. Based on the alignments of deduced amino acid sequences, an aligned data set of nucleotide sequences was obtained.
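The last step, deriving a nucleotide alignment from the protein alignment, can be illustrated with a minimal Python sketch. It is not the DNASTAR or CLUSTAL X procedure used here; the function name `thread_codons` and the toy sequences are hypothetical, and the sketch assumes that each coding sequence is in frame and exactly three times the length of its ungapped protein.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an unaligned coding sequence onto its gapped protein alignment.

    Each residue in the protein alignment consumes one codon from the
    coding sequence; each gap character '-' becomes a three-base gap.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("coding sequence must be 3x the ungapped protein length")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy data (hypothetical, not dotA): strain_a lacks one residue.
aligned_proteins = {"strain_a": "MK-L", "strain_b": "MKTL"}
coding_seqs = {"strain_a": "ATGAAACTG", "strain_b": "ATGAAAACCCTG"}

codon_alignment = {
    name: thread_codons(aligned_proteins[name], coding_seqs[name])
    for name in aligned_proteins
}
```

Because gaps are inserted only in multiples of three, the resulting nucleotide alignment preserves the reading frame, which is what later codon-based comparisons rely on.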
Sequence analysis. Phylogenetic trees were inferred from amino acid and nucleotide sequences by using the parsimony methods in PAUP (version 4; Sinauer Associates, Sunderland, Mass.) and the midpoint rooting option. Phylogenies were also evaluated separately from the nucleotide sequences of three partitioned regions of dotA. The first region was the 5'-end region (residues 31 to 414), which corresponds to the region from the first to the third transmembrane domain (TM1 to TM3). The second region corresponded to the second periplasmic domain (PP2; residues 415 to 1944), and the third was the 3'-end region (residues 2767 to 3159), which spans TM8 and the fifth cytoplasmic domain (CP5) (see Fig. 1 and 2). Branch support values were evaluated with 500 bootstrap replications (12, 15).
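As an illustration of the partitioning, the following Python sketch slices an aligned data set into the three regions. It assumes that the residue numbers given above are 1-based, inclusive positions in the nucleotide alignment; the names `REGIONS` and `slice_region` and the placeholder alignment are hypothetical and are not part of the PAUP analysis.

```python
# Region boundaries as stated in the text (assumed 1-based, inclusive).
REGIONS = {
    "TM1-TM3 (5' end)": (31, 414),
    "PP2": (415, 1944),
    "TM8-CP5 (3' end)": (2767, 3159),
}


def slice_region(alignment: dict, start: int, end: int) -> dict:
    """Return the columns start..end (1-based, inclusive) of every sequence."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return {name: seq[start - 1:end] for name, seq in alignment.items()}


# Placeholder alignment: one artificial sequence long enough to cover
# position 3159. Real input would be the aligned dotA sequences.
toy_alignment = {"toy": "ACGT" * 790}

partitions = {
    label: slice_region(toy_alignment, start, end)
    for label, (start, end) in REGIONS.items()
}
region_lengths = {label: len(part["toy"]) for label, part in partitions.items()}
```

Under these assumptions the three partitions contain 384, 1,530, and 393 alignment columns, respectively, and each could then be analyzed as a separate data set.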
FIG. 1. Schematic representation of the deduced DotA amino acid sequences of the 15 reference strains of L. pneumophila. Alignment gaps are shown as open spaces (1 in IDR-A1, 9 in IDR-A2, 19 in IDR-A3, 1 in IDR-A4, 28 in IDR-B, and 1 each in IDR-C1 and IDR-C2). Subgroups (17) are given on the left of each SG. Deduced amino acids given at the top are those of SG 1 (ATCC 33152) after multiple alignment. TM1 to TM8, eight transmembrane domains; PP1 to PP4, four periplasmic domains; CP2 to CP5, four cytoplasmic domains; IDR, insertion-deletion regions. The first cytoplasmic domain (CP1) at the 5' end was excluded in this study because it corresponded to the position of primer 1F.
FIG. 2. Polymorphic sites within the dotA gene of L. pneumophila. The nucleotides at each of the polymorphic sites in dotA from SG 1 (ATCC 33152) are shown. Nucleotides shared with SG 1 are indicated by dots. A total of 2,304 nucleotide sites are the same in all sequences, and there are 180 gaps (not shown). The position of each polymorphic site within dotA is given above the sequences; the numbers are to be read downward (e.g., the first is position 34). Shaded regions, nucleotide sequences shared with SG 14; underlined regions, nucleotide sequences shared with SG 8. Sequence regions of SG 5 shared with subgroup P-III (SGs 2, 6, and 12) are boxed.
Nucleotide sequence accession numbers. The dotA sequences determined in this study have been submitted to GenBank under accession numbers AY194414 to AY194424.
The locations of the eight TM domains and gaps are indicated in Fig. 1. Three insertion-deletion regions, IDR-A (IDR-A1, -A2a, -A2b, -A3, and -A4), IDR-B (IDR-Ba, -Bb, and -Bc), and IDR-C (IDR-C1 and -C2), were found in the PP2, PP4, and CP5 domains, respectively. Strains belonging to the same subgroup, which had been previously classified from partial rpoB and dotA sequences (17), showed identical IDR patterns, except for SG 2 in P-III (Fig. 1). Figure 2 shows polymorphisms in the nucleotide sequence, excluding gaps (180 bp) in the aligned data set. The nucleotide polymorphism in the whole of the aligned sequences was 21.87% (645 of 2,949 nucleotides). Various levels of nucleotide sequence polymorphisms were observed in each domain (in the eight TM domains, 10.23% [48 of 469 nucleotides]; in the four CP domains, 19.65% [135 of 687 nucleotides]; and in the four PP domains, 25.77% [462 of 1,793 nucleotides]). However, the sequence polymorphism level of the amino acids was 18.64% (183 of 983 amino acids).
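The nucleotide percentages above follow directly from the reported counts, as the short Python sketch below recomputes. The counts are taken from the text; the variable and function names are illustrative only.

```python
# (polymorphic sites, total sites) per domain class, as reported in the text.
domain_counts = {
    "TM": (48, 469),
    "CP": (135, 687),
    "PP": (462, 1793),
}


def percent(polymorphic: int, total: int) -> float:
    """Polymorphism level as a percentage, rounded to two decimals."""
    return round(100 * polymorphic / total, 2)


domain_percent = {name: percent(p, n) for name, (p, n) in domain_counts.items()}

# The domain classes should add up to the whole aligned data set.
total_polymorphic = sum(p for p, _ in domain_counts.values())
total_sites = sum(n for _, n in domain_counts.values())
overall_percent = percent(total_polymorphic, total_sites)
invariant_sites = total_sites - total_polymorphic
```

The domain counts sum to 645 polymorphic sites out of 2,949, and the remaining 2,304 invariant sites match the number given in the legend to Fig. 2.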
Heterogeneous similarity patterns of sequence polymorphisms and identities were observed in different regions of dotA in SG 14, SG 8, and SG 15. The sequence of a region (from the 5' end to bp 416, including the TM1, PP1, TM2, CP2, and TM3 domains) of SG 14 was very similar (99.4% similarity) to the same region of SG 13 (Fig. 2A, shading). Another two regions, i.e., the region from bp 2529 to 2682 and the region from bp 2810 to the 3' end of the SG 14 sequence, were similar to those of SGs 2, 6, and 12 and to those of SGs 3 and 10 (Fig. 2B, shading), respectively. However, the dotA sequence of SG 14 in the other regions showed distinct nucleotide polymorphisms (Fig. 2).
The dotA sequence of SG 8 also showed heterogeneous similarity in different regions. Its sequence from TM1 to TM3 (from the 5' end to residue 402) was very similar (97.9 to 98.5%) to the corresponding sequences of SGs 3, 10, and 11 (Fig. 2A, underlining). However, SG 8 also showed a high sequence homology (99.1 to 99.4%) with SGs 7 and 13 in the PP2 domain (residues 804 to 1341) and had a sequence similarity of 97.7% with SG 11 in the CP5 domain (residues 2857 to 3117).
The SG 5 strain, which belongs to L. pneumophila subsp. fraseri, showed a dotA PP2 domain sequence (from residue 450 to 1035) similar (98.1 to 98.3%) to those of the L. pneumophila subsp. pneumophila strains representing SGs 2, 6, and 12 (Fig. 2A, box). However, it also had sequences similar to those of the SG 4 and 15 strains, which are strains of L. pneumophila subsp. fraseri, in the other regions.
Phylogenetic analysis. Phylogenetic relationships of L. pneumophila, inferred from almost-complete dotA nucleotide sequences and the deduced amino acid sequences, are shown in Fig. 3. Four subgroups (P-I to P-IV) of L. pneumophila subsp. pneumophila and two subgroups (F-I and F-II) of L. pneumophila subsp. fraseri, which were defined in a previous report (17), also occurred in trees based on the full dotA sequences. Although incongruence exists in the positions of SGs 3 and 10, the two phylogenies from the nucleotide and deduced amino acid sequences were similar. The clade of SGs 3 and 10, which was designated subgroup P-II, clustered with SGs 1 and 9 of subgroup P-I in the amino acid tree (Fig. 3B) but not in the nucleotide tree (Fig. 3A). Other incongruences, such as that in the position of SG 11 and that of the relationships among strains of subgroup P-III, also appeared in the two trees.
FIG. 3. The most parsimonious trees inferred from nearly complete dotA sequences. (A) Tree from the nucleotide sequences, which required 975 steps; CI = 0.834; RI = 0.918. (B) Tree from deduced amino acids, which required 281 steps; CI = 0.904; RI = 0.950. The midpoint rooting method was used to root the trees. Subgroups (17) are indicated by dotted vertical lines on the right. Branch lengths are proportional to changes in the nucleotides or amino acids. Branches supported by values higher than 50% in the bootstrap analysis (500 replications) are indicated.
FIG. 4. Gene phylogenies inferred from three regions of the dotA gene. These trees were constructed from nucleotide sequences by parsimony analysis. (A) One of the six most parsimonious trees (87 steps) inferred from the 5'-end regions corresponding to TM1 to TM3 (residues 31 to 414); CI = 0.851; RI = 0.930. (B) The unique parsimonious tree (531 steps) constructed from the PP2 domain of residues 415 to 1944; CI = 0.887; RI = 0.948. (C) One of the four most parsimonious trees from the 3'-end regions, corresponding to the TM8 and CP5 domains (residues 2767 to 3159); CI = 0.748; RI = 0.845. The branch lengths are proportional to changes in the nucleotides. The numbers on the branches are the percentages of support from bootstrap analysis (500 replications). The three strains that have different positions in the three phylogenies are shaded.
FIG. 5. Split graph showing the relationships among the 15 reference strains of L. pneumophila. The split graph was generated by using SPLITSTREE, version 3.1 (16), from the pairwise distances of the sequences of the dotA gene based on the Kimura two-parameter model. The fit value was 0.81, indicating that the phylogenetic signal in the data was represented moderately well by the split graph. The network indicates the lack of a treelike relationship between the dotA sequences. All branch lengths are drawn to scale.
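For readers unfamiliar with the distance measure named in this legend, the Kimura two-parameter distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The Python sketch below is a generic illustration, not the SPLITSTREE implementation; the function name `k2p_distance` and the choice to skip gapped or ambiguous sites are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (an assumption of this sketch). Saturated comparisons, for which the
    logarithm is undefined, raise ValueError from math.log or math.sqrt.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition (A -> G) in ten comparable sites.
d_example = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

A matrix of such pairwise distances is the kind of input from which a split graph is built.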
TABLE 3. Ratio of synonymous substitutions per synonymous site (dS) to nonsynonymous substitutions per nonsynonymous site (dN)
FIG. 6. Cumulative increases in synonymous (solid line) and nonsynonymous (dotted line) substitutions in dotA sequences. This graph was generated by using the SNAP program, obtained from an Internet website (http://www.mlst.net). In this analysis, all alignment gaps (30 in IDR-A, 28 in IDR-B, and 2 in IDR-C) were excluded; positions and numbers of codons are indicated below the x axis. The y axis indicates the cumulative number of nucleotides causing synonymous or nonsynonymous amino acid changes.
Discrepancy between trees based on nucleotide and amino acid sequences (Fig. 3) may suggest that several synonymous mutations have occurred at the same sites. Multiple hits at a single site, which cause a saturation of base substitution, can prevent one from inferring true evolutionary events. Such multiple hits may result in synonymous substitutions in the genes encoding functional proteins, which may obscure phylogenetic relationships (27). Considering the consistency and retention indices (CI and RI, respectively) of nucleotide and deduced amino acid data sets (Fig. 3), loss of phylogenetic signals in the nucleotide data set may be related to such multiple hits (34). The slight divergence of the deduced amino acid sequences of SGs 7, 8, 11, and 13, SGs 1 and 9 (subgroup P-I), and SGs 3 and 10 (subgroup P-II) may indicate that their divergence is recent. Also, the differentiation of serogroups within subgroup P-III may be recent, which suggests that the factors determining the serogroup of L. pneumophila may be restricted to a gene product (33) that seldom varies.
The phylogenetic positions of SGs 8, 11, and 14 did not coincide in the trees constructed with the sequences of different regions (Fig. 4). Sequence comparisons also indicated quite different similarities depending on the regions compared (Fig. 1 and 2). This inconsistency can be explained by intragenic recombination among the strains of L. pneumophila. In other words, the dotA gene of L. pneumophila may have a mosaic structure composed of segments with different histories, as has been demonstrated for intimins of pathogenic Escherichia coli (21). The result of network-like phylogeny by split decomposition analysis (Fig. 5) also supports the notion of intragenic recombination events in dotA. Because this analysis does not make an a priori assumption of a tree-like process of sequence divergence, conflicting phylogenetic signals in the data, such as evidence of recombination, will generate an interconnected network rather than a tree (1, 9, 35, 37).
There are good examples of bacterial genes composed of diverse segments with different histories, e.g., intimin in pathogenic E. coli (21, 38), the leukotoxin operon in Pasteurella species (8), the capsular biosynthetic locus and penicillin-binding protein in Streptococcus pneumoniae (7, 11), and the outer membrane protein (ompA) in Chlamydia species (22). Genes that exhibit such mosaic structures mainly encode proteins that either are extracellularly secreted, are exposed on the cell surface, or act as virulence factors (19). The mosaicism of dotA was suspected in a previous study, which used a portion of the dotA sequence (17), and this was supported by a report that DotA is a secretory protein (24). In addition, L. pneumophila has been reported to be naturally transformable (23, 36), and its competence makes it possible to exchange portions of genes naturally (14, 23).
Comparison of synonymous and nonsynonymous mutations has also shown that dotA does not have a homogeneous structure. A dS/dN ratio that exceeds 1 means that there is negative selection for amino acid change. On the other hand, a dS/dN ratio less than 1 indicates positive selection for amino acid substitution (10, 26). The highest dS/dN ratio of the TM domains in dotA of L. pneumophila (33.84) indicates that it is under a strong negative selective constraint. However, that of the PP domains was close to 1, and the ratio was lower than 1 in the PP2 domain. This means that the periplasmic regions in dotA are under strong positive pressure for amino acid change, or relaxed selective constraint. Interestingly, the PP2 domain shares little similarity with TraY of plasmid ColIb-P9 in spite of the overall similarity between dotA and traY (18, 43). This suggests that the PP2 domain has evolved in a different manner from the other regions of dotA. Thus, dotA is believed to have a mosaic structure due to transfer from two or more origins and to have experienced an extremely complicated evolutionary history even within a single domain.
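The interpretation rule used above can be written as a small Python sketch. Note that the ratio here is dS/dN, the inverse of the more commonly reported dN/dS. The function name `interpret_ds_dn` and the tolerance used to decide that a ratio is close to 1 are assumptions for illustration; only the TM value of 33.84 comes from this study.

```python
def interpret_ds_dn(ratio: float, tolerance: float = 0.1) -> str:
    """Classify a dS/dN ratio following the rule stated in the text.

    ratio > 1  -> negative (purifying) selection against amino acid change
    ratio < 1  -> positive selection for amino acid substitution
    ratio ~ 1  -> near neutral or relaxed constraint
    The tolerance around 1 is an arbitrary, hypothetical cutoff.
    """
    if ratio <= 0:
        raise ValueError("dS/dN ratio must be positive")
    if abs(ratio - 1.0) <= tolerance:
        return "near neutral or relaxed constraint"
    if ratio > 1.0:
        return "negative (purifying) selection"
    return "positive selection"


# The TM-domain ratio reported in this study.
tm_call = interpret_ds_dn(33.84)
```

A simple threshold of this kind does not test statistical significance; it only restates the direction of the inference drawn from each ratio.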
In addition, individual domains within dotA have heterogeneous structures. In a region within the PP2 domain, the sequence of SG 5 of L. pneumophila subsp. fraseri was similar to those of SGs 2, 6, and 12 of L. pneumophila subsp. pneumophila (Fig. 2). The sequence of the PP2 domain in SG 8 (residues 804 to 1452) was very similar to those of SGs 7 and 13, while they were clearly different in other regions (Fig. 2). Moreover, nonsynonymous substitutions did not increase linearly after amino acid residue 520 in the PP2 domain (Fig. 6). Therefore, it must be the case that the dotA gene of L. pneumophila has been exposed to high recombinational pressure.
DotA, as mentioned above, is a secreted protein, which is assembled into a ring-shaped structure with a central channel. It has been hypothesized that the conserved TM domains of DotA play an important role in assembly that is necessary for secretion (24). The heterogeneous characteristics of the PP2 domain sequences may affect the structure of DotA. However, little is known about the secretion of DotA, though the high rate of amino acid change and frequent recombination events in the PP2 domain may be related to the secretion mechanism.
In conclusion, this study shows that the dotA gene of L. pneumophila has a complex mosaic structure produced by multiple intragenic recombinations. The PP2 domain, the largest periplasmic domain of DotA, exhibits the highest variability and shows a strong positive selection for amino acid substitution. Amino acid substitutions of the virulence gene can affect the fate of intracellular pathogens. DotA affects the ability of L. pneumophila to prevent phagolysosome fusion and to survive within macrophages (4). Thus, the rapid evolution of dotA via multiple recombination and frequent nonsynonymous mutations has provided L. pneumophila with increased fitness in certain environmental niches, such as within a particular biofilm community or species of amoebae, by generating novel antigenic variations at surface-exposed sites.
Present address: Infectious Disease Research Institute, Asian-Pacific Research Foundation for Infectious Diseases (ARFID), Seoul, Korea.
α, β, and γ intimins of pathogenic Escherichia coli. Mol. Biol. Evol. 16:12-22.