Phylogenetic evidence for horizontal transfer of an intervening sequence between species in a spirochete genus

The 23S rRNA genes (rrl genes) of some strains of certain species of the spirochete genus Leptospira carry an intervening sequence (IVS) of 485 to 759 bases flanked by terminal inverted repeats and encoding an open reading frame for a putative protein of over 120 amino acids. The structure and the sporadic distribution of the IVS suggest that it might be a mobile element that can be horizontally transferred within or between species. Phylogenetic hypotheses based on the sequences for six IVS open reading frames from various species were compared with hypotheses constructed by using DNA sequences from the 16S rRNA gene (rrs), which is not closely linked to rrl in this genus. The predicted phylogenies for the IVS and rrs differed in a major respect: one strain that claded with L. weilii in the tree based on the rrs data claded with L. noguchii in the tree based on the IVS data. Neither set of data supported a tree in which this strain was constrained to be in the same clade as was supported by the other set of data. This result indicates a probable horizontal transfer of the IVS from a recent ancestor of L. noguchii to a recent ancestor of one of the L. weilii strains. This observation is the first indication of horizontal transfer of elements encoded on the chromosomes of spirochetes.

An intervening sequence (IVS) with an extensive open reading frame (ORF) is observed in the 23S rRNA genes (rrl genes) of some but not all species of Leptospira and in some but not all strains in those species (22,23). This sporadic distribution of the IVS suggests the possibility of horizontal transfer. This possibility can be examined by comparing phylogenetic data from the IVS with phylogenetic data collected from elsewhere in the genome. If the phylogenetic hypotheses constructed for the IVS and for a gene located elsewhere in the genome are concordant, then lateral transfer is not supported. However, if the phylogenetic trees for the IVS and another gene are different, then this difference is circumstantial evidence for horizontal transfer.
Horizontal transfer between species has been observed in a large number of organisms and is now recognized as an important process in evolution (see, e.g., reference 27). Most examples of horizontal transfer of chromosomal genes have been in naturally transformable species (reviewed in reference 17). Some of the best examples are the transfer of antibiotic resistance genes on plasmids or transposons in bacteria (14) and the movement of retroelements (29). Other good examples include group I and II introns that seem to have moved between species by horizontal transfer, on the basis of remarkable similarities in sequence in divergent organisms (2,4,10,15,16,18,19,31).
Spirochetes may exchange genetic information relatively rarely. Sequence evidence from a spirochete, Borrelia burgdorferi, indicates clonality of the chromosome (5). No case of natural horizontal transfer of an element on a chromosome has been confirmed in these organisms. However, transfer of plasmids between species has been suspected (5), and it has been speculated that the unusual ends of the linear Borrelia plasmids have a common ancestry with certain eukaryotic viruses (12). Gene transfer in the laboratory has proved difficult although not impossible (24a, 28).
The IVS in rrl of Leptospira species showed all the hallmarks of a horizontally transferable element (22,23). We investigated the phylogeny of this element because it might show the first evidence of natural horizontal transfer between spirochete chromosomes.

MATERIALS AND METHODS
Genomic DNAs. Genomic DNAs from the strains listed in Fig. 1 were prepared by Phillipe Perolat, using a previously described method (21). The same strains were also acquired from David Miller (U.S. Department of Agriculture, Ames, Iowa).
PCR. PCR was performed on approximately 1 ng of genomic DNAs, using 500 nM primers, 1.25 U of Taq polymerase (Perkin-Elmer Cetus, Norwalk, Conn.), and 0.2 mM each deoxynucleoside triphosphate along with 2.5 µCi of 3,000-Ci/mmol [α-32P]dGJ7P in a volume of 50 µl of the recommended buffer plus 1.75 mM MgCl2. The temperature profile was 1 min at 94°C, 1 min at 50°C, and 2 min at 72°C for 35 cycles. Ten microliters of each PCR mixture was loaded on a 1% agarose gel (FMC, Rockland, Maine), and DNA products were separated by electrophoresis. All DNAs gave successful amplifications.
The IVS in rrl was amplified by PCR using the primers 23S-1150 and 23S-1432R (listed below) directed against conserved rrl sequences flanking the insertion. These primer sequences were located at bases 1147 to 1171 and bases 1466 to 1432, respectively, in the sequence under GenBank accession number X14249 (7). The rrs genes were amplified by PCR using the primers 16S-20 and 16S-1500 (listed below), which were located at bases 10 to 27 and bases 1492 to 1507, respectively, in the sequence under GenBank accession number X17547 (8).
A primer in the antisense strand is indicated with an R. The underlined sequences are restriction sites used for cloning.
Cloning. The IVS PCR product was cleaved with EcoRI and SalI, and the rrs product was cleaved with SacI and SalI; this was followed by ligation with T4 DNA ligase. The recognition sites for these enzymes were engineered into the oligonucleotides used in these PCRs and permitted directional insertion of the PCR products into the pBSK and pBKS polylinkers (Stratagene Inc., La Jolla, Calif.).
Sequencing. Single-stranded DNAs for sequencing were rescued by using the helper phage VCSM13 (Stratagene). DNA sequence determination was performed by using the Sequenase reagent kit (U.S. Biochemicals), various oligonucleotide primers, and α-35S-dATP (Amersham). The -40 primer (U.S. Biochemicals) and T7 primer (Stratagene) anneal to sequences flanking the insert in the Bluescript vector. The primers LEP16S-326 and LEP16S-696 (listed above) anneal to the sense strand of the 16S rRNA gene at nucleotides 326 and 696, respectively. LEP16S-910R and LEP16S-1199R anneal to the antisense strand at nucleotides 910 and 1199, respectively. In addition, primer IVSa was designed to hybridize to a sequence within the conserved ORFs of the IVSs and was used for sequencing the 3' ends of the sense strand of the ORFs and the 3' untranslated region. The resulting reaction products were separated by electrophoresis on 5% polyacrylamide-7 M urea gels. The gels were fixed in 10% glacial acetic acid-12% methanol, dried, and autoradiographed with XAR-5 film (Kodak). In all cases, at least two independent clones from each ligation were sequenced.
PCR of rrl was performed with the primers LEP23S-1295 and LEP23S-2567R. PCR products (4 µl) were cleaved without further processing with 1 to 5 U of each restriction enzyme in a reaction volume of 24 µl, using the manufacturers' recommended buffer and incubation temperature. Each 50-µl PCR amplification produced enough material for 12 restriction digests.
Electrophoresis. Three microliters of each restriction digestion mixture was mixed with 1 µl of Ficoll-dye loading buffer and electrophoresed on a 0.35-mm-thick 5% acrylamide native gel in 1x TBE (90 mM Tris-borate, 2 mM EDTA). After electrophoresis, the gel was dried on a vacuum dryer and autoradiographed for 12 to 72 h on Kodak X-Omat X-ray film.
Computer programs. The parsimony analysis package for the Macintosh was kindly supplied by D. L. Swofford (26). The program was run on a Macintosh IIci.
Nucleotide sequence accession numbers. The GenBank accession numbers for the sequences obtained here are U12669 through U12677.

RESULTS AND DISCUSSION
The sporadic occurrence of an IVS in some strains of some species of Leptospira might be explained by the occurrence of the IVS in a common ancestor and loss in some subsequent lines. However, the IVS could be a transposable element that is sometimes moved between strains and sometimes lost. To examine the possibility that the IVS has moved between strains, we compared the sequences of the IVS ORFs between species and within a species, selecting those IVSs that seemed to vary within species on the basis of mapped restriction site polymorphisms (MRSPs) (22,23). Such variation could be due to divergence or could reflect lateral transfer. In order to distinguish between these possibilities, we compared the phylogenetics of the IVSs with the phylogenetics of another gene, rrs, located many kilobases distant in the same genomes (9). Discordance between these phylogenies would be evidence for lateral transfer between strains.
VOL. 176, 1994
Phylogenetic tree for the IVS. We had previously sequenced four IVSs found in the rrl genes of certain strains of Leptospira (22,23). Two more sequences from L. weilii serovar worsfoldi strain Worsfold and L. weilii serovar celledoni strain Celledoni were obtained by PCR between conserved portions of rrl flanking the IVS. These IVSs were chosen for sequencing because MRSP data indicated that they seemed to be from strains within the L. weilii group but the IVSs seemed to be different from each other and from the closest relative we had already sequenced (reference 23 and data not shown). The alignment of the DNA and amino acid sequences from the six ORFs is presented in Fig. 1.
Phylogenetic hypotheses were constructed for these data. We used parsimony analysis because this may be the most effective strategy for constructing and assessing the confidence of a cladogram (26). First, a most-parsimonious tree for the ORF in the IVS DNA sequences was constructed by using the PAUP program (26). Only data from the DNA sequence in the ORF were used because this was the only part of the IVS that could be confidently aligned between species. This tree assumed that transition mutations would be twice as common as transversions. One method to ascertain the confidence of such a hypothesis is to perform repeated tree constructions from random samples of the data and build a consensus tree (6). This strategy allows an estimate to be made regarding the proportion of data supporting each node. When such a "bootstrap" tree was built (Fig. 2A), each branch was supported and the topology was the same as that of the most-parsimonious tree, lending considerable support to the phylogeny predicted. The major discrepancy between this phylogeny and the previously assigned species designations was the clading of the IVS from L. weilii WA52 with the IVS of L. noguchii NB36.
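The bootstrap idea described above can be illustrated with a minimal Python sketch. It is not the PAUP procedure used in this study: it assumes a toy four-taxon alignment with hypothetical names (t1 to t4), scores the three possible quartet topologies with unweighted Fitch parsimony rather than the 2:1 weighting, and reports the fraction of column-resampled replicates in which each topology is most parsimonious.

```python
import random

def fitch(tree, column):
    """Return (possible states, change count) for one column on a rooted binary tree."""
    if isinstance(tree, str):                      # leaf: state is observed
        return {column[tree]}, 0
    (left_set, left_cost), (right_set, right_cost) = (fitch(sub, column) for sub in tree)
    shared = left_set & right_set
    if shared:                                     # no change needed at this node
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, alignment, columns):
    """Sum Fitch change counts over the selected alignment columns."""
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in columns
    )

def bootstrap_support(alignment, topologies, replicates=100, seed=0):
    """Fraction of bootstrap replicates in which each topology is shortest (ties split)."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    wins = {label: 0.0 for label in topologies}
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        lengths = {label: tree_length(t, alignment, columns) for label, t in topologies.items()}
        best = min(lengths.values())
        winners = [label for label, length in lengths.items() if length == best]
        for label in winners:
            wins[label] += 1.0 / len(winners)
    return {label: count / replicates for label, count in wins.items()}

# Hypothetical toy data: every column groups t1 with t2 and t3 with t4.
toy_alignment = {"t1": "AAGGCC", "t2": "AAGGCC", "t3": "TTCCAA", "t4": "TTCCAA"}
quartet_topologies = {
    "t1t2|t3t4": (("t1", "t2"), ("t3", "t4")),
    "t1t3|t2t4": (("t1", "t3"), ("t2", "t4")),
    "t1t4|t2t3": (("t1", "t4"), ("t2", "t3")),
}
print(bootstrap_support(toy_alignment, quartet_topologies, replicates=50, seed=1))
```

With real data the candidate trees would come from a tree search (for example, branch and bound) rather than from an explicit list, and the per-clade percentages would be read from the consensus of the replicate trees.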
Phylogenetic hypothesis for the rRNA genes. To confirm species assignments, the rrs genes were investigated. PCR using primers directed toward highly conserved regions near the 5' and 3' ends of rrs and rrl generated products that could be used to perform restriction digestions. In a previous study 20 MRSPs had been collected from the rrs and rrl genes by this strategy (23). An additional 10 MRSPs were collected, and all are presented in Table 1.
The complete set of 30 MRSPs was analyzed by parsimony, using the assumption that during DNA sequence divergence the disappearance of a restriction site was more likely than the reappearance of the site. In evolutionarily neutral DNA the ratio of loss to reappearance of a four-base restriction site is about 4:1 (3). The exact ratio of gain to loss in a real sequence is difficult to assess and almost certainly varies widely between sites in the highly selected rRNA genes. Nevertheless, most sites are probably very rarely regained once they have been lost. We investigated a variety of gain-to-loss ratios, and there was little effect on the tree topology in the range from 5:1 to 2:1. Earlier evidence from DNA-DNA hybridization (24,30) and arbitrarily primed PCR (23) allowed us to constrain the tree to make each species monophyletic except L. weilii, for which we had evidence of significant intraspecific divergence (23). A bootstrap tree from 100 replications optimized by heuristic TBR (tree bisection-reconnection) was constructed on the basis of the MRSP data with a gain-to-loss weighting of 4:1 (data not shown). There were clear discrepancies in the positions of WA52, WA45, and NB36 between the cladogram in Fig. 2A and the cladogram constructed by using the MRSP data (not shown) that might indicate a horizontal transfer event. However, use of the MRSP data alone in a bootstrap consensus revealed that there was not enough information to support the discrepancy between the trees. For this reason more phylogenetically informative characters were sought.
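One way to read the gain-to-loss weighting is as an asymmetric step matrix in which regaining a restriction site costs more than losing it. The Python sketch below illustrates that reading with the Sankoff algorithm on a single presence/absence character. The rooted four-taxon tree, the taxon names A to D, and the site pattern are hypothetical, and the sketch does not claim to reproduce how the weighting was configured in the parsimony package used here.

```python
import math

def sankoff(tree, states, cost):
    """Per-state minimum cost of the subtree rooted at `tree` (Sankoff algorithm)."""
    n_states = len(cost)
    if isinstance(tree, str):                      # leaf: only the observed state is free
        return [0 if s == states[tree] else math.inf for s in range(n_states)]
    child_costs = [sankoff(child, states, cost) for child in tree]
    return [
        sum(min(cost[s][t] + child[t] for t in range(n_states)) for child in child_costs)
        for s in range(n_states)
    ]

def site_length(tree, states, gain=4, loss=1):
    """Weighted length of one restriction-site character (0 = absent, 1 = present)."""
    # cost[parent_state][child_state]: 0 -> 1 is a gain, 1 -> 0 is a loss.
    cost = [[0, gain], [loss, 0]]
    return min(sankoff(tree, states, cost))

# Hypothetical example: the site is present in A and B, absent in C and D.
example_tree = ((("A", "B"), "C"), "D")
example_site = {"A": 1, "B": 1, "C": 0, "D": 0}

print(site_length(example_tree, example_site, gain=4, loss=1))  # two losses are cheaper than one gain
print(site_length(example_tree, example_site, gain=1, loss=1))  # a single change suffices
```

Under equal weights the pattern is explained by one change, whereas under the 4:1 weighting the cheapest reconstruction uses two losses from an ancestor that carried the site, which is the behavior the weighting is meant to favor.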
Sequences of 16S rRNA genes. Sequencing of rrs was performed to obtain more phylogenetic data from this gene. A region spanning 1,465 bp was amplified by using the same primers as used for generating the MRSP data. The resulting products were cloned and sequenced (see Materials and Methods). A phylogenetic tree was constructed for these data and is presented in Fig. 2B.
A pairwise comparison of the genetic similarities between strains from the IVS-ORF and rrs sequence data is presented in Table 2. The rrs sequences are identical for L. weilii WA45 and WA52, yet WA52 carries an IVS that is much more similar to one found in L. noguchii NB36 than to the IVS found in WA45, consistent with the data in Fig. 2A and B.
Interestingly, WA45 and WA52 also differ in their rrs sequences compared with the other L. weilii strain, WB46, by more than they do compared with L. santarosai. Strains WA45 and WA46 also differ substantially in their arbitrarily primed PCR fingerprints (23). These strains had been grouped previously on the basis of serology, while strain WA52 had remained unclassifiable on this basis (20a). While the definition of species is somewhat arbitrary, particularly in organisms such as spirochetes that may behave largely clonally (5), it is clear that L. weilii consists of at least two distinct clades that may differ sufficiently to warrant designation as different species.
Phylogenetic trees of the Leptospira 16S rRNA genes were also constructed after adding the data of Fukunaga et al. (7) and Paster et al. (20) and the recent extensive data of Hookey et al. (13). The topologies of the taxa in Fig. 2A did not change. While we are very confident of our data, we found that some published sequence determinations were different from ours. These discrepancies did not change the topologies or the conclusions drawn.
It could be argued that it is not acceptable to use the combined rrs and rrl data to construct a cladogram because rrl is so intimately linked to the IVS that the rrl gene might be transferred along with the IVS. Below we explain the evidence indicating that this is unlikely. Thus, the MRSP data from rrl were combined with rrs sequence data in order to produce the most data available to date for relating the species. The resulting cladogram, shown in Fig. 3, is concordant with that presented in Fig. 2A. The apparent differences are due to the cladogram in Fig. 3 being rooted by using L. biflexa as an outgroup. Discrepancies between this phylogeny and previous published phylogenies (13,23) may be attributed to an increase in the number of informative characters used here.
Difference between the two observed tree topologies. The phylogenetic hypothesis for the IVS in Fig. 2A differs from that for rrs in Fig. 2B and 3. In particular, the location of strain WA52 differs in these trees. WA52 clades with NB36 in the tree constructed by using the IVS data and with WA45 in the tree constructed by using the rrs data. New phylogenetic hypotheses were generated in which the tree topologies were constrained to force WA52 to clade with L. weilii WA45 when the IVS data were used (Fig. 2C). Similarly, WA52 was constrained to clade with L. noguchii NB36 in a tree constructed by using rrs sequence data (Fig. 2D). These constrained trees are considerably less parsimonious than the trees constructed without forcing the topology. Tree lengths of 617 versus 646 steps were obtained for the IVS data, and lengths of 45 versus 63 steps were obtained for the rrs sequence data. Of the 105 tree topologies that are possible for six taxa, the artificially constrained trees constructed by using the IVS and rrs sequence data were longer than 15 and 43, respectively.
FIG. 2. ... Fig. 1. Each unrooted tree represents the consensus generated after 100 bootstrap replications of the Branch and Bound algorithm (26). BD50 is shown on the left of each cladogram to assist in comparing topologies. The percentage of bootstrap trees consistent with each clade is indicated at each node. The branch lengths between nodes are indicated below each branch. Transversions versus transitions were weighted 2:1. (A) Phylogenetic hypothesis for the ORF in the IVS. (B) Phylogenetic hypothesis for the 16S rRNA gene (rrs) based on sequence data. (C) Tree constructed by using the IVS-ORF sequence data with the constraint that WA45 and WA52 clade together. (D) Tree constructed by using the rrs sequence data with the constraint that WA52 and NB36 clade together.
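The figure of 105 possible topologies for six taxa agrees with the standard count of unrooted, fully bifurcating trees on n labeled taxa, (2n - 5)!! = 3 x 5 x ... x (2n - 5). A short Python check, with a function name chosen here purely for illustration:

```python
def unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees on n labeled taxa: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for factor in range(3, 2 * n_taxa - 4, 2):   # odd factors 3, 5, ..., 2n - 5
        count *= factor
    return count

for n in (4, 5, 6):
    print(n, unrooted_topologies(n))
```

For six taxa the product is 3 x 5 x 7 = 105, so an exhaustive comparison of every topology is feasible at this scale.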
The possibility of confusion among strains was eliminated by analysis of two further sets of the six strains, one repeat shipment from P. Perolat (New Caledonia, France) and a shipment from a separately maintained collection held by David Miller (U.S. Department of Agriculture, Ames, Iowa). MRSP data were acquired for rrs, rrl, and the IVS for each of these independently acquired strains (23). All of the strains gave restriction data entirely consistent with the strain designations and sequences we report here.
There is little previous evidence for lateral transfer of genes in spirochetes. Nevertheless, the data presented here are most simply explained by the horizontal transfer of the IVS from an ancestor of L. noguchii NB36 to an ancestor of L. weilii WA52. The alternative explanation, which is now much less likely, is that the sporadic distribution of the IVS is due solely to the loss in some lines of an IVS insertion that was present in the ancestor of all these strains.
The possibility that rrs was transferred from an ancestor of WA45 to an L. noguchii ancestor of WA52 is not supported by the following data. First, arbitrarily primed PCR data show that the whole genomes of WA45 and WA52 are very similar (23; data not shown). Second, there are five MRSPs in rrl that distinguish NB36 from WA45/WA52 (Table 1). Thus, any gene conversion event from WA45 to WA46 that involved not only rrs but also rrl would need to avoid conversion of the IVS embedded in rrl. One of the MRSPs is 150 bases from the site of IVS insertion; the most conservative interpretation of this is that the IVS is transferred alone or with very little flanking rrl sequence.
TABLE 1 (MRSP presence/absence rows not recoverable from the source text). Footnotes: The type strain in each species is designated profile A. The strains assigned to each MRSP profile are listed in Table 1 of reference 23. Five L. interrogans B strains and two L. meyeri strains. Serotype kambale strain Kambale has been assigned to L. kirschneri.
The indication of horizontal transfer between species of spirochetes that is provided by phylogenetic data leads to some interesting possibilities. The IVS is transcribed in the sense strand of the 23S rRNA, and the IVS is released after transcription (22). It is possible that the conserved ORF is involved in some step in the transfer process. However, other eubacterial 23S rRNA genes (rrl) that carry small insertions of about 90 to 112 bases that do not carry ORFs have been observed (1,11,25). Therefore, the ORF could have some other function. We have recently expressed a protein from this ORF in Escherichia coli (22a). It will be interesting to investigate this protein further and determine its mode of action. Ultimately, it may be possible to genetically engineer the IVS and use this sequence as an avenue to insert and express genes in the Leptospira genome. With suitable manipulation, the gene might even be transferable to other genera of spirochetes.
TABLE 2. ... Fig. 1. The number of differences in DNA sequence between each pair of strains was calculated for the IVS-ORF and rrs sequence data. The largest pairwise distances were 266 and 21 for the IVS and rrs data, respectively. These genetic distances were rescaled for each data set to cover a range from 1 to 0, with 1.0 indicating a perfect match and 0.0 indicating the largest pairwise distance observed. Rescaling allowed the similarities between strains measured for the IVS and rrs data to be directly compared and discrepancies between them to be noted more easily. The top number in every pair is the rescaled IVS genetic similarity, and the bottom number is the rescaled rrs similarity. The two greatest discrepancies between the pairwise genetic similarities for the IVS and rrs data are shown in boldface. These discrepancies involve strains WA52, WA45, and NB36 and are consistent with the indication of lateral transfer from parsimony analysis (Fig. 2).
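The rescaling described for Table 2 (1.0 for a perfect match, 0.0 for the largest pairwise distance in a data set) corresponds to the similarity 1 - d/d_max. The Python sketch below shows this under that reading. Only the maximum of 266 differences for the IVS data is taken from the text; the strain labels X, Y, and Z and the other counts are hypothetical placeholders.

```python
def rescale_similarity(distances):
    """Map pairwise difference counts to similarities in [0, 1] as 1 - d / d_max."""
    d_max = max(distances.values())
    if d_max == 0:                       # all sequences identical
        return {pair: 1.0 for pair in distances}
    return {pair: 1.0 - d / d_max for pair, d in distances.items()}

# Hypothetical difference counts; 266 is the largest IVS distance reported in the text.
ivs_differences = {("X", "Y"): 0, ("X", "Z"): 133, ("Y", "Z"): 266}

for pair, similarity in rescale_similarity(ivs_differences).items():
    print(pair, round(similarity, 3))
```

Because each data set is divided by its own maximum, the IVS and rrs similarities share a common 0-to-1 scale even though their raw difference counts differ by an order of magnitude.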