Journal of Bacteriology, November 2004, p. 7521-7528, Vol. 186, No. 22
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.22.7521-7528.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Department of Molecular Microbiology,1 Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri,3 Department of Clinical Pathology, St. Mary's Hospital, Catholic University Medical College, Uijungbu, Korea,2 Institute of Biotechnology,4 Clinic of Gastroenterology, Vilnius University Hospital, Vilnius, Lithuania5
Received 2 July 2004/ Accepted 29 July 2004
Each of the four known species of IS elements in H. pylori (IS605, IS606, IS607, and ISHp608) belongs to the distinctive IS605 mobile element family, and each seems to be chimeric, containing two transposition-related genes, orfA and orfB, that may have different phylogenetic origins (12, 14, 16). The IS605 element family is divisible into two subfamilies based on orfA homologies; in one subfamily, represented by IS607, orfA encodes a putative serine recombinase that helps IS607 transpose to multiple sites (in contrast, most serine recombinases mediate site-specific recombination), and in the other subfamily, represented by IS605, IS606, and ISHp608 (28 to 36% amino acid identity for sequences encoded by orfA genes), orfA encodes a transposase related to that encoded by IS200, a single-gene element that is widespread in enteric species (5, 19, 25) whose product is distinct from serine recombinase proteins. The proteins encoded by orfB genes of IS605 family members exhibit 25 to 35% amino acid identity to one another; their homologs in other species are often annotated in sequence databases as putative transposases, and they are also homologs of the protein encoded by gipA, a Salmonella prophage gene that enhances bacterial growth in Peyer's patches (22).
IS605, IS607, and ISHp608 each have been found to transpose in Escherichia coli (12, 14, 16). With each of the two elements tested (IS607 and ISHp608), transposition depended on orfA and not on orfB. Hence, the constant presence of orfB in IS605 family members suggested either involvement in transposition in certain species or a contribution to bacterial fitness (11a). Inspection of sequences at sites of insertion in H. pylori and E. coli indicated that (i) IS605, IS606, and ISHp608 insert with their left (orfA) ends immediately downstream of specific AT-rich sequences (5'-TTTAA or 5'-TTTAAC for IS605, 5'-TTTAT for IS606, and 5'-TTAC for ISHp608), and their right (orfB) ends seem to join to target DNAs nonspecifically; (ii) in contrast, IS607, whose orfA transposase gene is unrelated to the orfA genes of the other elements, inserted preferentially between adjacent GG nucleotides in target DNA; (iii) none of these elements duplicated or deleted sequences at sites of insertion; and (iv) none contained terminal inverted repeats (12, 14, 16). The H. pylori ISHp609 element described here is nonrandomly distributed geographically and is unique among elements of the IS605 family in containing four open reading frames instead of the usual two.
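The target preferences summarized above (left-end insertion immediately downstream of 5'-TTTAA or 5'-TTTAAC for IS605, 5'-TTTAT for IS606, and 5'-TTAC for ISHp608, and insertion between adjacent GG nucleotides for IS607) can be illustrated with a short Python sketch. It is only an illustration and was not part of this study: the function names, the motif table, and the example sequence are hypothetical, only the strand supplied is scanned, and a motif match marks a potential site rather than an observed insertion.

```python
# Left-end target motifs as listed in the text; insertion is
# immediately downstream (3') of the motif. The table layout is an
# assumption made for this sketch.
LEFT_END_MOTIFS = {
    "IS605": ("TTTAA", "TTTAAC"),
    "IS606": ("TTTAT",),
    "ISHp608": ("TTAC",),
}


def candidate_left_end_sites(target, element):
    """Return sorted 0-based positions immediately downstream of each
    occurrence of the element's target motif on the given strand."""
    target = target.upper()
    sites = set()
    for motif in LEFT_END_MOTIFS[element]:
        start = target.find(motif)
        while start != -1:
            sites.add(start + len(motif))
            start = target.find(motif, start + 1)
    return sorted(sites)


def candidate_is607_sites(target):
    """Return 0-based positions lying between adjacent GG nucleotides
    (reported as the index of the second G)."""
    target = target.upper()
    return [i + 1 for i in range(len(target) - 1) if target[i:i + 2] == "GG"]


# Hypothetical target sequence, used only to exercise the functions.
example = "ccgTTTAACggaTTACtttat"
for name in LEFT_END_MOTIFS:
    print(name, candidate_left_end_sites(example, name))
print("IS607", candidate_is607_sites(example))
```

A fuller treatment would also scan the reverse complement and would compare candidate positions against sequenced empty sites, as was done for the elements described in the references cited.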
Specific PCRs were carried out in 20-µl mixtures containing 5 to 10 ng of DNA, 0.1 U of Taq polymerase (Biolase; Midwest Scientific, St. Louis, Mo.) or the Expand High Fidelity Taq-Pwo polymerase mixture (Boehringer-Mannheim, Indianapolis, Ind.), 2.5 pmol of each primer, and each deoxynucleoside triphosphate at a concentration of 0.2 mM in a standard buffer for 30 cycles with the following cycling parameters: denaturation at 94°C for 30 s, annealing at a temperature appropriate for the primer sequence (generally 50°C) for 30 s, and DNA synthesis at 72°C for an appropriate time (1 min per kb). PCR primers are listed in Table 1.
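The cycling parameters just described can be written out as a small Python sketch that derives the extension time from the expected product size by the 1 min per kb rule. The function names and the dictionary layout are hypothetical conveniences for illustration; the temperatures, step times, and cycle number are those given in the text, and the 50°C annealing default is only the "generally" used value.

```python
def extension_seconds(amplicon_bp, seconds_per_kb=60):
    """Extension time at 72 degrees C, assuming 1 min per kb of product."""
    return amplicon_bp * seconds_per_kb / 1000


def cycling_program(amplicon_bp, annealing_c=50, cycles=30):
    """Build the three-step program described in the text.

    Each step is (name, temperature in degrees C, duration in seconds).
    """
    return {
        "cycles": cycles,
        "steps": [
            ("denaturation", 94, 30),
            ("annealing", annealing_c, 30),
            ("extension", 72, extension_seconds(amplicon_bp)),
        ],
    }


# Example: a product of about 2.4 kb, roughly the size of a full-length element.
program = cycling_program(2400)
for name, temp_c, secs in program["steps"]:
    print(f"{name}: {temp_c} C for {secs:g} s")
```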
TABLE 1. Primers
DNA sequence editing and analysis were performed with programs in the GCG package (Genetics Computer Group, Madison, Wis.), programs and data in the H. pylori genome sequence databases (2, 24), and BLAST and Pfam (version 14.0) homology search programs (http://www.ncbi.nlm.nih.gov/BLAST/BLAST.cgi; http://pfam.wustl.edu/hmmsearch.shtml).
Phylogenetic analysis. H. pylori OrfA and OrfB IS element protein sequences were aligned with CLUSTALX by using a PAM250 amino acid substitution matrix. OrfA and OrfB neighbor-joining phylogenies were constructed by using the Jones-Taylor-Thornton distance matrix and variable rates among sites (modeled with a gamma distribution shape parameter, α = 0.5). Gaps in the protein alignments were deleted in pairwise comparisons. Phylogenetic analysis and DNA diversity calculation were done with Mega2.1 (www.megasoftware.net).
Bacterial strains and plasmids. Most H. pylori strains used were obtained from the Berg laboratory collection and have been described in detail elsewhere (4, 13, 15, 16, 20). The H. pylori strains used in this study included 71 Peruvian strains, 24 Spanish strains, 47 Lithuanian strains from Vilnius, 28 African strains (16 strains from Soweto in South Africa and 12 strains from Gambia), 69 Indian strains (48 strains from Calcutta, 15 strains from Chennai, and 6 strains from the Santal tribe), 71 Japanese strains (24 strains from Ube and 47 strains from Fukui), 46 strains from Alaskan natives, 47 Chinese strains (40 strains from YunNan and 7 strains from Changle), and 47 Korean strains.
Plasmids carrying an intact ISHp609 element marked with chloramphenicol resistance downstream of the orfB stop codon were constructed, and transposition assays in E. coli were carried out as described previously for ISHp608 (16).
GenBank accession numbers of DNA sequences. The sequence of ISHp609 from strain HUP-B43 corresponds to nucleotides 14337 through 16733 of the 33,671-nucleotide sequence of the plasticity zone in GenBank accession number AY487825. The other three full-length ISHp609 elements were accessioned in GenBank as follows: from strain HUP-B79, nucleotides 287 through 2684 of the 3,084 nucleotides in accession number AY639112; from strain LitA7, nucleotides 381 through 2794 of the 2,875 nucleotides in accession number AY639110; and from strain LitA38, nucleotides 20 through 2417 of the 2,603 nucleotides in accession number AY639111. The full length of the rare type of ISHp609 element (ISHp609var) from strain Chen4 corresponds to nucleotides 313 through 2346 of the 2,496 nucleotides in accession number AY639119. The GenBank accession numbers for partial ISHp609 sequences (all but 24 nucleotides on the left and 34 nucleotides on the right due to the positions of PCR binding sites) were as follows: strain HUP-B80, AY639113; Alaska64, AY639115; Alaska97, AY639116; AfricaR48, AY639114; I-86, AY639117; and SJM27, AY639118.
FIG. 1. Structures of ISHp609 and related elements. (A) Full-length predominant ISHp609 type with four characteristic open reading frames and 99% DNA identity, independent of geographic origin. Sequence analysis of 10 full-length elements from Spanish strains (HUP-B43, HUP-B79, and HUP-B80; accession numbers AY487825, AY639112, and AY639113), Lithuanian strains (Lit-7 and Lit38; accession numbers AY639110 and AY639111), Alaskan strains (Al64 and Al97; accession numbers AY639115 and AY639116), a Peruvian strain (SJM27; accession number AY639118), an Indian strain (I-86; accession number AY639117), and an African strain (R48; accession number AY639114) showed limited internal divergence. In strain Lit7 orfA contained a frameshift due to a 14-bp duplication; in strain HUP-B79 orfA contained an in-frame stop codon due to a G-to-T substitution; and in strain I-86 orf1 contained an in-frame stop codon due to a C-to-T substitution. (B) ISHp609var. This rare variant element was found in one Indian strain (Chennai4; accession number AY639119) with 81% DNA identity to the predominant type. ISHp609var has full-length orfA and orfB genes, although orfB is probably inactive due to a frameshift. It lacks orf2 and most of orf1, and it has the first 79 bp of orf1, but there is no start codon at its left end. (C) orf1-2 remnant. This DNA segment consists of orf1 (jhp960 in strain J99), orf2 (jhp961 in strain J99), and the first 37 bp of orfA and is located between homologs of jhp959 and jhp962 in most or all strains (as determined by PCR with primers jhp959 and jhp962, primers jhp959 and 609.R6, and primers FlankL and jhp962).
FIG. 2. Phylogenetic relationships among H. pylori IS elements. (A) OrfA and homologs. Two subfamilies were identified, one represented by OrfAs of IS605 (accession number NP_208326), IS606 (accession number AAD11513), and ISHp608 (accession number AAL06576), which are not considered to encode serine recombinases based on amino acid homologies, and the other represented by IS607 (accession number AAF05600), ISHp609 (accession number AAR83266.1), ISHp609var (accession number AY639119), and the closest homolog in T. tengcongensis (T. teng.) (tte0714; accession number AAM23976), which are thought to encode serine recombinases. Branches with significant bootstrap support (≥50) are indicated. Bar = 1 amino acid substitution per site. (B) OrfB and homologs. OrfBs of H. pylori IS605 (accession number NP_208324), IS606 (accession number AAD11514), IS607 (accession number AAF05601), ISHp608 (accession number AAL06577), and Salmonella's GipA protein (accession number NP_752781) form an OrfB subfamily different from that of ISHp609 (accession number AAR83267.1), ISHp609var (accession number AY639119), and the closest homolog in T. tengcongensis (tte0715; accession number AAM23977). (C) Partial C-terminal sequence alignment of H. pylori IS element OrfBs, GipA (22), and the corresponding sequence in T. tengcongensis. A single C-terminal Zn(II) binding tetracysteine motif, CX(2)CX(15)CX(2)C (C4-type zinc finger), is well conserved among IS605, IS606, IS607, and ISHp608 OrfBs and GipA and might facilitate DNA or RNA binding or protein-protein interaction. Notably, this motif is not present in ISHp609 OrfB (both predominant and variant).
Two short coding sequences in ISHp609, orf1 (85 codons) and orf2 (61 codons), closely matched two genes of reference strain J99 whose functions are unknown, jhp960 (99% DNA identity and 100% protein identity) and jhp961 (98% DNA identity, 96% protein identity, and 98% protein similarity). These open reading frames are located between jhp959 and jhp962 in strain J99, not within jhp928, but they are absent from the other fully sequenced genome (strain 26695) (24). Although annotated as a gene whose function is unknown, orf1 belongs to a gene family whose protein products generally contain a characteristic ligand binding domain next to a helix-turn-helix DNA binding domain (Pfam domain 03681, UPF0150; 14 to 68 bp; E value, 1e-10), and the protein encoded by orf2 exhibits a conserved predicted periplasmic or secreted lipoprotein motif throughout its length (61 amino acids; cluster of orthologous groups, COG1724; E value, 6e-13). Small genes related to orf1 and orf2 are found in many bacterial species, in some cases together (for example, ssr1765 and ssr1766 in Synechocystis sp. strain PCC 6803 [GenBank accession numbers BAA16930 and BAA16931], with 46% amino acid identity and 70% amino acid similarity to orf1 and 35% amino acid identity and 38% amino acid similarity to orf2, respectively). However, most homologs of either orf1 or orf2 occur singly and are not linked to a homolog of the other gene in other bacterial species. No orfA or orfB homologs were found next to ssr1765 and ssr1766 in Synechocystis, nor were orf1 or orf2 homologs found next to the orfA and orfB homologs (tte0714 and tte0715) in T. tengcongensis.
ISHp609 elements and adjacent DNAs from three additional strains (strains HUP-B79, LitA7, and LitA38) were sequenced by primer walking on genomic DNAs in order to better understand the structure of the element and its insertion specificity. All three elements were 2.4 kb long, contained four open reading frames (orf1, orf2, orfA, and orfB) matching the open reading frames found in strain HUP-B43, and were 99% identical in DNA sequence to one another. A comparison of sequences flanking ISHp609 with corresponding empty sites in the 26695 and/or J99 genome showed that the element was located at a different genetic locus in each strain (Fig. 3A). These results, coupled with PCR data (see below), indicate that ISHp609 is transposable and that it can insert into many genomic sites. Sequence comparisons also identified the probable termini of the element and its insertion specificity. Based on data summarized in Fig. 3A, the ends of ISHp609 seemed to be 5'-TAT or CAT on the left and 5'-CAT on the right. The left end was inserted preferentially next to 5'-TAT, whereas there seemed to be no target specificity for the right end, and there was no evidence of target sequence duplication or deletion during insertion.
FIG. 3. Terminal sequences of ISHp609, ISHp609var, and the orf1-2 remnant and their sites of insertion in H. pylori. ISHp609, ISHp609var, and orf1-2 remnant termini are in uppercase type. Flanking DNA and empty sites in reference strains 26695 and J99 (gene designations beginning with hp and jhp, respectively) are in lowercase type. (A) ISHp609 predominant type. (B) ISHp609var type. Terminal sequences were extrapolated from a comparison with sequences of the predominant ISHp609 type, because its flanking sequences did not have homology with known H. pylori sequences and therefore the site of insertion (empty site) was not known. (C) Predicted left end of the orf1-2 remnant and its flanking sequences. Sites of insertion could not be determined precisely due to local sequence heterogeneity in strains lacking this element (intergenic region between jhp959 and jhp962) (see Fig. S2 in the supplemental material). (D) Predicted right end of the orf1-2 remnant compared to the corresponding region of orfA in the ISHp609 sequence.
99% identical in DNA sequence to other elements, regardless of the geographic origin. The ratio of synonymous to nonsynonymous changes in the four open reading frames ranged from 3.2 to 7.0 (average, 5.0) (Table 2), values that are typical of H. pylori housekeeping genes (1, 9).
TABLE 2. DNA divergence in ISHp609
In further experiments, a plasmid clone of ISHp609 from strain HUP-B43, marked with a chloramphenicol resistance determinant, was used to select for transposition to the F factor pOX38 in E. coli in a standard mating-out assay, essentially as done previously with IS607 and ISHp608 (14, 16). However, no transposition (<10⁻⁹) of this marked ISHp609 element was detected in any of four repetitions of this assay. In contrast, IS607 and ISHp608 transposed at frequencies of about 10⁻⁷ in equivalent assays (14, 16).
ISHp609 geographic distribution and structural analysis. The frequency of ISHp609 carriage in various H. pylori populations was estimated by PCR with 479 strains by using two orfB-specific primers (primers 609.F1 and 609.R1). ISHp609 was found in 35 to 40% of the strains from Europe (Spain and Lithuania), in 20 to 40% of the strains from the Americas (Peru, Guatemala, and Alaska), and in 10 to 15% of the strains from India and Africa but in only 1% of the strains from East Asia (Table 3). All 479 strains, including 68 additional strains from India, were tested for the variant type (ISHp609var) that had been found in one Indian strain by using specific primers (primers 609t2-1 and 609t2-2). No additional strains harboring this variant were found. This seemed to be reminiscent of the single case of a rare type 3 ISHp608 element found in one Indian strain but not in 116 other Indian strains or in 606 strains from other regions (16).
TABLE 3. Distribution of ISHp609 in different geographic regions
The connection between jhp959 and jhp962 was highly conserved among all H. pylori strains, independent of the geographic origin; PCR with primers jhp959 and jhp962 showed that jhp959 and jhp962, which flanked the orf1-2 remnant whenever it was present, were next to one another in at least 96% of the orf1-2 remnant-free strains. This included reference strain 26695, in which the hp422 gene, a close homolog of jhp962 (100% protein identity), is adjacent to hp423, a distant homolog of jhp959 (49% protein identity and 64% protein similarity).
To try to define a precise insertion site of the orf1-2 remnant, we sequenced the putative empty site region in 15 strains that lacked this remnant. Here too jhp959 sequences and the intergenic region leading to jhp962 were highly divergent, both in length and in base substitution differences (25 to 30% DNA divergence) (see Fig. S2 in the supplemental material). Consequently, no unique empty site sequence could be discerned. The reason for this extreme diversity is not known. One possible explanation involves slipped strand mispairing during normal DNA replication; an alternative explanation involves deletion of an element and error-prone gap repair.
Other deletions in ISHp609 were also common; up to one-half of ISHp609 elements were truncated, mostly from the right end, terminating at different points in orfB or orfA sequences (Table 2). A distinct 0.9-kb internal deletion due to recombination between 11-bp direct repeats (CCTT[T/G]CTAAAA) located in orf1 and the 3' end of orfA was found in ISHp609 in three of nine Peruvian strains and two of nine Spanish strains (this study). This sequence matched a separately reported sequence from a Costa Rican strain (clone CR2 in reference 6; accession number AF326626). This 0.9-kb deletion was not found in any ISHp609 element from 15 Lithuanian, 10 Alaskan, 8 Guatemalan, 4 Indian, 3 African, and 2 Korean strains. In all six cases the ISHp609 element with the 0.9-kb internal deletion was inserted at the same chromosomal site that was defined by PCR (with primers CR2-LF and CR2-RF). Because the 0.9-kb deletion is less widely disseminated than the orf1-2 remnant, it may have arisen more recently, but it also may have been spread by interstrain recombination. Full-length ISHp609 sequences from two other strains (one Lithuanian strain and one Guatemalan strain) were found in the same location as the elements containing the 0.9-kb internal deletion.
PCR was used to test for ISHp609 at each of the other four chromosomal sites identified by sequencing (Fig. 3A) (locations in strains HUP-B43, HUP-B79, LitA7, and LitA38) and also the jhp959-jhp962 location of the orf1-2 remnant (with primers in Table 1) in 60 ISHp609-positive strains from Spain, Peru, Guatemala, Lithuania, Alaska, Africa, India, and Korea. ISHp609 was found at the HUP-B43 site (in the jhp928 homolog) in two of four ISHp609-positive Indian strains but not in the other 56 ISHp609-positive strains from various countries. No other strain was found to carry ISHp609 at the specific sites containing this element in strains LitA7, LitA38, and HUP-B79. Also, no full-length ISHp609 element was found between jhp959 and jhp962, the site normally occupied by the orf1-2 remnant.
ISHp609 was more abundant in strains from Europe, the Americas, and South Asia (10 to 40%) than in strains from East Asia (1%) (Table 3). This is reminiscent of the ISHp608 distribution (16). The difference between East Asian and other strains in terms of the relative abundance of the ISHp608 and ISHp609 elements is in accord with the major geographic differences in the H. pylori gene pool, as determined by PCR or multilocus sequence typing of virulence-associated and ordinary housekeeping genes (1, 9, 15, 26, 28), and also with the geographic differences in the sequences of orfA and orfB from IS605 and IS607 elements (11a). This geographic partitioning can be ascribed to H. pylori's preferential transmission within families and local communities rather than in sweeping epidemics (10, 21).
The sequences of ISHp609 elements from various geographic regions are more closely related to one another (99%) than normal chromosomal gene sequences from the same populations are. This is reminiscent of patterns seen with IS elements of natural isolates of E. coli, where the sequences within each IS element type are also more highly conserved than the sequences of chromosomal genes are (11). This, plus the frequent occurrence of elements on bacterial plasmids, suggested that E. coli IS elements were spread easily among bacterial lineages by conjugation and transposition (18). Such easy spread would have allowed little time to accumulate neutral variation and would have resulted in the observed sequence homogeneity. Most strains of H. pylori are readily transformable, and extensive interstrain recombination is evident in the gene pool (1, 8, 13, 23), although it is restricted geographically because transmission is highly localized. The distribution and sequence uniformity of ISHp609 suggest that this element entered the H. pylori gene pool recently on the evolutionary time scale (perhaps after separation of East Asian strains from other H. pylori strains) and spread rapidly, leading to its present abundance in some regions (up to 40% of the strains). This suggests that there was selection for ISHp609 carriage, stemming from a significant contribution to H. pylori host fitness and/or strong molecular drive (selfish DNA). The similar sequence uniformity and distribution of the orf1-2 remnant (orf1, orf2, and 5' end of orfA) also favor a selection model, but one in which orf1 and/or orf2 contributes to fitness. The abundance of truncated (inactive) ISHp609 elements also suggests such dynamics, but with selection for mutations that rendered many elements inactive. This is reminiscent of the dynamics of P elements in Drosophila (17).
The ISHp609 gene(s) that mediates transposition has not been identified experimentally because we could not, in several tries, detect this element's movement in E. coli. The many genomic locations of ISHp609 in H. pylori populations imply that there is easy transposition in this host and thus perhaps involvement of a host factor that E. coli K-12 lacks. The constancy of the genomic positions of two ISHp609 deletion variants (orf1-2 remnant and 0.9-kb internal deletion) suggests that neither orfB (intact in the 0.9-kb deletion element) nor orf1 and orf2 (intact in the orf1-2 remnant) are sufficient for ISHp609 transposition. Rather, orfA of ISHp609 may be needed for movement of this element, as is the case with orfA homologs of IS607 and ISHp608 (14, 16).
Other questions concerning transposition mechanisms emerged from the analysis of ISHp609 sequences. The OrfA phylogenies (Fig. 2A) illustrate two distinct subfamilies, one represented by the orfA genes of IS607 and ISHp609, which appear to encode serine recombinases, and the other represented by the orfA genes of IS605, IS606, and ISHp608, which encode another type of transposition-associated function. Despite ISHp609 orfA's homology to IS607's putative transposase gene, its insertion specificity (downstream of 5'-TAT) seems to be more closely related to that of the IS605-IS606-ISHp608 subfamily (downstream of 5'-TTTAA or 5'-TTTAAC, 5'-TTTAT, and 5'-TTAC) than to that of IS607 (between GG). Further studies are needed to determine if this stems from functional differences in IS element-encoded proteins or in host factors with which they interact. In terms of orfB phylogenies (Fig. 2B), the orfB gene of ISHp609 is more distant from the gipA Salmonella virulence gene than the genes of other IS605 family members are. Therefore, together with its homolog in T. tengcongensis, this gene could be considered a member of a distinct subfamily. Most striking in this regard is the C-terminal tetracysteine Zn(II) binding motif CX(2)CX(15)CX(2)C (C4-type zinc finger) that could facilitate DNA or RNA binding or protein-protein interaction. This motif is found in GipA and the orfB products of IS605, IS606, IS607, and ISHp608, but it is absent from the ISHp609 orfB product (both predominant and rare variant types) and is also absent from the close homolog in T. tengcongensis (Fig. 2C). Although the functions of IS element orfB genes are not known, the zinc finger motif difference between OrfB of ISHp609 and the other elements hints at possible differences in the interactions with nucleic acids or other cellular constituents, which in turn might affect transposition or control of host gene expression.
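The tetracysteine pattern CX(2)CX(15)CX(2)C discussed above can be expressed as a regular expression, as in the following Python sketch. This illustrates the pattern only and is not the alignment-based comparison shown in Fig. 2C: the function name and both test sequences are hypothetical, the spacing is taken as exactly 2, 15, and 2 residues, and the whole sequence is searched rather than only the C terminus.

```python
import re

# CX(2)CX(15)CX(2)C: Cys, any 2 residues, Cys, any 15 residues,
# Cys, any 2 residues, Cys (23 residues in total).
C4_MOTIF = re.compile(r"C.{2}C.{15}C.{2}C")


def find_c4_motifs(protein):
    """Return (0-based start, matched text) for each non-overlapping
    match of the C4 pattern in a one-letter-code protein sequence."""
    return [(m.start(), m.group()) for m in C4_MOTIF.finditer(protein.upper())]


# Synthetic sequences for illustration only; they are not real OrfB sequences.
with_motif = "MKT" + "C" + "AA" + "C" + "G" * 15 + "C" + "PP" + "C" + "LE"
without_motif = "MKTCAACGGGCPPCLE"

print(find_c4_motifs(with_motif))
print(find_c4_motifs(without_motif))
```

Because the pattern allows no variation in spacing, a related motif with slightly different spacer lengths would be missed; loosening the quantifiers would be a separate assumption.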
This research was supported by grants RO1 AI38166, RO3 AI49161, RO1 DK53727, RO1 DK63041, and P30 DK52574 from the National Institutes of Health.
Supplemental material for this article may be found at http://jb.asm.org.