Journal of Bacteriology, December 2004, p. 8114-8122, Vol. 186, No. 23
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.23.8114-8122.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Infectious Diseases and Clinical Microbiology, Nuffield Department of Clinical Laboratory Sciences, John Radcliffe Hospital,1 Institute of Biological Anthropology,5 Molecular Infectious Diseases, Department of Paediatrics, Oxford University,6 Centre for Ecology and Hydrology, Oxford,2 The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom,3 Virginia Polytechnic Institute and State University, Blacksburg, Virginia4
Received 7 May 2004/ Accepted 24 August 2004
-Proteobacteria. These diverse genomic islands share a common evolutionary origin, insert into tRNA genes, and have diverged widely, with G+C contents ranging from 40 to 70% and amino acid homologies as low as 20 to 25% for shared core genes. These core genes are likely to account for the conjugative transfer of the genomic islands and may even encode autonomous replication. Accessory gene clusters are nestled among the core genes and encode the following diverse major attributes: antibiotic, metal, and antiseptic resistance; degradation of chemicals; type IV secretion systems; two-component signaling systems; Vi antigen capsule synthesis; toxin production; and a wide range of metabolic functions. These related genomic islands include the following well-characterized structures: SPI-7, found in Salmonella enterica serovar Typhi; PAP1 or pKLC102, found in Pseudomonas aeruginosa; and the clc element, found in Pseudomonas sp. strain B13. This is the first report of a diverse family of related syntenic genomic islands with a deep evolutionary origin, and our findings challenge the view that genomic islands consist only of independently evolving modules.
It was recognized that this antibiotic resistance was transferred by conjugation and was encoded by 40- to 60-kb related mobile elements (10, 14, 15, 25, 31-33). Transposons, inserted at separate sites into the conjugative element, contained the resistance genes. The β-lactamase gene encoding Apr formed part of Tn3, and the genes encoding tetracycline and chloramphenicol resistance resided in a Tn10-like compound transposon (14, 15, 25, 26). Surprisingly, free plasmids could not be easily detected in clinical isolates (10, 52). However, extrachromosomal plasmids could be isolated from transconjugants (10, 52). It is now recognized that these related resistance elements not only conjugate but integrate site specifically into H. influenzae tRNALeu (11). This site-specific insertion into a tRNA gene was intriguingly similar to genomic islands (GIs) (13, 23) and raised the question of whether these large conjugative resistance elements had evolutionary relationships with GIs.
Genomic islands which are part of the horizontal or flexible gene pool often constitute a large part (more than 10%) of bacterial genomes (22, 23, 29, 49). These islands are typically characterized by a G+C content different from that of the host genome. In Proteobacteria, they are often integrated into tRNA genes (13, 22). Genes important in habitat-specific adaptation cluster on these islands (13, 22). Well-recognized examples include islands with genes involved in pathogenesis (23), avirulence (1), chemical degradation (58), antibiotic resistance (27), and metabolic functions (30). Hitherto, the evolutionary origin of these islands has not been elucidated (13). This is not surprising as the islands are thought to consist of modules with diverse origins joined together in a single structure, often referred to as a mosaic (5, 56). Such a structure would not have a unified evolutionary history. This mosaic type of structure is supported by studies of the organization of a number of well-characterized elements and closely related elements, such as SXT (2) and ICESt1 (6, 40). These elements have only short regions of homology with other distantly related elements that correlate with a single module. However, investigators have recently reported unexpectedly extensive homologies between the completed genome sequences of a wide range of Proteobacteria and the GIs PAP1 and pKLC102 found in Pseudomonas aeruginosa (24, 27), the clc element found in Pseudomonas sp. strain B13 (58), and the pathogenicity island SPI-7 found in Salmonella enterica serovar Typhi C18 and Ty2 (41). The extent of these homologies invites the thought that some GIs may have a coherent structure with a shared origin.
Here we report the findings of a systematic analysis of GIs exhibiting extensive homology. Homologies for most of these islands have recently been reported by other workers (24, 27, 41, 58), but the evolutionary implications of these homologies were not studied. The starting point of this investigation, however, was a bioinformatics study of the complete sequence of a large conjugative Haemophilus resistance element, ICEHin1056, which was previously referred to as p1056 (3, 10, 11, 35). Identification of homologous sequences in ICEHin1056 and elements or genomic islands found in β- and γ-Proteobacteria indicated the presence of a syntenic core element with a shared evolutionary origin. A wide range of accessory gene clusters with phylogenies independent of the phylogenies of the core genes nestled between the core genes of each related GI.
Bioinformatics analysis. By interrogating the National Center for Biotechnology Information (NCBI) database with the TBLASTX algorithm, we identified GIs homologous to ICEHin1056. The input ICEHin1056 sequence consisted of the entire element with all the antibiotic resistance-associated sequences (i.e., Tn10-like and Tn3 sequences) removed. Sequences producing significant alignments (i.e., e value scores of <10⁻⁵) were individually interrogated by TBLASTX searching for contiguous sequences homologous to ICEHin1056. Twenty candidate GIs (including ICEHin1056) were identified for further investigation.
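The e value screen described above can be illustrated with a minimal sketch. It assumes the search results were saved as 12-column tab-delimited BLAST output (the common "outfmt 6" layout, with the subject identifier in column 2 and the e value in column 11); the article does not say how hits were exported, and the function name, subject identifiers, and example rows are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # significance threshold stated in the text


def significant_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return (subject_id, evalue) pairs whose e value is below the cutoff.

    Assumption: 12-column tab-delimited BLAST output, with the subject
    identifier in column 2 and the e value in column 11.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue < cutoff:
            hits.append((row[1], evalue))
    return hits


# Hypothetical rows for illustration only, not data from the study.
example = (
    "ICEHin1056\tsubjA\t45.0\t200\t110\t0\t1\t600\t1\t600\t1e-40\t150\n"
    "ICEHin1056\tsubjB\t28.0\t90\t65\t0\t1\t270\t1\t270\t0.002\t35\n"
)
print(significant_hits(example))
```

Only the first hypothetical row passes the cutoff; the second, with an e value of 0.002, is discarded.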
The Artemis comparison tool (ACT) (44) was used to visually compare GIs pairwise for homology by using the TBLASTX algorithm. Preliminary analysis of open reading frames of the homologous GIs indicated that four potential GIs contained only a short sequence with shared homology or were from incompletely sequenced genomes that limited or precluded investigation of a possible coherent GI. The remaining 16 GIs with extensive homology were investigated further. Initially, to determine possible phylogenetic relationships among these 16 GIs, the amino acid sequences encoded by two predicted genes present in all of them were aligned by using ClustalX for multiple-sequence alignment and tree estimation (55). This resulted in identification of four clusters. One well-known representative GI of each cluster (ICEHin1056, SPI-7 [41], PAP1 [24], and PAGI-3 [30]) was used to distinguish sequences shared by these related GIs and thereby identify potential core genes.
Core GI sequences were defined as GI sequences that were present (BLASTP e value, <10⁻⁵) in at least three of the four GIs. The presence of GI genes was visually scored by performing pairwise comparisons of each of the GIs with ACT and individually aligning potential homologues with the BLASTP algorithm. Thirty-three core genes fitting the definition were identified (designated genes 1 to 33) (Fig. 1). The sequence of each gene was used to construct four virtual core GIs representative of ICEHin1056, SPI-7, PAP1, and PAGI-3. Virtual core GIs consisting of either amino acid sequences or nucleotide sequences were constructed for each of the four GIs. For the genes missing from a GI, the homologue from PAP1 was used. The concatenated nucleotide sequences of core genes from each of these four representative GIs were used to systematically reinterrogate the NCBI databases with TBLASTX to identify any additional GIs missed in the original screening with ICEHin1056 (none were identified).
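The core gene definition and the construction of a virtual core GI can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure, which relied on visual scoring with ACT and BLASTP: the presence table, gene names, and toy sequences below are hypothetical, and only the "at least three of four" rule and the PAP1 fallback come from the text.

```python
def core_genes(presence, min_islands=3):
    """Return genes detected in at least `min_islands` representative GIs.

    `presence` maps a gene name to the set of GIs in which a homologue
    was scored as present (hypothetical input format).
    """
    return sorted(g for g, islands in presence.items()
                  if len(islands) >= min_islands)


def virtual_core(island, gene_order, sequences, fallback="PAP1"):
    """Concatenate core gene sequences for one GI in a fixed gene order.

    When a gene is missing from `island`, the homologue from the fallback
    GI is substituted, mirroring the rule described in the text.
    """
    parts = []
    for gene in gene_order:
        per_island = sequences[gene]
        parts.append(per_island.get(island, per_island[fallback]))
    return "".join(parts)


# Hypothetical toy data for illustration only.
presence = {
    "geneA": {"ICEHin1056", "SPI-7", "PAP1", "PAGI-3"},
    "geneB": {"ICEHin1056", "PAP1", "PAGI-3"},
    "geneC": {"SPI-7", "PAP1"},
}
sequences = {
    "geneA": {"SPI-7": "MKT", "PAP1": "MKS"},
    "geneB": {"PAP1": "LLV"},
}

order = core_genes(presence)
print(order)
print(virtual_core("SPI-7", order, sequences))
```

In this toy case geneC is excluded because it is present in only two GIs, and the SPI-7 virtual core borrows the PAP1 homologue of geneB.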
FIG. 1. Array of predicted genes present in related coherent and syntenic GIs found in Proteobacteria. The clc element is also present in the complete sequence of B. fungorum. The tRNA data indicate the site of GI integration. The G+C contents of both the genomic island and the host genome are indicated. un, unnamed.
75%) of the 33 core genes. Sequences of the virtual core GI genes are available at http://www.ndcls.ox.ac.uk/mohd-zain.html. The inferred attributes of the predicted noncore genes for all the GIs were determined by reannotation of the GIs by using Artemis and BLASTP available through NCBI. As a negative control, the virtual core element was compared to the following three well-characterized GIs (12) that had not been found to show homology during the TBLASTX interrogation of the NCBI database: PAI I536 (GenBank accession number AJ488511), PAI II536 (GenBank accession number AJ494981), and PAI III536 (GenBank accession number X16664).
FIG. 2. Modified Artemis Comparison Tool view of the homologous regions identified in four genomic islands (ICEHin1056, SPI-7, PAP1, and the clc element) and the core genes of the virtual genomic island. Homologous sequences (>30% amino acid identity) are indicated by red lines joining regions of the five schematic representations of the GIs.
Nucleotide sequence accession number. The nucleotide sequence of ICEHin1056 has been deposited in the GenBank database under accession number AJ627386.
75%] were present). These GIs were restricted to β- and γ-Proteobacteria (two and nine species, respectively). The isolates included members of the Burkholderiaceae (Burkholderia fungorum and Ralstonia metallidurans), members of the Pasteurellaceae (one H. influenzae isolate, one Haemophilus ducreyi isolate, and two Haemophilus somnus isolates), three members of the Enterobacteriaceae (one S. enterica serovar Typhi isolate, one Yersinia enterocolitica isolate, and one Photorhabdus luminescens isolate), and seven members of the Pseudomonadaceae (three P. aeruginosa isolates, two Pseudomonas fluorescens isolates, one Xanthomonas axonopodis isolate, and Pseudomonas sp. strain B13). As the genomic island in B. fungorum (GenBank accession number NZ_AAAJ00000000) exhibits nucleotide sequence identity to the clc element of Pseudomonas sp. strain B13 (58), it was not analyzed as a separate GI. The species and strains harboring the syntenic and coherent GIs are listed in Fig. 1 and Table 1.
TABLE 1. Species with syntenic GIs, their sizes, GenBank accession numbers, and major attributes of accessory genes
Features of the core genes. The elements were all predicted to or are known to integrate into tRNA genes, as shown in Fig. 1. There were five different tRNA genes that were sites of insertion for these GIs, and 6 of 15 GIs inserted into tRNAGly. The core genes of each of the GIs were largely in the same order (i.e., synteny) (Fig. 2). The level of amino acid sequence homology between GIs at the extremes of homology was as low as 25 to 30% for many of the shared core genes, indicating that there was wide evolutionary divergence. Of the 33 core genes, 10 had homology to genes with known functions. These included four genes (parA, dnaB, ssb, and topB) among genes 1 to 10 (Fig. 1 and 2). Homologues of these genes are recognized to play a role together in plasmid replication (16, 20). One of the genes in this region, inrR, has recently been shown to regulate integrase function in the clc element and controls excisive and integrative recombination of that element with tRNAGly (48). For the most part the functions of genes 11 to 32 are unknown, but four of these genes exhibit homology to pilL (core gene 11), virB4 (core gene 27), traD (core gene 17), and a relaxase or traI (core gene 32); homologues of these four genes are recognized to play a role in conjugative DNA transfer (60).
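The amino acid identity figures quoted above depend on how identity is counted. A minimal sketch, assuming identity is computed over aligned columns in which neither sequence has a gap (one common convention; the article does not state which denominator was used), is shown below. The function name and the toy aligned sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment.

    Assumption: both inputs are rows of the same alignment (equal length),
    and columns containing a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment: five ungapped columns, four of which match.
print(percent_identity("MKT-LV", "MRTALV"))
```

Other conventions (for example, dividing by the full alignment length) give lower values for the same alignment, so identity thresholds are only comparable when the convention is fixed.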
Integrases were found in all GIs other than that in R. metallidurans. As the DNA sequence of this organism is not complete, it is premature to conclude that an integrase associated with this GI is absent. The remaining GIs possess an integrase of either the P4 or XerC/D lineage. SPI-7, the pathogenicity island found in S. enterica serovar Typhi, possesses copies of both types of integrase (Fig. 2). The xerCD integrase gene that was immediately adjacent to the innermost tRNAPhe sequence present at the right end of the GI (the putative attR) was selected as the integrase gene shown in Fig. 2, because only this integrase gene was consistently present in SPI-7-like GIs identified in a range of Salmonella enterica serovars (46).
Fifteen of the core GI genes were conserved in all the GIs (Fig. 1). The G+C contents of the GIs were determined and ranged widely, from 40.2% for H. somnus 2336 to 69.6% for X. axonopodis. Compared to the average G+C content of the host's genome, nine GIs had higher G+C contents. These GIs consisted of all the GIs found in members of the Pasteurellaceae and Enterobacteriaceae (except SPI-7 in S. enterica serovar Typhi, which had a lower G+C content than its host genome). Similarly, the GIs found in Pseudomonas sp. strain B13, B. fungorum (genomic G+C content, 61.8%), R. metallidurans, and X. axonopodis had higher G+C contents than their hosts' genomes. However, the GIs found in both P. fluorescens isolates and the GIs PAGI-3, PAP1, and pKLC102 found in P. aeruginosa had lower G+C contents than their hosts.
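The G+C comparison described above reduces to a simple calculation, sketched here under the assumption that ambiguous bases are excluded from the count (the article does not specify this). The function names are hypothetical; the X. axonopodis percentages are the values reported in this article.

```python
def gc_percent(sequence):
    """Return the G+C content (%) of a nucleotide sequence.

    Assumption: only unambiguous A, C, G, and T bases are counted.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def gc_shift(island_gc, host_gc):
    """Positive when the island is more G+C rich than its host genome."""
    return island_gc - host_gc


print(gc_percent("GGCCAATT"))   # toy sequence, half G+C
print(gc_shift(69.6, 64.7))     # X. axonopodis GI versus host genome
```

A positive shift, as for the X. axonopodis island, places a GI among the nine with G+C contents higher than those of their hosts.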
Phylogenetic relationships between GIs. The phylogenetic relationships of each of the 15 genes common to all the GIs exhibited a congruent structure, with a few minor exceptions. The trees for four representative core genes (genes 1, 6, 18, and 27) are available as supplemental material (http://www.ndcls.ox.ac.uk/mohd-zain.html). The phylogenetic relationships of the GIs as determined by alignment of the concatenated amino acid sequences of all 15 conserved genes are shown in Fig. 3 and are congruent with the phylogenetic relationships of each of the individual genes.
FIG. 3. SplitTree tree based on ClustalX analysis of the 15 GIs. The amino acid sequences of the 15 predicted genes common to all 15 GIs were concatenated and aligned by using ClustalX. The alignment of each of the genes alone was consistent with the alignment illustrated. Each strain containing a GI is identified, and where available, the designation of the GI (e.g., ICEHin1056) is given in parentheses. S. typhi, S. enterica serovar Typhi.
It has been hypothesized that GIs are mosaics of independently evolving genes or clusters of genes (5, 56). This is referred to as modular evolution. In the reports of Burrus et al., it was demonstrated that heterogeneous modules were assorted in different GIs (5, 6, 40). Such a lack of coherent structure is probable following repeated recombination arising from frequent horizontal spread of GIs. This would be expected to erase coherent structures in different GIs over time. The findings reported here, at least, identify one family of related GIs that challenges this hypothesis. A coherent syntenic element with a common evolutionary origin has been found. The accessory genes of these related GIs are inserted in a manner typical of modular evolution. However, variable modules consisting of accessory genes nestle in a conserved syntenic core GI. The presence of a syntenic core GI suggests a fitness property of the conserved core genes acting collectively that has ensured their survival as a coherent syntenic whole. It is unclear what property this may be, yet it is reasonable to surmise that it is the ability of these GIs to transfer and propagate better as a coherent whole. The role of accessory genes in the survival of this family of elements appears to be interwoven with allowing organism-specific adaptation to habitats. This necessarily involves complex interactions among the core syntenic GI, accessory genes, and the survival of the host bacterium in new hostile environments.
Bioinformatics analysis of the core genes and review of the functional properties of some of the well-characterized examples of this family of GIs provide insight into the possible functions of the core GI genes. An integrase gene is present in all but one (R. metallidurans) of these GIs. The integrase genes are likely to encode integration with tRNA genes, and all belong to the family of tyrosine integrase genes. The only GI in which this has been conclusively shown is the clc element found in Pseudomonas sp. strain B13, which contains a homologue of the P4 integrase gene (43). A curious aspect of the tyrosine integrase genes found in these GIs is that they belong to two different lineages, a P4 lineage and a XerC/D lineage. Both types of integrase genes are known to encode recombination with tRNA genes in other mobile elements (36). This is the only core gene of these syntenic elements which has an evolutionary history more typical of modular evolution. How members of the GIs have acquired these two divergent lineages of integrase genes is intriguing, and the available data provide no obvious explanation. The observation that SPI-7 found in S. enterica serovar Typhi contains both integrase gene lineages, while SPI-7-like GIs found in other S. enterica serovars do not (46), may suggest that there are ready opportunities for exchanging integrase gene lineages.
How GIs transfer horizontally has not been clearly explained (13), although it has been recognized that the clc element is a genomic island and is capable of conjugative transfer (58). The observation made here that another conjugative element, ICEHin1056, found in H. influenzae, is also a related GI strengthens this observation. Hitherto, only the clc element and ICEHin1056 have been shown to conjugate. Both the clc element and ICEHin1056 transfer by conjugation at frequencies of 10⁻⁶ to 10⁻⁷ (transconjugants/donors) (10, 43). The genes encoding this property have not been identified in either GI. However, the genes encoding this function are likely to be present among the core genes as none of the accessory genes present in either GI is shared. Also, none of their accessory genes is known to encode conjugative functions. Furthermore, four of the core GI genes are homologous to pilL, traD, virB4, and traI and are interspersed along a contiguous 22-gene segment of the GI. These four genes are known to be involved in DNA conjugative transfer systems (60). Therefore, it can be inferred that core genes are likely to encode transfer functions that would explain how this family of GIs transfers between hosts. Before this can be firmly concluded, however, this possibility needs to be formally investigated and demonstrated experimentally.
Autonomous replication would be an unexpected property of GIs (13). However, 4 of the first 10 genes are homologues of parA, dnaB, ssb, and topB, which are genes known to be associated with plasmid replication (16, 20). Furthermore, both ICEHin1056 found in H. influenzae and pKLC102 found in P. aeruginosa C appear to exist as extrachromosomal closed circular plasmids under some conditions (9, 10, 27, 35). ICEHin1056 or other related Haemophilus conjugative elements are found largely in plasmid form in transconjugants immediately following conjugation (9, 10). This conclusion is supported by two observations: first, ready isolation of plasmids from transconjugants and not from parent donor clinical isolates (10); and second, the appearance of Southern blot hybridization patterns consistent with a closed circular form in transconjugants and a pattern indicating chromosomal integration in parent donor strains (9, 10, 35). pKLC102 has not been formally shown to replicate autonomously; however, Klockgether et al. reported the presence of an extrachromosomal form and sequences close to the ssb gene with features typical of an oriV (27). These data suggest that some, if not all, of these GIs may be capable of autonomous replication under some conditions, but this needs to be formally determined experimentally. If this was found to be true, it would be a new, hitherto unrecognized dimension of genomic islands (13, 22) or the integrating and conjugating elements (ICEs) reviewed by Burrus et al. (5). Furthermore, an element that was capable of integrative and excisive recombination, conjugative transfer, and autonomous replication would require finely coordinated regulation of these functions to ensure that their timing was appropriate to the relevant state of the GI.
The origin and host range of this family of GIs are not apparent from the data obtained in this investigation. The wide range of G+C contents (40 to 70%) for the shared core GI genes is not substantially different from the range found for β- and γ-Proteobacteria (28). This could be interpreted to suggest that there is a longstanding relationship with this phylum. This interpretation is based on the proposition that GIs are constrained to this phylum and the proposition that horizontally acquired DNA with a different G+C equilibrates over time with its host's genomic G+C content through a process referred to as amelioration (34). It follows that for this family of GIs to have evolved such divergent G+C contents, substantial time in hosts with similarly divergent G+C contents would have been necessary. It would also require limited horizontal transfer and recombination between divergent members of this family of GIs, as this would tend to homogenize the G+C contents. The presence of two GIs with G+C contents of 68.2 and 69.6%, which are markedly higher than the corresponding host genomic G+C contents (namely, the G+C contents of R. metallidurans [63.5%] and X. axonopodis [64.7%]) is curious. The fact that these G+C contents are at the upper limit of the values for Proteobacteria (28) raises the question of whether these GIs originated from nonproteobacterial hosts with higher G+C contents than Proteobacteria. Conclusive evidence of this may become apparent from the growing number of whole genomic sequences available for analysis and improved algorithms for interrogating the databases for large contiguous and weakly homologous sequences.
Antibiotic resistance accessory genes and this family of GIs. The observation of evolutionarily related coherent and syntenic GIs provides new insight into how the recent emergence and spread of β-lactamase-positive Apr and/or tetracycline and chloramphenicol resistance in H. influenzae occurred. The phylogenetic relationship of ICEHin1056 to other distantly related GIs indicates that transferable resistance in H. influenzae has deep evolutionary origins. The resistance genes, conveyed by transposons (e.g., Tn10 or Tn3) (15, 26), form clusters of accessory genes in the core element that have apparently evolved stable relationships. No other accessory genes with different properties were apparent from an analysis of the whole sequence. The emergence of this resistance element in pathogenic H. influenzae only became readily detectable in the early 1970s (4, 14, 42, 53, 54). Curiously, resistance then rapidly emerged over the next few years worldwide among pathogenic H. influenzae strains and became very prevalent (20 to 30%) in many countries (42, 47). Evidence indicates that this resistance is accounted for by the appearance of ICEs (GIs) that are highly related to ICEHin1056 (9-11, 35, 42, 47). The epidemic expansion of this family of ICEs among pathogenic H. influenzae and the high prevalence among other commensal haemophili (35, 45, 46) suggest that this GI is well adapted to these bacterial hosts and provides a survival advantage under antibiotic exposure conditions. An intriguing question is whether the acquisition of resistance transposons and their intimate and apparent stable relationship with the GIs in haemophili evolved recently or have deeper origins preceding the antibiotic era. The latter possibility would support the notion that antibiotic resistance is an ancient adaptive response that pre-evolved in bacteria. Modern-day intensive use of antibiotics only serves to change the distribution or organization of the pre-evolved structures.
The association of resistance genes with this family of GIs is not unique to H. influenzae. GIs found in both H. somnus and P. aeruginosa contain accessory genes with homology to resistance genes. In the case of H. somnus 2336, homologues of tetA and romA (a multidrug resistance gene) are present, and the strain is tetracycline resistant (the MIC of tetracycline is 8 µg/ml [Inzana, unpublished data]). In P. aeruginosa strain C, the GI, pKLC102, contains aadB that encodes tobramycin and gentamicin resistance (27).
Major properties of other accessory genes. The accessory genes conveyed by the family of GIs described here have a wide range of attributes other than antibiotic resistance that contribute to the habitat-specific survival of their host organisms. The next most common associated attribute is type IV secretion found in five of the GIs, two of which, PAP1 (24) and SPI-7 (41), are well-described pathogenicity islands. The genes of all five GIs were homologous and in synteny, suggesting that they share a common evolutionary origin. The type IV secretion system in SPI-7 is involved in the adhesion of S. enterica serovar Typhi to macrophages and is important to S. enterica serovar Typhi's intracellular life cycle and in disease causation (38, 57, 61). The role of genes encoding the type IV secretion system in PAP1 is less clear. These genes were not reported as contributing to the virulence of P. aeruginosa PA14 (24). However, the most likely role of type IV secretion found in these GIs is adherence of the host organisms to specific targets in their habitats.
Other genes important in virulence are present in these GIs. The Vi antigen encoded by SPI-7 is a well-known capsular antigen characteristic of S. enterica serovar Typhi that contributes to the pathogenesis of typhoid fever and is a valuable vaccine antigen (41). PAP1 contains a number of virulence-associated accessory genes, including two-component regulators that may be involved in regulating pathogenicity activities (24). The island found in H. ducreyi contains the cytolethal distending toxin genes, which have an uncertain role in disease pathogenesis (19, 51, 59).
The accessory genes important in the survival of organisms in environmental habitats include a range of genes encoding metabolic functions. These genes are found in organisms that live in potentially nutrient-deficient environments, such as water or soil, including P. aeruginosa, Pseudomonas sp. strain B13, B. fungorum, and R. metallidurans. Another attribute of the accessory genes of some of these environmental species is highly processive degradative enzymes. In particular, the clc element found in Pseudomonas sp. strain B13 and B. fungorum encodes a number of genes that inactivate toxic chemicals, such as chlorocatechols (58). Organisms possessing these enzymes and related DNA elements have become widely distributed, presumably through selection, in environments that are heavily contaminated with toxic waste (7, 8, 58).
Conclusions. There is a very large and diverse population of GIs in bacteria (13), and most of these GIs do not exhibit detectable homology with the family of related GIs described here. However, the methods used here to identify and characterize one family of GIs may be used in principle to identify other families of coherent GIs among the many apparently unrelated GIs. It seems possible that there are examples of both mosaics without evidence of a coherent syntenic core GI structure and other GIs which have syntenic core genes and a common evolutionary origin. This will be increasingly possible to investigate with the improving bioinformatics techniques and the rapidly increasing sequence data available from complete bacterial genomes. Such a systematic classification of GIs is likely in turn to necessitate a commensurate naming system.
The H. ducreyi sequencing project was funded by NIH grant R01-AI45091. The H. somnus sequencing project was funded by USDA/CSREES grant 2001-52100-11314. D.W.C. was funded by Wellcome Trust Research Leave Fellowship 057366/Z/99/Z. Z.M.Z. was funded through a scholarship awarded by the Government of Malaysia.