Journal of Bacteriology, January 2004, p. 518-534, Vol. 186, No. 2
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.2.518-534.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Klinische Forschergruppe, OE 6710, Medizinische Hochschule Hannover, D-30625 Hannover, Germany
Received 24 July 2003/ Accepted 19 October 2003
Our group studies genome diversity in the γ-proteobacterium Pseudomonas aeruginosa. This ubiquitous and metabolically versatile microorganism (57) is characterized by a core genome with conserved synteny of genes and a low average nucleotide substitution rate of 0.5% (33, 35, 51, 66, 72). Only 2.5% of the coding sequences (CDS) exhibit significantly higher sequence diversity (66). Clone- or strain-specific genome islands and genome islets define the variable part of the chromosome, which results in variations of genome size between 5.2 and 7 Mbp (53, 62). Four genome islands have so far been sequenced (4, 40, 41). They all encode phenotypic traits that are absent in the completely sequenced reference strain, PAO1 (68). In the two cases analyzed in the major P. aeruginosa clone C (55), the genome island had been incorporated into tRNA genes (40). The tRNAGly-associated genome islands PAGI-2(C) and PAGI-3(SG) show a global structure similar to that of the 105-kb self-transmissible clc element of Pseudomonas putida, which is the only known genome island in the genus Pseudomonas that can be mobilized and laterally transferred to other strains, even across species and genus barriers (50, 63, 67). The site-specific integrative recombination between the clc element's attachment site (attP) and the chromosomal attachment site at the 3' end of the tRNAGly gene is accomplished by an integrase that is highly homologous to those encoded by PAGI-2(C) and PAGI-3(SG) (40, 50, 63).
PAGI-2(C) is located in a so-called hypervariable region close to the lipH locus. The other two hypervariable regions in the P. aeruginosa chromosome with pronounced genomic variability reside in the vicinity of the pilA and phnAB loci (33, 51, 53). tRNALys genes were identified as the hot spots for the integration and excision of DNA in these regions (36). The large plasmid pKLK106 sequentially recombined with either of the two tRNALys genes in P. aeruginosa clone K strains, giving rise to reversible rearrangements of a 106-kb genome island in sequential isolates. In clone C strains, the plasmid pKLC102 was reversibly incorporated into the tRNALys gene of the pilA region. Clone C isolates from the environment and most disease habitats harbored both the free plasmid and the chromosomally integrated pKLC102, whereas isolates from the lungs of patients with cystic fibrosis (CF) carried no episomal forms (53). Physical mapping revealed that one subgroup of clone C strains from CF lungs had captured additional DNA in pKLC102, which induced large chromosomal inversions in the progeny (39, 53).
The two related plasmids pKLK106 and pKLC102 are one of the very few cases known in which mobile DNA coexists as a free plasmid and a genome island in a bacterial cell. Hence, first we sequenced this connecting link between the plasmid and the island in order to resolve the features that allow this dual lifestyle and to get a clue to the impact of this extra DNA on the phenotype of the host. The clone C plasmid pKLC102 was selected for sequencing (Table 1) because clone C is a major clone of the present P. aeruginosa population in environmental and disease habitats, and hence, its genome organization has been studied in detail (53, 55, 62). Twenty-one clone C chromosomes have been mapped, two of which were chosen for the sequencing of the genome islands PAGI-2(C) and PAGI-3(SG) in the lipH hypervariable region (40). Second, the organization of the phn region and the makeup of pKLK106 and pKLC102 were compared in order to address the issue of why the clone K plasmid sequentially recombines with both tRNALys genes whereas the target site in the phn region is not accessible to the clone C plasmid. Third, the type of genetic element of the DNA inserted into the chromosomally integrated pKLC102 of subgroup C strains was identified by sequencing. All data were compiled to trace the evolution of the P. aeruginosa clone C chromosome. Annotation revealed that pKLC102 was assembled from a phage lineage and a plasmid lineage that endowed this hybrid with the uncommon flexibility to exist as a conjugative plasmid and a genome island. In other words, these peculiar features make pKLC102 a physically existing piece of evidence for the evolution of a genome island from mobile ancestors.
TABLE 1. Comparison of general features of sequenced gene islands and PAO1 genome
DNA techniques. DNA manipulations were done by standard procedures (5). A genomewide cosmid library was constructed according to the protocols of Wenzel and Herrmann (71) as described previously (40). Small-scale isolations of cosmid DNA were performed by using QIAprep spin miniprep kits (Qiagen); larger amounts of cosmid DNA were purified using QIAtip100 columns or QIAtip500 and the large-construct kit (Qiagen) according to the instructions of the supplier. The high-molecular-mass plasmids pKLK106 and pKLC102 were prepared on a large scale by modified alkaline lysis (5, 36).
Southern hybridization. For colony blots, cell suspensions were inoculated on Hybond N+ membranes (Amersham) by using a 96-needle replication device and were grown on 2YT-amp plates. Alternatively, colony lifts were performed directly from agar plates onto Hybond N+ membranes. The cells were lysed, and the DNA was fixed (71). Blotting of chromosomal or cosmid DNA digested with appropriate restriction enzymes to nylon membranes, the hybridization procedure, and immunological detection of probe signals were performed according to previously described protocols (52). For the screening of the library, probes were prepared from purified plasmid DNA, from the SpeI fragment SpAB-specific clone 2A (54), or from gel-purified restriction fragments of plasmids and cosmids by using a digoxigenin labeling kit (Roche) (52).
Construction of a pKLC102 tiling path in the strain C chromosome. The pKSCC cosmid library was screened with plasmid pKLC102 and clone 2A as probes. Thirty-seven probe-reactive cosmids were digested with BamHI or EcoRI plus HindIII and separated by agarose gel electrophoresis. Comparison of the gel-separated restriction fragment pattern with the restriction maps of pKLC102 and of clone C strains C and SG17M in this chromosomal region identified the recombination point for chromosomal integration on the plasmid restriction fragment BmQ (53) and the integration of a further 23-kb DNA segment on BmG. The cosmids were ordered by Southern hybridization of restricted pKSCC cosmids with BamHI fragments of pKLC102. The cosmids pKSCC785, -187, -050, and -867 represented the contig of minimal overlap and hence were selected for sequencing. The remaining 2.6-kb gap between cosmids pKSCC187 and -050 (Fig. 1), reaching from fragments BmY1 to BmO (53), was closed by recombinant PCR using GoldStar DNA polymerase (Eurogentec).
FIG. 1. (a) Restriction map of plasmid pKLC102 (inner circle, EcoRI; outer circle, PvuII). The recombination site for chromosomal integration and the position of the insertion of integron TNCP23 in strain C are indicated. The thick arcs represent the tiling path of cosmids and gap-spanning PCR products utilized for sequencing pKLC102 DNA in the strain C chromosome. The darkly shaded area is absent in pKLK106 (see panel b). (b) Comparative restriction analysis of pKLC102 and pKLK106. (I) Separated HindIII, EcoRI, and PvuII restriction digests of cosmids pKSCC785 (lanes 1), pKSCC187 (lanes 2), pKSCC050 (lanes 3), and pKSCC867 (lanes 4). PCR, gap-spanning PCR product (undigested); λ, BstEII digest of λ DNA used as a size standard. (II) Southern blot of gel I, hybridized with plasmid pKLK106. The letters in gel I indicate bands with no or lower-than-expected hybridization signals due to DNA that is not represented in the pKLK106 probe. P, PAO1 DNA flanking the inserted pKLC102 in strain C; V, vector DNA; T, integron TNCP23; C (circled), pKLC102-specific DNA absent in pKLK106.
Sequencing. The ends of cosmid inserts (500 to 800 bp) were determined by single reads of one strand using T3 or T7 primers. Inserts of cosmids pKSCC785, -187, -050, -867, and -260 were completely sequenced by random sequencing of small-insert plasmid libraries (1.0 to 2.5 kb). After assembly, the sequence gaps were closed by editing the ends of sequence traces and/or primer walking on plasmid clones, and physical gaps were closed by combinatorial PCR followed by sequencing of the PCR product. The final sequences had an accuracy of >99.99%.
Annotation. Putative open reading frames (ORFs) were identified by using a dictionary-driven gene-finding program (64; http://cbcsrv.watson.ibm.com/tgi.html) and by GeneMark and GeneMark.HMM programs (6, 44). Predicted ORFs were reviewed individually for the assignment of the start codon based on additional contextual information, such as the proximity of ribosome-binding sequence motifs and alignments with known proteins retrieved by BLAST search (2). tRNA genes were identified by the program tRNA-Scan SE (43). Public databases were searched for similar sequences with the BLASTN, BLASTX, and BLASTP/PSI- and PHI-BLAST algorithms. Sequence comparisons with the P. aeruginosa PAO1 genome (68) were retrieved from the website of the Pseudomonas Genome Project (http://www.pseudomonas.com). The sequences were scanned for palindromes, tandems, and signal sequences using programs available at http://bioweb.pasteur.fr/. The features of the predicted proteins were examined by the programs Pfam (http://www.sanger.ac.uk/Software/Pfam/search.shtml), Block Searcher (http://blocks.fhcrc.org/blocks/blocks_search.html), COGnitor (http://www.ncbi.nih.gov/COG/xognitor.html), "DAS" Transmembrane Prediction server (16), and SOSUI (http://sosui.proteome.bio.tuat.ac.jp/cgi-bin/sosui.cgi?/sosui_submit.html). Secondary DNA-RNA structure was analyzed by a Greedy algorithm with an energy threshold of -10 kcal using the programs GeneBee, available at http://www.genebee.msu.su/genebee.html (11), and Mfold, available at http://www.bioinfo.rpi.edu/applications/mfold/ (60). The program BioEdit version 5.0.9 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) was used for storing sequences in a database, pairwise comparison, alignment, and phylogenetic tree design. GC contents and GC skew were calculated with in-house programs. Restriction maps were constructed with the program Webcutter version 2.0, written by Max Heiman (available at http://www.firstmarket.com/cutter/cut2.html).
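The in-house programs used for GC content and GC skew are not described in this paper. As a rough illustration only, the following Python sketch applies the conventional definitions, GC content = (G + C)/(A + C + G + T) and GC skew = (G - C)/(G + C), to fixed-size windows. The function names, window size, and step size are assumptions made for this example and are not taken from the original software.

```python
from typing import Iterator, List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return (g + c) / total if total else 0.0


def gc_skew(seq: str) -> float:
    """Conventional GC skew, (G - C) / (G + C); 0.0 if no G or C is present."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def cumulative_gc_skew(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Running sum of windowed GC skew; a turning point in this curve is
    commonly read as an origin or terminus of bidirectional replication."""
    total = 0.0
    curve = []
    for start, sub in windows(seq, window, step):
        total += gc_skew(sub)
        curve.append((start, total))
    return curve


if __name__ == "__main__":
    # Toy sequence only; real input would be the plasmid sequence
    # (for example GenBank AY257538) read from a file.
    toy = "GGGGCCCC"
    print(gc_content(toy))
    print(cumulative_gc_skew(toy, window=4, step=4))
```

A windowed GC content profile of this kind is also the sort of calculation that would reveal a local AT-rich stretch within an otherwise GC-rich replicon.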
Nucleotide sequence accession numbers. The nucleotide sequences reported in this paper have been deposited in the GenBank database {accession numbers AY257538 [pKLC102], AY257539 [TNCP23], and AY258138 [PAGI-4(C)]}.
Plasmids pKLK106 and pKLC102 are each about 100 kb in size, integrate into the chromosome at the 3' ends of tRNALys genes (the att site), and exhibit virtually identical BamHI/SpeI restriction maps. Map differences were evident in only three regions. In order to differentiate pKLK106-homologous segments from nonhomologous sequence in the pKLC102 region of the clone C chromosome, a tiling path represented by the gel-separated HindIII-, EcoRI-, or PvuII-restricted cosmids pKSCC785, pKSCC187, pKSCC050, and pKSCC867 and a gap-spanning PCR product (see Materials and Methods) was hybridized with pKLK106 (Fig. 1b). The comparison of the gel (Fig. 1b, left) and the blot (Fig. 1b, right) uncovered strong hybridization signals for almost all restriction fragments derived from the episomal plasmids, indicating that pKLC102 is composed of >97% sequence that is homologous with pKLK106. Restriction fragments (Fig. 1b, left) with no or weak hybridization signals (Fig. 1b, right) represent cosmid vector, transposon TNCP23 (in pKSCC187 [see below]), PAO1 DNA (in pKSCC785 and pKSCC867), or apparently pKLC102-specific DNA that is absent in pKLK106 (Fig. 1b). All pKLC102-specific DNA was assigned within or adjacent to fragment PvP (Fig. 1a). PvP is the only part of pKLC102 in which the ORFs exhibit the highest number of BLAST hits with P. aeruginosa PAO1 sequence (see Table 3 and Fig. 3). The CDS CP84, CP85, and CP86 are homologous to PA2566, PA2565, and PA2564, respectively, and are flanked by two 239-bp direct repeats upstream of CP84 and downstream of CP86. Hence, this stretch of sequence has the characteristics of a "mobile cassette" that was probably incorporated into the plasmid after the divergence of pKLC102 and pKLK106 from a common ancestor. Besides CP84 to CP86, no further segments that did not hybridize with pKLK106 were detected in plasmid pKLC102. These data confirm the prediction that clone K and clone C strains harbor almost identical plasmids.
TABLE 3. Annotation of all ORFs located within pKLC102
FIG. 3. Gene map of pKLC102. The map is calibrated to the chromosomal integration attP site, marked by a flag. The leading strand was defined by colinearity with the P. aeruginosa PAO1 genome sequence. Predicted coding regions are shown by arrows indicating the direction of transcription. The genes are color coded according to their functional categories, as shown in the legend below the map. All genes carry identification numbers according to the CDS numbering in Table 3. Homologs in other microorganisms retrieved by a BLASTP search and identified gene names are highlighted beneath the corresponding CDS. oriV is the predicted origin of replication. The putative CDS within the origin of replication is shown by a dotted arrow. The syntenic CDS CP73 to CP81 that were subjected to cladistic analysis (Fig. 5) are marked by bent arrows.
In strain PAO1, the tRNALys(1) gene is located between CDS PA0976 and PA0977 (68). The 8.9-kb DNA block 3' of tRNALys from PA0977 to PA0987 represents a nonconserved insertion that terminates with 22 duplicated base pairs of the 3' end of the tRNALys(1) gene, presumably the former attP site of the integrated element. This 8.9-kb block of PAO-specific DNA is absent in clone K strains harboring PA0988 as their first PAO homolog downstream of tRNALys(1) (36).
The sequence annotation of pKSCC260 revealed that in strain C, a 23.4-kb gene island called PAGI-4(C) is integrated at this tRNALys(1) site (Table 2 and Fig. 2). PAGI-4(C) replaces the PAO1 region from PA0977 to PA0994, and correspondingly the chaperone-usher cupC cluster (PA0992-PA0994) (70) is missing in strain C. PAGI-4(C) apparently consists of two blocks of non-PAO sequence, each flanked by short stretches of PAO-homologous sequence. The first 370 bp downstream of the tRNALys(1) gene show 92% identity with the PAO sequence. The CL1 gene is a truncated homolog of PA0977; a frameshift mutation gives rise to a stop codon 48 nucleotides prior to the 3' end of PA0977. Another stretch of 832 bp in the middle of PAGI-4(C) is 95% identical with the PAO1 sequence and contains the PA0980 homolog, CL11, and the initial 57% of the sequence of PA0981.
TABLE 2. Annotation of ORFs located within PAGI-4(C) in P. aeruginosa strain C
FIG. 2. Map of tRNALys-phnAB regions of strain K, PAO1, and C chromosomes. The tRNALys sites are indicated by thick black bars. In clone K strains, the tRNALys site can be used for reversible integration of plasmid pKLK106 (green triangle) (36). PAO1 carries an additional block (light gray triangle) at this site, comprising CDS PA0977 to PA0987. Strain C carries the gene island PAGI-4(C) at this position. Base pair counting starts after tRNALys. Two small segments (dark gray) with ORFs PA0977 and PA0980 are homologous to PAO1 sequence; two larger areas (yellow and orange) are C specific. The blue arrows show PAO1 CDS and their counterparts in K and C; the yellow and orange arrows represent C-specific CDS in PAGI-4(C). The blue boxes represent truncated PAO1 CDS in strain C.
The other block of novel DNA between CL1 and CL11 consists of 9.5 kb of non-PAO-homologous sequence. CL2a is predicted to encode a XerC-like integrase (23) (Table 2). All CDS of the CL2a-CL10 block have homologs in plasmid pKLC102, with conserved synteny and 87 to 99% amino acid sequence identity (CP103a, CP102, and CP93-CP87) (Table 3). CL10, adjacent to PA0980, is homologous to CP87 in pKLC102. The CP87-CP86 sequence contig in pKLC102 contains the 239-bp direct repeat (see above), and we noted that the repeat is 90% conserved in the CL11-CL10 contig in PAGI-4(C) (nucleotide identity at 216 of 239 positions). Moreover, the first 68 bp of the repeat (88% sequence identity) occur once in the PAO1 chromosome close to PA0980, in the intergenic region between PA0981 and PA0982 (Fig. 2). Shared sequence is known to trigger incorporation of donor into recipient DNA (22), and correspondingly, the direct repeat could have been involved in the evolution of the present PAGI-4(C) from an ancestor.
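Identity values such as "216 of 239 positions" are simple ratios over aligned columns. The following Python sketch shows one way to compute them, assuming the two sequences have already been aligned to equal length with "-" as the gap character. The function name and the choice to count gap columns as mismatches are assumptions for illustration; the alignment tools used in this study may define identity differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns carrying the same base in both sequences.

    Assumes pre-aligned, equal-length input; a column containing a gap ('-')
    in either sequence is counted as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    a, b = aligned_a.upper(), aligned_b.upper()
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Toy alignment: 3 of 4 columns identical.
    print(percent_identity("ACGT", "ACGA"))
    # The ratio reported in the text for the 239-bp direct repeat.
    print(round(100.0 * 216 / 239, 1))
```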
PAGI-4(C) was probably generated by at least two independent recombination events at a transposition close to the tRNALys(1) recognition site. The 9.5-kb part adjacent to the tRNALys(1) gene is homologous not only with sequences of the chromosomal and episomal versions of pKLC102 in clone C but also with the tRNAGly-associated gene island PAGI-2(C) (Table 2). The >95% sequence identity of the 9.5-kb stretch of DNA with parts of pKLC102 suggests the following scenario. An ancestor C strain, like the present clone K strains, was reversibly harboring a pKLC102-like plasmid at this site. When the 239-bp direct repeat was captured by the plasmid, a short stretch of sequence matched with the intergenic sequence between PA0981 and PA0982 located just five genes downstream of the att site in the tRNALys gene (Fig. 2). A similar situation is encountered in the tRNAGly-associated gene islands PAGI-2(C) and PAGI-3(SG) of clone C strains (40), in which another stretch of the direct repeat (positions 158 to 177) is found close to the attB sequence at the end of the island. We consider this coincidence to be relevant, because no further hits of sequences matching the direct repeat were retrieved from the databases. Thus, we propose that additional matching sequence in the vicinity of the att integration signal at the 3' end of the tRNA gene could stabilize the maintenance of a genome island in the chromosome. However, in the case of the ancestor clone C strain, the acquisition of direct-repeat sequence may also have predisposed it to secondary changes, such as the truncation of the plasmid and the integration of the additional transposon. This proposal is substantiated by the fact that the clone K strains, which reversibly integrate pKLK106 at the tRNALys(1) site, do not harbor the repeat sequence in the chromosome (no PA0981-PA0982).
Sequence of pKLC102 at tRNALys(2), close to the pil region. The organization of predicted CDS within the large 103,532-bp plasmid pKLC102 is displayed in Fig. 3. The annotation (Table 3) revealed 105 CDS, in two of which a smaller CDS resided in a larger CDS on the opposite strand (CP62a and -b and CP103a and -b).
Plasmid replication and recombination genes. Of 105 identified CDS, 60 were classified as hypothetical or of unknown origin. Many of these hypothetical genes have DNA replication, recombination, and modification genes as neighbors (Fig. 3). Syntenic sets of homologous genes were identified in other plasmids and gene islands among gram-negative bacteria, including PAGI-2(C) and PAGI-3(SG) of P. aeruginosa clone C (40) (see Fig. 5). These genes may play a role in plasmid maintenance or horizontal gene transfer. At least 18 identified genes of pKLC102 are involved in plasmid conjugation, recombination, and repair, among them genes for two phage integrases (CP62a and CP103a), soj (encoding a chromosome-partitioning protein; CP1), genes for four helicases (CP9, CP30, CP56, and CP69), ssb (encoding a single-strand binding protein; CP22), the topoisomerase gene topA (CP27), and traG and traI (encoding conjugative proteins; CP67 and CP102).
FIG. 5. Circular domain similarity plot. The inner and outer circles represent 50 and 100% similarities, respectively. Plasmid coordinates are shown along the outer circle.
The region between CP18 and CP19 was recognized as the possible origin of replication, oriV, of pKLC102 (Fig. 3). Sixteen highly conserved 57-bp direct repeats constitute the right part of oriV (Fig. 4). All repeats except the last terminate with the 19-bp palindrome 5'-GTGGTGCCACTGGCACCAC (complementary sequence underlined), similar to synchrons of the Pseudomonas fluorescens plasmid pL6.5 (AJ250853) (P. Herbelin, unpublished data). The highly conserved nonpalindromic part of the repeats (38 bp) may serve as replication protein binding sites; however, their sequence is not similar to those of iterons of experimentally characterized oriVs of plasmids (20). In the left part of oriV (Fig. 4), an A+T-rich region is preceded by four palindromes, GAGTTCGGATGCCGAACTC, with the first loop inverted with respect to the others. A similar organization of the oriV locus, albeit shorter at the right side with only four repeats, was found in the intergenic region between Psyr3998 and Psyr3999 in the Pseudomonas syringae pv. syringae B728a genome. The oriV locus of pKLC102 is flanked by genes that are typically found in the ori regions of plasmids, such as dnaB (CP9), ssb (CP22), and topA (CP27). The episomal pKLC102 is probably replicated by the strand displacement mechanism (20, 28), because (i) no turning point indicative of the terminus of replication was detected by GC skew and (ii) in silico analysis of secondary DNA structure by the energy-optimized Greedy algorithm (11) predicted thermodynamically stable hairpins at the ori locus, which is typical for this mode of replication.
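The palindromes quoted above can be checked directly by comparing each base with the complement of its mirror-image partner. The short Python sketch below is an illustration only and does not reproduce the GeneBee or Mfold analyses; the function names are chosen for this example. It counts how many bases, working inward from both ends, can pair to form a hairpin stem.

```python
# Watson-Crick complements for unambiguous DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def stem_length(seq: str) -> int:
    """Number of consecutive base pairs formed between the two ends of seq,
    counted from the outside inward (a simple inverted-repeat check)."""
    seq = seq.upper()
    paired = 0
    for i in range(len(seq) // 2):
        if seq[i] != COMPLEMENT[seq[-1 - i]]:
            break
        paired += 1
    return paired


if __name__ == "__main__":
    # 19-bp element terminating the 57-bp direct repeats.
    print(stem_length("GTGGTGCCACTGGCACCAC"))  # 9-bp arms around one central base
    # Element repeated four times in the left part of oriV.
    print(stem_length("GAGTTCGGATGCCGAACTC"))  # 8-bp arms around a 3-base center
```

For the sequences as printed in the text, the first element has 9-bp arms separated by a single central base, and the second has 8-bp arms separated by three bases.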
FIG. 4. Structure of the origin of replication of pKLC102. Identical sequences are indicated by the sizes of the symbols. Adjacent solid and open boxes represent palindromes; the arrows indicate the sequences of 16 consecutive direct repeats. The A+T-rich region is indicated by a horizontal black bar.
A putative operon of 10 genes from CP33 to CP42 is similar in size, sequence, and gene arrangement to the pil operon of the Escherichia coli IncI plasmid R64 (73) and of the major pathogenicity island of Salmonella enterica serovar Typhi (75). In both cases, these pil operons encode type IV thin sex pili (42). The closest homolog of the pKLC102 pil operon was found to be a functionally uncharacterized operon in the P. syringae pv. syringae B728a genome, with the level of identity ranging from 29 to 47%. The CP39 gene product is homologous to the prepilin PilS, which is processed prior to assembly by pilU (CP40), which removes the N-terminal leader peptide. The adhesin at the pilus tip is encoded by pilV (CP41). In contrast to enterobacterial pil operons, in which the terminal pilV gene is followed by shufflon sequences (38), and the site-specific recombinase gene rci, the pil operons of pKLC102 and P. syringae terminate with pilM (CP42) and do not contain any recombination genes. The genetic organization of the pil operon in pKLC102 is appropriate for mating but lacks the option to evade the eukaryotic host immune response as it has evolved in enterobacteria. The transport of plasmid DNA through the sex pili requires coupling and pilot proteins (42). A putative FtsK coupling protein and the pilot protein (encoded by the conjugative relaxase gene traI) were identified as being encoded by CP81 and CP102, respectively. The activity of the FtsK proteins is controlled by a XerC integrase (1) represented by CP103a in pKLC102. Hence, the plasmid contains all of the genes that are necessary for conjugation. This pil operon is unrelated in sequence and genetic organization to the pil clusters of the P. aeruginosa chromosome that confer twitching motility and type II secretion (46), which corroborates the conclusion that pKLC102 encodes conjugative sex pili.
Besides chvB and the pil cluster, annotation provided no unequivocal clues about additional metabolic features that pKLC102 confers on its host strain. Two genes (CP99 and CP100) encode novel fatty acid synthases. A putative chemotaxis operon (CP84 to CP86) and a cold adaptation protein (encoded by CP28) may provide further options for the response to environmental signals, and a polyketide synthase (encoded by CP57) and a protein with a VagC domain (encoded by CP26) are putative virulence-associated proteins. Moreover, an Arc repressor (encoded by CP16), a phage antirepressor (encoded by CP21) (17), and four putative transcription regulators (encoded by CP59, CP61, CP92, and CP96) were identified.
Origin, source, and horizontal gene transfer. According to sequence database comparisons, plasmid pKLC102 shares DNA with numerous proteobacteria, of which P. aeruginosa PAO1 contributed only a minor part (the gene cassette PA2566-PA2564 [see above]) (Fig. 5). The genetic repertoire of pKLC102 was predominantly assembled from two lineages. One part exhibits strong homology with gene islands in the P. syringae pv. syringae B728a and enterobacterial genomes. This DNA block includes oriV, the pil cluster, and conjugative elements, which points to the inheritance of these genes from a common ancestral plasmid (Fig. 5). The other major DNA block is homologous to several tRNA-integrated genome islands, of which 35 CDS distributed on six segments are similar to CDS in the clone C islands PAGI-2(C) and PAGI-3(SG) (40) and genome islands of other proteobacteria (Fig. 5). To explore the phylogenetic relationships in more detail, the longest conserved gene contig (CP73 to CP81) of the six segments was selected for cladistic comparison. pKLC102 of strain C was found to segregate with other tRNALys-associated gene islands found in Azotobacter vinelandii and P. fluorescens, whereas PAGI-2(C) of strain C was more closely related to other tRNAGly-associated gene islands of Burkholderia fungorum and Ralstonia metallidurans.
In summary, pKLC102 is composed of a mosaic of blocks of diverse origin. The orthologs and paralogs with the highest sequence similarities were typically identified in A. vinelandii, P. syringae, P. fluorescens, and Burkholderia spp., all of which are associated with plants, particularly with the rhizosphere. Hence, pKLC102 most likely evolved in plant-associated microbial communities.
Integrases. pKLC102 recombines within the 3' end of the tRNALys(2) structural gene in the chromosome. tRNA genes are typical integration sites for phages, but not for plasmids (12). Annotation revealed that integration and excision are probably mediated by the phage tyrosine integrase XerC (encoded by CP103a) (Fig. 3 and 6). CP103a shows 60 and 55% amino acid identity with the xerC genes Avin0928 and AAM77365 detected by BLAST in the A. vinelandii and P. syringae strain BR2R genomes, suggesting that these three XerC integrases have a common chromosomal target site. Tyrosine integrases are a family of site-specific recombinases found in bacteria, plasmids, and bacteriophages (1, 9, 15, 25, 32, 47, 56). The conserved C-terminal protein domains cleave and religate the DNA; thus, a covalent intermediate is formed between DNA and the tyrosine in the active site of the integrase (7, 37, 61). The nonconserved N-terminal domains possess high-affinity DNA binding sites and act as context-sensitive modulators of enzyme activity.
FIG. 6. Inner opposite CDS of XerC integrases CL2ab, CP103ab, and Avin0928. The integrase genes and putative traI genes CL3, CP102, and Avin0927 located upstream of the integrases are shown by open arrows. The integration attachment sites downstream of the integrases are indicated by solid boxes. Identified inner ORFs (putative excisionases) are depicted by shaded arrows. The boxed sequences indicate the putative termination loops following the inner ORFs in pKLC102 and A. vinelandii.
The opposing activities of an integrase, catalyzing both integration into and excision from the chromosome, are regulated by an excisionase (13, 23). In enterobacteria, the integrase and excisionase are encoded by adjacent int and xis genes that may partially overlap, as is the case for the E. coli phage λ (7, 23, 37, 61). Hence, a complete overlap of the two genes is reasonable. Accordingly, the outer ORF, CP103a, and the inner ORF, CP103b, were annotated as int and xis; thus, the gene product of the latter, like its weak homolog Cox of phage P2 (58, 74), may function not only as an excisionase but also as a transcription regulator for proteins that mobilize the gene island.
The int locus should play a key role in the chromosomal incorporation and mobilization of pKLC102. In the case of the clc element (50, 63), which so far is the only experimentally characterized gene island in Pseudomonas, the presence of int was necessary and sufficient for integration into and mobilization from the chromosome. In order to execute these opposite activities through one locus, a complex genetic structure is instrumental in expressing just one activity at a time. The divergent transcription of the same sequence observed in CP103 and Avin0928 (Fig. 6) is a mechanism of genetic control to meet this requirement.
Integron TNCP23 within pKLC102 of subgroup C chromosomes. A large 23,061-bp class I transposon (Table 4) inserted into an AT-rich region of pKLC102. This transposon, called TNCP23, was found only in clone C chromosomes of subgroup C (39, 53). TNCP23 is flanked at both ends by the insertion sequence (IS) element IS6100 (65). TNCP23 integrated upstream of the pil operon at position 28,440 of pKLC102 (Fig. 1); thus, the last 8 nucleotides 5' of the breakpoint (positions 28,433 to 28,440) were duplicated so that the 17-mer inverted repeats at the termini of IS6100 are flanked on both sides by the direct repeat 5'-TTCCGAAC. Hence, the sequences spanning the integration point read 5'-TTCCGAACGGCTCTGTTGCAAAAAT at the right end and 5'-ATCTTTGCAACAGAGCCTTCCGAAC at the left end. Inspection of the adjacent plasmid sequence did not disclose any known recombination signals, such as direct or inverted repeats; however, the breakpoint is located approximately in the middle of a 2-kb region with a GC content (42%) significantly lower than the average GC content (60.9%) of pKLC102 (Table 1). The lower thermodynamic stability of base pairs in AT-rich regions may have facilitated the targeting of the transposon to this site.
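The junction sequences quoted above can be inspected with a few lines of code. The Python sketch below is illustrative only; the function names and the returned dictionary layout are assumptions for this example. It splits each 25-mer into the 8-bp target-site duplication and the 17-mer IS6100 terminus, then compares one terminus with the reverse complement of the other to see how closely the two form an inverted repeat.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def analyze_junctions(right_end: str, left_end: str, tsd_len: int = 8) -> dict:
    """Split the two junction sequences into duplicated target site and
    IS terminus, and count matching positions of the inverted repeat.

    right_end: duplication followed by the IS terminus (as quoted in the text)
    left_end:  IS terminus followed by the duplication (as quoted in the text)
    """
    tsd_right = right_end[:tsd_len]
    tsd_left = left_end[-tsd_len:]
    ir_right = right_end[tsd_len:]
    # Bring the other terminus into the same orientation before comparing.
    ir_left = reverse_complement(left_end[:-tsd_len])
    matches = sum(1 for x, y in zip(ir_right, ir_left) if x == y)
    return {
        "tsd_right": tsd_right,
        "tsd_left": tsd_left,
        "ir_length": len(ir_right),
        "ir_matches": matches,
    }


if __name__ == "__main__":
    result = analyze_junctions(
        "TTCCGAACGGCTCTGTTGCAAAAAT",
        "ATCTTTGCAACAGAGCCTTCCGAAC",
    )
    print(result)
```

For the sequences as printed here, both junctions carry the same 8-bp direct repeat (TTCCGAAC), and the two 17-mers agree at 16 of 17 positions when brought into the same orientation.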
TABLE 4. Annotation of all ORFs located within integron TNCP23
1, sulI, and orf5i (TNCP3) (Table 4); and an integrase gene, int1; the last, however, is truncated by 203 bp at the 5' end and therefore is probably nonfunctional. As in pKLC102, a divergently transcribed xis gene was identified within int1 (TNCP7a and -b). A gene cassette with an aadB gene (TNCP6) encoding an aminoglycoside-adenyltransferase for gentamicin and tobramycin is inserted into attI. Integrons of similar structure are known from the P. aeruginosa plasmid R1033 (accession no. U12338) and the Corynebacterium glutamicum plasmid pCG4 (48), but the deletion in int1 at the 5' end has so far not been reported.
FIG. 7. Gene map of TNCP23. The map is calibrated to the site of integration into the chromosome of strain C. The leading strand was defined by colinearity with the P. aeruginosa PAO1 genome sequence. Predicted coding regions are shown by arrows indicating the direction of transcription. The genes are color coded according to their functional categories as shown in the legend below the map. All of the genes carry identification numbers according to the CDS numbering in Table 4, but the abbreviation Tn was used instead of TNCP due to space limitations. Gene names are highlighted beneath the corresponding CDS. oriV is the predicted origin of replication.
FIG. 8. Evolution of P. aeruginosa strains linked to plasmid DNA. (a) Reversible integration of plasmid DNA into two possible sites of clone K strains. (b) Different forms of plasmid DNA in clone C strains. In subgroup SG17M, pKLC102 is found episomally and integrated into the genome at tRNALys(2). Strain C5 apparently lost the pKLC102 DNA, while strain C2 harbors only the integrated form. In subgroup C, the integron TNCP23 inserted into chromosomally integrated pKLC102. Free plasmid is not detectable in subgroup C strains, indicating that TNCP23 prevented mobilization. TNCP23 is flanked by copies of IS6100. Intramolecular transposition of the left copy of IS6100 (IS6100-L) is coupled with an inversion of the chromosomal region between the transposed copy and IS6100-L in some strains of subgroup C. For these strains (C8, C9, C10, and C19), the tRNALys(1) area is not shown.
Genome evolution in P. aeruginosa clone C. The related clones C and K are among the major clones of the present P. aeruginosa population (36, 55). The abundance of several hundred C and K isolates in our collection of >3,000 strains from clinical and environmental habitats made it possible to evaluate intraclonal genome diversity by physical mapping and sequencing and, as shown here, to trace the underlying genome rearrangements. The P. aeruginosa clones K and C are thus among the first examples for which bacterial genome evolution could be documented by analyzing related isolates retrieved from their natural habitats.
pKLK106 and pKLC102 are highly homologous plasmids. pKLK106 reversibly recombines with clone K chromosomes at one of the two tRNALys genes (Fig. 8). In all investigated clone K strains, both episomal and chromosomal copies were detected. During the propagation of single colonies on agar plates in vitro, progeny that had retargeted pKLK106 into the other tRNALys locus were regularly observed, indicating that pKLK106 is mobilized and reintegrated into the clone K chromosome at high frequency.
Plasmid pKLC102 could recombine only with the tRNALys(2) gene close to the pilA locus, because the other site was blocked by PAGI-4(C). The only extra DNA of pKLC102 that is absent in pKLK106 is a P. aeruginosa operon flanked by direct repeats that match PAO chromosomal sequence in the vicinity of tRNALys(1), which is present in C but not in K chromosomes. Repeats and tRNALys(1) encompass a 9.5-kb block which is found again with conserved synteny and >90% sequence homology in pKLC102. We assume that the proximity of two targeting signals in cis initiated complex genome rearrangements which led to the irreversible incorporation of one small part of a pKLC102 ancestor next to the tRNALys(1) gene.
All investigated clone C isolates from aquatic habitats and the hospital environment harbored chromosomal and episomal copies of pKLC102. However, many isolates from CF lungs contain either no (C5) or only chromosomally integrated (C2) pKLC102 (Fig. 8). The latter scenario is typical for a genome island (29, 30, 31). Of the four subgroups of clone C (53), subgroup C is exclusively represented by CF lung isolates and differs from the other three groups by the insertion of the class I composite transposon TNCP23 into chromosomally integrated pKLC102, which may have been acquired because of the aadB gene conferring gentamicin resistance (Fig. 8). P. aeruginosa converges in CF lungs to a common phenotype characterized by the decreased production of membrane components, cellular appendages, and secreted factors (45, 69). This phenotypic signature was partially gained in subgroup C strains by TNCP23-mediated chromosome remodeling. Intramolecular transposition of the active IS6100 element of TNCP23 led to large chromosomal inversions, which disrupted genes that are typically inactivated during the adaptation of P. aeruginosa to the atypical habitat of CF lungs (Fig. 8). In parallel, the integrity of pKLC102 was destroyed. The two attachment sites were separated, so that the genetic content of pKLC102 was irreversibly fixed in the chromosome. In summary, Fig. 8 portrays the evolution of a plasmid from a mobile genetic element to an irreversibly fixed genome island that finally was disrupted and distributed among separate chromosomal regions. It should be noted that the increasing complexity of genome organization caused by insertion, transposition, and inversion was accompanied by mutation, deletion, and/or duplication of sequence close to the breakpoint.
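The effect of an inversion between two IS copies on a chromosomally integrated island can be illustrated with a toy gene-order model. This is a sketch under stated assumptions only: the element names, their order, and the position of the transposed IS copy are hypothetical and are not taken from Fig. 8. The model captures just one generic fact, namely that an inversion reverses both the order and the orientation of the elements it spans.

```python
def invert(chrom, start, end):
    """Return a copy of chrom with chrom[start:end] reversed and strands flipped."""
    flipped = [(name, -strand) for name, strand in reversed(chrom[start:end])]
    return chrom[:start] + flipped + chrom[end:]


def index_of(chrom, name):
    """Position of a named element in the ordered element list."""
    return [n for n, _ in chrom].index(name)


# Hypothetical layout: an island (two parts) carrying an IS copy between its
# attachment sites, plus a second, transposed IS copy at a distant locus.
before = [
    ("geneA", 1), ("attL", 1), ("island_part1", 1), ("IS6100-L", 1),
    ("island_part2", 1), ("attR", 1), ("geneB", 1), ("geneC", 1),
    ("IS6100-copy", -1), ("geneD", 1),
]

# Invert everything located between the two IS copies.
left = index_of(before, "IS6100-L")
right = index_of(before, "IS6100-copy")
after = invert(before, left + 1, right)


def island_gap(chrom):
    """Number of positions separating the two island parts."""
    return abs(index_of(chrom, "island_part1") - index_of(chrom, "island_part2"))


print([name for name, _ in after])
print("gap between island parts:", island_gap(before), "->", island_gap(after))
```

In this toy layout the two island parts are no longer adjacent to the same IS copy after the inversion, and the two attachment sites no longer bracket a contiguous island, which is the sense in which the text describes the plasmid content as irreversibly fixed.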
Horizontally acquired elements, such as prophages, plasmids, and genome islands, have been detected in numerous completely sequenced bacterial genomes (3, 19, 24, 29-31) based on sequence homology, phylogenetic profiling, the presence of diagnostic genes (for example, int, tnp, ori, and tra), and/or global criteria, such as atypical GC content, codon usage, or oligonucleotide frequency bias. However, with the exception of the spread of resistance determinants, most in silico findings are not backed up by knowledge about the original donors and recipients and the underlying mode of transmission. pKLC102 is one of the rare examples for which the causative action on genome evolution can be demonstrated. pKLC102 coexists in the episomal and chromosomal states and recombines with and mobilizes from the chromosome at high frequency, even in the absence of any apparent stress stimuli. Annotation and phylogenetic analyses point to the possible origin of this double role of plasmid and genome island. The closest homologs of pKLC102 are plasmids and phage-type genome islands (Fig. 5). The plasmid lineage conferred genes for replication, partitioning, and conjugation, and the phage lineage conferred integrase, att, and the syntenic set of conserved hypothetical genes also observed in the tRNAGly-associated gene islands on clone C chromosomes (40). Interestingly, the closest neighbors of the phage lineage inhabit the rhizosphere, while the closest neighbors of the plasmid lineage colonize the phyllosphere (Fig. 5). Hence, pKLC102 probably emerged in a plant habitat from a phage lineage and a plasmid lineage that endowed this hybrid with the uncommon flexibility to exist as a conjugative plasmid and as a genome island.
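One of the global criteria mentioned above, atypical GC content, can be sketched as a sliding-window scan. The code below is an illustrative minimal example, not the procedure used in this study: the function names, window size, step, and threshold are assumptions, and the demonstration sequence is synthetic.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def low_gc_windows(seq: str, window: int, step: int, threshold: float):
    """Return (start, gc_fraction) for every window whose GC content is below threshold."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        frac = gc_content(seq[start:start + window])
        if frac < threshold:
            hits.append((start, round(frac, 3)))
    return hits


# Synthetic example: a GC-rich backbone with an AT-rich insert in the middle.
demo = "GC" * 50 + "AT" * 20 + "GC" * 50
hits = low_gc_windows(demo, window=20, step=20, threshold=0.5)
print(hits)
```

On real data the threshold would usually be chosen relative to the genome-wide or element-wide average (for example, 60.9% for pKLC102 as given in Table 1) rather than as a fixed value.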
Genome islands adapt over time to the taxospecies-specific signature of the core genome (29, 30, 31). pKLC102 escaped this adaptation. Its tetranucleotide frequency bias defines a lineage that is separate from those of the completely sequenced P. aeruginosa, P. putida, and P. syringae genomes (data not shown). Moreover, the genetic repertoire of pKLC102 includes mainly genes for its own maintenance and propagation. Even the putative virulence gene chvB may primarily facilitate the spread of the plasmid; its impact on the pathogenicity and fitness of the host bacterium may be just an implicit secondary effect. In conclusion, pKLC102 exhibits typical features of a selfish genetic element, and this is probably a major reason why it coexists in most isolates from environmental and disease habitats as both a plasmid and a genome island.
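The comparison of tetranucleotide composition mentioned above can be approximated by comparing normalized 4-mer frequency profiles. The text does not specify how the frequency bias was computed (data not shown), so the following is only a simplified stand-in: it uses raw overlapping 4-mer frequencies and a Manhattan distance, and all names are illustrative.

```python
from collections import Counter
from itertools import product

# All 256 unambiguous tetranucleotides in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq: str) -> dict:
    """Normalized frequencies of overlapping 4-mers; ambiguous 4-mers are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    return {k: (counts[k] / total if total else 0.0) for k in TETRAMERS}


def profile_distance(p: dict, q: dict) -> float:
    """Manhattan distance between two tetranucleotide profiles (0 = identical)."""
    return sum(abs(p[k] - q[k]) for k in TETRAMERS)


# Tiny synthetic sequences, for illustration only.
profile_a = tetra_freqs("ACGTACGT")
profile_b = tetra_freqs("AAAAAAAA")
print(profile_a["ACGT"], profile_distance(profile_a, profile_b))
```

A profile computed this way from an element such as pKLC102 could be compared with profiles of host chromosomes; published bias measures typically also correct for the underlying mono-, di-, and trinucleotide composition, which this sketch does not do.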
This work was supported by the Deutsche Forschungsgemeinschaft (DFG) (Tu40-5, Schwerpunktprogramm "Ökologie bakterieller Krankheitserreger-molekulare und evolutionäre Aspekte"). J.K. and O.R. are members of the DFG-sponsored Europäisches Graduiertenkolleg "Pseudomonas: Pathogenicity and Biotechnology."
. Proc. Natl. Acad. Sci. USA 79:5837-5841.
13 and
42. Mol. Microbiol. 16:877-893.
integrase is a context-sensitive modulator of recombinase functions. EMBO J. 20:1203-1212.