Journal of Bacteriology, November 2004, p. 7783-7795, Vol. 186, No. 22
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.22.7783-7795.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Genome Sciences Centre,1 Department of Microbiology and Immunology, University of British Columbia, Vancouver,4 Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada,2 Department of Bioengineering, Nagaoka University of Technology, Nagaoka, Niigata, Japan3
Received 31 December 2003/ Accepted 5 June 2004
Rhodococcus sp. strain RHA1 is characterized by its exceptional ability to transform polychlorinated biphenyls (PCBs) (53), a particularly widespread and persistent class of environmental pollutants. It is generally thought that in aerobic bacteria, PCBs are cometabolized by the bph pathway, which is responsible for the aerobic degradation of biphenyl (23). The upper bph pathway consists of four enzymatic activities that together transform biphenyl to benzoate and 2-hydroxypenta-2,4-dienoate. For each of these four steps, RHA1 appears to possess multiple isozymes, which may help explain the strain's superior PCB-transforming capabilities. Thus, the strain contains at least three bph-type ring-hydroxylating dioxygenases (33) and at least seven different bph-type ring cleavage enzymes (51). It is unclear which of these isozymes is involved in the catabolism of biphenyl or closely related compounds and how these different activities are regulated.
The genome of Rhodococcus sp. strain RHA1 is organized into a chromosome of unknown topology and three large linear plasmids: pRHL1 (1,100 kb), pRHL2 (450 kb), and pRHL3 (330 kb). Most of the genes of the upper biphenyl catabolic pathway are located on the two largest linear plasmids (56). However, genes encoding related isozymes are distributed throughout the genome, as are the genes involved in the degradation of benzoate and 2-hydroxypenta-2,4-dienoate. Analysis of the telomeres of pRHL2 revealed the presence of terminal inverted repeats with covalently associated proteins (56). This structure is characteristic of invertrons, a class of linear elements found in a variety of bacteria, bacteriophages, and viruses (50). A second class of linear elements, found thus far in Borrelia spp. and prophage, has covalently closed hairpin loops at the termini. A probe derived from the right end of pRHL2 cross-hybridized to the pRHL1 and pRHL3 termini, suggesting that these plasmids may also be invertrons (56).
Actinomycete invertrons include plasmids and chromosomes. Although the latter have only been definitively reported to occur in streptomycetes (63), linear plasmids have been characterized in most genera of actinomycetes, including rhodococci (60), streptomycetes (28, 59, 65), planobisporetes (47), and mycobacteria (38, 50). It has been proposed that linear plasmids evolved from bacteriophages (27) and that linear chromosomes arose from the recombination of linear plasmids with circular chromosomes (12). The cores of large linear plasmids replicate bidirectionally from a unique internal origin, similar to replication of circular plasmids (9), and some linear replicons can replicate in circular forms when their telomeres are deleted (55). Regardless of their precise origins, it is clear that actinomycete invertrons are dynamic genetic elements. For example, plasmids can exchange ends with the host chromosome, mobilizing large regions of the chromosomal ends (44), and large regions of linear chromosomes can be duplicated.
As part of an effort to characterize metabolism and its genetic regulation in Rhodococcus sp. strain RHA1, we are determining the sequence of this organism's genome. Within the context of this project, we report here the complete sequence of pRHL3, the smallest of the strain's three plasmids. Sequence analysis of pRHL3 revealed the presence of several interesting plasmid-borne genes, including clusters of catabolic genes, and enabled the identification of regions that may have been acquired by horizontal transfer.
Strains, media, and growth. Rhodococcus sp. strain RHA1 was grown at 30°C on Luria-Bertani (LB) broth or W medium supplemented with an appropriate carbon source (53). Liquid cultures of 25 ml were incubated in 125-ml Erlenmeyer flasks shaken at 200 rpm. Limonene, carvone, and carveol were provided in vapor form to cultures on W medium. For solid medium, 5 µl of the specified compound was placed in a sterile Eppendorf tube placed in a 50-ml tube (Sarstedt) attached to the lid of a petri plate. Several holes in the lid of the petri plate permitted vapors to pass from the tube into the petri plate. The petri plates were sealed with parafilm and incubated lid down. For liquid medium, an Eppendorf tube containing substrate was suspended in the headspace of a flask. Plasmid and fosmid libraries were propagated in Escherichia coli strains DH10V and EPI10, respectively. Genomic libraries were plated on 2xYT supplemented with appropriate antibiotics and, in the case of plasmid libraries, X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) and IPTG (isopropyl-β-D-thiogalactopyranoside). For cloning the telomeres and the origin of replication, E. coli JM109 was used for DNA propagation and was cultured on LB medium. All E. coli strains were grown at 37°C on medium containing the appropriate antibiotics.
Preparation of RHA1 DNA. Linear plasmid DNA was prepared, detected, digested with restriction enzymes, and subjected to pulsed-field gel electrophoresis (PFGE) and Southern hybridization analysis as described previously (56). Genomic DNA was prepared for library construction essentially as described by Marmur (39).
Cloning of the telomeres. Plasmid DNA was extracted by electroelution from pulsed field gels, digested with PstI, and ligated into pBluescript II SK(+) that had been linearized with PstI and EcoRV. The ligation mixture was transformed into E. coli JM109, and transformants were selected on LB agar plates containing 50 mg of ampicillin/liter, 2 mM IPTG, and 0.04% X-Gal. Plasmids pTPE1R and pTPE1L, containing 1.2- and 5.0-kb inserts, respectively, were recovered from the transformants and corresponded to the right and left telomeres of pRHL1. Plasmids pTPE3R and pTPE3L, containing 1.8- and 3.5-kb inserts, respectively, were recovered and corresponded to the right and left telomeres of pRHL3. The telomeres were subcloned into pUC18 and pUC19 and were sequenced by using the dideoxy termination method (52) and a CEQ2000XL sequencer (Beckman Coulter, Inc., Fullerton, Calif.).
Cloning of the replication region. RHA1 genomic DNA was partially digested with MboI, and the resultant fragments were separated by agarose gel electrophoresis. Fragments of 9 to 23 kb were extracted and ligated into BamHI-digested Charomid 9-28::tsr, constructed by inserting the thiostrepton resistance gene (tsr) into the SmaI site of Charomid 9-28 (32). The resulting DNA was introduced into E. coli DH5α via in vitro packaging. Plasmid DNA was isolated from the transformants and transformed into Rhodococcus sp. strain RHA1 by electrotransformation (66). RHA1 transformants were selected on LB agar plates containing 10 µg of thiostrepton/ml. The plasmid DNA was recovered from each transformant and subjected to Southern hybridization analysis with a tsr-derived probe to detect plasmids after separation of DNA by agarose gel electrophoresis.
Fragments of the replication region were generated by digestion with restriction enzymes and were subcloned into pIJ702 (32) or pBSSK::tsr, two thiostrepton resistance vectors that are unable to replicate in RHA1. Constructs were transformed into RHA1 and selected on LB agar plates containing thiostrepton as described above. In determining the incompatibility of replication region-containing plasmids with pRHL3, the plasmid content was examined by PFGE in at least 10 independently selected transformants. For these analyses, transformants were grown in 10 ml of threefold-diluted LB medium at 30°C.
Genomic libraries. Plasmid libraries were constructed by using one of two methods. In one method, Rhodococcus sp. strain RHA1 genomic DNA was manually sheared by using a syringe and a 25-gauge needle. Fragments of 2 to 3 kb were double gel purified, end repaired with T4 DNA polymerase plus Klenow fragment, and phosphorylated by using the T4 polynucleotide kinase. Blunt-end fragments were cloned into HincII-linearized, dephosphorylated pUC19. Colonies were analyzed for insertions by PCR (M13F-21/M13R) and restriction double digests (HindIII/XbaI).
In the second method, RHA1 genomic DNA was sheared by sonication and end repaired by limited Bal31 nuclease digestion. End-repaired DNA was run on a 1% low-melting-point agarose gel, and 2- to 3-kb fragments were excised and recovered by β-agarase digestion, phenol extraction, and ethanol precipitation. Size-selected fragments were ligated by using an approximately 1,000-fold excess of BstXI adapters (Invitrogen). Excess adapter was removed by three rounds of agarose gel purification. Purified, BstXI-adapted fragments were inserted into BstXI-linearized pBR194c plasmid.
A fosmid library containing 40-kb inserts of manually sheared genomic DNA was constructed by using an EpiFOS fosmid library production kit (catalog no. FOS0901; Epicentre) according to the manufacturer's instructions.
Fingerprint map. Fingerprints were generated by digesting RHA1 fosmid clones with BamHI and separating the resulting fragments on 1.2% agarose gels (40, 52a). Gel images were processed and captured with IMAGE (http://www.sanger.ac.uk/Software/Image), and the fragments were called using BANDLEADER software (22). A total of 4,973 fingerprints were obtained from the 4,992 fosmid clones that were analyzed. Fingerprints were automatically assembled into contigs by using FPC (42, 57, 58; see also http://www.genome.clemson.edu/fpc/) based on the restriction fragment overlap determined by the probability of coincidence score. A probability cutoff of 1e-10 and the default parameters were used for this map, yielding 417 contigs and 1,260 singletons. After the automated fingerprint binning, each contig was manually edited by using FPC tools. This involved refining order and overlaps based on the fingerprint similarities. Each contig was then extended in both directions by comparing the fingerprints of each contig with all other fingerprints within the FPC database at a less stringent cutoff and permitting the joins that did not contradict the high-stringency data. Some singletons were also used to bridge contigs. The final physical map of the RHA1 genome contained 25 contigs and 366 singletons.
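The idea behind a probability of coincidence score can be illustrated with a binomial-tail calculation: given two fingerprints, how likely is it that the observed number of shared bands arose by chance? The sketch below is one commonly described formulation and is an assumption here, not FPC's own code. The function name, band counts, tolerance, and gel length are hypothetical values chosen only for illustration.

```python
from math import comb

def coincidence_probability(n_low, n_high, matches, tolerance, gel_length):
    """Binomial-tail probability that at least `matches` bands agree by chance.

    n_low / n_high: band counts of the smaller and larger fingerprint.
    tolerance / gel_length: band-matching tolerance and gel length, same units.
    """
    # Chance that a given band matches none of the n_low bands of the other clone.
    miss = (1.0 - 2.0 * tolerance / gel_length) ** n_low
    p = 1.0 - miss
    return sum(
        comb(n_high, m) * p**m * (1.0 - p) ** (n_high - m)
        for m in range(matches, n_high + 1)
    )

# Illustrative values only; none of these numbers come from the RHA1 map.
score = coincidence_probability(n_low=20, n_high=25, matches=18,
                                tolerance=7, gel_length=3300)
cutoff = 1e-10  # assumed stringency cutoff for accepting an overlap
overlap_accepted = score <= cutoff
```

A lower score means the shared bands are less likely to be coincidental, so a more stringent (smaller) cutoff yields fewer but more reliable joins.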
Sequencing. As part of the project to sequence the genome of RHA1, a total of 76,416 plasmid clones and 9,984 fosmid clones were grown on 2xYT agar containing 100 µg of ampicillin/ml with IPTG and X-Gal (pUC19 vector) or 25 µg of chloramphenicol/ml (pEpiFos5 vector). Sequencing was accomplished by using a combination of universal primers that include M13-Reverse, M13-40-Forward, pEpiFos5-Forward, and pEpiFos5-Reverse. The sequence reactions were performed in a total reaction volume of 5 µl containing 0.54 µl of Applied Biosystems BigDye v.3.1 cycle sequencing reaction mix (Applied Biosystems) and 3 µl of alkaline lysis purified plasmid DNA.
Sequence gaps caused by hard stops were closed by using an alternate 10-µl total volume sequencing reaction mix. The chemistry contained 2 µl of dGTP reaction mix (Applied Biosystems), 5% dimethyl sulfoxide, and 2 µl of alkaline lysis purified plasmid DNA. Reactions were primed by custom oligonucleotides designed to anneal to sequence flanking each hard stop region.
ABI Prism 3100, 3700, and 3730XL DNA analyzer sequencing instruments were used. Base calls of the trace data were performed by the program PHRED (19, 20) with default parameters, and the sequence was trimmed for quality and vector.
Sequence assembly and finishing. Sequence reads were concurrently assembled by using Arachne 1.0 (4) and Phrap (25), and the assembly progress was monitored by a sequence assembly manager (R. Warren et al., unpublished results). In the latest stage of genome assembly, reads were binned into supercontigs (a higher-order arrangement of contigs based on read-pair information) by using Arachne and reassembled with Phrap to allow low-quality read bases to be included in the assembly (R. S. Fulton, unpublished data). The clone tiling path deduced from the RHA1 fingerprint map was used to align and orient supercontigs into ultracontigs based on the exact position of the fosmid clone end reads in our sequence assembly. This information, along with self-sequence alignment of the supercontig bins of Phrap contigs, was used to rebin the reads for every RHA1 genetic element (chromosome and three plasmids). Consed (24) and Autofinish (25) were used to select primers and clones to finish low-quality regions, telomeres, and gaps in the pRHL3 sequence. Consed was also used to inspect sequence quality and integrity, as well as to edit the final assembly.
Annotation. Putative genes were identified and annotated by using integrated automated and manual approaches. In the automated step, open reading frames (ORFs) were independently predicted by Glimmer2.10 (14) trained on a set of 500 known rhodococcal genes and by GeneMark-prokaryote (8) trained by using the supplied Mycobacterium tuberculosis model. RBSfinder (The Institute for Genomic Research), a ribosome-binding site prediction program, was used to help determine the start codons of our gene set.
Hand curation was facilitated by using Acedb (17) and an in-house interface to our pRHL3 annotation MySQL database (http://www.mysql.com). Each ORF predicted by the Hidden Markov Model (HMM) was inspected in the context of the plasmid sequence. ORF function and position were confirmed with BLASTP alignments (1) and BLASTX alignments of pRHL3 sequences to nr-SPtrEMBL, NCBI-nr, and Rhodococcus sp. strain I24 (www.integratedgenomics.com) protein databases. Interproscan (2) was used with the PROSITE, PRINTS, Pfam, SMART, TIGRFAMs, PIR SuperFamily, SUPERFAMILY, and ProDom databases to search for conserved domains and motifs and to validate predicted gene function. Finally, BLASTX alignments were used to identify genes that were not predicted by the gene finders used in the present study.
Sequence analyses (alignments, phylogenetic analyses, and genomic islands). Sequences were aligned by using CLUSTAL W (61) with all parameters set to their default values. For phylogenetic analyses, CLUSTAL W alignments were used as input for the algorithm of the PHYLIP (version 3.6) package (21). Phylogenetic analyses were performed on 24 sequences: six RHA1 telomere sequences corresponding to the three RHA1 plasmids. The first 800 nucleotides of each were used for the analysis due to high base conservation between pRHL1 and pRHL3. The SEQBOOT program of the PHYLIP package was used to generate 100 data sets that were used in conjunction with the DNAPARS (DNA parsimony) program of PHYLIP, forcing 10 permutations per data set. The best tree was obtained by using CONSENSE (PHYLIP) and plotted by using TREEVIEW (43).
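The bootstrap step can be pictured as resampling alignment columns with replacement to produce each replicate data set. The following is a minimal sketch of that general idea, not the SEQBOOT implementation; the function name, the seed, and the toy sequences (which are invented and are not RHA1 telomere data) are assumptions.

```python
import random

def bootstrap_alignments(alignment, n_replicates=100, seed=0):
    """Resample alignment columns with replacement, one alignment per replicate."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # The same sampled columns are applied to every sequence, so each
        # resampled column is still a valid column of the original alignment.
        cols = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            {n: "".join(alignment[n][c] for c in cols) for n in names}
        )
    return replicates

# Invented toy alignment for illustration only.
toy = {"seqA": "GCTACGCAAT", "seqB": "GCTTCGCAAT", "seqC": "GCTACGCTAT"}
reps = bootstrap_alignments(toy, n_replicates=100)
```

Each replicate would then be passed to a tree-building program, and the resulting trees summarized as a consensus.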
Clusters of genes that were potentially acquired through horizontal transfer (genomic islands) were identified by using IslandPath (29). The G+C content variation and dinucleotide bias were calculated with reference to the plasmid average (as opposed to the genome average). Dinucleotide bias was calculated by using both the "ORF clusters" method previously reported for IslandPath and the "whole plasmid sequence" method, with a sliding window size of 3 kb shifted every 0.5 kb. Putative insertion sequence (IS) elements were identified by BLASTN search against the IS Finder database (http://www-is.biotoul.fr/). Repeats were detected by using Reputer's REPFIND program (35) and MUMmer v.3.10 (36). Large repeats were investigated for possible gene duplication by using NCBI-bl2seq and MIROPEATS (45). To identify putative integrons (49), BLASTP and regular expressions were used to search for integron-associated integrases (IntI) and the core attachment site (attI) consensus sequence, respectively.
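A sliding-window dinucleotide bias against the whole-plasmid average can be sketched as follows. This is a simplified illustration that uses the dinucleotide relative abundance (observed dinucleotide frequency divided by the product of its mononucleotide frequencies) on a single strand; IslandPath's exact computation may differ, and the function names and toy sequence are assumptions.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]

def relative_abundance(seq):
    """Return rho_xy = f_xy / (f_x * f_y) for the 16 dinucleotides."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = sum(mono.values())
    n_di = sum(di[d] for d in DINUCS)
    rho = {}
    for d in DINUCS:
        fx, fy = mono[d[0]] / n_mono, mono[d[1]] / n_mono
        # A dinucleotide whose bases are absent gets abundance 0.
        rho[d] = (di[d] / n_di) / (fx * fy) if fx and fy else 0.0
    return rho

def dinucleotide_bias(window, reference_rho):
    """Mean absolute difference in relative abundance versus the reference."""
    rho = relative_abundance(window)
    return sum(abs(rho[d] - reference_rho[d]) for d in DINUCS) / len(DINUCS)

def sliding_bias(seq, window=3000, step=500):
    """Bias of each window relative to the whole sequence (the 'plasmid average')."""
    ref = relative_abundance(seq)
    return [
        (start, dinucleotide_bias(seq[start:start + window], ref))
        for start in range(0, len(seq) - window + 1, step)
    ]

# Invented toy sequence: uniform background with a compositionally odd insert.
toy = "ACGT" * 1500 + "AATT" * 750 + "ACGT" * 1500
scores = dict(sliding_bias(toy))
```

Windows whose bias stands out from the rest are candidates for closer inspection, ideally together with annotation evidence such as nearby mobility genes.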
FIG. 1. Analysis of RHA1 telomere fragments. RHA1 cells were lysed in an agarose plug with (+) or without (−) proteinase K treatment. Agarose plugs containing RHA1 DNA were subjected to PFGE directly (lanes 1, 2, 5, and 6) or after PstI digestion (lanes 3, 4, 7, and 8). Electrophoresis was conducted for 6 h with a voltage of 6 V/cm and a pulse time that was increased from 2 to 10 s as the electrophoresis progressed. Lanes M to 4 and 9 were stained with ethidium bromide. Lanes 5 to 10 represent Southern blots with a probe derived from the right telomere of pRHL3. The experiment shown in lanes 9 and 10 was performed by using conditions of higher stringency. Lane M, 1-kb plus DNA ladder size marker (Invitrogen, Carlsbad, Calif.). The position of intact linear plasmid DNA containing pRHL3 is indicated on the left. The estimated sizes of the fragments detected by hybridization are indicated on the right.
FIG. 2. Sequence analysis of actinomycete invertron telomeres. (A) Alignment of rhodococcal invertron telomere nucleotide sequences. The nucleotide sequences are derived from each of the three RHA1 invertrons (except for pRHL2-L), as well as pHG201 of R. opacus MR11, pHG204 of R. opacus MR22 (31), pBD2 of R. erythropolis (60), and pHG207 of Rhodococcus sp. strain MR2253 (30). Strictly conserved nucleotides are indicated with asterisks. The two sets of inverted repeats are indicated with arrows. The GCTXCGC central motif is boxed. (B) Radial view of best maximum-parsimony tree obtained by PHYLIP analyses of actinomycete telomeres. The first 800 nucleotides of each telomere were aligned. Sequences were taken from each of the plasmids in panel A, as well as the following invertrons: S. clavuligerus pSCL1 (65), S. coelicolor A3(2) SCP1, S. violaceoruber pSV2 (59), S. rochei 7434AN4 pSLA2-L (28), Planobispora rosea pPR1 and pPR2 (47), and M. celatum pCLP (46).
Phylogenetic analyses of the TIRs of actinomycete invertrons reveal the presence of at least four distinct groups of telomeres (Fig. 2B). The group formed by the telomeres of pSV2 and pSCL1 has a single set of the inverted repeats with the GCTXCGC motif found in the pRHL1 and pRHL3 telomeres. Interestingly, the telomeres do not group according to species or plasmid. Thus, the high divergence between the left and right ends of pRHL2 and pBD2, respectively, clearly indicates that functional invertrons do not require perfectly matching TIRs.
In streptomycetes, linear plasmids can exchange ends with the host chromosome, mobilizing large regions of the chromosomal ends (44). It seems equally likely that linear plasmids could also exchange ends with each other. This would facilitate recombination and exchange and may partly explain the apparent duplications that occur in pRHL3 (see below) and in the other RHA1 plasmids, as exemplified by the duplications of aromatic hydroxylation dioxygenase genes (30; W. Kitagawa, unpublished results). In the current assembly of the RHA1 genome, several regions of pRHL3 had identity to regions of the chromosome or one of the other two plasmids. Regions of 100% sequence identity were as long as 1.5 kb, and most of these included at least part of a gene putatively involved in recombination. A more complete analysis awaits completion of the genome sequence.
Replication machinery. When an RHA1 DNA library constructed by using Charomid 9-28::tsr (which is unable to replicate in rhodococci) was introduced into RHA1, 17 transformants were obtained. Two of these transformants yielded plasmid DNA which, when analyzed by Southern hybridization, carried the tsr gene. The other transformants may have originated from the integration of the tsr sequence into the RHA1 genome. These plasmids contained the same 18-kb insert and restriction fragments, and one of them was designated pCHB79. Southern hybridization analysis after PFGE revealed that the 4-kb HindIII fragment of pCHB79 hybridized specifically to pRHL3. All of the RHA1 transformants containing pCHB79 lost pRHL3, suggesting that pCHB79 is incompatible with pRHL3 (Fig. 3) and that pCHB79 contains the replication origin of pRHL3.
FIG. 3. Analysis of RHA1 transformants containing origin of replication of pRHL3. Transformants were analyzed by PFGE (A) and Southern hybridization with a probe derived from the tsr gene (B). Lanes were loaded with the following: M, a chromosome size marker derived from Saccharomyces cerevisiae; 1, wild-type RHA1; and 2 to 4, independent transformants of RHA1 containing pCHB79. The positions of the RHA1 chromosome and each plasmid are indicated on the left. An arrow on the right indicates the deduced position of pCHB79, which corresponds to the origin of electrophoresis.
FIG. 4. Replication origin region of pRHL3. (A) Subcloning of the origin of replication. Subclone plasmids are labeled, and their inserts are represented by horizontal thick bars. The results of transformation experiments are presented on the right: +, clones that yielded transformants of RHA1; −, clones that yielded no transformants. The six-digit numbers indicate the nucleotide positions of restriction sites on pRHL3. (B) Annotated ORFs in the replication origin region. ORFs are numbered according to Fig. 5 and are represented by horizontal arrows; rep1 is represented by a shaded arrow. A closed vertical arrowhead and an open box indicate the respective locations of the direct repeats and AT-rich region described in the text.
TABLE 1. Selected ORFs predicted on pRHL3 of Rhodococcus sp. strain RHA1
The plasmid was predicted to contain 300 genes (Table 1 and Fig. 5), including three possible pseudogenes, each of which contained a single frameshift (RHL3.16, RHL3.59, and RHL3.141). Two of these pseudogenes encode putative transposases. The coding region covered 79% of the plasmid. This is lower than what has been reported for other actinomycete invertrons, whose coding regions cover >85% (7, 60). There is a slight bias for genes on the lower strand at 60.4%. Interestingly, the largest gene clusters are arranged in operon-like structures and are all located on the lower strand.
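Coding coverage and strand bias of the kind reported above can be derived from a list of ORF coordinates. The following is a minimal sketch under stated assumptions: ORFs are given as 1-based inclusive (start, end, strand) tuples, the lower strand is encoded as "-", and overlapping ORFs are counted once. The function names and the toy coordinates are hypothetical and are not taken from the pRHL3 annotation.

```python
def coding_fraction(orfs, replicon_length):
    """Fraction of the replicon covered by at least one ORF."""
    covered = 0
    cur_start = cur_end = None
    for start, end, _strand in sorted(orfs):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval before opening a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping ORF: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / replicon_length

def strand_fraction(orfs, strand="-"):
    """Fraction of ORFs annotated on the given strand."""
    return sum(1 for _s, _e, s in orfs if s == strand) / len(orfs)

# Invented toy annotation on a 1,000-bp replicon.
toy_orfs = [(1, 300, "+"), (250, 600, "-"), (800, 1000, "-")]
coverage = coding_fraction(toy_orfs, 1000)   # 0.801
lower = strand_fraction(toy_orfs, "-")       # 2 of 3 ORFs
```

Merging overlapping intervals first avoids counting bases shared by adjacent genes twice, which matters where stop and start codons overlap.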
FIG. 5. Physical map and G+C content of pRHL3. The G+C content is depicted in a histogram in which each vertical bar indicates the G+C composition calculated over a 100-bp interval by using a sliding window of 10 bp. The bottom bar depicts predicted ORFs grouped into eight functional categories (a color code is provided in the lower right corner of the figure). The orientation of each ORF is indicated by an arrowhead. The symbols between the upper and lower bars indicate the respective positions of genomic islands (GI1 to -4; fuchsia-colored bars), IS elements (light blue bar), TIRs (dark blue arrows), and possible duplications (light blue and green arrows).
TABLE 2. Functional classification of pRHL3 ORFs
The telomeres have a distinctive G+C content: the first 100 nucleotides have a very high G+C content (79 to 80%), followed by a considerable decrease in G+C content in the region between positions 300 and 400 bp (36 and 39% G+C for pRHL3-L and pRHL3-R, respectively). A similar pattern is evident for the related telomeres shown in Fig. 2A.
Catabolic gene clusters. Many of the predicted catabolic genes of pRHL3 appear to be arranged in one of three clusters. Genes within these clusters have the structural characteristics of functional operons, including overlapping stop and start codons, unidirectional transcription, and nearby genes encoding transcriptional regulators. Each of the three catabolic clusters contains up to three genes that encode enzymes with high similarity to 6-phosphogluconate dehydrogenase (gnd gene), glucose 6-phosphate dehydrogenase (G6PDH), and glucose 6-phosphate isomerase (G6PI), respectively. These correspond to RHL3.55, RHL3.56, and RHL3.57 in the first cluster and to RHL3.295, RHL3.299, and RHL3.294 in the third cluster. The second catabolic cluster only contains genes encoding G6PDH and G6PI: RHL3.150 and RHL3.151. The three putative isomerases share ca. 66% amino acid sequence identity. Similarly, the two putative 6-phosphogluconate dehydrogenases, encoded by RHL3.55 and RHL3.295, share 67% amino acid sequence identity. In contrast, the putative G6PDHs are not all of the same type: RHL3.56 and RHL3.150 appear to encode F420-dependent G6PDHs and share 86% amino acid sequence identity, whereas RHL3.299 is similar to NADP-dependent G6PDHs. These findings suggest that at least some of the genes utilized by RHA1 to metabolize glucose may originate from pRHL3.
The first catabolic cluster (RHL3.20 to RHL3.63) spans a region of 54 kb and includes 13 dehydrogenase genes, all of whose products share >23% sequence identity. The substrates of most of these enzymes have yet to be identified. However, RHL3.41 and RHL3.42 of this region encode enzymes that share high sequence identity with carveol dehydrogenase and limonene monooxygenase, respectively, from R. erythropolis DCL14 (62). Rhodococcus sp. strain RHA1 grew on carveol or limonene as the sole organic substrate.
The second catabolic cluster, spanning kilobase positions 158 to 190 of the plasmid, is characterized by the presence of genes encoding metal and permease transporters, inner membrane translocators, and members of the ABC transport system for glucose. The transporter genes are located just downstream of the genes encoding the possible G6PDH (RHL3.150) and G6PI (RHL3.151). Located 10 kb downstream of the second catabolic gene cluster is a 40-kb region containing at least four genes predicted to be involved in heavy metal transport (metal-associated proteins and ATPases), as well as membrane proteins, permeases, and members of the ABC transport system.
The last 30 kb of pRHL3 harbors the third and shortest catabolic gene cluster. This region appears to contain at least two operons. The first of these contains genes predicted to code for a dioxygenase (RHL3.280), a flavoreductase (RHL3.279), an intradiol dioxygenase (RHL3.277), and an iron-containing dehydrogenase (RHL3.276) that appear to constitute an operon involved in the degradation of an aromatic compound. The RHL3.277-encoded protein shares 43% sequence identity with an intradiol dioxygenase from Agrobacterium tumefaciens C58. The protein encoded by RHL3.280 has some similarity to poorly characterized indole dioxygenases from R. opacus and Streptomyces avermitilis. It does not appear to be either a flavin-type oxygenase or a ring-hydroxylating dioxygenase, since the appropriate sequence motifs were not found. This putative operon appears to be regulated by an AraC-type transcriptional regulator (RHL3.281), whose best hit (30% sequence identity) is ThcR from R. erythropolis (41). The second operon contains a cytochrome P450 gene (RHL3.287), as described in the next section.
Cytochrome P450s. Genes encoding three cytochrome P450s (CYPs) were found on pRHL3 based on sequence alignments to known CYPs and the presence of a cysteine-containing heme-binding motif (Table 3). Of the three putative P450s, only that encoded by RHL3.287 may be assigned to an existing CYP family based on sequence identity, and this one appears to belong to a new class. RHL3.287 lies in the middle of the third cluster of catabolic genes on pRHL3, 5 kb downstream of a putative operon that appears to specify the catabolism of an aromatic compound. RHL3.287 encodes a family 116 CYP, showing highest identities to the N-terminal portion of P450RhF from Rhodococcus sp. strain NCIMB9784 (47%) and a thcB-encoded P450 from Rhodococcus sp. strain NI86/21 (48%). The latter is involved in the degradation of two herbicides: EPTC (S-ethyl dipropylthiocarbamate) and atrazine (41). The substrate of P450RhF is unknown. However, P450RhF has an intriguing multidomain structure: a C-terminal domain is similar to the reductase of some dioxygenases, harboring a 2Fe-2S cluster and an FMN (48). Accordingly, P450RhF was proposed to belong to a newly identified class of CYPs (class IV). The RHL3.287-encoded P450 is not a multidomain protein: it corresponds to the N-terminal heme-binding domain of P450RhF. However, RHL3.286, which apparently forms an operon with RHL3.287, encodes a protein whose sequence is 50% identical to the C-terminal reductase domain of P450RhF and is also predicted to harbor a 2Fe-2S cluster and an FMN. Thus, RHL3.286 and RHL3.287 potentially encode a class V system, predicted by Roberts et al. (48).
TABLE 3. CYP genes of pRHL3
The other two P450s, encoded by RHL3.62 and RHL3.246, cannot be assigned to existing families. The sequence of the RHL3.62-encoded P450 is most similar to family 125 and 225 P450s. The genes coding for the putative cognate ferredoxin (RHL3.61) and reductase (RHL3.60) are located immediately downstream of RHL3.62: the three genes appear to be arranged in a transcriptional unit. The physiological role of this system is unclear, and no substrate of a CYP125 or CYP225 has been identified to date. However, the function of RHL3.62 may be linked to the catabolic genes with which it clusters on pRHL3. The sequence of the RHL3.246-encoded P450 is most similar to family 107 CYPs. Many family 107 CYPs have been linked to macrolide biosynthesis. Interestingly, the genes surrounding RHL3.246 encode proteins of no known function, and genes encoding a ferredoxin or a reductase do not appear to be in its vicinity.
The relatively high number of CYP-encoding genes on pRHL3 reflects the total number of CYP genes in the RHA1 genome, currently estimated to be 12. More generally, the high number of CYP genes in RHA1 seems to be a hallmark of actinomycete biology. For example, the genome sequences of Streptomyces coelicolor A3(2), S. avermitilis, and Mycobacterium tuberculosis have 18, 33, and 20 CYP genes, respectively (7, 13, 37). Many of the streptomycete P450s are predicted to be involved in secondary metabolite biosynthesis. In the rhodococci, catabolic function may dominate.
Horizontal gene transfer and recombination. Clusters of horizontally acquired genes, or "genomic islands," are frequently associated with a particular adaptation of the recipient microorganism, such as increased virulence, a particular catabolic capability, or resistance to an antimicrobial or heavy metal (26). Moreover, identification of horizontally acquired regions can help elucidate the evolutionary history of a genome, providing insights into recent adaptations. Analyses included IslandPath analysis (29) and searches for mobility genes, repeats, IS elements, and integrons.
Plasmid pRHL3 carries at least 19 putative "mobility" genes that encode proteins likely to be involved in DNA recombination, including 15 possible transposases and 2 possible integrases. Two of the transposase-encoding genes, RHL3.59 and RHL3.141, are pseudogenes; each contains a single frameshift mutation. Six copies of a tandem repeat of GGCGGTC lie immediately upstream of pseudogene RHL3.59. Five integration/recombination genes are found within the first catabolic gene cluster. There appears to be a "hot spot" for transposase genes in the 105- to 154-kb region: 11 transposases are located in this region, many of which are clustered side by side (Fig. 5). The plasmid also contains an intact IS element at position 265 kb that is highly similar to IS1164 (85% nucleotide identity) identified in R. rhodochrous J1. There are several IS-like elements and IS remnants. Thus, RHL3.112 shows 58% identity at the amino acid level to the IS110 transposase, and RHL3.46 has some similarity to the IS1676 transposase but is shorter and possibly truncated. Finally, RHL3.24 and RHL3.25 are most similar to two genes of the transposase operon found in IS1206, and RHL3.85 is highly similar to a transposase found in IS1295 (82% nucleotide identity). However, RHL3.24 and RHL3.25 lack detectable inverted repeats characteristic of IS elements, and RHL3.85 has a 3' deletion compared to IS1295. Consequently, these three ORFs in pRHL3 may represent IS remnants. No obvious integron-like element was detected on the plasmid.
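A perfect tandem repeat such as the GGCGGTC run noted above can be located with a regular expression. The sketch below is only an illustration and is not the repeat-detection software used in the study. The function name and the toy sequence are hypothetical, and imperfect repeats would not be detected by this approach.

```python
import re

def find_tandem_repeats(seq, unit, min_copies=2):
    """Return (0-based start, copy number) for each perfect tandem run of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (m.start(), len(m.group()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]

# Invented toy sequence containing six tandem copies of the unit.
toy = "ATCG" + "GGCGGTC" * 6 + "ATGACC"
hits = find_tandem_repeats(toy, "GGCGGTC")
```

Here `hits` holds a single run starting at offset 4 with six copies.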
IslandPath (29), which combines sequence analysis features (%G+C and dinucleotide bias) and annotation features (mobility genes and tRNAs), was used to improve the prediction of genomic islands on pRHL3. Dinucleotide bias was initially calculated by using ORF clusters, as in previous IslandPath analyses. However, due to the smaller size of this plasmid versus the size of the genomic sequences previously analyzed by IslandPath, dinucleotide bias was also calculated over the entire plasmid sequence by using a window of 3 kb, shifted every 0.5 kb. Three regions showed dinucleotide bias by using both approaches. These three regions, together with a fourth that has an unusual G+C content, are described below as putative genomic islands. Each region is proximal to probable mobility genes. This is the first time that IslandPath has been adapted for plasmid analysis.
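The whole-plasmid windowed calculation described above can be sketched as follows. This is a minimal illustration, not the IslandPath implementation: the bias measure used here (mean absolute difference between the dinucleotide frequencies of a window and those of the full sequence) is an assumption, as are the function names. Only the 3-kb window and 0.5-kb step come from the text.

```python
from collections import Counter
from itertools import product

# The 16 canonical dinucleotides; any dinucleotide containing N is ignored.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinuc_freqs(seq):
    """Return the relative frequency of each of the 16 dinucleotides in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCS)
    if total == 0:
        return {d: 0.0 for d in DINUCS}
    return {d: counts[d] / total for d in DINUCS}


def windowed_dinuc_bias(seq, window=3000, step=500):
    """Return a list of (window_start, bias) pairs.

    bias is the mean absolute difference between the window's dinucleotide
    frequencies and those of the whole sequence (an assumed, simplified
    measure chosen for illustration).
    """
    reference = dinuc_freqs(seq)
    results = []
    for start in range(0, len(seq) - window + 1, step):
        local = dinuc_freqs(seq[start:start + window])
        bias = sum(abs(local[d] - reference[d]) for d in DINUCS) / len(DINUCS)
        results.append((start, bias))
    return results
```

Windows whose bias is well above the plasmid-wide background would then be candidates for closer inspection, for example for nearby mobility genes.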
The first putative island of pRHL3, GI1 in Fig. 5, contains no apparent dinucleotide bias. However, several other considerations indicate that this region may be a genomic island. First, four of eight genes in this region have a G+C value >1 standard deviation below the mean ORF G+C content of the plasmid. Second, these genes are adjacent to a gene predicted to encode a recombinase/integrase (RHL3.36) and two genes predicted to encode transposases (RHL3.37 and RHL3.46). Finally, a pair of 8-bp direct repeats flanks the region. Although repeats of this size often occur by chance, they may represent a duplicated insertion point. GI1 contains the genes predicted to encode limonene monooxygenase, carveol dehydrogenase, and an oxidoreductase.
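The G+C criterion used for this region (ORFs more than 1 standard deviation below the mean ORF G+C content) can be illustrated with a short sketch. The function names and the input format (a mapping of ORF name to nucleotide sequence) are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def gc_content(seq):
    """Return the G+C content of seq as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def low_gc_orfs(orfs, n_sd=1.0):
    """Return the names of ORFs whose G+C content lies more than n_sd
    standard deviations below the mean ORF G+C content.

    orfs maps a (hypothetical) ORF name to its nucleotide sequence.
    The sample standard deviation is used here as an assumption.
    """
    gc = {name: gc_content(seq) for name, seq in orfs.items()}
    mu = mean(gc.values())
    sd = stdev(gc.values())
    return sorted(name for name, value in gc.items() if value < mu - n_sd * sd)
```

The same comparison with the inequality reversed would flag regions with unusually high G+C content.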
The second putative island of pRHL3, RHL3-GI2 (Fig. 5), spans positions 78 to 92 kb, includes RHL3.72 to RHL3.85, and is flanked by a pair of 9-bp direct repeats. The majority of these predicted genes encode proteins of no known function (hypothetical or conserved hypothetical). The only annotated gene product corresponds to histone protein H1. Several of the genes in this region have moderate similarity to chromosomally located genes from other actinomycetes. The G+C content of ORFs in this region averages 67%, which is close to the chromosomal G+C content.
The third putative genomic island, RHL3-GI3 (Fig. 5), spans positions 135 to 153 kb and includes RHL3.128 to RHL3.141. The region appears to be an insertion hot spot since there are six transposases flanking this region. At least three pairs of 7- to 8-bp direct repeats also flank this region, suggesting a complex recombination history. As noted above, the significance of such short repeats is unclear. Interestingly, three of the predicted ORFs (RHL3.136, RHL3.137, and RHL3.138) in this island together encode a putative type I restriction modification (R-M) system that has moderate similarity to a Caulobacter system. It has been suggested that horizontal exchange of type I R-M systems may contribute to sequence divergence within R-M families (54). There are also several genes with unknown functions in this island. The G+C content of individual ORFs in this region varies, but the average G+C content, 63%, is slightly below the mean for all ORFs in the plasmid (65%).
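The caution about short flanking direct repeats can be made concrete with a small sketch. It is illustrative only: the function names, the flank length in the usage note, and the assumption of uniform, independent base composition are not from the text (and a high-G+C replicon such as pRHL3 departs from uniform composition, which would tend to raise the chance-match rate).

```python
def shared_direct_repeats(left_flank, right_flank, k=8):
    """Return the k-mers that occur in both flanks in the same orientation,
    i.e., candidate direct repeats bordering a region."""
    left_flank, right_flank = left_flank.upper(), right_flank.upper()
    left = {left_flank[i:i + k] for i in range(len(left_flank) - k + 1)}
    right = {right_flank[i:i + k] for i in range(len(right_flank) - k + 1)}
    return sorted(left & right)


def expected_chance_matches(flank_len, k=8):
    """Expected number of position pairs sharing an identical k-mer between
    two random flanks of flank_len bases each, assuming uniform and
    independent base composition (a simplifying assumption)."""
    positions = flank_len - k + 1
    return positions * positions / 4 ** k
```

For two hypothetical 500-bp flanks and k = 8, `expected_chance_matches(500, 8)` is about 3.7, so under these assumptions several 8-bp matches would be expected by chance alone, which is consistent with treating such repeats as supporting rather than decisive evidence.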
The fourth putative island, RHL3-GI4 (Fig. 5), spans positions 194 to 213 kb, includes ORFs RHL3.174 to RHL3.193, and is flanked by a pair of 9-bp direct repeats. The G+C of the region, at 68.1%, is almost 1 standard deviation above the mean for all plasmid ORFs. This island is notable for the presence of several genes encoding probable metal transporters. No other metal transporter genes were found elsewhere on this plasmid. The island also contains two genes, RHL3.183 and RHL3.184, which are predicted to encode the sensor kinase and regulator, respectively, of a two-component signal transduction system.
Finally, pRHL3 contains evidence for a duplication event that involves two ORFs of unknown function (Fig. 5). The respective regions spanning 11.8 to 13.3 kb (RHL3.14 and RHL3.15) and 69.7 to 71.7 kb (RHL3.65 and RHL3.66) share an overall nucleotide identity of ca. 92%, which rises to 98% over the first 160 bp. The second region is slightly larger due to a possible insertion between nucleotides 69880 and 70396. The insertion effectively truncates RHL3.65 with respect to RHL3.14. As more sequences from related bacterial species become available, comparative genomic analysis will be helpful in further ascertaining the evolutionary history of pRHL3.
Regulatory genes. A significant proportion (7%) of the pRHL3 genes are predicted to be transcriptional regulators. This preponderance of regulatory genes correlates well with the proportion of catabolic genes found on the plasmid and their tightly regulated expression. Indeed, more than half of the putative regulatory genes are distributed within the three main catabolic gene clusters described above. Members of the following subfamilies of transcriptional regulators containing a helix-turn-helix motif are predicted to be encoded by pRHL3 genes: TetR/AcrR, LuxR/UhpA, GntR, MarR/EmrR, DeoR, ArsR, and AraC/XylS. Particular examples of such regulatory proteins are discussed above with respect to catabolic gene clusters. Other putative regulatory genes include the genes encoding a two-component signal transduction system, which are part of the putative genomic island RHL3-GI4 (see above), and RHL3.91, which is predicted to encode a serine/threonine kinase.
Transport. As noted above, genes involved in transport constitute a significant proportion of the pRHL3 genes. Except for genes encoding putative ATP-binding proteins at 95.1 kb (RHL3.88) and at 150.7 kb (RHL3.139), all of the ORFs predicted to encode transporters, metal-carrying proteins, membrane proteins, and transport-related proteins occur within or near the second catabolic gene cluster in a 70-kb region. Within this region, there are three distinct clusters of transport-related genes located at 171, 202, and 234 kb, respectively. The first of these comprises four genes that appear to encode a functional ABC transport system including a permease (RHL3.156), an RbsC-like inner membrane translocator (RHL3.157), an ATP-binding protein (RHL3.158), and a periplasmic or lipoprotein component (RHL3.159). In gram-positive bacteria, these systems are also known as binding-lipoprotein-dependent transport systems. The substrate for this ABC system is unknown. However, two lines of evidence suggest that it is a sugar. First, sequence analyses indicate that the translocator is most similar to RbsC, a ribose translocator, and the lipoprotein component is most similar to a putative periplasmic component of a sugar transport system. Second, the putative transporter genes are close to genes that are apparently involved in glucose catabolism. The second "transport cluster" is located in the putative genomic island RHL3-GI4 and is characterized by genes whose products show significant sequence similarity with actinomycete clusters involved in heavy metal transport.
Concluding remarks. The analyses of telomeres and replication components indicate that pRHL3 is an actinomycete invertron typical of those thus far characterized. The analysis of the genes suggests that the principal function of pRHL3 is to increase the catabolic capabilities of RHA1. However, the overall organization of the plasmid is very different from that of the classic catabolic plasmids of pseudomonads, such as pWW0, NAH, and CAM (64). In the latter plasmids, a significant proportion of the genes are responsible for the transformation of a specific compound to tricarboxylic acid cycle intermediates. Moreover, these genes occur in well-organized operons. The high number of mobility genes detected in pRHL3, together with the presence in RHA1 of other invertrons with which it can exchange ends, may help to explain the mosaic nature of this plasmid.
Due to the high number of unknown genes in catabolic gene clusters and the fact that the RHA1 genome is not yet completely known, it is difficult to assess the full catabolic role of pRHL3 in RHA1. However, the presence of several oxygenase- and numerous dehydrogenase-coding genes, known to play central roles in the degradation of aromatic compounds, along with the structural organization of these genes, and the putative nature of some of their regulatory genes strongly suggest that it plays a role in the assimilation of such compounds. Further studies are necessary to shed more light on the true catabolic nature of this plasmid and how it interacts with the RHA1 genome.
Supplemental material for this article may be found at http://jb.asm.org/.
Presented in part at the 11th International Conference on Microbial Genomes, Durham, N.C., 28 September to 2 October 2003.