Journal of Bacteriology, October 2004, p. 6560-6574, Vol. 186, No. 19
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.19.6560-6574.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Biological Sciences Department and Environmental Biotechnology Institute, California Polytechnic State University, San Luis Obispo, California,1 Department of Molecular Sciences, University of Tennessee Health Science Center, Memphis, Tennessee2
Received 24 March 2004/ Accepted 9 July 2004
Multiple species of the genus Pseudomonas are commonly found in the environment. This subset of the gamma-proteobacteria is metabolically and catabolically diverse, and several species are known to play important roles in elemental cycling. P. aeruginosa is commonly found in soil and water and is one of the best-characterized members of the genus. P. aeruginosa strain PAO1 has been extensively studied and is used as the model strain for Pseudomonas genetics (37, 39, 75). The complete genomic sequence of P. aeruginosa strain PAO1 was finished in 2000 (85). Several Pseudomonas plasmids have also been characterized, including some isolated by selective growth enrichment and others specifically constructed for the degradation of petroleum hydrocarbons and other toxic chemicals (1, 15, 30, 47, 92). Thus, there is significant potential for use of P. aeruginosa for bioremediation of contaminated environments.
In addition to its metabolic capability and environmental versatility, P. aeruginosa is also studied because of its ability to cause human disease and its resistance to many antibiotics (7, 12, 27, 52). P. aeruginosa is an opportunistic pathogen that causes infections at a number of sites, including the urinary tract, respiratory system, and central nervous system (7, 12). The success of P. aeruginosa as a pathogen can be attributed to its large number of virulence factors, including those that confer weakened host defenses, resistance to antibiotics, and production of extracellular enzymes and toxins (18, 45, 67, 73). The ongoing analysis of the factors that contribute to P. aeruginosa virulence holds promise for the development of better antibiotics and methods for treatment of such infections (26, 79, 80).
In this investigation, the genome of bacteriophage B3, a transposable phage of P. aeruginosa, was sequenced, analyzed, and compared to the genomic sequences of Mu (68) and multiple Mu-like prophages (10, 20, 32-34, 48, 70-72, 83).
lacZM15 Tn10 were obtained from Stratagene (La Jolla, Calif.).
Phage DNA isolation, cloning, and sequencing. Phage B3 was propagated for DNA extraction using the P. aeruginosa PAO1 host and a plate lysate method in which the phage-infected cells are suspended in a soft agar overlay (66). After the agar overlay was harvested and subjected to centrifugation to remove agar and cell debris, the supernatant was treated with DNase and RNase (Sigma, St. Louis, Mo.) and the phage particles were concentrated by precipitation with polyethylene glycol 8000 (Fisher Biotech) and resuspended in SM buffer (66). Phage DNA was then isolated by phenol extraction and ethanol precipitation (76). The phage DNA was sheared, treated with Bal 31 exonuclease (Promega, Madison, Wis.), and fractionated on a 0.7% SeaPlaque agarose gel (FMC Corporation, Newark, Del.), and fragments in the 1.6- to 3-kb range were purified using a Qiaex II gel extraction kit (QIAGEN, Valencia, Calif.) (76) and cloned into pPCR-Script Amp using a PCR-Script cloning kit (Stratagene). After E. coli XL1-Blue cells (Stratagene) were transformed and plated on Luria-Bertani agar containing ampicillin (50 µg/ml; Fisher Scientific, Hanover Park, Ill.), X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) (80 µg/ml; Fisher Scientific), and IPTG (isopropyl-β-D-thiogalactoside) (20 mM; Fisher Scientific), white colonies were purified, cultures were grown, and plasmid DNA was isolated using an UltraClean plasmid mini-prep kit (Mo Bio Laboratories, Solana Beach, Calif.). Plasmid inserts were sequenced from both ends using standard T3 and T7 primers and Dye Terminator chemistry (PE Applied Biosystems, Foster City, Calif.) on ABI Prism 373 and 377 automated DNA sequencers (PE Applied Biosystems).
For direct B3 genome sequencing, custom primers were designed from the assembled clone sequences using PrimerSelect software (DNASTAR, Inc., Madison, Wis.), and the sequence was determined using an ABI Prism 377 automated DNA sequencer (PE Applied Biosystems).
Sequence assembly and annotation. Genome analysis was performed using the Lasergene software suite (DNASTAR, Inc.). The processed sequences were aligned and edited using SeqMan II, and coding regions were predicted using GeneQuest. Where multiple open reading frames (ORFs) were possible, the presence of ribosome-binding sites, codon bias, and genetic overlap were used to help choose the final ORF presented. Searches of public databases for homologous DNA sequences were performed with BLASTN, whereas searches for protein sequence homologues were performed with BLASTP, BLASTX, PSI-BLAST, and TBLASTX (4, 5; http://www.ncbi.nlm.nih.gov/BLAST).
Variable host DNA sequences were identified by using a BLASTN search against the P. aeruginosa genome and the entire nonredundant GenBank sequence database. Searches for helix-turn-helix motifs were performed using GYM (69; http://www.cs.fiu.edu/~giri/bioinf/GYM2/prog.html), and searches for potential regulatory protein-binding sites were performed with BIOPROSPECTOR (54; http://robotics.stanford.edu/~xsliu/BioProspector/).
Nucleotide sequence accession number. The DNA sequence of the B3 genome was deposited in the GenBank data bank and has been assigned accession number AF232233.
A combined total of 655 plasmid and direct genome sequences were used to assemble the complete B3 genome. The average sequence, after end trimming and vector removal, was 578 bases long. Each nucleotide position was sequenced a minimum of two times, in general at least once from each strand. When a nucleotide position was sequenced twice from a single strand, both cloned DNA and direct genome sequencing were used. Only 1.8% (720 bp) of the complete genome was represented by single-strand coverage. On average, each nucleotide position was sequenced 9.85 times, with 336 sequences representing the upper DNA strand and 319 sequences covering the lower strand. The linear genome is composed of 38,439 bp (GenBank accession number AF232233) with a G+C content of 63.3%. This high G+C content correlates with that of P. aeruginosa strain PAO1, which displays a G+C content of 66.6% in most predicted coding regions (85).
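The average sequencing depth quoted above follows from the read count, the mean trimmed read length, and the genome length. The short sketch below reproduces that arithmetic; the function name is illustrative and is not part of any software used in the study.

```python
def mean_coverage(num_reads: int, mean_read_length: float, genome_length: int) -> float:
    """Average number of times each base was sequenced (total read bases / genome size)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * mean_read_length / genome_length


# Values taken from the text above.
UPPER_STRAND_READS = 336
LOWER_STRAND_READS = 319
NUM_READS = UPPER_STRAND_READS + LOWER_STRAND_READS  # 655 reads in total
MEAN_READ_LENGTH = 578   # bases, after end trimming and vector removal
GENOME_LENGTH = 38_439   # bp

coverage = mean_coverage(NUM_READS, MEAN_READ_LENGTH, GENOME_LENGTH)
print(f"Mean coverage: {coverage:.2f}x")  # Mean coverage: 9.85x
```

The two strand-specific read counts sum to the 655 reads used for assembly, and the computed depth agrees with the reported value of 9.85.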
Characterization of the ends of the B3 genome and attached host DNA. Assembly of multiple right-end sequences from plasmid clones revealed a position-specific divergence in sequence homology that defined the right end of the B3 genome. Since only one clone contained B3 left-end DNA, this approach could not be used to define the precise genome left end. Therefore, direct genome sequencing of B3 DNA isolated from phage particles was performed with a primer walk-out strategy analogous to that used previously for Mu (44). The resulting sequences showed a loss of specific base identification immediately 5' to terminal 5'-TG dinucleotides at both ends of the B3 genome (data not shown). We interpret this loss of base identity to be due to host DNA attached to the terminal 5'-TG dinucleotides, since all bases would be represented in nearly equal proportions if random host DNA segments were present at the ends of the B3 genome. These results confirmed previous findings for B3 (50, 74).
The presence of Pseudomonas DNA on both ends of the packaged B3 genome was confirmed by performing a search for nucleotide sequence homologues by BLASTN analysis (5) of B3 left- and right-end clones. Sequences of the putative host DNA in the one left-end clone and three right-end clones displayed nearly perfect homology with different regions of the PAO1 genome, as expected for a transposable phage. In all cases, host homology ceased precisely at the 5'-TG B3 end. Nine clones lacking B3 homology contained sequences homologous to different regions of the PAO1 genome (data not shown). These clones were most likely derived by cloning of host DNA from the ends of B3 phage particle DNA or from particles containing only transducing DNA.
BLASTN analysis of the B3 genome. The B3 genome was screened for nucleotide sequence homology with bacterial, bacteriophage, and viral DNA sequences using BLASTN analysis (5). Significant nucleotide homology was found for only two regions of the B3 genome. The first region reflected strong homology (expect value of 5e-24; 83% identity over 178 bp) between a region of bacteriophage φE125 (93) and ORF48 of B3. As will be discussed later, these homologous regions comprise about one-quarter of the dam DNA modification genes of both phages.
The second region consisted of the last 111 bp (bp 38328 to 38439) of the B3 genome, which displayed strong homology (expect value of 2e-41) with a sequence containing the PAO1 cyanide-insensitive oxidase A and B operon (cioAB; GenBank accession number Y10528; 19), but no homology with the sequenced PAO1 genome (85). Analysis of this 2,925-bp cioAB sequence by BLASTN against the PAO1 genome revealed strong homology with cioAB bp 1 to 2812 (corresponding to bp 4403831 to 4406643 in the PAO1 genome; 19, 85). However, the remaining 113 bases of the cioAB sequence shared no similarity with the PAO1 genome sequence but displayed nearly perfect homology with the right end of the B3 genome (106 of 113 nucleotides [nt]). This region of B3/cioAB homology lies 59 bp downstream of the termination codon of the cioB gene, with an orientation such that the right end of the B3 genome is closest to cioB. The simplest explanation is that there is a B3 or B3-like prophage or a cryptic B3-like sequence downstream of cioB in the PAO1 substrain (PAO6049) from which the cioAB sequence was obtained (19).
Assignment of probable B3 genes. Putative ORFs within the B3 genome were located using GeneQuest software (DNASTAR, Inc.) and validated by multiple methods. The predicted protein sequences were used for BLASTP analysis (5) to identify ORFs displaying significant homology with other bacteriophage or bacterial protein sequences. Segments of B3 nucleotide sequence were also subjected to BLASTX analysis (5) to screen for possible homologues in other B3 reading frames. Sequence upstream and in the early portion of each predicted ORF was examined manually for the presence of ribosome-binding site sequences (e.g., GGAGG) (57) within 4 to 14 bases upstream of potential start codons. The sequence was also examined for the close proximity of start codons to stop codons, as is frequently observed in prokaryotic operons (57). When multiple possible ORFs were present, codon usage was used to aid the choice of the most likely coding region. As observed for many organisms with high G+C DNA (16), most B3 ORFs show a strong preference for G+C in the first and third base positions, with the second position displaying a preference for A+T (Table 1). Interestingly, the 1.5-kb region from 10.0 to 11.5 kb on the B3 genome exhibits less striking preference and has a lower G+C content, 53.9% instead of 63.3%, which may reflect its relatively recent incorporation into the genome.
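The codon-position bias described above can be checked for any candidate ORF by tallying G+C separately at the first, second, and third base of each codon. The following is a minimal sketch of that tally, not the GeneQuest procedure used in this work; the function name and the toy sequence are hypothetical and are not taken from the B3 genome.

```python
def gc_by_codon_position(orf: str) -> tuple:
    """Return the G+C fraction at codon positions 1, 2, and 3 of an in-frame ORF."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3      # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence must contain at least one complete codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if orf[i] in "GC":
            counts[i % 3] += 1            # i % 3 is the position within the codon
    return tuple(c / n_codons for c in counts)


# Toy in-frame sequence (codons GAC, GTC, CAG), chosen only for illustration.
first, second, third = gc_by_codon_position("GACGTCCAG")
print(first, second, third)  # 1.0 0.0 1.0
```

An ORF from a high-G+C genome read in the correct frame would be expected to show high values at positions 1 and 3 and a lower value at position 2, whereas an incorrect frame shifts that pattern.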
TABLE 1. Proposed B3 coding regions
Detection of homologues and prediction of gene function. With the availability of genome sequences for many characterized bacteriophages, one fruitful approach for predicting protein function is the detection, among homologous proteins, of some whose functions are already known. To take advantage of this approach, we performed homology searches with BLASTP and PSI-BLAST algorithms (4, 5) against protein sequences in the protein database at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov). BLASTP and the first round of PSI-BLAST perform similar searches and were run with and without a low-complexity filter, respectively. In most cases, the homologues identified and scores obtained were similar (see Table S2 in the supplemental material). The second iteration of PSI-BLAST incorporates sequence information from the strong homologues identified in the first round for a second search, with the consequence that scores for the first homologues usually increase and new homologues are detected (4). The results showed that many B3 genes are similar to genes in Mu and Mu-like prophages (defined as prophages that encode transposase homologues and a substantial number of other Mu protein homologues in the early, lysis, and head gene regions) (see Table S2 in the supplemental material). The identity of the highest scoring homologue and the homologue most useful in predicting the gene's function are shown in Table 1, with their scores presented as expect values, which roughly indicate the likelihood of the observed match occurring at random. For B3 genes with homologues of known function, we were able to assign predicted functions, which are shown in Fig. 1.
FIG. 1. Comparison of the genome organization for phages B3 and Mu. The genomes of B3 and Mu are shown as thick lines and are drawn approximately to scale, with tick marks at 1-kb intervals. ORFs are shown as boxes; their positions were derived from GenBank accession numbers AF083977 and NC_000929 for the Mu genome and this work for B3 (GenBank accession number AF232233). The direction of transcription is indicated by an arrow; in addition, genes transcribed rightward are shown above the line, and those transcribed leftward are positioned below the line. Relevant Mu genes are identified by their gene names or sequential genome gp numbers, and functions if known. The predicted B3 gene functions shown were deduced from the known functions of homologous genes in Mu or other phages; for B3 genes with Mu homologues, the identity of the homologous Mu gene is also given.
The untranslated region between ORF15 and ORF16 contains a typical bacterial promoter, with a -35 hexamer (TTGCCA) separated by 16 nt from a -10 hexamer (TTTGTT) (Fig. 2). The high similarity of this promoter to consensus promoter sequences (-35 TTGACA, 16- to 18-nt spacer, and -10 TATAAT [29]) suggests that it will be recognized directly by the host RNA polymerase and efficiently drive leftward transcription of the early operon.
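The comparison with the consensus promoter can be made explicit by counting identical positions in each hexamer and checking the spacer length against the 16- to 18-nt range. The sketch below uses a simple position-by-position identity count, which is an assumption made for illustration and not a model of RNA polymerase recognition; the function names are hypothetical.

```python
CONSENSUS_MINUS_35 = "TTGACA"
CONSENSUS_MINUS_10 = "TATAAT"
CONSENSUS_SPACER_RANGE = range(16, 19)  # 16 to 18 nt inclusive


def matches_to_consensus(hexamer: str, consensus: str) -> int:
    """Count positions at which a hexamer is identical to the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must have the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus.upper()))


def spacer_is_consensus(spacer_length: int) -> bool:
    """True if the spacer length falls within the 16- to 18-nt consensus range."""
    return spacer_length in CONSENSUS_SPACER_RANGE


# Hexamers and spacer reported for the ORF15-ORF16 intergenic promoter.
print(matches_to_consensus("TTGCCA", CONSENSUS_MINUS_35))  # 5
print(matches_to_consensus("TTTGTT", CONSENSUS_MINUS_10))  # 3
print(spacer_is_consensus(16))                             # True
```

By the same check, a 20-nt spacer falls outside the consensus range.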
FIG. 2. Potential regulatory elements in untranslated regions. The sequences for both DNA strands in relevant predicted untranslated regions are shown. The beginnings or ends of the flanking ORFs are shown by the large brackets; the ORFs continue in the directions of the broken lines. Potential -10 and -35 hexamers are indicated by bullets, which are pointed in the direction of transcription.
One would expect these two untranslated regions to contain binding sites for a B3 repressor protein that would keep early transcription turned off in a lysogen. Searches with BIOPROSPECTOR (54) identified multiple related sequences that overlapped all three promoters and, thus, might serve as repressor-binding sites, but we were unable to develop a single convincing consensus sequence from them. Nevertheless, there are two candidate ORFs that could encode the B3 immunity repressor protein, ORF17 and ORF18. The predicted protein encoded by ORF18 exhibits amino acid sequence similarity to the phage P22 c2 repressor and a long list of predicted phage repressor proteins (Table 1; see Table S2 in the supplemental material). The putative ORF17 protein, while not homologous to known regulators, also contains a potential helix-turn-helix DNA-binding motif (data not shown). The genes for both proteins lie close to the ORF17-ORF18 and ORF15-ORF16 intergenic regions where the immunity repressor is predicted to bind, a location consistent with the typically modular organization of phage genes and regulatory elements (8, 35). They are also just downstream of bacterial promoters and, thus, would be transcribed early after infection when repressor protein synthesis is needed to establish lysogeny. The remaining ORFs in this region, ORF13, -15, -16, -20, -21, and -22 have no currently identifiable function. On the basis of their locations in predicted early transcripts, the proteins produced by these ORFs could function as (i) accessory regulators of repressor synthesis, (ii) positive regulators for subsequent phases of transcription, (iii) auxiliary factors regulating or involved in replication and transposition, and (iv) factors stimulating or inhibiting host functions affecting lysogenic or lytic development.
(ii) Middle and late genes. It is well-known that phages and other viruses typically possess mechanisms that confer sequential expression of their genes. For example, genes for genome replication are usually expressed early in the lytic cycle, and genes for phage particle morphogenesis and cell lysis are expressed late. Phage Mu transcription occurs in three phases: early, middle, and late (60). Mu middle and late transcripts are activated by the Mor and C proteins, respectively (59, 63).
The identification of typical bacterial promoter sequences in the ORF15-ORF16 and ORF17-ORF18 intergenic regions makes it likely that transcription of ORF1 through ORF15, ORF16 and ORF17, and ORF18 through ORF23 occurs early during the lytic cycle and is catalyzed by the host RNA polymerase. Since the leftward promoter for transcription of ORF16 and ORF17 may be inefficient due to its long 20-nt spacer, it is possible that a positive regulator produced from one of these transcripts might stimulate transcription from that region, making ORF16 and ORF17 the B3 middle genes. It is equally possible that ORF16 and ORF17 are simply early genes and that B3 has only two phases of transcription, early and late.
The first identifiable B3 late genes are ORF24, ORF25, and ORF27. The predicted ORF25 protein has strong amino acid sequence similarity to more than 50 soluble, predominantly bacterial, lytic transglycosylases (see Table S2 in the supplemental material); ORF25 therefore likely encodes a B3 endolysin that participates in host cell lysis. With the recent release of the BcepMu sequence (GenBank accession number AY539836), we can also identify nearby ORF24 and ORF27, which encode potential membrane or membrane-anchored proteins, as homologues of the predicted BcepMu holin and Rz proteins, which are predicted to play roles in host cell lysis analogous to those of the well-studied auxiliary lysis proteins of phage lambda (6).
Most of the ORFs from ORF28 through ORF38 are the head genes of B3; they display significant amino acid sequence similarity and gene order with the known head genes of Mu (22, 25, 68) (Table 1 and Fig. 1; see Table S2 in the supplemental material). In particular, ORF30 and ORF31 are homologous to the small and large subunits of the Mu terminase (68, 82) and are followed by the portal and head assembly genes ORF32 and ORF33, respectively. In Mu, the head assembly gene is followed by the Mu G gene (gp31), and ORF35 of B3 is a Mu G homologue. Although the precise role of Mu G protein is not known, it is proposed to be involved in head-tail joining (25). In many tailed phages, including Mu, the overlapping genes for the protease and scaffolding proteins and the gene encoding the major capsid protein follow the above genes (68). On the basis of similarity to BcepMu proteins (86), we can also identify ORF36 as the gene encoding the B3 protease and nested scaffolding protein (designated ORF36Z) and ORF38 as encoding the B3 major head protein, the capsid protein. Thus, the order of B3 head genes roughly parallels that in Mu and other tailed phages.
In many phage genomes, a cluster of tail genes follows the head genes, with the tail-length tape measure gene being preceded by the tail sheath and tail tube genes (14, 68). Phage B3 has the two-gene DNA modification gene cluster located immediately upstream of the predicted B3 tail-length tape measure gene, ORF49; however, ORF43 exhibits some similarity to the tail sheath protein of phage Sti3 (Table 1; see Table S2 in the supplemental material), and other ORF43 homologues are closely related to other phage tail sheath proteins (data not shown). With the exception of ORF43 and ORF49 (and possibly ORF40, a homologue of Mu gp36 of unknown function), there is no significant homology to any characterized phage tail genes. Since B3 has a flexible noncontractile tail morphologically similar to that of phages lambda and A118 (46, 55, 84), one might expect approximately 10 B3 genes to be involved in tail morphogenesis. The right half of the B3 genome contains 15 ORFs of as yet unidentified function (ORF40 through -42, ORF44 through -46, and ORF50 through -58). Excluding the very small ORFs (ORF45 and ORF46) and ORF58, which had no detectable homologue, there are 12 remaining B3 ORFs, and we predict that most of these are involved in B3 tail morphogenesis. After completion of the BLAST analysis, we discovered that genes at the right end of the phage D3112 genome encode proteins with very high similarity to those from B3 ORF41 through ORF57, with the notable exception of ORF45 through ORF48, which include the B3 DNA modification gene cluster. These D3112 genes also have the same order as their B3 homologues (91).
Examining the sequences in untranslated regions and very short ORFs between ORF22 and ORF57 both manually and with the program BIOPROSPECTOR (54), we failed to detect any consistent sequence elements that might serve as promoters or binding sites for regulators of late transcription. We were also unable to detect elements which could form a potential RNA stem-loop structure, followed by a string of T residues, features characteristic of Rho-independent terminators (11). Thus, insight into possible regulatory sequences and mechanisms must await experimental determination of transcript ends within this late gene region. Interestingly, there is a poor, but recognizable, promoter just upstream of ORF58 (Fig. 2), raising the possibility that it is a moron, an autonomous genetic module containing a protein-coding region flanked by a promoter and terminator (41).
(iii) DNA modification gene cluster. The Mu DNA modification gene cluster contains two genes, com and mom, which are located at the extreme right end of the genome (Fig. 1). The Mom protein modifies about 15% of the adenine residues in Mu DNA to acetamidoadenine, protecting it from cleavage by a variety of restriction enzymes (31, 42). In the absence of Com, translation of mom mRNA is inefficient due to formation of a stem-loop structure that occludes the ribosome-binding site and start codon for Mom translation (31, 42). The Mu Com protein is a zinc finger protein that binds to an adjacent stem-loop structure, destabilizing the mom RNA stem-loop and allowing translation of Mom (31, 42).
The B3 DNA modification gene cluster also contains two genes: ORF47, which encodes a Com homologue containing the four highly conserved zinc finger cysteines, and ORF48 which encodes a DNA adenine methyltransferase homologue (see Table S2 in the supplemental material). Folding of RNA sequences containing the translation initiation region of ORF48 using the Zuker (94) Mfold web server (http://www.bioinfo.rpi.edu/applications/mfold) produced a variety of structures depending on the length of sequence used. In these structures, the ORF48 ribosome-binding site bases GGAG were partially or completely sequestered in double-stranded regions (data not shown), leaving open the possibility that translation of ORF48 is similarly regulated.
Global organization of the B3 genome: functional coding regions. Although there is significant amino acid sequence similarity between proteins encoded by B3 and Mu, there are several differences in genome organization, with the most dramatic being the opposite orientations of their early operons (Fig. 1). In B3, the leftmost 9 kb containing the proposed primary early operon is transcribed leftward, whereas in Mu, the early genes are transcribed rightward (Fig. 1).
The proposed B3 late genes are transcribed rightward, as are the Mu late genes, and the order of B3 head genes parallels that of the Mu head genes (Fig. 1). One notable difference between the B3 and Mu genome organization is the location of the B3 com-dam DNA modification gene cluster within the tail gene region about 12 kb from the genome right end; the corresponding com-mom gene cluster of Mu is located at the extreme right end of the Mu genome (Fig. 1).
B3 evolution. Homologues of B3 genes were found in a large array of phages, prophages, and phage-related elements (see Table S2 in the supplemental material). Table 2 lists the names of the phages, prophages, and phage-related elements with significant homology to B3 genes, their hosts, and genome locations and reveals a dramatic increase in the number of Mu-like transposable phage family relatives since the annotation of the Mu genome in 2002 (68). The hosts containing these prophages and elements, as well as their ecological niches, are quite diverse, ranging from nonpathogenic enteric bacteria, such as E. coli K-12 (87), to human or animal pathogens, including E. coli O157:H7 (32), Neisseria meningitidis (48, 71, 88), Salmonella enterica (70), Bordetella bronchiseptica (72), Vibrio cholerae (33), Haemophilus influenzae (20), Burkholderia cenocepacia (86), and Haemophilus ducreyi (GenBank accession number NC_002940), to predominantly nonpathogenic soil organisms, such as Shewanella oneidensis (34) and Chromobacterium violaceum (10), to plant pathogens, such as Xylella fastidiosa (83). This diversity strongly supports the conclusion that tailed phages are genetic mosaics derived by multiple stepwise recombinational exchanges that occur within a single, large gene pool for tailed phages (14, 35, 36).
TABLE 2. Phages, prophages, and phage-related elements with homology to phage B3a
FIG. 3. Diagrammatic representation of the degree of relatedness for B3 genes with homologues in other Mu-like phages and prophages. The B3 map, ORFs, and predicted protein functions are shown as in Fig. 1, except that all ORFs are shown above the kilobase line. Phages and prophages with homology to at least six B3 genes are included, and their names are listed on the left. Horizontal lines drawn below a B3 gene indicate the presence of a homologue in the corresponding Mu-like phage or prophage. The thickness of the line indicates the degree of similarity, achieved by dividing the BLAST expect values in Table S2 in the supplemental material into five groups, with the thickest line representing the greatest similarity (group I) and decreasing thickness indicating decreasing similarity. The expect value boundaries for the five groups are as follows: e-50 or better for group I, e-49 to e-15 for group II, e-14 to e-4 for group III, 0.001 to 5.0 for group IV, and >5.0 for group V on PSI-BLAST iteration 1 but 1.0 or better on PSI-BLAST iteration 2.
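The five-group binning described in the legend can be sketched as a small classifier. The numeric cutoffs below are one reading of the stated boundaries, in which "e-50 or better" is taken to mean any value below 1e-49 and so on by reported exponent; this interpretation, the function name, and the example values are assumptions for illustration and are not taken from Table S2.

```python
from typing import Optional


def similarity_group(e_iter1: float, e_iter2: Optional[float] = None) -> Optional[str]:
    """Assign a Fig. 3 similarity group (I to V) from BLAST expect values.

    e_iter1 is the expect value from PSI-BLAST iteration 1;
    e_iter2 is the expect value from iteration 2, if available.
    Returns None when the match does not qualify for any group.
    """
    if e_iter1 < 0:
        raise ValueError("expect values cannot be negative")
    if e_iter1 < 1e-49:        # exponent of -50 or better
        return "I"
    if e_iter1 < 1e-14:        # exponents -49 to -15
        return "II"
    if e_iter1 < 1e-3:         # exponents -14 to -4
        return "III"
    if e_iter1 <= 5.0:         # 0.001 to 5.0
        return "IV"
    if e_iter2 is not None and e_iter2 <= 1.0:
        return "V"             # weak on iteration 1 but 1.0 or better on iteration 2
    return None


# Hypothetical expect values, chosen only to exercise each branch.
for e1, e2 in [(1e-60, None), (3e-20, None), (2e-6, None), (0.5, None), (7.0, 0.2), (7.0, None)]:
    print(e1, e2, similarity_group(e1, e2))
```

Smaller expect values indicate stronger similarity, so group I corresponds to the thickest lines in the figure.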
B3 transposase subunit evolution. The BLAST and PSI-BLAST searches with ORF12 identified a number of homologues in genomic Mu-like prophages and multiple annotated transposase proteins, including the primary transposase subunits of RadMu and Tn552 (Table 1; see also Table S2 in the supplemental material). The Tn552 transposase TnpA, like the Mu A protein, is a member of the DDE superfamily of transposases that include retroviral integrases and the transposases of the IS3 family of bacterial insertion sequences (28, 53). This superfamily consists of a relatively heterogeneous group of proteins that share at least a ~200-residue catalytic core and perform closely related strand transfer reactions (28, 53). Despite the lack of similarity to the Mu A transposase subunit, these homologies clearly identified ORF12 as encoding the primary transposase subunit for B3. Its location, in the early operon, also parallels that of the Mu A transposase (Fig. 1).
Curiously, similar searches with ORF11 identified four different groups of proteins. Several of those with the greatest similarity were annotated as homologues of A subunits of the bacterial type II general secretion pathway (GSP) used for the second step of secretion of multiple hydrolytic enzymes and toxins from the periplasm to the extracellular environment (77, 78). The second group contained true GSPA proteins. The third group contained the MshM proteins of the bacterial type IV MSHA (mannose-sensitive hemagglutination antigen) secretion pathway used for synthesis of type IV pili and for transfer of DNA during conjugation and transfer of T-DNA into plant cells by Agrobacterium tumefaciens (17, 56, 61). Many of the type IV secretion pathway proteins are closely related to those of the type II pathway, and MshM shares homology with the GSPA component in particular (56). The fourth group, with considerably lower similarity, contained a number of transposase subunits similar to and including the Mu B and Tn552 TnpB transposase subunits. Strikingly, the proteins with the greatest similarity to B3 ORF 11 were encoded next to the Mu A protein homologue within BcepMu (86) and the closely related Mu-like prophages VioMu, DucMu (DucMu1, -2, and -3), BorMu, and VibMu, a position analogous to that of the B gene in Mu (Fig. 1). In the case of BcepMu, the B3 ORF11 homologue (BcepMu gp8) has been annotated as ExeA and proposed to serve as a potential virulence factor for pathogenesis (86). In contrast, the similarities and locations described above lead us to propose that ORF11 and its phage and prophage homologues encode the second transposase subunit analogous to Mu B, an ATPase subunit that brings the target DNA to the transposase complex (28, 64).
The observed similarity of the transposase subunits to ATP-binding subunits of the type II and type IV secretion systems is real. Beginning a PSI-BLAST homology search with each of the two best-characterized proteins, Mu B (64) and the Aeromonas hydrophila type II secretion protein ExeA (also called GSPA) (40, 81), led to recovery of the same three groups of homologous proteins in the second iteration (data not shown). Furthermore, the conserved ~150-amino-acid ATPase domain of the ExeA subunit (COG 3267) is located within the ~300-amino-acid conserved transposase ATPase subunit domain (COG 2842) (58; http://www.ncbi.nlm.nih.gov; data not shown). Whereas similarity of the 500- to 700-amino-acid GSPA secretion proteins to each other usually extended over the entire length of both proteins, similarity of the smaller ~300-amino-acid type IV proteins and the 300- to 400-amino-acid transposase subunits was generally limited to the N-terminal 250 to 300 amino acids of ExeA. This region of ExeA contains the three motifs characteristic of ATPases, the Walker A motif, the Mg2+-binding site, and the Walker B motif (81). Thus, it seems likely that it is the related ATPase domains in these groups of proteins that are responsible for their detection as homologues.
Clearly, the BLAST scores show that ORF11 of B3 and its prophage homologues are much more similar to the ATPase domains of the secretion genes than to the Mu B-like transposase subunits (Table 1; see also Table S2 in the supplemental material); yet, like the transposase subunits, they are only ~400 amino acids long. Perhaps the secretion proteins and ORF11 homologues evolved from a common ATPase ancestor, either by deleting the C terminus of a long protein precursor to form the transposase subunit or by adding 100 to 300 amino acids to the ATPase domain to form the secretion protein, with the added region playing a secretion-specific role. Nevertheless, at this point, we cannot rule out the possibility that ORF11 and its prophage homologues participate in both transposition and protein secretion. A test of their ability to complement an exeA or mshM mutant is clearly warranted, as is a test for transposition of an ORF11 mutant phage.
B3 DNA modification gene evolution.
The B3 DNA adenine methylase gene, ORF48, was the only B3 gene with significant nucleotide sequence similarity to any other phage or bacterial gene; it shares similarity with gene 27, the DNA adenine methylase gene, of the temperate phage E125 of Burkholderia thailandensis (93). Not surprisingly, the amino acid sequences of the ORF48 and gene 27 proteins exhibited extremely high similarity over the entire length of both proteins. Very high protein similarity was also observed for the proteins annotated as DNA modification methylases of the BorMu, VioMu, and phage 03 prophages of Bordetella bronchiseptica, Chromobacterium violaceum, and Pseudomonas syringae (see Table S2 in the supplemental material). There were about 70 additional ORF48 homologues; most were bacterial proteins and had considerably poorer scores. This group contained multiple methylase genes associated with restriction-modification systems, e.g., MboI and DpnII, and multiple cytosine-specific methyltransferases as well. Thus, it remains to be seen whether ORF48 and its prophage homologues perform an adenine methylation or some other type of modification and whether that modification provides a survival advantage to the phage as Mom does for Mu (31, 42).
Curiously, B3 is the only phage in Fig. 3 that encodes both Com and Dam homologues. Phages VioMu and BorMu encode Dam homologues, but no Com homologue (Fig. 3). Phages SP18, FluMu, and Mu encode Com homologues but no Dam homologue (Fig. 3). A BLASTP search with the Mu Mom protein sequence revealed that SP18 encodes a Mom DNA modification protein just downstream of its com gene (data not shown), but as observed previously by Morgan et al. (68), FluMu encodes a different non-Mom-like protein at that position, as does Pnm1 (68). A BLASTP search with the FluMu protein sequence identified homologues in Pnm1 and Pnm2 as well as DucMu1, DucMu2, and DucMu3 (data not shown). The lack of similarity between these new genes and Mom or Dam raises the possibility that these genes may perform a DNA modification different from both methylation and acetamidoadenine modification. In Mu, the Com protein is needed to prevent translation of Mom until late in the lytic cycle, because high-level Mom expression is lethal (31, 43). The absence of intact Com homologues in these other phages (Pnm1 contains a mutant com gene [68]) suggests that expression of their DNA modification genes may not be lethal, thereby making Com unnecessary.
In Mu and all but one of the above phages, the DNA modification gene(s) is located very close to the right end of the phage genome, whereas in DucMu1 and B3, the DNA modification gene(s) is located internally, approximately 12 kb from the right end (data not shown). Thus, there are striking differences in the gene number, modification protein sequence, and gene locations in this group of closely related phages.
Summary. The sequencing of transposable phage B3 particle DNA revealed that B3 has a linear genome, 38,439 bp long, with variable host DNA fragments attached to the genome ends. Comparison of the B3 genetic map to that of Mu revealed evidence of multiple genetic rearrangements and substitutions. The results from homology searches with the 59 predicted B3 ORFs allowed us to predict potential functions for almost half of the genes, defining distinct transposition, regulation, lysis, head, and tail regions. Homologous proteins were found in multiple related phages with a diverse range of bacterial hosts, raising the possibility that genetic manipulations based on transposable phage technology (23, 90) can be applied to this broad spectrum of pathogenic and nonpathogenic bacteria. The sequence also provides much of the essential information needed for the development of B3 vectors for use in bioremediation.
We thank Alice Hamrick for assistance with G+C content analysis of the alternate ORFs and Christy Houde for assistance with the primer walk-out sequencing.
Supplemental material for this article may be found at http://jb.asm.org/.