Journal of Bacteriology, January 2004, p. 535-542, Vol. 186, No. 2
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.2.535-542.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Institut für Mikrobiologie und Genetik,1 Laboratorium für Genomanalyse, Universität Göttingen, Göttingen, Germany,3 LBMPS, Université de Genève, 1292 Chambésy, Switzerland2
Received 22 July 2003/ Accepted 8 October 2003
40% of the pNGR234b genes are not strain specific and were probably acquired from a wide variety of other microbes. The presence of 26 ORFs coding for transposases and site-specific integrases supports this contention. Surprisingly, several genes involved in the degradation of aromatic carbon sources and genes coding for a type IV pilus were also found.
Rhizobium sp. strain NGR234 belongs to the α-proteobacteria and is able to establish nitrogen-fixing symbiosis with many different legumes. Despite extensive study, the molecular mechanisms behind this broad host range are not fully apparent (6). Although R. meliloti has a very limited host range (5), it is phylogenetically close to NGR234 and the organization of both genomes is similar (14a, 31a). In both cases, the genome comprises three replicons (14a). Most symbiotic genes are carried on SymA plasmids of 0.54 Mb in strain NGR234 (17) and 1.35 Mb in R. meliloti (1). Both bacteria also possess a second group of plasmids, the so-called exo- or megaplasmids (pSymB). pNGR234b is estimated to be 2.2 Mb (31a), and pSymB of R. meliloti is 1.68 Mb (14). Chromosomes in both bacteria are about the same size (3.5 Mb in NGR234, 3.34 Mb in R. meliloti) (7, 31a). Snapshot sequencing also suggested that housekeeping and many metabolic genes are similar (48). However, sequencing data also suggested that NGR234 differs from R. meliloti in its gene content. We sequenced two large contigs of pNGR234b, one of which contains loci involved in extracellular polysaccharide (EPS) synthesis and, thus, in fine-tuning symbiosis. Altogether 575 open reading frames (ORFs) were identified, of which 222 appear to be organized into clusters of more than four genes. Comparative analyses indicated that NGR234 may have acquired large parts of the genetic content of pNGR234b from other soil- and plant-associated microbes.
Manipulation of DNA and construction of an ordered cosmid library. On the basis of hybridization and sequence data, a minimal set of cosmids of the canonical ordered library of the NGR234 genome (37a) was selected for further analysis. Selected cosmids were partially digested with Sau3A. Fragments of 0.5 to 3.5 kb were isolated after electrophoretic separation on agarose gels, cloned into pTZ19R (Amersham, Essex, United Kingdom), and sequenced with standard primers. Sequencing was performed by using dye terminator technology on a model 377 sequencer (Applied Biosystems, Foster City, Calif.) or on capillary sequencers from Amersham. The GC-Phrap software package (http://www.jgi.doe.gov/Docs/JGI_Seq_Quality.html#_SeqQ.I) was used to assemble the sequences. Editing and finishing were facilitated by the Staden software package (http://www.mrc-lmb.cam.ac.uk/pubseq/staden_home.html). Sequencing of PCR-generated fragments was used to close single- and double-stranded gaps. ORFs were initially identified by the programs Glimmer (http://www.tigr.org/software/glimmer/) and GeneMarkS (4). The cutoff limit for ORFs without database homologues was 150 bp. Predicted ORFs and intergenic regions were used to interrogate nonredundant protein databases with Blast programs via the website http://www.ncbi.nlm.nih.gov/blast. ORFs were entered into the ERGO Integrated Genomics, Inc. (Chicago, Ill.) bioinformatics suite for genome annotation and metabolic reconstruction. The predicted ORFs were subjected to two initial rounds of annotation (one automatic and one manual). Proteins were categorized with a modified Riley classification (40). Analysis of the sequenced region resulted in the identification of 575 ORFs, which were arbitrarily assigned identification numbers with the specific prefix ngr. The names of ORFs of pNGR234a that are also found on pNGR234b are followed by a superscript b; thus, the cysteine synthase cysM (y4xP) has a homologue y4xPb on pNGR234b and so on.
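The 150-bp cutoff for ORFs without database homologues can be illustrated with a minimal sketch. This is not the Glimmer or GeneMarkS algorithm, both of which use statistical gene models; it is a naive scan for ATG-to-stop reading frames on both strands, and the function names and the choice to count the stop codon in the length are assumptions made only for illustration.

```python
# Naive ORF scan (illustrative only; NOT Glimmer/GeneMarkS).
# Assumption: an ORF runs from the first ATG in a frame to the next
# in-frame stop codon, and its length includes the stop codon.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=150):
    """Return (strand, start, end, length) tuples for ORFs >= min_len bp.

    Coordinates are 0-based and end-exclusive on the strand that was scanned.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF
                elif start is not None and codon in STOPS:
                    length = i + 3 - start
                    if length >= min_len:  # apply the length cutoff
                        orfs.append((strand, start, i + 3, length))
                    start = None  # look for the next start codon
    return orfs
```

With the default `min_len=150`, a 153-bp ATG-to-stop frame is reported while a 9-bp one is discarded; lowering `min_len` shows how many short candidate frames the cutoff suppresses.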
Nucleotide sequence accession numbers. The nucleotide sequences were deposited in GenBank under accession numbers AY316746 and AY316747.
FIG. 1. Physical organization of the ORFs of pNGR234b. Coordinates are given in kilobases. Putative genes and ORFs are colored (grey-boxed area) according to putative functions. ORFs were named genes when BLAST-P searches of the National Center for Biotechnology Information database indicated an identity of <E-80. Several insertion sequence elements are indicated as regions boxed with dotted lines. Angled arrows indicate the locations of possible sigma-54-dependent promoters as well as the location of two possible ROSE elements. Conserved clusters of genes are shown as double-headed and horizontal arrows. Conserved clusters (≥3 ORFs) present in identical or highly similar orders in other bacteria are indicated. Colors show the highest similarity to the bacterium indicated in the boxed area. Only the microbe that showed the highest similarity and most conserved gene order over the entire range of the cluster is indicated.
TABLE 1. Genes and ORFs identified on the sequenced contigs of pNGR234ba
Other interesting features included 14 ORFs that play roles in protection responses. Among these are three genes encoding proteins involved in resistance to acriflavine (ngr226, ngr288, and ngr289), genes involved in detoxification of other small molecules (e.g., ngr065, ngr334, two copies of the multidrug resistance protein B), CopC (copper resistance protein C), and Ngr174, a macrolide glycosyltransferase. ngr153 encodes a homoserine lactone efflux protein, which belongs to the family of the RhtB proteins, that can confer resistance to elevated levels of exogenous L-threonine, L-homoserine, and analogues (53). Homologues of this protein are also found in A. tumefaciens and B. japonicum as well as several other gram-negative bacteria (e.g., Salmonella enterica serovar Typhi, E. coli O157:H7 Sakai, and Brucella melitensis) but not in R. meliloti.
Catabolic functions.
A significant number of genes encoding proteins that could be involved in oxidative metabolism were identified (Table 1). Among them are dehydrogenases, oxidoreductases, and dehydratases. Several predicted sugar kinase genes were also found. A number of ORFs were identified that encode proteins involved in the degradation of complex or aromatic carbon sources, including a protocatechuate 3,4-dioxygenase (ngr051), an opine oxidase (ngr333 and ngr334, α- and β-subunit), and a hydroxyquinol 1,2-dioxygenase (ngr391). Other proteins possibly involved in the degradation of complex carbon compounds available to NGR234 include four myoinositol 2-dehydrogenases (ngr233, ngr250, ngr251, and ngr252) and two other proteins linked to the degradation of myoinositol (IolD and IolE). Also, one ORF encodes an octopine dehydrogenase subunit B (ngr446), two ORFs encode agmatinases (ngr257 and ngr540), and another encodes a metapyrocatechase (ngr571). The latter is involved in the degradation of naphthalene in Bacillus, Pseudomonas, and Rhodococcus via the meta-cleavage pathway (29, 33). A homologue of the trihydroxytoluene oxygenase, which is involved in the catabolism of 2,4-dinitrotoluene, is encoded by ORF ngr570, and a protein involved in nitrilotriacetate catabolism (ngr138) (50) was found. Both ngr570 and ngr138 are part of a conserved cluster in B. japonicum, several Brucella species, Burkholderia pseudomallei, and Sphingomonas aromaticivorans. Obviously, pNGR234b is important in the catabolism of a remarkably wide spectrum of carbon and energy sources, including loci that are involved in the degradation of aromatic compounds (i.e., ngr570 and ngr571) but are not found in the R. meliloti genome.
Regulatory elements. Another 58 ORFs encode possible regulatory proteins: two encode possible polymerase sigma factors (sigJ and sigB), and the rest mostly belong to the LysR, GntR, and TetR families. Two ORFs (ngr159 and ngr160) encode possible homologues of the two-component regulators NodV-NodW or NwsA-NwsB (Fig. 1). The nodVW genes of B. japonicum are involved in activation of nod gene expression in response to plant-produced isoflavones (20a, 30).
Chaperones and cofactor biosynthesis. Two copies of the heat shock proteins GroES and GroEL, as well as several other ORFs encoding small heat shock proteins, two of which belong to the Hsp20 family (ngr309 and ngr311), were found (Fig. 1 and Table 1). All are required for rapid adaptation to heat stress (34). Although their transcription is commonly activated from sigma-70 promoters, it is also negatively regulated by cis-acting elements (ROSE [repression of heat-shock gene expression]) (32, 35). Two possible ROSE elements upstream of the groES genes are indicated in Fig. 1.
Other genes encode proteins involved in the biosynthesis of cofactors and vitamins. Examples include proteins involved in the biosynthesis of pyrroloquinoline quinone (PqqA to -E), pyridoxal phosphate (PdxA), a riboflavin-specific deaminase (ngr151), and the thiamine biosynthesis protein ThiD. Despite their well-known roles as classical cofactors, several of these compounds might be involved in promoting colonization of roots (44, 51). Two proteins involved in the biosynthesis of amino acids (AroC, the shikimate 5-dehydrogenase, and the possible cysteine synthase y4xPb [CysM] homologue) were identified; both, however, may have chromosomal homologues. Nevertheless, the observation that pNGR234b encodes pathways involved in cofactor and amino acid biosynthesis indicates an essential role in cellular processes. The suggestion that pNGR234b might be essential was further supported by the discovery of a gene involved in the biosynthesis of the 30S ribosomal protein S21 A (RbsU). Interestingly, in both NGR234 and R. meliloti, the rbsU gene is close to the cold shock gene cspA. In R. meliloti, however, the corresponding homologues of rbsU and cspA are found on the bacterial chromosome and are transcribed as one operon (7, 36). To verify that pNGR234b is essential for NGR234 cell processes and metabolism, further tests in which pNGR234b is cured from the host strain are necessary, as is confirmation that other genes essential for cell division are present on this replicon (41).
Macromolecular metabolism.
When an imbalance between carbon and nitrogen, phosphorus, or biotin occurs, many rhizobia sequester the excess carbon as polyhydroxybutyrate (13, 25, 45). Proteins involved in the degradation of polyhydroxybutyrate, BdhA (poly-3-hydroxybutyrate-dehydrogenase) and PhaZ (poly-3-hydroxybutyrate-depolymerase), were found. Degradation is initiated by the action of a polyhydroxybutyrate depolymerase that releases the monomer 3-hydroxybutyrate. A possible endoglucanase (ngr054) and two putative cellulose synthases (ngr055 and ngr066) are encoded on pNGR234b. Several putative proteins involved in cell wall biosynthesis as well as the degradation of polygalacturonate (ngr072) were discovered. The latter belongs to family 28 of the glycosyl hydrolases (23), which cleave 1,4-α-D-galactosiduronic linkages in pectate and other galacturonans. Pectinases and polygalacturonases are part of the armory of plant pathogens, including Erwinia chrysanthemi, where the expression of pectinases is directly linked to pathogenicity (24, 26). No polygalacturonases have been reported in the genome of R. meliloti, whereas A. tumefaciens (ORF ATU4560) and B. japonicum USDA110 (2a) possess putative polygalacturonases. The identification of polygalacturonases in NGR234 is thus intriguing and may suggest a role during the infection process.
Transposases and integrases. Twenty-six ORFs encoding integrases and transposases were found. Their presence is probably linked to the high frequency of DNA rearrangements found in soil bacteria (21, 22). Short repeats (the largest is 140 bp) (positions are indicated in Fig. 1) are interspersed between the integrases and transposases. Interestingly, the DNA fragment framed by both repeats has a significantly lower G+C content (60.4%) than the remaining part of contig1 (61.7%). These data suggest that lateral transfer of genetic material has occurred.
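A G+C comparison of this kind can be sketched as follows. This is a hypothetical illustration, not the procedure used in the study: the function names, the exclusion of ambiguous bases, and the sliding-window helper are all assumptions.

```python
# Illustrative G+C calculation (assumed helper names; ambiguous bases such
# as N are excluded from the denominator by assumption).
def gc_content(seq):
    """Percent G+C among the unambiguous bases (A, C, G, T) of seq."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def windowed_gc(seq, window, step):
    """G+C percent of each full window, as a list of (start, percent)."""
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]
```

Applying `gc_content` to the fragment between the two repeats and to the rest of the contig gives the two percentages being compared; `windowed_gc` would show where along the contig the composition shifts. Whether a difference such as 60.4% versus 61.7% is significant depends on fragment length and requires a separate statistical test.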
ORFs without assigned functions. Finally, 128 ORFs were identified for which similarities were observed but functions could not be assigned, and 44 ORFs had no known homologues in the databases.
Comparative analyses of loci found in other plant-associated species.
Possible horizontal transfer of all the identified ORFs was examined by comparing pNGR234b with the genomes of other plant-associated microbes. All potential operons and gene clusters (≥3 ORFs) were compared with the genes of pNGR234a (17) and the complete genomes of the following members of the Rhizobiaceae: A. tumefaciens (20, 49), B. japonicum USDA110 (28), M. loti MAFF303099 (27), and R. meliloti 1021 (18). Available information on the plant pathogen Erwinia carotovora (http://www.sanger.ac.uk/Projects/E_carotovora/) was also included in the analysis.
Initial analyses indicate that at least 176 ORFs are part of paralogous clusters and 291 ORFs are part of 62 chromosomal clusters. Use of the ERGO suite to examine a number of these loci showed that 29 conserved gene clusters (comprising 222 ORFs) have similar or identical gene orders to those found in one or several plant-associated microbes (Fig. 1). Of these, 158 ORFs were identified in clusters or operons that are highly similar to clusters or operons of pSymB of R. meliloti.
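The idea of a conserved gene cluster, a run of neighbouring ORFs whose homologues are also neighbours in another genome, can be sketched in a few lines. This is not the ERGO method, whose internals are not described here. The function name, the input format (a precomputed homologue position table), the strict adjacency rule, and the minimum run size of three ORFs are assumptions for illustration only.

```python
# Illustrative detection of collinear gene runs (NOT the ERGO algorithm).
# Assumptions: homologues have already been assigned, each reference gene
# has an integer position, and a cluster needs strictly adjacent positions
# (step of +1 or -1, same direction throughout the run).
def conserved_clusters(query_order, ortholog_pos, min_size=3):
    """Return runs of query ORFs whose homologues are collinear.

    query_order  -- ORF identifiers in their order on the query replicon
    ortholog_pos -- dict mapping ORF id -> position of its homologue in
                    the reference genome (ORFs without a homologue absent)
    """
    clusters = []
    run = []
    direction = 0
    # A trailing None acts as a sentinel so the final run is flushed.
    for orf in list(query_order) + [None]:
        pos = ortholog_pos.get(orf) if orf is not None else None
        extends = False
        if run and pos is not None:
            step = pos - ortholog_pos[run[-1]]
            extends = step in (1, -1) and direction in (0, step)
        if extends:
            run.append(orf)
            direction = step
            continue
        if len(run) >= min_size:
            clusters.append(run)
        run = [orf] if pos is not None else []
        direction = 0
    return clusters
```

A real comparison would tolerate small insertions and rearrangements inside a cluster; the strict rule here keeps the sketch short and makes its behaviour easy to check by hand.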
Analysis of the exo-exs cluster. Obvious structural similarities were seen among the exo and exs genes of pNGR234b and pRmSymB. This cluster contains 31 ORFs stretching from the thiD gene to the exsI gene (Fig. 1 and 2). DNA identities of about 80% extend across the cluster, and the orientation of the genes is the same. exo and exs genes are involved in the synthesis of low-molecular-weight EPSs, which are essential for nodule invasion (2, 37). Thus, Exo mutants of NGR234 are ineffective on the host plant Leucaena leucocephala (8). Profiling with restriction enzymes and comparison with previously sequenced exoX and exoY genes (GenBank accession number X16704) showed that the core of this cluster has been previously mapped (9). In addition, the pNGR234b exo cluster is similar to comparable loci in A. tumefaciens C58 and M. loti MAFF303099. The exoPNOMAL fragment is present in all four species (Fig. 2). The most striking differences between NGR234 and R. meliloti were found on both sides of the conserved exoI region. ORFs corresponding to exoH and the genes exoTWV were not found in the sequenced regions of pNGR234b, suggesting that two deletions occurred (Fig. 2). This possibility is supported by the identification of a 37-bp exoH fragment of NGR234 (upstream of exoK), which forms the left border of the deleted exoH region.
FIG. 2. Physical organization of the exo-exs cluster of Rhizobium sp. strain NGR234 compared to that of A. tumefaciens strain C58, M. loti strain MAFF303099, and R. meliloti strain 1021. Asterisks mark deletions of exoH and exoTWV (see text).
ExoV modifies the terminal glucose of the R. meliloti succinoglycan subunit with a pyruvyl group. Although the NGR234 exo gene cluster lacks exoV, the nonreducing galactose of the subunit is also pyruvylated (11), suggesting that a nonidentified pyruvyl transferase must exist in the NGR234 genome. Interestingly, NGR234 also harbors the acetyltransferase exoZ. ExoZ of R. meliloti acts on the trisaccharide Gal-(Glu)2 (39), whereas the subunit of NGR234 is acetylated at the nonreducing galactose of the side chain [(GlcA)2-Gal] and at sites that have not been determined (10, 11). Thus, it is possible that the NGR234 ExoZ gained the ability to acetylate the third sugar of the side chain, thereby conserving the specificity for trisaccharides.
Once the succinoglycan subunits of R. meliloti are synthesized, they are polymerized and exported by ExoP, ExoQ, and ExoT (19). The pNGR234b exo cluster lacks exoT. This finding suggests that exoT is not required for acidic EPS synthesis in NGR234 or that a functional exoT exists at another position in the genome. The symbiotically active succinoglycan of R. meliloti consists of low-molecular-weight succinoglycan, which is released from the polymer by the extracellular glycanases ExoK and ExsH (52). We identified a sequence encoding a putative ExoK glycanase, but exsH was not found. A promoter is located 211 bp upstream of the exoK start codon of R. meliloti (3). The corresponding sequence in NGR234 is located within the mutated region upstream of exoK. It is possible, therefore, that the promoter of exoK is not functional in NGR234. Low extracellular glycanase activity would explain why NGR234 produces relatively low amounts of low-molecular-weight EPS (10).
In summary, the organization of the exo cluster suggests that acquisition and deletion of genetic information have extensively shaped pNGR234b. It is tempting to speculate that exoKHTWV is the original arrangement in succinoglycan-producing members of the Rhizobiaceae. In A. tumefaciens, this arrangement has been maintained. Then a common ancestor of R. meliloti and NGR234 acquired a 2-kbp fragment containing ngr20301 (also known as smb20953), ngr011 (also known as smb20952), exoI, and ngr2014 (also known as smb21673). These imported DNA sequences seem not to be involved in the synthesis of EPS. Finally, two deletions (perhaps first exoTWV and later the now useless exoH) resulted in the organization of pNGR234b shown in Fig. 2.
Genes encoding type IV pili. Among other interesting loci identified was a cluster encoding a type IV pilus. Type IV pili are unique structures on the bacterial surface that are found in many gram-negative bacteria (Fig. 3A), where they play an important role in bacterial adhesion to host cells, biofilm formation, conjugative DNA transfer, motility, and infection by bacteriophages (12). Pili are secreted through the inner and outer membranes. In Caulobacter crescentus, at least seven genes are required for pilus assembly, including pilA, cpaA, cpaB, cpaC, cpaD, cpaE, and cpaF.
FIG. 3. (A) Physical organization and comparative analysis of the type IV pilus cluster of Rhizobium sp. strain NGR234. Same colors indicate similar predicted functions of the depicted ORF. No coloring indicates that no link to the pili biosynthesis cluster or to the larger gene cluster was observed. ORFs rsm4251 and rsm4262 encode transposases. (B) Physical organization and comparative analysis of the y4yB-y4xM clusters of Rhizobium sp. strain NGR234. Same colors indicate similar predicted functions of the depicted ORF.
Two clusters with high similarity to loci of E. carotovora. Two other clusters are highly homologous to loci of the plant pathogen E. carotovora. A contig2 cluster stretching from ORF ngr068 to ngr074 contains genes probably involved in modification and/or degradation of plant cell walls. A second cluster on contig1 includes the ORFs y4yBb (ngr505) to fcuA, which are important in iron transport and subsequent metabolism. Seven of the ORFs, y4yBb to ngr511, are found on the same strand with overlapping stop and start codons, suggesting they could be transcribed as an operon. fcuA is located approximately 300 bp downstream of ngr511 and on the opposite strand (Fig. 3B). y4yAb may encode a decarboxylase, whereas y4xPb (CysM) has homology to cysteine synthases. Both enzyme families require pyridoxal phosphate as a cofactor, and one of the genes (pdxA) required for the synthesis of this cofactor is located on the same contig. y4xOb shows weak homology to octopine dehydrogenases, y4xNb belongs to the IucA-IucC family of siderophore biosynthetic enzymes, y4xMb encodes a possible permease and possesses 11 predicted transmembrane domains, ngr511 has homology to iron(III) dicitrate binding proteins, the fcuA gene codes for a possible ferric siderophore receptor, and y4yBb is homologous to many hypothetical bacterial proteins of unknown function.
Interestingly, several of the ORFs within this cluster (y4yBb to y4xMb) are duplicated (85% identity and in the same order) on pNGR234a (Fig. 3B). Upstream of y4yB on pNGR234a is a 40-bp repeat that is highly similar to y4yBb on pNGR234b. Downstream of the pNGR234a y4xM ORF there is no sign of ngr511 nor any indication of sequences originating from pNGR234b. Interestingly, the duplicated ORFs on pNGR234a are found within a region containing the genes encoding a functional type III protein secretion system. Furthermore, they lie between nopX and nopL, two genes that encode proteins secreted by this system (47). It is possible that y4yB to y4xM could be coregulated with nopX, as there are no obvious transcriptional termination signals in the 185-bp (nopX to y4yB) intergenic region. Transcriptional analysis of pNGR234a showed that y4yB to y4xM are all strongly induced 24 h after flavonoid addition, indicating possible symbiotic functions (38) (it should be cautioned that some of the transcripts could have come from the other duplicated genes, however). Since y4yB to y4xM are within the type III secretion system cluster, a polar mutation in y4yB was generated, but it did not have any obvious symbiotic phenotype or effect on protein secretion (R. Dieckmann, C. Marie, X. Perret, W. J. Broughton, and W. J. Deakin, unpublished data). Obviously, the discovery of second copies of y4yB to y4xM suggests that a double mutant should now be created to answer this question.
Both B. japonicum USDA110 and M. loti MAFF303099 possess type III secretion systems, yet neither has homologues of y4yB to y4xM, perhaps suggesting that this locus is not essential for protein secretion. Several of the ORFs have homology to proteins involved in iron transport, including a siderophore receptor, implying that they may monitor the iron status of the environment. It is noteworthy that iron often limits bacterial virulence and, particularly, type III protein secretion. The role of iron in virulence of Erwinia has been well documented (15, 16), although homologues of the y4yB cluster have not been implicated. Type III protein secretion by the plant pathogen Ralstonia solanacearum is controlled by an outer membrane receptor (PrhA) with homology to siderophore receptors (31). R. solanacearum also contains homologues of many of the genes in the y4yB cluster. Several genes of the y4yB-y4xM cluster are also found in other plant pathogens as well as the human pathogen Staphylococcus epidermidis ATCC 14990.
In summary, these observations suggest that 40% of the genes and operons identified on pNGR234b are not strain specific and may have been acquired from other related bacteria. This would explain why NGR234 is so successful in nodulating different legumes; symbiotic competence arises from a flexible genome that is able to efficiently integrate foreign DNA from other bacteria.
Research in LBMPS is financed by the Fonds National Suisse de la Recherche Scientifique (project 31-63893.00) and the Université de Genève. Research in Göttingen was funded by the Genomik Network of the BMBF and the FCI.