Journal of Bacteriology, April 2008, p. 2892-2902, Vol. 190, No. 8
0021-9193/08/$08.00+0 doi:10.1128/JB.01652-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
State Key Laboratory of Virology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, China,1 Beijing Genomics Institute, Chinese Academy of Sciences, Beijing, China,2 Cardiff School of Biosciences, Cardiff University, Museum Avenue, Cardiff, CF10 3US, United Kingdom3
Received 12 October 2007/ Accepted 6 February 2008
To date, no extensive chromosomal DNA sequencing of B. sphaericus isolates has been reported. Moreover, no other genome sequencing of a bacterium incapable of polysaccharide utilization has been completed. B. sphaericus C3-41, a highly active strain isolated from a mosquito breeding site in China in 1987, shows toxicity against Culex sp., Anopheles sp., and Aedes sp. and has significantly higher activity against Culex sp. than the commercialized B. sphaericus strain 2362 (75). The C3-41 strain belongs to the flagellar serotype H5a5b, like B. sphaericus strains 2362 and 1593 (74), and it has been developed as a commercial larvicide (JianBao) and successfully used for the control of mosquito larvae for more than 10 years in China. In some cities, such as Shenzhen, Foshan, and Dongguan in southern China, the C3-41 mosquitocidal formulation has been chosen as the sole larvicidal agent for breeding-site management in the integrated mosquito control program. Here, we report the sequencing of the B. sphaericus C3-41 genome and a comparative analysis with genomes of other species. These data provide a global view of the genes possessed by the organism and an insight into evolutionary relationships among the bacilli.
Sequence annotation. Glimmer 3 gene finder (56) was utilized to identify potential coding regions. The annotation was accomplished by BlastP analysis of sequences in the Nr, Nt, and Swissprot databases, respectively, and by manual curation of the outputs of a variety of similarity searches and was completed as described previously (63). The possible orthologs of the genome were identified based on the COG (clusters of orthologous groups of proteins) database and classified accordingly (62). The potential coding sequences (CDSs) involved in different pathways were determined by KEGG analysis (47, 30). The protein motifs and domains of all CDSs were documented based on intensive searches against publicly available databases and by using their application tools, including Pfam, PRINTS, PROSITE, ProDom, and SMART. The results were summarized with InterPro (6). tRNA genes were identified by using tRNAscan-SE (41). GC skew analysis and the circular-genome-map drawing were performed by using CGView software (60).
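As a rough illustration of the GC skew statistic mentioned above (this is not the CGView implementation), the sketch below computes (G - C)/(G + C) in sliding windows over a circular sequence. The function name is hypothetical, and the default 20-kb window with a 5-kb step is simply taken from the legend of Fig. 1; wrapping windows around the origin is an assumption made because the replicons are circular.

```python
def gc_skew_windows(seq, window=20_000, step=5_000):
    """Return (window_start, skew) pairs with skew = (G - C) / (G + C).

    The sequence is treated as circular, so windows that run past the
    end wrap around to the start.
    """
    seq = seq.upper()
    n = len(seq)
    results = []
    for start in range(0, n, step):
        end = start + window
        if end <= n:
            chunk = seq[start:end]
        else:
            # Wrap around the origin of the circular molecule.
            chunk = seq[start:] + seq[:end - n]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results
```

A sign change in the skew along the sequence is what is usually read as a candidate origin or terminus; the sketch only produces the per-window values and leaves that interpretation to the reader.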
Construction of orthologous/paralogous families and comparison analysis. Orthologous/paralogous families for B. sphaericus C3-41, five other completely sequenced organisms (i.e., Bacillus anthracis strain Ames, Bacillus subtilis strain 168, Thermoanaerobacter tengcongensis MB4, Clostridium perfringens strain 13, and Escherichia coli K-12), and two gapped genomes, those of Bacillus sp. strain NRRL B-14911 and Bacillus sp. strain NRRL B-14905, were built by using Treefam's method of comparing the gene tree with the species tree (http://www.treefam.org/) in stages (38). The method of Treefam defines a gene family as a group of genes that evolved after the speciation, and the orthologs and paralogs in TreeFam are inferred from the phylogenetic tree of a gene family and are different from those inferred by BLAST matches (i.e., Inparanoid, KOGs, and OrthoMCL) or BLAST matches and synteny (i.e., Ensembl-Compara and HomoloGene). Thus, it also tries to include outgroup genes to reveal the distant members (38). In this study, besides the reference Bacillus genomes, we have taken C. perfringens and E. coli as the outgroup species. In addition, B. sphaericus is an archaic organism and its spores have apparently been found in 25- to 40-million-year-old amber (9). Previous studies have also suggested that this mesophilic bacterium is closely related to some thermophilic archaea (29, 45, 70). T. tengcongensis was, therefore, included in the construction of orthologous/paralogous families.
There are four main steps: (i) an all-versus-all BLAST with proteins and conjoined fragmental alignments by Solar (http://treesoft.svn.sourceforge.net/viewrc/treesoft/branches/dev/solar); (ii) clustering of gene families by using Hcluster_sg (http://treesoft.svn.sourceforge.net/viewrc/treesoft/trunk/hcluster); (iii) performing multiple alignments by Muscle (http://www.ebi.ac.uk/muscle) and converting protein alignments to CDS alignments by using a Perl script; and (iv) building phylogenetic trees and inferring orthologs and paralogs by the neighbor-joining method.
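Step (iii) relies on a Perl script that is not reproduced in the text. A minimal Python sketch of the general idea, threading an ungapped CDS onto a gapped protein alignment row so that each residue becomes its codon and each gap becomes three gap characters, could look as follows. The function name and the tolerance for one trailing stop codon are assumptions made for illustration, not details of the authors' script.

```python
def protein_to_codon_alignment(aligned_protein, cds):
    """Convert one gapped protein alignment row into a codon alignment row.

    aligned_protein: protein sequence with '-' for gaps.
    cds: the ungapped coding sequence that encodes that protein.
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for aa in aligned_protein if aa != "-")
    # Assumption: a single trailing codon with no residue is a stop codon.
    if len(codons) == residues + 1:
        codons = codons[:-1]
    if len(codons) != residues:
        raise ValueError("CDS length does not match the protein sequence")
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter)
                   for aa in aligned_protein)
```

For example, the row `M-K` with the CDS `ATGAAA` becomes `ATG---AAA`, which keeps the nucleotide alignment in frame with the protein alignment.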
Nucleotide sequence accession numbers. The B. sphaericus C3-41 genome is available in GenBank under accession numbers CP000817 and CP000818.
2.9 Mb and on the inner strand from 2.9 Mb to the origin (Fig. 1, circles 1, 4, and 5). This is also reflected by the presence of several genes near the 2.9-Mb site, including parC and parE, which encode the subunits of topoisomerase IV, involved in chromosome partitioning (1, 12). However, we did not find a homolog of rtp (replication terminator protein-encoding gene) in the chromosome of B. sphaericus C3-41, and it can be seen from Fig. 1 that the putative replication termination site is significantly offset from the point diametrically opposite to the origin. This is not unique to B. sphaericus C3-41, but it is unusual. The potential origin of replication of the large, 178-kb plasmid was also analyzed by GC skew (Fig. 1B). The possible replication proteins were identified through database comparisons (Bsph_014-017). Close to this putative origin, a predicted CDS shows high similarity to FtsZ/tubulin family proteins, which are known to be involved in plasmid replication (7, 67).
TABLE 1. General features of whole-chromosome sequences of B. sphaericus C3-41
FIG. 1. Circular representations of the genome of B. sphaericus C3-41. (A) Chromosome. (B) Plasmid pBsph. From the inside: circles 1 and 2, GC skew and G+C content (20-kb window with 5-kb step); circle 3, blue and green bars show positions of tRNA and rRNA, respectively, and black bars show positions of repeats; circles 4 and 5, CDSs on the – and + strands. Colors reflect functional categories of CDSs. Teal, chromatin structure and dynamics; blue, energy production and conversion; orange, cell cycle control, cell division, and chromosome partitioning; maroon, amino acid transport and metabolism; dark blue, nucleotide transport and metabolism; silver, carbohydrate transport and metabolism; dark green, coenzyme transport and metabolism; dark purple, lipid transport and metabolism; navy, translation, ribosomal structure, and biogenesis; light brown, transcription; aqua, replication, recombination, and repair; green, cell wall/membrane/envelope biogenesis; fuchsia, cell motility; gray, posttranslational modification, protein turnover, and chaperones; dark yellow, inorganic ion transport and metabolism; dark blue, secondary metabolite biosynthesis, transport, and catabolism; dark red, general function prediction only; dark gray, function unknown; lime, signal transduction mechanisms; yellow, intracellular trafficking, secretion, and vesicular transport; olive, defense mechanisms; black, not classified by COG. The "0" coordinates marked on the outmost circles correspond to the putative replication origins, and the putative replication termination site is located near 2.9 Mb.
Genome comparisons between B. sphaericus C3-41 and other bacteria. Direct comparisons between the predicted CDSs of the B. sphaericus C3-41 chromosome and those of seven other bacterial species (B. anthracis strain Ames, B. subtilis strain 168, T. tengcongensis MB4, C. perfringens strain 13, E. coli K-12, Bacillus sp. strain NRRL B-14911, and Bacillus sp. strain NRRL B-14905) were performed by BLAST analyses (5). The results revealed that the putative CDSs of B. sphaericus C3-41 share greater similarities with those of Bacillus sp. NRRL B-14905 than with those of other Bacillus species (Table 2). About 3,716 CDSs (77.64% of the total genes predicted in B. sphaericus C3-41) have matches in the gapped genome sequences of Bacillus sp. strain NRRL B-14905, and the average identity of these genes is 91.62%. Bacillus sp. strain NRRL B-14905 is a marine Bacillus species, reported previously to be closely related to B. sphaericus C3-41 according to a phylogenetic tree based on 16S rRNA sequencing, and both of them share similar physiological characteristics, such as strict aerobiosis, lack of sugar metabolism, and the production of round spores (58). Additionally, T. tengcongensis MB4 and C. perfringens strain 13 are more closely related than E. coli K-12 to B. sphaericus C3-41, these bacteria having 42.77%, 41.73%, and 27.48% CDSs, respectively, that match those of B. sphaericus C3-41 (Table 2).
TABLE 2. Genome comparisons between B. sphaericus C3-41 and seven other species
FIG. 2. Plots of gene pairs based on genomic location. Each dot indicates a single protein plotted on the 5' end of the coding region in the reference genome and the best match in the query genome. From the top down: comparison of B. sphaericus C3-41 and B. subtilis subsp. strain 168; comparison of B. sphaericus C3-41 and B. anthracis strain Ames; comparison of B. sphaericus C3-41 and T. tengcongensis MB4.
FIG. 3. Venn diagram illustrating the number of putative proteins associated with each organism and the number shared between these organisms. Bsu, B. subtilis strain 168; Bac, Bacillus sp. strain NRRL B-14905; Bsp, B. sphaericus C3-41; n, total number of putative proteins encoded by the organism.
The transposase sequences of B. sphaericus C3-41 were submitted to the IS Finder database to search for possible insertion elements. At least seven IS elements were found in the chromosome (named ISBsph3 to ISBsph9), and one copy of ISBsph9 exists in plasmid pBsph (Table 3). ISBsph3, ISBsph4, ISBsph5, ISBsph6, ISBsph7, and ISBsph9 share a number of features with other IS3 family elements, including inverted repeats, similarity between transposases, and the DD-(35)-E-(7)-K or DD-(35)-E-(7)-R motifs, which are highly conserved patterns among ISs (19, 33, 35).
TABLE 3. IS elements predicted in the genome of B. sphaericus C3-41
Carbohydrate metabolism and transport systems. The inability to metabolize carbohydrates is a well-known feature of both B. sphaericus and Bacillus sp. strain NRRL B-14905, but the details of their energy metabolism remain to be investigated. Three CDSs, encoding CcpA (catabolite control protein A), HPr, and Crh (the regulatory paralogue of HPr), were present in the genomes of B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905. CcpA is a pleiotropic transcriptional regulator that acts as the key factor in the regulation of carbon and nitrogen metabolism, interacting with regulatory sites in the control regions of the regulated operon to either repress or activate transcription (8, 44, 69, 71). An alignment of CcpA sequences among B. sphaericus C3-41 (Bsph_4200) and 50 other bacterial isolates was performed (data not shown). Despite the variability among different species, the phylogenetic relationships among the 51 samples revealed sequence conservation of this pleiotropic transcriptional regulator in closely related species. B. sphaericus was grouped with the other Bacillus species and classified into a clade with Bacillus sp. strain NRRL B-14905 (see Fig. S1 in the supplemental material). In B. sphaericus C3-41, 15 examples of the TGWAANCGNTNWCA consensus, a cis-acting palindromic sequence called cre (catabolite-responsive element), which might be the regulatory binding sites of CcpA (69), were found throughout the genome where they may influence genes encoding proteins involved in amino acid metabolism, transport, and transcriptional regulators. This number is far less than the numbers in B. subtilis and other low-GC, gram-positive bacterial strains (8, 30, 44, 61).
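To make the cre search concrete, the following sketch translates the TGWAANCGNTNWCA consensus into a regular expression (W = A or T, N = any base, following the IUPAC nucleotide codes) and reports every match on the strand supplied. It is an illustration only: the function names are hypothetical, the method actually used for the genome scan is not described in the text, and the reverse strand would have to be scanned separately by passing in the reverse complement.

```python
import re

# IUPAC codes needed for the cre consensus (W = A/T, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that allows overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus)
    # The lookahead lets finditer report matches that overlap each other.
    return re.compile("(?=(" + body + "))")


def find_sites(seq, consensus="TGWAANCGNTNWCA"):
    """Return (0-based start, matched sequence) for each hit on this strand."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]
```

Called on a toy sequence such as `"CCTGAAAGCGCTTTCAGG"`, `find_sites` returns a single hit starting at offset 2; a base that violates a fixed or W position removes the hit.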
The function of HPr, which acts as a cofactor for CcpA, is to participate in the phosphotransferase system (PTS)-catalyzed transport and phosphorylation of carbohydrates (15, 21, 30, 61). The presence in B. sphaericus C3-41 of the conserved enzyme I (EI)- and of HPr-encoding genes, such as Bsph_2351, Bsph_2352, and Bsph_0434, which might be involved in the phosphoenolpyruvate-dependent protein phosphorylation chain, was predicted. The direct and specific interaction of CcpA and HPr-Ser46-P or HPr-His15-P has been demonstrated previously and is assumed to provide a direct link between glycolytic activity of the cell and Cre binding by CcpA (15). The predicted HPr of B. sphaericus C3-41 (i.e., Bsph_2351) might be phosphorylated by the EI (PtsI) encoded by Bsph_2352 at His-15 or phosphorylated at a regulatory serine (Ser-46) by ATP and the HPr kinase encoded by Bsph_0434, as occurs in the phosphoenolpyruvate-dependent PTS system of B. subtilis (16, 22, 23, 53), and the phosphoryl group might be transferred to the sugar-specific EIIA binding proteins. This ATP-dependent phosphorylation might regulate the induction and carbon catabolite repression of some catabolic genes, as described previously (55).
However, with the exception of the sugar-binding protein EII that is specific for N-acetylglucosamine, no other sugar-binding protein was identified in B. sphaericus C3-41. Bacterial genome comparison analysis revealed that B. sphaericus C3-41 lacks many PTS systems, such as glucose- or fructose-specific PTS enzyme components. Compared with B. subtilis strain 168 (25 PTS systems) and B. anthracis strain Ames (19 PTS systems), B. sphaericus C3-41 has fewer PTS systems (only 9), and these are probably specific for the transport of N-acetylglucosamine, cellobiose, and an unknown pentitol, according to their KEGG orthologs (http://www.genome.jp/kegg/). Furthermore, ortholog analysis indicated that some ABC sugar transporters functioning in sugar binding and transport are present in the other six species but absent from B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905, both of which may, therefore, be defective in the transport of sugars.
The KEGG pathways were compared between B. sphaericus C3-41, B. subtilis strain 168, and the B. cereus group strains. The results indicated that at least glucose-6-phosphate isomerase (Pgi), phosphomannose isomerase (ManB), PTS system glucose-specific EII, and fructose-specific EII components were absent from B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905. Furthermore, both B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905 lack an additional 17 enzymes involved in carbohydrate metabolism in B. subtilis (data not shown). The absence of a pgi gene from B. sphaericus was reported previously by other authors (28), and the introduction of a heterologous pgi gene into this bacterium was able to restore polysaccharide utilization in vitro but not in vivo (B. Han, unpublished data). Thus, the absence of key metabolizing enzymes and a sugar transport system may explain the metabolic inactivity toward glucose, fructose, and most polysaccharides.
In contrast, a fragment of the N-acetylglucosamine utilization operon present in B. sphaericus C3-41 (Bsph_2343-Bsph_2352), including genes encoding a GntR family transcriptional regulator, three ABC transporters, N-acetylglucosamine-6-phosphate deacetylase (NagA), glucosamine-6-phosphate deaminase (NagB), HPr, and PtsI, might be involved in the whole pathway for N-acetylglucosamine metabolism and transport, which supports previous indications that B. sphaericus can degrade N-acetylglucosamine (4). It is interesting to note the presence of another N-acetylglucosamine utilization operon (Bsph_3892-Bsph_3899). The nagA gene in this region contains several frame shifts, causing a premature stop, and might be a pseudogene (Fig. 4).
FIG. 4. CDSs probably involved in N-acetylglucosamine metabolism. From left to right: (a) Bsph_2343, transcriptional regulator (GntR family); Bsph_2344-2346, ABC transporters; Bsph_2347, PTS system N-acetylglucosamine-specific EIIC component (NagE); Bsph_2348, N-acetylglucosamine-6-phosphate deacetylase (NagA); Bsph_2349, hypothetical protein; Bsph_2350, glucosamine-6-phosphate deaminase (NagB); Bsph_2351, HPr (PtsH); and Bsph_2352, EI (PtsI); and (b) Bsph_3892-Bsph_3894, PTS system EIIC/A/B components; Bsph_3895, hypothetical protein; Bsph_3896, HPr (PtsH); Bsph_3897, hypothetical protein; Bsph_3898, transcriptional regulator (GntR family); and Bsph_3899, similar to NagA but containing several frame shifts. The possible transcriptional regulators are indicated by gray arrows, ABC transporters by arrows with dots, CDSs involved in sugar PTS systems by hatched arrows, hypothetical proteins by open arrows, and key enzymes in N-acetylglucosamine metabolism by dark arrows.
A fragment of a putative ethanolamine utilization operon consisting of 13 CDSs, including eutA, eutB, eutE, eutH, eutL, eutM, eutP, and eutS, was identified in B. sphaericus C3-41 (Bsph_2095-Bsph_2108). Similar operons are also found in Bacillus sp. strain NRRL B-14905, C. perfringens strain 13, and E. coli K-12, but not in the other five genomes, those of B. subtilis strain 168, B. anthracis strain Ames, B. thuringiensis serovar Konkukian strain 97-27, B. cereus ATCC 14579, and T. tengcongensis MB4, in our comparison. It has been proposed that a metabolic pathway for ethanolamine utilization may be an important survival strategy against the constant famine that microorganisms face in nature (20, 43). In addition, ethanolamine found in the mammalian gastrointestinal tract may present an important alternative source of nitrogen and carbon for bacteria living in the gut. Therefore, like many enterobacteria, such as E. coli K-12, B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905 may use ethanolamine as a source of both carbon and nitrogen (57).
In contrast to the reduced number of sugar-specific phosphoenolpyruvate-dependent PTS system genes, the genome of B. sphaericus C3-41 harbors abundant CDSs for ABC transporters (253 CDSs in total). Among them, 99 transporters appear to be involved in amino acid transport and metabolism in B. sphaericus C3-41, compared to only 32 in B. subtilis strain 168. Among the ABC peptide transporters in B. sphaericus C3-41, there are 32 putative ATP-binding proteins and 46 permeases. Only seven ABC transporters appear to be used in polysaccharide or sugar transport and metabolism, including Bsph_1250, Bsph_0763, Bsph_1252, Bsph_0765, Bsph_1251, Bsph_1260, and Bsph_0764. Furthermore, B. sphaericus C3-41 has more peptidases and proteases than B. subtilis strain 168 (93 versus 30 and 75 versus 24, respectively). Among these is Bsph_0341, which encodes sphaericase (also known as sfericase), the enzyme responsible for the degradation of the Mtx1 mosquitocidal toxin in B. sphaericus (68). In addition, the prediction of four amino acid efflux systems and 17 other transport systems involved in amino acid transport and metabolism implies that B. sphaericus C3-41 should be able to carry out active scavenging and secretion processes. The inability to metabolize carbohydrates and the abundance of protein-metabolizing systems suggest that the ancestor of B. sphaericus may have had an animal- or insect-associated origin.
Structural component, S layer, and membrane proteins. Another interesting similarity between B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905 that may affect the phenotype is the presence of 22 CDSs probably functioning as surface (S) layer proteins or S layer homologs and of 17 membrane proteins. The counterparts of the S layer and membrane proteins in B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905 were not found in B. subtilis strain 168. These membrane- or S layer-associated proteins may be related to the marine environment of Bacillus sp. strain NRRL B-14905 or the multiple environmental sources of bacteria designated as B. sphaericus, including soil, aquatic habitats, and even waste piles from uranium mines (49). In the latter case, the complex of S layer proteins appears to be important in heavy metal sequestration by B. sphaericus strains, although it is not clear whether such strains belong to the same group (IIA) of B. sphaericus strains as C3-41.
Recently it was proposed that B. fusiformis and B. sphaericus be reclassified into the genus Lysinibacillus (2). We suggest that Bacillus sp. strain NRRL B-14905 should also be classified into the same genus as B. fusiformis and B. sphaericus. This is in accordance with the phylogenetic relationship among the three Bacillus species based on 16S rRNA genes (58).
Sporulation and germination. As with other spore-forming bacteria, the spore formation of B. sphaericus occurs in response to nutrient limitation in the environment. A list of protein families related to sporulation and germination according to ortholog/paralog analysis of B. sphaericus C3-41 and seven other species was compiled (see Table S2 in the supplemental material). The sporulation of B. sphaericus C3-41 initiated by a deficiency in energy might be linked to the expression of several genes, probably functioning as sporulation initiation proteins or regulators of genetic competence, such as Bsph_1217. Four CDSs, including two AbrB genes (Bsph_0057 and Bsph_113, present in the chromosome and plasmid, respectively), Bsph_0076, and Bsph_2825, might regulate the developmental pathways of spores and biofilms in B. sphaericus C3-41, according to their predicted function. The predicted CDSs encoding spore coat proteins of B. sphaericus C3-41 are classified into 13 protein families according to their homologies. The 13 protein families are present in all five Bacillus species included in this study. For example, many of the B. subtilis strain 168 spore coat proteins, including CotA, CotE, CotY, CotJ, YhcQ, YabP, YabQ, YkuD, and SspF, are included in the 13 protein families. Fifty-three CDSs predicted in the chromosome of B. sphaericus C3-41, whose products are classified into 13 protein families according to ortholog analysis, are related to spore germination. Most of these CDSs can be classified into the GerC, GerK, GerH, GerQ, and GerX families, according to homology to these operons. Seventeen CDSs probably associated with sporulation and germination, such as GerP, can be found in other Bacillus species but are absent from B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905. Moreover, gene orthologs and COG analysis show that at least 64 possible CDSs in B. sphaericus C3-41 might be involved in the regulation and posttranslational modification of the sporulation process and assembly of the spore coat and exosporium (COG data are not shown).
However, sporulation and germination are complex, multistage events and a number of genes involved in transcription, signal transduction, posttranslational modification, and other functions may also be involved in these processes, including the assembly of the spore coat and exosporium. Thus, many CDSs not included in Table S2 in the supplemental material and some hypothetical proteins with as-yet-unknown functions might also be linked to the sporulation and germination processes, and the resolution of this issue will require further proteomic analysis.
Evolution of mosquitocidal toxin genes. Unlike other bacterial pathogens, whose virulence genes are assembled within putative pathogenicity islands, the insecticidal toxins of B. sphaericus C3-41 are widely distributed around the chromosome. Comparison of the region encoding the mtx1 genes (64) in strain 2297 (GenBank accession number AB126007) and strain SSII-1 (GenBank accession number M60446) with the equivalent region from B. sphaericus C3-41 (nucleotides 1,141,180 to 1,138,569) reveals that the mtx1 gene in C3-41 might be a pseudogene with a frame shift (Bsph_1076). The mtx1 gene is poorly expressed in B. sphaericus and has a repressor binding site between its putative promoter and its ribosome binding site (64). The mtx2 toxin gene (Bsph_1071) (previously described by Thanabalu and Porter) is located close to the mtx1 toxin gene (65). The Mtx2 protein is known to be related to the Mtx3 toxin of B. sphaericus (39), which is present in the C3-41 genome as Bsph_2822. It is interesting to note a further open reading frame (ORF), Bsph_2821, located upstream of mtx3 that has homology to both mtx2 and mtx3 but appears to be a pseudogene.
Our results indicated that the mtx1, mtx2, and mtx3 genes of B. sphaericus C3-41 might have some direct relationship to mobile genetic elements. Two insertion elements that were named ISBsph4 and ISBsph5, located within the mtx1-mtx2 cluster region, were found (Fig. 5A). orf2 and orf3 of the mtx2 distal insertion element appear to be a transposase disrupted into two nonfunctional genes (i.e., Bsph_1069 and Bsph_1070) by a premature stop, which suggests that this element may have been integrated for some time. This notion may be supported by the fact that, of the known B. sphaericus toxins, the Mtx2 proteins are the least conserved, with interstrain variations having significant effects on host range (11). There is also an insertion element (ISBsph7) upstream of Bsph_2821 and mtx3 (Bsph_2822) (Fig. 5B). This implies that these mobile genetic elements might play an important role in Mtx toxin evolution. With the exception of mtx1 (Bsph_1076), the other predicted mtx toxin genes, mtx2 (Bsph_1071) and mtx3 (Bsph_2822), have close orthologs in Bacillus sp. strain NRRL B-14905.
FIG. 5. Mosquitocidal toxin gene clusters and relationship with mobile genetic elements. (a) Gene cluster from Bsph_1068 to Bsph_1080. (b) Gene cluster from Bsph_2814 to Bsph_2825. Gray arrows, transposases; arrows with lattice, mosquitocidal toxins; dark arrows, possible transcriptional regulators; open arrows, hypothetical proteins. Insertion elements and terminal inverted-repeat sequences (IR) are marked in the figures. The region encoding the possible Mtx1 contains a frame shift. pseudo, pseudogene.
35-kb duplicate fragment present both in the chromosome and in the large plasmid. In total, 21 CDSs were predicted within this duplicated fragment (Fig. 6). Apart from binA and binB (Bsph_3193/Bsph_155 and Bsph_3192/Bsph_154, respectively), the nearby Bsph_3195/Bsph_157 appears to encode a further protein with homology to the Mtx2/3 toxins. Genes encoding a putative peptide synthase and a chitin-binding protein are located in this region (Bsph_3196/Bsph_158 and Bsph_3198/Bsph_160). Two CDSs for phage integrase family proteins were observed upstream of the 35-kb fragment. Similar to those of the mtx cluster, a putative transposase gene (Bsph_3188) and insertion element (ISBsph9) are also present within the 35-kb duplicate fragment, suggesting that the pathogenicity locus of B. sphaericus C3-41 may have phage or transposon origins. The GerXB-XA-XC gene cluster is found upstream of the putative transposase gene. This is exactly in accordance with B. anthracis plasmid pXO1 that also has a GerXB-XA-XC gene cluster upstream of a transposase (48), where it appears to influence the germination rate of B. anthracis spores (26, 27). Comparison of the region following the binary toxin genes in B. sphaericus strains C3-41 and 1593 (GenBank accession number AJ224477) with the equivalent region from strain 2297 (GenBank accession number AJ224478) reveals that the strain 2297 sequence contains a probable transposase pseudogene. The transposase homology begins at a small CDS capable of encoding a peptide of only 14 amino acids, but the homology continues in various reading frames, with at least five frame shifts, for 1,110 nucleotides. In strains C3-41 and 1593, however, a 1,554-bp insertion is seen at the end of the small CDS of strain 2297. Several repeat sequences characteristic of transposition/rearrangement events are found in the above transposase regions. Downstream of the initial CDS in strain 2297, there is a direct repeat (TAAAGAATATAA) separated by 25 bases within the disrupted transposase. This feature is maintained in strain C3-41, downstream of the inserted sequence. In the latter strain, there is a long direct-repeat sequence (AAATAAAGTCgtGAtGTTTATAAAAAAGaTGCGAtgTTTaAAATAAAGTCacGAcGTTTATAAAAAAGcTGCGAagcTTT; centered around the a nucleotide) beginning just 11 nucleotides from the 5' end of the inserted sequence and a smaller inverted-repeat sequence 62 nucleotides upstream of the 3' end of the insertion (ATAGAAAAAGCGTGTTTTGAtcAAtctTTttTCAAAACACGCTTTTTCTAT; centered around the tct nucleotides) (Fig. 6).
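The inverted repeat quoted above can be checked with a few lines of code. The sketch below is only a reader's aid with hypothetical helper names: it takes the sequence as printed, ignores the upper/lower-case distinction, and counts how many nucleotides at the 5' end are the exact reverse complement of the 3' end, which is what defines the perfectly paired arms of an inverted repeat.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_arm(seq):
    """Count bases, working inward from both ends, that pair perfectly.

    Position i from the 5' end pairs with position i from the 3' end
    when seq[i] equals the reverse complement at the same index.
    """
    s = seq.upper()
    rc = reverse_complement(s)
    arm = 0
    while arm < len(s) // 2 and s[arm] == rc[arm]:
        arm += 1
    return arm


# Inverted-repeat sequence as printed in the text (case marks are the authors').
IR = "ATAGAAAAAGCGTGTTTTGAtcAAtctTTttTCAAAACACGCTTTTTCTAT"
arm_length = inverted_repeat_arm(IR)
```

The same `reverse_complement` helper can be reused to compare the two copies of a direct repeat or to scan the opposite strand for a motif.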
FIG. 6. CDSs of the 35-kb duplicates in B. sphaericus C3-41 and comparisons of the transposase homologs downstream from the binary toxins in B. sphaericus strains C3-41, 1593, and 2297. The 35-kb duplicates presented both in the chromosome and in the large plasmid contain 21 CDSs. Binary toxin BinA and BinB, CDSs for phage integrase family protein, insertion elements, the GerXB-XA-XC operon, a putative peptide synthase, and a chitin-binding protein are indicated. Some other, hypothetical proteins are indicated by open arrows. The transposase homolog of B. sphaericus 2297 begins at a small CDS capable of encoding a peptide of only 14 amino acids, but the homolog continues in various reading frames for 1,110 nucleotides. In strains C3-41 and 1593, a 1,554-bp insertion is located at the end of the small CDS of strain 2297. Downstream of the initial CDS in strain 2297, there is a direct repeat (TAAAGAATATAA) separated by 25 bases within the disrupted transposase. This feature is maintained downstream of the inserted sequence in strains 1593 and C3-41. The two opposite-facing triangles represent the long, direct-repeat sequence at the 3' end of the inserted sequence and the shorter, inverted-repeat sequence upstream of the 3' end of the IS, respectively.
35-kb fragment may be the remnant of the phage infection that may have happened only in B. sphaericus, and not in the other seven species listed in this study. The plasmid location, putative phage infection, and transposition may all provide explanations of the fact that bin genes are not present in all B. sphaericus strains, while the bin genes of different serotypes and isolates show extremely high levels of similarity (73, 74). A further insecticidal toxin from B. sphaericus was recently reported to be active against the German cockroach Blattella germanica (46). The gene for this sphaericolysin is identified in the B. sphaericus C3-41 genome as Bsph_4094 and has high similarity with the gene encoding cereolysin O, which is present in many Bacillus species. In addition, the chromosome of B. sphaericus C3-41 contains several homologs of genes known to be involved in the pathogenicity of other gram-positive bacteria, such as B. cereus and Listeria monocytogenes. These include haemolysin III (25) or putative haemolysin CDSs (Bsph_2727, Bsph_3508, Bsph_3583, and Bsph_3651), internalin-like genes (Bsph_4728), sigma factor B (Bsph_4295), and P60 extracellular protease (Bsph_1123) (14). These shared virulence genes might be part of the common arsenal associated with the pathogenicity of some gram-positive bacteria.
Duplication in genome sequences.
The predicted proteins of B. sphaericus C3-41 were used to search for segmental and tandem duplications. BLAST analysis revealed a large number of predicted CDSs with similar matches within the chromosome of B. sphaericus C3-41 (BLAST E value, <10⁻⁵), indicating gene duplication (Fig. 2A). These include single-gene duplications in which the two copies are widely separated, duplications of neighboring genes, fragment duplications spanning many CDSs, and even the duplication of almost one-third to one-half of the circular chromosome. The duplication of the chromosome itself suggests that the chromosome of B. sphaericus C3-41 might have evolved by a drastic doubling in genome size. Similar phenomena were observed in other Bacillus species, such as B. anthracis strain Ames (GenBank accession number NC_003997) (data not shown). On the other hand, genome comparison revealed two main long-fragment syntenies (about 460 kb and 760 kb, respectively) between B. sphaericus C3-41, B. subtilis strain 168, and the three B. cereus group strains, as described above. We therefore hypothesize that the Bacillus species shared a common "chromosome backbone" at a very ancient stage, after which duplication occurred, the chromosome size increased, and variation then developed as a consequence of complex and dynamic evolutionary mechanisms, finally resulting in the divergence of species. In addition, the presence of the 35-kb duplication between the chromosome and the large plasmid pBsph, containing the CDSs of binary toxin, transposase, and phage integrase family proteins, was confirmed by several analyses: the average sequencing coverage of the chromosome (8.9 times), plasmid (21.4 times), and 35-kb duplicate fragment (25 times); multiple PCRs; single-nucleotide polymorphism analysis; and coding bias (data not shown). We suggest that this large fragment may be the remnant of a phage infection, which would explain why bin genes are present in only a subset of B. sphaericus strains yet show extremely high levels of similarity among different serotypes and isolates.
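The self-comparison described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. It assumes that the all-against-all BLAST results have already been parsed into (query, subject, E value) records, and it applies the E-value cutoff quoted in the text. The names Hit, duplicated_pairs, and classify_pairs, the CDS identifiers, and the max_gap rule for calling two CDSs "neighboring" are hypothetical choices made for illustration only.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text (E value < 10^-5)


class Hit(NamedTuple):
    """One parsed BLAST hit (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float


Pair = Tuple[str, str]


def duplicated_pairs(hits: Iterable[Hit],
                     cutoff: float = E_VALUE_CUTOFF) -> Set[Pair]:
    """Collect unordered CDS pairs with a non-self hit below the cutoff."""
    pairs: Set[Pair] = set()
    for hit in hits:
        if hit.query == hit.subject:
            continue  # a CDS always matches itself; not a duplication
        if hit.evalue < cutoff:
            a, b = sorted((hit.query, hit.subject))
            pairs.add((a, b))  # reciprocal hits collapse to one pair
    return pairs


def classify_pairs(pairs: Iterable[Pair],
                   gene_index: Dict[str, int],
                   n_genes: int,
                   max_gap: int = 1) -> Dict[Pair, str]:
    """Label each pair by its separation in gene order.

    Distance is measured around a circular chromosome, so the first and
    last CDSs count as adjacent. max_gap is an illustrative assumption.
    """
    labels: Dict[Pair, str] = {}
    for a, b in pairs:
        linear = abs(gene_index[a] - gene_index[b])
        circular = min(linear, n_genes - linear)
        labels[(a, b)] = "neighboring" if circular <= max_gap else "dispersed"
    return labels


# Toy input with made-up identifiers, not data from the genome.
example_hits = [
    Hit("cds1", "cds1", 0.0),
    Hit("cds1", "cds2", 1e-30),
    Hit("cds2", "cds1", 1e-28),
    Hit("cds1", "cds9", 1e-12),
    Hit("cds3", "cds7", 0.5),
]
example_order = {"cds1": 0, "cds2": 1, "cds3": 2, "cds7": 6, "cds9": 8}
example_labels = classify_pairs(duplicated_pairs(example_hits),
                                example_order, n_genes=10)
```

Under these assumptions, pairs labeled "neighboring" correspond loosely to the duplications of neighboring genes mentioned in the text, and "dispersed" pairs to widely separated copies. Detecting larger duplicated fragments would additionally require chaining pairs that are collinear in gene order, which this sketch does not attempt.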
Based on overall nucleotide and protein similarities, B. sphaericus C3-41 is most similar to a marine bacterial species, Bacillus sp. strain NRRL B-14905. The similarities of these two species, including mobile genetic elements, membrane-associated proteins, and metabolic and transport systems, provide solid data to explain the common features of B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905, such as their inability to utilize polysaccharide, and suggest that these two species are a biologically and phylogenetically divergent group whose members have developed to adapt to particular environmental conditions over evolutionary time. This is in accordance with previous suggestions that B. sphaericus has some special features similar to those of some archaic organisms and bacteria that can grow in extreme environments (9, 49) and is, therefore, distinct from most Bacillus species (45). It was recently proposed that B. fusiformis and B. sphaericus be reclassified into the genus Lysinibacillus, with the renaming of B. sphaericus as Lysinibacillus sphaericus (2). Based on the comparative analysis in our study, we propose that Bacillus sp. NRRL B-14905 be placed in the same genus as B. fusiformis and B. sphaericus.
On the other hand, despite the approximately 80% identity of the B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905 genomes, there are many CDSs that are unique to B. sphaericus C3-41. These include prophage and IS elements, sporulation- and germination-related proteins, and virulence factors. Some unique genes, such as bin genes and mtx1, might have been obtained by insertion through mobile genetic elements. This means of gene acquisition may explain why these genes are present in only some B. sphaericus strains and show extremely high levels of similarity among different serotypes and isolates.
Together, the similarities and differences may hint at overlapping but nonidentical environmental and ecological niches for the taxa of these species. In performing a comparative analysis of the synteny and duplication among eight species, we postulated that, although B. sphaericus C3-41 is quite different from other Bacillus species, it still shares a common "chromosome backbone" with them. The variation of the chromosomes might be due to duplication and a complex and dynamic evolutionary process producing the current bacterial species.
This project was supported by grant number KSCX2-SW-315 from the Chinese Academy of Sciences, 973 project number 2003CB114201, and grant number 30470037 from NFSC, China, and by financial assistance from Valent Biosciences Corp.
Published ahead of print on 22 February 2008.
Supplemental material for this article may be found at http://jb.asm.org/.
ska. 1999. Interactions of the Streptomyces lividans initiator protein DnaA with its target. Eur. J. Biochem. 260:325-335.