Journal of Bacteriology, April 2006, p. 2533-2542, Vol. 188, No. 7
0021-9193/06/$08.00+0 doi:10.1128/JB.188.7.2533-2542.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Eric Coissac,2
Nathalie Vachiery,1
Jacques Demaille,3 and
Dominique Martinez1
CIRAD-Emvt, TA30/G, Campus International de Baillarguet, 34398 Montpellier Cedex 5, France,1 Inria Rhône-Alpes, Projet HELIX, 655 Av. de l'Europe, 38330 Montbonnot-Saint Martin, France,2 Centre de Séquençage Génomique, IGH-CNRS-UPR 1142, 141 rue de la Cardonille, 34396 Montpellier Cedex 5, France,3 Swiss Institute of Bioinformatics, Swiss-Prot Group, 1 rue Michel Servet, CH-1211 Geneva 4, Switzerland4
Received 31 August 2005/ Accepted 9 January 2006
α-proteobacterium, is a small, gram-negative, aerobic, obligate intracellular pathogen of endothelial cells that can cause up to 90% mortality in susceptible animals. Heartwater is responsible for great economic losses in Africa (48) but also represents a threat to the American mainland owing to the presence of potentially transmitting ticks (8, 17). The control strategy is based on vector eradication and immunization, a strategy possible only on islands, where the incoming flow of ticks is highly limited (53). Vaccine development therefore remains critical. The only available commercial vaccine relies on the risky and inappropriate infection of animals with infected blood followed by treatment with antibiotics (12). Attenuated and DNA vaccines were developed (20, 21, 33, 71), but they induce long-lasting protection only against homologous virulent strains while conferring limited protection against heterologous strains in the field. The main difficulty in developing efficient vaccines is the simultaneous field occurrence of various genotypes in limited geographical areas (39). Similarly, serodiagnosis of heartwater has long been hampered by a lack of specificity and sensitivity (45), although these have been greatly improved recently (34, 36, 65). Additional diagnostic targets, protective proteins, and potential drug targets are therefore still needed. A key step toward this goal is better understanding of how genomes evolve in strains of differing phenotypes and which mechanisms are involved, in order to distinguish modifications linked to adaptation and plasticity from those directly involved in differing traits. Adaptation to the intracellular lifestyle is accompanied in the early stages by massive gene losses, creation of pseudogenes, and diminution of the genome size (47). The important increase in mobile genetic elements is especially characteristic of the early stage of intracellular parasitism.
Following initial reduction, the genome is subjected to opposing evolutionary forces, to generate diversity while eliminating useless sequences in the process of reaching stasis (37, 44, 46, 47, 61). Post-establishment evolution is strongly influenced by the host, leading to specific evolutionary processes in different host-restricted bacterial lineages. The Rickettsiales, which specialized in intracellular parasitism 700 million years ago (31), have diverged into several lineages (Rickettsia, Wolbachia, Anaplasma, and Ehrlichia) displaying distinctive genomic features (23, 66). While Rickettsia spp. and the Wolbachia pipientis endosymbiont of Drosophila melanogaster (wMel) harbor a large amount of selfish DNA (50, 51, 67), Anaplasma, Ehrlichia, and the W. pipientis endosymbiont of Brugia malayi (wBm) are devoid of insertion sequences (16, 22, 27). Ehrlichia spp. display multiple tandem repeats associated with pseudogenes or in intergenic regions, whereas Anaplasma spp. do not (16, 22).
We report here the comparative genomic analysis of the complete genomes of three strains of E. ruminantium: the Gardel strain (referred to here as Erga), a Welgevonden strain (referred to here as Erwe), and the recently sequenced South African Welgevonden strain (22) (referred to here as Erwo), from which Erwe originates. The Welgevonden genotype is infective and pathogenic to mice, whereas Erga is not. Erga illustrates the separation of the Gardel and Welgevonden genotypes before or at the time of the introduction of E. ruminantium in the Caribbean, whereas Erwe represents evolution from the Erwo strain through 14 passages in a different cell environment. Comparative genomic analysis of these three strains therefore allows for the analysis of genome evolution at different time scales. We first report strong gene order conservation between the Ehrlichia and Anaplasma marginale genomes. We also report the occurrence of differential protein-encoding sequence (CDS) truncations and the presence of several strain-specific genes. A detailed gene-to-gene alignment of the three genomes also allows for assessment of mutational pressure. Finally, we report the description of a process of genome contraction/expansion targeted at tandem repeats in noncoding regions and based on the addition or removal of ca. 150-bp tandem units. This process is specific to E. ruminantium and is not observed in the other Rickettsiales (including A. marginale).
DNA extraction, cloning, and sequencing. Elementary bodies were purified from culture supernatant, as previously described (40), resuspended in 350 µl of phosphate-buffered saline containing 0.36 µg/ml of DNase to remove contaminating host cell DNA, and incubated for 90 min at 37°C prior to the addition of 25 mM EDTA (41). Extraction of DNA from elementary bodies was done as previously described (54). Contamination with host DNA was checked by dot blot hybridization using bovine DNA as a positive control and probe. Purified DNA was broken by sonication to generate fragments of differing sizes. After filling of the ends with Klenow polymerase, DNA fragments ranging from 0.5 kb to 4 kb were separated in a 0.8% agarose gel and collected after Gelase (Epicenter) digestion of a cut-out agarose band. Blunt-end fragments were inserted into pBluescript II KS (Stratagene) digested with EcoRV and dephosphorylated. Ligation was performed with the Fast-Link DNA ligation kit (Epicenter), and competent Escherichia coli DH10B cells were transformed prior to colony isolation on LB agar-ampicillin-X-Gal (5-bromo-4-chloro-3-indolyl-ß-D-galactopyranoside)-IPTG (isopropyl-ß-D-thiogalactopyranoside). About 15,000 clones were isolated for both Erga and Erwe. Inserts were sequenced on both strands with universal forward and reverse M13 primers and the ET DYEnamic terminator kit (Amersham). Sequences were obtained with ABI 373 and ABI 377 automated sequencers (Applied Biosystems). Data were analyzed and contigs were assembled by the Phred-Phrap and Consed software packages (http://www.genome.washington.edu). Gaps were filled in through primer-directed sequencing with custom-made primers. A total of about 20,000 raw sequence runs were generated and analyzed for each E. ruminantium strain to generate a full-length consensus sequence with 7x coverage. Sequences of all mutated CDSs were extensively checked to eliminate the possibility of sequencing errors.
Gene prediction and annotation tools.
The Erga and Erwe genomes were both annotated as described below. The Erwo genome was independently annotated by Collins et al. (22). The annotation process of Erga and Erwe was mainly conducted with the integrated computer environment GenoStar (25). CDSs were identified by using Markov chain models (15). First, a fifth-order periodic Markov model for coding sequences was trained on long open reading frames (>1,000 nucleotides) for each strain. The model was then applied to the genomes, using 120 bp as the cutoff value for CDS length and a probability threshold (P) of 0.80 for CDSs of lengths less than 360 bp. To detect frameshifts due to sequencing errors, each CDS of one strain was then checked against all CDSs of the other strain with BlastP (3). A pair of CDSs exhibiting more than 70% amino acid sequence identity and a size difference of less than 20% was considered a pair of homologous genes. A pair of homologous CDSs corresponding to a bidirectional best hit (63) was considered a pair of orthologous CDSs. All other cases (CDSs of different lengths or without any detected homolog in the other strain) were examined manually for possible frameshifts by aligning the corresponding genomic regions using dynamic programming and by close inspection of gel reads. Protein similarity was assessed with BlastP (3) against a database resulting from the nonredundant concatenation of SwissProt (release 41) and the proteomes of two completely sequenced rickettsiae: R. conorii (7) and R. prowazekii (50). Inferred functions and gene names were checked manually; COG (for "Clusters of Orthologous Groups") and EC numbers were assigned by a similar procedure on databases of bacterial proteins annotated with COG (62) and EC (14) numbers. tRNAs were located with fastRNA (68), and rRNAs were detected with BlastN (3) in a database of known bacterial RNAs. Genome GC-skew analysis was performed as previously described (28), and an arbitrary origin of coordinates was assigned at the putative origin of replication (no clear DNA box could be located). Codon and amino acid usage were analyzed by correspondence analysis (52, 55). Selection pressure on orthologous genes was assessed by counting synonymous and nonsynonymous nucleotide substitutions according to the methods of Nei and Gojobori (49).
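The pairing criteria described above (more than 70% amino acid identity, less than 20% size difference, bidirectional best hit) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Hit` record, the function names, the use of a score to rank hits, the choice of the longer CDS as the reference for the size difference, and the decision to filter for homology before picking best hits are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One protein-versus-protein hit (hypothetical record, not a BLAST format)."""
    query: str
    subject: str
    identity: float   # percent amino acid identity
    query_len: int    # length of the query CDS product
    subject_len: int  # length of the subject CDS product
    score: float      # alignment score used to rank hits (assumption)


def is_homologous(hit, min_identity=70.0, max_size_diff=0.20):
    """Apply the >70% identity and <20% size-difference criteria.

    Assumption: the size difference is measured relative to the longer CDS.
    """
    longer = max(hit.query_len, hit.subject_len)
    size_diff = abs(hit.query_len - hit.subject_len) / longer
    return hit.identity > min_identity and size_diff < max_size_diff


def best_hits(hits):
    """Keep the highest-scoring hit for each query."""
    best = {}
    for h in hits:
        if h.query not in best or h.score > best[h.query].score:
            best[h.query] = h
    return best


def orthologous_pairs(a_vs_b, b_vs_a):
    """Return pairs of homologous CDSs that are each other's best hit."""
    best_ab = best_hits([h for h in a_vs_b if is_homologous(h)])
    best_ba = best_hits([h for h in b_vs_a if is_homologous(h)])
    return sorted(
        (q, h.subject)
        for q, h in best_ab.items()
        if h.subject in best_ba and best_ba[h.subject].subject == q
    )
```

Pairs that fail either criterion, or that are not reciprocal best hits, would fall into the "all other cases" group that the text says was examined manually.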
The significance of synonymous versus nonsynonymous substitution rates was determined by the Fisher exact test with a Bonferroni correction for multiple tests at an effective P value of 0.05 (69). Dispersed and tandem repeats were detected with two complementary programs. Repseek (2) was used for detecting dispersed repeats with a minimal seed length given by the Karlin-Ost formula (35) at a P value of $10^{-3}$. Tandem Repeats Finder (TRF) (10) was used for detecting tandem repeats with default parameters. TRF specifically targets tandem repeats, whereas Repseek can also find separated repeats but is less accurate on tandem repeat boundaries. Therefore, overlapping repeats detected by both programs were considered tandem repeats and only the TRF result was kept. When TRF identified several tandems at the same location, only the longest one was considered. Dispersed repeats were classified into four categories according to the relative orientation and distance between the two copies. The "direct" and "reverse" classes correspond to copies in the same and reverse orientations, respectively. The "close" and "distant" classes correspond to copies less than 1 kb away and more than 1 kb away, respectively.
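The four-way classification of dispersed repeats described above can be sketched in a few lines of Python. The function name and arguments are hypothetical, the distance is assumed to be measured between the start coordinates of the two copies, and a distance of exactly 1 kb (which the text leaves undefined) is assigned to "distant" here.

```python
def classify_dispersed_repeat(start1, start2, same_orientation, close_cutoff=1000):
    """Classify a pair of repeat copies by orientation and separation.

    start1, start2: start coordinates of the two copies, in bp (assumption:
    distance is taken between starts).
    same_orientation: True if both copies lie on the same strand.
    Returns a tuple such as ("direct", "close").
    """
    orientation = "direct" if same_orientation else "reverse"
    distance = abs(start2 - start1)
    # Less than 1 kb apart is "close"; everything else is treated as "distant".
    proximity = "close" if distance < close_cutoff else "distant"
    return orientation, proximity
```

For example, two copies on the same strand starting 400 bp apart would be classified as ("direct", "close") under these assumptions.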
Computation of size plasticity regions.
A pair of orthologous noncoding regions (ONCR) between Erga and Erwe was defined as a pair of intergenic regions flanked by two pairs of orthologous CDSs with no intervening gene (CDS or RNA) in between. ONCR of less than 10 bp were removed. Each ONCR was additionally associated with a pair of orthologous coding regions (OCR), which were arbitrarily chosen as the pair of orthologous CDSs located upstream from the ONCR. The observed expansion/contraction of a region ($\Delta_{\mathrm{obs}}$) (i.e., either an ONCR or an OCR) is defined as follows:

$$\Delta_{\mathrm{obs}} = \text{length of region (Erwe)} - \text{length of region (Erga)}$$

Three classes of expansion were considered: (i) if $-5 < \Delta_{\mathrm{obs}} < 5$, then the expansion class was designated "0," indicating that the region is stable; (ii) if $\Delta_{\mathrm{obs}} > 5$, then the expansion class was designated "+," indicating that the genome of Erwe expands with respect to that of Erga; and (iii) if $\Delta_{\mathrm{obs}} < -5$, then the expansion class was designated "-," indicating that the genome of Erwe contracts with respect to that of Erga.
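The three expansion classes can be written as a small Python function, shown here as a sketch rather than the authors' code. The function names are hypothetical, the class labels are plain ASCII strings, and a difference of exactly +5 or -5 bp, which the definition above does not cover, is assigned to the stable class "0" by assumption.

```python
def delta_obs(length_erwe, length_erga):
    """Observed size difference of a region, in bp (Erwe minus Erga)."""
    return length_erwe - length_erga


def expansion_class(delta, threshold=5):
    """Map an observed size difference to the "+", "-", or "0" class.

    Assumption: values of exactly +threshold or -threshold count as stable.
    """
    if delta > threshold:
        return "+"   # Erwe expands with respect to Erga
    if delta < -threshold:
        return "-"   # Erwe contracts with respect to Erga
    return "0"       # region is stable
```

With this sketch, an intergenic region of 812 bp in Erwe and 657 bp in Erga (illustrative numbers, not taken from the paper) gives a difference of 155 bp and falls in the "+" class.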
Expanding/contracting regions (i.e., associated with either the "+" or "-" class) are referred to as "size plasticity regions." Investigation of the correlation between tandem repeats and the size plasticity of noncoding regions was conducted by surveying the presence of tandem repeats in each ONCR of Erga/Erwe. A tandem repeat was considered to be present in an ONCR if more than half of its length lies within the ONCR. If more than one repeat was found within the same ONCR, then only the largest one was kept. ONCR were classified into four categories ("Tandem_None," "None_Tandem," "Tandem_Tandem," and "None_None") depending on whether the ONCR contained a tandem repeat ("Tandem") or not ("None") in Erga and Erwe, respectively (for instance, the "Tandem_None" category corresponds to the case in which an ONCR displays a tandem repeat in Erga and no tandem repeat in Erwe). The theoretical expansion/contraction amount due to the tandem repeat ($\Delta_{\mathrm{theo}}$) was calculated as follows:

$$\Delta_{\mathrm{theo}} = \frac{\mathrm{period\_Erga} + \mathrm{period\_Erwe}}{2} \times \left(\mathrm{nb\_copy\_Erwe} - \mathrm{nb\_copy\_Erga}\right)$$

where "period" and "nb_copy" are the period and number of copies of the tandem repeats, respectively (both are set to 0 if no tandem is present).
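As a worked illustration of the theoretical expansion formula, the following Python sketch computes the expected size change from the period and copy number of the tandem repeat in each strain. The function and parameter names are hypothetical; as stated in the text, period and copy number are both set to 0 for a strain without a tandem repeat.

```python
def delta_theo(period_erga, nb_copy_erga, period_erwe, nb_copy_erwe):
    """Theoretical size change (bp) attributable to a tandem repeat.

    Mean period of the two strains multiplied by the difference in
    copy number (Erwe minus Erga). Use 0 for both the period and the
    copy number of a strain that carries no tandem repeat.
    """
    mean_period = (period_erga + period_erwe) / 2
    return mean_period * (nb_copy_erwe - nb_copy_erga)
```

The same arithmetic applies to any pair of strains. For instance, a 7-base period with 35 copies in one strain and 64 in the other gives (64 - 35) x 7 = 203 bp, and a 219-bp period going from five copies to four gives -219 bp.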
Accession numbers. The sequences of the complete genomes of Erga and Erwe have been deposited in the EMBL databank under accession numbers CR925677 and CR925678, respectively.
270 bp; quantile 75, 1 kb) (Fig. 1).
TABLE 1. Rickettsiales genome features
FIG. 1. Quantiles of intergenic length for nine completely sequenced Rickettsiales and E. coli. Quantile 75 is the value that 75% of the intergenic lengths fall below. The median is quantile 50. Rickettsiales tend to exhibit longer intergenic sequences than typical bacteria (represented here by E. coli), but this effect is more pronounced in E. ruminantium. Abbreviations: Escol, E. coli; wMel, W. pipientis strain wMel; Anmar, A. marginale; Ricon, R. conorii; Ripro, R. prowazekii; wBm, Wolbachia sp. (subsp. B. malayi strain TRS); Erga, E. ruminantium strain Gardel; Erwe, E. ruminantium strain Erwe; Erwo, E. ruminantium strain Erwo.
Compositional biases. The cumulative GC-skew profile (28) obtained for E. ruminantium (data not shown) exhibits a strong leading/lagging compositional bias, similar to that observed with spirochetes (57). Similarly, the codon usage of E. ruminantium significantly differs between the two replication strands. Correspondence analysis of codon usage clearly shows two separated gene clusters (see Fig. S2a in the supplemental material) associated with the leading and lagging strands. This analysis also shows the presence of an additional cluster associated with the second factorial axis (see Fig. S2a in the supplemental material). A second correspondence analysis of amino acid usage (see Fig. S2b in the supplemental material) shows that the previously identified cluster corresponds to a group of genes coding for proteins with biased amino acid composition. Projection of characters (i.e., amino acids) further shows that this bias is directed toward an enrichment of large, hydrophobic amino acids: phenylalanine and tryptophan. Eighty-one CDSs are present in this cluster, of which 42 have been assigned to known membrane proteins (see Table S1 in the supplemental material). The 39 unknown proteins from this cluster might therefore be also associated with membrane proteins.
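The cumulative GC-skew profile mentioned at the start of this section can be illustrated with a short Python sketch. This is a generic windowed formulation, not necessarily the exact procedure of reference 28: the window size, the function name, and the convention of using (G - C)/(G + C) per window are assumptions.

```python
def cumulative_gc_skew(sequence, window=1000):
    """Return the cumulative GC skew along a sequence.

    For each non-overlapping window, the skew is (G - C) / (G + C);
    the profile is the running sum of these values. Windows without
    any G or C contribute 0 (assumption).
    """
    sequence = sequence.upper()
    running_total = 0.0
    profile = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        running_total += skew
        profile.append(running_total)
    return profile
```

On a circular bacterial chromosome with a strong strand bias, such a profile typically shows one minimum and one maximum, which are commonly interpreted as the putative origin and terminus of replication.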
Comparative analysis of mutational trends. The high colinearity between the three strains of E. ruminantium allowed for detailed, gene-by-gene comparison and identification of differences which may explain host range variation. This resulted in an alignment table (see Table S1 in the supplemental material) composed of 986 rows; 888 rows (90%) correspond to triplets of genes (one gene for each strain), and 86 rows (9%) correspond either to singlets (only one gene is observed [6%]) or doublets (one gene is missing in one strain [3%]). Finally, 12 rows (<1%) correspond to fragmented genes (in-frame stop codon), leading to two or more genes per strain in the row. Out of the 888 triplets, 818 (93%) exhibit sequence identity of 95% or more at the amino acid level. Analysis at the nucleotide level (see Table S1 in the supplemental material) indicates that the Welgevonden strains, Erwe and Erwo, are almost identical with respect to their coding sequences. CDS alignments reveal very few substitutions and almost no deletions. Exceptions are almost exclusively associated with recognized pseudogenes and will be presented later. On the other hand, several genes in Erga and Erwe display a sufficient number of differences to allow for the analysis of selection pressure based on synonymous versus nonsynonymous (S/NS) substitution rates: 181 pairs of orthologs display a significantly larger amount of synonymous substitutions, indicating a strong selection pressure to maintain the protein sequence, whereas only three pairs (i.e., ERGA_CDS_00630/ERWE_CDS_00660, ERGA_CDS_05750/ERWE_CDS_05840, and ERGA_CDS_08580/ERWE_CDS_08680) display a significant amount of nonsynonymous substitutions, indicating putative ongoing pseudogenes or functional changes (see Table S1 in the supplemental material). Two of these CDSs correspond indeed to truncated CDSs (i.e., pseudogenes) in Erwe/Erwo. These three CDSs code for proteins of unknown function. 
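The significance test applied to synonymous versus nonsynonymous substitutions (Fisher exact test with Bonferroni correction, as stated in Materials and Methods) can be sketched in Python using only the standard library. The 2 x 2 table layout used here (rows: synonymous and nonsynonymous; columns: substituted and unsubstituted sites) is one common choice and an assumption, since the paper does not spell out its table. The function names are hypothetical, and the counts are assumed to be integers, so fractional Nei-Gojobori site counts would need rounding first.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if p_value passes the Bonferroni-corrected threshold alpha / n_tests."""
    return p_value < alpha / n_tests
```

Under this layout, a pair of orthologs would be tested with `fisher_exact_two_sided(syn_diffs, syn_sites - syn_diffs, nonsyn_diffs, nonsyn_sites - nonsyn_diffs)`, where all four names are placeholders for the counts of that gene pair, and the result compared against 0.05 divided by the number of gene pairs tested.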
To investigate putative differences in metabolic capabilities between Erga and Erwe/Erwo, genes associated with an EC number were specifically checked (see Table S1 in the supplemental material). Out of a total of 333 rows, 325 (98%) correspond to proteins having more than 95% identity. Six of the eight remaining rows correspond to pseudogenes resulting from duplications, whereas the last two candidates (ERGA_CDS_04370 and ERGA_CDS_05040) display a significant bias toward synonymous substitutions, suggesting that protein function is conserved. Virulence genes and membrane proteins are other candidate targets for investigating host range differences. The vir genes are organized in two separate operons (22, 51) with two paralog genes, i.e., virB4 and virB8, located outside the operons (see Table S1 in the supplemental material). The virB6 and both virB4 genes display a larger number of synonymous substitutions, suggesting that selective pressure is maintaining their functional capabilities. virD4, virB10, and the paralogous virB8 gene also display a large number of substitutions, but their S/NS rates are not sufficiently different to conclude that functional pressure is still acting. The other vir genes are highly similar in all strains. With respect to membrane proteins, cpg1 makes a good candidate, with both high substitution and insertion/deletion rates in Erga and a significant substitution rate in Erwe (see Table S1 in the supplemental material). The other two cpg-related genes (i.e., ERGA_CDS_02490 and ERGA_CDS_02500) also display a significant substitution rate in Erga but not between Erwe and Erwo. All these changes are associated with an unbiased S/NS ratio (see Table S1 in the supplemental material). 
The cluster of paralogous map1-related genes (from ERGA_CDS_09000 to ERGA_CDS_09170) is another group of likely candidates since they display a large number of substitutions and insertions/deletions between Erga and Erwe/Erwo (see Table S1 in the supplemental material). Only map1 and map1-13 display significant selective pressure toward synonymous substitutions. map1-2 is truncated (see below) in Erga and is therefore an additional candidate to explain host range differences. Interestingly, the map1-1 gene from Erwe differs from those of both Erga and Erwo, which are identical (see Table S1 in the supplemental material).
Comparative analysis of unique CDSs. Fifty-seven unique CDSs are found within the three genomes. These unique CDSs are defined as sequences for which no predicted ortholog is found in the other genome. Twenty-eight unique CDSs are annotated only for Erwe, 7 are annotated only for Erwo, and 22 are annotated only for Erga. Careful examination of the differences between Erwo and Erwe shows that they are due only to different annotation strategies (mostly prediction programs or parameters and definitions of pseudogenes). Therefore, 35 CDSs are specific to the Erwe/Erwo group, whereas 22 are found only in Erga. Only 6 out of these 57 CDSs correspond to major rearrangements in the other genome, i.e., complete or partial gene deletions and extensive mutations (see Table S2 in the supplemental material). The remaining 51 CDSs are unique because of in-frame stop codons in the corresponding sequences in the other strain, making them too short to reach the minimal open reading frame size set for prediction. Out of these 51 unique CDSs, 21 are truncated versions of full-length genes annotated elsewhere in the genome (see Table S2 in the supplemental material), and 5 other CDSs display similarity to known genes not found at full length in E. ruminantium (see Table S2 in the supplemental material). They might therefore be remnants of full-length genes eliminated through deletion or have been inserted through ancient horizontal transfer. The remaining 25 CDSs are unknown.
Comparative analysis of partial or fragmented CDSs. Occurrence of a stop codon may not lead to the complete loss of CDS prediction but to the detection of truncated genes, depending on the size of the remaining fragments. Truncated genes resulting in a single CDS are hereafter designated partial CDSs, whereas those resulting in two or more predicted CDSs are designated fragmented CDSs; 29 such truncations were observed (see Table S2 in the supplemental material). Sequences were checked to eliminate sequencing errors. Seven genes are affected in all three genomes but differently, depending on the strain. Only one is known and encodes a putative type IV secreted protein (see Table S2 in the supplemental material).
A total of 18 CDS truncations differentiate the genome of Erga from that of Erwe/Erwo; 8 truncations appear in Erga (i.e., full-length CDSs are present in Erwe/Erwo), and 10 truncations appear in Erwe/Erwo (see Table S2 in the supplemental material). Only two genes have a known function. map1-2 (truncated in Erga) bears a deletion of 48 bases (16 amino acids). This deletion accounts for 80% of the size difference between the map1 clusters of Erga and Erwe/Erwo. Interestingly, map1-2 was shown to have recombined with map1-3 in a subset strain (strain CTVM) of Erga (9). In Erwe/Erwo, the only known gene to be affected is ftsA, which is 63 bp shorter than the Erga ortholog. ftsA codes for a protein involved in cell division. However, this does not seem to affect the ability of Erwe and Erwo to multiply and develop efficiently. The keystone protein FtsZ, involved in the recruitment of cell division proteins, and the other Fts proteins (i.e., FtsH, FtsQ, and FtsY) are present and not truncated.
Finally, four CDS truncations (checked on Erwe chromatograms) also occurred between Erwe and Erwo (see Table S2 in the supplemental material). These CDSs are strictly identical in Erwo and Erga and are therefore specific to Erwe. Three of them have a known function. The first is tufA, coding for one of the elongation factors Tu. The second tuf gene (tufB) is present in all strains as a full-length gene. In E. ruminantium, the tuf operons display the chimeric organization described in Rickettsia spp. (6, 22, 59) with respect to E. coli (Fig. 2). tufA is flanked on one side by rpsG-fusA and by tRNA-Trp-secE-nusG on the other side (Fig. 2). This organization is intermediate between those of E. coli and Rickettsia. Like Rickettsia, E. ruminantium bears the recombination between the tufA and tufB operons, but, like E. coli, it still bears the tufA gene, although fragmented in Erwe (6, 22, 59). The tufB operon of E. ruminantium is identical to those of Rickettsia spp., with tufB surrounded by tRNA-Tyr-tRNA-Gly and rpsJ. Furthermore, these regions are known to be prone to recombination and inversion (1, 6, 32). Interestingly, Wolbachia (wBm) displays a novel intermediate configuration. The tufA operon is identical to that of E. coli, whereas the tufB gene is flanked at its 5' end by only two of the E. coli tRNA genes (tRNA-Tyr and tRNA-Gly) and at its 3' end by tRNA-Trp-secE-nusG. This 3' flanking region is identical to that of the tufA operon in the other Rickettsiales. The wBm secE gene codes for a small protein of 69 amino acids which is part of the prokaryotic protein translocation system. This gene was not detected in the original wBm genome annotation (perhaps due to its small size), but the predicted sequence displays high similarity (>60%) to secE from Ehrlichia, Anaplasma, and Wolbachia (wMel), whereas it does not display similarity to the E. coli secE gene. The second truncated known gene in Erwe is petC, which is a key element in the cytochrome bc1 complex. 
Despite interruption of both tufA and petC, Erwe remains highly virulent. If tufA is dispensable, as shown in Rickettsia spp. (59), other mechanisms might complement the absence of cytochrome c1. The third known gene to be truncated in Erwe, ftsK, is also a key gene. This gene codes for the cell division protein FtsK, which is also a virulence-related gene in R. prowazekii (30). It displays a 135-bp deletion in its central part. The corresponding region in Erwo and Erga is a tandem repeat of four in-frame copies of 45 bp translated into the 15-amino-acid sequence LSDQDFEDESFADED. Erwe presents only one copy, and the missing segment corresponds exactly to the three other copies. This region has no sequence similarity with any FtsK protein. This deletion represents one of the very rare occurrences of tandem repeats in coding regions.
FIG. 2. Organization of the tufA and tufB operons in the Rickettsiales and E. coli genomes. The arrangement displayed by E. coli (Escol) is considered the ancestral organization of the tuf genes, whereas an intrachromosomal recombination event led to the shuffled Rickettsiales arrangement. Erga, Erwe, Erwo, A. marginale (Anmar), and wMel display the same organization as the Rickettsia tuf operons, except that tufA is still present (in a split form in Erwe). Wolbachia sp. (wBm) represents a novel intermediate configuration in which the tufA operon is identical to that of E. coli and not to other Rickettsia spp., whereas the organization of tufB is unique (see Discussion). CDSs are represented by arrows (not to scale) textured according to function. tRNA genes are represented by boxes.
TABLE 2. Distribution of size plasticity classes within coding and noncoding regions
TABLE 3. Distribution of repeats in complete genomes of Rickettsiales
TABLE 4. Correlation between type of repeat and size plasticity of ONCR
$\Delta_{\mathrm{theo}}$ (see Materials and Methods), versus the observed one, $\Delta_{\mathrm{obs}}$ (Fig. 3b), indicates that both quantities correlate strongly (correlation coefficient, 0.94). This suggests that the Erga and Erwe genomes evolved, with respect to size, by deletion or addition of a variable number of repeat units of similar size centered on 150 bp. Figure S3, in the supplemental material, displays several cases of deletions and expansions in Erga/Erwe/Erwo and A. marginale. The tandem repeats between birA/gst and recR/znuA illustrate the longer intergenic regions in E. ruminantium compared to A. marginale. These two tandems have periods of 155 bp and 178 bp, respectively, and there is one more copy in Erga than in Erwe/Erwo. The tandem repeat downstream from ubiB illustrates a case of complete tandem (four copies of 221 bp) deletion in the Welgevonden strains associated with a shortening of their intergenic regions; conversely, the tandem repeat downstream of ERGA_CDS_02630 (four copies of 187 bp) illustrates a tandem deletion in the Gardel strain. Figure S4a, in the supplemental material, illustrates a case of tandem expansion in Erwe: the period is 7 bases and Erwo exhibits 35 copies, whereas Erwe bears 64 copies, resulting in a 203-bp expansion [(64 - 35) x 7] of the Erwe intergenic region. Figure S4b, in the supplemental material, illustrates the converse case: the period is here 219 bp and five copies are present in Erwo, whereas Erwe has lost one copy, resulting in a 219-bp reduction. Although the tandem deletion/expansion process mostly affects noncoding regions, a few cases could be observed within genes, as illustrated by the ftsK gene.
FIG. 3. Distribution of tandem repeats in intergenic regions. (a) Distribution of the periods of the tandem repeats found in expanding/contracting intergenic regions between E. ruminantium strains Gardel (Erga) and Welgevonden (Erwe). The distribution is clearly bimodal, with one population of short-period tandem repeats ( 12 bp) and a second population of long-period tandem repeats ( 150 bp). (b) Correlation between the observed differences in intergenic size ($\Delta_{\mathrm{obs}}$) and the values calculated by assuming that the differences are due solely to the different numbers of tandem repeat copies ($\Delta_{\mathrm{theo}}$).
α-proteobacterial branch (23, 66). The specific organization in the three genera of the vir genes in two separate operons supports this assumption (22, 51). The similar organization of the tuf operons in Rickettsia, Wolbachia (wMel), E. ruminantium, and A. marginale further indicates that this recombination probably occurred in their common ancestor. The tufA gene was further deleted in Rickettsia (6), whereas in Wolbachia (wMel), Ehrlichia, and Anaplasma, tufA remained, although fragmented in Erwe. The deletion of tufA in Rickettsia spp. (6), along with the virulence of Erwe, also confirms that tufA is dispensable in intracellular parasites, as shown in R. prowazekii (59). The conservation of the vir and tuf operons is relatively surprising, considering that the synteny observed between Ehrlichia and Anaplasma is globally lost with the other Rickettsiales. The evolution of tuf genes in Rickettsia is associated with palindromic elements named RPE (4, 5), but RPE are not found in E. ruminantium or A. marginale, indicating that this mechanism was not involved in the recombination and was probably inherited by Rickettsia after separation from the ancestral Ehrlichia/Anaplasma group. Another feature to be underlined is the unusually long intergenic regions observed in E. ruminantium. Although there is a general trend toward longer intergenic regions in Rickettsiales, this trait is very pronounced in E. ruminantium and is particularly striking compared to A. marginale, which displays the short intergenic sequences usually observed in bacteria. Moreover, these long intergenic regions exhibit important size plasticity related to the presence of tandem repeats. This positive correlation is in good agreement with already-proposed mechanisms of tandem repeat deletion or amplification through DNA slippage (18, 38).
Indeed, this RecA-independent mechanism neatly explains why the observed variations affect integral numbers of tandem copies rather than excising the whole tandem by homologous recombination. Interestingly, although E. ruminantium bears the recA gene, as well as other genes involved in DNA repair, the RecA-independent mechanism seems to be favored. Large tandem repeats (over several hundred bases) can be produced and maintained under selection (18, 38). However, when selection is removed, they tend to collapse to a single copy by RecA-dependent homologous recombination. Here again, this collapse is not observed in E. ruminantium. This means either that some selection pressure is still active or that some elements of the excision mechanism are not fully functional. An interesting observation is that the deletion/expansion process is still very active and occurs in a rather short time frame, since several cases are observable between Erwe and Erwo.
Comparative genomic analysis also provides hints for targeting genes potentially involved in the observed host range diversity. Membrane proteins such as Cpg- and Map1-related proteins are well-studied antigenic proteins (9, 34, 65), and comparative genomics may allow for more accurate targeting. Cpg proteins, and more specifically Cpg1, are clearly good candidates. Several Map1 proteins are also priority targets, as shown by their larger numbers of observed mutations between Erga and Erwe/Erwo. The best candidates are map1 itself, as well as map1-2 and map1-6. Interestingly, the map1-1 gene, which was shown to be preferentially expressed in vectoring ticks (9), is the only map1 gene to be strictly identical between Erga and Erwe while differing between Erwe and Erwo. The vir genes provide an additional set of candidates, in particular virB6 and virB4, which display a large number of substitutions between Erga and Erwe/Erwo although they are clearly still subjected to selection pressure. This combination may reflect the need to adapt to a different host and cell environment while preserving key structure-function aspects of the proteins. By contrast, no significant difference appears between genes annotated with EC numbers whose products are involved in intermediate metabolism. It seems therefore unlikely that the metabolic capabilities significantly differ in the three strains.
The truncated genes observed in both Erga and Erwe/Erwo form another group of candidates. The few truncated genes with an identified function cannot explain the host range difference; therefore, candidates must be sought among the 24 unknown truncated genes. Similarly, the new potential membrane proteins of unknown function characterized by high Phe and Trp content constitute a final group of target candidates. However, since nothing is known about their function or localization, transcriptomics and proteomics experiments should be considered to determine which genes are active and whether any differential expression occurs between Erga and Erwe/Erwo. Extending this analysis to the genomes of related bacteria will allow for the identification of genes involved in key mechanisms such as pathogenesis, virulence, host range, and protection and will thus contribute to a better understanding of the biology of the Rickettsiales.
The apparent ongoing process of genome plasticity, characterized by continual deletion-insertion of tandem repeats and occurrence of gene truncations, clearly differentiates E. ruminantium from other intracellular bacteria, for which genome stability following initial size reduction is a key feature (11, 37, 44, 46, 61). E. ruminantium seems to be capable of rapidly undergoing genomic rearrangements upon exposure to a novel environment, which may explain the poor field efficacy of vaccines. The presence of truncated genes in Erwe, while they remained intact in both Erga and Erwo, after a change in cell environment may illustrate this phenomenon. This might be further exemplified both by the recent development of an attenuated phenotype of the Welgevonden strain by propagation in an unusual cell environment, i.e., a canine macrophage-monocyte cell line (71), and by the identification of a recombination event between the map1-2 and map1-3 genes of CTVM-Gardel (9) following a modification of the cell environment. Experimental evolution and comparative genomic analysis could help test this hypothesis. Nevertheless, genome plasticity and the related strain diversity definitely affect both strain-specific diagnostic and vaccine strategies. Therefore, a deeper understanding of this mechanism and its potential for variability, as well as of the minimal genome required for survival, is a prerequisite for development of novel vaccines and diagnostic tools.
This work was supported by CIRAD-CNRS grant 751745/00.
Supplemental material for this article may be found at http://jb.asm.org/.
Present address: Centre de Recerca en Sanitat Animal, Campus de Bellaterra, Edifici V, 08193 Bellaterra, Barcelona, Spain.