Journal of Bacteriology, May 2009, p. 3321-3327, Vol. 191, No. 10
0021-9193/09/$08.00+0 doi:10.1128/JB.00120-09
Copyright © 2009, American Society for Microbiology. All Rights Reserved.
Department of Bioregulation, Leprosy Research Center, National Institute of Infectious Diseases, Tokyo, Japan,1 Department of Microbiology, Leprosy Research Center, National Institute of Infectious Diseases, Tokyo, Japan,2 Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan3
Received 29 January 2009/ Accepted 6 March 2009
Pseudogenes are described as functionally silent relatives of normal genes. Since they are usually eliminated from the genome, it was speculated that the number of pseudogenes correlates with the size of the genome (28). Most pseudogenes are thought to result from a transposon insertion or inactivation of one copy after a gene duplication event (7). Because they do not create functional proteins, they are also called "junk" genes. However, some pseudogenes are expressed and function to regulate the expression of other genes (14, 20).
About one-quarter of the M. leprae genome is composed of noncoding regions, which constitutes a much larger proportion of the genome than the noncoding regions in M. tuberculosis. Gene-regulatory short RNA fragments generated from noncoding regions have been found in many organisms (17). In those cases, precursor microRNAs are transcribed independently and processed into mature forms. In eukaryotes, most of the transcriptome, which includes thousands of microRNAs, consists of noncoding RNA (24). In addition, the abundance of small RNAs in Escherichia coli has been estimated at 1 to 2% of the number of open reading frames (ORFs) (12).
Microarrays have facilitated transcriptome analysis through the use of probes that target a large number of genes. The technique has identified unexpected gene activity in a number of areas and in some cases has served to elucidate entire microbial metabolic processes, as exemplified by caloric restriction or oxidative stress in E. coli (10, 30). Moreover, RNA expression profiling has been valuable in the analysis of pathogenic bacteria. Analyses of changes in RNA expression upon infection of host macrophages have identified genes related to oxidative stress, proliferation, and other unknown functions in Yersinia pestis (causative agent of plague) (42) and Salmonella enterica serovar Typhi (causative agent of typhoid fever) (9). DNA microarray analysis has also found genes involved in the acid stress response (2) and the transcriptional hierarchy of the flagellar system (27).
Only known or predicted genes were examined in the experiments described above. Therefore, it was not possible to analyze the RNA expression of noncoding regions and potential pseudogenes that did not have the appropriate annotation. Clone-based microarrays were developed to solve this problem (29), but they were still unable to detect genome-wide RNA expression. Finally, tiling arrays have become a useful tool for the analysis of whole-genome or chromosome expression (19) and have been used to uncover several novel RNA expression patterns (15, 38). Although the genome sequence and its annotation are known, comprehensive analysis of M. leprae RNA expression has not been performed. The results of our previous study and the availability of tiling arrays prompted a detailed investigation of RNA expression throughout the M. leprae genome. In this study, tiling arrays were used to analyze comprehensive RNA expression of genes, pseudogenes, and noncoding regions in M. leprae.
RNA extraction. M. leprae cells (2.8 × 10¹¹) were suspended in 2 ml of RNA Protect bacterial reagent (Qiagen, Germantown, MD), vortexed, and incubated for 10 min at room temperature. The cells were pelleted and resuspended in 2 ml of RNA Protect bacterial reagent, 0.4 ml of 1.0-mm zirconia beads (BioSpec Products, Bartlesville, OK), and 0.6 ml of lysis/binding buffer from the mirVana miRNA isolation kit (Ambion, Austin, TX). The mixture was homogenized at 3,000 rpm for 3 min using a Micro Smash homogenizer (Tomy, Tokyo, Japan) followed by four freeze-thaw cycles. RNA was then extracted according to the manufacturer's guidelines (Ambion) and treated with DNase I (TaKaRa, Kyoto, Japan).
Preparation of labeled double-stranded DNA. Twenty micrograms of total RNA from M. leprae was reverse transcribed using SuperScript II (Invitrogen, Carlsbad, CA). The generated cDNA was incubated with 10 ng of RNase A (Novagen, Madison, WI) at 37°C for 10 min, phenol-chloroform extracted, and precipitated with ethanol. Cy3 labeling was performed as follows: 1 µg double-stranded cDNA was incubated for 10 min at 98°C with 1 optical-density-at-600-nm unit of Cy3-9-mer Wobble primer (TriLink Biotechnologies, San Diego, CA). The addition of 8 mmol of deoxynucleoside triphosphates and 100 U of Klenow fragment (New England Biolabs, Ipswich, MA) was followed by incubation at 37°C for 2 h. The reaction was stopped by adding 0.1 volumes of 0.5 M EDTA, and the labeled cDNA was precipitated with isopropanol.
Array design. The tiling array was designed based on sequences obtained from the GenBank database (accession no. NC_002677) (5). Each probe was a 60-mer, and the adjacent probe was shifted by 18 nucleotides (a 42-nucleotide overlap). A total of 363,116 probes were designed for the sense and antisense strands and arranged on a glass plate with 22,000 control probes of randomly chosen sequences. Another array on which the probes were chosen from M. leprae ORFs (NimbleGen Systems, Madison, WI) was made. On this ORF array, 20 different probes were designed for each of the 1,605 ORFs. The probes were spotted onto five blocks on the glass plate, resulting in an arrangement of 160,500 probes on the ORF array.
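The probe arithmetic in the array design can be checked with a short sketch. The probe length (60 nucleotides), the 18-nucleotide shift, and the ORF array counts (20 probes, 1,605 ORFs, five blocks) are taken from the text; the function name `tile_starts`, the 0-based coordinates, the treatment of the region as linear, and the 150-nucleotide toy region are illustrative assumptions and do not describe the actual design software.

```python
PROBE_LEN = 60  # probe length in nucleotides (from the text)
STEP = 18       # shift between adjacent probes (from the text)


def tile_starts(region_len, probe_len=PROBE_LEN, step=STEP):
    """Return 0-based start positions of probes tiled along one strand.

    Assumption: the region is treated as linear, and a probe is kept only
    if it fits entirely inside the region.
    """
    if region_len < probe_len:
        return []
    return list(range(0, region_len - probe_len + 1, step))


# Overlap between adjacent probes: 60 - 18 nucleotides.
overlap = PROBE_LEN - STEP

# ORF array: 20 probes per ORF, 1,605 ORFs, spotted onto five blocks.
orf_array_probes = 20 * 1605 * 5

# Hypothetical 150-nucleotide toy region, used only to show the tiling pattern.
toy_starts = tile_starts(150)

print(overlap, orf_array_probes, toy_starts)
```

The same start positions would be generated for both the sense and antisense strands, which is why the total probe count is roughly twice the per-strand count.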
Hybridization and analysis of tiling and ORF arrays. Cy3-labeled samples were resuspended in 40 µl of hybridization buffer (NimbleGen Systems, Madison, WI), denatured at 95°C for 5 min, and hybridized to arrays in a MAUI hybridization system (BioMicro Systems, Salt Lake City, UT) for 18 h at 42°C. The arrays were washed using a wash buffer kit (NimbleGen Systems), dried by centrifugation, and scanned at a 5-µm resolution using the GenePix 4000B scanner (Molecular Devices, Sunnyvale, CA). NIMBLESCAN 2.3 (NimbleGen Systems) was used to obtain fluorescence intensity data from the scanned arrays.
Quantitative real-time PCR. The cDNA used for the tiling array was also subjected to real-time PCR analysis. The primers were designed using GENETYX version 7 (Genetyx Corporation, Tokyo, Japan) and are listed in Table S1 in the supplemental material. Preparation of M. leprae genomic DNA and real-time PCRs were carried out as described previously (37) with 200 nM of each primer and 0.5 ng of cDNA or 0.2 ng of genomic DNA as a control.
In order to confirm the specificity of the tiling array, RNA from the same sample was simultaneously hybridized with the ORF array on which multiple sequence-specific probes were designed for each gene. The positive signals detected on the ORF array were consistent with those detected on the tiling array (Fig. 1). Moreover, because the tiling array probes include ORFs in their coverage of the entire genome, it is expected that more detailed information would be obtained from them. The strongest signal was identified in the rRNA; most probes in this region showed significantly higher intensity (Fig. 2A). Other highly expressed areas were detected in the genes (Fig. 2B), pseudogenes (Fig. 2C), and noncoding regions (Fig. 2D). In this study, noncoding regions were defined as regions that are not annotated. rRNA and tRNA are usually considered noncoding RNA but are dealt with separately here since they are annotated in the database. An interesting feature of some highly expressed areas was that positive signals sometimes overlapped both gene/pseudogene and noncoding regions, as illustrated in Fig. 2B and C. The expression levels of each probe within a single ORF were not constant but rather quite variable, which might reflect a difference in melting temperature based on the GC content of each probe.
FIG. 1. Typical array data from an approximately 40-kbp region. Data from the tiling and ORF arrays are shown with the gene annotation of Cole et al. from 2001 (5) depicted as rectangles.
FIG. 2. Signal intensity patterns detected as highly expressed areas in the tiling array. Scanned data were normalized to log2, divided by the median, and arrayed against the corresponding M. leprae genome sequence. Positive areas were extracted and are depicted under the signal pattern of probes with gene and pseudogene annotations. (A) Genomic region of rRNA showing almost saturated signal intensity. (B) Highly expressed region of the gene for the hypothetical protein ML2313 (shaded area). (C) Highly expressed region of the ML1476 pseudogene (probable oxidoreductase alpha subunit; shaded area). (D) Highly expressed noncoding region in the genomic position from bp 1973155 to 1973700, which showed no homology to genes or other functional sequences by BLASTN search. Gene annotations are from reference 5.
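The normalization mentioned in the Fig. 2 legend ("normalized to log2, divided by the median") can be read in more than one way. The following is a minimal sketch of one plausible reading, in which each raw intensity is log2-transformed and then divided by the median of the log2 values. The function name `normalize_intensities`, the order of operations, and the toy intensities are assumptions for illustration and are not taken from the NIMBLESCAN software.

```python
import math
from statistics import median


def normalize_intensities(raw):
    """Log2-transform raw intensities, then divide by the median log2 value.

    Assumption: this order of operations is one reading of the figure
    legend; the actual pipeline may differ.
    """
    logs = [math.log2(x) for x in raw]
    center = median(logs)
    return [value / center for value in logs]


# Hypothetical raw intensities chosen so the arithmetic is easy to follow.
toy_normalized = normalize_intensities([2, 4, 8])
print(toy_normalized)
```

Under this reading, a probe with a normalized value above 1 has a log2 intensity above the array median.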
FIG. 3. Distribution of signal intensity in each region. Mean signal intensities of individual regions were calculated, and the ratio against the corresponding total number in the M. leprae genome was plotted for genes, pseudogenes, and noncoding regions. Mean signal intensities, variances, and P values from Student's t test were calculated for the entire region and are shown below the graph.
TABLE 1. Numbers of highly expressed genes, pseudogenes, and noncoding regions identified by tiled microarray analysis
(χ² = 7.1, P = 0.008). Among the "small-molecule metabolism" class, the "amino acid biosynthesis" (4 out of 77) and "purines, pyrimidines, nucleosides, and nucleotides" (4 out of 52) subsets were highly expressed, while expression of the "biosynthesis of cofactors, prosthetic groups, and carriers" subset was not observed (0 out of 63). Similarly, in the "macromolecule metabolism" class, the "cell envelope" subset was expressed (13 out of 256), but the "degradation of macromolecules" subset was not (0 out of 43) (χ² = 2.8, P = 0.251). Three out of 11 PE and PPE protein gene families found in the "other functions" class were expressed among the coding genes.
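Several of the chi-square statistics reported in this section (7.1 with P = 0.008, 11.3 with P = 0.001, and 6.6 with P = 0.010) can be cross-checked against their P values. The sketch below assumes one degree of freedom for these comparisons, which the text does not state; under that assumption the upper-tail probability has a closed form in terms of the complementary error function. The function name `chi2_p_value_df1` is illustrative.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail P value of a chi-square statistic with one degree of freedom.

    For df = 1 the statistic is the square of a standard normal variable,
    so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# Statistic and reported P value pairs taken from the text (df = 1 is assumed).
reported = {7.1: 0.008, 11.3: 0.001, 6.6: 0.010}

for chi2, p_reported in reported.items():
    p = chi2_p_value_df1(chi2)
    print(f"chi2 = {chi2}: P = {p:.3f} (reported {p_reported})")
```

Comparisons across more than two categories would have more degrees of freedom and need the general chi-square survival function instead.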
TABLE 2. Numbers and percentages of expressed genes and pseudogenes based on functional classification
(χ² = 40.9, P = 1.00 × 10⁻⁷). No significance was detected when this class was excluded (χ² = 1.7, P = 0.793). In the "other functions" class, 15 expressed pseudogenes contained parts of the LEPREP repeat sequence. Markedly expressed pseudogenes were also found in the "degradation" (5 out of 74) and "energy metabolism" (7 out of 118) subsets of the "small-molecule metabolism" class, although the expression was not statistically significant among pseudogenes (78 out of 1,115). The overall expression level of pseudogenes (7.0%) was higher than that of genes (3.9%) (χ² = 11.3, P = 0.001). However, the "cell processes" class showed significantly higher gene expression (9.8%) than pseudogene expression (3.0%) (χ² = 6.6, P = 0.010).
Real-time PCR confirmation of RNA expression profiles. Specific primers were designed for five genes, seven pseudogenes, and six noncoding regions that were highly expressed in the tiling array analysis (see Table S1 in the supplemental material). Although M. leprae RNA was pretreated with DNase I prior to reverse transcription, the RNA was checked by PCR to exclude possible contamination by genomic DNA (data not shown).
Each primer set generated a specific reverse transcription-PCR product (data not shown). The RNA expression levels determined by real-time PCR analysis were comparable to the signal intensities from the tiling array (Fig. 4). Of interest, coding genes produced higher expression levels in real-time PCR, in contrast to the higher level of pseudogene expression detected by the tiling array.
FIG. 4. Comparison of RNA expression between real-time PCR and tiling array. Relative RNA expression levels detected by tiling array analysis and quantitative real-time PCR were compared. Genes and pseudogenes are indicated by accession numbers. Noncoding regions are indicated by their starting position in the M. leprae genome. Data are from three independent real-time PCRs and are expressed as means ± standard errors.
The roles of RNA derived from M. leprae noncoding regions and pseudogenes are not known, but the aberrant expression of pseudogenes has been reported in some cancers (22, 35). In addition, a nitric oxide synthase pseudogene is expressed in the central nervous system of the snail Lymnaea stagnalis, and its transcript is thought to have antisense activities (18). Pseudogenes also have some biological functions in processes such as cell growth and organogenesis (16). Computational analysis of the mouse genome showed that 10% of the mRNA fraction can be derived from pseudogenes (11). Our results suggest that pseudogenes and genes are similarly transcribed. If some pseudogenes function to regulate gene expression, this may explain why M. leprae is able to survive with only a limited number of protein-coding genes. Comprehensive analysis of small RNA revealed that small interfering RNAs are expressed from pseudogenes and regulate gene expression (37). In this study, we found that pseudogenes in the functional categories of "degradation" and "energy metabolism" in the "small-molecule metabolism" class were frequently and strongly transcribed. Further functional analysis is needed to elucidate their roles and the reason behind the biased transcription between functional classes. One hypothesis is that pseudogenes are transcribed because the organism has not yet evolved so as to switch them off. The strength of the selective pressure in M. leprae to dispense with useless transcription is unclear.
It has been speculated that the massive genomic degeneration seen in M. leprae is the result of dysfunctional sigma factors (23). Up to 2% of the M. leprae genome consists of repetitive DNA sequences, potential remnants of past transposons (6). Such repetitive sequences are found in pseudogenes in the "other functions" class and in noncoding regions. Of interest, we detected high RNA expression from those regions, suggesting the existence of functional roles now and/or in the past. Mycobacterium ulcerans, a close relative of M. leprae, has a similar genome structure. M. ulcerans has 771 pseudogenes, but the proportion of pseudogenes based on genome size is about 40% of that of M. leprae (34). It was also shown that Mycobacterium marinum has 65 pseudogenes (33). These species appear to have preserved past genomic evolution and heterotrophic circumstances as they adapted.
Except for rRNA and tRNA, noncoding RNAs are classified as components of ribonucleoproteins, ribozymes, or microRNA; the rest are thought to be junk derived from transposons or splicing remnants (25). The noncoding region occupying one-quarter of the M. leprae genome was presumed to be silent. The highly expressed areas of the noncoding regions were thought to be derived from RLEP and LEPREP (6). However, a large number of other noncoding regions that are more highly expressed than genes and pseudogenes have no homology with known sequences of noncoding RNA. Consequently, these RNAs might have a hitherto unrecognized function.
Different classes of M. leprae genes exhibited different levels of RNA expression. RNA expression was relatively high from genes in the "small-molecule metabolism" class related to amino acid and nucleotide synthesis, probably because these small molecules are necessary for protein and RNA synthesis. Moreover, a low level of pseudogene expression in these classification subsets may support the idea that the genes in this class have essential roles. Similarly, highly expressed genes in the "cell processes" class are responsible for the folding of synthesized proteins. On the other hand, genes related to DNA replication were not strongly expressed, reflecting the fact that the proliferation of M. leprae is very slow. Also, although high expression was not detected in some functional subclasses, such as the "biosynthesis of cofactors, prosthetic groups, and carriers" and "degradation of macromolecules" subclasses, these genes are expressed at a low level (data not shown). In fact, genes targeted by particular drugs are included in these subsets. Thus, RNA polymerase III and folic acid synthesis genes, targeted by rifampin and dapsone, respectively (8), are not highly expressed (data not shown). These data indicate that high RNA expression does not necessarily correlate with the functional importance of the genes, such as those related to drug resistance.
High expression was detected from lipoproteins and the PE and PPE families, which is characteristic of M. leprae. Lipoproteins function in infection and survival, as exemplified in M. tuberculosis (38). The PE and PPE families are specific to Mycobacterium species and by definition contain a Pro-Glu or Pro-Pro-Glu motif near the N terminus (4). Since the PE and PPE families are associated with the early secreted antigenic target 6-kDa (ESAT-6) antigen (29), they may play an important role in virulence. Because M. leprae has fewer PE, PPE, and ESAT-6-like genes than M. tuberculosis, information on these expressed genes will facilitate further functional analysis of a PE, PPE, and ESAT-6-like protein complex.
There were some differences in the levels of RNA expression detected by tiling array and real-time PCR. The level of expression from coding genes detected by tiling array was lower than the level from these genes detected by real-time PCR, while pseudogene expression was more abundant in the tiling array analysis than in real-time PCR. This discrepancy might reflect the difference in the target length for these methods as well as the difference in the length of transcribed RNA.
The genome size of microbes, as well as the proportion of noncoding regions, is much smaller than that of eukaryotes. Therefore, RNA expression from these regions has been extensively studied. One such study resulted in the discovery of an essential protein homolog, Argonaute, which is necessary for microRNA maturation (13). RNA expression from noncoding regions was also detected from the whole-genome analyses of E. coli (39) as well as Prochlorococcus and Synechococcus spp. (3). The tiling array has facilitated far more in-depth transcriptome analysis, including noncoding regions, than previous techniques such as shotgun cloning (1). For example, a Saccharomyces cerevisiae tiling array analysis identified 98 novel noncoding RNAs (32). The present tiling array will be similarly useful for the identification of noncoding RNA in bacteria (31) and for further functional analysis. This is the first genome-wide expression profile of M. leprae genes, pseudogenes, and noncoding regions, which can be used as the foundation for the screening of drug candidates and the study of host-bacillus interactions.
We thank M. Mishima, P. D. Bang, S. Aizawa, M. Hayashi, Y. Ishido, and S. Sekimura (Leprosy Research Center, National Institute of Infectious Diseases) for invaluable discussions and M. Kenmotsu and H. Kawauchi (Roche Diagnostics) for helpful assistance with the tiling array analysis.
Published ahead of print on 13 March 2009.
Supplemental material for this article may be found at http://jb.asm.org/.