Journal of Bacteriology, December 2004, p. 8240-8247, Vol. 186, No. 24
0021-9193/04/$08.00+0 DOI: 10.1128/JB.186.24.8240-8247.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, Massachusetts (1); Department of Biochemistry, University of Wisconsin, Madison, Wisconsin (2)
Received 14 May 2004/Accepted 10 September 2004
One class of transposable element ("cut and paste" transposons) moves from one DNA molecule to another (or from site to site within a DNA molecule) by a process that involves excision of the transposon DNA from the original site and integration of the transposon DNA into the second site. These DNA excision and integration processes are catalyzed by a transposon-specific protein called a transposase (26). Understanding how transposases function is of general importance because many transposases are members of a superfamily of proteins that includes human immunodeficiency virus type 1 integrase and possibly the RAG-1 protein responsible for the DNA cleavage events in the immune system V(D)J joining process. That is, the members of this superfamily share overall architectural properties, detailed active-site structures, and catalytic mechanisms (13, 16, 26, 33).
IS50 is a component of the composite transposon Tn5. IS50 transposase (also called Tn5 transposase) is an excellent model system because both biochemical and genetic data and detailed structural information are available (32, 33). As a result, the overall mechanism of IS50 transposition is well understood (see Fig. 1) and the roles of many protein residues in this process have been determined. However, structure-function studies of IS50 transposase are far from complete. For instance, the residues that participate in intramolecular regulatory contacts or transposase recognition contacts with donor DNA or target DNA are currently unknown.
FIG. 1. IS50 transposition mechanism. IS50 transposition is a multistep process including binding of transposase monomers to the transposon end recognition sequences (so-called cis binding), dimerization of the monomeric transposase-DNA complexes to form the synaptic complex (at this stage trans transposase-DNA contacts are formed), catalysis of the three phosphoryl transfer reactions (3'-end nicking, hairpin formation, and hairpin cleavage), leading to release of the transposon DNA from the donor DNA backbone, target DNA capture, strand transfer of transposon DNA 3' ends into the target, disengagement of the transposase molecules, and repair of the adjacent single-stranded DNA gaps. This figure is similar to one that was published before (33).
Previously, IS50 transposase had been classified as a member of the IS4 family of transposases (34). However, IS50 transposase shows only limited homology to other IS4 family members (such as IS10 transposase), primarily involving the so-called YREK motif around the third catalytic residue (note that the YREK motif is located within the C1 region [10, 24, 34]). A more detailed comparison between IS50 and IS10 transposases is presented later in this communication. A recent comparison of IS50 transposase to the other members of the IS4 transposase family incorrectly located the first catalytic residue of Tn5 transposase (10). Given this misassignment and the low degree of similarity outside the C1 region, that earlier work does not provide the detailed information needed to elucidate structure-function relationships.
TABLE 1. Tn5 transposase ortholog sources
The deliberately conservative approach of including only apparently full-length homologous sequences was used to avoid including sequences in which the transposase may have been inactivated by deletions. The concern was that such deletion-inactivated sequences could accumulate mutations that alter functionally important homologies. N-terminal sequences starting at residue 26 are known to play a critical role in the required transposon DNA binding step (30), and C-terminal sequences, including residue 468 (out of 476), are known to be required for the essential transposase-transposase dimerization step (36).
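The coverage criterion described above can be written as a simple filter. The following is a minimal sketch, not the authors' actual procedure: it assumes that each BLAST-P hit has already been summarized by the first and last IS50 transposase residues (query coordinates) covered by the alignment, and it keeps a hit only if that span includes both residue 26 and residue 468. The `Hit` class, its field names, and the example hits are hypothetical.

```python
from dataclasses import dataclass

# IS50 transposase (476 aa) residues cited as functionally essential:
# N-terminal sequences from residue 26 (DNA binding) and C-terminal
# sequences including residue 468 (dimerization).
N_TERMINAL_REQUIRED = 26
C_TERMINAL_REQUIRED = 468


@dataclass
class Hit:
    """Hypothetical summary of one BLAST-P hit against IS50 transposase."""
    name: str
    query_start: int  # first IS50 transposase residue in the alignment (1-based)
    query_end: int    # last IS50 transposase residue in the alignment (1-based)


def is_apparently_full_length(hit: Hit) -> bool:
    """Keep a hit only if it spans both essential regions of IS50 transposase."""
    return (hit.query_start <= N_TERMINAL_REQUIRED
            and hit.query_end >= C_TERMINAL_REQUIRED)


def filter_full_length(hits: list[Hit]) -> list[Hit]:
    """Return only the hits that pass the coverage criterion."""
    return [h for h in hits if is_apparently_full_length(h)]


if __name__ == "__main__":
    hits = [
        Hit("homolog_a", 3, 474),   # spans both regions: kept
        Hit("fragment_b", 1, 147),  # lacks the C-terminal region: dropped
    ]
    print([h.name for h in filter_full_length(hits)])  # ['homolog_a']
```

A real screen would also need to examine the subject sequence itself, since a hit can span these query coordinates and still carry internal deletions or nonsense codons.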
With the above strategy, BLAST-P analysis revealed two full-length annotated IS50 transposase matches, in the genomes of Photorhabdus luminescens (15) and Gloeobacter violaceus (27) (Table 1). The search was then expanded to include the newly released results from the random environmental Sargasso Sea sequencing project (39). This analysis added three more unique, full-length sequences similar to IS50 transposase to the database. Finally, the IS50 transposase-like sequence previously reported in the Wolbachia prophage WO genome (25) was added to the collection.
The protein sequences of these six IS50 transposase matches were aligned with the IS50 transposase sequence by using the ClustalW program (37) available through the EBI. The resulting alignment is shown in Fig. 2. Most notably, all three catalytic residues are located within conserved motifs (Fig. 2). These motifs, together with other apparently conserved residues indicated in Fig. 2, are discussed in greater detail below.
FIG. 2. A. Clustal W alignment of the IS50 transposase sequence versus six full-length IS50 transposase-like sequences. Active-site residues are white on a black background. Other sequence identities are boxed with * underneath. Residues with a shaded background (and either : or . underneath) denote conservation of chemically similar residues. Conserved motifs that are discussed further are residues 20 to 31 (IS50 transposase numbering scheme), 38 and 39, 58 to 63, 92 to 103 (DTT), 160 to 166, 183 to 193 (DREAD), 207 to 211, 296 to 306, 319 to 335 (YREK), 342 to 351, 354 to 366, 417 to 425, 429 to 434, and 449 to 452. B. DTT, DREAD, and YREK active-site motifs.
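The conservation symbols in Fig. 2A follow the ClustalW convention. As a hedged illustration of how such a line is derived, the sketch below marks a column `*` when every residue is identical, `:` when all residues fall within one of Clustal's "strong" groups, and `.` when they fall within one of its "weak" groups. The group lists are reproduced from the Clustal documentation as we understand it, leaving gapped columns blank is an assumption, and the three short sequences in the usage example are invented.

```python
# Amino acid groups used by Clustal for its conservation line, as listed in
# the Clustal documentation ("strong" and "weak" Gonnet PAM 250 groups).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return the conservation symbol for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assumption: gapped columns are left unmarked
    if len(residues) == 1:
        return "*"  # identical residue in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues within one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues within one weak group
    return " "


def conservation_line(aligned: list[str]) -> str:
    """Build the symbol line for a set of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(column_symbol("".join(col)) for col in zip(*aligned))
```

For example, `conservation_line(["IEDTT", "LQDTS", "VEDTT"])` returns `"::**:"`: I/L/V and E/Q and T/S each fall in a strong group, while the D and T columns are identical.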
Notably, only two protein sequences found in our study had a high degree (>55%) of identity to IS50 transposase. Both were partial sequence fragments from the random environmental sample (39). One sequence was 147 residues long (IS50 transposase is 476 residues long) and was 98% identical to the IS50 transposase sequence. The second sequence was 100% identical to IS50 transposase but was only 65 residues long. The failure to find more close homologs of IS50 transposase is consistent with the fact that IS50 is seemingly rare, having been isolated only twice. The first isolate was from an R factor carried in a Klebsiella pneumoniae strain (5), and the second was from a different R factor carried in an E. coli strain isolated from a pig (J. Davies, personal communication).
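Percent identity values such as those quoted above depend on how aligned positions are counted. The following is a minimal sketch, assuming identity is computed over aligned columns in which neither sequence has a gap; BLAST and ClustalW use somewhat different denominators, so their reported figures can differ slightly from this one.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    # Keep only columns in which both sequences have a residue.
    paired = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
              if a != gap and b != gap]
    if not paired:
        return 0.0
    identical = sum(a == b for a, b in paired)
    return 100.0 * identical / len(paired)
```

For example, `percent_identity("ACDEFGHIKL", "ACDEFGHIKV")` returns `90.0`. If the 98% figure for the 147-residue fragment were computed over all 147 positions, it would correspond to about 144 identical residues (0.98 × 147 ≈ 144).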
One concern about the above analysis is that the IS50 transposase homologs were not tested for functionality. However, the observation that the active-site residues and the motifs around these residues were completely conserved suggests that the homologs are functional transposases derived from shared ancestry. One exception to this probable transposase functionality involves the partial transposase-like sequences found in Wolbachia pipientis that were not used in our analysis (see below).
It is of note that the IS50 transposase homologs were found in the genomes of only a few, widely divergent bacteria. This observation supports the hypothesis that the presence of transposase homologs in these genomes reflects lateral gene transfer. A particularly interesting bacterium whose genome contains fragments of an IS50 transposase-like gene is Wolbachia pipientis. Wolbachia pipientis is an obligate endosymbiont of arthropods (41) and nematodes (9) and thus would not a priori be expected to participate in the lateral transfer of genetic information. However, the Wolbachia pipientis genome is littered with mobile DNA (42), including prophages (7), that may have been the vehicle for introducing a transposable element. Consistent with this hypothesis is the observation of an IS50 transposase homolog in the WO prophage genome of Wolbachia wTai, but not in that of the related Wolbachia wKue prophage genome (25). The transposase gene contained in the prophage genome is full length and is embedded in an IS50-like sequence (S. Bordenstein and W. Reznikoff, data not shown). In addition, there are IS50 transposase-like gene sequences on the Wolbachia pipientis chromosome. These are truncated and contain nonsense codons (42) and thus are likely inactive remnants of transposase genes.
Homologies found among IS50-like transposases. In this section, the sequence homologies discovered in the Clustal W analysis and confirmed by the pairwise analyses will be specifically analyzed.
Motifs containing the three catalytic residues. Transposase and retroviral integrase proteins are known to contain three key catalytic residues. These are the DDE (aspartate, aspartate, and glutamate) residues that chelate two Mg²⁺ ions necessary for the phosphoryl transfer reactions that occur in transposition (and retroviral DNA integration) (13, 18, 35, 44). In IS50 transposase these catalytic residues are known from genetic and structural studies to be D97, D188, and E326 (13, 14, 28, 31).
Comparison of the IS50 transposase sequence with the six homologs listed in Table 1 demonstrates that there are clearly conserved amino acid motifs around all three of the catalytic residues (Fig. 2). These motifs are (L/V/I) (L/F) () (I/L) (E/Q) (D97) (T) (T/S) () (L/I) (S/N/T) (F/Y) (residues 92 to 103; IS50 transposase numbering scheme) surrounding the first catalytic residue (subsequently called the DTT motif); (V/A/T) (I/V) () (V/I) () (D188) (R) (E) (A/S) (D) (I/M/L) (residues 183 to 193) surrounding the second catalytic residue (subsequently called the DREAD motif); and the (Y) () () (R) (W) () (I/V) (E326) () () (H) () () () (K) (T/S) (G) motif (residues 319 to 335) surrounding the third catalytic residue (called the YREK motif [34]). The YREK motif was previously described as an important signature within the C1 region of the IS4 family transposases (10, 24, 34).
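The motif notation above maps directly onto regular expressions: each parenthesized group is one residue position, "()" denotes any residue, and slash-separated letters are alternatives. The sketch below performs that translation and scans a protein sequence for matches. It is a hypothetical illustration only: plain pattern matching ignores alignment context, misses motif copies interrupted by insertions, and does not by itself establish homology. The test peptide in the usage example is invented.

```python
import re

# Motifs as written in the text (IS50 transposase numbering).
DTT = "(L/V/I) (L/F) () (I/L) (E/Q) (D97) (T) (T/S) () (L/I) (S/N/T) (F/Y)"
DREAD = "(V/A/T) (I/V) () (V/I) () (D188) (R) (E) (A/S) (D) (I/M/L)"
YREK = "(Y) () () (R) (W) () (I/V) (E326) () () (H) () () () (K) (T/S) (G)"


def motif_to_regex(motif: str) -> str:
    """Translate the parenthesized motif notation into a regular expression."""
    parts = []
    for group in re.findall(r"\(([^)]*)\)", motif):
        # Strip residue numbers such as "D97" -> "D"; drop empty alternatives.
        letters = [re.sub(r"[^A-Z]", "", alt) for alt in group.split("/")]
        letters = [a for a in letters if a]
        if not letters:
            parts.append(".")  # "()" = any residue
        elif len(letters) == 1:
            parts.append(letters[0])
        else:
            parts.append("[" + "".join(letters) + "]")
    return "".join(parts)


def find_motif(motif: str, protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every, possibly overlapping, match."""
    # A capturing group inside a lookahead lets finditer report overlaps.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]
```

For example, `motif_to_regex(DTT)` gives `[LVI][LF].[IL][EQ]DT[TS].[LI][SNT][FY]` (12 positions, matching residues 92 to 103), and `find_motif(DTT, "GGLLAIEDTTSLSYGG")` returns `[(3, "LLAIEDTTSLSY")]`.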
The present communication is the first identification of the conserved DTT and DREAD motifs. The second T residue of the DTT motif is the only residue other than the catalytic residues within the DTT and DREAD motifs for which any biochemical data exist. This amino acid (T99) contacts the phosphate backbone of the recognition end sequence DNA in the IS50 transposase-end sequence DNA cocrystal structure (14). The presence of several charged residues within the DREAD motif suggests that they may interact with DNA, as would occur during target and/or donor DNA binding. A role for this region in target DNA binding is supported by the recent isolation of a mutation at residue R189 that changes target sequence specificity (Goryshin and Reznikoff, unpublished results). Target DNA binding is a critical step in transposition about which little is known.
In Fig. 3, various residues mentioned above are highlighted as colored spheres in the IS50 transposase-end sequence DNA cocrystal structure. The DDE catalytic residues are presented in this and all subsequent structural figures as yellow spheres. The DTT motif is presented as orange spheres, and the DREAD motif is presented as magenta spheres. Other sequences highlighted in Fig. 2 will be discussed below, as will the YREK motif.
FIG. 3. Molecular location of DTT and DREAD active-site motifs. The indicated residues are mapped onto the molecular structure of IS50 transposase bound to the precleaved recognition end DNA sequences. In this and all subsequent figures, the catalytic residues D97, D188, and E326 are shown as yellow space-filled spheres. Conserved motifs are shown in the indicated colors.
FIG. 4. Conserved N-terminal DNA binding motifs.
Residues 160 to 166 and 207 to 211. Residues 160 to 166 and 207 to 211 are also conserved (Fig. 2), but there are no biochemical data to suggest a function for these amino acids. As demonstrated in Fig. 3, these residues (160 to 167, cyan spheres; 207 to 211, dark blue spheres) come in close contact with the active-site motifs DTT and DREAD. Two possible roles can be entertained to explain the conservation of these residues. They could provide important structural support for the active-site residues D97 and D188. Alternatively, or in addition, some of these residues are charged or polar and thus might interact with either donor backbone DNA during formation of the synaptic complex or with target DNA just prior to strand transfer. In fact, a mutation of residue K212 has been found that changes the target site specificity of the transposition reaction (Goryshin and Reznikoff, unpublished).
Residues 296 to 306. Residue W298 (cyan sphere in Fig. 5) is known to play an important role in forming a stacking relationship with a flipped base at position 2 in the transposon end recognition sequence DNA (3). This is an important trans transposase-DNA contact. trans transposase contacts occur as a consequence of the second transposase monomer's binding the end recognition DNA near the cleavage site within the confines of the active site (14). The W-thymine stacking stabilizes the "flipped" base arrangement that likely destabilizes the DNA near where cleavage will occur and facilitates hairpin formation (see Fig. 1). Hairpin formation is a key step in the double-strand breakage that is characteristic of IS50 and IS10 transposon excision from donor DNA (6, 22).
FIG. 5. Conserved C1 region. Residues 296 to 306 (except 298) are in dark blue. W298, which stabilizes the T2 flipped base, is in cyan. The YREK motif (except for the yellow 326) is in purple.
FIG. 6. Clustal W alignment of IS50 and IS10 C1 regions. The aligned residues are from 298 to 355 (Tn5 transposase numbering scheme). The conservation of the residues is indicated as described for Fig. 2A.
Residues 342 to 351. Residues 342 to 351 (blue spheres in Fig. 6) encompass three residues that X-ray crystallographic analyses suggest form cis transposase-end recognition sequence contacts (14): residues 342, 344, and 348. Mutations at residues 342 and 348 cause defects in transposition, presumably through defects in DNA binding (Sterling and Reznikoff, unpublished data).
Residues 354 to 366, 417 to 425, and 429 to 434. Residues 354 to 366, 417 to 425, and 429 to 434 appear to be conserved, but there is currently no genetic, biochemical, or structural explanation for this conservation.
A homology that was not found in some IS50-like transposases: an AUG codon at or near codon 56. One of the key mechanisms that regulates IS50 transposase activity is the synthesis of an inhibitory regulatory protein (inhibitor, or p2) (21). The inhibitor protein is thought to inactivate transposase through the formation of inactive inhibitor-transposase heterodimers (8). Inhibitor translation initiation occurs at residue 56 of the transposase sequence (23). Although the inhibitor is translated in the same reading frame as the transposase, the bulk of inhibitor production is encoded by a distinct mRNA species programmed by a promoter that overlaps and competes with the transposase mRNA promoter (23). The production of the inhibitor reduces the frequency of IS50 transposition and couples this downregulation to the Dam methylation state of the transposase promoter site and thus to DNA replication (45). The ability of IS50 transposase gene homologs to separately encode an inhibitor protein has not been studied experimentally; however, sequence analysis indicates that this potential is not shared by all of the homologs, since three of them (encoded in the Gloeobacter genome, the Wolbachia wTai phage WO genome, and one of the pooled environmental sample genomes) lack a methionine (or valine) residue near position 56.
However, it should be noted that for IS50, the production of N-terminally truncated versions of transposase that potentially could inhibit transposition also occurs by means of proteolytic digestion of the transposase (38). It is quite possible that this type of proteolytic digestion of transposase also occurs for the transposase homologs that are studied in this communication.
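The position-56 comparison discussed above requires translating IS50 transposase residue numbers into alignment columns. The following is a minimal sketch under stated assumptions: the IS50 sequence and one homolog come from the same gapped alignment, "near" is arbitrarily interpreted as within two alignment columns, and valine is accepted alongside methionine because GUG can serve as a bacterial start codon. A protein-level check like this is only a proxy; whether an inhibitor is actually produced depends on the underlying DNA sequence and promoter arrangement. The function names and window size are hypothetical.

```python
def column_for_residue(aligned_ref: str, residue: int, gap: str = "-") -> int:
    """Return the 0-based alignment column holding 1-based residue `residue` of the reference."""
    count = 0
    for col, ch in enumerate(aligned_ref):
        if ch != gap:
            count += 1
            if count == residue:
                return col
    raise ValueError(f"reference has fewer than {residue} residues")


def has_start_near(aligned_ref: str, aligned_hom: str, residue: int = 56,
                   window: int = 2, starts: str = "MV") -> bool:
    """True if the homolog carries M or V within `window` columns of the reference residue.

    The +/-2 column default is an assumed interpretation of "near position 56".
    """
    if len(aligned_ref) != len(aligned_hom):
        raise ValueError("sequences must come from the same alignment")
    col = column_for_residue(aligned_ref, residue)
    lo = max(0, col - window)
    hi = min(len(aligned_hom), col + window + 1)
    return any(ch in starts for ch in aligned_hom[lo:hi].upper())


if __name__ == "__main__":
    # Invented toy alignment: the reference carries M at residue 56.
    ref = "A" * 55 + "M" + "A" * 10
    with_val = "A" * 54 + "V" + "A" * 11  # V one column upstream
    without = "A" * 66                    # no M or V anywhere
    print(has_start_near(ref, with_val))  # True
    print(has_start_near(ref, without))   # False
```

The same column mapping could be reused to confirm that D97, D188, and E326 are conserved in each homolog, by comparing residues at the corresponding columns.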
Relationship to IS10 transposase. The above BLAST-P analysis did not reveal IS10 transposase to be a close homolog of IS50 transposase. IS10 and IS50 transposases have been classified as belonging to distinct groups within the IS4 transposase family (10, 24). In order to investigate the relationship between IS50 and IS10 transposases, a BLAST pairwise comparison was performed and the previous IS50-IS10 transposase primary and secondary structure comparison (17) was reviewed. These two transposase proteins appear to have three different regions of interest. There is a central region (between IS50 W298 and I355) that demonstrates high homology between IS50 and IS10 transposases (30% identity) (Fig. 6). This central region is very similar to the C1 region described previously (10, 24, 34). This sequence includes the YREK motif that was used to define the IS4 transposase family and W298, which, as mentioned above, is believed to play an important role in the hairpin cleavage mechanism. N-terminal residues 1 to 297 of IS50 transposase and C-terminal residues 356 to 476 of IS50 transposase share limited homologies with IS10 transposase, although most proposed secondary structures and the DD active-site residues are shared (17). Of note is that IS10 transposase lacks the DTT and DREAD motifs as well as other regions of homology described above.
One model to explain the IS50-IS10 transposase relationship posits that they evolved from the same ancestral transposase sequence but that the requirement for hairpin formation as an intermediate resulted in greater conservation of the sequences around the YREK motif than of sequences in other portions of the proteins.
A second model posits that IS50 and IS10 transposases are related through gene fusion events. That is, the 298 to 355 sequence evolved from a common ancestral sequence that, in one or the other transposase, became fused to two sequences from a different transposase lineage. These observations also explain why the first critical catalytic D residue of IS50 transposase was misidentified in the IS4 transposase family analysis (10, 24). The significant divergence of the N-terminal and C-terminal sequences between IS50 and IS10 transposases suggests that the IS50 synaptic complex molecular structure may not be a good detailed model for the IS10 synaptic complex. Similarly, some of the detailed biochemistry deduced for IS10 transposase may not be precisely shared by IS50 transposase, with the exception that both IS50 and IS10 transposases are known to catalyze DNA strand breakage through a DNA hairpin mechanism (see Fig. 1). Note that the sequence between W298 and I355 is thought to play a key role in the hairpin mechanism. It has recently been hypothesized that a similar sequence plays a critical role in hairpin binding by the tyrosine recombinase-like protein ResT from Borrelia burgdorferi (4).
Conclusion. The bioinformatics analysis of IS50 transposase identified residues known from previous genetic, biochemical, and structural studies to be important for IS50 transposition. In addition, the analysis laid the framework for future structure-function studies of this important protein. For instance, it identified the motifs in which the first two catalytic residues are located, and it suggested that an intramolecular regulatory N-terminal-C-terminal interaction has been conserved. These and other homologies will be useful in guiding site-specific mutagenesis studies. Finally, the analysis offered insights into the ecology and evolution of IS50 transposase homologs. For example, IS50 transposase-like sequences, while rare, are distributed among the genomes of widely divergent terrestrial and aquatic bacteria with both free-living and intracellular lifestyles. The analysis also identified a putative bacteriophage-mediated lateral transfer of IS50 into the genome of the endosymbiont Wolbachia.
J.A. was supported by a grant from the National Science Foundation (MCB0084089) administered by W.S.R. S.R.B. held a National Research Council Research Associateship Award. W.S.R. is the Evelyn Mercer Professor of Biochemistry and Molecular Biology. Additional thanks are given to the NASA Astrobiology Institute (Cooperative Agreement NNA04CC04A to Mitchell L. Sogin) and the W. M. Keck Ecological and Evolutionary Genetics Facility within the Josephine Bay Paul Center for Comparative Molecular Biology and Evolution at the M.B.L. Molecular graphics images were produced by using the UCSF Chimera package from the Computer Graphics Laboratory, University of California, San Francisco (supported by NIH grant P41 RR-01081).