Journal of Bacteriology, May 2002, p. 2837-2840, Vol. 184, No. 10
0021-9193/02/$04.00+0 DOI: 10.1128/JB.184.10.2837-2840.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
The secE Gene of Helicobacter pylori
Claudine Médigue,1,2 Benjamin Chun-Yu Wong,3 Marie Chia-Mi Lin,4 Stéphanie Bocs,2 and Antoine Danchin5*
Génétique des Génomes Bactériens, Institut Pasteur, Paris,1
Génomique Comparative des Micro-organismes Pathogènes, Génopole, Evry, France,2
Department of Medicine,3
Institute of Molecular Biology, University of Hong Kong,4
HKU-Pasteur Research Center, Hong Kong, China5
Received 28 December 2001/Accepted 6 February 2002

ABSTRACT
Despite extensive annotation by two independent teams, the Helicobacter pylori genome appeared to lack a complete secretion machinery. Here, clinical isolates are used to substantiate in silico annotation and to identify the missing secE component of the major secretion machinery of Helicobacter pylori.

TEXT
Two independent sequences of the Helicobacter pylori genome, with many clinical isolates from hospital laboratories, have been annotated by two independent consortia. It is therefore expected that the identification errors have been kept to a minimum. Naturally, some genes of unknown function may have escaped attention while spurious sequences have been taken for bona fide genes. It is, however, important for the community to be sure that no essential gene has been missed or misannotated since most scientists now rely on data library searches to substantiate their experiments. How would the discovery of an important gene sequence stand in a much explored (and patented) sequence? By combining and chaining a series of independent tasks meant to identify coding sequences (CDSs) in bacterial genomes (2), we explored in silico the genome of H. pylori to see whether important genes have escaped notice. To predict CDSs, this strategy combined periodical Markov chain analysis (the original GeneMark program, which works by discrimination of relevant protein coding sequences from the background [5]) and now popular derivatives (such as Glimmer, which works by assimilation from previously known sequences [10]) together with BlastX computation and identification of tRNAs, terminators, and putative ribosome binding sites by using the platform Imagene (6). In contrast with the usual approaches, which mostly rest on one single method for gene identification, this allows one to discriminate with fair certainty between spurious genes and bona fide genes. The method is therefore particularly important for reannotating regions where genes have already been thought to be identified.
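When several predictors are run over the same region, their calls can disagree, and the disagreements are the places worth reannotating. The following is a minimal sketch of how such disagreements might be flagged, not the Imagene workflow itself: the `Call` record, the `strand_conflicts` helper, and all coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Call:
    """One predicted CDS; coordinates are 1-based and inclusive (assumed convention)."""
    method: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlap(a: Call, b: Call) -> int:
    """Number of positions shared by two calls (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def strand_conflicts(calls):
    """Pairs of overlapping calls made by different methods on opposite strands."""
    return [
        (a, b)
        for a, b in combinations(calls, 2)
        if a.method != b.method and a.strand != b.strand and overlap(a, b) > 0
    ]


# Hypothetical coordinates, for illustration only.
calls = [
    Call("GeneMark", 1200, 1379, "+"),
    Call("Glimmer", 1150, 1500, "-"),
    Call("Glimmer", 2000, 2600, "+"),
]

for a, b in strand_conflicts(calls):
    print(f"{a.method} {a.start}-{a.end}({a.strand}) vs "
          f"{b.method} {b.start}-{b.end}({b.strand}): {overlap(a, b)} nt shared")
```

A flagged pair is only a candidate for closer inspection; deciding which call is correct still needs independent evidence such as similarity searches or sequence comparison across isolates.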
In the case of the H. pylori genomes, the situation presented downstream of the nusG gene was puzzling: GeneMark predicted a short CDS in the same orientation as nusG with a good upstream ribosome binding site, whereas Glimmer proposed a longer sequence in the opposite strand (and a poor indication of the former putative CDS) (Fig. 1). In many instances GeneMark is preferred over Glimmer because it discriminates between coding and noncoding regions while Glimmer assimilates putative coding regions to known ones. This results in the carrying over of features from one strand to its complement as soon as they are palindromic in nature (e.g., the RNY coding rule is true both in the coding strand and in its complement [9]). A Blast search revealed that the latter sequence did not display similarity with known sequences, whereas the former was similar to the secE gene present in a variety of organisms. The neighboring gene order was consistent with this, since secE is often part of an operon with genes involved in translation, as shown by Pohlschroder et al. at a time when the first genome sequences appeared (8).
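The palindromic character of the RNY rule can be checked directly: a purine-any-pyrimidine codon read on the complementary strand is again purine-any-pyrimidine. The short sketch below verifies this for all 16 RNY codons and for a made-up sequence; the helper names and the example sequence are assumptions introduced here for illustration.

```python
from itertools import product

PURINES = set("AG")      # R
PYRIMIDINES = set("CT")  # Y
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_rny(codon: str) -> bool:
    """True if the codon is purine, any base, pyrimidine."""
    return codon[0] in PURINES and codon[2] in PYRIMIDINES


def rny_fraction(seq: str) -> float:
    """Fraction of frame-0 codons that follow the RNY pattern."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return sum(is_rny(c) for c in codons) / len(codons)


# Every RNY codon stays RNY on the complementary strand.
rny_codons = ["".join(c) for c in product("AG", "ACGT", "CT")]
assert all(is_rny(reverse_complement(c)) for c in rny_codons)

# Hypothetical sequence whose length is a multiple of 3: the RNY signal
# is identical in frame 0 of either strand.
seq = "ATGGCTAACGGTTTTGACTAA"
print(rny_fraction(seq), rny_fraction(reverse_complement(seq)))
```

This is why a predictor that relies on such compositional features alone can report a plausible reading frame on the wrong strand.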
The sequence of the two known H. pylori genomes (http://genolist.pasteur.fr/PyloriGene) suggested that secE was a bona fide gene, but the data were too scarce and contradictory to warrant this hypothesis. (Only eight codons differed in the two reference sequences, including one modified in its second base position, therefore arguing against the hypothesis. One expects that most sequences display synonymous mutations; therefore, mutations would appear in the third codon position.) To substantiate this interpretation, we sequenced the homologous region in seven H. pylori isolates collected at the University of Hong Kong. The new sequences differed from those of the model strains at 10 more significant positions. Twelve positions corresponded to synonymous replacements while six others yielded conservative replacements in the secE frame (Fig. 2). The only position which was not strictly conservative (AAA to GAA) is located immediately after the start of the protein at a nonconservative position (a gap in some SecE proteins). These same mutations would yield several nonconservative replacements in the complementary putative coding sequence (in particular an AAA (lysine) to ATA (isoleucine) replacement). Interestingly, the sequences indicate that the Asian isolates are from a common group that differs from those of the rest of the world (1).
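The reasoning above rests on where a codon changes and whether the encoded amino acid changes with it. As a minimal sketch, assuming the standard genetic code assignments, the helper below reports the changed codon position(s) and labels a replacement as synonymous or nonsynonymous. The function name `classify` is an assumption, and the third example codon pair is hypothetical; only the AAA to GAA and AAA to ATA replacements come from the text.

```python
# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify(before: str, after: str):
    """Return (changed positions, old residue, new residue, kind) for one codon."""
    positions = [i + 1 for i in range(3) if before[i] != after[i]]
    aa_before, aa_after = CODON_TABLE[before], CODON_TABLE[after]
    kind = "synonymous" if aa_before == aa_after else "nonsynonymous"
    return positions, aa_before, aa_after, kind


print(classify("AAA", "GAA"))  # replacement noted in the secE frame
print(classify("AAA", "ATA"))  # replacement noted in the complementary frame
print(classify("GGT", "GGC"))  # hypothetical third-position change
```

The sketch does not attempt to grade nonsynonymous replacements as conservative or nonconservative, which depends on the residue similarity criterion one adopts.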
The SecE protein comprises a cytoplasmic domain, a transmembrane helix, and a periplasmic domain. The bacterial translocase consists of the SecEYG membrane protein complex and the peripheral-membrane-associated SecA dimer (for a review, see reference 3). In the present sequence, residues in both the cytoplasm and the periplasm are conserved (in particular, the residues making contact with SecY) in comparison with other bacterial counterparts (this is in complete agreement with the identifications made by Hartmann et al. [4] and Murphy and Beckwith [7], thus further substantiating our identification of the secE gene) while the extremities of the transmembrane helix are also conserved, with the hydrophobic core preserving not the residue but its hydrophobic nature (11) (Fig. 3). SecE, which in most bacteria contains a single transmembrane domain (as predicted by the present study), is an essential component of the highly conserved general secretion machinery. In addition to placing H. pylori in the normal class of secreting bacteria and providing a new example of the SecE structure, this work emphasizes the need for continuous reannotation of genome sequences, including those regions which have been previously annotated thoroughly, and the associated experimental substantiation.
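The statement that the helix core preserves "not the residue but its hydrophobic nature" can be illustrated with a hydropathy scale. The sketch below uses the Kyte-Doolittle values and lists aligned positions where two segments differ yet both residues are hydrophobic. The two segments are invented for illustration and are not SecE sequences, and the cutoff of 0.0 for "hydrophobic" is an assumption.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_substitutions(ref: str, other: str, threshold: float = 0.0):
    """1-based positions where residues differ but both exceed the hydropathy threshold."""
    if len(ref) != len(other):
        raise ValueError("segments must be aligned and of equal length")
    return [
        i + 1
        for i, (a, b) in enumerate(zip(ref, other))
        if a != b
        and KYTE_DOOLITTLE[a] > threshold
        and KYTE_DOOLITTLE[b] > threshold
    ]


# Hypothetical aligned helix segments, not taken from any SecE sequence.
ref_segment = "LVAIGLFVLA"
other_segment = "IVALGMFVIA"
print(hydrophobic_substitutions(ref_segment, other_segment))
```

Positions returned by the helper are those where the residue identity changed while the hydrophobic character was kept; a charged replacement such as lysine for leucine would not be listed.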
ADDENDUM IN PROOF
We have recently noted that Doig et al. (P. Doig, B. L. de Jonge, R. A. Alm, E. D. Brown, M. Uria-Nickelsen, B. Noonan, S. D. Mills, P. Tummino, G. Carmel, B. C. Guild, D. T. Moir, G. F. Vovis, and T. J. Trust, Microbiol. Mol. Biol. Rev. 63:675-707, 1999), in their in silico prediction of the H. pylori genes, suggested in passing the existence of a secE counterpart. However, this prediction has not been included in the corresponding databases, perhaps for want of experimental substantiation. The present work provides that substantiation.

FOOTNOTES
* Corresponding author. Mailing address: HKU Pasteur Research Centre, Dexter HC Man Building, 8, Sassoon Road, Pokfulam, Hong Kong, China. Phone: 852 2816 8403. Fax: 852 2168 4427. E-mail: adanchin{at}hkucc.hku.hk.


REFERENCES
1. Achtman, M., T. Azuma, D. E. Berg, Y. Ito, G. Morelli, Z. J. Pan, S. Suerbaum, S. A. Thompson, A. van der Ende, and L. J. van Doorn. 1999. Recombination and clonal groupings within Helicobacter pylori from different geographical regions. Mol. Microbiol. 32:459-470.
2. Bocs, S., A. Danchin, and C. Médigue. 2002. Re-annotation of genome microbial coding sequences: finding new genes and incorrectly annotated genes. BMC Bioinformatics 3:5. [Online.] http://www.biomedcentral.com/1471-2105/3/5.
3. Driessen, A. J., E. H. Manting, and C. van der Does. 2001. The structural basis of protein targeting and translocation in bacteria. Nat. Struct. Biol. 8:492-498.
4. Hartmann, E., T. Sommer, S. Prehn, D. Gorlich, S. Jentsch, and T. A. Rapoport. 1994. Evolutionary conservation of components of the protein translocation complex. Nature 367:654-657.
5. McIninch, J. D., W. S. Hayes, and M. Borodovsky. 1996. Applications of GeneMark in multispecies environments. Proc. Int. Conf. Intell. Syst. Mol. Biol. 4:165-175.
6. Médigue, C., F. Rechenmann, A. Danchin, and A. Viari. 1999. Imagene: an integrated computer environment for sequence annotation and analysis. Bioinformatics 15:2-15.
7. Murphy, C. K., and J. Beckwith. 1994. Residues essential for the function of SecE, a membrane component of the Escherichia coli secretion apparatus, are located in a conserved cytoplasmic region. Proc. Natl. Acad. Sci. USA 91:2557-2561.
8. Pohlschroder, M., W. A. Prinz, E. Hartmann, and J. Beckwith. 1997. Protein translocation in the three domains of life: variations on a theme. Cell 91:563-566.
9. Rother, K. I., O. K. Clay, J. P. Bourquin, J. Silke, and W. Schaffner. 1997. Long non-stop reading frames on the antisense strand of heat shock protein 70 genes and prion protein (PrP) genes are conserved between species. Biol. Chem. 378:1521-1530.
10. Salzberg, S. L., A. L. Delcher, S. Kasif, and O. White. 1998. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26:544-548.
11. Veenendaal, A. K., C. van Der Does, and A. J. Driessen. 2001. Mapping the sites of interaction between SecY and SecE by cysteine scanning mutagenesis. J. Biol. Chem. 276:32559-32566.