

An Integrated Proteogenomic Pipeline Helps Re-annotate the Genome of the Model Diatom Phaeodactylum tricornutum

Sep 11, 2018

Diatoms are a large group of photosynthetic eukaryotic phytoplankton that inhabit freshwater and marine habitats. Phaeodactylum tricornutum is widely used as a model organism for studying diatom biology. Although its genome was sequenced in 2008, a high-quality genome annotation has still not been obtained for this diatom.

Recently, the research group led by Dr. GE Feng at the Institute of Hydrobiology (IHB) of the Chinese Academy of Sciences developed a systematic approach for conducting an integrated proteogenomic analysis of Phaeodactylum tricornutum using mass spectrometry (MS). The results were published in Molecular Plant.

Proteogenomics has been applied to identify previously unknown genes and to correct and validate predicted genes in various organisms. The research group had previously carried out a systematic proteogenomic analysis of Synechococcus sp. PCC 7002 and developed a one-stop, open-source software package termed GAPP for genome annotation in prokaryotes.

In this study, a minimally redundant proteogenomic database was constructed from six-frame translation of genomic sequences and three-frame translation of RNA sequences. To maximize coverage and achieve in-depth identification of peptides and proteins, the researchers combined two sample prefractionation methods, digestion with two different enzymes, five different algorithms, and a more stringent false discovery rate (FDR) filtering strategy.
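To make the database-construction step more concrete, the following is a minimal Python sketch of six-frame translation: a DNA sequence is translated in three reading frames on the forward strand and three on the reverse-complement strand. It assumes the standard genetic code (NCBI translation table 1). The function names `six_frame_translation` and `candidate_orfs`, and the `min_len` parameter, are hypothetical and chosen for illustration only; this is not the authors' pipeline or the GAPP software.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), laid out in the
# canonical TCAG codon order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate a DNA sequence in all six reading frames.

    Keys are '+1', '+2', '+3' for the forward strand and
    '-1', '-2', '-3' for the reverse-complement strand.
    """
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def candidate_orfs(protein, min_len=2):
    """Split a translated frame at stop codons and keep segments of at
    least min_len residues (illustrative threshold, an assumption)."""
    return [seg for seg in protein.split("*") if len(seg) >= min_len]


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein, candidate_orfs(protein))
```

For RNA-derived sequences, the three-frame case would use only the forward-strand frames. In a real proteogenomic search, the translated segments would typically be filtered with a much larger minimum length and de-duplicated before being written to the search database.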

A total of 6,628 Phaeodactylum tricornutum proteins were unambiguously identified, each supported by reliable peptide evidence. Based on the analysis of protein-coding potential, 1,895 genes predicted from the Phaeodactylum tricornutum genome may not code for any proteins.

In addition, the proteogenomic analysis revealed 606 possible novel protein-coding genes, 506 corrections to existing gene models, 94 splice variants, and 58 single amino acid variants. Among the 606 possible novel genes, 56 were confirmed by the proteomic data to encode bona fide proteins, although they had previously been mis-annotated as lincRNAs in Phaeodactylum tricornutum.

Importantly, the researchers also identified 24 different types of post-translational modifications (PTMs) from the same experimental MS data; these modifications may play important roles in cellular functions.

The findings expand the genomic landscape of Phaeodactylum tricornutum and provide a rich resource for the study of diatom biology. The proteogenomic pipeline developed in this study is applicable to any sequenced eukaryote and so represents a significant contribution to the toolset for eukaryotic proteogenomic analysis.

This study was supported by grants from the National Key Research and Development Program, the National Natural Science Foundation of China, and the Strategic Priority Research Program of the Chinese Academy of Sciences.
