TY - GEN
T1 - Proteogenomic mapping for structural annotation of prokaryote genomes
AU - Wang, Nan
AU - Burgess, Shane
AU - Lawrence, Mark
AU - Bridges, Susan
PY - 2009
Y1 - 2009
N2 - Structural annotation of genomes is one of the major goals of genomics research. The most popular tools for structural annotation of genomes are computational pipelines. It is well known that these computational methods have a number of shortcomings, including false identifications and incorrect identification of gene boundaries. Proteomic data can be used to confirm genes identified by computational methods and to correct mistakes. A proteogenomic mapping method has been developed that uses peptides identified from mass spectrometry for structural annotation of genomes. Spectra are matched against both a protein database and the genome database translated in all six reading frames. Peptides that match the genome but not the protein database potentially represent novel protein-coding genes or annotation errors. These short, experimentally derived peptides are used to discover potential novel protein-coding genes, called expressed Protein Sequence Tags (ePSTs), by aligning the peptides to the genomic DNA and extending the translation in the 3' and 5' directions. In this paper, an enhanced pipeline has been designed and developed for discovering and evaluating potential novel protein-coding genes. It comprises: 1) a distance-based outlier detection method for validating peptides identified from MS/MS, 2) proteogenomic mapping for discovery of potential novel protein-coding genes, and 3) collection of evidence from a number of sources and automatic evaluation of potential novel protein-coding genes using machine learning techniques such as neural networks, support vector machines, and naïve Bayes.
KW - Bayesian network
KW - Expressed protein sequence tags
KW - Naïve Bayes
KW - Neural network
KW - Outlier detection
KW - PST
KW - Peptide validation
KW - Potential genes
KW - Proteogenomic mapping
KW - Support vector machine
KW - Target decoy strategy
UR - http://www.scopus.com/inward/record.url?scp=70450158420&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70450158420&partnerID=8YFLogxK
U2 - 10.1109/IJCBS.2009.126
DO - 10.1109/IJCBS.2009.126
M3 - Conference contribution
AN - SCOPUS:70450158420
SN - 9780769537399
T3 - Proceedings - 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009
SP - 103
EP - 106
BT - Proceedings - 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009
T2 - 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009
Y2 - 3 August 2009 through 5 August 2009
ER -