Integrating evidence for evaluation of potential novel protein-coding genes using Bayesian networks

Nan Wang, Changhe Yuan, Shane Burgess, Bindu Nanduri, Mark Lawrence, Susan Bridges

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Evaluating the quality of potential novel protein-coding genes predicted by directly searching mass spectrometry data against genome sequence is a challenging task. Many machine learning techniques, such as neural networks, decision trees, and support vector machines, have been applied to it. All of these techniques learn a model from a training dataset and predict the quality of potential novel protein-coding genes using various evidential features as inputs. The quality and quantity of the training data significantly affect the performance of the learned models. In biological research, the collected data are often incomplete and contain very few data points, so methods that are robust to noisy data and small sample sizes are desirable. Furthermore, the models learned by these techniques typically do not reveal the conditional (in)dependence relations among the evidential features, even though insight into these relationships is important in biological domains. In this paper, we describe methods for learning Bayesian networks that model the conditional (in)dependence relations among features of protein-coding genes and for calculating confidence scores for potential novel genes based on their evidential features. Bootstrap methods are applied to assess confidence in the arcs of the learned network structures and to identify a set of robust arcs from which a final model is constructed for future predictions. We tested the Bayesian network model learned by our method on an experimental training dataset. The results show that the method significantly improved the accuracy of the learned model in predicting potential novel genes.
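The bootstrap step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a structure-learning algorithm, so `toy_learner` below is a hypothetical stand-in that links pairs of discrete features whose empirical mutual information exceeds a threshold, and it returns undirected pairs rather than the directed arcs of a Bayesian network. The function names, the `threshold` and `cutoff` values, and the number of resamples are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between discrete columns i and j."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    count_i = Counter(r[i] for r in rows)
    count_j = Counter(r[j] for r in rows)
    return sum(
        (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in joint.items()
    )


def toy_learner(rows, threshold=0.05):
    """Hypothetical stand-in for a structure learner (assumption, not from the paper).

    Returns the set of feature pairs (i, j), i < j, whose mutual information
    exceeds `threshold`.
    """
    n_features = len(rows[0])
    return {
        (i, j)
        for i, j in combinations(range(n_features), 2)
        if mutual_information(rows, i, j) > threshold
    }


def bootstrap_arc_confidence(rows, learner, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each arc is learned."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Resample the dataset with replacement, keeping its original size.
        sample = [rng.choice(rows) for _ in rows]
        counts.update(learner(sample))
    return {arc: c / n_boot for arc, c in counts.items()}


def robust_arcs(confidence, cutoff=0.8):
    """Keep only arcs whose bootstrap confidence reaches the cutoff."""
    return {arc for arc, conf in confidence.items() if conf >= cutoff}
```

Any learner that maps a dataset to a set of arcs could be passed in place of `toy_learner`; the resampling and thresholding logic would stay the same.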

Original language: English (US)
Title of host publication: Proceedings of the 2008 International Conference on Bioinformatics and Computational Biology, BIOCOMP 2008
Pages: 838-843
Number of pages: 6
State: Published - 2008
Externally published: Yes
Event: 2008 International Conference on Bioinformatics and Computational Biology, BIOCOMP 2008 - Las Vegas, NV, United States
Duration: Jul 14 2008 to Jul 17 2008

Publication series

Name: Proceedings of the 2008 International Conference on Bioinformatics and Computational Biology, BIOCOMP 2008

Other

Other: 2008 International Conference on Bioinformatics and Computational Biology, BIOCOMP 2008
Country/Territory: United States
City: Las Vegas, NV
Period: 7/14/08 to 7/17/08

Keywords

  • Bayesian networks
  • Bootstrap
  • protein evaluation

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Biomedical Engineering
  • Health Informatics
