TY - JOUR
T1 - Exploring the reproducibility of probabilistic causal molecular network models
AU - Cohain, Ariella
AU - Divaraniya, Aparna A.
AU - Zhu, Kuixi
AU - Scarpa, Joseph R.
AU - Kasarskis, Andrew
AU - Zhu, Jun
AU - Chang, Rui
AU - Dudley, Joel T.
AU - Schadt, Eric E.
N1 - Funding Information:
Funding for this project was provided by National Institutes of Health (NIH) grants U54CA189201, R01DK098242, 5U01AG046170, and 1R01MH109897 and Leducq Foundation grant 12CVD02.
Publisher Copyright:
© 2017, World Scientific Publishing Co. Pte. Ltd. All rights reserved.
PY - 2017
Y1 - 2017
N2 - Network reconstruction algorithms are increasingly being employed in biomedical and life sciences research to integrate large-scale, high-dimensional data informing on living systems. One particular class of probabilistic causal networks being applied to model the complexity and causal structure of biological data is Bayesian networks (BNs). BNs provide an elegant mathematical framework for not only inferring causal relationships among many different molecular and higher order phenotypes, but also for incorporating highly diverse priors that provide an efficient path for incorporating existing knowledge. While significant methodological developments have broadly enabled the application of BNs to generate and validate meaningful biological hypotheses, the reproducibility of BNs in this context has not been systematically explored. In this study, we aim to determine the criteria for generating reproducible BNs in the context of transcription-based regulatory networks. We utilize two unique tissues from independent datasets, whole blood from the GTEx Consortium and liver from the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Team (STARNET) study. We evaluated the reproducibility of the BNs by creating networks on data subsampled at different levels from each cohort and comparing these networks to the BNs constructed using the complete data. To help validate our results, we used simulated networks at varying sample sizes. Our study indicates that reproducibility of BNs in biological research is an issue worthy of further consideration, especially in light of the many publications that now employ findings from such constructs without appropriate attention paid to reproducibility. We find that while edge-to-edge reproducibility is strongly dependent on sample size, identification of more highly connected key driver nodes in BNs can be carried out with high confidence across a range of sample sizes.
UR - http://www.scopus.com/inward/record.url?scp=85024368267&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85024368267&partnerID=8YFLogxK
U2 - 10.1142/9789813207813_0013
DO - 10.1142/9789813207813_0013
M3 - Conference article
C2 - 27896968
AN - SCOPUS:85024368267
SN - 2335-6928
VL - 0
SP - 120
EP - 131
JO - Pacific Symposium on Biocomputing
JF - Pacific Symposium on Biocomputing
T2 - 22nd Pacific Symposium on Biocomputing, PSB 2017
Y2 - 4 January 2017 through 8 January 2017
ER -