TY - GEN
T1 - A gossip-based system for fast approximate score computation in multinomial Bayesian networks
AU - Zachariah, Arun
AU - Rao, Praveen
AU - Katib, Anas
AU - Senapati, Monica
AU - Barnard, Kobus
N1 - Funding Information:
Praveen Rao would like to acknowledge the partial support of NSF Grant No. 1747751.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/4
Y1 - 2019/4
N2 - In this paper, we present a system for fast approximate score computation, a fundamental task for score-based structure learning of multinomial Bayesian networks. Our work is motivated by the fact that exact score computation on large datasets is very time-consuming. Our system enables approximate score computation on large datasets in an efficient and scalable manner with probabilistic error bounds on the statistics required for score computation. Our system has several novel features including gossip-based decentralized computation of statistics, lower resource consumption via a probabilistic approach of maintaining statistics, and effective distribution of tasks for score computation using hashing techniques. The demo will provide a real-time and interactive experience to a user on how our system employs the principle of gossiping and hashing techniques in a novel way for fast approximate score computation. The user will be able to control different aspects of our system's execution on a cluster with up to 32 nodes. The approximate scores output by our system can then be used by existing score-based structure learning algorithms.
KW - Approximate score computation
KW - Bayesian networks
KW - Gossip algorithms
KW - Large scale data
UR - http://www.scopus.com/inward/record.url?scp=85067932574&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85067932574&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2019.00216
DO - 10.1109/ICDE.2019.00216
M3 - Conference contribution
AN - SCOPUS:85067932574
T3 - Proceedings - International Conference on Data Engineering
SP - 1968
EP - 1971
BT - Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019
PB - IEEE Computer Society
T2 - 35th IEEE International Conference on Data Engineering, ICDE 2019
Y2 - 8 April 2019 through 11 April 2019
ER -