TY - GEN
T1 - Scalable score computation for learning multinomial Bayesian networks over distributed data
AU - Rao, Praveen
AU - Katib, Anas
AU - Barnard, Kobus
AU - Kamhoua, Charles
AU - Kwiat, Kevin
AU - Njilla, Laurent
N1 - Publisher Copyright: © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2017
Y1 - 2017
N2 - In this paper, we focus on the problem of learning a Bayesian network over distributed data stored in a commodity cluster. Specifically, we address the challenge of computing the scoring function over distributed data in a scalable manner, which is a fundamental task during learning. We propose a novel approach designed to achieve: (a) scalable score computation using the principle of gossiping; (b) lower resource consumption via a probabilistic approach for maintaining scores using the properties of a Markov chain; and (c) effective distribution of tasks during score computation (on large datasets) by synergistically combining well-known hashing techniques. Through theoretical analysis, we show that our approach is superior to a MapReduce-style computation in terms of communication bandwidth. Further, it is superior to the batch-style processing of MapReduce for recomputing scores when new data are available.
AB - In this paper, we focus on the problem of learning a Bayesian network over distributed data stored in a commodity cluster. Specifically, we address the challenge of computing the scoring function over distributed data in a scalable manner, which is a fundamental task during learning. We propose a novel approach designed to achieve: (a) scalable score computation using the principle of gossiping; (b) lower resource consumption via a probabilistic approach for maintaining scores using the properties of a Markov chain; and (c) effective distribution of tasks during score computation (on large datasets) by synergistically combining well-known hashing techniques. Through theoretical analysis, we show that our approach is superior to a MapReduce-style computation in terms of communication bandwidth. Further, it is superior to the batch-style processing of MapReduce for recomputing scores when new data are available.
UR - http://www.scopus.com/inward/record.url?scp=85046106223&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046106223&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85046106223
T3 - AAAI Workshop - Technical Report
SP - 498
EP - 504
BT - WS-17-01
PB - AI Access Foundation
T2 - 31st AAAI Conference on Artificial Intelligence, AAAI 2017
Y2 - 4 February 2017 through 5 February 2017
ER -