TY - GEN
T1 - Efficiently loading and processing XML streams
AU - Li, Ming
AU - Mani, Murali
AU - Rundensteiner, Elke A.
PY - 2008
Y1 - 2008
N2 - XML stream applications bring the novel challenge of efficiently processing queries on sequentially accessible token-based input streams. Our Raindrop project is the first to accommodate token-based stream processing using an algebraic framework where both tokens and tuples are modeled in a uniform manner. In this paper, we illustrate how the stream loading model of our system conducts XML navigation over the input stream on the fly by concurrently constructing a minimized lightweight XML tree representation, called a navigation-free data instance. These captured XML fragments are minimized in terms of buffer consumption. Based on the compact representation of the navigation-free data instances, we propose techniques for subsequent algebraic query evaluation, in particular effective strategies for supporting multi-mode query operators and alternative data output semantics. The proposed stream loading model requires a much smaller buffer footprint than alternative solutions in the literature such as Y-Filter. The proposed algebra-based evaluation techniques also offer effective ways to handle data recursion over XML streams, i.e., avoiding the overhead of structural join operators. Our stream loading and query evaluation techniques have been implemented as part of the Raindrop system. Experimental results based on the Raindrop system are also reported in this paper.
KW - XML
KW - XQuery
KW - query algebra
KW - query processing
KW - stream
UR - http://www.scopus.com/inward/record.url?scp=77954440132&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954440132&partnerID=8YFLogxK
U2 - 10.1145/1451940.1451950
DO - 10.1145/1451940.1451950
M3 - Conference contribution
AN - SCOPUS:77954440132
SN - 9781605581880
T3 - ACM International Conference Proceeding Series
SP - 59
EP - 67
BT - Proceedings of IDEAS'08
T2 - International Database Engineering and Applications Symposium, IDEAS'08
Y2 - 10 September 2008 through 12 September 2008
ER -