TY - JOUR
T1 - Processing recursive XQuery over XML streams
T2 - The Raindrop approach
AU - Wei, Mingzhu
AU - Rundensteiner, Elke A.
AU - Mani, Murali
AU - Li, Ming
N1 - Funding Information:
This research is supported by NSF under Grant No. NSF IIS-0414567.
Funding Information:
Prof. Rundensteiner is a well-known expert in databases and information systems, having spent over 20 years of her career focusing on the development of scalable data management technology in support of advanced applications, including manufacturing and automation, the human genome, and digital libraries. Her current research interests include scalable stream data management, XML and web data management, data integration and migration, data warehousing for distributed systems, and large-scale visual information exploration. She has over 280 publications in these and related areas. Her research has been funded by government agencies including NSF and NIH, and by industry partners such as IBM, Verizon Labs, GTE, NEC, and others. She has been the recipient of numerous honors and awards, including the NSF Young Investigator Grant, the Sigma Xi Outstanding Senior Faculty Researcher Award, and the WPI Trustees’ Award for outstanding research and creative scholarship. She serves on numerous program committees of prestigious conferences in the database field and is an editor of several journals, including Associate Editor of the IEEE Transactions on Data and Knowledge Engineering Journal.
PY - 2008/5
Y1 - 2008/5
N2 - XML stream applications bring the challenge of efficiently processing queries on sequentially accessible token-based data. For efficient processing of queries, we need to ensure that memory usage stays low. This in turn requires that we avoid holding data in the query buffer by outputting it at the earliest possible time. In this paper, we propose a new class of stream algebra operators for efficient recursive XQuery stream processing. Our plan generator analyzes the query, and the schema when available, to determine which join operators in the query need recursive join support, and can thus plug in the less expensive just-in-time structural join whenever possible. In particular, we propose two strategies for implementing structural joins: (a) the just-in-time structural join strategy efficiently processes joins over non-recursive XML token streams; and (b) the recursive structural join strategy supports structural joins over recursive XML substreams, albeit at the added cost of generating and comparing tuple-level IDs. Both structural join strategies are complemented by an automata-driven invocation mechanism that triggers the execution of each join process at the first possible moment upon recognizing the end of the targeted input stream subelement. Further, we design the StructuralJoin operator itself to be context-aware. At run time, the operator can switch from the efficient just-in-time join strategy for elements that are recognized to be non-recursive to the more powerful ID-based structural join strategy for elements that are identified to be recursive. We incorporate the proposed techniques into the Raindrop stream engine. We also report on experimental studies, conducted using the ToXgene benchmark, that demonstrate the performance improvements of these techniques.
AB - XML stream applications bring the challenge of efficiently processing queries on sequentially accessible token-based data. For efficient processing of queries, we need to ensure that memory usage stays low. This in turn requires that we avoid holding data in the query buffer by outputting it at the earliest possible time. In this paper, we propose a new class of stream algebra operators for efficient recursive XQuery stream processing. Our plan generator analyzes the query, and the schema when available, to determine which join operators in the query need recursive join support, and can thus plug in the less expensive just-in-time structural join whenever possible. In particular, we propose two strategies for implementing structural joins: (a) the just-in-time structural join strategy efficiently processes joins over non-recursive XML token streams; and (b) the recursive structural join strategy supports structural joins over recursive XML substreams, albeit at the added cost of generating and comparing tuple-level IDs. Both structural join strategies are complemented by an automata-driven invocation mechanism that triggers the execution of each join process at the first possible moment upon recognizing the end of the targeted input stream subelement. Further, we design the StructuralJoin operator itself to be context-aware. At run time, the operator can switch from the efficient just-in-time join strategy for elements that are recognized to be non-recursive to the more powerful ID-based structural join strategy for elements that are identified to be recursive. We incorporate the proposed techniques into the Raindrop stream engine. We also report on experimental studies, conducted using the ToXgene benchmark, that demonstrate the performance improvements of these techniques.
KW - Query optimization
KW - Recursive query
KW - Structural join
KW - XML
KW - XQuery processing
UR - http://www.scopus.com/inward/record.url?scp=41149152394&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=41149152394&partnerID=8YFLogxK
U2 - 10.1016/j.datak.2007.09.007
DO - 10.1016/j.datak.2007.09.007
M3 - Article
AN - SCOPUS:41149152394
VL - 65
SP - 243
EP - 265
JO - Data and Knowledge Engineering
JF - Data and Knowledge Engineering
SN - 0169-023X
IS - 2
ER -