TY - JOUR
T1 - Using evolutive summary counters for efficient cooperative caching in search engines
AU - Dominguez-Sal, David
AU - Aguilar-Saborit, Josep
AU - Surdeanu, Mihai
AU - Larriba-Pey, Josep Lluis
N1 - Funding Information:
The members of DAMA-UPC thank the Ministry of Science and Innovation of Spain and Generalitat de Catalunya, for grant numbers TIN2009-14560-C03-03 and GRC-1087, respectively.
PY - 2012
Y1 - 2012
N2 - We propose and analyze a distributed cooperative caching strategy based on the Evolutive Summary Counters (ESC), a new data structure that stores an approximated record of the data accesses in each computing node of a search engine. The ESC capture the frequency of accesses to the elements of a data collection, and the evolution of the access patterns for each node in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries to obtain approximate statistics of the document entries accessed by each computing node. We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy, one for the distribution of the cache contents, ESC-placement, and another one for the search of documents in the distributed cache, ESC-search. While the former improves the hit rate of the system and keeps a large ratio of data accesses local, the latter reduces the network traffic by restricting the number of nodes queried to find a document. We show that our cooperative caching approach outperforms state-of-the-art models in both hit rate, throughput, and location recall for multiple scenarios, i.e., different query distributions and systems with varying degrees of complexity.
AB - We propose and analyze a distributed cooperative caching strategy based on the Evolutive Summary Counters (ESC), a new data structure that stores an approximated record of the data accesses in each computing node of a search engine. The ESC capture the frequency of accesses to the elements of a data collection, and the evolution of the access patterns for each node in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries to obtain approximate statistics of the document entries accessed by each computing node. We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy, one for the distribution of the cache contents, ESC-placement, and another one for the search of documents in the distributed cache, ESC-search. While the former improves the hit rate of the system and keeps a large ratio of data accesses local, the latter reduces the network traffic by restricting the number of nodes queried to find a document. We show that our cooperative caching approach outperforms state-of-the-art models in both hit rate, throughput, and location recall for multiple scenarios, i.e., different query distributions and systems with varying degrees of complexity.
KW - Distributed systems
KW - count filter
KW - distributed caching
KW - resource intensive applications
UR - http://www.scopus.com/inward/record.url?scp=84858075067&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84858075067&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2011.162
DO - 10.1109/TPDS.2011.162
M3 - Article
AN - SCOPUS:84858075067
SN - 1045-9219
VL - 23
SP - 776
EP - 784
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 4
M1 - 5871600
ER -