TY - CONF
T1 - Accordion
T2 - 7th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2015
AU - Lewis, Russell
AU - Hartman, John H.
N1 - Funding Information:
6 Conclusions In this work, we presented Accordion, a toolset for creating and using multi-scale recipes. We demonstrated two algorithms for finding duplication using multi-scale recipes, each optimized for a different scenario. Top-down worked best when large amounts of duplication were available, and was up to 80x as efficient as single-scale recipes. Bottom-up was optimized for the worst-case (performing no worse than single-scale except in artificial torture tests), and was competitive with top-down in the best case. Acknowledgements This material is based upon work supported by the National Science Foundation under Award Numbers DBI-0735191 and DBI-1265383. URL: www.iplantcollaborative.org Preliminary work on multi-scale recipes was done in connection with CSc 630 (Advanced Topics in Software Systems) at the University of Arizona. Thanks to project partner Gavin Simmons for contributions to the early prototype and to Professors Larry Peterson and Todd Proebsting for inspiration and direction. Thanks to team member Illyoung Choi for feedback as we investigated these algorithms.
Publisher Copyright:
© USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2015. All rights reserved.
PY - 2015
Y1 - 2015
N2 - A recipe is metadata that describes the contents of a file as a sequence of blocks identified by their hash. Using recipes, one can rapidly compare the contents of two files without reading the files themselves. Unfortunately, recipes present a space/precision tradeoff: small block sizes will maximize the duplication that is discoverable, but large block sizes produce small recipes that can be compared more quickly. In this paper, we present Accordion, a toolset for the creation and use of multi-scale recipes - that is, recipes that include blocks at several different scales. We demonstrate two duplication-detection algorithms - one optimized for situations where lots of duplication is expected, and another for those where the existence of duplication is uncertain.
UR - http://www.scopus.com/inward/record.url?scp=85088232857&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088232857&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85088232857
Y2 - 6 July 2015 through 7 July 2015
ER -