How caching affects hashing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Scopus citations

Abstract

A number of recent papers have considered the influence of modern computer memory hierarchies on the performance of hashing algorithms [1, 2, 3]. Motivation for these papers is drawn from recent technology trends that have produced an ever-widening gap between the speed of CPUs and the latency of dynamic random access memories. The result is an emerging computing folklore which contends that inferior hash functions, in terms of the number of collisions they produce, may in fact lead to superior performance because these collisions mainly occur in cache rather than main memory. This line of reasoning is the antithesis of that used to justify most of the improvements that have been proposed for open address hashing over the past forty years. Such improvements have generally sought to minimize collisions by spreading data elements more randomly through the hash table. Indeed, the name "hashing" itself is meant to convey this notion [12]. However, the very act of spreading the data elements throughout the table negatively impacts their degree of spatial locality in computer memory, thereby increasing the likelihood of cache misses during long probe sequences. In this paper we study the performance trade-offs that exist when implementing open address hash functions on contemporary computers. Experimental analyses are reported that make use of a variety of different hash functions, ranging from linear probing to highly "chaotic" forms of double hashing, using data sets that are justified through information-theoretic analyses. Our results, which clarify and extend those in a number of recently published papers, show that the savings gained by reducing collisions (and therefore probe sequence lengths) usually compensate for any increase in cache misses.
That is, due to primary clustering, the variability in the performance of linear probing can be extreme, in the worst case producing running times orders of magnitude worse than double hashing, and in the best case yielding running times only slightly better than double hashing.
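To make the trade-off concrete, the following sketch (not the paper's code; all names are illustrative) contrasts the two probe strategies the abstract discusses. Linear probing visits consecutive slots, so its probes stay spatially local and cache friendly, but identical home slots pile up into primary clusters; double hashing uses a key-dependent step, which breaks up clusters at the cost of scattering probes across memory.

```python
# Illustrative open-address table with two probe strategies.
# Assumes the table size is prime, so any double-hashing step
# in [1, size-1] is coprime to the size and the probe sequence
# visits every slot.

def probe_sequence(key, table_size, strategy):
    """Yield the slots visited when probing for `key`."""
    h1 = hash(key) % table_size
    if strategy == "linear":
        step = 1  # consecutive slots: good locality, primary clustering
    else:
        # Double hashing: a second, key-dependent step spreads probes out.
        step = 1 + (hash(key) // table_size) % (table_size - 1)
    for i in range(table_size):
        yield (h1 + i * step) % table_size

def insert(table, key, strategy):
    """Insert `key`; return the number of collisions (extra probes)."""
    for collisions, slot in enumerate(
            probe_sequence(key, len(table), strategy)):
        if table[slot] is None:
            table[slot] = key
            return collisions
    raise RuntimeError("table full")
```

Inserting several keys that share a home slot shows the effect: under linear probing each new key must walk past the whole cluster, while double hashing's varying step sizes let later keys escape it after a single collision.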

Original language: English (US)
Title of host publication: Proceedings of the Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics
Editors: C. Demetrescu, R. Sedgewick, R. Tamassia
Pages: 141-154
Number of pages: 14
State: Published - 2005
Externally published: Yes
Event: Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics - Vancouver, BC, Canada
Duration: Jan 22 2005 - Jan 22 2005

Publication series

Name: Proceedings of the Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics

Other

Other: Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics
Country/Territory: Canada
City: Vancouver, BC
Period: 1/22/05 - 1/22/05

ASJC Scopus subject areas

  • General Engineering
