## Abstract

A number of recent papers have considered the influence of modern computer memory hierarchies on the performance of hashing algorithms [1, 2, 3]. Motivation for these papers is drawn from recent technology trends that have produced an ever-widening gap between the speed of CPUs and the latency of dynamic random access memories. The result is an emerging computing folklore which contends that inferior hash functions, in terms of the number of collisions they produce, may in fact load to superior performance because these collisions mainly occur in cache rather than main memory. This line of reasoning is the antithesis of that used to justify most of the improvements that have been proposed for open address hashing over the past forty years. Such improvements have generally sought to minimize collisions by spreading data elements more randomly through the hash table. Indeed the name "hashing" itself is meant to convey this notion [12], However, the very act of spreading the data elements throughout the table negatively impacts their degree of spatial locality in computer memory, thereby increasing the likelihood of cache misses during long probe sequences. In this paper we study the performance trade-offs that exist when implementing open address hash functions on contemporary computers. Experimental analyses are reported that make use of a variety of different hash functions, ranging from linear probing to highly "chaotic" forms of double hashing, using data sets that are justified through information-theoretic analyses. Our results, clarify and extend those in a number of recently published papers, show that the savings gained by reducing collisions (and therefore probe sequence lengths) usually compensate for any increase in cache misses. That is, due to primary clustering, the variability in the performance of linear probing can be extreme, in the worst case producing running times orders of magni tude worse than double hashing, and in the best case yielding running times only slightly better than double hashing.

Original language | English (US) |
---|---|

Title of host publication | Proceedings of the Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics |

Editors | C. Demetrescu, R. Sedgewick, R. Tamassia |

Pages | 141-154 |

Number of pages | 14 |

State | Published - 2005 |

Externally published | Yes |

Event | Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics - Vancouver, BC, Canada Duration: Jan 22 2005 → Jan 22 2005 |

### Publication series

Name | Proceedings of the Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics |
---|

### Other

Other | Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithms and Combinatorics |
---|---|

Country/Territory | Canada |

City | Vancouver, BC |

Period | 1/22/05 → 1/22/05 |

## ASJC Scopus subject areas

- Engineering(all)