Keywords: hash, hash-table, open addressing, chain, collision, memory level parallelism, cache, cache-line, cache miss
Impact of key representation size on hash tables performance in modern CPUs
UDC 519.6
DOI: 10.26102/2310-6018/2025.50.3.014
This paper considers a method for increasing the search speed in hash tables with links if the problem assumes that the performance is limited by the throughput of one of the interfaces between the storage levels (caches L1, L2, L3, memory). To reduce the impact of this limitation, an algorithm for optimal use of the cache line size, the minimum portion of information transferred between the storage levels, is proposed. The paper shows that there is an optimal size of information about a key in a hash table (key representation) for a specific problem and architecture; equations are given for its numerical and approximate analytical calculation for the cases of a key present and absent in the table. A separate case of using a part of a key as its representation in the table is considered. An algorithm for working with inconvenient key representation sizes that are not a power of two is proposed. The presented calculation results confirm the increase in search performance when using a calculated key representation size compared to other options. The presented experimental result confirms the assumption that the associated complication of the code has virtually no effect on performance due to partial processor idleness. The work assumes collision resolution via chains, but similar calculations should be applicable to other methods given their specific features.
1. Knuth D.E. The Art of Computer Programming. Volume 3. Sorting and Searching. Moscow: I.D. Vil'yams; 2018. 832 p. (In Russ.).
2. Farach-Colton M., Krapivin A., Kuszmaul W. Optimal Bounds for Open Addressing Without Reordering. In: 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS), 27–30 October 2024, Chicago, IL, USA. IEEE; 2024. P. 594–605. https://doi.org/10.1109/FOCS61266.2024.00045
3. Zukowski M., Héman S., Boncz P. Architecture-Conscious Hashing. In: DaMoN '06: Proceedings of the 2nd International Workshop on Data Management on New Hardware, 25 June 2006, Chicago, IL, USA. New York: Association for Computing Machinery; 2006. https://doi.org/10.1145/1140402.114041
4. De Cristo F., De Almeida E., Alves M. ViViD Cuckoo Hash: Fast Cuckoo Table Building in SIMD. In: Proceedings of the 20th Symposium on High Performance Computing Systems, SSCAD 2019, 16–18 October 2019, Campo Grande, MS, Brasil. Porto Alegre: Sociedade Brasileira de Computação; 2019. P. 288–299. https://doi.org/10.5753/wscad.2019.8676
5. Thorup M. String Hashing for Linear Probing. In: SODA '09: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, 04–06 January 2009, New York, USA. Philadelphia: Society for Industrial and Applied Mathematics; 2009. P. 655–664. https://doi.org/10.1137/1.9781611973068.72
6. Barredo A., Cebrian J.M., Valero M., Casas M., Moreto M. Efficiency Analysis of Modern Vector Architectures: Vector ALU Sizes, Core Counts and Clock Frequencies. The Journal of Supercomputing. 2020;76(3):1960–1979. https://doi.org/10.1007/s11227-019-02841-6
7. Chen S., Ailamaki A., Gibbons P.B., Mowry T.C. Improving Hash Join Performance Through Prefetching. ACM Transactions on Database Systems. 2007;32(3). https://doi.org/10.1145/1272743.1272747
8. Ross K.A. Efficient Hash Probes on Modern Processors. In: 2007 IEEE 23rd International Conference on Data Engineering, 15 April 2006 – 20 April 2007, Istanbul, Turkey. IEEE; 2007. P. 1297–1301. https://doi.org/10.1109/ICDE.2007.368997
9. Herlihy M., Shavit N., Tzafrir M. Hopscotch Hashing. In: Distributed Computing: 22nd International Symposium, DISC 2008: Proceedings, 22–24 September 2008, Arcachon, France. Berlin, Heidelberg: Springer; 2008. P. 350–364. https://doi.org/10.1007/978-3-540-87779-0_24
10. Tripathi R.R.K., Singh P.K., Singh S. Templing and Temple Search-Same Height Hashing. [Preprint]. ResearchSquare. URL: https://doi.org/10.21203/rs.3.rs-1947527/v1 [Accessed 21st May 2025].
Keywords: hash, hash-table, open addressing, chain, collision, memory level parallelism, cache, cache-line, cache miss
For citation: Buevich E.A. Impact of key representation size on hash tables performance in modern CPUs. Modeling, Optimization and Information Technology. 2025;13(3). URL: https://moitvivt.ru/ru/journal/pdf?id=1974 DOI: 10.26102/2310-6018/2025.50.3.014 .
Received 27.05.2025
Revised 27.06.2025
Accepted 08.07.2025