Commit 48e93f57 authored Jul 18, 2023 by Fangrui Song
[Support] Add llvm::xxh3_64bits

ld.lld SHF_MERGE|SHF_STRINGS duplicate elimination is computation heavy
and utilizes llvm::xxHash64, a simplified version of XXH64.
Many external sources confirm that the newer variant, XXH3, is much faster.
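
To illustrate why hashing dominates this kind of work, here is a minimal
sketch of hash-based string deduplication. It is not lld's implementation:
`standInHash` (FNV-1a), `StandInHasher`, `MergedStrings` and `mergeStrings`
are hypothetical names, and the stand-in hash only marks the spot where a
faster 64-bit hash would be plugged in.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Stand-in 64-bit hash (FNV-1a). This is NOT XXH64 or XXH3; it only marks
// the place where the hash function is called once per input string.
inline uint64_t standInHash(std::string_view s) {
  uint64_t h = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL; // FNV-1a 64-bit prime
  }
  return h;
}

struct StandInHasher {
  std::size_t operator()(std::string_view s) const {
    return static_cast<std::size_t>(standInHash(s));
  }
};

struct MergedStrings {
  std::string data;                 // unique strings, each NUL-terminated
  std::vector<std::size_t> offsets; // output offset of every input string
};

// Every input string is hashed once; duplicates map to the offset of the
// first occurrence, so only unique strings are appended to the output.
// The map keys are views into `inputs`, which must outlive this call.
MergedStrings mergeStrings(const std::vector<std::string_view> &inputs) {
  MergedStrings out;
  std::unordered_map<std::string_view, std::size_t, StandInHasher> seen;
  for (std::string_view s : inputs) {
    auto [it, inserted] = seen.try_emplace(s, out.data.size());
    if (inserted) {
      out.data.append(s);
      out.data.push_back('\0');
    }
    out.offsets.push_back(it->second);
  }
  return out;
}
```

With millions of input strings, the per-string hash call sits on the hot
path, which is why the choice of hash function shows up in link time.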

I have picked a few hash implementations and computed the
proportion of time spent on hashing in the overall link time (a debug
build of clang 16 on a machine using AMD Zen 2 architecture):

* llvm::xxHash64: 3.63%
* official XXH64 (`#define XXH_VECTOR XXH_SCALAR`): 3.53%
* official XXH3_64bits (`#define XXH_VECTOR XXH_SCALAR`): 1.21%
* official XXH3_64bits (default, essentially `XXH_SSE2`): 1.22%
* this patch llvm::xxh3_64bits: 1.19%

The rest of lld is unchanged, so a lower ratio indicates faster hashing.
The numbers show that XXH3 from xxhash is significantly faster than both
the official XXH64 and our llvm::xxHash64.

(
string length: count
1-3: 393434
4-8: 2084056
9-16: 2846249
17-128: 5598928
129-240: 1317989
241-: 328058
)
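
The length classes in the table above can be reproduced with a small
histogram helper. This is a sketch only: `lengthClass` and
`countByLengthClass` are hypothetical names, and the extra class for empty
strings is an assumption, since the table above starts at length 1.

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <vector>

// Map a string length to the class used in the table:
//   0: empty (assumption, not in the table), 1: 1-3, 2: 4-8, 3: 9-16,
//   4: 17-128, 5: 129-240, 6: 241 and above.
constexpr std::size_t lengthClass(std::size_t len) {
  if (len == 0)
    return 0;
  if (len <= 3)
    return 1;
  if (len <= 8)
    return 2;
  if (len <= 16)
    return 3;
  if (len <= 128)
    return 4;
  if (len <= 240)
    return 5;
  return 6;
}

// Count how many strings fall into each length class.
std::array<std::size_t, 7>
countByLengthClass(const std::vector<std::string_view> &strings) {
  std::array<std::size_t, 7> counts{};
  for (std::string_view s : strings)
    ++counts[lengthClass(s.size())];
  return counts;
}
```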

This patch adds a heavily simplified version of XXH3 from
https://github.com/Cyan4973/xxHash, taking into account many
simplification ideas from Devin Hussey's xxhash-clean.

Important x86-64 optimization ideas:

* Make XXH3_len_129to240_64b and XXH3_hashLong_64b noinline
* Unroll XXH3_len_17to128_64b
* __restrict does not affect Clang code generation
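
The noinline idea above can be sketched as a length-based dispatcher whose
short-input path may be inlined while the two longer paths are kept out of
line. This is an illustration of the structure only, not the code in this
patch: all names (`SKETCH_NOINLINE`, `placeholderMix`, `hashShort`,
`hashMid`, `hashLong`, `sketchHash`) are hypothetical, the mixer is FNV-1a
rather than XXH3, and the seeds are arbitrary.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE
#endif

// Placeholder mixer (FNV-1a style). NOT the XXH3 algorithm.
static inline uint64_t placeholderMix(const uint8_t *p, size_t len,
                                      uint64_t seed) {
  uint64_t h = 14695981039346656037ULL ^ seed;
  for (size_t i = 0; i < len; ++i) {
    h ^= p[i];
    h *= 1099511628211ULL;
  }
  return h;
}

// Short inputs: left eligible for inlining into the dispatcher.
static inline uint64_t hashShort(const uint8_t *p, size_t len) {
  return placeholderMix(p, len, 1);
}

// 129 to 240 bytes: kept out of line so it does not bloat the dispatcher.
SKETCH_NOINLINE static uint64_t hashMid(const uint8_t *p, size_t len) {
  return placeholderMix(p, len, 2);
}

// 241 bytes and above: also kept out of line.
SKETCH_NOINLINE static uint64_t hashLong(const uint8_t *p, size_t len) {
  return placeholderMix(p, len, 3);
}

// Dispatcher: picks a path from the input length alone.
uint64_t sketchHash(const uint8_t *data, size_t len) {
  if (len <= 128)
    return hashShort(data, len);
  if (len <= 240)
    return hashMid(data, len);
  return hashLong(data, len);
}
```

Whether keeping the longer paths out of line helps depends on the compiler
and target; the commit message reports it as important on x86-64.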

Besides SHF_MERGE|SHF_STRINGS duplicate elimination, llvm/ADT/StringMap.h
StringMapImpl::LookupBucketFor and a few places in lld can potentially be
accelerated by switching to llvm::xxh3_64bits.

Link: https://github.com/llvm/llvm-project/issues/63750

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D154812
parent e4aa1428