Unverified Commit cf9b1f7a authored Jun 01, 2021 by Roman Lebedev

[X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants

Currently, X86 backend only has a global one-size-fits-all `FeatureFastVariableShuffle` feature,
which controls profitability of both the cross-lane and per-lane variable shuffles.
I guess, this has been fine so far.

But at least on AMD Zen 3, while per-line variable shuffles (e.g. `VPSHUFB`)
are as fast as as shuffles with fixed/immediate mask,
while lane-crossing shuffles, e.g. `VPERMPS` is performing worse.

So to get the benefits of variable-mask shuffles, but not the drawbacks of lane-crossing shuffles,
as suggested by @RKSimon, split the feature flag into two.

Differential Revision: https://reviews.llvm.org/D103274

parent 41d79093

Expand all Show whitespace changes

Inline Side-by-side

Please to comment