Commit 68ef68f8 authored May 24, 2021 by Simon Pilgrim

[CostModel][X86] Improve accuracy of vXi8/vXi16 vector non-uniform shift costs on AVX2/AVX512 targets

llvm-mca analysis shows that AVX2+ capable targets have a higher throughput for VPBLENDVB and VPMOVZX ops, making it cheaper to perform shift+select patterns for vXi8 shifts or extend/shift/truncate patterns for vXi16 shifts. Similarly, AVX512BW targets can perform vXi8 shifts as extend/shift/truncate patterns.
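To illustrate the two patterns named above, here is a minimal scalar C++ sketch of what each one computes per lane. It is not the code from this commit or the X86 backend's actual lowering. The function names are hypothetical, only logical left shifts are modelled, and shift amounts are assumed to be in range (0 to 15 for i16 lanes, 0 to 7 for i8 lanes).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: scalar model of the extend/shift/truncate pattern
// for a non-uniform v8i16 left shift (each lane has its own shift amount).
std::array<std::uint16_t, 8> shl_v8i16_via_extend(
    const std::array<std::uint16_t, 8>& value,
    const std::array<std::uint16_t, 8>& amount) {
  std::array<std::uint16_t, 8> result{};
  for (std::size_t lane = 0; lane < 8; ++lane) {
    std::uint32_t wide = value[lane];                 // zero-extend i16 -> i32
    wide <<= (amount[lane] & 15u);                    // per-lane shift (assumed 0..15)
    result[lane] = static_cast<std::uint16_t>(wide);  // truncate i32 -> i16
  }
  return result;
}

// Hypothetical helper: scalar model of the shift+select pattern for a
// non-uniform v16i8 left shift. Every lane is shifted by a fixed step
// (4, then 2, then 1), and a per-lane select keeps the shifted value only
// where the matching bit of the shift amount is set.
std::array<std::uint8_t, 16> shl_v16i8_via_select(
    const std::array<std::uint8_t, 16>& value,
    const std::array<std::uint8_t, 16>& amount) {
  std::array<std::uint8_t, 16> result = value;
  for (unsigned step : {4u, 2u, 1u}) {
    for (std::size_t lane = 0; lane < 16; ++lane) {
      // Uniform shift of the lane by the current step, wrapped to 8 bits.
      const std::uint8_t shifted =
          static_cast<std::uint8_t>(result[lane] << step);
      // Blend-style select driven by one bit of the shift amount
      // (amounts assumed 0..7, so bits 4, 2 and 1 cover every case).
      const bool take_shifted = (amount[lane] & step) != 0;
      result[lane] = take_shifted ? shifted : result[lane];
    }
  }
  return result;
}
```

In a vectorised form, the three select steps of the second function would each apply to all lanes at once, which is why the throughput of the blend instruction matters for the cost of the whole pattern.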

parent e3b8e6d4
