Commit 201877d5 authored Apr 06, 2021 by Simon Pilgrim

[CostModel][X86] Improve accuracy of vXi8 multiply reduction costs

After rG47321c311bdbe0145b9bf45d822185c37b19fa50 we promote vXi8 mul reductions to vXi16 to create a much faster PMULLW mul reduction, followed by a (free) truncation. This avoids the high cost of repeated vXi8 multiplications, each of which has to be extended to vXi16, multiplied, and truncated back to vXi8.
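The promotion is safe because the low 8 bits of a product depend only on the low 8 bits of its operands. Doing the whole reduction in 16-bit lanes and truncating once at the end therefore gives the same result as wrapping 8-bit multiplies at every step.

The scalar C++ sketch below illustrates this equivalence. It is not the LLVM lowering or cost-model code. It makes the following assumptions:

* It uses a single 16-lane (v16i8) input.
* It models each PMULLW lane as a `uint16_t` product that keeps only the low 16 bits.
* It models the shuffle-based reduction as a simple halving tree.
* The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference semantics: multiply all i8 lanes, wrapping to 8 bits after every step.
uint8_t mul_reduce_i8(const std::array<uint8_t, 16>& v) {
  uint8_t acc = 1;
  for (uint8_t x : v)
    acc = static_cast<uint8_t>(static_cast<uint32_t>(acc) * x);
  return acc;
}

// Promoted form (assumed model): zero-extend each lane to i16, then do a
// log2(16)-step halving tree. Each step keeps only the low 16 bits of each
// lane product, as PMULLW does. Truncate to i8 once at the end.
uint8_t mul_reduce_via_i16(const std::array<uint8_t, 16>& v) {
  std::array<uint16_t, 16> w{};
  for (std::size_t i = 0; i < 16; ++i)
    w[i] = v[i];  // zero-extension vXi8 -> vXi16
  for (std::size_t half = 8; half >= 1; half /= 2)
    for (std::size_t i = 0; i < half; ++i)
      // Widen to 32 bits before multiplying to avoid signed int overflow.
      w[i] = static_cast<uint16_t>(static_cast<uint32_t>(w[i]) * w[i + half]);
  return static_cast<uint8_t>(w[0]);  // single final truncation
}
```

For any input, both functions return the same byte. For example, 16 lanes of `255` reduce to `1` in both, because (-1)^16 = 1 mod 256. The point of the promoted form is that only the final truncation is needed. The per-step extend and truncate work of an 8-bit multiply is skipped.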

Fixes the missing vectorization of the vXi8 mul reduction in the 'mul16' test case from PR42674 (Comment #20).

parent 6eb5b06e
