[X86][SSE] Aggressively use PMADDWD for v4i32 multiplies with 17 or more leading zeros (9f551ad6)

Commit 9f551ad6 authored Jan 24, 2018 by Simon Pilgrim

[X86][SSE] Aggressively use PMADDWD for v4i32 multiplies with 17 or more leading zeros

As discussed in D41484, PMADDWD on 'zero extended' vXi32 operands is nearly always a better option than PMULLD. Two targets are worth calling out:
On SNB the resulting code is neither faster nor slower, so we may as well keep it.
On KNL PMADDWD only has half the throughput, so I've disabled it there; ideally there'd be a better way than this.
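
Why 17 leading zeros is the threshold can be illustrated with a scalar model of a single 32-bit lane. This is only a sketch and not the code from this patch: the helper names (sext16, pmaddwd_lane, pmulld_lane, self_check) are hypothetical. PMADDWD multiplies matching signed 16-bit halves and adds the two products, so if both operands have their upper 17 bits clear, the high halves are zero and the low halves are non-negative as signed 16-bit values, which leaves exactly the low 32 bits of the full product.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v to a 32-bit signed integer. */
static int32_t sext16(uint32_t v) {
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Hypothetical scalar model of one 32-bit lane of PMADDWD:
   signed 16x16 multiplies of the low and high halves, summed with
   32-bit wraparound. */
static uint32_t pmaddwd_lane(uint32_t a, uint32_t b) {
    int32_t lo = sext16(a) * sext16(b);
    int32_t hi = sext16(a >> 16) * sext16(b >> 16);
    return (uint32_t)lo + (uint32_t)hi;
}

/* Hypothetical scalar model of one lane of PMULLD: low 32 bits of a * b. */
static uint32_t pmulld_lane(uint32_t a, uint32_t b) {
    return a * b;
}

/* Spot-check that both models agree when each operand has at least
   17 leading zeros, i.e. fits in 15 bits (value <= 0x7FFF). */
static void self_check(void) {
    for (uint32_t a = 0; a <= 0x7FFFu; a += 257u)
        for (uint32_t b = 0; b <= 0x7FFFu; b += 263u)
            assert(pmaddwd_lane(a, b) == pmulld_lane(a, b));
    /* Largest operands that still satisfy the condition. */
    assert(pmaddwd_lane(0x7FFFu, 0x7FFFu) == pmulld_lane(0x7FFFu, 0x7FFFu));
}
```

With only 16 leading zeros the equivalence breaks: for a = 0x8000 and b = 1 the low half of a is read as -32768, so the PMADDWD model yields 0xFFFF8000 while the PMULLD model yields 0x8000.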

Differential Revision: https://reviews.llvm.org/D42258

llvm-svn: 323367

parent a9263c89
