Commit 285b8abc authored Oct 29, 2021 by Sanjay Patel

[x86] limit vector increment fold to allow load folding

The tests are based on the example from:
https://llvm.org/PR52032

I suspect that it looks worse than it actually is. :)
That is, llvm-mca says there's no uop/timing difference with the
load folding and pcmpeq vs. broadcast on Haswell (and probably
other targets).
The load-folding definitely makes the code smaller, so it's good
for that at least. So this requires carving a narrow hole in the
transform to get just this case without changing others that look
good as-is (in other words, the transform still seems good for
most examples).

Differential Revision: https://reviews.llvm.org/D112464

parent 837518d6

Show whitespace changes

Inline Side-by-side

Please to comment