Skip to content
Commit 285b8abc authored by Sanjay Patel's avatar Sanjay Patel
Browse files

[x86] limit vector increment fold to allow load folding

The tests are based on the example from:
https://llvm.org/PR52032

I suspect that it looks worse than it actually is. :)
That is, llvm-mca says there's no uop/timing difference with the
load folding and pcmpeq vs. broadcast on Haswell (and probably
other targets).
The load-folding definitely makes the code smaller, so it's good
for that at least. So this requires carving a narrow hole in the
transform to get just this case without changing others that look
good as-is (in other words, the transform still seems good for
most examples).

Differential Revision: https://reviews.llvm.org/D112464
parent 837518d6
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment