Commits · 6f922dbbeae7b647dfd548c4d74f384ca9c252da · Roger Ferrer / llvm-epi

Jan 03, 2020

[NFC][InstCombine] '(Op1 & C) - Op1' pattern tests (PR44427) · 6f922dbb
Roman Lebedev authored Jan 03, 2020

6f922dbb
[NFC][InstCombine] Autogenerate and2.ll checklines · 9b750cc6
Roman Lebedev authored Jan 03, 2020

9b750cc6

[NFC][InstCombine] '(X & (- Y)) - X' -> '- (X & (Y - 1))' fold (PR44448) · cc0216be

Roman Lebedev authored Jan 03, 2020

Name: (X & (- Y)) - X  ->  - (X & (Y - 1))  (PR44448)
  %negy = sub i8 0, %y
  %unbiasedx = and i8 %negy, %x
  %r = sub i8 %unbiasedx, %x
=>
  %ymask = add i8 %y, -1
  %xmasked = and i8 %ymask, %x
  %r = sub i8 0, %xmasked

https://rise4fun.com/Alive/OIpla
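
As a rough cross-check of the Alive proof above, the fold can be verified exhaustively for i8 in a few lines of Python. This is only an illustrative sketch: the function names `before`, `after`, and `fold_holds` are made up here, and i8 wrapping is modeled by masking with 0xFF.

```python
MASK = 0xFF  # model i8 values with wrapping arithmetic


def before(x, y):
    """(X & (-Y)) - X, mirroring the source pattern."""
    negy = (0 - y) & MASK
    unbiasedx = negy & x
    return (unbiasedx - x) & MASK


def after(x, y):
    """-(X & (Y - 1)), mirroring the target pattern."""
    ymask = (y - 1) & MASK  # add i8 %y, -1
    xmasked = ymask & x
    return (0 - xmasked) & MASK


def fold_holds():
    """Compare both forms for every pair of i8 inputs."""
    return all(before(x, y) == after(x, y)
               for x in range(256) for y in range(256))
```

The identity works because -Y equals ~(Y - 1), so X splits into the part kept by the mask and the part that is subtracted away.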

This decreases the use count of %x, may allow us to
later hoist the negation even further,
and results in marginally nicer X86 codegen.

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499

cc0216be

[NFC][InstCombine] '(X & (- Y)) - X' pattern tests (PR44448) · b87a3511
Roman Lebedev authored Jan 03, 2020
```
As discussed in https://bugs.llvm.org/show_bug.cgi?id=44448,
we can hoist negation out of the pattern.
```
b87a3511

[InstCombine] replace undef elements in vector constant when doing icmp folds (PR44383) · 16405827

Sanjay Patel authored Jan 03, 2020

As shown in PR44383:
https://bugs.llvm.org/show_bug.cgi?id=44383
...we can't safely propagate a vector constant through this icmp fold
if that vector constant contains undefined elements.

We know that each defined element of the constant is safe though, so
find the first of those and replicate it into the formerly undef lanes.
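
The lane replacement described above can be sketched as follows. This is a simplified model, not the InstCombine code: undef lanes are represented by `None`, and the helper name `replace_undef_lanes` is hypothetical.

```python
def replace_undef_lanes(elements):
    """Return a copy of `elements` where undef lanes (None) are filled
    with the first defined element, or None if every lane is undef."""
    first_defined = next((e for e in elements if e is not None), None)
    if first_defined is None:
        return None  # nothing defined to replicate in this sketch
    return [first_defined if e is None else e for e in elements]
```

For example, `replace_undef_lanes([None, 42, None, 7])` yields `[42, 42, 42, 7]`.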

Differential Revision: https://reviews.llvm.org/D72101

16405827

Jan 02, 2020

[InstCombine] add tests for vector icmp with undef constant elements; NFC · 4bb4f5b1
Sanjay Patel authored Jan 02, 2020

4bb4f5b1

[InstCombine] remove uses before deleting instructions (PR43723) · 88fc5fde

Sanjay Patel authored Jan 02, 2020

This is a less ambitious alternative to previous attempts to fix
this bug with:
rG56b2aee1875a
rGef02831f0a4e
rG56b2aee1875a
...because those all failed bot testing with use-after-free or
other problems.

The original crashing/assert problem is still showing up on
various fuzzers, so I've added a new minimal test based on
another one of those failures.

Instead of trying to manage and coordinate the logic in
isAllocSiteRemovable() with the deletion loops, just loosen
the existing code that handles casts and GEP by replacing
with undef to allow other opcodes. That means that no
instructions with uses should assert on deletion, and there
are hopefully no non-obvious sanitizer bugs induced.

88fc5fde

Jan 01, 2020

[InstCombine] Preserve inbounds when merging with zero-index GEP (PR44423) · 8dd9a136

Nikita Popov authored Jan 01, 2020

This addresses https://bugs.llvm.org/show_bug.cgi?id=44423.
If one of the GEPs is inbounds and the other is zero-index,
we can also preserve inbounds.

Differential Revision: https://reviews.llvm.org/D72060

8dd9a136

[InstCombine] Fix incorrect inbounds on GEP of GEP (PR44425) · 6ba5f8c4

Nikita Popov authored Jan 01, 2020

This fixes https://bugs.llvm.org/show_bug.cgi?id=44425. We need to
drop inbounds if one of the GEPs is not inbounds. This was already
done when creating a new GEP, but not when modifying in place.
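
Read together with the zero-index case from PR44423, the inbounds rule for a merged GEP might be summarized as below. This is one reading of the two commit messages, not the actual implementation; the function and parameter names are hypothetical.

```python
def merged_gep_inbounds(a_inbounds, a_all_zero, b_inbounds, b_all_zero):
    """Decide whether merging two GEPs may keep the inbounds flag.

    Assumed rule: at least one GEP must be inbounds, and any GEP that
    is not inbounds must have only zero indices.
    """
    if not a_inbounds and not b_inbounds:
        return False
    return (a_inbounds or a_all_zero) and (b_inbounds or b_all_zero)
```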

Differential Revision: https://reviews.llvm.org/D72059

6ba5f8c4

[InstCombine] Add tests for PR44423 and PR44425; NFC · 11552433
Nikita Popov authored Jan 01, 2020

11552433
[InstCombine] Regenerate test checks; NFC · 7f48171d
Nikita Popov authored Jan 01, 2020

7f48171d
[InstCombine] Add tests for sub nuw of geps; NFC · 8756cd09
Nikita Popov authored Jan 01, 2020
```
Tests for PR44419.
```
8756cd09

[X86][InstCombine] Add constant folding and simplification support for pdep and pext · 374e0299

Craig Topper authored Dec 31, 2019

The instructions use a mask to either pack disjoint bits together (pext) or spread bits to disjoint locations (pdep). If the mask is all zeros, then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise, if both the source and mask are constant, we can walk the bits in the source/mask and calculate the result.
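
A minimal reference model of that bit walk, written in Python for illustration. It is a sketch of the pext/pdep semantics rather than the code added to InstCombine; the `width` parameter is an assumption of this example.

```python
def pext(src, mask, width=32):
    """Pack the bits of `src` selected by `mask` into the low bits."""
    result = 0
    out_bit = 0
    for i in range(width):
        if (mask >> i) & 1:
            if (src >> i) & 1:
                result |= 1 << out_bit
            out_bit += 1
    return result


def pdep(src, mask, width=32):
    """Spread the low bits of `src` to the positions set in `mask`."""
    result = 0
    in_bit = 0
    for i in range(width):
        if (mask >> i) & 1:
            if (src >> in_bit) & 1:
                result |= 1 << i
            in_bit += 1
    return result
```

With an all-zero mask both functions return 0, and with an all-ones mask both return the source unchanged, matching the two special cases above.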

There are other crazier things we could do, like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted.

Fixes PR44389

Differential Revision: https://reviews.llvm.org/D71952

374e0299

Dec 31, 2019

[InstCombine] fold zext of masked bit set/clear · a041c4ec

Sanjay Patel authored Dec 31, 2019

This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8

We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.

Proofs:
https://rise4fun.com/Alive/uVB

  Name: masked bit set
  %sh1 = shl i32 1, %y
  %and = and i32 %sh1, %x
  %cmp = icmp ne i32 %and, 0
  %r = zext i1 %cmp to i32
  =>
  %s = lshr i32 %x, %y
  %r = and i32 %s, 1

  Name: masked bit clear
  %sh1 = shl i32 1, %y
  %and = and i32 %sh1, %x
  %cmp = icmp eq i32 %and, 0
  %r = zext i1 %cmp to i32
  =>
  %xn = xor i32 %x, -1
  %s = lshr i32 %xn, %y
  %r = and i32 %s, 1
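
The two proofs above can also be spot-checked by brute force. The sketch below models i8 instead of i32 so the search stays exhaustive, and restricts the shift amount to less than the bit width because larger shifts are poison in LLVM IR. All names are illustrative.

```python
BITS = 8
MASK = (1 << BITS) - 1


def bit_set_before(x, y):
    sh1 = (1 << y) & MASK
    return int((sh1 & x) != 0)  # zext (icmp ne (and ...), 0)


def bit_set_after(x, y):
    return (x >> y) & 1


def bit_clear_before(x, y):
    sh1 = (1 << y) & MASK
    return int((sh1 & x) == 0)  # zext (icmp eq (and ...), 0)


def bit_clear_after(x, y):
    xn = x ^ MASK  # xor %x, -1
    return (xn >> y) & 1


def folds_hold():
    """Check both folds for every i8 value and every valid shift amount."""
    return all(bit_set_before(x, y) == bit_set_after(x, y)
               and bit_clear_before(x, y) == bit_clear_after(x, y)
               for x in range(1 << BITS) for y in range(BITS))
```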

a041c4ec

[InstCombine] add/adjust tests for masked bit; NFC · eb5c026e
Sanjay Patel authored Dec 31, 2019

eb5c026e
Revert "[InstCombine] Fix infinite loop due to bitcast <-> phi transforms" · 7adb5c2a
Nikita Popov authored Dec 31, 2019
```
This reverts commit 27a07959.

Seems to break test-suite.
```
7adb5c2a
[InstCombine] add tests for masked bit set/clear; NFC · 108645cd
Sanjay Patel authored Dec 30, 2019

108645cd

[InstCombine] Fix infinite loop due to bitcast <-> phi transforms · 27a07959

Nikita Popov authored Dec 07, 2019

Fix for https://bugs.llvm.org/show_bug.cgi?id=44245.

The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up
fighting against each other, because optimizeBitCastFromPhi()
assumes that bitcasts of loads will get folded. This doesn't happen
here, because a dangling phi node prevents the one-use fold in
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering.

This patch fixes the issue by manually removing the old phis.

Differential Revision: https://reviews.llvm.org/D71164

27a07959

[InstCombine] Don't rewrite phi-of-bitcast when the phi has other users · fb114694

Connor Abbott authored Dec 31, 2019

Judging by the existing comments, this was the intention, but the
transform never actually checked if the existing phis would be removed.
See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where
this causes much worse code generation on AMDGPU.

Differential Revision: https://reviews.llvm.org/D71209

fb114694

[InstCombine] Add tests for PR44242 · d04e64a2
Connor Abbott authored Dec 31, 2019
```
Differential Revision: https://reviews.llvm.org/D71260
```
d04e64a2

Dec 30, 2019
- [InstCombine] remove stale comment on test; NFC · ee3eebba
  Sanjay Patel authored Dec 30, 2019
  
  ee3eebba
- [InstCombine] propagate sign argument through nested copysigns · 987eb8e2
  Sanjay Patel authored Dec 30, 2019
```
This is another optimization suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
```
  987eb8e2
- [NFC] Add test for load-insert-store pattern · 65661908
  Qiu Chaofan authored Dec 30, 2019
```
This patch adds necessary test cases for load-update-store pattern
which only updates single element of vector.

Differential Revision: https://reviews.llvm.org/D71886
```
  65661908
Dec 25, 2019
- Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as... · 502a77f1
  Fangrui Song authored Dec 24, 2019
```
Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351
```
  502a77f1
Dec 23, 2019
- [InstCombine] add test for copysign; NFC · 25cf5d97
  Sanjay Patel authored Dec 23, 2019
  
  25cf5d97
- [InstCombine] add tests for not(select ...); NFC · 9a77c209
  Sanjay Patel authored Dec 23, 2019
  
  9a77c209
Dec 22, 2019
- [InstCombine] enhance fold for copysign with known sign arg · 9cdcd81d
  Sanjay Patel authored Dec 22, 2019
```
This is another optimization suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
```
  9cdcd81d
Dec 21, 2019

[InstCombine] check alloc size in bitcast of geps fold (PR44321) · 79c7fa31

Sanjay Patel authored Dec 21, 2019

We missed a constraint in D44833
when folding a bitcast into a GEP with vector/array types.
If the alloc sizes specified by the datalayout don't match,
this could miscompile as shown in:
https://bugs.llvm.org/show_bug.cgi?id=44321

Differential Revision: https://reviews.llvm.org/D71771

79c7fa31

[SimplifyLibCalls] require fast-math-flags for pow(X, -0.5) transforms · 19f9f374

Sanjay Patel authored Dec 20, 2019

As discussed in PR44330:
https://bugs.llvm.org/show_bug.cgi?id=44330
...the transform from pow(X, -0.5) libcall/intrinsic to
reciprocal square root can result in small deviations from
the expected result due to differences in the pow()
implementation and/or the extra rounding step from the division.
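
To get a feel for the size of those deviations, one can compare the two formulations on ordinary positive inputs. The Python sketch below uses the host libm through `math.pow` and `math.sqrt`, so whether any sample actually differs depends on the platform; the sample set and function names are arbitrary choices for this example, and special values such as -0.0 and -INF are deliberately left out.

```python
import math


def pow_neg_half(x):
    """pow(X, -0.5) as computed by the host math library."""
    return math.pow(x, -0.5)


def reciprocal_sqrt(x):
    """1.0 / sqrt(X): a square root followed by a rounded division."""
    return 1.0 / math.sqrt(x)


def max_relative_difference(values):
    """Largest relative gap between the two formulations over `values`."""
    return max(abs(pow_neg_half(v) - reciprocal_sqrt(v)) / pow_neg_half(v)
               for v in values)


SAMPLES = [0.1 * k for k in range(1, 1001)]
```

Calling `max_relative_difference(SAMPLES)` reports the worst gap seen; any nonzero value is the kind of last-bit difference discussed above.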

This patch proposes to allow that difference with either the
'approximate functions' or 'reassociate' FMF:
http://llvm.org/docs/LangRef.html#fast-math-flags

In practice, this likely means that the code is compiled with
all of 'fast' (-ffast-math), but I have preserved the existing
specializations for -0.0/-INF that enable generating safe code
if those special values are allowed simultaneously with
allowing approximation/reassociation.

The question about whether a similar restriction is needed for
the non-reciprocal case -- pow(X, 0.5) -- is deferred. That
transform is allowed without FMF currently, and this patch does
not change that behavior.

Differential Revision: https://reviews.llvm.org/D71706

19f9f374

Dec 20, 2019

[InstCombine] Improve infinite loop detection · c431c407

Jakub Kuderski authored Dec 20, 2019

Summary:
This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that specifies how many iterations are treated as being stuck in an infinite loop.

Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details.

The two limits can be specified via command line options.

Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71673

c431c407

[InstCombine] add tests for cast+gep; NFC · 0b421d84
Sanjay Patel authored Dec 20, 2019
```
PR44321:
https://bugs.llvm.org/show_bug.cgi?id=44321
```
0b421d84

Dec 19, 2019

[ValueTracking] isKnownNonZero() should take non-null-ness assumptions into consideration (PR43267) · 047186cc

Roman Lebedev authored Dec 18, 2019

Summary:
It is pretty common to assume that something is not zero.
Even the optimizer itself sometimes emits such assumptions
(e.g. `addAssumeNonNull()` in `PromoteMemoryToRegister.cpp`).

But we currently don't deal with such assumptions :)
The only way `isKnownNonZero()` handles assumptions is
by calling `computeKnownBits()` which calls `computeKnownBitsFromAssume()`.
But `x != 0` does not tell us anything about set bits,
it only says that there are *some* set bits.
So naturally, `KnownBits` does not get populated,
and we fail to make use of this assumption.
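
The point about `KnownBits` can be illustrated with a small model: intersect the bits of every value that satisfies the assumption and see what remains known. This is a toy i8 sketch with made-up names, not the ValueTracking implementation.

```python
BITS = 8
MASK = (1 << BITS) - 1


def known_bits(values):
    """Return (known_zero, known_one) masks shared by every value."""
    known_one = MASK
    known_zero = MASK
    for v in values:
        known_one &= v
        known_zero &= ~v & MASK
    return known_zero, known_one


# Every i8 value allowed by the assumption `x != 0`.
nonzero = [v for v in range(1 << BITS) if v != 0]
```

`known_bits(nonzero)` returns `(0, 0)`: no individual bit is known, even though the value as a whole is known to be nonzero. By contrast, a range assumption such as `x < 16` does pin down the upper bits.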

I propose to handle this special case
by adding an `isKnownNonZeroFromAssume()` that returns a boolean
when there is an applicable assumption.

While there, we also deal with other predicates,
mainly if the comparison is with a constant.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43267 | PR43267 ]].

Differential Revision: https://reviews.llvm.org/D71660

047186cc

[ValueTracking] isValidAssumeForContext(): CxtI itself also must transfer execution to successor · 92083a29

Roman Lebedev authored Dec 20, 2019

This is a pretty rare case, when CxtI and assume are
in the same basic block, with assume being located later.

We were already checking that assumption was guaranteed to be
executed, but we omitted CxtI itself from consideration,
and as the test (miscompile) shows, that is incorrect.

As noted in D71660 review by @nikic.

92083a29

[NFC][InstCombine] Add a test for assume-induced miscompile · ffcae008

Roman Lebedev authored Dec 20, 2019

@escape() may throw here, so we do not know that the assumption, which is
located later in the same block, is executed. Therefore the %load argument
of the call to @escape() cannot be marked as non-null.

As noted in D71660 review by @nikic.

ffcae008

[InstCombine] add/adjust tests for pow->sqrt; NFC · 5889e782
Sanjay Patel authored Dec 19, 2019
```
There's at least 1 bug here as discussed in PR44330.
```
5889e782

[InstCombine] Canonicalize select immediates · a59cc5e1

David Green authored Dec 19, 2019

In certain situations after inlining and simplification we end up with
code that is _almost_ a min/max pattern, but contains constants that
have been demand-bit optimised to the wrong values, ending up with code
like:
  %1 = icmp slt i32 %shr, -128
  %2 = select i1 %1, i32 128, i32 %shr
  %.inv = icmp sgt i32 %shr, 127
  %spec.select.i = select i1 %.inv, i32 127, i32 %2
  %conv7 = trunc i32 %spec.select.i to i8
This should be turned into a min/max pattern, but the -128 in the first
select was instead transformed into 128, as only the bottom byte was
ever demanded.

To fix this, I've put in further canonicalisation for the immediates of
selects, preferring to use the same value as the icmp if available.
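
The reason 128 could replace -128 without changing behavior is that both truncate to the same i8 value. A small Python model of the two select chains shows this; the function names are invented for the example, and the input range is an arbitrary sample of i32 values.

```python
def trunc_i8(v):
    """Keep only the bottom byte, like `trunc i32 ... to i8`."""
    return v & 0xFF


def clamp_canonical(shr):
    """Select chain using -128, which matches the icmp constant."""
    t = -128 if shr < -128 else shr
    r = 127 if shr > 127 else t
    return trunc_i8(r)


def clamp_demanded_bits(shr):
    """Select chain after the constant was rewritten to 128."""
    t = 128 if shr < -128 else shr
    r = 127 if shr > 127 else t
    return trunc_i8(r)


def chains_agree(lo=-100000, hi=100000):
    """Compare both chains over a sample range of inputs."""
    return all(clamp_canonical(v) == clamp_demanded_bits(v)
               for v in range(lo, hi))
```

Both chains produce identical bytes, but only the -128 form is recognizable as a signed clamp to [-128, 127].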

Differential Revision: https://reviews.llvm.org/D71516

a59cc5e1

[Instcombine] Add select canonicalization tests. NFC · d3815332
David Green authored Dec 19, 2019

d3815332

Dec 18, 2019

Revert "[InstCombine][AMDGPU] Trim more components of *buffer_load" · 40b5a0f7
Piotr Sobczak authored Dec 18, 2019
```
Revert D70315, as it breaks gfx8 for some reason.

This reverts commit 65f94b33.
```
40b5a0f7

[InstCombine] Insert instructions before adding them to worklist · 3d29c41a

Jakub Kuderski authored Dec 18, 2019

Summary:
This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get `<badref>`s printed when logging added instructions.
It also adds a check in `Worklist::Add` that ensures that all added instructions have parents.

Simple test case that illustrates the difference when run with `--debug-only=instcombine`:

```
define i32 @test35(i32 %a, i32 %b) {
  %1 = or i32 %a, 1135
  %2 = or i32 %1, %b
  ret i32 %2
}
```

Before this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting:   %1 = or i32 %a, 1135
IC: Visiting:   %2 = or i32 %1, %b
IC: ADD:   %2 = or i32 %a, %b
IC: Old =   %3 = or i32 %1, %b
    New =   <badref> = or i32 %2, 1135
IC: ADD:   <badref> = or i32 %2, 1135
...
```

With this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting:   %1 = or i32 %a, 1135
IC: Visiting:   %2 = or i32 %1, %b
IC: ADD:   %2 = or i32 %a, %b
IC: Old =   %3 = or i32 %1, %b
    New =   <badref> = or i32 %2, 1135
IC: ADD:   %3 = or i32 %2, 1135
...
```

Reviewers: fhahn, davide, spatel, foad, grosser, nikic

Reviewed By: nikic

Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71093

3d29c41a

[InstCombine] Allow to limit the max number of iterations · 406b6019

Jakub Kuderski authored Dec 18, 2019

Summary:
This patch teaches InstCombine to accept a new parameter: maximum number of iterations over functions.

InstCombine tries to simplify instructions by iterating over the whole function until the function stops changing. As a consequence, the last iteration before reaching a fixpoint visits all instructions in the worklist and never performs any rewrites.

Bounding the number of iterations can have 2 benefits:
* In case the users of the pass can make a good guess about the number of required iterations, we can save the time normally spent on the last iteration that doesn't change anything.
* When the user wants to use InstCombine as a cleanup pass, it may be enough to run just a few iterations and stop even before reaching a fixpoint. This can also be useful for implementing a lightweight pass pipeline (think `-O1`).
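
The effect of bounding the iteration count can be shown with a toy fixpoint loop. This is only a sketch of the idea: `run_to_fixpoint`, `halve_evens`, and the `max_iterations` parameter are hypothetical names and do not correspond to InstCombine's interface.

```python
def run_to_fixpoint(values, rewrite_once, max_iterations=None):
    """Apply `rewrite_once` until nothing changes or the limit is hit.

    Returns (result, iterations_run, reached_fixpoint).
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        values, changed = rewrite_once(values)
        if not changed:
            return values, iterations, True
    return values, iterations, False


def halve_evens(values):
    """Toy rewrite: halve every nonzero even number once per iteration."""
    new_values = [v // 2 if v != 0 and v % 2 == 0 else v for v in values]
    return new_values, new_values != values
```

Starting from `[8, 3]`, the unbounded run needs four iterations, and the fourth only confirms that nothing changes. With `max_iterations=3` the same result `[1, 3]` is produced one iteration earlier, at the cost of not knowing that a fixpoint was reached.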

This patch does not change the behavior of opt or Clang -- limiting the number of iterations is entirely opt-in.

Reviewers: fhahn, davide, spatel, foad, nlopes, grosser, lebedev.ri, nikic, xbolva00

Reviewed By: spatel

Subscribers: craig.topper, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71145

406b6019