Commits · b561cf953ab13c6bbc6e93e46f34336eff381211 · Lorenzo Albano / LLVM bpEVL

Jan 30, 2017
- [Inliner] Fix a comment to match the code. NFC. · f8dc2d8c
  Haicheng Wu authored Jan 30, 2017
```
TotalAltCost => TotalSecondaryCost

Differential Revision: https://reviews.llvm.org/D29231

llvm-svn: 293490
```
  f8dc2d8c
- [InstCombine] enable lshr(shl X, C1), C2 folds for vectors with splat constants · 1196d7cd
  Sanjay Patel authored Jan 30, 2017
```
llvm-svn: 293489
```
  1196d7cd
- Revert "[MemorySSA] Revert r293361 and r293363, as the tests fail under asan." · 9d8a335c
  Daniel Berlin authored Jan 30, 2017
```
This reverts commit r293471, reapplying r293361 and r293363 with a fix
for an out-of-bounds read.

llvm-svn: 293474
```
  9d8a335c
- [MemorySSA] Revert r293361 and r293363, as the tests fail under asan. · b9d6c10c
  Sam McCall authored Jan 30, 2017
```
llvm-svn: 293471
```
  b9d6c10c
- [LoopVectorize] Improve getVectorCallCost() getScalarizationOverhead() call. · 3f71d6a3
  Jonas Paulsson authored Jan 30, 2017
```
By calling getScalarizationOverhead with the CallInst instead of the types of
its arguments, we make sure that only unique call arguments are added to the
scalarization cost.

getScalarizationOverhead() is extended to handle calls by only passing on the
actual call arguments (which is not all the operands).

This also eliminates a wrapper function with the same name.

review: Hal Finkel
llvm-svn: 293459
```
  3f71d6a3
- [MemorySSA] Correct an assertion surrounding with parentheses. · 6c77de03
  Davide Italiano authored Jan 30, 2017
```
llvm-svn: 293453
```
  6c77de03
Jan 29, 2017

[InstCombine] enable (X >>?,exact C1) << C2 --> X << (C2 - C1) for vectors with splats · 062adaab
Sanjay Patel authored Jan 29, 2017
```
llvm-svn: 293435
```
062adaab
NewGVN: Fix where newline is printed in debug printing of memory equivalence · 9f376b7b
Daniel Berlin authored Jan 29, 2017
```
llvm-svn: 293428
```
9f376b7b

[ArgPromote] Move static helpers to modern LLVM naming conventions while · 8e9c0a84

Chandler Carruth authored Jan 29, 2017

here. NFC.

Simple refactoring while prepping a port to the new PM.

Differential Revision: https://reviews.llvm.org/D29249

llvm-svn: 293426

8e9c0a84

[ArgPromote] Run clang-format to normalize remarkably idiosyncratic · ae9ce3d4

Chandler Carruth authored Jan 29, 2017

formatting that has evolved here over the past years prior to making
somewhat invasive changes to thread new PM support through the business
logic.

Differential Revision: https://reviews.llvm.org/D29248

llvm-svn: 293425

ae9ce3d4

[ArgPromote] Re-arrange the code in a more typical, logical way. · cd836cd4

Chandler Carruth authored Jan 29, 2017

This arranges the static helpers in an order where they are defined
prior to their use to avoid the need of forward declarations, and
collect the core pass components at the bottom below their helpers.

This also folds one trivial function into the pass itself. Factoring
this 'runImpl' was an attempt to help porting to the new pass manager,
however in my attempt to begin this port in earnest it turned out to not
be a substantial help. I think it will be easier to factor things
without it.

This is an NFC change and does a minimal amount of edits over all.
Subsequent NFC cleanups will normalize the formatting with clang-format
and improve the basic doxygen commenting.

Differential Revision: https://reviews.llvm.org/D29247

llvm-svn: 293424

cd836cd4

Remove inclusion of SSAUpdater from several passes. · 9d8f6f8a

Davide Italiano authored Jan 29, 2017

It is, in fact, unused. Found while reviewing Danny's new
SSAUpdater and porting passes to it to see how the new API
looked like.

llvm-svn: 293407

9d8f6f8a

[PM] MLSM has been enabled for a way. Reclaim a cl::opt. · 9b8738d7
Davide Italiano authored Jan 28, 2017
```
llvm-svn: 293401
```
9b8738d7

Jan 28, 2017

[SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way. · 3121334d

Mohammad Shahid authored Jan 28, 2017

The jumbled scalar loads will be sorted while building the tree and these accesses will be marked to generate shufflevector after the vectorized load with proper mask.

Reviewers: hfinkel, mssimpso, mkuper

Differential Revision: https://reviews.llvm.org/D26905

Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad
llvm-svn: 293386

3121334d

[InstCombine] Merge DebugLoc when speculatively hoisting store instruction · 505a25ae

Taewook Oh authored Jan 28, 2017

Summary: Along with https://reviews.llvm.org/D27804, debug locations need to be merged when hoisting store instructions as well. Not sure if just dropping debug locations would make more sense for this case, but as the branch instruction will have at least different discriminator with the hoisted store instruction, I think there will be no difference in practice.

Reviewers: aprantl, andreadb, danielcdh

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29062

llvm-svn: 293372

505a25ae

Use print() instead of dump() in code · 194ded55
Matthias Braun authored Jan 28, 2017
```
llvm-svn: 293371
```
194ded55

MemorySSA: Allow movement to arbitrary places · ee6e3a59

Daniel Berlin authored Jan 28, 2017

Summary: Extend the MemorySSAUpdater API to allow movement to arbitrary places

Reviewers: davide, george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29239

llvm-svn: 293363

ee6e3a59

MemorySSA: Fix block numbering invalidation and replacement bugs discovered by updater · 2f1ab4ba
Daniel Berlin authored Jan 28, 2017
```
llvm-svn: 293361
```
2f1ab4ba

Cleanup dump() functions. · 8c209aa8

Matthias Braun authored Jan 28, 2017

We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

llvm-svn: 293359

8c209aa8

MemorySSA: Move updater to its own file · ae6b8b69
Daniel Berlin authored Jan 28, 2017
```
llvm-svn: 293357
```
ae6b8b69

Introduce a basic MemorySSA updater, that supports insertDef, · 60ead05f

Daniel Berlin authored Jan 28, 2017

insertUse, moveBefore and moveAfter operations.

Summary:
This creates a basic MemorySSA updater that handles arbitrary
insertion of uses and defs into MemorySSA, as well as arbitrary
movement around the CFG. It replaces the current splice API.

It can be made to handle arbitrary control flow changes.
Currently, it uses the same updater algorithm from D28934.

The main difference is because MemorySSA is single variable, we have
the complete def and use list, and don't need anyone to give it to us
as part of the API.  We also have to rename stores below us in some
cases.

If we go that direction in that patch, i will merge all the updater
implementations (using an updater_traits or something to provide the
get* functions we use, called read*/write* in that patch).

Sadly, the current SSAUpdater algorithm is way too slow to use for
what we are doing here.

I have updated the tests we have to basically build memoryssa
incrementally using the updater api, and make sure it still comes out
the same.

Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29047

llvm-svn: 293356

60ead05f

[RegisterCoalescing] Recommit the patch "Remove partial redundent copy". · 35109902

Quentin Colombet authored Jan 28, 2017

In r292621, the recommit fixes a bug related with live interval update
after the partial redundent copy is moved.

This recommit solves an additional bug related to the lack of update of
subranges.

The original patch is to solve the performance problem described in
PR27827. Register coalescing sometimes cannot remove a copy because of
interference. But if we can find a reverse copy in one of the predecessor
block of the copy, the copy is partially redundent and we may remove the
copy partially by moving it to the predecessor block without the
reverse copy.

Differential Revision: https://reviews.llvm.org/D28585

Re-apply r292621

Revert "Revert rL292621. Caused some internal build bot failures in apple."

This reverts commit r292984.

Original patch: Wei Mi <wmi@google.com>
Subrange fix: Mostly Matthias Braun <matze@braunis.de>

llvm-svn: 293353

35109902

[InstCombine] move icmp transforms that might be recognized as min/max and inf-loop (PR31751) · febcb9ce

Sanjay Patel authored Jan 27, 2017

This is a minimal patch to avoid the infinite loop in:
https://llvm.org/bugs/show_bug.cgi?id=31751

But the general problem is bigger: we're not canonicalizing all of the min/max forms reported
by value tracking's matchSelectPattern(), and we don't define min/max consistently. Some code
uses matchSelectPattern(), other code uses matchers like m_Umax, and others have their own
inline definitions which may be subtly different from any of the above.

The reason that the test cases in this patch need a cast op to trigger is because we don't
(yet) canonicalize all min/max forms based on matchSelectPattern() in 
canonicalizeMinMaxWithConstant(), but we do make min/max+cast transforms based on 
matchSelectPattern() in visitSelectInst().

The location of the icmp transforms that trigger the inf-loop seems arbitrary at best, so
I'm moving those behind the min/max fence in visitICmpInst() as the quick fix.

llvm-svn: 293345

febcb9ce

Jan 27, 2017

Global DCE performance improvement · 888dee44

Mehdi Amini authored Jan 27, 2017

Change the original algorithm so that it scales better when meeting
very large bitcode where every instruction does not implies a global.

The target query is "how to you get all the globals referenced by
another global"?

Before this patch, it was doing this by walking the body (or the
initializer) and collecting the references. What this patch is doing,
it precomputing the answer to this query for the whole module by
walking the use-list of every global instead.

Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>

Differential Revision: https://reviews.llvm.org/D28549

llvm-svn: 293328

888dee44

[PGO] add debug option to view raw count after prof use annotation · d289e454
Xinliang David Li authored Jan 27, 2017
```
Differential Revision: https://reviews.llvm.org/D29045

llvm-svn: 293325
```
d289e454
NFC: Add debug tracing for more cases where loop unrolling fails. · e7d865e3
Anna Thomas authored Jan 27, 2017
```
llvm-svn: 293313
```
e7d865e3

[SLP] Refactoring of horizontal reduction analysis, NFC. · 4015bf83

Alexey Bataev authored Jan 27, 2017

Some checks in SLP horizontal reduction analysis function are performed
several times, though it is enough to perform these checks only once
during an initial attempt at adding candidate for the reduction
instruction/reduced value.

Differential Revision: https://reviews.llvm.org/D29175

llvm-svn: 293274

4015bf83

[LICM] When we are recomputing the alias sets for a subloop, we cannot · fd2d7c72

Chandler Carruth authored Jan 27, 2017

skip sub-subloops.

The logic to skip subloops dated from when this code was shared with the
cached case. Once it was factored out to only run in the case of
recomputed subloops it became a dangerous bug. If a subsubloop contained
an interfering instruction it would be silently skipped from the alias
sets for LICM.

With the old pass manager this was extremely hard to trigger as it would
require failing to visit these subloops with the LICM pass but then
visiting the outer loop somehow. I've not yet contrived any test case
that actually manages to trigger this.

But with the new pass manager we don't do the cross-loop caching hack
that the old PM does and so we recompute alias set information from
first principles. While this seems much cleaner and simpler it exposed
this bug and would subtly miscompile code due to failing to correctly
model the aliasing constraints of deeply nested loops.

llvm-svn: 293273

fd2d7c72

Fix unused variable warning. · 0b79aa33
Richard Trieu authored Jan 27, 2017
```
llvm-svn: 293260
```
0b79aa33

NewGVN: Add basic dead and redundant store elimination · c479686a

Daniel Berlin authored Jan 27, 2017

Summary:
This adds basic dead and redundant store elimination to
NewGVN.  Unlike our current DSE, it will happily do cross-block DSE if
it meets our requirements.

We get a bunch of DSE's simple.ll cases, and some stuff it doesn't.
Unlike DSE, however, we only try to eliminate stores of the same value
to the same memory location, not just general stores to the same
memory location.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29149

llvm-svn: 293258

c479686a

[NVPTX] [InstCombine] Add llvm_unreachable to appease MSVC. · 25ebe2d7
Justin Lebar authored Jan 27, 2017
```
llvm-svn: 293253
```
25ebe2d7
[NVPTX] Fix use-after-stack-free bug in InstCombineCalls. · e3ac0fb9
Justin Lebar authored Jan 27, 2017
```
Introduced in r293244.

llvm-svn: 293251
```
e3ac0fb9

Constant fold switch inst when looking for trivial conditions to unswitch on. · e5f8d643

Xin Tong authored Jan 27, 2017

Summary: Constant fold switch inst when looking for trivial conditions to unswitch on.

Reviewers: sanjoy, chenli, hfinkel, efriedma

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29037

llvm-svn: 293250

e5f8d643

[PM] Port LoopLoadElimination to the new pass manager and wire it into · baabda93

Chandler Carruth authored Jan 27, 2017

the main pipeline.

This is a very straight forward port. Nothing weird or surprising.

This brings the number of missing passes from the new PM's pipeline down
to three.

llvm-svn: 293249

baabda93

[NVPTX] Upgrade NVVM intrinsics in InstCombineCalls. · 698c31b8

Justin Lebar authored Jan 27, 2017

Summary:
There are many NVVM intrinsics that we can't entirely get rid of, but
that nonetheless often correspond to target-generic LLVM intrinsics.

For example, if flush denormals to zero (ftz) is enabled, we can convert
@llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32.  On the other hand, if ftz is
disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a
non-ftz PTX instruction.  In this case, we can, however, simplify the
non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32.

These transformations are particularly useful because they let us
constant fold instructions that appear in libdevice, the bitcode library
that ships with CUDA and essentially functions as its libm.

Reviewers: tra

Subscribers: hfinkel, majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D28794

llvm-svn: 293244

698c31b8

[LangRef] Make @llvm.sqrt(x) return undef, rather than have UB, for negative x. · cb9b41dd

Justin Lebar authored Jan 27, 2017

Summary:
Some frontends emit a speculate-and-select idiom for sqrt, wherein they compute
sqrt(x), check if x is negative, and select NaN if it is:

  %cmp = fcmp olt double %a, -0.000000e+00
  %sqrt = call double @llvm.sqrt.f64(double %a)
  %ret = select i1 %cmp, double 0x7FF8000000000000, double %sqrt

This is technically UB as the LangRef is written today if %a is ever less than
-0.  But emitting code that's compliant with the current definition of sqrt
would require a branch, which would then prevent us from matching this idiom in
SelectionDAG (which we do today -- ISD::FSQRT has defined behavior on negative
inputs), because SelectionDAG looks at one BB at a time.

Nothing in LLVM takes advantage of this undefined behavior, as far as we can
tell, and the fact that llvm.sqrt has UB dates from its initial addition to the
LangRef.

Reviewers: arsenm, mehdi_amini, hfinkel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D28797

llvm-svn: 293242

cb9b41dd

Revert a couple of InstCombine/Guard checkins · 7516192a

Sanjoy Das authored Jan 26, 2017

This change reverts:

r293061: "[InstCombine] Canonicalize guards for NOT OR condition"
r293058: "[InstCombine] Canonicalize guards for AND condition"

They miscompile cases like:

```
declare void @llvm.experimental.guard(i1, ...)

define void @test_guard_not_or(i1 %A, i1 %B) {
  %C = or i1 %A, %B
  %D = xor i1 %C, true
  call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ]
  ret void
}
```

because they do transfer the `i32 20, i32 30` parameters to newly
created guard instructions.

llvm-svn: 293227

7516192a

Jan 26, 2017

NewGVN: Fix bug exposed by PR31761 · 1ea5f324

Daniel Berlin authored Jan 26, 2017

Summary:
This does not actually fix the testcase in PR31761 (discussion is
ongoing on the testcase), but does fix a bug it exposes, where stores
were not properly clobbering loads.

We accomplish this by unifying the memory equivalence infratructure
back into the normal congruence infrastructure, and then properly
destroying congruence classes when memory state leaders disappear.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29195

llvm-svn: 293216

1ea5f324

[InstCombine] fold (X >>u C) << C --> X & (-1 << C) · 50753f02

Sanjay Patel authored Jan 26, 2017

We already have this fold when the lshr has one use, but it doesn't need that
restriction. We may be able to remove some code from foldShiftedShift().

Also, move the similar:
(X << C) >>u C --> X & (-1 >>u C)
...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst().

That whole function seems questionable since it is called by commonShiftTransforms(),
but there's really not much in common if we're checking the shift opcodes for every
fold.

llvm-svn: 293215

50753f02

NewGVN: Add algorithm overview · db3c7be0
Daniel Berlin authored Jan 26, 2017
```
llvm-svn: 293212
```
db3c7be0