- Mar 02, 2016
-
-
George Burgess IV authored
llvm-svn: 262452
-
Sanjay Patel authored
that is broken by this change llvm-svn: 262440
-
Sanjay Patel authored
As noted in the code comment, I don't think we can do the same transform for *vector* integer comparisons that we do for *scalar* integer comparisons because it might pessimize the general case. Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT for integer vectors. But we should now recognize all the variants of this construct and produce the optimal code for the cases shown in: https://llvm.org/bugs/show_bug.cgi?id=26701 llvm-svn: 262424
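The ISA gap mentioned in the message is easy to see with the SSE2 intrinsics; a minimal sketch (not from the commit) of how the missing predicates must be synthesized from EQ and GT:

```cpp
#include <emmintrin.h>

// SSE2 integer vectors natively provide only equality (_mm_cmpeq_*) and
// signed greater-than (_mm_cmpgt_*); every other predicate is synthesized.
__m128i lt_epi32(__m128i a, __m128i b) {
  return _mm_cmpgt_epi32(b, a);               // a < b is b > a with operands swapped
}

__m128i ne_epi32(__m128i a, __m128i b) {
  return _mm_xor_si128(_mm_cmpeq_epi32(a, b), // invert the result of a == b
                       _mm_set1_epi32(-1));
}
```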
-
- Mar 01, 2016
-
-
Dehao Chen authored
Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17742 llvm-svn: 262419
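A minimal sketch of the ordering constraint, assuming the usual legacy pass manager setup; the factory functions are real, but the header locations are as I recall them, not taken from the patch:

```cpp
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO.h"                      // createSampleProfileLoaderPass (assumed header)
#include "llvm/Transforms/InstCombine/InstCombine.h"  // createInstructionCombiningPass (assumed header)
using namespace llvm;

void addSampleProfilePasses(legacy::PassManager &PM) {
  PM.add(createInstructionCombiningPass()); // first eliminate un-inlinable calls
  PM.add(createSampleProfileLoaderPass());  // then run the sample profile loader
}
```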
-
Owen Anderson authored
Most portions of InstCombine properly propagate fast math flags, but apparently the vector scalarization section was overlooked. llvm-svn: 262376
-
Daniel Berlin authored
Summary: This adds the beginning of an update API to preserve MemorySSA. In particular, this patch adds a way to remove MemorySSA accesses when instructions are deleted. It also adds relevant unit testing infrastructure for MemorySSA's API. (There is an actual user of this API; I will make that diff dependent on this one. In practice, a ton of opt passes remove memory instructions, so it's hopefully an obviously useful API :P) Reviewers: hfinkel, reames, george.burgess.iv Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17157 llvm-svn: 262362
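A minimal sketch of how a transform might use the update API when deleting a dead memory instruction; the entry point and header location are assumed from the description above, not copied from the patch:

```cpp
#include "llvm/Transforms/Utils/MemorySSA.h" // MemorySSA lived here at the time (assumed)
using namespace llvm;

// Keep MemorySSA consistent while erasing a memory instruction:
// remove its access first, then delete the IR.
static void eraseWithMemorySSA(MemorySSA &MSSA, Instruction &I) {
  if (MemoryAccess *MA = MSSA.getMemoryAccess(&I))
    MSSA.removeMemoryAccess(MA); // the removal API this patch adds (name assumed)
  I.eraseFromParent();
}
```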
-
Petar Jovanovic authored
Revert r262337 as "check-llvm ubsan" step failed on sanitizer-x86_64-linux-fast buildbot. llvm-svn: 262349
-
Petar Jovanovic authored
This patch fixes calculating the correct value for the builtin_object_size function when the pointer is used only in the builtin_object_size call and never after that. Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D17337 llvm-svn: 262337
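A made-up example of the pattern the patch targets, where the pointer's only use is the __builtin_object_size call itself:

```cpp
#include <cstddef>

// p is never used after the __builtin_object_size call; with the fix the
// call should still fold to the allocation size (16 here).
size_t known_size() {
  char buf[16];
  char *p = buf;
  return __builtin_object_size(p, 0);
}
```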
-
Sanjay Patel authored
Continuation of: http://reviews.llvm.org/rL262269 llvm-svn: 262273
-
Adam Nemet authored
llvm-svn: 262270
-
Sanjay Patel authored
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392
http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when the load mask is constant:

__m128 mload_zeros(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(0));
}

__m128 mload_fakeones(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(1));
}

__m128 mload_ones(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000));
}

__m128 mload_oneset(float *f) {
  return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0));
}

...so none of the above will actually generate a masked load for optimized code. This is the masked load counterpart to:
http://reviews.llvm.org/rL262064

llvm-svn: 262269
-
- Feb 29, 2016
-
-
Adam Nemet authored
We can actually have dependences between accesses with different underlying types. Bail in this case. A test will follow shortly. llvm-svn: 262267
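An illustrative (made-up) example of such a dependence: the same bytes are accessed through pointers with different underlying types, so assuming independence from the type mismatch would be wrong:

```cpp
// Not from the commit; type-punned purely for illustration.
void f(int *p, int n) {
  float *q = reinterpret_cast<float *>(p); // same memory, different type
  for (int i = 0; i < n; ++i) {
    p[i] = 0;      // access with underlying type i32
    q[i] += 1.0f;  // access with underlying type float: a real dependence
  }
}
```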
-
Adam Nemet authored
Summary: I re-benchmarked this and results are similar to the original results in D13259:

On ARM64:
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
SingleSource/Benchmarks/Polybench/stencils/adi -19.78%

On x86:
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -27.14%

And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop Distribution.

In terms of compile time, there is a ~5% increase on both SingleSource/Benchmarks/Misc/oourafft and SingleSource/Benchmarks/Linpack/linpack-pc. These are both very tiny loop-intensive programs where SCEV computation dominates compile time.

The reason that time spent in SCEV increases has to do with the design of the old pass manager: if a transform pass does not preserve an analysis, we *invalidate* the analysis even if there was *no* modification made by the transform pass. This means that currently we don't take advantage of LLE and LV sharing the same analysis (LAA), and unfortunately we recompute LAA *and* SCEV for LLE.

(There should be a way to work around this limitation in the case of SCEV and LAA, since both compute things on demand and internally cache their results. Thus we could pretend that transform passes preserve these analyses and manually invalidate them upon actual modification. On the other hand, the new pass manager is supposed to solve this, so I am not sure if the workaround is worthwhile.)

Reviewers: hfinkel, dberlin Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D16300 llvm-svn: 262250
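A sketch of the legacy-pass-manager behavior described above, with a made-up pass name; unless an analysis is explicitly marked preserved, it is invalidated after the pass runs even if nothing was modified:

```cpp
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Pass.h"
using namespace llvm;

struct MyTransform : FunctionPass { // hypothetical pass, for illustration
  static char ID;
  MyTransform() : FunctionPass(ID) {}
  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<ScalarEvolutionWrapperPass>();
    // No AU.addPreserved<ScalarEvolutionWrapperPass>() here, so the old
    // pass manager recomputes SCEV for the next user -- modified or not.
  }
  bool runOnFunction(Function &F) override { return false; }
};
char MyTransform::ID = 0;
```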
-
Rong Xu authored
llvm-svn: 262242
-
Dehao Chen authored
Summary: Now discriminator is assigned per-function instead of per-module. Reviewers: davidxl, dnovillo Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D17664 llvm-svn: 262240
-
- Feb 28, 2016
-
-
Xinliang David Li authored
Differential Revision: http://reviews.llvm.org/D17654 llvm-svn: 262157
-
- Feb 27, 2016
-
-
Renato Golin authored
This reverts commit r262103, as it broke all ARM and AArch64 bots. llvm-svn: 262139
-
Sean Silva authored
Summary: The PS4 linker seems to handle this fine. Hi David, it seems that indeed most ELF linkers support __{start,stop}_SECNAME, as our proprietary linker does as well. This follows the pattern of r250679 w.r.t. the testing. Maggie, Phillip, Paul: I've tested this with the PS4 SDK 3.5 toolchain prerelease and it seems to work fine. Reviewers: davidxl Subscribers: probinson, phillip.power, MaggieYi Differential Revision: http://reviews.llvm.org/D17672 llvm-svn: 262112
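For readers unfamiliar with the convention: for any section whose name is a valid C identifier, ELF linkers define `__start_SECNAME`/`__stop_SECNAME` symbols bracketing the section's contents. A small sketch with a made-up section name:

```cpp
// The linker defines these for a section named "payload" (example name):
extern char __start_payload[], __stop_payload[];

__attribute__((section("payload"))) static const char Tag[] = "hello";

static long payload_bytes() {
  return __stop_payload - __start_payload; // size of everything placed in the section
}
```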
-
Mike Aizatsky authored
llvm-svn: 262111
-
Kostya Serebryany authored
[libFuzzer] don't emit callbacks to sanitizer run-time in -fsanitize-coverage=trace-pc mode; update libFuzzer doc for previous commit llvm-svn: 262110
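In trace-pc mode the instrumentation emits a call to a single hook at every instrumented point, and libFuzzer (or the user) supplies the definition instead of linking the sanitizer run-time callbacks. A minimal sketch of such a hook; the body is illustrative, not libFuzzer's actual implementation:

```cpp
// Called by -fsanitize-coverage=trace-pc instrumentation at every
// instrumented point in the program.
extern "C" void __sanitizer_cov_trace_pc() {
  void *PC = __builtin_return_address(0); // the covered program counter
  (void)PC; // record PC in a coverage set/bitmap ...
}
```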
-
Chandler Carruth authored
merged into a loop that was subsequently unrolled (or otherwise nuked). In this case it can't merge in the ASTs for any remaining nested loops; it needs to re-add their instructions directly. The fix is very isolated, but I've pulled the code for merging blocks into the AST into a single place in the process. The only behavior change is in the case which would have crashed before. This fixes a crash reported by Mikael Holmen on the list after r261316 restored much of the loop pass pipelining and allowed us to actually do this kind of nested transformation sequence. I've taken that test case and further reduced it into the somewhat twisty maze of loops in the included test case. This does in fact trigger the bug even in this reduced form. llvm-svn: 262108
-
Mike Aizatsky authored
Summary: Without tree pruning clang has 2,667,552 points. With only dominators pruning: 1,515,586. With both dominators & predominators pruning: 1,340,534. Differential Revision: http://reviews.llvm.org/D17671 llvm-svn: 262103
-
Reid Kleckner authored
We ended up removing a save/restore pair around an inalloca call, leading to a miscompile in Chromium. llvm-svn: 262095
-
- Feb 26, 2016
-
-
Sanjay Patel authored
Replicate everything for integers...because x86. Continuation of: http://reviews.llvm.org/rL262064 llvm-svn: 262077
-
Sanjay Patel authored
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392
http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when the store mask is constant:

void mstore_zero_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set1_epi32(0), v);
}

void mstore_fake_ones_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set1_epi32(1), v);
}

void mstore_ones_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set1_epi32(0x80000000), v);
}

void mstore_one_set_elt_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0), v);
}

...so none of the above will actually generate a masked store for optimized code.

Differential Revision: http://reviews.llvm.org/D17485

llvm-svn: 262064
-
Haicheng Wu authored
This change tries to find more opportunities to thread over basic blocks. llvm-svn: 261981
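A classic shape that jump threading targets, as a made-up illustration; the second branch on `c` can be threaded from each predecessor where `c`'s value is already known:

```cpp
int f(bool c) {
  int x;
  if (c) x = 1; else x = 2;
  // Threading duplicates the block below into each predecessor, turning
  // the re-test of c into a direct jump.
  if (c) return x + 10; // only reachable with x == 1 after threading
  return x;             // only reachable with x == 2 after threading
}
```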
-
Michael Zolotukhin authored
Summary: Check that we're using SCEV for the same loop we're simulating. Otherwise, we might try to use the iteration number of the current loop in SCEV expressions for inner/outer loops IVs, which is clearly incorrect. Reviewers: chandlerc, hfinkel Subscribers: sanjoy, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17632 llvm-svn: 261958
-
Mike Aizatsky authored
Summary: This is the first simple attempt to reduce the number of coverage-instrumented blocks. If a basic block dominates all its successors, then its coverage information is useless to us. Ignore such blocks if the sanitizer-coverage-prune-tree option is set. Differential Revision: http://reviews.llvm.org/D17626 llvm-svn: 261949
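An illustration of the pruning rule on a trivial function; the entry block dominates everything below it, so its own coverage point is redundant:

```cpp
// Made-up example: covering either block below already proves the entry
// block ran, so instrumenting the entry adds no information.
void f(bool c) {
  // entry block: dominates all its successors -> prunable
  if (c) {
    // conditionally executed block: still instrumented
  }
  // join block: still instrumented
}
```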
-
- Feb 24, 2016
-
-
Anna Zaks authored
llvm-svn: 261794
-
David Majnemer authored
The cleanupret instruction has an invariant that its 'from' operand be a cleanuppad. This invariant was violated when we removed a dead block which removed a cleanuppad, leaving behind a cleanupret with an undef 'from' operand. This was solved in r261731 by staving off the removal of the dead block to a later pass. However, it occurred to me that we do not need to do this. Instead, we can simply avoid processing the cleanupret if it has an undef 'from' operand, because we know that it will be removed soon. llvm-svn: 261754
-
Sanjay Patel authored
This is part of the payoff for the refactoring in: http://reviews.llvm.org/rL261649 http://reviews.llvm.org/rL261707 In addition to removing a pile of duplicated code, the xor case was missing the optimization for vector types because it checked "SrcTy->isIntegerTy()" rather than "SrcTy->isIntOrIntVectorTy()" like 'and' and 'or' were already doing. This solves part of: https://llvm.org/bugs/show_bug.cgi?id=26702 llvm-svn: 261750
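The two `llvm::Type` predicates the commit contrasts, in a sketch (not the patch itself):

```cpp
#include "llvm/IR/Type.h"
using namespace llvm;

bool scalarOnly(Type *SrcTy) {
  return SrcTy->isIntegerTy();        // true for i32, false for <4 x i32>
}
bool scalarOrVector(Type *SrcTy) {
  return SrcTy->isIntOrIntVectorTy(); // true for both i32 and <4 x i32>
}
```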
-
Artur Pilipenko authored
This is a part of the refactoring to unify the isSafeToLoadUnconditionally and isDereferenceablePointer functions. In a subsequent change I'm going to eliminate isDereferenceableAndAlignedPointer from the Loads API, leaving isSafeToLoadSpeculatively as the only function to check whether a load instruction can be speculated. Reviewed By: hfinkel Differential Revision: http://reviews.llvm.org/D16180 llvm-svn: 261736
-
David Majnemer authored
DeleteDeadBlock was called indiscriminately, leading to cleanuprets with undef cleanuppad references. Instead, try to drain the BB of most of its instructions if it is unreachable. We can then remove the BB if it solely consists of a terminator (and maybe some phis). llvm-svn: 261731
-
Sanjay Patel authored
Note: The 'and' case in foldCastedBitwiseLogic() is inheriting one extra check from the nearly identical 'or' case: if (!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) But I'm not sure how to expose that difference in a regression test. Without that check, the 'or' path will infinite loop on: test/Transforms/InstCombine/zext-or-icmp.ll because the zext-or-icmp fold is attempting a reverse transform. The refactoring should extend to the 'xor' case next to solve part of PR26702. llvm-svn: 261707
-
- Feb 23, 2016
-
-
Sanjay Patel authored
Less indenting, named local variables, more descriptive names. llvm-svn: 261659
-
David Majnemer authored
It is problematic if the inlinee has a cleanupret which unwinds to caller and we inline it into a call site which doesn't unwind. If the funclet unwinds anywhere other than to the caller, then we will give the funclet two unwind destinations. This will result in a verifier failure. Seeing as how the caller wasn't an invoke (which would locally unwind) and that the funclet cannot unwind to caller, we must conclude that an 'unwind to caller' cleanupret is dynamically unreachable. This fixes PR26698. Differential Revision: http://reviews.llvm.org/D17536 llvm-svn: 261656
-
Sanjay Patel authored
llvm-svn: 261652
-
Sanjay Patel authored
This is a straight cut and paste of the existing code and is intended to be the first step in solving part of PR26702: https://llvm.org/bugs/show_bug.cgi?id=26702 We should be able to reuse most of this and delete the nearly identical existing code in visitOr(). Then, we can enhance visitXor() to use the same code too. llvm-svn: 261649
-
Michael Zolotukhin authored
llvm-svn: 261600
-
Michael Zolotukhin authored
llvm-svn: 261597
-