- Mar 08, 2016
Anna Zaks authored
TSan instrumentation functions for atomic stores, loads, and cmpxchg work on integer value types. This patch adds casts before calling TSan instrumentation functions in cases where the value is a pointer.

Differential Revision: http://reviews.llvm.org/D17833

llvm-svn: 262876
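For illustration, a minimal IR sketch of the kind of rewrite described (the runtime entry point is real, but the exact names and the memory-order encoding here are assumptions, not taken from the patch):

    ; before: an atomic store of a pointer value
    store atomic i8* %val, i8** %addr seq_cst, align 8

    ; after instrumentation (sketch): the address and value are cast to
    ; the integer types the TSan runtime function expects
    %addr.i = bitcast i8** %addr to i64*
    %val.i  = ptrtoint i8* %val to i64
    call void @__tsan_atomic64_store(i64* %addr.i, i64 %val.i, i32 5) ; 5 ~ seq_cst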
- Mar 07, 2016
Adam Nemet authored
This lets select sub-targets enable this pass. The patch implements the idea from the recent llvm-dev thread:
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/94925

The goal is to enable the LoopDataPrefetch pass for the Cyclone sub-target only within AArch64. Positive and negative tests will be included in an upcoming patch that enables selective prefetching of large-strided accesses on Cyclone.

llvm-svn: 262844
Adam Nemet authored
This reverts commit r262250.

It causes SPEC2006/gcc to generate a wrong result (166.s) on AArch64 when running with the *ref* data set. The error happens with "-Ofast -flto -fuse-ld=gold" or "-O3 -fno-strict-aliasing".

llvm-svn: 262839
Chandler Carruth authored
This code has been successfully used to bootstrap libc++ in a no-asserts mode for a very long time, so the code that follows cannot be completely incorrect. I've added a test that shows the current behavior for this kind of code with DFSan. If it is desirable for DFSan to do something special when processing an invoke of a variadic function, it can be added, but we shouldn't keep an assert that we've been ignoring in release builds anyway.

llvm-svn: 262829
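For reference, the construct in question looks like this in IR (a hypothetical sketch, not the committed test):

    ; an invoke of a variadic function, e.g. a printf-style call inside
    ; a C++ try block; the removed assert fired on IR of this shape
    invoke i32 (i8*, ...) @printf(i8* %fmt, i32 %x)
        to label %cont unwind label %lpad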
- Mar 04, 2016
Rong Xu authored
llvm-svn: 262750
Easwaran Raman authored
llvm-svn: 262679
Guozhi Wei authored
This patch enhances InstCombine to handle the following case:

    A -> B bitcast
    PHI
    B -> A bitcast

llvm-svn: 262670
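A minimal sketch of that pattern in IR (names and surrounding code hypothetical):

    entry:
      %b = bitcast double %x to i64                   ; A -> B bitcast
      br label %loop
    loop:
      %p = phi i64 [ %b, %entry ], [ %b.next, %loop ] ; PHI over type B
      %a = bitcast i64 %p to double                   ; B -> A bitcast
      ; ... compute %a.next from %a as a double ...
      %b.next = bitcast double %a.next to i64         ; A -> B again
      br label %loop

After the combine, the PHI can carry the double directly and the bitcasts disappear.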
- Mar 03, 2016
Sanjay Patel authored
Given that we're not actually reducing the instruction count in the included regression tests, I think we would call this a canonicalization step. The motivation comes from the example in PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702

If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable example of:

    define <4 x i32> @is_negative(<4 x i32> %x) {
      %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
      %not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
      %bc = bitcast <4 x i32> %not to <2 x i64>
      %notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
      %bc2 = bitcast <2 x i64> %notnot to <4 x i32>
      ret <4 x i32> %bc2
    }

simplifies to the expected:

    define <4 x i32> @is_negative(<4 x i32> %x) {
      %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
      ret <4 x i32> %lobit
    }

Differential Revision: http://reviews.llvm.org/D17583

llvm-svn: 262645
Easwaran Raman authored
This patch provides the following infrastructure for PGO enhancements in the inliner:

- Enable the use of block-level profile information in the inliner
- Incremental update of block frequency information during inlining
- Update the function entry counts of callees when they get inlined into callers

Differential Revision: http://reviews.llvm.org/D16381

llvm-svn: 262636
Dehao Chen authored
Summary:
With a discriminator, LineLocation can uniquely identify a callsite without the need to specify the callee name. Remove the callee function name from the key, and put it in the value (FunctionSamples).

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17827

llvm-svn: 262634
Matthew Simpson authored
The vectorization of first-order recurrences (r261346) caused PR26734. When detecting these recurrences, we need to ensure that the previous value is actually defined inside the loop. This patch includes the fix and a test case.

llvm-svn: 262624
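A sketch of a first-order recurrence in IR, assuming this is the shape being checked (the "previous" value, %val here, must itself be defined inside the loop rather than being loop-invariant):

    loop:
      %pre = phi i32 [ %init, %entry ], [ %val, %loop ]  ; previous iteration's value
      %val = load i32, i32* %ptr                         ; defined inside the loop
      %sum = add i32 %pre, %val
      ; ...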
- Mar 02, 2016
Amaury Sechet authored
Summary:
This is the last step toward supporting aggregate memory access in instcombine. This explodes stores of arrays into a series of stores, one for each element, allowing them to be optimized.

Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17828

llvm-svn: 262530
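As an illustrative sketch (not the committed test), the transform turns an aggregate store into per-element stores:

    ; before: a single store of a first-class array value
    store [2 x i32] %agg, [2 x i32]* %ptr

    ; after (sketch): one store per element
    %e0 = extractvalue [2 x i32] %agg, 0
    %p0 = getelementptr inbounds [2 x i32], [2 x i32]* %ptr, i64 0, i64 0
    store i32 %e0, i32* %p0
    %e1 = extractvalue [2 x i32] %agg, 1
    %p1 = getelementptr inbounds [2 x i32], [2 x i32]* %ptr, i64 0, i64 1
    store i32 %e1, i32* %p1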
Amaury Sechet authored
Summary:
This is another step toward improving fca support. This unpacks a load of an array into a series of loads of the array's elements.

Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15890

llvm-svn: 262521
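The load-side counterpart, as a hedged sketch along the same lines:

    ; before: a single load of a first-class array value
    %agg = load [2 x i32], [2 x i32]* %ptr

    ; after (sketch): per-element loads rebuilt into the aggregate
    %p0 = getelementptr inbounds [2 x i32], [2 x i32]* %ptr, i64 0, i64 0
    %e0 = load i32, i32* %p0
    %a0 = insertvalue [2 x i32] undef, i32 %e0, 0
    %p1 = getelementptr inbounds [2 x i32], [2 x i32]* %ptr, i64 0, i64 1
    %e1 = load i32, i32* %p1
    %a1 = insertvalue [2 x i32] %a0, i32 %e1, 1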
Daniel Berlin authored
llvm-svn: 262519
Daniel Berlin authored
This reverts commit 890bbccd600ba1eb050353d06a29650ad0f2eb95.

llvm-svn: 262512
Daniel Berlin authored
llvm-svn: 262511
Chandler Carruth authored
parts of the AA interface out of the base class of every single AA result object.

Because this logic reformulates the query in terms of some other aspect of the API, it would easily cause O(n^2) query patterns in alias analysis. These could in turn be magnified further based on the number of call arguments, and then further based on the number of AA queries made for a particular call. This ended up causing problems for Rust that were actually noticeable enough to get a bug (PR26564) and probably other places as well.

When originally re-working the AA infrastructure, the desire was to regularize the pattern of refinement without losing any generality. While I think it was successful, that is clearly proving to be too costly. And the cost is needless: we gain no actual improvement for this generality of making a direct query to tbaa actually be able to re-use some other alias analysis's refinement logic for one of the other APIs, or some such. In short, this is entirely wasted work.

To the extent possible, delegation to other API surfaces should be done at the aggregation layer so that we can avoid re-walking the aggregation. In fact, this significantly simplifies the logic as we no longer need to smuggle the aggregation layer into each alias analysis (or the TargetLibraryInfo into each alias analysis just so we can form argument memory locations!).

However, we also have some delegation logic inside of BasicAA, and some of it even makes sense. When the delegation logic is baking in specific knowledge of aliasing properties of the LLVM IR, as opposed to simply reformulating the query to utilize a different alias analysis interface entry point, it makes a lot of sense to restrict that logic to a different layer such as BasicAA. So one aspect of the delegation that was in every AA base class is that when we don't have operand bundles, we re-use function AA results as a fallback for callsite alias results. This relies on the IR properties of calls and functions w.r.t. aliasing, and so seems a better fit to BasicAA. I've lifted the logic up to the point where it seems to be a natural fit. This still does a bit of redundant work (we query function attributes twice, once via the callsite and once via the function AA query) but it is *exactly* twice here, no more.

The end result is that all of the delegation logic is hoisted out of the base class and into either the aggregation layer when it is a pure retargeting to a different API surface, or into BasicAA when it relies on the IR's aliasing properties. This should fix the quadratic query pattern reported in PR26564, although I don't have a stand-alone test case to reproduce it. It also seems like general goodness.

Now the numerous AAs that don't need target library info don't carry it around and depend on it. I think I can even rip out the general access to the aggregation layer and only expose that in BasicAA as it is the only place where we re-query in that manner. However, this is a non-trivial change to the AA infrastructure, so I want to get some additional eyes on this before it lands. Sadly, it can't wait long because we should really cherry-pick this into 3.8 if we're going to go this route.

Differential Revision: http://reviews.llvm.org/D17329

llvm-svn: 262490
George Burgess IV authored
llvm-svn: 262452
Sanjay Patel authored
that is broken by this change

llvm-svn: 262440
Sanjay Patel authored
As noted in the code comment, I don't think we can do the same transform that we do for *scalar* integer comparisons for *vector* integer comparisons, because it might pessimize the general case. Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT for integer vectors. But we should now recognize all the variants of this construct and produce the optimal code for the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701

llvm-svn: 262424
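One shape of the construct, assumed from the PR reference (not the committed tests):

    ; "not (x > y)" on vectors: the compare result is sign-extended and
    ; then inverted with an all-ones xor
    %cmp  = icmp sgt <4 x i32> %x, %y
    %sext = sext <4 x i1> %cmp to <4 x i32>
    %not  = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
    ; with only EQ/GT available, this should lower to a single compare
    ; with a swapped/inverted predicate rather than a compare plus NOT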
- Mar 01, 2016
Dehao Chen authored
Summary:
SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17742

llvm-svn: 262419
Owen Anderson authored
Most portions of InstCombine properly propagate fast math flags, but apparently the vector scalarization section was overlooked.

llvm-svn: 262376
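A sketch of the fix's effect on InstCombine's extractelement scalarization (hypothetical values):

    ; before: extracting one lane of a vector op carrying fast-math flags
    %v = fadd fast <2 x float> %a, %b
    %e = extractelement <2 x float> %v, i32 0

    ; after scalarization, the flags must be carried over:
    %a0 = extractelement <2 x float> %a, i32 0
    %b0 = extractelement <2 x float> %b, i32 0
    %e  = fadd fast float %a0, %b0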
Daniel Berlin authored
Summary:
This adds the beginning of an update API to preserve MemorySSA. In particular, this patch adds a way to remove memory SSA accesses when instructions are deleted. It also adds relevant unit testing infrastructure for MemorySSA's API.

(There is an actual user of this API; I will make that diff dependent on this one. In practice, a ton of opt passes remove memory instructions, so it's hopefully an obviously useful API :P)

Reviewers: hfinkel, reames, george.burgess.iv

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17157

llvm-svn: 262362
Petar Jovanovic authored
Revert r262337 as the "check-llvm ubsan" step failed on the sanitizer-x86_64-linux-fast buildbot.

llvm-svn: 262349
Petar Jovanovic authored
This patch fixes calculating the correct value for the builtin_object_size function when the pointer is used only in a builtin_object_size function call and never after that.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D17337

llvm-svn: 262337
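A minimal sketch of the case described (the example itself is hypothetical; the intrinsic signature is the era's two-argument form):

    ; %p is used only by the objectsize call and never afterwards
    %p  = call i8* @malloc(i64 32)
    %sz = call i64 @llvm.objectsize.i64.p0i8(i8* %p, i1 false)
    ; with the fix, %sz can be computed as 32 rather than "unknown"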
Sanjay Patel authored
Continuation of:
http://reviews.llvm.org/rL262269

llvm-svn: 262273
Adam Nemet authored
llvm-svn: 262270
Sanjay Patel authored
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392
http://reviews.llvm.org/rL260145

is that customers using the AVX intrinsics in C will benefit from combines when the load mask is constant:

    __m128 mload_zeros(float *f) {
      return _mm_maskload_ps(f, _mm_set1_epi32(0));
    }

    __m128 mload_fakeones(float *f) {
      return _mm_maskload_ps(f, _mm_set1_epi32(1));
    }

    __m128 mload_ones(float *f) {
      return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000));
    }

    __m128 mload_oneset(float *f) {
      return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0));
    }

...so none of the above will actually generate a masked load for optimized code.

This is the masked load counterpart to:
http://reviews.llvm.org/rL262064

llvm-svn: 262269
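At the IR level, the same idea can be sketched with the target-independent masked-load intrinsic rather than the x86-specific one (the signature shown is the LLVM 3.8-era form, assumed here):

    ; an all-false constant mask loads no lanes, so the call folds to
    ; the passthru operand
    %r = call <4 x float> @llvm.masked.load.v4f32(<4 x float>* %p, i32 4,
             <4 x i1> zeroinitializer, <4 x float> zeroinitializer)
    ; -> %r folds to zeroinitializer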
- Feb 29, 2016
Adam Nemet authored
We can actually have dependences between accesses with different underlying types. Bail in this case. A test will follow shortly.

llvm-svn: 262267
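A hypothetical sketch of such a mixed-type pair of accesses inside a loop:

    ; a store and a load of the same address through different types;
    ; the analysis must treat these as a potential dependence
    %p.f = getelementptr inbounds float, float* %base, i64 %iv
    %p.i = bitcast float* %p.f to i32*
    store i32 %v, i32* %p.i
    %ld  = load float, float* %p.f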
Adam Nemet authored
Summary:
I re-benchmarked this and the results are similar to the original results in D13259:

On ARM64:
  SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
  SingleSource/Benchmarks/Polybench/stencils/adi -19.78%

On x86:
  SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -27.14%

And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop Distribution.

In terms of compile time, there is a ~5% increase on both SingleSource/Benchmarks/Misc/oourafft and SingleSource/Benchmarks/Linpack/linpack-pc. These are both very tiny loop-intensive programs where SCEV computation dominates compile time.

The reason that time spent in SCEV increases has to do with the design of the old pass manager. If a transform pass does not preserve an analysis we *invalidate* the analysis even if there was *no* modification made by the transform pass. This means that currently we don't take advantage of LLE and LV sharing the same analysis (LAA), and unfortunately we recompute LAA *and* SCEV for LLE.

(There should be a way to work around this limitation in the case of SCEV and LAA since both compute things on demand and internally cache their results. Thus we could pretend that transform passes preserve these analyses and manually invalidate them upon actual modification. On the other hand, the new pass manager is supposed to solve this, so I am not sure if this is worthwhile.)

Reviewers: hfinkel, dberlin

Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16300

llvm-svn: 262250
Rong Xu authored
llvm-svn: 262242
Dehao Chen authored
Summary:
Now the discriminator is assigned per-function instead of per-module.

Reviewers: davidxl, dnovillo

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D17664

llvm-svn: 262240
- Feb 28, 2016
Xinliang David Li authored
Differential Revision: http://reviews.llvm.org/D17654

llvm-svn: 262157
- Feb 27, 2016
Renato Golin authored
This reverts commit r262103, as it broke all ARM and AArch64 bots.

llvm-svn: 262139
Sean Silva authored
Summary:
The PS4 linker seems to handle this fine.

Hi David, it seems that indeed most ELF linkers support __{start,stop}_SECNAME, as our proprietary linker does as well.

This follows the pattern of r250679 w.r.t. the testing.

Maggie, Phillip, Paul: I've tested this with the PS4 SDK 3.5 toolchain prerelease and it seems to work fine.

Reviewers: davidxl

Subscribers: probinson, phillip.power, MaggieYi

Differential Revision: http://reviews.llvm.org/D17672

llvm-svn: 262112
Mike Aizatsky authored
llvm-svn: 262111
Kostya Serebryany authored
[libFuzzer] don't emit callbacks to sanitizer run-time in -fsanitize-coverage=trace-pc mode; update libFuzzer doc for previous commit

llvm-svn: 262110
Chandler Carruth authored
merged into a loop that was subsequently unrolled (or otherwise nuked). In this case it can't merge in the ASTs for any remaining nested loops; it needs to re-add their instructions directly.

The fix is very isolated, but I've pulled the code for merging blocks into the AST into a single place in the process. The only behavior change is in the case which would have crashed before.

This fixes a crash reported by Mikael Holmen on the list after r261316 restored much of the loop pass pipelining and allowed us to actually do this kind of nested transformation sequence. I've taken that test case and further reduced it into the somewhat twisty maze of loops in the included test case. This does in fact trigger the bug even in this reduced form.

llvm-svn: 262108
Mike Aizatsky authored
Summary:
Without tree pruning clang has 2,667,552 points.
With only dominators pruning: 1,515,586.
With both dominators & predominators pruning: 1,340,534.

Differential Revision: http://reviews.llvm.org/D17671

llvm-svn: 262103
Reid Kleckner authored
We ended up removing a save/restore pair around an inalloca call, leading to a miscompile in Chromium.

llvm-svn: 262095