Commits · 9364fa3434b6967b796e1eedc480198806ead916 · Lorenzo Albano / LLVM bpEVL

Dec 04, 2017

Move splitIndirectCriticalEdges() to BasicBlockUtils.h. · 9364fa34

Hiroshi Yamauchi authored Dec 04, 2017

Summary:
Move splitIndirectCriticalEdges() from CodeGenPrepare to BasicBlockUtils.h so
that it can be called from other places.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40750

llvm-svn: 319689

[BypassSlowDivision] Improve our handling of divisions by constants · aa92cae1

Sanjoy Das authored Dec 04, 2017

(This reapplies r314253. r314253 was reverted in r314482 because of a
correctness regression on P100, but that regression turned out to be caused by
something else.)

Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow. This gives us a 32-bit multiply instead of an
emulated 64-bit multiply in the generated PTX assembly.

Reviewers: jlebar

Subscribers: jholewinski, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D38265

llvm-svn: 319677

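As a rough source-level illustration (not the pass's actual IR transformation), the bypass idea can be sketched in C++. `divide_with_bypass` and `divide_masked_by_constant` are hypothetical names: the first adds a run-time check and uses a 32-bit division when both 64-bit operands fit in 32 bits; the second shows the kind of case this commit is about, a constant divisor with a dividend already known to fit, where narrowing needs no extra control flow.

```cpp
#include <cstdint>

// Hypothetical sketch of the run-time bypass: when both operands fit in
// 32 bits, a 32-bit unsigned division gives the same quotient as the
// slower 64-bit one. The caller must ensure b != 0.
uint64_t divide_with_bypass(uint64_t a, uint64_t b) {
  if (((a | b) >> 32) == 0) {
    return static_cast<uint32_t>(a) / static_cast<uint32_t>(b);
  }
  return a / b;  // general 64-bit path
}

// Hypothetical constant-divisor case: the mask proves the dividend fits in
// 32 bits and the divisor 7 fits as well, so the division can be narrowed
// with no run-time check and no extra control flow. A backend can then lower
// the 32-bit division by a constant to a 32-bit multiply-and-shift sequence.
uint64_t divide_masked_by_constant(uint64_t a) {
  uint32_t narrow = static_cast<uint32_t>(a & 0xFFFFFFFFu);
  return narrow / 7u;
}
```

The mask in the second function stands in for whatever lets the compiler prove the dividend fits in 32 bits; that choice is an assumption made for the sketch.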

[Loop Predication] Teach LP about reverse loops · 7b360434

Anna Thomas authored Dec 04, 2017

Summary:
Currently, we only support predication for forward loops with step
of 1. This patch enables loop predication for reverse or
countdown loops, which satisfy the following conditions:
   1. The step of the IV is -1.
   2. The loop has a single latch as B(X) = X <pred> latchLimit, with pred as s> or u>.
   3. The IV of the guard is the decremented IV of the latch condition
      (Guard is: G(X) = X-1 u< guardLimit).

This patch was downstream for a while and is the last in the series of patches
from our downstream LP implementation.

Reviewers: apilipenko, mkazantsev, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40353

llvm-svn: 319659

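The conditions above can be checked against a small, simplified model. This is a sketch under stated assumptions: unsigned comparisons only, the latch tests the already decremented IV, and `all_guards_pass` and `predicated_check` are hypothetical names. In this model the guarded values decrease from `start - 1`, so a single check before the loop is equivalent to checking the guard on every iteration. The real pass reasons about SCEV expressions and also handles the signed (s>) latch, which this sketch does not.

```cpp
#include <cstdint>

// Simplified, hypothetical countdown loop (step -1, unsigned compares):
//   i = start;
//   do { guard(i - 1 u< guardLimit); i = i - 1; } while (i u> latchLimit);
// Returns true if every guard executed by the loop passes.
bool all_guards_pass(uint32_t start, uint32_t latchLimit, uint32_t guardLimit) {
  uint32_t i = start;
  do {
    if (!(i - 1u < guardLimit))  // guard G(X) = X-1 u< guardLimit
      return false;
    i = i - 1u;                  // step of -1
  } while (i > latchLimit);      // latch B(X) = X u> latchLimit
  return true;
}

// The guarded values run from start-1 down toward latchLimit, so the largest
// one is start-1 (with unsigned wraparound when start == 0). One loop-invariant
// check of that value therefore covers all iterations in this model.
bool predicated_check(uint32_t start, uint32_t guardLimit) {
  return start - 1u < guardLimit;
}
```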

Dec 01, 2017

[IndVars] Fix a bug introduced in r317012 · 6260cf71

Philip Reames authored Dec 01, 2017

Turns out we can have comparisons which are indirect users of the induction variable that we can make invariant. In this case, there is no loop invariant value contributing and we'd fail an assert.

The test case was found by a Java fuzzer and reduced. It's a real corner case: you have to have a static loop which we've already proven executes only once, but whose backedge we haven't broken yet, and an inner phi whose result SCEV can constant fold using exit count reasoning but isKnownPredicate cannot prove. To my knowledge, only the fuzzer has hit this case.

llvm-svn: 319583

Revert r319531 "[SLPVectorizer] Failure to beneficially vectorize 'copyable'... · e2470b95

Hans Wennborg authored Dec 01, 2017

Revert r319531 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops."

It causes builds to fail with "Instruction does not dominate all uses" (PR35497).

> Patch tries to improve vectorization of the following code:
>
> void add1(int * __restrict dst, const int * __restrict src) {
>   *dst++ = *src++;
>   *dst++ = *src++ + 1;
>   *dst++ = *src++ + 2;
>   *dst++ = *src++ + 3;
> }
> Allows to vectorize even if the very first operation is not a binary add, but just a load.
>
> Fixed issues related to previous commit.
>
> Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
>
> Reviewed By: ABataev, RKSimon
>
> Subscribers: llvm-commits, RKSimon
>
> Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 319550

Revert r319537: Bail out of a SimplifyCFG switch table opt at undef values. · 9c13c8b6

Mikael Holmen authored Dec 01, 2017

Broke build bots so reverting.

llvm-svn: 319539

Bail out of a SimplifyCFG switch table opt at undef values. · 9f047795

Mikael Holmen authored Dec 01, 2017

Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This patch fixes that by bailing out if we encounter undef.

The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.

Patch by: JesperAntonsson

Reviewers: spatel, eeckstein, hans

Reviewed By: hans

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40639

llvm-svn: 319537

[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. · 29e86584

Dinar Temirbulatov authored Dec 01, 2017

    
Patch tries to improve vectorization of the following code:

void add1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}
Allows vectorization even if the very first operation is not a binary add, but just a load.

Fixed issues related to previous commit.

Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev

Reviewed By: ABataev, RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 319531

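The idea of a "copyable" element can be sketched at the source level: the plain copy in the first statement behaves like `*src + 0`, so all four lanes become an add with a per-lane constant, the uniform shape a 4-wide vector add handles. `add1_uniform` is a hypothetical rewrite written to show that shape; it is not code produced by the vectorizer.

```cpp
// Original pattern from the commit message: lane 0 is a plain copy.
void add1(int *__restrict dst, const int *__restrict src) {
  *dst++ = *src++;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}

// Hypothetical uniform form: the copy is treated as an add of the identity
// constant 0, so every lane performs the same operation (src[i] + k[i]).
void add1_uniform(int *__restrict dst, const int *__restrict src) {
  const int k[4] = {0, 1, 2, 3};
  for (int lane = 0; lane < 4; ++lane)
    dst[lane] = src[lane] + k[lane];
}
```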

Recommit rL319407: [SROA] enable splitting for non-whole-alloca loads and stores · 48e4c7aa

Hiroshi Inoue authored Dec 01, 2017

Recommitting the previously reverted patch rL319407 after adding a check on the bit vector size to avoid failures on some build bots.

llvm-svn: 319522

Mark all library options as hidden. · 8065f0b9

Zachary Turner authored Dec 01, 2017

These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.

This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.

Differential Revision: https://reviews.llvm.org/D40674

llvm-svn: 319505

ThinLTOBitcodeWriter: Try harder to discard unused references to the merged module. · 1f034226

Peter Collingbourne authored Nov 30, 2017

If the thin module has no references to an internal global in the
merged module, we need to make sure to preserve that property if the
global is a member of a comdat group, as otherwise promotion can end
up adding global symbols to the comdat, which is not allowed.

This situation can arise if the external global in the thin module
has dead constant users, which would cause use_empty() to return
false and would cause us to try to promote it. To prevent this from
happening, discard the dead constant users before asking whether a
global is empty.

Differential Revision: https://reviews.llvm.org/D40593

llvm-svn: 319494

Nov 30, 2017

[memcpyopt] Teach memcpyopt to optimize across basic blocks · 59e4c0b9

Dan Gohman authored Nov 30, 2017

This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.

Fixes PR28958.

Differential Revision: https://reviews.llvm.org/D38374

llvm-svn: 319482

[PGO] Skip counter promotion for infinite loops · c23d2c68

Xinliang David Li authored Nov 30, 2017

Differential Revision: http://reviews.llvm.org/D40662

llvm-svn: 319462

Revert rL319407: [SROA] enable splitting for non-whole-alloca loads and stores · 21e8ded4

Hiroshi Inoue authored Nov 30, 2017

This reverts commit rL319407 due to failures on some buildbots.

llvm-svn: 319410

[SROA] enable splitting for non-whole-alloca loads and stores · 422e80ae

Hiroshi Inoue authored Nov 30, 2017

Currently, SROA splits loads and stores only when they are accessing the whole alloca.
This patch relaxes this limitation to allow splitting a load/store if all other loads and stores to the alloca are disjoint from or fully included in the current load/store. If there is no other load or store that crosses the boundary of the current load/store, the current splitting implementation works as is.
The whole-alloca loads and stores meet this new condition and so they are still splittable.

Here is a simplified motivating example.

struct record {
    long long a;
    int b;
    int c;
};

int func(struct record r) {
    for (int i = 0; i < r.c; i++)
        r.b++;
    return r.b;
}

When updating r.b (or r.c as well), LLVM generates redundant instructions on some platforms (such as x86_64, ppc64); here, r.b and r.c are packed into one 64-bit GPR when the struct is passed as a function argument.

With this patch, the above example is compiled into only a few instructions, with no loop.
Without the patch, SROA introduces an unnecessary loop-carried dependency, and the later optimizers cannot eliminate the loop.

Differential Revision: https://reviews.llvm.org/D32998

llvm-svn: 319407

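The relaxed splitting condition can be sketched as an interval check. `Slice` and `canSplit` are hypothetical names, not SROA's data structures: a slice may be split when every other slice is either disjoint from it or fully inside it. For `struct record` above, a 64-bit access covering `b` and `c` (bytes 8 to 16) stays splittable as long as no other access straddles offset 8 or 16.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical byte range [begin, end) of one load or store into an alloca.
struct Slice {
  uint64_t begin;
  uint64_t end;
};

// Sketch of the relaxed condition: s may be split if nothing crosses its
// boundary. A whole-alloca access contains every slice, so it still qualifies.
bool canSplit(const Slice &s, const std::vector<Slice> &all) {
  for (const Slice &o : all) {
    bool disjoint = o.end <= s.begin || o.begin >= s.end;
    bool contained = o.begin >= s.begin && o.end <= s.end;
    if (!disjoint && !contained)
      return false;  // o straddles a boundary of s
  }
  return true;
}
```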

- Removed unused lambda (IsReturnBlock) causing build bots to fail for r319398 · 70293fa2

Graham Yiu authored Nov 30, 2017

- Added lit testcases that were supposed to be part of r319398

llvm-svn: 319399

With PGO information, we can do more aggressive outlining of cold regions in... · 8b1882c1

Graham Yiu authored Nov 30, 2017

With PGO information, we can do more aggressive outlining of cold regions in the inline candidate function. This contrasts with the scheme of keeping only the 'early return' portion of the inline candidate and outlining the rest of the function as a single function call.

Support for outlining multiple regions of each function is added, as well as some basic heuristics to determine which regions are good to outline. Outline candidates are limited to regions that are single-entry & single-exit. We also avoid outlining regions that produce live-exit variables, which may inhibit some forms of code motion (like commoning).

The fallback to the regular partial inlining scheme is retained when either i) no regions are identified for outlining in the function, or ii) the outlined function could not be inlined in any of its callers.

Differential Revision: https://reviews.llvm.org/D38190

llvm-svn: 319398

LowerTypeTests: Deduplicate code. NFC. · 9e3175bb

Peter Collingbourne authored Nov 30, 2017

llvm-svn: 319390

LowerTypeTests: Remove unnecessary cast. NFC. · 943aca3c

Peter Collingbourne authored Nov 30, 2017

llvm-svn: 319387

Nov 28, 2017

Demote this opt remark to DEBUG. · 2e922890

Adam Nemet authored Nov 28, 2017

From a random opt-stat output:

Top 10 remarks:
  tailcallelim/tailcall          53%
  inline/AlwaysInline            13%
  gvn/LoadClobbered              13%
  inline/Inlined                  8%
  inline/TooCostly                2%
  inline/NoDefinition             2%
  licm/LoadWithLoopInvariantAddressInvalidated  2%
  licm/Hoisted                    1%
  asm-printer/InstructionCount    1%
  prologepilog/StackSize          1%

llvm-svn: 319235

SROA: Don't create variable fragments that are outside of the variable. · 77d90b0c

Adrian Prantl authored Nov 28, 2017

An alloca may be larger than a variable that is described to be stored
there. Don't create a dbg.value for fragments that are outside of the
variable.

This fixes PR35447.
https://bugs.llvm.org/show_bug.cgi?id=35447

llvm-svn: 319230

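A minimal sketch of the guard, assuming fragment offsets and sizes in bits. `Fragment` and `fragmentWithinVariable` are hypothetical names. Dropping a fragment that starts at or beyond the end of the variable follows the commit message; clamping a fragment that only partly overlaps the variable is an added assumption, not something the message states.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical description of one alloca partition as seen by a variable.
struct Fragment {
  bool valid;           // false: emit no dbg.value for this partition
  uint64_t offsetBits;
  uint64_t sizeBits;
};

// The alloca may be larger than the variable stored in it, so a partition
// that starts past the variable's end describes no part of the variable.
Fragment fragmentWithinVariable(uint64_t offsetBits, uint64_t sizeBits,
                                uint64_t varSizeBits) {
  if (offsetBits >= varSizeBits)
    return {false, 0, 0};
  // Assumption: trim a partially overlapping fragment to the variable's end.
  return {true, offsetBits, std::min(sizeBits, varSizeBits - offsetBits)};
}
```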

EntryExitInstrumenter: set DebugLocs on the inserted call instructions (PR35412) · ca46db95

Hans Wennborg authored Nov 28, 2017

Apparently the verifier requires that inlineable calls in a function
with debug info have debug locations.

llvm-svn: 319199

Use getStoreSize() in various places instead of 'BitSize >> 3'. · f0ff20f1

Jonas Paulsson authored Nov 28, 2017

This is needed for cases when the memory access is not as big as the width of
the data type. For instance, storing i1 (1 bit) would be done in a byte (8
bits).

Using 'BitSize >> 3' (or '/ 8') would e.g. give the memory access of an i1 a
size of 0, which for instance makes alias analysis return NoAlias even when
it shouldn't.

There are no tests as this was done as a follow-up to the bugfix for the case
where this was discovered (r318824). This handles more similar cases.

Review: Björn Petterson
https://reviews.llvm.org/D40339

llvm-svn: 319173

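The arithmetic difference can be sketched with two small helpers. The names are hypothetical; in LLVM the real query is the DataLayout store-size API. Shifting right by 3 truncates partial bytes, while a store size rounds the bit width up to whole bytes.

```cpp
#include <cstdint>

// The pattern being replaced: drops any partial byte.
uint64_t sizeByShift(uint64_t bitSize) { return bitSize >> 3; }

// Store-size arithmetic: round the bit width up to whole bytes.
uint64_t storeSizeBytes(uint64_t bitSize) { return (bitSize + 7) / 8; }
```

For an i1, `sizeByShift(1)` yields 0 while `storeSizeBytes(1)` yields 1, which matches the byte the value actually occupies in memory.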

Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable. · c34f789e

Chandler Carruth authored Nov 28, 2017

The core idea is to (re-)introduce some redundancies where their cost is
hidden by the cost of materializing immediates for constant operands of
PHI nodes. When the cost of the redundancies is covered by this,
avoiding materializing the immediate has numerous benefits:
1) Less register pressure
2) Potential for further folding / combining
3) Potential for more efficient instructions due to immediate operand

As a motivating example, consider the remarkably different cost on x86
of a SHL instruction with an immediate operand versus a register
operand.

This pattern turns up surprisingly frequently, but is somewhat rarely
obvious as a significant performance problem.

The pass is entirely target independent, but it does rely on the target
cost model in TTI to decide when to speculate things around the PHI
node. I've included x86-focused tests, but any target that sets up its
immediate cost model should benefit from this pass.

There is probably more that can be done in this space, but the pass
as-is is enough to get some important performance wins on our internal
benchmarks and should be generally performance neutral; still, help with
more extensive benchmarking is always welcome.

One awkward part is that this pass has to be scheduled after
*everything* that can eliminate these kinds of redundancies. This
includes SimplifyCFG, GVN, etc. I'm open to suggestions about better
places to put this. We could in theory make it part of the codegen pass
pipeline, but there doesn't really seem to be a good reason for that --
it isn't "lowering" in any sense and only relies on pretty standard cost
model based TTI queries, so it seems to fit well with the "optimization"
pipeline model. Still, further thoughts on the pipeline position are
welcome.

I've also only implemented this in the new pass manager. If folks are
very interested, I can try to add it to the old PM as well, but I didn't
really see much point (my use case is already switched over to the new
PM).

I've tested this pretty heavily without issue. A wide range of
benchmarks internally show no change outside the noise, and I don't see
any significant changes in SPEC either. However, the size class
computation in tcmalloc is substantially improved by this, which turns
into a 2% to 4% win on the hottest path through tcmalloc for us, so
there are definitely important cases where this is going to make
a substantial difference.

Differential revision: https://reviews.llvm.org/D37467

llvm-svn: 319164

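A source-level analogy of the transformation, using the SHL example from the message; the function names are hypothetical. In `shift_before` the shift amount is a PHI of two constants, so it must be materialized in a register; `shift_after` duplicates the shift into each incoming path so that each shift takes an immediate operand, and the PHI merges the results instead. A C++ compiler may already emit the same code for both; the sketch only shows the shape of the code before and after, assuming the target's immediate cost model makes the duplicated shift cheap.

```cpp
#include <cstdint>

// Before: the shift amount is a PHI of constants (phi [7, 11]).
uint32_t shift_before(uint32_t a, bool cond) {
  uint32_t amount = cond ? 7u : 11u;
  return a << amount;  // shl with a register operand
}

// After speculation: each predecessor computes its own shift with an
// immediate amount, and the PHI now merges the shifted values.
uint32_t shift_after(uint32_t a, bool cond) {
  uint32_t result;
  if (cond)
    result = a << 7;   // shl with an immediate operand
  else
    result = a << 11;  // shl with an immediate operand
  return result;
}
```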

[TailRecursionElimination] Skip debug intrinsics. · 25ea91a8

Florian Hahn authored Nov 28, 2017

Summary:
I think we do not need to analyze debug intrinsics here, as they should
not impact codegen. This has 2 benefits: 1) slightly less work to do and
2) avoiding generating optimization remarks for converting calls to
debug intrinsics to tail calls, which are not really helpful for users.

Based on work by Sander de Smalen.

Reviewers: davide, trentxintong, aprantl

Reviewed By: aprantl

Subscribers: llvm-commits, JDevlieghere

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D40440

llvm-svn: 319158

[GVN] Prevent ScalarPRE from hoisting across instructions that don't pass... · 11560722

Max Kazantsev authored Nov 28, 2017

[GVN] Prevent ScalarPRE from hoisting across instructions that don't pass control flow to successors

This is to address a problem similar to those in D37460 for Scalar PRE. We should not
PRE across an instruction that may not pass execution to its successor unless it is safe
to speculatively execute it.

Differential Revision: https://reviews.llvm.org/D38619

llvm-svn: 319147

This reverts commit r319096 and r319097. · c06f55e1

Rafael Espindola authored Nov 28, 2017

Revert "[SROA] Propagate !range metadata when moving loads."
Revert "[Mem2Reg] Clang-format unformatted parts of this file. NFCI."

Davide says they broke a bot.

llvm-svn: 319131

SROA: Avoid creating a fragment expression that covers the entire variable. · d7f6f163

Adrian Prantl authored Nov 28, 2017

Fixes PR35416.

https://bugs.llvm.org/show_bug.cgi?id=35416

llvm-svn: 319126

Nov 27, 2017

[Mem2Reg] Clang-format unformatted parts of this file. NFCI. · 824d71a9

Davide Italiano authored Nov 27, 2017

llvm-svn: 319097

[SROA] Propagate !range metadata when moving loads. · b5d59e73

Davide Italiano authored Nov 27, 2017

This tries to propagate !range metadata to a pre-existing load
when a load is optimized out. This is done instead of adding an
assume because converting loads to and from assumes creates a
lot of IR.

Patch by Ariel Ben-Yehuda.

Differential Revision: https://reviews.llvm.org/D37216

llvm-svn: 319096

[PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend... · 0de1a4bc

Sanjay Patel authored Nov 27, 2017

[PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result

This should fix PR31455:
https://bugs.llvm.org/show_bug.cgi?id=31455

Differential Revision: https://reviews.llvm.org/D28314

llvm-svn: 319094

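A sketch of the two possible guards for falling back to the library call, which in this pass exists so that errno is still set for invalid inputs. The helper names are hypothetical: one guard tests whether the inlined square root produced NaN (depending on the result), the other tests the argument directly. The exact form of the argument check is an assumption made for the sketch.

```cpp
#include <cmath>

// Guard on the result: the fast square root produced NaN.
bool needsLibCallByResult(double x) {
  double r = std::sqrt(x);
  return std::isnan(r);
}

// Guard on the argument: x is not >= 0. The comparison is false for NaN,
// so NaN inputs also take the library path.
bool needsLibCallByArg(double x) {
  return !(x >= 0.0);
}
```

Both guards select the library path for exactly the inputs less than zero and for NaN (note that -0.0 is not less than zero), so the choice between them can be left to the target, which is what a TTI hook allows.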

Inliner: Don't mark notail calls with the 'tail' attribute · d9e71098

Arnold Schwaighofer authored Nov 27, 2017

enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2,
                    TCK_NoTail = 3 };

TCK_NoTail is greater than TCK_Tail so taking the min does not do the
correct thing.

rdar://35639547

llvm-svn: 319075

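A minimal sketch of the problem and one possible fix. The merge function names are hypothetical, and the actual change in the inliner may be structured differently. Because `TCK_NoTail` has the largest value, `std::min` of a `tail` call site and an inlined `notail` call returns `TCK_Tail`, silently dropping the `notail` marker.

```cpp
#include <algorithm>

enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2,
                    TCK_NoTail = 3 };

// Naive merge: std::min treats the enum values as an ordering, which is
// wrong for TCK_NoTail because it is numerically the largest.
TailCallKind mergeNaive(TailCallKind callSite, TailCallKind inlined) {
  return std::min(callSite, inlined);
}

// Hypothetical fix: never replace a notail marker on an inlined call.
TailCallKind mergeRespectingNoTail(TailCallKind callSite, TailCallKind inlined) {
  if (inlined == TCK_NoTail)
    return TCK_NoTail;
  return std::min(callSite, inlined);
}
```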

[InstCombine] use 'auto' with 'dyn_cast'; NFC · 863d4947

Sanjay Patel authored Nov 27, 2017

llvm-svn: 319067

Nov 24, 2017

Make helpers static. NFC. · 51ebcaaf

Benjamin Kramer authored Nov 24, 2017

llvm-svn: 318953

Nov 23, 2017

MSan: remove an unnecessary cast. NFC for userspace instrumentation. · 9e5477f4

Alexander Potapenko authored Nov 23, 2017

llvm-svn: 318923

[MSan] Move the access address check before the shadow access for that address · 391804f5

Alexander Potapenko authored Nov 23, 2017

MSan used to insert the shadow check of the store pointer operand
_after_ the shadow of the value operand has been written.
This happens to work in userspace, as the whole shadow range is
always mapped. However in the kernel the shadow page may not exist, so
the bug may cause a crash.

This patch moves the address check in front of the shadow access.

llvm-svn: 318901

[IRCE][NFC] Add no wrap flags to no-wrapping SCEV calculation · 716e647d

Max Kazantsev authored Nov 23, 2017

In a lambda where we expect the result to be within bounds, add the respective `nsw/nuw` flags to
help SCEV in case it fails to figure them out on its own.

Differential Revision: https://reviews.llvm.org/D40168

llvm-svn: 318898

Nov 22, 2017

[SCCP] Pick the right lattice value for constants. · b480b5c2

Davide Italiano authored Nov 22, 2017

After the dataflow algorithm proves that an argument is constant,
it replaces its value with the integer constant and drops the lattice
value associated with the DEF.

For example, in the test case @f() is called twice:
call @f(undef, ...)
call @f(2, ...)

`undef` MEET 2 = 2 so we replace the argument and all its uses with
the constant 2.

Shortly after, tryToReplaceWithConstantRange() tries to get the lattice
value for the argument we just replaced, causing an assertion.
This function is a little peculiar as it runs when we're doing replacement
and not as part of the solver but still queries the solver.

The fix is to check whether we already replaced the value and, if so,
get a temporary lattice value for the constant.

Thanks to Zhendong Su for the report!

Fixes PR35357.

llvm-svn: 318817

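A toy version of the lattice and meet operation from the example above. `LatticeVal` and `meet` are hypothetical names and much simpler than SCCP's actual lattice implementation; they show why `undef` MEET 2 yields the constant 2, while two different constants yield overdefined.

```cpp
#include <cstdint>

// Hypothetical three-level lattice: Undefined (top), Constant(c),
// Overdefined (bottom).
struct LatticeVal {
  enum Kind { Undefined, Constant, Overdefined } kind;
  int64_t value;  // meaningful only when kind == Constant
};

LatticeVal meet(LatticeVal a, LatticeVal b) {
  if (a.kind == LatticeVal::Undefined) return b;  // undef MEET x = x
  if (b.kind == LatticeVal::Undefined) return a;
  if (a.kind == LatticeVal::Constant && b.kind == LatticeVal::Constant &&
      a.value == b.value)
    return a;                                      // same constant
  return {LatticeVal::Overdefined, 0};             // conflicting facts
}
```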

Nov 21, 2017

EntryExitInstrumenter: support __cyg_profile_func_enter_bare · 37cbf28e

Hans Wennborg authored Nov 21, 2017

It works just like __cyg_profile_func_enter but takes no arguments.

llvm-svn: 318783

Add MemorySSA as loop dependency, disabled by default [NFC]. · ff8b8aea

Alina Sbirlea authored Nov 21, 2017

Summary:
First step in adding MemorySSA as dependency for loop pass manager.
Adding the dependency under a flag.

New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null.
Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default.

Reviewers: sanjoy, davide, gberry

Subscribers: mehdi_amini, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D40274

llvm-svn: 318772
