Commits · 3a622a14f9eb62b9f8aae5f5e937b5a62dab06a4 · Lorenzo Albano / LLVM bpEVL

Aug 17, 2017

[dfsan] Add explicit zero extensions for shadow parameters in function wrappers. · b5205c69

Simon Dardis authored Aug 17, 2017

In the case where dfsan provides a custom wrapper for a function,
shadow parameters are added for each parameter of the function.
These parameters are i16s. For targets which do not consider this
a legal type, the lack of sign extension information would cause
LLVM to generate anyexts around their usage with phi variables
and calling convention logic.

Address this by introducing zero exts for each shadow parameter.

Reviewers: pcc, slthakur

Differential Revision: https://reviews.llvm.org/D33349

llvm-svn: 311087

b5205c69

[LV] Using VPlan to model the vectorized code and drive its transformation · 66278833

Ayal Zaks authored Aug 17, 2017

VPlan is an ongoing effort to refactor and extend the Loop Vectorizer. This
patch introduces the VPlan model into LV and uses it to represent the vectorized
code and drive the generation of vectorized IR.

In this patch VPlan models the vectorized loop body: the vectorized control-flow
is represented using VPlan's Hierarchical CFG, with predication refactored from
being a post-vectorization-step into a vectorization planning step modeling
if-then VPRegionBlocks, and generating code inline with non-predicated code. The
vectorized code within each VPBasicBlock is represented as a sequence of
Recipes, each responsible for modelling and generating a sequence of IR
instructions. To keep the size of this commit manageable the Recipes in this
patch are coarse-grained and capture large chunks of LV's code-generation logic.
The constructed VPlans are dumped in dot format under -debug.

This commit retains current vectorizer output, except for minor instruction
reorderings; see associated modifications to lit tests.

For further details on the VPlan model see docs/Proposals/VectorizationPlan.rst
and its references.

Authors: Gil Rapaport and Ayal Zaks

Differential Revision: https://reviews.llvm.org/D32871

llvm-svn: 311077

66278833

Reapply: [ADCE][Dominators] Teach ADCE to preserve dominators · fd5c5c91

Jakub Kuderski authored Aug 17, 2017

Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.

I didn't notice any performance impact when bootstrapping clang with this patch.

The patch was originally committed in r311039 and reverted in r311049.
This revision fixes the problem with not adding a dependency on the
DominatorTreeWrapperPass for the LegacyPassManager.

Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki

Reviewed By: davide

Subscribers: grandinj, zhendongsu, llvm-commits, david2050

Differential Revision: https://reviews.llvm.org/D35869

llvm-svn: 311057

fd5c5c91

[InstCombine] Teach canEvaluateTruncated to handle arithmetic shift (including... · 86111c66

Amjad Aboud authored Aug 16, 2017

[InstCombine] Teach canEvaluateTruncated to handle arithmetic shift (including those with vector splat shift amount)

Differential Revision: https://reviews.llvm.org/D36784

llvm-svn: 311050

86111c66

Revert "[ADCE][Dominators] Teach ADCE to preserve dominators" · cbcffb17

Jakub Kuderski authored Aug 16, 2017

This reverts commit r311039. The patch caused the
`test/Bindings/OCaml/Output/scalar_opts.ml` to fail.

llvm-svn: 311049

cbcffb17

Aug 16, 2017

[InstCombine] Make folding (X >s -1) ? C1 : C2 --> ((X >>s 31) & (C2 - C1)) +... · 882f2963

Craig Topper authored Aug 16, 2017

[InstCombine] Make folding (X >s -1) ? C1 : C2 --> ((X >>s 31) & (C2 - C1)) + C1 support splat vectors

This also uses decomposeBitTestICmp to decode the compare.

Differential Revision: https://reviews.llvm.org/D36781

llvm-svn: 311044

882f2963

[ADCE][Dominators] Teach ADCE to preserve dominators · 4552e9de

Jakub Kuderski authored Aug 16, 2017

Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.

I didn't notice any performance impact when bootstrapping clang with this patch.

Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki

Reviewed By: davide

Subscribers: grandinj, zhendongsu, llvm-commits, david2050

Differential Revision: https://reviews.llvm.org/D35869

llvm-svn: 311039

4552e9de

[LoopDataPrefetch][AArch64FalkorHWPFFix] Preserve ScalarEvolution · 40549ad1

Geoff Berry authored Aug 16, 2017

Summary:
Mark LoopDataPrefetch and AArch64FalkorHWPFFix passes as preserving
ScalarEvolution since they do not alter loop structure and should not
alter any SCEV values (though LoopDataPrefetch may introduce new
instructions that won't have cached SCEV values yet).

This can result in slight code differences, mainly w.r.t. nsw/nuw flags
on SCEVs, since these are computed somewhat lazily when a zext/sext
instruction is encountered.  As a result, passes after the modified
passes may see SCEVs with more nsw/nuw flags present.

Reviewers: sanjoy, anemet

Subscribers: aemerson, rengolin, mzolotukhin, javed.absar, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D36716

llvm-svn: 311032

40549ad1

[BDCE] Don't check demanded bits on unsized types · 9e54b709

Hal Finkel authored Aug 16, 2017

To clear assumptions that are potentially invalid after trivialization, we need
to walk the use/def chain. Normally, the only way to reach an instruction with
an unsized type is via an instruction that has side effects (or otherwise will
demand its input bits). That would stop the walk. However, if we have a
readnone function that returns an unsized type (e.g., void), we must avoid
asking for the demanded bits of the function call's return value. A
void-returning readnone function is always dead (and so we can stop walking the
use/def chain here), but the check is necessary to avoid asserting.

Fixes PR34211.

llvm-svn: 311014

9e54b709

Merge debug info when hoist then-else code to if. · 84d41203

Dehao Chen authored Aug 16, 2017

Summary: When we move then-else code to if, we need to merge its debug info, otherwise the hoisted instruction may have inaccurate debug info attached.

Reviewers: aprantl, probinson, dblaikie, echristo, loladiro

Reviewed By: aprantl

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D36778

llvm-svn: 310985

84d41203

[InstCombine] Teach canEvaluateZExtd and canEvaluateTruncated to handle vector... · 0a1a276d

Craig Topper authored Aug 15, 2017

[InstCombine] Teach canEvaluateZExtd and canEvaluateTruncated to handle vector shifts with splat shift amount

We were only allowing ConstantInt before. This patch allows splat of ConstantInt too.

Differential Revision: https://reviews.llvm.org/D36763

llvm-svn: 310970

0a1a276d

Aug 15, 2017

[InstCombine] Added support for (X >>s C) << C --> X & (-1 << C) · 0464c5d9
Amjad Aboud authored Aug 15, 2017
```
Differential Revision: https://reviews.llvm.org/D36743

llvm-svn: 310949
```
0464c5d9

[InstCombine] sink sext after ashr · f69b7d5c

Sanjay Patel authored Aug 15, 2017

Narrow ops are better for bit-tracking, and in the case of vectors,
may enable better codegen.

As the trunc test shows, this can allow follow-on simplifications.

There's a block of code in visitTrunc that deals with shifted ops
with FIXME comments. It may be possible to remove some of that now,
but I want to make sure there are no problems with this step first.

http://rise4fun.com/Alive/Y3a

Name: hoist_ashr_ahead_of_sext_1
  %s = sext i8 %x to i32
  %r = ashr i32 %s, 3  ; shift value is < than source bit width
  =>
  %a = ashr i8 %x, 3
  %r = sext i8 %a to i32
  
Name: hoist_ashr_ahead_of_sext_2
  %s = sext i8 %x to i32
  %r = ashr i32 %s, 8  ; shift value is >= than source bit width
  =>
  %a = ashr i8 %x, 7   ; so clamp this shift value
  %r = sext i8 %a to i32
  
Name: junc_the_trunc  
  %a = sext i16 %v to i32
  %s = ashr i32 %a, 18
  %t = trunc i32 %s to i16
  =>
  %t = ashr i16 %v, 15
llvm-svn: 310942

f69b7d5c

[Dominators] Include infinite loops in PostDominatorTree · 638c085d

Jakub Kuderski authored Aug 15, 2017

Summary:
This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change.

What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG.

This patch makes the following assumptions:
- A sequence of updates should produce the same tree as a recalculating it.
- Any sequence of the same updates should lead to the same tree.
- Siblings and roots are unordered.

The last two properties are essential to efficiently perform batch updates in the future.
When it comes to the first one, we can decide later that the consistency between freshly built tree and an updated one doesn't matter match, as there are many correct ways to pick roots in infinite loops, and to relax this assumption. That should enable us to recalculate postdominators less frequently.

This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough.
That being said, my experiments showed that reverse-unreachable are very rare in the IR emitted by clang when bootstrapping clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call:

```
# functions: 52283
# samples: 337609
# reverse unreachable BBs: 216022
# BBs: 247840796
Percent reverse-unreachable: 0.08716159869015269 %
Max(PercRevUnreachable) in a function: 87.58620689655172 %
# > 25 % samples: 471 ( 0.1395104988314885 % samples )
... in 145 ( 0.27733680163724345 % functions )
```

Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway.

I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :).

Reviewers: dberlin, sanjoy, grosser, brzycki, davide, chandlerc, hfinkel

Reviewed By: dberlin

Subscribers: nhaehnle, javed.absar, kparzysz, uabelho, jlebar, hiraditya, llvm-commits, dberlin, david2050

Differential Revision: https://reviews.llvm.org/D35851

llvm-svn: 310940

638c085d

Fix -Wunused-lambda-capture for Release build. · 4a179550

Rui Ueyama authored Aug 15, 2017

`I` and `this` are used only in assert or DEBUG, so they are unused
in Release build.

llvm-svn: 310934

4a179550

[LV] Minor savings to Sink casts to unravel first order recurrence · 25e2800e

Ayal Zaks authored Aug 15, 2017

Two minor savings: avoid copying the SinkAfter map and avoid moving a cast if it
is not needed.

Differential Revision: https://reviews.llvm.org/D36408

llvm-svn: 310910

25e2800e

[SLPVectorizer] Replace VL[0] to VL0 with assert, add propagateIRFlags extra parameter VL0, · 9e43d6e7
Dinar Temirbulatov authored Aug 15, 2017
```
                replace E->Scalars[0] to VL0, NFCI.

llvm-svn: 310904
```
9e43d6e7
Add missing dependency in ICP. (NFC) · 45847d36
Dehao Chen authored Aug 14, 2017
```
llvm-svn: 310896
```
45847d36

Remove checks for debug info intrinsics in use lists, NFC · 18728822

Reid Kleckner authored Aug 14, 2017

These haven't done anything since debug info intrinsics stopped
appearing in Value use lists in 2014.

llvm-svn: 310892

18728822

Aug 14, 2017

Recommit r310869, "[InstSimplify][InstCombine] Modify the interface of... · 0aa3a195

Craig Topper authored Aug 14, 2017

Recommit r310869, "[InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify"

This recommits r310869, with the moved files and no extra changes.

Original commit message:

This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.

I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.

I also had to make decomposeBitTest support vectors since InstSimplify needs that.

As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.

Differential Revision: https://reviews.llvm.org/D36593

llvm-svn: 310889

0aa3a195

Add strictfp attribute to prevent unwanted optimizations of libm calls · 53a5fbb4
Andrew Kaylor authored Aug 14, 2017
```
Differential Revision: https://reviews.llvm.org/D34163

llvm-svn: 310885
```
53a5fbb4

Revert r310869 "[InstSimplify][InstCombine] Modify the interface of... · 69fa8e0d

Craig Topper authored Aug 14, 2017

Revert r310869 "[InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify"

Failed to add the two files that moved. And then added an extra change I didn't mean to while trying to fix that. Reverting everything.

llvm-svn: 310873

69fa8e0d

[InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and... · 2f0b4506

Craig Topper authored Aug 14, 2017

[InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify

This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.

I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.

I also had to make decomposeBitTest support vectors since InstSimplify needs that.

As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.

Differential Revision: https://reviews.llvm.org/D36593

llvm-svn: 310869

2f0b4506

[SLPVectorizer] Schedule bundle with different opcodes. · 7b78f5e5

Dinar Temirbulatov authored Aug 14, 2017

This change let us schedule a bundle with different opcodes in it, for example : [ load, add, add, add ]

Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D36518

llvm-svn: 310847

7b78f5e5

[BDCE] reduce scope of an assert (PR34179) · a1067d9b

Sanjay Patel authored Aug 14, 2017

The assert was added with r310779 and is usually correct,
but as the test shows, not always. The 'volatile' on the
load is needed to expose the faulty path because without
it, DemandedBits would return that the load is just dead
rather than not demanded, and so we wouldn't hit the
bogus assert.

Also, since the lambda is just a single-line now, get rid 
of it and inline the DB.isAllOnesValue() calls. 

This should fix (prevent execution of a faulty assert):
https://bugs.llvm.org/show_bug.cgi?id=34179

llvm-svn: 310842

a1067d9b

[LoopUnroll] Enable option to peel remainder loop · 718c8a6a

Sam Parker authored Aug 14, 2017

On some targets, the penalty of executing runtime unrolling checks
and then not the unrolled loop can be significantly detrimental to
performance. This results in the need to be more conservative with
the unroll count, keeping a trip count of 2 reduces the overhead as
well as increasing the chance of the unrolled body being executed. But
being conservative leaves performance gains on the table.

This patch enables the unrolling of the remainder loop introduced by
runtime unrolling. This can help reduce the overhead of misunrolled
loops because the cost of non-taken branches is much less than the
cost of the backedge that would normally be executed in the remainder
loop. This allows larger unroll factors to be used without suffering
performance loses with smaller iteration counts.

Differential Revision: https://reviews.llvm.org/D36309

llvm-svn: 310824

718c8a6a

[InstCombine] Simplify and inline FoldOrWithConstants/FoldXorWithConstants · f7200990

Craig Topper authored Aug 14, 2017

Summary:
These functions were overly complicated. The body of this function was rechecking for an And operation to find the constant, but we already knew we were looking at two Ands ORed together and the pieces are in variables. We already had earlier nearby code that checked for ConstantInts. So just inline the remaining parts into the earlier code.

Next step is to use m_APInt instead of ConstantInt.

Reviewers: spatel, efriedma, davide, majnemer

Reviewed By: spatel

Subscribers: zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D36439

llvm-svn: 310806

f7200990

Aug 12, 2017

[BDCE] clear poison generators after turning a value into zero (PR33695, PR34037) · fe346f9f

Sanjay Patel authored Aug 12, 2017

nsw, nuw, and exact carry implicit assumptions about their operands, so we need
to clear those after trivializing a value. We decided there was no danger for
llvm.assume or metadata, so there's just a comment about that.

This fixes miscompiles as shown in:
https://bugs.llvm.org/show_bug.cgi?id=33695
https://bugs.llvm.org/show_bug.cgi?id=34037

Differential Revision: https://reviews.llvm.org/D36592

llvm-svn: 310779

fe346f9f

Aug 11, 2017

[OptDiag] Updating Remarks in SampleProfile · 51cf2604

Eli Friedman authored Aug 11, 2017

Updating remark API to newer OptimizationDiagnosticInfo API. This
allows remarks to show up in diagnostic yaml file, and enables use
of opt-viewer tool.

Hotness information for remarks (L505 and L751) do not display hotness
information, most likely due to profile information not being
propagated yet. Unsure if this is the desired outcome.

Patch by Tarun Rajendran.

Differential Revision: https://reviews.llvm.org/D36127

llvm-svn: 310763

51cf2604

Fix typo /NFC · 24524f31
Xinliang David Li authored Aug 11, 2017
```
llvm-svn: 310737
```
24524f31

Aug 10, 2017

[InstCombine] Make (X|C1)^C2 -> X^(C1^C2) iff X&~C1 == 0 work for splat vectors · 9a6110b2

Craig Topper authored Aug 10, 2017

This also corrects the description to match what was actually implemented. The old comment said X^(C1|C2), but it implemented X^((C1|C2)&~(C1&C2)). I believe ((C1|C2)&~(C1&C2)) is equivalent to (C1^C2).

Differential Revision: https://reviews.llvm.org/D36505

llvm-svn: 310658

9a6110b2

[InstCombine] Fix a crash in getSelectCondition if we happen to have two... · 57b4d864

Craig Topper authored Aug 10, 2017

[InstCombine] Fix a crash in getSelectCondition if we happen to have two inverse vectors of i1 constants.

We used to try to truncate the constant vector to vXi1, but if it's already i1 this would fail. Instead we now use IRBuilder::getZExtOrTrunc which should check the type and only create a trunc if needed. I believe this should trigger constant folding in the IRBuilder and ultimately do the same thing just with the additional type check.

llvm-svn: 310639

57b4d864

[InstCombine] Add a DEBUG_COUNTER to InstCombine to limit how many... · cd13ebca

Craig Topper authored Aug 10, 2017

[InstCombine] Add a DEBUG_COUNTER to InstCombine to limit how many instructions are visited for debug

Sometimes it would be nice to stop InstCombine mid way through its combining to see the current IR. By using a debug counter we can place an upper limit on how many instructions to process.

This will also allow skipping the first X combines, but that has the potential to change later combines since earlier canonicalizations might have been skipped.

Differential Revision: https://reviews.llvm.org/D36553

llvm-svn: 310638

cd13ebca

[DebugCounter] Move the semicolon out of the DEBUG_COUNTER macro and require... · 9cd976d0

Craig Topper authored Aug 10, 2017

[DebugCounter] Move the semicolon out of the DEBUG_COUNTER macro and require it to be placed at the end of each use.

This make it consistent with STATISTIC which it will often appears near.

While there move one DEBUG_COUNTER instance out of an anonymous namespace. It's already declaring a static variable so the namespace is unnecessary.

llvm-svn: 310637

9cd976d0

[sanitizer-coverage] Change cmp instrumentation to distinguish const operands · 52410815

Alexander Potapenko authored Aug 10, 2017

This implementation of SanitizerCoverage instrumentation inserts different
callbacks depending on constantness of operands:

  1. If both operands are non-const, then a usual
     __sanitizer_cov_trace_cmp[1248] call is inserted.
  2. If exactly one operand is const, then a
     __sanitizer_cov_trace_const_cmp[1248] call is inserted. The first
     argument of the call is always the constant one.
  3. If both operands are const, then no callback is inserted.

This separation comes useful in fuzzing when tasks like "find one operand
of the comparison in input arguments and replace it with the other one"
have to be done. The new instrumentation allows us to not waste time on
searching the constant operands in the input.

Patch by Victor Chibotaru.

llvm-svn: 310600

52410815

[NewGVN] Add CL option to control the generation of phi-of-ops (disable by default). · a5508e31
Chad Rosier authored Aug 10, 2017
```
Differential Revision: https://reviews.llvm.org/D36478539

llvm-svn: 310594
```
a5508e31

Aug 09, 2017

[InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046) · c50e55d0

Sanjay Patel authored Aug 09, 2017

I couldn't find any smaller folds to help the cases in:
https://bugs.llvm.org/show_bug.cgi?id=34046
after:
rL310141

The truncated rotate-by-variable patterns elude all of the existing transforms because 
of multiple uses and knowledge about demanded bits and knownbits that doesn't exist 
without the whole pattern. So we need an unfortunately large pattern match. But by 
simplifying this pattern in IR, the backend is already able to generate 
rolb/rolw/rorb/rorw for x86 using its existing rotate matching logic (although
there is a likely extraneous 'and' of the rotate amount). 

Note that rotate-by-constant doesn't have this problem - smaller folds should already 
produce the narrow IR ops.

Differential Revision: https://reviews.llvm.org/D36395

llvm-svn: 310509

c50e55d0

[asan] Fix instruction emission ordering with dynamic shadow. · 49e5acab

Matt Morehouse authored Aug 09, 2017

Summary:
Instrumentation to copy byval arguments is now correctly inserted
after the dynamic shadow base is loaded.

Reviewers: vitalybuka, eugenis

Reviewed By: vitalybuka

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D36533

llvm-svn: 310503

49e5acab

[LSR / TTI / SystemZ] Eliminate TargetTransformInfo::isFoldableMemAccess() · 6228aeda

Jonas Paulsson authored Aug 09, 2017

isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.

The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().

The isFoldableMemAccess() hook has been removed everywhere.

Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933

llvm-svn: 310463

6228aeda

[LoopStrengthReduce] Don't neglect the Fixup.Offset in isAMCompletelyFolded(). · 5052771a

Jonas Paulsson authored Aug 09, 2017

In the recursive call to isAMCompletelyFolded(), the passed offset should be
the sum of F.BaseOffset and Fixup.Offset.

Review: Quentin Colombet.
llvm-svn: 310462

5052771a