Commits · 0be97208751110c8c2cf4cfb0fb29537d0441b52 · Lorenzo Albano / LLVM bpEVL

Dec 20, 2016

[IR] Remove the DIExpression field from DIGlobalVariable. · bceaaa96

Adrian Prantl authored Dec 20, 2016

This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, a DIGlobalVariable holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGlobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable that makes upgrading the
old format unambiguous, even for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
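
Point (3) can be made concrete with a toy C++ model. The types below are hypothetical stand-ins, not LLVM's real metadata classes: a single source-level variable is paired with two fragment expressions, so together the pairs describe all 64 bits of it, which one expression embedded in the variable could not express.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins; these are NOT LLVM's
// DIGlobalVariable / DIExpression classes.
struct GlobalVar {            // describes the source-level variable only
  std::string Name;
  unsigned SizeInBits;
};

struct Expr {                 // describes how to reach (part of) its location
  bool IsFragment = false;    // models a DW_OP_LLVM_fragment operation
  unsigned OffsetInBits = 0;
  unsigned SizeInBits = 0;
};

// The pairing: one GlobalVar may appear in several pairs, e.g. one pair
// per fragment when the variable lives in more than one location.
struct GlobalVarExpr {
  const GlobalVar *Var;
  Expr E;
};

// Sums the bits of Var that the given pairs describe.
unsigned describedBits(const GlobalVar &Var,
                       const std::vector<GlobalVarExpr> &Pairs) {
  unsigned Bits = 0;
  for (const GlobalVarExpr &P : Pairs) {
    if (P.Var != &Var)
      continue;
    if (!P.E.IsFragment) {
      Bits += Var.SizeInBits;  // expression covers the whole variable
      continue;
    }
    assert(P.E.OffsetInBits + P.E.SizeInBits <= Var.SizeInBits &&
           "fragment lies outside the variable");
    Bits += P.E.SizeInBits;
  }
  return Bits;
}
```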

Dec 19, 2016

[LV] Sink tripcount query to where it's actually used. NFC. · fb7dd86f

Michael Kuperstein authored Dec 19, 2016

llvm-svn: 290142

[InstCombine] use commutative matcher for pattern with commutative operators · 5a443ac0

Sanjay Patel authored Dec 19, 2016

This is a case that was missed in:
https://reviews.llvm.org/rL290067
...and it would regress if we fix operand complexity (PR28296).

llvm-svn: 290127

[InstCombine] add folds for icmp (umin|umax X, Y), X · dd46b529

Sanjay Patel authored Dec 19, 2016

This is a follow-up to:
https://reviews.llvm.org/rL289855 (https://reviews.llvm.org/D27531)
https://reviews.llvm.org/rL290111

llvm-svn: 290118
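
The message does not spell out the individual folds. As an illustration only, the sketch below brute-forces four identities of this kind over all 8-bit unsigned values, for example that `umin(X, Y) == X` holds exactly when `X <=u Y`; whether the commit implements exactly these four is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Exhaustively checks, over all 8-bit unsigned X and Y, identities that
// let "icmp pred (umin|umax X, Y), X" become a compare of X and Y.
bool unsignedMinMaxFoldsHold() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      const uint8_t X = static_cast<uint8_t>(A);
      const uint8_t Y = static_cast<uint8_t>(B);
      const uint8_t UMin = std::min(X, Y), UMax = std::max(X, Y);
      if ((UMin == X) != (X <= Y)) return false;  // eq (umin X,Y), X -> ule X, Y
      if ((UMin != X) != (X > Y)) return false;   // ne (umin X,Y), X -> ugt X, Y
      if ((UMax == X) != (X >= Y)) return false;  // eq (umax X,Y), X -> uge X, Y
      if ((UMax != X) != (X < Y)) return false;   // ne (umax X,Y), X -> ult X, Y
    }
  return true;
}
```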

[LoopVersioning] Require loop-simplify form for loop versioning. · 2e03213f

Florian Hahn authored Dec 19, 2016

Summary:
Requiring loop-simplify form for loop versioning ensures that the
runtime check block always dominates the exit block.
    
This patch closes #30958 (https://llvm.org/bugs/show_bug.cgi?id=30958).

Reviewers: silviu.baranga, hfinkel, anemet, ashutosh.nema

Subscribers: ashutosh.nema, mzolotukhin, efriedma, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D27469

llvm-svn: 290116

[InstCombine] add folds for icmp (smax X, Y), X · 8296c6c9

Sanjay Patel authored Dec 19, 2016

This is a follow-up to:
https://reviews.llvm.org/rL289855 (D27531)

llvm-svn: 290111

Revert @llvm.assume with operator bundles (r289755-r289757) · aec2fa35

Daniel Jasper authored Dec 19, 2016

This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086

Dec 18, 2016

[InstCombine] use commutative matchers for patterns with commutative operators · 2b9d4b4d

Sanjay Patel authored Dec 18, 2016

Background/motivation - I was circling back around to:
https://llvm.org/bugs/show_bug.cgi?id=28296

I made a simple patch for that and noticed some regressions, so I added test cases for
those with rL281055, and this is hopefully the minimal fix for just those cases.

But as you can see from the surrounding untouched folds, we are missing commuted patterns
all over the place, and of course there are no regression tests to cover any of those cases.

We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but
then we still wouldn't have test coverage, and we'd still miss some fraction of commuted
patterns because they require adjustments to the match order.

I'm aware of the concern about the potential compile-time performance impact of adding
matches like this (currently being discussed on llvm-dev), but I don't think there's any
evidence yet to suggest that handling commutative pattern matching more thoroughly is not
a worthwhile goal of InstCombine.

Differential Revision: https://reviews.llvm.org/D24419

llvm-svn: 290067
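
The difference between an order-sensitive match and a commutative one can be sketched with a toy expression tree. `Node`, `Pred`, `matchBinOp` and `matchCommutative` are hypothetical and only mimic the behavior of LLVM's `m_c_*` PatternMatch helpers; they are not that API.

```cpp
#include <cassert>
#include <functional>

// Hypothetical toy expression node (not llvm::Value).
struct Node {
  char Op = 0;               // e.g. '&', '|', '^'; 0 marks a leaf
  const Node *L = nullptr;
  const Node *R = nullptr;
};

using Pred = std::function<bool(const Node *)>;

// Order-sensitive match: LHS must satisfy PL and RHS must satisfy PR.
bool matchBinOp(const Node *N, char Op, const Pred &PL, const Pred &PR) {
  return N && N->Op == Op && PL(N->L) && PR(N->R);
}

// Commutative match in the spirit of the m_c_* matchers: if the operands
// do not match in the given order, retry with them swapped.
bool matchCommutative(const Node *N, char Op, const Pred &PL,
                      const Pred &PR) {
  return matchBinOp(N, Op, PL, PR) ||
         (N && N->Op == Op && PL(N->R) && PR(N->L));
}
```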

Dec 17, 2016

[InstCombine] Simplify code slightly. NFC · e32b5fd7

Craig Topper authored Dec 17, 2016

llvm-svn: 290046

Revert "[GVNHoist] Move GVNHoist to function simplification part of pipeline." · 95294127

Evgeniy Stepanov authored Dec 17, 2016

This reverts r289696, which caused a TSan perf regression.

See PR31382.

llvm-svn: 290030

Dec 16, 2016

Preserve loop metadata when folding branches to a common destination. · 3ca147ea

Michael Kuperstein authored Dec 16, 2016

Differential Revision: https://reviews.llvm.org/D27830

llvm-svn: 289992

Revert "[IR] Remove the DIExpression field from DIGlobalVariable." · 73ec0656

Adrian Prantl authored Dec 16, 2016

This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has no DIExpression. Unfortunately, it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recommitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982

Reapply "[LV] Enable vectorization of loops with conditional stores by default" · a4964f29

Matthew Simpson authored Dec 16, 2016

This patch reapplies r289863. The original patch was reverted because it
exposed a bug causing the loop vectorizer to crash in the Python runtime on
PPC. The underlying issue was fixed with r289958.

llvm-svn: 289975

[LV] Don't attempt to type-shrink scalarized instructions · 099af810

Matthew Simpson authored Dec 16, 2016

After r288909, instructions feeding predicated instructions may be scalarized
if profitable. Since these instructions will remain scalar, we shouldn't
attempt to type-shrink them. We should only truncate vector types to their
minimal bit widths. This bug was exposed by enabling the vectorization of loops
containing conditional stores by default.

llvm-svn: 289958
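
A minimal sketch of the rule, assuming a map from instruction names to their computed minimal bit widths and a set of scalarized instructions (all names here are hypothetical): entries for scalarized instructions are dropped, so only values that remain vectors are candidates for truncation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model: MinBWs maps an instruction (by name) to the minimal
// bit width computed for it; Scalarized names instructions that will
// remain scalar after vectorization.
std::map<std::string, unsigned>
truncationCandidates(const std::map<std::string, unsigned> &MinBWs,
                     const std::set<std::string> &Scalarized) {
  std::map<std::string, unsigned> Result;
  for (const auto &Entry : MinBWs)
    if (Scalarized.count(Entry.first) == 0)  // only vector values shrink
      Result.insert(Entry);
  return Result;
}
```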

Revert r289863: [LV] Enable vectorization of loops with conditional stores by default · 48b4e614

Chandler Carruth authored Dec 16, 2016

This uncovers a crasher in the loop vectorizer on PPC when building the
Python runtime. I'll send the testcase to the review thread for the
original commit.

llvm-svn: 289934

[IR] Remove the DIExpression field from DIGlobalVariable. · 74a835cd

Adrian Prantl authored Dec 16, 2016

This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, a DIGlobalVariable holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGlobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920

[ThinLTO] Thin link efficiency: More efficient export list computation · edddca22

Teresa Johnson authored Dec 16, 2016

Summary:
Instead of checking whether a global referenced by a function being
imported is defined in the same module, speculatively always add the
referenced globals to the module's export list. After all imports are
computed, for each module prune any not in its defined set from its
export list.

For a huge C++ app with aggressive importing thresholds, even with
D27687 we spent a lot of time invoking modulePath() from
exportGlobalInModule (modulePath() was still the 2nd hottest routine in
profile). The reason is that with comdat/linkonce the summary lists for
each GUID can be long. For the app in question, for example, we were
invoking exportGlobalInModule almost 2 million times, and we traversed
an average of 63 entries in the summary list each time.

This patch reduced the thin link time for the app by about 10% (on top
of D27687) when using aggressive importing thresholds, and about 3.5% on
average with default importing thresholds.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27755

llvm-svn: 289918
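
The two-phase scheme (speculative export while importing, then one pruning pass) can be sketched with standard containers. `GUID`, `ModuleId` and both functions are hypothetical simplifications, not the real FunctionImport code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using GUID = uint64_t;   // stand-in for a global's GUID
using ModuleId = int;    // hypothetical module identifier
using ExportLists = std::map<ModuleId, std::set<GUID>>;

// While computing imports: when a function from module From is imported,
// add every global it references to From's export list, without first
// checking which module actually defines each global.
void addSpeculativeExports(ExportLists &Exports, ModuleId From,
                           const std::vector<GUID> &Refs) {
  Exports[From].insert(Refs.begin(), Refs.end());
}

// After all imports are computed: drop entries a module does not define.
void pruneExports(ExportLists &Exports,
                  const std::map<ModuleId, std::set<GUID>> &Defined) {
  for (auto &ModAndList : Exports) {
    auto It = Defined.find(ModAndList.first);
    std::set<GUID> &List = ModAndList.second;
    for (auto G = List.begin(); G != List.end();) {
      if (It == Defined.end() || It->second.count(*G) == 0)
        G = List.erase(G);
      else
        ++G;
    }
  }
}
```

In this sketch the per-reference work during importing is a set insertion; the ownership check runs once per export-list entry at the end.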

[SimplifyLibCalls] Use a lambda. NFCI. · f024a56c

Davide Italiano authored Dec 16, 2016

llvm-svn: 289911

Revert "[IR] Remove the DIExpression field from DIGlobalVariable." · 03c6d31a

Adrian Prantl authored Dec 16, 2016

This reverts commit 289902 while investigating bot breakage.

llvm-svn: 289906

Add missing library dep. · 7a4be21d

Peter Collingbourne authored Dec 16, 2016

llvm-svn: 289903

[IR] Remove the DIExpression field from DIGlobalVariable. · ce139357

Adrian Prantl authored Dec 16, 2016

This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, a DIGlobalVariable holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGlobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902

IPO: Introduce ThinLTOBitcodeWriter pass. · 1398a32e

Peter Collingbourne authored Dec 16, 2016

This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.

All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.

Differential Revision: https://reviews.llvm.org/D27324

llvm-svn: 289899
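
The partitioning rule from the message, reduced to a toy: `Global` and `splitRegularAndThin` are hypothetical, and the check that decides whether splitting is possible at all is not modeled.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module-level global.
struct Global {
  std::string Name;
  bool HasTypeMetadata;
};

// Globals with type metadata go to the regular LTO part; everything else
// goes to the thin LTO part.
std::pair<std::vector<Global>, std::vector<Global>>
splitRegularAndThin(const std::vector<Global> &Globals) {
  std::vector<Global> Regular, Thin;
  for (const Global &G : Globals)
    (G.HasTypeMetadata ? Regular : Thin).push_back(G);
  return {Regular, Thin};
}
```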

[ThinLTO] Thin link efficiency improvement: don't re-export globals (NFC) · 19f2aa78

Teresa Johnson authored Dec 15, 2016

Summary:
We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
imported from its module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27687

llvm-svn: 289896

[SimplifyLibCalls] Lower fls() to llvm.ctlz(). · 85ad36b0

Davide Italiano authored Dec 15, 2016

Differential Revision: https://reviews.llvm.org/D14590

llvm-svn: 289894
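
The identity behind this lowering: for nonzero x, fls(x) equals the bit width minus the number of leading zeros. The sketch uses the GCC/Clang builtin `__builtin_clz` as a stand-in for `llvm.ctlz`; it illustrates the arithmetic, not the IR that SimplifyLibCalls emits.

```cpp
#include <cassert>
#include <climits>

// fls(x): 1-based position of the most significant set bit, or 0 if x == 0.
// __builtin_clz is a GCC/Clang builtin and is undefined for 0, hence the
// explicit guard.
int flsViaCtlz(unsigned X) {
  if (X == 0)
    return 0;
  const int BitWidth = static_cast<int>(sizeof(unsigned) * CHAR_BIT);
  return BitWidth - __builtin_clz(X);
}
```

In IR, `llvm.ctlz` with its second (i1) argument set to false is defined to return the bit width for a zero input, so `bitwidth - ctlz(x)` already yields 0 for x == 0 without a separate guard.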

[SimplifyLibCalls] Remove redundant folding logic for ffs(). · 890e8503

Davide Italiano authored Dec 15, 2016

Lowering to llvm.cttz() will result in constant folding anyway
if the argument to ffs is a constant. Pointed out by Eli for
fls() in D14590.

llvm-svn: 289888
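
Likewise, ffs(x) is the number of trailing zeros plus one for nonzero x, and 0 for x == 0. The sketch uses `__builtin_ctz` (GCC/Clang) as a stand-in for `llvm.cttz` and takes `unsigned` for simplicity, whereas C's `ffs()` takes an `int`.

```cpp
#include <cassert>

// ffs(x): 1-based position of the least significant set bit, or 0 if
// x == 0. __builtin_ctz is undefined for 0, hence the guard.
int ffsViaCttz(unsigned X) {
  return X == 0 ? 0 : __builtin_ctz(X) + 1;
}
```

Because the result is a plain expression of x, a constant argument folds to a constant (ffsViaCttz(8) is 4), which is why a separate ffs-specific constant fold is redundant.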

Dec 15, 2016

[ThinLTO] Revert part of r289843 that belonged to another patch. · eb0ac241

Teresa Johnson authored Dec 15, 2016

The code change for D27687 accidentally got committed along with the
main change in r289843. Revert it temporarily, so that I can recommit it
along with its test as intended.

llvm-svn: 289875

[ThinLTO] Remove stale comment (NFC) · 0c3f57b1

Teresa Johnson authored Dec 15, 2016

This should have been removed with r288446.

llvm-svn: 289871

[ThinLTO] Thin link efficiency: skip candidate added later with higher threshold (NFC) · 475b51a7

Teresa Johnson authored Dec 15, 2016

Summary:
Thin link efficiency improvement. After adding an importing candidate to
the worklist we might have later added it again with a higher threshold.
Skip it when popped from the worklist if we recorded a higher threshold
than the current worklist entry; it will get processed again at the
higher threshold when that entry is popped.

This required adding the summary's GUID to the worklist, so that it can
be used to query the recorded highest threshold for it when we pop from the
worklist.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27696

llvm-svn: 289867
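
The skip rule is essentially lazy deletion from a worklist. A minimal sketch, assuming a hypothetical `Entry` type and a map holding the highest recorded threshold per GUID; the actual import processing, which may queue further entries, is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using GUID = uint64_t;

// Hypothetical worklist entry: the callee's GUID plus the threshold it was
// queued with (the GUID is what makes the lookup below possible).
struct Entry {
  GUID Callee;
  unsigned Threshold;
};

// Drains the worklist and returns the entries actually processed. An entry
// is skipped when a higher threshold was recorded for its GUID: the entry
// carrying that higher threshold will process the callee instead.
std::vector<Entry> drainWorklist(std::vector<Entry> Worklist,
                                 const std::map<GUID, unsigned> &Recorded) {
  std::vector<Entry> Processed;
  while (!Worklist.empty()) {
    Entry E = Worklist.back();
    Worklist.pop_back();
    auto It = Recorded.find(E.Callee);
    if (It != Recorded.end() && It->second > E.Threshold)
      continue;  // stale entry, superseded by a higher threshold
    Processed.push_back(E);
  }
  return Processed;
}
```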

[LV] Enable vectorization of loops with conditional stores by default · 6a98bcfe

Matthew Simpson authored Dec 15, 2016

This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".

Differential Revision: https://reviews.llvm.org/D27814

llvm-svn: 289863

[SimplifyCFG] Merge debug locations when hoisting an instruction from a then/else branch. NFC. · f20c57ec

Andrea Di Biagio authored Dec 15, 2016

Now that a new API to merge debug locations has been committed at r289661 (see
review D26256 for more details), we can use it to "improve" the code added by
revision r280995.

Instead of nulling the debugloc of a commoned instruction, we use the 'merged'
debug location. For now, this is a non-functional change, since
`DILocation::getMergedLocation()` is only a stub that always
returns a null location.

Differential Revision: https://reviews.llvm.org/D27804

llvm-svn: 289862

[InstCombine] add folds for icmp (smin X, Y), X · d640641a

Sanjay Patel authored Dec 15, 2016

Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns.
This patch won't solve the example that was attached to that thread, so something else still needs fixing.

The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.

Corresponding changes for smax/umin/umax to follow.

Differential Revision: https://reviews.llvm.org/D27531

llvm-svn: 289855
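
The message does not list the folds. One identity of this family, assumed here for illustration, is that `icmp eq (smin X, Y), X` is equivalent to `icmp sle X, Y`. The brute-force sketch checks it over all 8-bit patterns and also shows that the unsigned predicate `ule` would be wrong (X = -1, Y = 0 is a counterexample).

```cpp
#include <algorithm>
#include <cassert>

// Two's complement value of an 8-bit pattern in [0, 255].
int asSigned8(int Bits) { return Bits < 128 ? Bits : Bits - 256; }

// Checks, for every pair of 8-bit patterns, whether
// "icmp eq (smin X, Y), X" agrees with "icmp sle X, Y" (UseSigned = true)
// or with "icmp ule X, Y" (UseSigned = false).
bool sminEqFoldHolds(bool UseSigned) {
  for (int A = 0; A < 256; ++A)
    for (int B = 0; B < 256; ++B) {
      const int X = asSigned8(A), Y = asSigned8(B);
      const bool SminIsX = std::min(X, Y) == X;
      const bool Pred = UseSigned ? (X <= Y)   // sle on signed values
                                  : (A <= B);  // ule on raw bit patterns
      if (SminIsX != Pred)
        return false;
    }
  return true;
}
```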

[ThinLTO] Ensure callees get hot threshold when first seen on cold path · 1b859a23

Teresa Johnson authored Dec 15, 2016

This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.

Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.

llvm-svn: 289843
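
A minimal sketch of the re-add check, with hypothetical names: the map stands in for the thresholds recorded in the ImportList, and a callee is re-queued only when it is seen again with a strictly higher threshold. Recording the adjusted (possibly decayed) threshold is what lets a later hot-path visit trigger the re-add.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using GUID = uint64_t;

// Records Threshold for Callee and reports whether the callee should be
// (re)queued. In this sketch, the fix corresponds to passing the same
// adjusted threshold here that is put on the worklist.
bool shouldQueue(std::map<GUID, unsigned> &ImportThresholds, GUID Callee,
                 unsigned Threshold) {
  auto Result = ImportThresholds.insert({Callee, Threshold});
  if (Result.second)
    return true;                     // first time this callee is seen
  if (Threshold <= Result.first->second)
    return false;                    // already queued at >= this threshold
  Result.first->second = Threshold;  // higher threshold: upgrade, re-queue
  return true;
}
```

For example, with an empty map, `shouldQueue(T, 7, 30)` (cold path, decayed) returns true, a later `shouldQueue(T, 7, 100)` (hot path) returns true again, and repeating the hot-path call returns false.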

Revert "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of common inst" · 6ea759a8

Robert Lougher authored Dec 15, 2016

Reverting as it is causing buildbot failures (address sanitizer).

llvm-svn: 289833

[SimplifyCFG] In sinkLastInstruction correctly set debugloc of "common" inst · cf176742

Robert Lougher authored Dec 15, 2016

Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.

Differential Revision: https://reviews.llvm.org/D27590

llvm-svn: 289828

[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp · 795b0671

Ehsan Amiri authored Dec 15, 2016

A number of new patterns for simplifying and/xor of icmp:

(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.

(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
   %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example, if %mask1 = 24 and %mask2 = 16, then %t = 8, and %s = 16
violates condition (2) above, so this optimization cannot be applied.

llvm-svn: 289813
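
Both folds can be checked exhaustively for 8-bit values. The sketch below is a brute-force checker written for illustration, not part of the commit; it confirms the example in the message, where %mask1 = 24 and %mask2 = 16 make the and-fold invalid.

```cpp
#include <cassert>
#include <cstdint>

// (icmp ne %x, 0) ^ (icmp ne %y, 0)  vs  icmp ne %x, %y,
// with %x = %a & Mask and %y = %b & Mask, over all 8-bit %a, %b.
bool xorFoldHolds(uint8_t Mask) {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      const unsigned X = A & Mask, Y = B & Mask;
      const bool Before = (X != 0) != (Y != 0);  // xor of two i1 values
      const bool After = X != Y;
      if (Before != After)
        return false;
    }
  return true;
}

// (icmp eq %x, 0) & (icmp ne %y, 0)  vs  icmp ult %x, %y,
// with %x = %a & Mask1 and %y = %b & Mask2, over all 8-bit %a, %b.
bool andFoldHolds(uint8_t Mask1, uint8_t Mask2) {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      const unsigned X = A & Mask1, Y = B & Mask2;
      const bool Before = (X == 0) && (Y != 0);
      const bool After = X < Y;  // unsigned compare
      if (Before != After)
        return false;
    }
  return true;
}
```

For instance, `xorFoldHolds(8)` is true, `xorFoldHolds(12)` is false (12 is not a power of 2), `andFoldHolds(24, 16)` is false, and `andFoldHolds(24, 7)` is true.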

[AVX-512][InstCombine] Add masked scalar FMA intrinsics to SimplifyDemandedVectorElts. · ab5f355d

Craig Topper authored Dec 15, 2016

llvm-svn: 289759

Remove the AssumptionCache · 3ca4a6bc

Hal Finkel authored Dec 15, 2016

After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756

Make processing @llvm.assume more efficient by using operand bundles · cb9f78e1

Hal Finkel authored Dec 15, 2016

There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).

Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.

At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.

Differential Revision: https://reviews.llvm.org/D27259

llvm-svn: 289755
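
The complexity argument can be sketched with a toy model, assuming hypothetical `Assume` records that list the values they may affect (standing in for the operand-bundle operands). The old style scans every assume for each query; the new style indexes assumes by affected value once, much as walking a value's use list does in the IR.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical assume record: an id plus the values it may affect.
struct Assume {
  int Id;
  std::vector<std::string> Affected;
};

// Old style: every query scans all assumes, O(#assumes) per value.
std::vector<int> relevantByScan(const std::vector<Assume> &All,
                                const std::string &V) {
  std::vector<int> Ids;
  for (const Assume &A : All)
    for (const std::string &W : A.Affected)
      if (W == V) {
        Ids.push_back(A.Id);
        break;
      }
  return Ids;
}

// New style: index assumes by affected value once; a query is then a
// lookup, similar to walking the users of a value.
std::map<std::string, std::vector<int>>
indexByAffectedValue(const std::vector<Assume> &All) {
  std::map<std::string, std::vector<int>> Users;
  for (const Assume &A : All)
    for (const std::string &W : A.Affected)
      Users[W].push_back(A.Id);
  return Users;
}
```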

Dec 14, 2016

Only sets profile summary when it was not preset. · 40dd8c51

Dehao Chen authored Dec 14, 2016

Summary: The SampleProfileLoader pass may be invoked twice by LTO. The second invocation should not append more summary info, as the summary was already set by the first.

Reviewers: eraman, davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D27733

llvm-svn: 289725

Fix the bug in r289714 (NFC). · fb699619

Dehao Chen authored Dec 14, 2016

llvm-svn: 289724