Commits · 1a554be3b609959bd0611c5cb3a48c352ba3ebc3 · Lorenzo Albano / LLVM bpEVL

Sep 27, 2016

[LoopSimplify] When simplifying phis in loop-simplify, do it only if it preserves LCSSA form. · 1a554be3
Michael Zolotukhin authored Sep 27, 2016
```
llvm-svn: 282541
```
1a554be3

Output optimization remarks in YAML · a62b7e1a

Adam Nemet authored Sep 27, 2016

(Re-committed after moving the template specialization under the yaml
namespace.  GCC was complaining about this.)

This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282539

a62b7e1a

[DebugInfo] Add comments to phi dbg.value tracking code, NFC · 6481822e

Reid Kleckner authored Sep 27, 2016

LLVM developers might be surprised to learn that there are blocks
without valid insertion points (catchswitch), so it seems worth calling
that out explicitly.  Also add a FIXME about what we should really be
doing if we ever need to make optimized Windows EH code debuggable.

While I'm here, make auto usage more consistent with LLVM standards and
avoid an unecessary call to insertBefore.

llvm-svn: 282521

6481822e

Revert "Output optimization remarks in YAML" · cc2a3fa8
Adam Nemet authored Sep 27, 2016
```
This reverts commit r282499.

The GCC bots are failing

llvm-svn: 282503
```
cc2a3fa8

Output optimization remarks in YAML · 92e928c1

Adam Nemet authored Sep 27, 2016

This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282499

92e928c1

[sanitizer-coverage] fix a bug in trace-gep · 45c14475
Kostya Serebryany authored Sep 27, 2016
```
llvm-svn: 282467
```
45c14475
[sanitizer-coverage] don't emit the CTOR function if nothing has been instrumented · 186d6180
Kostya Serebryany authored Sep 27, 2016
```
llvm-svn: 282465
```
186d6180

Revert r277556. Add -lowertypetests-bitsets-level to control bitsets generation · 4ff4f21e

Ivan Krasin authored Sep 27, 2016

Summary:
We don't currently need this facility for CFI. Disabling individual hot methods proved
to be a better strategy in Chrome.

Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.

Reviewers: pcc

Subscribers: kcc

Differential Revision: https://reviews.llvm.org/D24948

llvm-svn: 282461

4ff4f21e

LowerTypeTests: Remove unused variable. · 53a852b6
Peter Collingbourne authored Sep 26, 2016
```
llvm-svn: 282456
```
53a852b6
LowerTypeTests: Create LowerTypeTestsModule class and move implementation... · 6ed92e3f
Peter Collingbourne authored Sep 26, 2016
```
LowerTypeTests: Create LowerTypeTestsModule class and move implementation there. Related simplifications.

llvm-svn: 282455
```
6ed92e3f

Sep 26, 2016

[thinlto] Basic thinlto fdo heuristic · d9830eb7

Piotr Padlewski authored Sep 26, 2016

Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.

I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.

I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.

Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.

Reviewers: eraman, mehdi_amini, tejohnson

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24638

llvm-svn: 282437

d9830eb7

Remove pruning of phi nodes in MemorySSA - it makes updating harder · 1e98c042

Daniel Berlin authored Sep 26, 2016

Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24923

llvm-svn: 282419

1e98c042

[LV] Scalarize instructions marked scalar after vectorization · b764aba2

Matthew Simpson authored Sep 26, 2016

This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.

Differential Revision: https://reviews.llvm.org/D23889

llvm-svn: 282418

b764aba2

[Coroutines] Part14: Handle coroutines with no suspend points. · bc0ebb38

Gor Nishanov authored Sep 26, 2016

Summary:
If coroutine has no suspend points, remove heap allocation and turn a coroutine into a normal function.

Also, if a pattern is detected that coroutine resumes or destroys itself prior to coro.suspend call, turn the suspend point into a simple jump to resume or cleanup label. This pattern occurs when coroutines are used to propagate errors in functions that return expected<T>.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24408

llvm-svn: 282414

bc0ebb38

[InstCombine] Fixed bug introduced in r282237 · 793c946e

Alexey Bataev authored Sep 26, 2016

The index of the new insertelement instruction was evaluated in the
wrong way, it was considered as the index of the inserted value instead
of index of the position, where the value should be inserted.

llvm-svn: 282401

793c946e

[InstCombine] Teach the udiv folding logic how to handle constant expressions. · a82d52d1

Andrea Di Biagio authored Sep 26, 2016

This patch fixes PR30366.

Function foldUDivShl() worked under the assumption that one of the values
in input to the function was always an instance of llvm::Instruction.
However, function visitUDivOperand() (the only user of foldUDivShl) was
clearly violating that precondition; internally, visitUDivOperand() uses pattern
matches to check the operands of a udiv. Pattern matchers for binary operators
know how to handle both Instruction and ConstantExpr values.

This patch fixes the problem in foldUDivShl(). Now we use pattern matchers
instead of explicit casts to Instruction. The reduced test case from PR30366
has been added to test file InstCombine/udiv-simplify.ll.

Differential Revision: https://reviews.llvm.org/D24565

llvm-svn: 282398

a82d52d1

Sep 24, 2016

ObjCARC: Don't look at users of ConstantData · 11c06ea5

Duncan P. N. Exon Smith authored Sep 24, 2016

Stop looking at users of UndefValue and ConstantPointerNull in the
objective C ARC optimizers.  The other users aren't actually
interesting, since they're not pointing at a particular object.  I
imagine these calls could be optimized through -instcombine... maybe
they already are?

These early returns will be required at some point in the future, with a
WIP patch that asserts when someone accesses a use-list on ConstantData.

llvm-svn: 282338

11c06ea5

Scalar: Ignore ConstantData in processAssumption · 4fd9b7e1

Duncan P. N. Exon Smith authored Sep 24, 2016

Assumptions on UndefValue and ConstantPointerNull aren't relevant to
other users.  Ignore them entirely to avoid wasting cycles walking
through their (possibly extremely extensive (cross-module)) use-lists.

It wasn't clear how to add a specific test for this, and it'll be
covered anyway by an eventual patch that asserts when trying to access
the use-list of an instance of ConstantData.

llvm-svn: 282334

4fd9b7e1

GlobalStatus: Don't walk use-lists of ConstantData · c82c1142

Duncan P. N. Exon Smith authored Sep 24, 2016

Return early from llvm::isSafeToDestroyConstant() whenever the value
`isa<ConstantData>()`.  These constants are shared across the
LLVMContext.  We never really want to delete them here, and walking
their use-lists can be very expensive.

(This is motivated by an eventual goal of removing use-lists entirely
from ConstantData.)

llvm-svn: 282320

c82c1142

Sep 23, 2016

[InstCombine] Fix for PR29124: reduce insertelements to shufflevector · fee9078d

Alexey Bataev authored Sep 23, 2016

If inserting more than one constant into a vector:

define <4 x float> @foo(<4 x float> %x) {
  %ins1 = insertelement <4 x float> %x, float 1.0, i32 1
  %ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
  ret <4 x float> %ins2
}

InstCombine could reduce that to a shufflevector:

define <4 x float> @goo(<4 x float> %x) {
 %shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3>
 ret <4 x float> %shuf
}
Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e.
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float
undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> ->
insertelement <4 x float> %v, float 1.0, 1

Differential Revision: https://reviews.llvm.org/D24182

llvm-svn: 282237

fee9078d

[InstCombine] fold X urem C -> X < C ? X : X - C when C is big (PR28672) · 30ef70b0

Sanjay Patel authored Sep 22, 2016

We already have the udiv variant of this transform, so I think this is ok for 
InstCombine too even though there is an increase in IR instructions. As the 
tests and TODO comments show, the transform can lead to follow-on combines.

This should fix: https://llvm.org/bugs/show_bug.cgi?id=28672

Differential Revision: https://reviews.llvm.org/D24527

llvm-svn: 282209

30ef70b0

Sep 22, 2016

Revert r282168 "GVN-hoist: fix store past load dependence analysis (PR30216)" · c7957ef8

Hans Wennborg authored Sep 22, 2016

and also the dependent r282175 "GVN-hoist: do not dereference null pointers"

It's causing compiler crashes building Harfbuzz (PR30499).

llvm-svn: 282199

c7957ef8

GVN-hoist: do not dereference null pointers · 1531f30c

Sebastian Pop authored Sep 22, 2016

there may be basic blocks without memory accesses, in which case the
list of accesses is a null pointer.

llvm-svn: 282175

1531f30c

GVN-hoist: fix store past load dependence analysis (PR30216) · 8e6e3318

Sebastian Pop authored Sep 22, 2016

To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.

The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().

Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.

Differential Revision: https://reviews.llvm.org/D24517

llvm-svn: 282168

8e6e3318

GVN-hoist: fix typo · 5d68aa79
Sebastian Pop authored Sep 22, 2016
```
llvm-svn: 282165
```
5d68aa79
[compiler-rt] fix typo in option description [NFC] · 7f0e3153
Etienne Bergeron authored Sep 22, 2016
```
llvm-svn: 282163
```
7f0e3153

GVN-hoist: only hoist relevant scalar instructions · 440f15b7

Sebastian Pop authored Sep 22, 2016

Without this patch, GVN-hoist would think that a branch instruction is a scalar instruction
and would try to value number it. The patch filters out all such kind of irrelevant instructions.

A bit frustrating is that there is no easy way to discard all those very infrequent instructions,
a bit like isa<TerminatorInst> that stands for a large family of instructions. I'm thinking that
checking for those very infrequent other instructions would cost us more in compilation time
than just letting those instructions getting numbered, so I'm still thinking that a simpler check:

if (isa<TerminatorInst>(I))
return false;

is better than listing all the other less frequent instructions.

Differential Revision: https://reviews.llvm.org/D23929

llvm-svn: 282160

440f15b7

Reapplying r281895 (and follow-up r281964) after fixing pr30468. · ba159897

Keith Walker authored Sep 22, 2016

The additional fix is:

When adding debug information to a lowered phi node in mem2reg
check that we have a valid insertion point after the phi for adding
the debug information.

This change addresses the issue in pr30468 where a lowered phi was
added before a catchswitch and no debug information should be added
after the phi in this case.

Differential Revision: https://reviews.llvm.org/D24797

llvm-svn: 282155

ba159897

[RS4GC] Remat in presence of phi and use live value · 82c3717f
Anna Thomas authored Sep 22, 2016
```
Summary:

Reviewers:

Subscribers:

llvm-svn: 282150
```
82c3717f

[EfficiencySanitizer] Using '$' instead of '#' for struct counter name · e74eb4e7

Sagar Thakur authored Sep 22, 2016

For MIPS '#' is the start of comment line. Therefore we get assembler errors if # is used in the structure names.

Differential: D24334
Reviewed by: zhaoqin

llvm-svn: 282141

e74eb4e7

Fix revision 281960 · d1247a68
Dorit Nuzman authored Sep 22, 2016
```
llvm-svn: 282139
```
d1247a68

Sep 21, 2016

[LoopInterchange] Track all dependencies, not just anti dependencies. · 00eb8db3

Chad Rosier authored Sep 21, 2016

Currently, we give up on loop interchange if we encounter a flow dependency
anywhere in the loop list. Worse yet, we don't even track output dependencies.

This patch updates the dependency matrix computation to track flow and output
dependencies in the same way we track anti dependencies.

This improves an internal workload by 2.2x.

Note the loop interchange pass is off by default and it can be enabled with
'-mllvm -enable-loopinterchange'

Differential Revision: https://reviews.llvm.org/D24564

llvm-svn: 282101

00eb8db3

revert 281908 because 281909 got reverted · a4894388
Nico Weber authored Sep 21, 2016
```
llvm-svn: 282097
```
a4894388

[LV] Don't emit unused scalars for uniform instructions · 15869f86

Matthew Simpson authored Sep 21, 2016

If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.

Differential Revision: https://reviews.llvm.org/D24275

llvm-svn: 282087

15869f86

Change the basic block weight calculation algorithm to use max instead of voting. · 160fbc3f

Dehao Chen authored Sep 21, 2016

Summary: Now that we have more precise debug info, we should change back to use maximum to get basic block weight.

Reviewers: dnovillo

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D24788

llvm-svn: 282084

160fbc3f

[LV] Rename "Width" to "Lane" (NFC) · a95e8bb7
Matthew Simpson authored Sep 21, 2016
```
llvm-svn: 282083
```
a95e8bb7
Revert r281895 "Add @llvm.dbg.value entries for the phi node created by -mem2reg" · 1049085c
Hans Wennborg authored Sep 21, 2016
```
(And follow-up r281964.)

It caused PR30468.

llvm-svn: 282077
```
1049085c
DeadArgElim: Don't mark swifterror arguments as unused · f62ba103
Arnold Schwaighofer authored Sep 21, 2016
```
Replacing swifterror arguments with undef creates invalid IR.

rdar://28300490

llvm-svn: 282075
```
f62ba103
[LoopInterchange] Various cleanup. NFC. · f7c76f91
Chad Rosier authored Sep 21, 2016
```
llvm-svn: 282071
```
f7c76f91
code cleanup -- commoning IR travsersals · 9780fc14
Xinliang David Li authored Sep 20, 2016
```
llvm-svn: 282034
```
9780fc14