- Mar 27, 2012
-
-
Eric Christopher authored
Fixes PR10105 llvm-svn: 153524
-
Chad Rosier authored
undefined behavior, which Rafael was kind enough to fix. Original commit message for r153423: Use the new range metadata in computeMaskedBits and add a new optimization to instruction simplify that lets us remove an and when loading a boolean value. llvm-svn: 153521
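A minimal standalone sketch of the known-bits reasoning this optimization relies on (toy helpers invented here, not LLVM's actual computeMaskedBits code): range metadata of [0, 2) on a load proves every bit above bit 0 is zero, so a following and with 1 is redundant.

```cpp
#include <cassert>
#include <cstdint>

// For a value known to lie in [0, Hi), return a mask of bits that are
// provably zero (a toy stand-in for what known-bits analysis derives
// from !range metadata).
uint64_t knownZeroMask(uint64_t Hi) {
  assert(Hi > 0 && "empty range");
  uint64_t MaxVal = Hi - 1, Low = 0;
  while (MaxVal) { Low = (Low << 1) | 1; MaxVal >>= 1; }
  return ~Low;
}

// The instruction-simplify view: (v & 1) == v whenever all bits except
// bit 0 are known zero, as for a boolean load with range [0, 2).
bool andWithOneIsRedundant(uint64_t RangeHi) {
  return (knownZeroMask(RangeHi) | 1) == ~0ull;
}

int main() {
  assert(andWithOneIsRedundant(2));   // boolean-valued load
  assert(!andWithOneIsRedundant(4));  // bit 1 may be set; keep the and
}
```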
-
Jakob Stoklund Olesen authored
This pass tries to update kill flags, but there are still many bugs. Passes after the load/store optimizer don't need accurate liveness, so don't even try. <rdar://problem/11101911> llvm-svn: 153519
-
Jakob Stoklund Olesen authored
llvm-svn: 153518
-
Jakob Stoklund Olesen authored
Branch folding can use a register scavenger to update liveness information when required. Don't do that if liveness information is already invalid. llvm-svn: 153517
-
Jakob Stoklund Olesen authored
llvm-svn: 153516
-
Chris Lattner authored
llvm-svn: 153513
-
Jakob Stoklund Olesen authored
Late optimization passes like branch folding and tail duplication can transform the machine code in a way that makes it expensive to keep the register liveness information up to date. There is a fuzzy line between register allocation and late scheduling where the liveness information degrades. The MRI::tracksLiveness() flag makes the line clear: While true, liveness information is accurate, and can be used for register scavenging. Once the flag is false, liveness information is not accurate, and can only be used as a hint. Late passes generally don't need the liveness information, but they will sometimes use the register scavenger to help update it. The scavenger enforces strict correctness, and we spend a lot of code updating register liveness information that may never be used. llvm-svn: 153511
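A standalone sketch of the contract described here (toy types invented for illustration, not LLVM's actual MachineRegisterInfo): liveness data may only be trusted, and the scavenger only run, while the flag is still set.

```cpp
#include <cassert>
#include <set>

// Toy stand-in for the MRI::tracksLiveness() contract.
struct RegInfoModel {
  std::set<unsigned> LiveIns;   // simplified liveness data
  bool TracksLiveness = true;   // mirrors MRI::tracksLiveness()

  // Accurate queries (e.g. for register scavenging) are only legal
  // while liveness is still being kept up to date.
  bool isLiveIn(unsigned Reg) const {
    assert(TracksLiveness && "liveness is stale; cannot scavenge");
    return LiveIns.count(Reg) != 0;
  }

  // A late pass (load/store optimizer, branch folding, ...) calls this
  // once it stops maintaining the information.
  void invalidateLiveness() { TracksLiveness = false; }
};

int main() {
  RegInfoModel MRI;
  MRI.LiveIns.insert(5);
  assert(MRI.isLiveIn(5));   // fine: liveness still accurate
  MRI.invalidateLiveness();  // from here on, only usable as a hint
}
```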
-
Chandler Carruth authored
size bloat. Unfortunately, I expect this to disable the majority of the benefit from r152737. I'm hopeful at least that it will fix PR12345. To explain this requires... quite a bit of backstory I'm afraid. TL;DR: The change in r152737 actually did The Wrong Thing for linkonce-odr functions. This change makes it do the right thing. The benefits we saw were simple luck, not any actual strategy. Benchmark numbers after a mini-blog-post so that I've written down my thoughts on why all of this works and doesn't work...

To understand what's going on here, you have to understand how the "bottom-up" inliner actually works. There are two fundamental modes to the inliner:

1) Standard fixed-cost bottom-up inlining. This is the mode we usually think about. It walks from the bottom of the CFG up to the top, looking at callsites, taking information about the callsite and the called function and computing the expected cost of inlining into that callsite. If the cost is under a fixed threshold, it inlines. It's a touch more complicated than that due to all the bonuses, weights, etc. Inlining the last callsite to an internal function gets higher weight, etc. But essentially, this is the mode of operation.

2) Deferred bottom-up inlining (a term I just made up). This is the interesting mode for this patch and r152737. Initially, this works just like mode #1, but once we have the cost of inlining into the callsite, we don't just compare it with a fixed threshold. First, we check something else. Let's give some names to the entities at this point, or we'll end up hopelessly confused. We're considering inlining a function 'A' into its callsite within a function 'B'. We want to check whether 'B' has any callers, and whether it might be inlined into those callers. If so, we also check whether inlining 'A' into 'B' would block any of the opportunities for inlining 'B' into its callers. We take the sum of the costs of inlining 'B' into its callers where that inlining would be blocked by inlining 'A' into 'B', and if that cost is less than the cost of inlining 'A' into 'B', then we skip inlining 'A' into 'B'.

Now, in order for #2 to make sense, we have to have some confidence that we will actually have the opportunity to inline 'B' into its callers when cheaper, *and* that we'll be able to revisit the decision and inline 'A' into 'B' if that ever becomes the correct tradeoff. This often isn't true for external functions -- we can see very few of their callers, and we won't be able to re-consider inlining 'A' into 'B' if 'B' is external when we finally see more callers of 'B'. There are two cases where we believe this to be true for C/C++ code: functions local to a translation unit, and functions with an inline definition in every translation unit which uses them. These are represented as internal linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for linkonce-odr in r152737.

Unfortunately, when I did that, I also introduced a subtle bug. There was an implicit assumption that the last caller of the function within the TU was the last caller of the function in the program. We want to bonus the last caller of the function in the program by a huge amount for inlining because inlining that callsite has very little cost. Unfortunately, the last caller in the TU of a linkonce-odr function is *not* the last caller in the program, and so we don't want to apply this bonus. If we do, we can apply it to one callsite *per-TU*. Because of the way deferred inlining works, when it sees this bonus applied to one callsite in the TU for 'B', it decides that inlining 'B' is of the *utmost* importance just so we can get that final bonus. It then proceeds to essentially force deferred inlining regardless of the actual cost tradeoff. The result? PR12345: code bloat, code bloat, code bloat. Another result is getting *damn* lucky on a few benchmarks, and the over-inlining exposing critically important optimizations.

I would very much like a list of benchmarks that regress after this change goes in, with bitcode before and after. This will help me greatly understand what opportunities the current cost analysis is missing. Initial benchmark numbers look very good. WebKit files that exhibited the worst of PR12345 went from growing to shrinking compared to Clang with r152737 reverted.

- Bootstrapped Clang is 3% smaller with this change.
- Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4% faster with this change.

Please let me know about any other performance impact you see. Thanks to Nico for reporting and urging me to actually fix it, Richard Smith, Duncan Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues today. llvm-svn: 153506
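A standalone sketch of the mode #2 decision described above, with toy cost numbers and invented names rather than the real InlineCost machinery: inlining 'A' into 'B' is skipped when the outer inlinings of 'B' that it would block are collectively cheaper.

```cpp
#include <cassert>
#include <vector>

// One caller of 'B': the cost of inlining 'B' there, and whether that
// inlining would be blocked by first inlining 'A' into 'B'.
struct CallerOfB {
  int CostOfInliningB;
  bool BlockedByAIntoB;
};

// Deferred bottom-up inlining, as described above: defer (skip) the
// inner inlining when the blocked outer inlinings are cheaper in sum.
bool shouldDeferInliningAIntoB(int CostAIntoB,
                               const std::vector<CallerOfB> &Callers) {
  int BlockedCost = 0;
  for (const CallerOfB &C : Callers)
    if (C.BlockedByAIntoB)
      BlockedCost += C.CostOfInliningB;
  return BlockedCost < CostAIntoB;
}

int main() {
  // Two blocked callers of B with total cost 60, versus a cost of 100
  // for inlining A into B: defer. At cost 40, inline A into B instead.
  std::vector<CallerOfB> Callers = {{30, true}, {30, true}, {50, false}};
  assert(shouldDeferInliningAIntoB(100, Callers));
  assert(!shouldDeferInliningAIntoB(40, Callers));
}
```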
-
Craig Topper authored
llvm-svn: 153502
-
Craig Topper authored
llvm-svn: 153500
-
Akira Hatanaka authored
MachinePointerInfo when getStore is called to create a node that stores an argument passed in a register to the stack. Without this change, the post-RA scheduler will fail to discover the dependencies between the store instructions and the instructions that load from a structure passed by value. The link to the related discussion is here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048055.html llvm-svn: 153499
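A loose standalone model of the failure mode the message describes (toy types invented here, not the real scheduler or MachinePointerInfo): the dependence between a store and a later load is only discovered when both carry pointer info identifying the same fixed stack slot, so an argument-spill store with no info leaves the byval load unordered.

```cpp
#include <cassert>
#include <optional>

// Toy stand-in for MachinePointerInfo: either "no information" or a
// known fixed stack slot.
struct MemRef {
  std::optional<int> FixedStackSlot;
};

// Returns true when a store->load dependence edge is discovered.
bool dependenceFound(const MemRef &Store, const MemRef &Load) {
  return Store.FixedStackSlot && Load.FixedStackSlot &&
         *Store.FixedStackSlot == *Load.FixedStackSlot;
}

int main() {
  MemRef UntaggedStore{std::nullopt}, TaggedStore{4}, ByvalLoad{4};
  assert(!dependenceFound(UntaggedStore, ByvalLoad)); // the bug
  assert(dependenceFound(TaggedStore, ByvalLoad));    // after the fix
}
```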
-
Akira Hatanaka authored
llvm-svn: 153498
-
Akira Hatanaka authored
llvm-svn: 153497
-
Akira Hatanaka authored
set it in MipsMCCodeEmitter::getMachineOpValue. Assert in getMachineOpValue if MachineOperand MO is of an unexpected type. llvm-svn: 153494
-
Akira Hatanaka authored
offset applied to it. llvm-svn: 153493
-
Evan Cheng authored
register that's read by the preheader terminator. rdar://11095580 llvm-svn: 153492
-
Akira Hatanaka authored
cleared. No functionality change. llvm-svn: 153491
-
Lang Hames authored
copies being considered for removal. Make sure to track all of the copies, rather than just the most recent encountered, by holding a DenseSet instead of an unsigned in SrcMap. No test case - couldn't reduce something with a sane size. llvm-svn: 153487
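A minimal sketch of the data-structure change described here, with std containers standing in for LLVM's DenseMap/DenseSet and an opaque stand-in for MachineInstr:

```cpp
#include <cassert>
#include <map>
#include <set>

struct CopyInstr {};  // opaque stand-in for a machine copy instruction

// Before the fix, SrcMap mapped each source register to only the most
// recently seen copy, so earlier copies escaped invalidation:
//   std::map<unsigned, CopyInstr *> SrcMap;
// After: every copy from a given source register is tracked.
std::map<unsigned, std::set<CopyInstr *>> SrcMap;

void recordCopy(unsigned SrcReg, CopyInstr *MI) {
  SrcMap[SrcReg].insert(MI);  // keep every copy, not just the latest
}

int main() {
  CopyInstr A, B;
  recordCopy(7, &A);
  recordCopy(7, &B);              // the earlier copy is not dropped
  assert(SrcMap[7].size() == 2);
}
```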
-
Akira Hatanaka authored
llvm-svn: 153486
-
Evan Cheng authored
produces a 32-bit immediate which is consumed by the use. It tries to fold the immediate by breaking it into two parts and folding them into the immediate fields of two uses, e.g.

movw r2, #40885
movt r3, #46540
add  r0, r0, r3
=>
add.w r0, r0, #3019898880
add.w r0, r0, #30146560

However, this transformation is incorrect if the user produces a flag, e.g.

movw r2, #40885
movt r3, #46540
adds r0, r0, r3
=>
add.w  r0, r0, #3019898880
adds.w r0, r0, #30146560

Note the adds.w may not set the carry flag even if the original sequence would. rdar://11116189 llvm-svn: 153484
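A standalone sketch of the guarded transform, with helpers invented for illustration (the encodability test is simplified; the real ARM backend checks the full Thumb2 immediate forms): split the constant into two encodable halves, but refuse outright when the user sets flags.

```cpp
#include <cassert>
#include <cstdint>

// Toy test for an ARM-style "modified immediate": an 8-bit value
// rotated right by an even amount (simplified; Thumb2 allows more).
bool isModifiedImm(uint32_t Imm) {
  for (unsigned Rot = 0; Rot < 32; Rot += 2) {
    uint32_t V = Rot ? ((Imm << Rot) | (Imm >> (32 - Rot))) : Imm;
    if (V <= 0xFFu)
      return true;
  }
  return false;
}

// Try to express Imm as Hi + Lo with both halves encodable. The key
// point of this commit: never fold when the user sets flags, since the
// two-instruction form may compute carry differently.
bool splitForFold(uint32_t Imm, bool UserSetsFlags, uint32_t &Hi,
                  uint32_t &Lo) {
  if (UserSetsFlags)
    return false;  // e.g. adds: keep the movw/movt sequence
  unsigned Top = 31;
  while (Top > 0 && !(Imm & (1u << Top)))
    --Top;
  uint32_t Mask = Top >= 7 ? 0xFFu << (Top - 7) : 0xFFu;
  Hi = Imm & Mask;
  Lo = Imm & ~Mask;
  return Lo != 0 && isModifiedImm(Hi) && isModifiedImm(Lo);
}

int main() {
  uint32_t Hi, Lo;
  // 0xB5CC0000 is the combined movw/movt constant from the example.
  assert(splitForFold(0xB5CC0000u, false, Hi, Lo) && Hi + Lo == 0xB5CC0000u);
  assert(!splitForFold(0xB5CC0000u, true, Hi, Lo)); // flag user: no fold
}
```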
-
Lang Hames authored
llvm-svn: 153483
-
Andrew Trick authored
Fixes PR11882: NULL dereference in ComputeLoadConstantCompareExitLimit. llvm-svn: 153480
-
- Mar 26, 2012
-
-
Eric Christopher authored
backtrace locations. Testcase forthcoming, but I wanted to get some testing here. Should fix: PR12323 PR12314 rdar://11091100 llvm-svn: 153471
-
Nadav Rotem authored
153465 was incorrect. In this code we wanted to check that the pointer operand is of pointer type (and not vector type). llvm-svn: 153468
-
Sean Callanan authored
relocations. The algorithm is the same as that for x86_64. Scattered relocations, a feature present in i386 but not on x86_64, are not yet supported. llvm-svn: 153466
-
Nadav Rotem authored
llvm-svn: 153465
-
Andrew Trick authored
Fixes PR11950. llvm-svn: 153463
-
Andrew Trick authored
llvm-svn: 153462
-
Chris Lattner authored
llvm-svn: 153458
-
Eric Christopher authored
llvm-svn: 153456
-
Eric Christopher authored
llvm-svn: 153455
-
Chad Rosier authored
Original commit message: Use the new range metadata in computeMaskedBits and add a new optimization to instruction simplify that lets us remove an and when loading a boolean value. llvm-svn: 153452
-
Andrew Trick authored
Thanks Andrey. llvm-svn: 153451
-
Kostya Serebryany authored
[tsan] treat vtable pointer updates in a special way (requires tbaa); fix a bug (forgot to return true after instrumenting); make sure the tsan tests are run llvm-svn: 153448
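A rough standalone model of the special case described here, with logging stubs standing in for the tsan runtime's entry points (compiler-rt provides hooks along the lines of __tsan_vptr_update; the routing shown is the idea, not the actual instrumentation code): a store that TBAA identifies as a vtable-pointer update goes to a dedicated callback instead of the plain write hook.

```cpp
#include <cstdio>

// Stubs standing in for the tsan runtime hooks; they only log here.
static void tsanWrite(void *Addr) {
  std::printf("plain write at %p\n", Addr);
}
static void tsanVptrUpdate(void **VPtr, void *New) {
  std::printf("vptr update at %p <- %p\n", (void *)VPtr, New);
}

// The instrumentation choice: a store marked (via TBAA) as a vtable
// pointer update is routed to the dedicated hook so the runtime can
// treat races on construction/destruction specially.
void instrumentStore(void **Addr, void *Val, bool IsVtableAccess) {
  if (IsVtableAccess)
    tsanVptrUpdate(Addr, Val);
  else
    tsanWrite((void *)Addr);
  *Addr = Val;  // the original store
}

int main() {
  void *Slot = nullptr;
  int Obj = 0;
  instrumentStore(&Slot, &Obj, /*IsVtableAccess=*/true);
  instrumentStore(&Slot, &Obj, /*IsVtableAccess=*/false);
}
```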
-
Benjamin Kramer authored
llvm-svn: 153438
-
Douglas Gregor authored
llvm-svn: 153436
-
Anton Korobeynikov authored
Patch by Sylvestre Ledru! llvm-svn: 153435
-
Craig Topper authored
llvm-svn: 153429
-
Eric Christopher authored
llvm-svn: 153428
-