Commits · 1a0877f99dc8130f3b85ab096bc03059e4c3cd0a · Roger Ferrer / llvm-epi-0.8

Mar 29, 2012

Don't kill the base register when expanding strd. · b6a7a892

Jakob Stoklund Olesen authored Mar 28, 2012

When an strd instruction doesn't get the registers it wants, it can be
expanded into two str instructions. Make sure the first str doesn't kill
the base register in the case where the base and data registers are
identical:

  t2STRi12 %R0<kill>, %R0, 4, pred:14, pred:%noreg
  t2STRi12 %R2<kill>, %R0, 8, pred:14, pred:%noreg

<rdar://problem/11101911>

llvm-svn: 153611

b6a7a892
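
As a rough illustration of the kill-flag rule described in this commit, here is a minimal sketch on a toy model. The `RegUse` and `Store` types and the `expandStrd` helper are hypothetical and are not LLVM's `MachineOperand` API or the actual pass code. It only shows that the first store must not kill the base register, even when the data register is the same register.

```cpp
#include <utility>

// Toy operand: a register number plus a "kill" (last use) flag.
// Hypothetical model, not LLVM's MachineOperand.
struct RegUse {
  unsigned Reg;
  bool Kill;
};

// One expanded store: str Data, [Base, #Offset].
struct Store {
  RegUse Data;
  RegUse Base;
  int Offset;
};

// Split "strd Lo, Hi, [Base, #Offset]" into two str instructions.
// The first store must not mark the base register as killed, either
// through its base operand or through a data operand that happens to
// be the same register, because the second store still reads the base.
std::pair<Store, Store> expandStrd(RegUse Lo, RegUse Hi, RegUse Base,
                                   int Offset) {
  Store First{Lo, Base, Offset};
  Store Second{Hi, Base, Offset + 4};
  First.Base.Kill = false;
  if (First.Data.Reg == Base.Reg)
    First.Data.Kill = false;
  return {First, Second};
}
```

With `Lo` and `Base` both naming R0, the first store keeps R0 alive and only the second store may carry kill flags.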

Preserve implicit defs in ARMLoadStoreOptimizer. · cdee326a

Jakob Stoklund Olesen authored Mar 28, 2012

When a number of sub-register VLDRS instructions are combined into a
VLDM, preserve any super-register implicit defs. This is required to
keep the register scavenger and machine code verifier happy.

Enable machine code verification after ARMLoadStoreOptimizer.
ARM/2012-01-26-CopyPropKills.ll was failing because of this.

llvm-svn: 153610

cdee326a

Mar 28, 2012
- Move getPointerToNamedFunction() from JIT/MCJIT to JITMemoryManager. · bfee542c
  Danil Malyshev authored Mar 28, 2012
```
llvm-svn: 153607
```
  bfee542c
- Handle intrinsics in GlobalsModRef. Fixes pr12351. · 5054ee82
  Rafael Espindola authored Mar 28, 2012
```
llvm-svn: 153604
```
  5054ee82
- Spill DPair registers, not just QPR. · 9e512120
  Jakob Stoklund Olesen authored Mar 28, 2012
```
The arm_neon intrinsics can create virtual registers from the DPair
register class which allows both even-odd and odd-even D-register pairs.

This fixes PR12389.

llvm-svn: 153603
```
  9e512120
- Also verify after ExpandPostRAPseudos. · e433c68d
  Jakob Stoklund Olesen authored Mar 28, 2012
```
llvm-svn: 153599
```
  e433c68d
- Enable machine code verification after the late machine optimization passes. · 341e06f8
  Jakob Stoklund Olesen authored Mar 28, 2012
```
Branch folding invalidates liveness and disables liveness verification
on some targets.

llvm-svn: 153597
```
  341e06f8
- Skip liveness verification when MRI->tracksLiveness() is false. · b21df32c
  Jakob Stoklund Olesen authored Mar 28, 2012
```
Extract the liveness verification into its own method.

This makes it possible to run the machine code verifier after liveness
information is no longer required to be valid.

llvm-svn: 153596
```
  b21df32c
- Revert r153516: "Invalidate liveness in Thumb2ITBlockPass." · 8cb97523
  Jakob Stoklund Olesen authored Mar 28, 2012
```
Revert r153519: "ARMLoadStoreOptimizer invalidates register liveness."

These patches caused miscompilations in povray by turning off branch
folding's updating of live-in lists.

It turns out that the late scheduler depends on the live-in lists, even
if it doesn't need correct kill flags.

<rdar://problem/11139228>

llvm-svn: 153593
```
  8cb97523
- Allow removeLiveIn to be called with a register that isn't live-in. · 8e58c90f
  Jakob Stoklund Olesen authored Mar 28, 2012
```
This avoids the silly double search:

  if (isLiveIn(Reg))
    removeLiveIn(Reg);

llvm-svn: 153592
```
  8e58c90f
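
A minimal sketch of the idea in this commit, assuming a toy `Block` type with a plain vector of live-in registers (a stand-in for illustration, not LLVM's `MachineBasicBlock`): `removeLiveIn` does the search itself and quietly does nothing when the register is absent, so callers no longer need the `isLiveIn` guard.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for a basic block's live-in list (hypothetical type).
struct Block {
  std::vector<unsigned> LiveIns;

  bool isLiveIn(unsigned Reg) const {
    return std::find(LiveIns.begin(), LiveIns.end(), Reg) != LiveIns.end();
  }

  // Tolerates a register that is not live-in: one search, then erase
  // only if the register was found.
  void removeLiveIn(unsigned Reg) {
    auto I = std::find(LiveIns.begin(), LiveIns.end(), Reg);
    if (I != LiveIns.end())
      LiveIns.erase(I);
  }
};
```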
- Revert r153521 as it's causing large regressions on the nightly testers. · e27081d3
  Chad Rosier authored Mar 28, 2012
```
Original commit message for r153521 (aka r153423):
Use the new range metadata in computeMaskedBits and add a new optimization to
instruction simplify that lets us remove an and when loading a boolean value.

llvm-svn: 153587
```
  e27081d3
- Fixed commuteInstructions bug where, if it's called pre-regalloc, the subreg indices weren't commuted · 148ebb88
  Pete Cooper authored Mar 28, 2012
```
llvm-svn: 153579
```
  148ebb88
- GlobalOpt: If we have an inbounds GEP from a ConstantAggregateZero global that... · aa9e4a5e
  Benjamin Kramer authored Mar 28, 2012
```
GlobalOpt: If we have an inbounds GEP from a ConstantAggregateZero global that we just determined to be constant, replace all loads from it with a zero value.

llvm-svn: 153576
```
  aa9e4a5e
- Add another note about a missed compare with nsw arithmetic instcombine. · 20b32d2d
  Benjamin Kramer authored Mar 28, 2012
```
llvm-svn: 153574
```
  20b32d2d
- Fixup VST1.32 with writeback instruction. Also re-factor non-writeback version. · 7ce39497
  Richard Barton authored Mar 28, 2012
```
llvm-svn: 153573
```
  7ce39497
- Switch to WeakVHs in the value mapper, and aggressively prune dead basic · 772c88b8
  Chandler Carruth authored Mar 28, 2012
```
blocks in the function cloner. This removes the last case of trivially
dead code that I've been seeing in the wild getting inlined, analyzed,
re-inlined, optimized, only to be deleted. Nukes a FIXME from the
cleanup tests.

llvm-svn: 153572
```
  772c88b8
- More debug output. · 24a62985
  Eric Christopher authored Mar 28, 2012
```
llvm-svn: 153571
```
  24a62985
- Fix the output of the DW_TAG_friend tag to include DW_AT_friend · 7285c7d5
  Eric Christopher authored Mar 28, 2012
```
and not the rest of the member tag.

Fixes PR11695

llvm-svn: 153570
```
  7285c7d5
- Turn off post-RA scheduler by default. · 2c67006c
  Akira Hatanaka authored Mar 28, 2012
```
llvm-svn: 153557
```
  2c67006c
- Fix 80-column violation. · bb2a6da4
  Chad Rosier authored Mar 28, 2012
```
llvm-svn: 153556
```
  bb2a6da4
- Turn on post register allocation scheduler. · 047473e2
  Akira Hatanaka authored Mar 28, 2012
```
llvm-svn: 153554
```
  047473e2
- Sort relocation entries before they are written out to a file. MIPS ABI · 5ba593f5
  Akira Hatanaka authored Mar 28, 2012
```
imposes a constraint that GOT16 referring to a local symbol or HI16 has to be
followed immediately by a matching LO16 relocation.

llvm-svn: 153553
```
  5ba593f5
- Emit all directives except for ".cprestore" during asm printing rather than emit · 34ee3ff8
  Akira Hatanaka authored Mar 28, 2012
```
them as machine instructions. Directives ".set noat" and ".set at" are now
emitted only at the beginning and end of a function except in the case where
they are emitted to enclose .cpload with an immediate operand that doesn't fit
in 16-bit field or unaligned load/stores.

Also, make the following changes:
- Remove function isUnalignedLoadStore and use a switch-case statement to
  determine whether an instruction is an unaligned load or store.

- Define helper function CreateMCInst which generates an instance of an MCInst
  from an opcode and a list of operands.

llvm-svn: 153552
```
  34ee3ff8
- Mark flag neverHasSideEffects of pattern-less instructions that do not have · 1518a5fa
  Akira Hatanaka authored Mar 28, 2012
```
any side effects.

llvm-svn: 153551
```
  1518a5fa
- Add a note about a cute little fabs optimization. · 2735c019
  Benjamin Kramer authored Mar 27, 2012
```
llvm-svn: 153543
```
  2735c019
- Add two missed instcombines related to compares with nsw arithmetic. · f0901459
  Benjamin Kramer authored Mar 27, 2012
```
llvm-svn: 153542
```
  f0901459
Mar 27, 2012

Remove trailing white space. · 52656d10
Akira Hatanaka authored Mar 27, 2012
```
llvm-svn: 153536
```
52656d10
Use a SmallVector and linear lookup instead of a DenseSet - SourceMap values · 5544bf1b
Lang Hames authored Mar 27, 2012
```
will always be tiny sets, so DenseSet is overkill (SmallSet won't work as we
need iteration support).

llvm-svn: 153529
```
5544bf1b
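
The trade-off in this commit can be sketched as follows. This is an illustration only: `std::vector` stands in for `llvm::SmallVector`, and `insertUnique` is a hypothetical helper name. For a handful of elements, a linear scan before insertion gives set semantics while keeping plain iteration over the container.

```cpp
#include <algorithm>
#include <vector>

// Insert Value into a small vector used as a set. Returns true if the
// value was newly added, false if it was already present.
// std::vector stands in for llvm::SmallVector in this sketch.
template <typename T>
bool insertUnique(std::vector<T> &Set, const T &Value) {
  if (std::find(Set.begin(), Set.end(), Value) != Set.end())
    return false;
  Set.push_back(Value);
  return true;
}
```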
Add member EmitNOAT and its setter and getter functions to class MipsFunctionInfo. · a25fe221
Akira Hatanaka authored Mar 27, 2012
```
If EmitNOAT is true, directives ".set noat" and ".set at" are emitted at the
beginning and end of a function.

llvm-svn: 153528
```
a25fe221
Use DW_AT_low_pc for a single entry point into a routine. · 7ed2efca
Eric Christopher authored Mar 27, 2012
```
Fixes PR10105

llvm-svn: 153524
```
7ed2efca

Reapply r153423; the original commit was fine. The failing test, distray, had · 8e6dbccd

Chad Rosier authored Mar 27, 2012

undefined behavior, which Rafael was kind enough to fix.

Original commit message for r153423:
Use the new range metadata in computeMaskedBits and add a new optimization to
instruction simplify that lets us remove an and when loading a boolean value.

llvm-svn: 153521

8e6dbccd
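
To make the optimization in this message concrete, here is a small sketch of the underlying bit reasoning. It assumes a loaded value known to lie in the unsigned half-open range [0, Hi), and both function names are hypothetical (this is not LLVM's `computeMaskedBits` interface). If every bit that the mask would clear is already known to be zero, the `and` changes nothing and can be dropped.

```cpp
#include <cstdint>

// Bits that are known to be zero for an unsigned value in the
// half-open range [0, Hi), with Hi >= 1.
uint32_t knownZeroFromRange(uint32_t Hi) {
  uint32_t Max = Hi - 1;     // largest value in the range
  uint32_t MaybeSet = 0;     // bits that may be set in some value
  while (Max != 0) {
    MaybeSet = (MaybeSet << 1) | 1;
    Max >>= 1;
  }
  return ~MaybeSet;
}

// "X & Mask" equals X when every bit cleared by Mask is already
// known to be zero in X.
bool andIsRedundant(uint32_t Mask, uint32_t KnownZero) {
  return (~Mask & ~KnownZero) == 0;
}
```

For a boolean with range [0, 2), only bit 0 may be set, so masking with 1 is redundant. For a value in [0, 256) the same mask is not redundant.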

ARMLoadStoreOptimizer invalidates register liveness. · 4acbcb31

Jakob Stoklund Olesen authored Mar 27, 2012

This pass tries to update kill flags, but there are still many bugs.
Passes after the load/store optimizer don't need accurate liveness, so
don't even try.

<rdar://problem/11101911>

llvm-svn: 153519

4acbcb31

Print SSA and liveness tracking flags in MF::print(). · 6c08534a
Jakob Stoklund Olesen authored Mar 27, 2012
```
llvm-svn: 153518
```
6c08534a

Branch folding may invalidate liveness. · d1664a15

Jakob Stoklund Olesen authored Mar 27, 2012

Branch folding can use a register scavenger to update liveness
information when required. Don't do that if liveness information is
already invalid.

llvm-svn: 153517

d1664a15

Invalidate liveness in Thumb2ITBlockPass. · 14459cdc
Jakob Stoklund Olesen authored Mar 27, 2012
```
llvm-svn: 153516
```
14459cdc
fix what looks like a real logic bug, found by PVS-Studio (part of PR12357) · 1cc25e8a
Chris Lattner authored Mar 27, 2012
```
llvm-svn: 153513
```
1cc25e8a

Add an MRI::tracksLiveness() flag. · 9c1ad5cb

Jakob Stoklund Olesen authored Mar 27, 2012

Late optimization passes like branch folding and tail duplication can
transform the machine code in a way that makes it expensive to keep the
register liveness information up to date. There is a fuzzy line between
register allocation and late scheduling where the liveness information
degrades.

The MRI::tracksLiveness() flag makes the line clear: While true,
liveness information is accurate, and can be used for register
scavenging. Once the flag is false, liveness information is not
accurate, and can only be used as a hint.

Late passes generally don't need the liveness information, but they will
sometimes use the register scavenger to help update it. The scavenger
enforces strict correctness, and we have to spend a lot of code to
update register liveness that may never be used.

llvm-svn: 153511

9c1ad5cb
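
A toy sketch of how a consumer might honor such a flag, in the spirit of the later commit that skips liveness verification when the flag is false. `RegInfo` and `checksToRun` are hypothetical stand-ins, not the real `MachineRegisterInfo` or verifier code, and the check names are invented for illustration.

```cpp
#include <string>
#include <vector>

// Toy register-info object carrying the liveness flag.
struct RegInfo {
  bool tracksLiveness() const { return Tracks; }
  // One-way switch: once liveness is invalid it stays invalid.
  void invalidateLiveness() { Tracks = false; }

private:
  bool Tracks = true;
};

// Returns the names of the checks a verifier-like pass would run.
// Liveness is only checked while the flag says it is accurate.
std::vector<std::string> checksToRun(const RegInfo &MRI) {
  std::vector<std::string> Checks{"operands", "cfg"};
  if (MRI.tracksLiveness())
    Checks.push_back("liveness");
  return Checks;
}
```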

Make a seemingly tiny change to the inliner and fix the generated code · b9e35fbc

Chandler Carruth authored Mar 27, 2012

size bloat. Unfortunately, I expect this to disable the majority of the
benefit from r152737. I'm hopeful at least that it will fix PR12345. To
explain this requires... quite a bit of backstory I'm afraid.

TL;DR: The change in r152737 actually did The Wrong Thing for
linkonce-odr functions. This change makes it do the right thing. The
benefits we saw were simple luck, not any actual strategy. Benchmark
numbers after a mini-blog-post so that I've written down my thoughts on
why all of this works and doesn't work...

To understand what's going on here, you have to understand how the
"bottom-up" inliner actually works. There are two fundamental modes to
the inliner:

1) Standard fixed-cost bottom-up inlining. This is the mode we usually
   think about. It walks from the bottom of the call graph up to the top,
   looking at callsites, taking information about the callsite and the
   called function and computing the expected cost of inlining into that
   callsite. If the cost is under a fixed threshold, it inlines. It's
   a touch more complicated than that due to all the bonuses, weights,
   etc. Inlining the last callsite to an internal function gets higher
   weight, etc. But essentially, this is the mode of operation.

2) Deferred bottom-up inlining (a term I just made up). This is the
   interesting mode for this patch and r152737. Initially, this works
   just like mode #1, but once we have the cost of inlining into the
   callsite, we don't just compare it with a fixed threshold. First, we
   check something else. Let's give some names to the entities at this
   point, or we'll end up hopelessly confused. We're considering
   inlining a function 'A' into its callsite within a function 'B'. We
   want to check whether 'B' has any callers, and whether it might be
   inlined into those callers. If so, we also check whether inlining 'A'
   into 'B' would block any of the opportunities for inlining 'B' into
   its callers. We take the sum of the costs of inlining 'B' into its
   callers where that inlining would be blocked by inlining 'A' into
   'B', and if that cost is less than the cost of inlining 'A' into 'B',
   then we skip inlining 'A' into 'B'.

Now, in order for #2 to make sense, we have to have some confidence that
we will actually have the opportunity to inline 'B' into its callers
when cheaper, *and* that we'll be able to revisit the decision and
inline 'A' into 'B' if that ever becomes the correct tradeoff. This
often isn't true for external functions -- we can see very few of their
callers, and we won't be able to re-consider inlining 'A' into 'B' if
'B' is external when we finally see more callers of 'B'. There are two
cases where we believe this to be true for C/C++ code: functions local
to a translation unit, and functions with an inline definition in every
translation unit which uses them. These are represented as internal
linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for
linkonce-odr in r152737.

Unfortunately, when I did that, I also introduced a subtle bug. There
was an implicit assumption that the last caller of the function within
the TU was the last caller of the function in the program. We want to
bonus the last caller of the function in the program by a huge amount
for inlining because inlining that callsite has very little cost.
Unfortunately, the last caller in the TU of a linkonce-odr function is
*not* the last caller in the program, and so we don't want to apply this
bonus. If we do, we can apply it to one callsite *per-TU*. Because of
the way deferred inlining works, when it sees this bonus applied to one
callsite in the TU for 'B', it decides that inlining 'B' is of the
*utmost* importance just so we can get that final bonus. It then
proceeds to essentially force deferred inlining regardless of the actual
cost tradeoff.
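
The linkage distinction described here can be sketched as a guard on the bonus. The enum, the function name, and the bonus value passed in are hypothetical and only illustrate the rule: the last caller visible in the TU is the last caller in the program only for internal linkage.

```cpp
// Simplified linkage kinds for this sketch.
enum class Linkage { Internal, LinkOnceODR, External };

// Bonus applied when inlining the last remaining callsite of a callee.
// Only internal linkage guarantees that the callers seen in this TU
// are all the callers in the program, so linkonce-odr gets no bonus.
int lastCallerBonus(Linkage L, unsigned CallersInTU, int Bonus) {
  if (L == Linkage::Internal && CallersInTU == 1)
    return Bonus;
  return 0;
}
```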

The result? PR12345: code bloat, code bloat, code bloat. Another result
is getting *damn* lucky on a few benchmarks, and the over-inlining
exposing critically important optimizations. I would very much like
a list of benchmarks that regress after this change goes in, with
bitcode before and after. This will help me greatly understand what
opportunities the current cost analysis is missing.

Initial benchmark numbers look very good. WebKit files that exhibited
the worst of PR12345 went from growing to shrinking compared to Clang
with r152737 reverted.

- Bootstrapped Clang is 3% smaller with this change.
- Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4%
  faster with this change.

Please let me know about any other performance impact you see. Thanks to
Nico for reporting this and urging me to actually fix it, and to Richard
Smith, Duncan Sands, Manuel Klimek, and Benjamin Kramer for talking
through the issues today.

llvm-svn: 153506

b9e35fbc

Prune some includes · 1fcf5bca
Craig Topper authored Mar 27, 2012
```
llvm-svn: 153502
```
1fcf5bca
Remove unnecessary llvm:: qualifications · f6e7e12f
Craig Topper authored Mar 27, 2012
```
llvm-svn: 153500
```
f6e7e12f