- Oct 23, 2011
-
Chandler Carruth authored
to get important constant branch probabilities and use them for finding the best branch out of a set of possibilities. llvm-svn: 142762
-
Chandler Carruth authored
llvm-svn: 142761
-
Benjamin Kramer authored
50% is much more readable than 5.000000e-01. llvm-svn: 142752
-
Benjamin Kramer authored
llvm-svn: 142751
-
Chandler Carruth authored
discussions with Andy. Fundamentally, the previous algorithm is both counterproductive on several fronts and prioritizes things which aren't necessarily the most important: static branch prediction.

The new algorithm uses the existing loop CFG structure information to walk through the CFG itself to lay out blocks. It coalesces adjacent blocks within the loop where the CFG allows, based on the most likely path taken. Finally, it topologically orders the block chains that have been formed. This allows it to choose a (mostly) topologically valid ordering which still prioritizes fallthrough within the structural constraints. As a final twist in the algorithm, it does violate the CFG when it discovers a "hot" edge, that is, an edge that is more than 4x hotter than the competing edges in the CFG. These are forcibly merged into a fallthrough chain.

Future transformations that need to be added are rotation of loop exit conditions to be fallthrough, and better isolation of cold block chains. I'm also planning on adding statistics to model how well the algorithm does at laying out blocks based on the probabilities it receives.

The old tests mostly still pass, and I have some new tests to add, but the nested loops are still behaving very strangely. This almost seems like working-as-intended, as it rotated the exit branch to be fallthrough, but I'm not convinced this is actually the best layout. It is well supported by the probabilities we currently get for loops, but those are pretty broken for nested loops, so this may change later. llvm-svn: 142743
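A rough sketch of the 4x "hot" edge rule described above (a hypothetical helper for illustration; the real pass works on MachineBasicBlocks and BranchProbability values, not plain doubles):

    #include <cstddef>
    #include <vector>

    // Pick the successor to merge into the fallthrough chain, but only
    // if its probability beats every competing edge by more than 4x.
    // Returns -1 when no edge is hot enough and the CFG order is kept.
    int pickHotSuccessor(const std::vector<double> &SuccProb) {
      int Best = -1;
      for (std::size_t I = 0; I < SuccProb.size(); ++I)
        if (Best < 0 || SuccProb[I] > SuccProb[Best])
          Best = static_cast<int>(I);
      for (std::size_t I = 0; I < SuccProb.size(); ++I)
        if (static_cast<int>(I) != Best && SuccProb[Best] <= 4 * SuccProb[I])
          return -1;
      return Best;
    }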
-
Craig Topper authored
llvm-svn: 142741
-
Cameron Zwarich authored
element types, even though the element extraction code does. It is surprising that this bug has been here for so long. Fixes <rdar://problem/10318778>. llvm-svn: 142740
-
Craig Topper authored
llvm-svn: 142738
-
Craig Topper authored
llvm-svn: 142737
-
- Oct 22, 2011
-
Nick Lewycky authored
elimination on them too. llvm-svn: 142735
-
Nick Lewycky authored
able to constant fold load instructions where the argument is a constant. Second, we should be able to watch multiple PHI nodes through the loop; this patch only supports PHIs in loop headers, more can be done here. With this patch, we now constant evaluate:

    static const int arr[] = {1, 2, 3, 4, 5};
    int test() {
      int sum = 0;
      for (int i = 0; i < 5; ++i)
        sum += arr[i];
      return sum;
    }

llvm-svn: 142731
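Since the loop above fully folds, the evaluated function reduces to the constant sum; in source terms it is equivalent to:

    int test() { return 15; } // 1 + 2 + 3 + 4 + 5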
-
Benjamin Kramer authored
llvm-svn: 142726
-
Nadav Rotem authored
SHL inserts zeros from the right; thus even when the original sign_extend_inreg value was only 1 bit wide, we still need to SRA. llvm-svn: 142724
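A worked example of the point in plain C++ (a sketch; signExtendInReg is a hypothetical helper, not the legalizer code):

    #include <cassert>
    #include <cstdint>

    // sign_extend_inreg of the low N bits of a 32-bit value, lowered as
    // shl followed by sra. The shl moves the sign bit to the top but
    // fills from the right with zeros, so even for N == 1 the sra is
    // still required to smear the sign bit back down.
    int32_t signExtendInReg(int32_t V, unsigned N) {
      unsigned Sh = 32 - N;
      // Shift left as unsigned to avoid overflow UB; the right shift on
      // a signed type is arithmetic on common targets, matching SRA.
      return static_cast<int32_t>(static_cast<uint32_t>(V) << Sh) >> Sh;
    }

    int main() {
      assert(signExtendInReg(1, 1) == -1); // 1-bit "1" sign-extends to -1
      assert(signExtendInReg(1, 2) == 1);  // 2-bit "01" stays 1
      return 0;
    }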
-
Bill Wendling authored
that the set of callee-saved registers is correct for the specific platform. <rdar://problem/10313708> & ctor_dtor_count & ctor_dtor_count-2 llvm-svn: 142706
-
Jim Grosbach authored
llvm-svn: 142704
-
Jim Grosbach authored
llvm-svn: 142691
-
Bill Wendling authored
The assumption in the back-end is that PHIs are not allowed at the start of the landing pad block for SjLj exceptions. <rdar://problem/10313708> llvm-svn: 142689
-
- Oct 21, 2011
-
Benjamin Kramer authored
llvm-svn: 142687
-
Eli Friedman authored
llvm-svn: 142684
-
Owen Anderson authored
llvm-svn: 142683
-
Jim Grosbach authored
llvm-svn: 142682
-
Benjamin Kramer authored
This is from the same paper by Ball and Larus as the rest of the currently implemented heuristics. llvm-svn: 142677
-
Jim Grosbach authored
llvm-svn: 142675
-
Owen Anderson authored
llvm-svn: 142673
-
Eli Friedman authored
llvm-svn: 142672
-
Eli Friedman authored
Extend instcombine's shufflevector simplification to handle more cases where the input and output vectors have different sizes. Patch by Xiaoyi Guo. llvm-svn: 142671
-
Jim Grosbach authored
Next step in the ongoing saga of NEON load/store assembly parsing. Handle VLD1 instructions that take a two-register register list. Adjust the instruction definitions to only have the single encoded register as an operand. The super-register from the pseudo is kept as an implicit def, so passes which come after pseudo-expansion still know that the instruction defines the other subregs. llvm-svn: 142670
-
Owen Anderson authored
Don't automatically set the "fc" bits on MSR instructions if the user didn't ask for them. This is a divergence from gas' behavior, but it is correct per the documentation and allows us to forge ahead with roundtrip testing. llvm-svn: 142669
-
Owen Anderson authored
llvm-svn: 142667
-
Owen Anderson authored
Expand the coverage of the libObject C bindings to include more SectionRef accessors as well as Symbol iterators. llvm-svn: 142661
-
Nadav Rotem authored
ZExtPromotedInteger and SExtPromotedInteger based on the operation we legalize. The SetCC return type needs to be legalized via PromoteTargetBoolean. llvm-svn: 142660
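A small illustration of why the zext/sext choice must follow the operation (plain C++, not the legalizer itself): promoting the inputs of an 8-bit unsigned divide with sign extension changes the result, while zero extension preserves it.

    #include <cassert>
    #include <cstdint>

    int main() {
      uint8_t A = 0xF0, B = 2;         // on the narrow type: 240 / 2 == 120
      uint32_t Zext = uint32_t{A} / B; // zext promotion: 240 / 2 == 120
      // sext promotion: int8_t(0xF0) is -16 on two's-complement targets,
      // so the divide computes -16 / 2 == -8 instead.
      int32_t Sext = int32_t{int8_t(A)} / B;
      assert(Zext == 120); // correct for an unsigned divide
      assert(Sext != 120); // sext-promoting a udiv gives the wrong result
      return 0;
    }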
-
Jim Grosbach authored
llvm-svn: 142658
-
Jim Grosbach authored
llvm-svn: 142657
-
Jim Grosbach authored
llvm-svn: 142653
-
Nadav Rotem authored
2. Fix a typo in CONCAT_VECTORS which exposed the bug in #1. llvm-svn: 142648
-
Anton Korobeynikov authored
Patch by Ruben Van Boxem! llvm-svn: 142646
-
Chandler Carruth authored
it's a bit more plausible to use this instead of CodePlacementOpt. The code for this was shamelessly stolen from CodePlacementOpt, and then trimmed down a bit. There doesn't seem to be much utility in returning true/false from this pass as we may or may not have rewritten all of the blocks. Also, the statistic of counting how many loops were aligned doesn't seem terribly important so I removed it. If folks would like it to be included, I'm happy to add it back. This was probably the most egregious of the missing features, and now I'm going to start gathering some performance numbers and looking at specific loop structures that have different layout between the two. Test is updated to include both basic loop alignment and nested loop alignment. llvm-svn: 142645
-
Craig Topper authored
Remove the intrinsics for X86 BLSI, BLSMSK, and BLSR and replace them with custom isel lowering code. llvm-svn: 142642
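For reference, these are the classic BMI1 bit-manipulation patterns, which custom isel lowering can now match from ordinary code instead of intrinsics (a sketch of the instruction semantics, not the lowering code):

    #include <cassert>
    #include <cstdint>

    uint32_t blsi(uint32_t X)   { return X & (0u - X); } // isolate lowest set bit
    uint32_t blsmsk(uint32_t X) { return X ^ (X - 1); }  // mask up through lowest set bit
    uint32_t blsr(uint32_t X)   { return X & (X - 1); }  // clear lowest set bit

    int main() {
      assert(blsi(0x2Cu) == 0x04);   // 0b101100 -> 0b000100
      assert(blsmsk(0x2Cu) == 0x07); // 0b101100 -> 0b000111
      assert(blsr(0x2Cu) == 0x28);   // 0b101100 -> 0b101000
      return 0;
    }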
-
Chandler Carruth authored
block frequency analyses. This differs substantially from the existing block-placement pass in LLVM:

1) It operates on the Machine-IR in the CodeGen layer. This exposes much more (and more precise) information and opportunities. Also, the results are more stable due to fewer transforms occurring after the pass runs.

2) It uses the generalized probability and frequency analyses. These can model static heuristics, heuristics derived from code annotations, as well as eventual profile loading. By basing the optimization on the analysis interface it can work from any (or a combination) of these inputs.

3) It uses a more aggressive algorithm, both building chains from the bottom up to maximize benefit, and using an SCC-based walk to lay out chains of blocks in a profitable ordering without the O(N^2) iterations which the old pass involves.

The pass is currently gated behind a flag, and not enabled by default, because it still needs to grow some important features. Most notably, it needs to support loop alignment and careful layout of loop structures, much as done by hand currently in CodePlacementOpt. Once it supports these, and has sufficient testing and quality tuning, it should replace both of these passes. Thanks to Nick Lewycky and Richard Smith for help authoring & debugging this, and to Jakob, Andy, Eric, Jim, and probably a few others I'm forgetting, for reviewing and answering all my questions. Writing a backend pass is *sooo* much better now than it used to be. =D llvm-svn: 142641
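A rough sketch of the chain idea from point 3 (hypothetical types; the real pass chains MachineBasicBlocks and consults the probability analyses when choosing what to merge):

    #include <vector>

    // A chain of block ids that must be laid out contiguously. Chains
    // are built bottom-up: a block is spliced onto the chain of the
    // predecessor most likely to fall through into it.
    struct BlockChain {
      std::vector<int> Blocks; // block ids, in final layout order

      explicit BlockChain(int BB) : Blocks{BB} {}

      // Commit to fallthrough from our last block into Other's first
      // block by splicing Other's blocks onto the end of this chain.
      void merge(BlockChain &Other) {
        Blocks.insert(Blocks.end(), Other.Blocks.begin(), Other.Blocks.end());
        Other.Blocks.clear();
      }
    };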
-
Chandler Carruth authored
Clang. llvm-svn: 142631
-