Commits · 7276397f41c1f8867cdbebf6a6b062809989faee · Roger Ferrer / llvm-epi-0.8

Dec 02, 2011

make sure ScheduleDAGInstrs::EmitSchedule does not crash when the first... · 42018202

Hal Finkel authored Dec 02, 2011

make sure ScheduleDAGInstrs::EmitSchedule does not crash when the first instruction in Sequence is a Noop

llvm-svn: 145677

42018202

Dec 01, 2011
- CodeGen: fix CMake build · c19f0b73
  Dylan Noblesmith authored Dec 01, 2011
```
Missing file from r145629.

llvm-svn: 145634
```
  c19f0b73
- Add a deterministic finite automaton based packetizer for VLIW architectures · 08ebdc1e
  Anshuman Dasgupta authored Dec 01, 2011
```
llvm-svn: 145629
```
  08ebdc1e
Nov 29, 2011
- If fast-isel fails, remove dead instructions generated during the failed · 46addb9e
  Chad Rosier authored Nov 29, 2011
```
attempt.

llvm-svn: 145425
```
  46addb9e
- build/CMake: Finish removal of add_llvm_library_dependencies. · 539d0a8a
  Daniel Dunbar authored Nov 29, 2011
```
llvm-svn: 145420
```
  539d0a8a
- On MachO, the pointer to the personality function should always be in the · e4cc3327
  Bill Wendling authored Nov 29, 2011
```
non_lazy_symbol_pointers section (__IMPORT,__pointers). Ignore the 'hidden' part
since that will place it in the wrong section.
<rdar://problem/10443720>

llvm-svn: 145356
```
  e4cc3327
Nov 28, 2011

Make SelectionDAG::InferPtrAlignment use llvm::ComputeMaskedBits instead of... · e7ab1a2f

Eli Friedman authored Nov 28, 2011

Make SelectionDAG::InferPtrAlignment use llvm::ComputeMaskedBits instead of duplicating the logic for globals.  Make llvm::ComputeMaskedBits handle GlobalVariables slightly more aggressively, to match what InferPtrAlignment knew how to do.

llvm-svn: 145304

e7ab1a2f

Revert r145273 and fix in SelectionDAG::InferPtrAlignment() instead. · 4a5b2040

Evan Cheng authored Nov 28, 2011

Conservatively return zero when the GV neither specifies an alignment nor is
initialized. Previously it returned the ABI alignment for the type of the GV.
However, if the type is a "packed" type, then the under-specified alignment is
attached to the load / store instructions. In that case, the alignment of the
type cannot be trusted.
rdar://10464621

llvm-svn: 145300

4a5b2040

DAG combine should not increase alignment of loads / stores with alignment less · a4b6404c

Evan Cheng authored Nov 28, 2011

than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.

rdar://10301431

llvm-svn: 145273

a4b6404c

80-column. · 61e8d102
Chad Rosier authored Nov 28, 2011
```
llvm-svn: 145267
```
61e8d102
Remove dead llvm.eh.sjlj.dispatchsetup intrinsic. · 5ebc95ff
Bill Wendling authored Nov 28, 2011
```
llvm-svn: 145263
```
5ebc95ff

Nov 27, 2011

Prevent rotating the blocks of a loop (and thus getting a backedge to be · 4f567207

Chandler Carruth authored Nov 27, 2011

fallthrough) in cases where we might fail to rotate an exit to an outer
loop onto the end of the loop chain.

Having *some* rotation, but not performing this rotation, is the primary
fix of the performance regression with -enable-block-placement for
Olden/em3d (a whopping 30% regression). Still working on reducing the
test case that actually exercises this and the new rotation strategy out
of this code, but I want to check if this regresses other test cases
first as that may indicate it isn't the correct fix.

llvm-svn: 145195

4f567207

Take two on rotating the block ordering of loops. My previous attempt · 03adbd46

Chandler Carruth authored Nov 27, 2011

was centered around the premise of laying out a loop in a chain, and
then rotating that chain. This is good for preserving contiguous layout,
but bad for actually making sane rotations. In order to keep it safe,
I had to essentially make it impossible to rotate deeply nested loops.
The information needed to correctly reason about a deeply nested loop is
actually available -- *before* we layout the loop. We know the inner
loops are already fused into chains, etc. We lose information the moment
we actually lay out the loop.

The solution was the other alternative for this algorithm I discussed
with Benjamin and some others: rather than rotating the loop
after-the-fact, try to pick a profitable starting block for the loop's
layout, and then use our existing layout logic. I was worried about the
complexity of this "pick" step, but it turns out such complexity is
needed to handle all the important cases I keep teasing out of benchmarks.

This is, I'm afraid, a bit of a work-in-progress. It is still
misbehaving on some likely important cases I'm investigating in Olden.
It also isn't really tested. I'm going to try to craft some interesting
nested-loop test cases, but it's likely to be extremely time consuming
and I don't want to go there until I'm sure I'm testing the correct
behavior. Sadly I can't come up with a way of getting simple, fine
grained test cases for this logic. We need complex loop structures to
even trigger much of it.

llvm-svn: 145183

03adbd46

Fix an impressive type-o / spell-o Duncan noticed. · 9e466841
Chandler Carruth authored Nov 27, 2011
```
llvm-svn: 145181
```
9e466841

Rework a bit of the implementation of loop block rotation to not rely so · a0545809

Chandler Carruth authored Nov 27, 2011

heavily on AnalyzeBranch. That routine doesn't behave as we want given
that rotation occurs mid-way through re-ordering the function. Instead
merely check that there are not unanalyzable branching constructs
present, and then reason about the CFG via successor lists. This
actually simplifies my mental model for all of this as well.

The concrete result is that we now will rotate more loop chains. I've
added a test case from Olden highlighting the effect. There is still
a bit more to do here though in order to regain all of the performance
in Olden.

llvm-svn: 145179

a0545809

Introduce a loop block rotation optimization to the new block placement · 9ffb97e6

Chandler Carruth authored Nov 27, 2011

pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.

This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern, it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.

llvm-svn: 145158

9ffb97e6

Move code into anonymous namespaces. · 7ba71be3
Benjamin Kramer authored Nov 26, 2011
```
llvm-svn: 145154
```
7ba71be3

Nov 24, 2011

Fix a silly use-after-free issue. A much earlier version of this code · 7adee1a0

Chandler Carruth authored Nov 24, 2011

needed lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.

Somewhat gross mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.

llvm-svn: 145120

7adee1a0

When adding blocks to the list of those which no longer have any CFG · d394bafd

Chandler Carruth authored Nov 24, 2011

conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.
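
The rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Chain`, `unplacedPredecessors`, and `notePredecessorPlaced` are hypothetical names invented for this sketch, not the types or functions used by the block placement pass.

```cpp
#include <cassert>
#include <vector>

// Hypothetical chain of block ids laid out contiguously (not LLVM's own type).
struct Chain {
  std::vector<int> blocks;        // block ids in layout order
  unsigned unplacedPredecessors;  // outside predecessors that are not placed yet
};

// Called when one outside predecessor of `chain` has been placed. Once no CFG
// conflicts remain, only the chain's first block is queued, so a later merge
// attaches at the head of the chain and never in its middle.
void notePredecessorPlaced(Chain &chain, std::vector<int> &worklist) {
  assert(!chain.blocks.empty() && "a chain holds at least one block");
  assert(chain.unplacedPredecessors > 0 && "no predecessor left to place");
  if (--chain.unplacedPredecessors == 0)
    worklist.push_back(chain.blocks.front());
}
```

For a chain holding blocks 4, 5, 6 with two outstanding predecessors, the second call queues block 4 only; blocks 5 and 6 never enter the worklist on their own.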

I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.

llvm-svn: 145119

d394bafd

Nov 23, 2011

Relax an invariant that block placement was trying to assert a bit · 99fe42fb

Chandler Carruth authored Nov 23, 2011

further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resilient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.

This was found during a bootstrap with block placement turned on.

llvm-svn: 145100

99fe42fb

Handle the case of a no-return invoke correctly. It actually still has · 8c68f1f3

Chandler Carruth authored Nov 23, 2011

successors, they just are all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.

llvm-svn: 145098

8c68f1f3

Enable stack protectors for all arrays, not just char arrays. rdar://5875909 · ebb44646
Bob Wilson authored Nov 23, 2011
```
Patch by Bill Wendling.

llvm-svn: 145097
```
ebb44646

Fix PR11422. · 02845410

Jakob Stoklund Olesen authored Nov 23, 2011

This was a bug in keeping track of the available domains when merging
domain values.

The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.

Also add an assertion to catch future attempts at emitting AVX2
instructions.

llvm-svn: 145096

02845410

Fix a crash in block placement due to an inner loop that happened to be · 4a87aa0c

Chandler Carruth authored Nov 23, 2011

reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.

Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.

llvm-svn: 145094

4a87aa0c

Nov 22, 2011

Fix a devilish miscompile exposed by block placement. The · ee54feb6

Chandler Carruth authored Nov 22, 2011

updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.

This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad successor, and assert if more than one is found.
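
A minimal sketch of that successor scan, assuming a simplified block model: the `Block` struct and `findFallthroughTarget` are hypothetical stand-ins written for illustration, not the MachineBasicBlock interface or the actual patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a machine basic block.
struct Block {
  int id;
  bool isLandingPad;
  std::vector<const Block *> successors;
};

// Returns the single non-landing-pad successor, or nullptr if there is none.
// Asserts if a second candidate shows up, because a block without an
// analyzable terminator can fall through to at most one place.
const Block *findFallthroughTarget(const Block &block) {
  const Block *target = nullptr;
  for (const Block *succ : block.successors) {
    if (succ->isLandingPad)
      continue;  // exceptional edge, never the fallthrough destination
    assert(target == nullptr && "more than one non-landing-pad successor");
    target = succ;
  }
  return target;
}
```

Unlike taking the first entry of the successor list, the scan gives the same answer whether the landing pad is listed first or last.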

This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.

llvm-svn: 145062

ee54feb6

Fix an obvious omission in the SelectionDAGBuilder where we were · e2530dc8

Chandler Carruth authored Nov 22, 2011

dropping weights on the floor for invokes. This was impeding my writing
further test cases for invoke when interacting with probabilities and
block placement.

No test case as there doesn't appear to be a way to test this stuff. =/
Suggestions for a test case of course welcome. I hope to be able to add
test cases that indirectly cover this eventually by adding probabilities
to the exceptional edge and reordering blocks as a result.

llvm-svn: 145060

e2530dc8

If a register is both an early clobber and part of a tied use, handle the use · 2021f382
Rafael Espindola authored Nov 22, 2011
```
before the clobber so that we copy the value if needed.

Fixes pr11415.

llvm-svn: 145056
```
2021f382

Nov 20, 2011

The logic for breaking the CFG in the presence of hot successors didn't · 18dfac38

Chandler Carruth authored Nov 20, 2011

properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.

The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.
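
The two-part check can be pictured with the sketch below. It is a rough model only: the `Edge` fields, the 0.8 threshold, and the exact comparison rule are assumptions made for illustration and are not taken from the commit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of a CFG edge; fields and threshold are assumptions.
struct Edge {
  double probability;      // chance of taking this edge when leaving its source
  double sourceFrequency;  // estimated execution frequency of the source block
};

// How often the edge is taken across the whole function.
double globalFrequency(const Edge &edge) {
  return edge.probability * edge.sourceFrequency;
}

// Decides whether `candidate` is hot enough to justify a layout that breaks
// the CFG for the destination's other predecessors.
bool isHotEnoughToMerge(const Edge &candidate,
                        const std::vector<Edge> &otherUnmergedPredecessors,
                        double hotProbability = 0.8) {
  assert(hotProbability > 0.0 && hotProbability <= 1.0);
  // Locally hot: the edge dominates the other successors of its source block.
  if (candidate.probability < hotProbability)
    return false;
  // Globally hot: no other unmerged predecessor reaches the destination more
  // often than the candidate does.
  for (const Edge &other : otherUnmergedPredecessors)
    if (globalFrequency(other) > globalFrequency(candidate))
      return false;
  return true;
}
```

An unconditional branch out of a rarely executed block passes the local test with probability 1.0 but fails the global one when a hotter predecessor exists, which is the case the message describes.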

This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]

llvm-svn: 145010

18dfac38

Nov 19, 2011

Move the handling of unanalyzable branches out of the loop-driven chain · f3dc9eff

Chandler Carruth authored Nov 19, 2011

formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.
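
As an illustration of the pre-merge step, the sketch below walks blocks in their original order and glues each block to its predecessor whenever that predecessor falls through in a way that cannot be analyzed. `BlockInfo` and `buildInitialChains` are hypothetical names, and the model assumes an unanalyzable fallthrough always targets the next block in the incoming order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-block summary for this sketch.
struct BlockInfo {
  bool unanalyzableFallthrough;  // falls into the next block, terminator opaque
};

// Walks the blocks in their original order and returns chains of block
// indices. Blocks joined by an unanalyzable fallthrough share a chain from
// the start, so later reordering cannot separate them.
std::vector<std::vector<std::size_t>>
buildInitialChains(const std::vector<BlockInfo> &blocks) {
  // The last block has nothing to fall through into.
  assert(blocks.empty() || !blocks.back().unanalyzableFallthrough);
  std::vector<std::vector<std::size_t>> chains;
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    if (i > 0 && blocks[i - 1].unanalyzableFallthrough)
      chains.back().push_back(i);  // glued to the previous block
    else
      chains.push_back({i});       // starts a fresh single-block chain
  }
  return chains;
}
```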

Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fall back on the
incoming ordering.

Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can cause major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).

llvm-svn: 144994

f3dc9eff

Nov 18, 2011

DISubrange supports unsigned lower/upper array bounds, so let's not fake it in... · 107e8ec3

Devang Patel authored Nov 17, 2011

DISubrange supports unsigned lower/upper array bounds, so let's not fake it in the end while emitting DWARF. If a FE needs to encode signed lower/upper array bounds then we need to extend DISubrange or add DISignedSubrange.

llvm-svn: 144937

107e8ec3

Nov 17, 2011

When fast iseling a GEP, accumulate the offset rather than emitting a series of · f83ab704

Chad Rosier authored Nov 17, 2011

ADDs.  MaxOffs is used as a threshold to limit the size of the offset. Tradeoffs
being: (1) If we can't materialize the large constant then we'll cause fast-isel
to bail. (2) Too large of an offset can't be directly encoded in the ADD
resulting in a MOV+ADD.  Generally not a bad thing because otherwise we would
have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working on a fix
for that. (3) Conversely, with too low of a threshold we'll miss opportunities
to coalesce ADDs.
rdar://10412592
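
The accumulate-then-flush idea can be modeled as below. This is a simplified sketch, not the fast-isel code: `coalesceOffsets` is a hypothetical helper, only constant offsets are handled, and each returned value stands for the immediate of one emitted ADD.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Folds a run of constant GEP offsets into as few ADD immediates as possible.
// A pending total is flushed as one ADD as soon as it reaches `maxOffs`, which
// bounds the size of any constant that must be materialized.
std::vector<std::uint64_t>
coalesceOffsets(const std::vector<std::uint64_t> &offsets,
                std::uint64_t maxOffs) {
  assert(maxOffs > 0 && "threshold must be positive");
  std::vector<std::uint64_t> adds;
  std::uint64_t total = 0;
  for (std::uint64_t offset : offsets) {
    total += offset;
    if (total >= maxOffs) {  // too large to keep accumulating
      adds.push_back(total);
      total = 0;
    }
  }
  if (total != 0)  // flush whatever is left at the end
    adds.push_back(total);
  return adds;
}
```

With a threshold of 2048, the offsets 8, 16, 4 collapse into a single ADD of 28, while 2000, 100, 4 produce two ADDs of 2100 and 4. With a threshold of 1 every nonzero offset gets its own ADD, which is the missed-coalescing case in tradeoff (3).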

llvm-svn: 144886

f83ab704

Make sure to replace the chain properly when DAGCombining a... · ff1eaa75

Eli Friedman authored Nov 16, 2011

Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD.  Fixes PR10747/PR11393.

llvm-svn: 144863

ff1eaa75

Nov 16, 2011

Add fast-isel stats to determine who's doing all the work, the · ff40b1e1
Chad Rosier authored Nov 16, 2011
```
target-independent selector or the target-specific selector.

llvm-svn: 144833
```
ff40b1e1

Fix the stats collection for fast-isel. The failed count was only accounting · cfd0d10e

Chad Rosier authored Nov 16, 2011

for a single miss and not all predecessor instructions that get selected by
the selection DAG instruction selector.  This is still not exact (e.g., it
overstates misses when folded/dead instructions are present), but it is a step in
the right direction.

llvm-svn: 144832

cfd0d10e

Disable expensive two-address optimizations at -O0. rdar://10453055 · 822ddde5
Evan Cheng authored Nov 16, 2011
```
llvm-svn: 144806
```
822ddde5
Disable the assertion again. Looks like fastisel is still generating bad kill markers. · 624eb2af
Evan Cheng authored Nov 16, 2011
```
llvm-svn: 144804
```
624eb2af

Sink codegen optimization level into MCCodeGenInfo along side relocation model · ecb2908b

Evan Cheng authored Nov 16, 2011

and code model. This eliminates the need to pass OptLevel flag all over the
place and makes it possible for any codegen pass to use this information.

llvm-svn: 144788

ecb2908b

Record landing pads with a SmallSetVector to avoid multiple entries. · cca9aa58

Bob Wilson authored Nov 16, 2011

There may be many invokes that share one landing pad, and the previous code
would record the landing pad once for each invoke.  Besides the wasted
effort, a pair of volatile loads gets inserted every time the landing pad is
processed.  The rest of the code can get optimized away when a landing pad
is processed repeatedly, but the volatile loads remain, resulting in code like:

LBB35_18:
Ltmp483:
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r4, [r7, #-72]
        ldr     r2, [r7, #-68]
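
To show why a set-backed vector avoids the repeated processing, here is a small stand-alone sketch. `OrderedSet` is a hypothetical, simplified imitation of the behavior of llvm::SmallSetVector built from standard containers, and landing pads are represented by plain integer ids for illustration.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Minimal insertion-ordered set: remembers first-insertion order and ignores
// repeats, which is the property the commit relies on.
template <typename T> class OrderedSet {
public:
  // Returns true if `value` was newly inserted, false if already present.
  bool insert(const T &value) {
    assert(seen_.size() == order_.size() && "containers out of sync");
    if (!seen_.insert(value).second)
      return false;
    order_.push_back(value);
    return true;
  }
  const std::vector<T> &items() const { return order_; }

private:
  std::unordered_set<T> seen_;
  std::vector<T> order_;
};

// Given the landing pad id of every invoke, returns each pad exactly once,
// in the order in which it was first seen.
std::vector<int> collectLandingPads(const std::vector<int> &padOfEachInvoke) {
  OrderedSet<int> pads;
  for (int pad : padOfEachInvoke)
    pads.insert(pad);
  return pads.items();
}
```

Five invokes sharing pads 7 and 3 yield two entries, so each landing pad is processed once and the volatile loads are inserted once per pad.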

llvm-svn: 144787

cca9aa58

Update the SP in the SjLj jmpbuf whenever it changes. <rdar://problem/10444602> · 643e63c4

Bob Wilson authored Nov 16, 2011

This same basic code was in the older version of the SjLj exception handling,
but it was removed in the recent revisions to that code.  It needs to be there.

llvm-svn: 144782

643e63c4

Revert r144568 now that r144730 has fixed the fast-isel kill marker bug. · 4ac36c8e
Evan Cheng authored Nov 16, 2011
```
llvm-svn: 144776
```
4ac36c8e