Commits · 4f56720754c4ce64b37c62c0c935b4993adcb15e · Roger Ferrer / llvm-epi-0.8

Nov 27, 2011

Prevent rotating the blocks of a loop (and thus getting a backedge to be · 4f567207

Chandler Carruth authored Nov 27, 2011

fallthrough) in cases where we might fail to rotate an exit to an outer
loop onto the end of the loop chain.

Having *some* rotation, but not performing this rotation, is the primary
fix for the performance regression with -enable-block-placement for
Olden/em3d (a whopping 30% regression). Still working on reducing the
test case that actually exercises this and the new rotation strategy out
of this code, but I want to check if this regresses other test cases
first as that may indicate it isn't the correct fix.

llvm-svn: 145195

Take two on rotating the block ordering of loops. My previous attempt · 03adbd46

Chandler Carruth authored Nov 27, 2011

was centered around the premise of laying out a loop in a chain, and
then rotating that chain. This is good for preserving contiguous layout,
but bad for actually making sane rotations. In order to keep it safe,
I had to essentially make it impossible to rotate deeply nested loops.
The information needed to correctly reason about a deeply nested loop is
actually available -- *before* we lay out the loop. We know the inner
loops are already fused into chains, etc. We lose information the moment
we actually lay out the loop.

The solution was the other alternative for this algorithm I discussed
with Benjamin and some others: rather than rotating the loop
after-the-fact, try to pick a profitable starting block for the loop's
layout, and then use our existing layout logic. I was worried about the
complexity of this "pick" step, but it turns out such complexity is
needed to handle all the important cases I keep teasing out of benchmarks.

This is, I'm afraid, a bit of a work-in-progress. It is still
misbehaving on some likely important cases I'm investigating in Olden.
It also isn't really tested. I'm going to try to craft some interesting
nested-loop test cases, but it's likely to be extremely time consuming
and I don't want to go there until I'm sure I'm testing the correct
behavior. Sadly I can't come up with a way of getting simple, fine
grained test cases for this logic. We need complex loop structures to
even trigger much of it.

llvm-svn: 145183

Revert r145180 as it is causing test failures on all the bots. · 37ab257b

Chandler Carruth authored Nov 27, 2011

Original commit message:
Fixed ObjectFile functions:
- getSymbolOffset() renamed as getSymbolFileOffset()
- getSymbolFileOffset(), getSymbolAddress(), getRelocationAddress() return the same result for ELFObjectFile, MachOObjectFile and COFFObjectFile.
- added getRelocationOffset()
- fixed MachOObjectFile::getSymbolSize()
- fixed MachOObjectFile::getSymbolSection()
- fixed MachOObjectFile::getSymbolOffset() for symbols without section data.

llvm-svn: 145182

Fix an impressive type-o / spell-o Duncan noticed. · 9e466841
Chandler Carruth authored Nov 27, 2011
```
llvm-svn: 145181
```

Fixed ObjectFile functions: · 2631f93f

Danil Malyshev authored Nov 27, 2011

- getSymbolOffset() renamed as getSymbolFileOffset()
- getSymbolFileOffset(), getSymbolAddress(), getRelocationAddress() return the same result for ELFObjectFile, MachOObjectFile and COFFObjectFile.
- added getRelocationOffset()
- fixed MachOObjectFile::getSymbolSize()
- fixed MachOObjectFile::getSymbolSection()
- fixed MachOObjectFile::getSymbolOffset() for symbols without section data.

llvm-svn: 145180

Rework a bit of the implementation of loop block rotation to not rely so · a0545809

Chandler Carruth authored Nov 27, 2011

heavily on AnalyzeBranch. That routine doesn't behave as we want given
that rotation occurs mid-way through re-ordering the function. Instead
merely check that there are not unanalyzable branching constructs
present, and then reason about the CFG via successor lists. This
actually simplifies my mental model for all of this as well.

The concrete result is that we now will rotate more loop chains. I've
added a test case from Olden highlighting the effect. There is still
a bit more to do here though in order to regain all of the performance
in Olden.

llvm-svn: 145179
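
To illustrate what reasoning "via successor lists" can look like, here is a minimal C++ sketch on a toy CFG where blocks are integer ids and `Succs[B]` lists the successors of block `B`. The names `SuccessorLists`, `isCFGSuccessor`, and `countFallthroughs` are hypothetical, not MachineBlockPlacement code. The point it models: a layout-adjacent pair can only be a fallthrough if the second block appears in the first block's successor list, which can be checked without calling `AnalyzeBranch`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: Succs[B] lists the successor block ids of block B.
using SuccessorLists = std::vector<std::vector<int>>;

// True if To is in From's successor list, i.e. From could fall through to To.
bool isCFGSuccessor(const SuccessorLists &Succs, int From, int To) {
  assert(From >= 0 && static_cast<std::size_t>(From) < Succs.size());
  for (int S : Succs[From])
    if (S == To)
      return true;
  return false;
}

// Counts layout-adjacent pairs that are CFG edges, i.e. potential fallthroughs
// rather than taken branches.
int countFallthroughs(const SuccessorLists &Succs, const std::vector<int> &Layout) {
  int Count = 0;
  for (std::size_t I = 0; I + 1 < Layout.size(); ++I)
    if (isCFGSuccessor(Succs, Layout[I], Layout[I + 1]))
      ++Count;
  return Count;
}
```

For a while-style loop with header 0 (which also exits to block 3), body 1, and latch 2, the layout `{0, 1, 2, 3}` yields 2 fallthroughs while the rotated layout `{1, 2, 0, 3}` yields 3.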

Eli managed to kill off llvm.membarrier in llvm 3.0 also, this means · 0bcbde46
Chris Lattner authored Nov 27, 2011
```
that mainline needs no autoupgrade logic for intrinsics yet, woohoo!

llvm-svn: 145178
```
The llvm.atomic intrinsics *were* removed in LLVM 3.0 (in r141333), remove the · 410f3d7f
Chris Lattner authored Nov 27, 2011
```
autoupgrade logic for 2.9 and before.

llvm-svn: 145176
```

remove autoupgrade support for old forms of llvm.prefetch and the old · ee471c48

Chris Lattner authored Nov 27, 2011

trampoline forms.  Both of these were correct in LLVM 3.0, and we don't
need to support LLVM 2.9 and earlier in mainline.

llvm-svn: 145174

remove asmparsing and documentation support for "volatile load", which was... · bc639298

Chris Lattner authored Nov 27, 2011

remove asmparsing and documentation support for "volatile load", which was only produced by LLVM 2.9 and earlier.  LLVM 3.0 and later prefers "load volatile".

llvm-svn: 145172

remove autoupgrade support for really old-style debug info intrinsics. · 90ef78c0

Chris Lattner authored Nov 27, 2011

I think this is the last of autoupgrade that can be removed in 3.1.
Can the atomic upgrade stuff also go?

llvm-svn: 145169

remove some old autoupgrade logic · 6aa6c0c3
Chris Lattner authored Nov 27, 2011
```
llvm-svn: 145167
```
remove autoupgrade support for LLVM 2.9 exception stuff. Mainline supports · db891539
Chris Lattner authored Nov 27, 2011
```
LLVM 3.0 and later.

llvm-svn: 145165
```
remove support for reading llvm 2.9 .bc files. LLVM 3.1 is only compatible back to 3.0 · 1c9e5678
Chris Lattner authored Nov 27, 2011
```
llvm-svn: 145164
```
Add several new instructions supported by the latest MicroBlaze. · 97b3da54
Wesley Peck authored Nov 27, 2011
```
These instructions are not generated by the backend yet; this will come in a later commit.

llvm-svn: 145161
```
Optimize comparison against 0 in conditional instructions. · d2e2e178
Wesley Peck authored Nov 27, 2011
```
Fix a couple of 80-column violations.

llvm-svn: 145159
```

Introduce a loop block rotation optimization to the new block placement · 9ffb97e6

Chandler Carruth authored Nov 27, 2011

pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.

This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern; it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.

llvm-svn: 145158
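
The rotation step itself can be modeled with `std::rotate`. This is a minimal sketch assuming a laid-out loop chain is simply a vector of integer block ids; `rotateLoopChain` is a hypothetical helper and omits the pattern matching and profitability checks the commit describes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: rotate a laid-out loop chain (block ids in layout
// order) so that NewBottom becomes the last block of the loop.
void rotateLoopChain(std::vector<int> &Chain, int NewBottom) {
  auto It = std::find(Chain.begin(), Chain.end(), NewBottom);
  assert(It != Chain.end() && "NewBottom must be in the chain");
  // The block after NewBottom becomes the new top; if NewBottom is already
  // last, It + 1 == end() and std::rotate leaves the chain unchanged.
  std::rotate(Chain.begin(), It + 1, Chain.end());
}
```

With the chain `{header, body, latch}` = `{0, 1, 2}` and the exiting header passed as `NewBottom`, the result is `{1, 2, 0}`: the latch now falls through into the header, and the header's exit edge leaves from the bottom of the loop.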

Move code into anonymous namespaces. · 7ba71be3
Benjamin Kramer authored Nov 26, 2011
```
llvm-svn: 145154
```

Nov 26, 2011

Merge 128-bit and 256-bit X86ISD node types for VPERMILPS and VPERMILPD.... · 51280d56

Craig Topper authored Nov 26, 2011

Merge 128-bit and 256-bit X86ISD node types for VPERMILPS and VPERMILPD. Simplify some shuffle lowering code since V1 can never be UNDEF due to the canonicalization that occurs when shuffle nodes are created.

llvm-svn: 145153

Rename a couple of options and fix some simple typos. · 69d50404
Wesley Peck authored Nov 26, 2011
```
llvm-svn: 145152
```

Collapse X86ISD node types for PUNPCKH*, PUNPCKL*, UNPCKLP*, and UNPCKHP* to... · 7704bd7a

Craig Topper authored Nov 26, 2011

Collapse X86ISD node types for PUNPCKH*, PUNPCKL*, UNPCKLP*, and UNPCKHP* to not be type specific. Now we just have integer high and low and floating point high and low. Pattern matching will choose the correct instruction based on the vector type.

llvm-svn: 145148
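
A toy C++ sketch of the idea that the node kind no longer encodes the type: four generic node kinds (integer or floating point, low or high) plus the vector type select the concrete 128-bit SSE instruction. `UnpackNode`, `VecTy`, and `selectUnpack` are hypothetical stand-ins for the X86ISD nodes and TableGen patterns, returning mnemonics as strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical generic node kinds: integer or floating point, low or high.
enum class UnpackNode { IntLow, IntHigh, FPLow, FPHigh };

// Hypothetical 128-bit vector types.
enum class VecTy { v16i8, v8i16, v4i32, v2i64, v4f32, v2f64 };

// Stand-in for pattern matching: the vector type picks the element width.
std::string selectUnpack(UnpackNode N, VecTy VT) {
  const bool Low = (N == UnpackNode::IntLow || N == UnpackNode::FPLow);
  const bool FPNode = (N == UnpackNode::FPLow || N == UnpackNode::FPHigh);
  const bool FPType = (VT == VecTy::v4f32 || VT == VecTy::v2f64);
  assert(FPNode == FPType && "node kind must match the element kind");
  (void)FPNode;
  (void)FPType;  // silence unused-variable warnings when asserts are disabled
  switch (VT) {
  case VecTy::v16i8: return Low ? "PUNPCKLBW" : "PUNPCKHBW";
  case VecTy::v8i16: return Low ? "PUNPCKLWD" : "PUNPCKHWD";
  case VecTy::v4i32: return Low ? "PUNPCKLDQ" : "PUNPCKHDQ";
  case VecTy::v2i64: return Low ? "PUNPCKLQDQ" : "PUNPCKHQDQ";
  case VecTy::v4f32: return Low ? "UNPCKLPS" : "UNPCKHPS";
  case VecTy::v2f64: return Low ? "UNPCKLPD" : "UNPCKHPD";
  }
  return "";  // unreachable for valid VecTy values
}
```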

Fix APFloat::convert so that it handles narrowing conversions correctly; it · a84ad7d0

Eli Friedman authored Nov 26, 2011

was returning incorrect values in rare cases, and incorrectly marking
exact conversions as inexact in some more common cases. Fixes PR11406, and a
missed optimization in test/CodeGen/X86/fp-stack-O0.ll.

llvm-svn: 145141
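
As a rough illustration of what "exact" means for a narrowing conversion, this sketch uses the host's IEEE `double` to `float` conversion instead of `APFloat`, so it is a model rather than the fixed code. Under the default round-to-nearest mode, the narrowed value is exact precisely when widening it back reproduces the input. `narrowToFloat` is a hypothetical helper, and it assumes a non-NaN input within `float`'s range.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper modeled on the host's IEEE conversion, not on APFloat.
// Narrows D to float and sets IsExact when no information was lost.
float narrowToFloat(double D, bool &IsExact) {
  // Preconditions of this sketch: D is not NaN and, if finite, lies within
  // float's range (an out-of-range static_cast is not reliably defined).
  assert(!std::isnan(D));
  assert(!std::isfinite(D) || std::fabs(D) <= std::numeric_limits<float>::max());
  const float F = static_cast<float>(D);  // default mode: round to nearest even
  // Widening float -> double is always exact, so a round trip detects rounding.
  IsExact = static_cast<double>(F) == D;
  return F;
}
```

For example, `16777217.0` (2^24 + 1) narrows to `16777216.0f` and is reported inexact, while `0.5` converts exactly.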

Nov 25, 2011
This patch contains support for encoding FMA4 instructions and · 0f9a1f5e
Bruno Cardoso Lopes authored Nov 25, 2011
```
tablegen patterns for scalar FMA4 operations and intrinsic. Also
add tests for vfmaddsd.

Patch by Jan Sjodin

llvm-svn: 145133
```
ARMLoadStoreOptimizer.cpp: Fix MSVC(Debug) build. · 989eaf6e
NAKAMURA Takumi authored Nov 25, 2011
```
llvm-svn: 145129
```
Nov 24, 2011

Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit... · d65a4444

Craig Topper authored Nov 24, 2011

Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type distinguish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64.

llvm-svn: 145126

Remove AVX2 specific X86ISD node types for PUNPCKH/L and instead just reuse... · d2646674

Craig Topper authored Nov 24, 2011

Remove AVX2 specific X86ISD node types for PUNPCKH/L and instead just reuse the 128-bit versions and let the vector type distinguish.

llvm-svn: 145125

Devirtualize Pass::getPassID, overriding it isn't useful and it gets called a lot. · 8a2d1436
Benjamin Kramer authored Nov 24, 2011
```
While at it pull the trivial ctor in line.

llvm-svn: 145124
```
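
A simplified C++ model of a devirtualized accessor: the pass identity is stored once in the base class and returned through a plain inline member function, so hot callers avoid a virtual call. This is a sketch under assumptions; the real `Pass` constructor takes more arguments, and `HypotheticalPass` is illustrative only.

```cpp
#include <cassert>

// Simplified, hypothetical model of a pass base class: the identity is stored
// once and returned by a non-virtual inline accessor.
class Pass {
  const void *PassID;  // address of a per-pass static object
public:
  explicit Pass(const void *ID) : PassID(ID) { assert(ID && "every pass needs an ID"); }
  virtual ~Pass() {}
  const void *getPassID() const { return PassID; }  // no virtual dispatch
};

// Hypothetical pass: the address of its static ID uniquely identifies it.
class HypotheticalPass : public Pass {
public:
  static char ID;
  HypotheticalPass() : Pass(&ID) {}
};
char HypotheticalPass::ID = 0;
```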
Make ConstantRange::truncate a bit more efficient. · 6709e050
Benjamin Kramer authored Nov 24, 2011
```
llvm-svn: 145122
```
X86: alias cqo to cqto. · 651db373
Benjamin Kramer authored Nov 24, 2011
```
llvm-svn: 145121
```

Fix a silly use-after-free issue. A much earlier version of this code · 7adee1a0

Chandler Carruth authored Nov 24, 2011

needed lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.

Somewhat gross mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.

llvm-svn: 145120

When adding blocks to the list of those which no longer have any CFG · d394bafd

Chandler Carruth authored Nov 24, 2011

conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.

I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.

llvm-svn: 145119
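
A minimal C++ sketch of the rule, assuming a chain is an ordered vector of integer block ids and `BlockToChain` maps each block to its chain; `BlockChain` and `markChainReady` are hypothetical names, not the pass's actual data structures. Whichever block triggered the update, only the chain's first block is enqueued, so a later merge attaches at the chain's head rather than its middle.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical chain: an ordered run of block ids.
struct BlockChain {
  std::vector<int> Blocks;
};

// When the chain containing BB has no remaining CFG conflicts, enqueue the
// chain's *first* block, never BB itself.
void markChainReady(const std::unordered_map<int, BlockChain *> &BlockToChain,
                    int BB, std::vector<int> &ReadyWorklist) {
  auto It = BlockToChain.find(BB);
  assert(It != BlockToChain.end() && !It->second->Blocks.empty());
  ReadyWorklist.push_back(It->second->Blocks.front());
}
```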

Nov 23, 2011

This patch makes the following changes necessary for MIPS' direct code emission. · 049e9e4d

Akira Hatanaka authored Nov 23, 2011

- lower unaligned loads/stores.
- encode the size operand of instructions INS and EXT.
- emit relocation information needed for JAL (jump-and-link).  

llvm-svn: 145113
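
For the INS/EXT size operand, our understanding of the MIPS32 Release 2 encoding is that both instructions put `pos` in the `lsb` field (bits 10..6), while the field in bits 15..11 differs: EXT stores `size - 1` (msbd) and INS stores `pos + size - 1` (msb). A hypothetical C++ sketch of that computation, not the LLVM MC code emitter:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bitfield operands as written in assembly: "ext rt, rs, pos, size".
struct BitField {
  unsigned Pos;   // lowest bit of the field
  unsigned Size;  // number of bits in the field
};

// EXT encodes size - 1 ("msbd").
uint32_t extHighField(BitField F) {
  assert(F.Size >= 1 && F.Pos + F.Size <= 32 && "field must fit in 32 bits");
  return F.Size - 1;
}

// INS encodes the field's most significant bit, pos + size - 1 ("msb").
uint32_t insHighField(BitField F) {
  assert(F.Size >= 1 && F.Pos + F.Size <= 32 && "field must fit in 32 bits");
  return F.Pos + F.Size - 1;
}

// Both place the computed value in bits 15..11 and pos ("lsb") in bits 10..6.
uint32_t packFields(uint32_t HighField, BitField F) {
  return (HighField << 11) | (F.Pos << 6);
}
```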

This patch addresses gp relative fixups/relocations for jump tables. · f5ddf13f
Akira Hatanaka authored Nov 23, 2011
```
llvm-svn: 145112
```
Correctly byte-swap APInts with bit-widths greater than 64. · 4f9a8081
Richard Smith authored Nov 23, 2011
```
llvm-svn: 145111
```
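
A minimal sketch of byte-swapping an integer wider than 64 bits, assuming it is stored as 64-bit words least-significant first (as `APInt` stores its words) and that the bit width is a multiple of 64: byte-swap each word and reverse the word order. `byteSwap64` and `byteSwapWide` are hypothetical helpers; widths that are not multiples of 64 would also need a final shift, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reverse the 8 bytes of a 64-bit word.
uint64_t byteSwap64(uint64_t V) {
  uint64_t R = 0;
  for (int I = 0; I < 8; ++I) {
    R = (R << 8) | (V & 0xFF);  // move the current low byte to the top
    V >>= 8;
  }
  return R;
}

// Hypothetical helper: words are least-significant first; the width is
// assumed to be a multiple of 64 bits.
std::vector<uint64_t> byteSwapWide(const std::vector<uint64_t> &Words) {
  assert(!Words.empty());
  std::vector<uint64_t> Result(Words.size());
  for (std::size_t I = 0; I < Words.size(); ++I)
    Result[Words.size() - 1 - I] = byteSwap64(Words[I]);  // swap bytes, reverse words
  return Result;
}
```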
Validate the return type when checking if a function is malloc. · 6e013bf9
Benjamin Kramer authored Nov 23, 2011
```
Fixes PR11426. Not sure if a test case with a "wrong" malloc would be useful.

llvm-svn: 145106
```
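
A hypothetical sketch of the kind of check described: a declaration is only treated as `malloc` if both the name and the signature match, i.e. a pointer result and a single integer parameter. `TypeKind`, `FunctionSig`, and `looksLikeMalloc` are illustrative types, not LLVM's `MemoryBuiltins` API.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified description of a function's type.
enum class TypeKind { Void, Integer, Pointer, Other };

struct FunctionSig {
  std::string Name;
  TypeKind Return;
  std::vector<TypeKind> Params;
};

// Hypothetical check: a function named "malloc" only counts as the allocator
// if it also returns a pointer and takes exactly one integer argument.
bool looksLikeMalloc(const FunctionSig &F) {
  return F.Name == "malloc" && F.Return == TypeKind::Pointer &&
         F.Params.size() == 1 && F.Params[0] == TypeKind::Integer;
}
```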

Fix a crash in which a multiplication was being reported as being both negative · 81a2af12

Duncan Sands authored Nov 23, 2011

and positive: positive, because it could be directly computed to be positive;
negative, because the nsw flag means it is either negative or undefined (the
multiplication always overflowed).

llvm-svn: 145104

X86: Use btq for bit tests if the immediate can't be encoded in 32 bits. · ebcb4518

Benjamin Kramer authored Nov 23, 2011

Before:
```
	movabsq	$4294967296, %rax       ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00]
	testq	%rax, %rdi              ## encoding: [0x48,0x85,0xf8]
	jne	LBB0_2                  ## encoding: [0x75,A]
```

After:
```
	btq	$32, %rdi               ## encoding: [0x48,0x0f,0xba,0xe7,0x20]
	jb	LBB0_2                  ## encoding: [0x72,A]
```

btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off
saving one register and a giant movabsq.

llvm-svn: 145103
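
The selection rule can be sketched as follows, assuming x86-64's encoding rule that a 64-bit `testq` immediate is a 32-bit value sign-extended to 64 bits. `fitsInSImm32` and `btqBitIndex` are hypothetical helpers: when the mask does not fit and has exactly one bit set, `btq` can test that bit directly.

```cpp
#include <cstdint>

// x86-64 encodes a testq immediate as 32 bits, sign-extended to 64.
// Returns true if V survives that round trip unchanged.
bool fitsInSImm32(uint64_t V) {
  const int64_t S = static_cast<int64_t>(V);
  return S >= INT32_MIN && S <= INT32_MAX;
}

// Hypothetical selection rule: return the bit index to test with btq, or -1
// when testq with an imm32 mask already works or the mask is not a single bit.
int btqBitIndex(uint64_t Mask) {
  if (fitsInSImm32(Mask))
    return -1;  // testq $imm32, %reg is enough
  if ((Mask & (Mask - 1)) != 0)
    return -1;  // more than one bit set: a single bt cannot test it
  int Index = 0;
  while ((Mask & 1) == 0) {  // Mask is nonzero here, so this terminates
    Mask >>= 1;
    ++Index;
  }
  return Index;
}
```

`btqBitIndex(4294967296)` returns 32, matching the `btq $32` above. Note that `0x80000000` does not fit either, because as an imm32 it would sign-extend to `0xFFFFFFFF80000000`.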

Relax an invariant that block placement was trying to assert a bit · 99fe42fb

Chandler Carruth authored Nov 23, 2011

further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resilient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.

This was found during a bootstrap with block placement turned on.

llvm-svn: 145100

I added several lines to the X86 code generator that allow choosing · 779ba6d7

Elena Demikhovsky authored Nov 23, 2011

VSHUFPS/VSHUFPD instructions while lowering a VECTOR_SHUFFLE node. I also check for a commuted VSHUFP mask.

The patch was reviewed by Bruno.

llvm-svn: 145099
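
A minimal sketch of what matching a possibly commuted SHUFPS mask involves for the 128-bit `v4f32` case, assuming mask entries 0-3 select from `V1`, 4-7 from `V2`, and -1 means undef. SHUFPS fills result lanes 0-1 from its first operand and lanes 2-3 from its second; a commuted mask is the mirror image and can be matched by swapping the operands. `isSHUFPMask` here is a hypothetical helper and does not cover SHUFPD or the 256-bit VSHUFP forms.

```cpp
#include <array>

// True if mask entry M is undef (-1) or selects an element in [Lo, Lo + 3].
static bool fromSourceOrUndef(int M, int Lo) {
  return M < 0 || (M >= Lo && M <= Lo + 3);
}

// Hypothetical matcher (v4f32 only). A plain mask needs lanes from
// V1, V1, V2, V2; a commuted mask needs V2, V2, V1, V1.
bool isSHUFPMask(const std::array<int, 4> &Mask, bool Commuted) {
  const int First = Commuted ? 4 : 0;
  const int Second = Commuted ? 0 : 4;
  return fromSourceOrUndef(Mask[0], First) && fromSourceOrUndef(Mask[1], First) &&
         fromSourceOrUndef(Mask[2], Second) && fromSourceOrUndef(Mask[3], Second);
}
```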

Handle the case of a no-return invoke correctly. It actually still has · 8c68f1f3

Chandler Carruth authored Nov 23, 2011

successors; they are just all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.

llvm-svn: 145098
