Commits · 6a144a2227f8aaf6d4c49bc22f5424a34b778166 · Roger Ferrer / llvm-epi-0.8

Nov 27, 2011

Upgrade syntax of tests using volatile instructions to use 'load volatile'... · 6a144a22

Chris Lattner authored Nov 27, 2011

Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic.

llvm-svn: 145171

6a144a22

remove autoupgrade support for really old-style debug info intrinsics. · 90ef78c0

Chris Lattner authored Nov 27, 2011

I think this is the last of autoupgrade that can be removed in 3.1.
Can the atomic upgrade stuff also go?

llvm-svn: 145169

90ef78c0

remove some old autoupgrade logic · 6aa6c0c3
Chris Lattner authored Nov 27, 2011
```
llvm-svn: 145167
```
6aa6c0c3
remove support for reading llvm 2.9 .bc files. LLVM 3.1 is only compatible back to 3.0 · 1c9e5678
Chris Lattner authored Nov 27, 2011
```
llvm-svn: 145164
```
1c9e5678
Add several new instructions supported by the latest MicroBlaze. · 97b3da54
Wesley Peck authored Nov 27, 2011
```
These instructions are not generated by the backend yet; that will come in a later commit.

llvm-svn: 145161
```
97b3da54

Introduce a loop block rotation optimization to the new block placement · 9ffb97e6

Chandler Carruth authored Nov 27, 2011

pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.

This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern, it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.

llvm-svn: 145158

9ffb97e6

Nov 26, 2011

FileCheck-ize this test and make it more precise. This is in preparation · f156f0cf
Chandler Carruth authored Nov 26, 2011
```
for adding other tests.

llvm-svn: 145143
```
f156f0cf

Fix APFloat::convert so that it handles narrowing conversions correctly; it · a84ad7d0

Eli Friedman authored Nov 26, 2011

was returning incorrect values in rare cases, and incorrectly marking
exact conversions as inexact in some more common cases. Fixes PR11406, and a
missed optimization in test/CodeGen/X86/fp-stack-O0.ll.

llvm-svn: 145141

a84ad7d0
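
To make the "exact" versus "inexact" distinction for a narrowing conversion concrete, here is a small Python sketch that narrows a double to single precision and reports whether the value changed. It is only an illustration, not APFloat's code: `narrow_to_float32` is a hypothetical helper, and it leans on the `struct` module for the rounding.

```python
import struct

def narrow_to_float32(x: float):
    """Round a binary64 value to binary32; return (result, inexact).

    Hypothetical helper for illustration only. Values too large for
    binary32 make struct.pack raise OverflowError.
    """
    if x != x:
        # NaN compares unequal to itself; treat it as passing through unchanged.
        return x, False
    # Packing with "f" converts to the nearest single-precision value.
    narrowed = struct.unpack("<f", struct.pack("<f", x))[0]
    # The conversion is exact only if no information was lost.
    return narrowed, narrowed != x
```

A value such as 0.5 fits in single precision and should be reported as exact, while 0.1 does not and should be reported as inexact.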

Nov 25, 2011

This patch contains support for encoding FMA4 instructions and · 0f9a1f5e

Bruno Cardoso Lopes authored Nov 25, 2011

tablegen patterns for scalar FMA4 operations and intrinsic. Also
add tests for vfmaddsd.

Patch by Jan Sjodin

llvm-svn: 145133

0f9a1f5e

Nov 24, 2011

Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit... · d65a4444

Craig Topper authored Nov 24, 2011

Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type distinguish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64.

llvm-svn: 145126

d65a4444

X86: alias cqo to cqto. · 651db373
Benjamin Kramer authored Nov 24, 2011
```
llvm-svn: 145121
```
651db373

Fix a silly use-after-free issue. A much earlier version of this code · 7adee1a0

Chandler Carruth authored Nov 24, 2011

needed lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.

Somewhat gross mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.

llvm-svn: 145120

7adee1a0

When adding blocks to the list of those which no longer have any CFG · d394bafd

Chandler Carruth authored Nov 24, 2011

conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.

I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.

llvm-svn: 145119

d394bafd
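
The rule described above (enqueue only the first block of a chain) can be sketched as follows. This is a simplified model, not the pass's actual data structures: chains are plain lists, and `blocks_to_enqueue` and `block_to_chain` are hypothetical names.

```python
def blocks_to_enqueue(ready_blocks, block_to_chain):
    """Map each newly conflict-free block to the head of its chain.

    block_to_chain maps a block name to the list of blocks in its chain
    (assumed representation). Only chain heads are returned, so a later
    merge never targets the middle of a chain.
    """
    heads = []
    for block in ready_blocks:
        head = block_to_chain[block][0]
        if head not in heads:
            heads.append(head)
    return heads
```

For a chain `["a", "b", "c"]`, reporting either `"b"` or `"c"` as ready results in `"a"` being enqueued.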

Nov 23, 2011

Correctly byte-swap APInts with bit-widths greater than 64. · 4f9a8081
Richard Smith authored Nov 23, 2011
```
llvm-svn: 145111
```
4f9a8081
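
For reference, byte-swapping a value wider than 64 bits means reversing all of its bytes, not just those of a single 64-bit word. A minimal Python model of the expected behaviour, assuming an unsigned value and a width that is a whole number of bytes (`byte_swap` is a hypothetical helper, not the APInt API):

```python
def byte_swap(value: int, bit_width: int) -> int:
    """Reverse the byte order of an unsigned integer of the given bit width."""
    if bit_width % 8 != 0:
        raise ValueError("bit width must be a whole number of bytes")
    if not 0 <= value < (1 << bit_width):
        raise ValueError("value does not fit in the given width")
    num_bytes = bit_width // 8
    # Serialize little-endian, then reinterpret the same bytes as big-endian.
    return int.from_bytes(value.to_bytes(num_bytes, "little"), "big")
```

Applying the swap twice at the same width should give back the original value, which makes a convenient sanity check for widths such as 80 or 128 bits.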

Fix a crash in which a multiplication was being reported as being both negative · 81a2af12

Duncan Sands authored Nov 23, 2011

and positive: positive, because it could be directly computed to be positive;
negative, because the nsw flag means it is either negative or undefined (the
multiplication always overflowed).

llvm-svn: 145104

81a2af12

X86: Use btq for bit tests if the immediate can't be encoded in 32 bits. · ebcb4518

Benjamin Kramer authored Nov 23, 2011

Before:
	movabsq	$4294967296, %rax       ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00]
	testq	%rax, %rdi              ## encoding: [0x48,0x85,0xf8]
	jne	LBB0_2                  ## encoding: [0x75,A]

After:
	btq	$32, %rdi               ## encoding: [0x48,0x0f,0xba,0xe7,0x20]
	jb	LBB0_2                  ## encoding: [0x72,A]

btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off
saving one register and a giant movabsq.

llvm-svn: 145103

ebcb4518
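
The selection rule in this commit can be modelled roughly as below. This is a sketch under assumptions, not the X86 backend's code: `fits_in_simm32`, `single_bit_index`, and `pick_bit_test` are hypothetical names, the register is fixed to `%rdi` for readability, and the performance trade-off mentioned above is not modelled.

```python
def fits_in_simm32(imm: int) -> bool:
    """True if imm is representable as a sign-extended 32-bit immediate."""
    return -(1 << 31) <= imm < (1 << 31)

def single_bit_index(mask: int):
    """Return the bit index if mask has exactly one bit set, else None."""
    if mask > 0 and mask & (mask - 1) == 0:
        return mask.bit_length() - 1
    return None

def pick_bit_test(mask: int) -> str:
    """Choose an instruction form for testing `value & mask != 0`."""
    bit = single_bit_index(mask)
    if bit is not None and not fits_in_simm32(mask):
        # Single high bit: a bit test avoids materializing the 64-bit mask.
        return f"btq ${bit}, %rdi"
    if fits_in_simm32(mask):
        return f"testq ${mask}, %rdi"
    # Several bits set and too wide for an immediate: load the mask first.
    return "movabsq + testq"
```

With the mask from the example above, `pick_bit_test(4294967296)` selects the `btq $32, %rdi` form.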

test/CodeGen/X86/block-placement.ll: Add explicit -mtriple=i686-linux. X86... · 0b3e9964

NAKAMURA Takumi authored Nov 23, 2011

test/CodeGen/X86/block-placement.ll: Add explicit -mtriple=i686-linux. X86 Win32 CodeGen does not support EH yet.

llvm-svn: 145101

0b3e9964

Relax an invariant that block placement was trying to assert a bit · 99fe42fb

Chandler Carruth authored Nov 23, 2011

further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resilient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.

This was found during a bootstrap with block placement turned on.

llvm-svn: 145100

99fe42fb

I added several lines to the X86 code generator that allow choosing · 779ba6d7

Elena Demikhovsky authored Nov 23, 2011

VSHUFPS/VSHUFPD instructions while lowering a VECTOR_SHUFFLE node. I also check a commuted VSHUFP mask.

The patch was reviewed by Bruno.

llvm-svn: 145099

779ba6d7

Handle the case of a no-return invoke correctly. It actually still has · 8c68f1f3

Chandler Carruth authored Nov 23, 2011

successors, they just are all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.

llvm-svn: 145098

8c68f1f3

Enable stack protectors for all arrays, not just char arrays. rdar://5875909 · ebb44646
Bob Wilson authored Nov 23, 2011
```
Patch by Bill Wendling.

llvm-svn: 145097
```
ebb44646

Fix PR11422. · 02845410

Jakob Stoklund Olesen authored Nov 23, 2011

This was a bug in keeping track of the available domains when merging
domain values.

The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.

Also add an assertion to catch future attempts at emitting AVX2
instructions.

llvm-svn: 145096

02845410

Fix a crash in block placement due to an inner loop that happened to be · 4a87aa0c

Chandler Carruth authored Nov 23, 2011

reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.

Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.

llvm-svn: 145094

4a87aa0c

[asan] do not instrument threadlocal globals, this is buggy · 8b5c7a56
Kostya Serebryany authored Nov 23, 2011
```
llvm-svn: 145092
```
8b5c7a56

Nov 22, 2011

add basic PPC register-pressure feedback; adjust the vaarg test to match the... · 6f0ae783
Hal Finkel authored Nov 22, 2011
```
add basic PPC register-pressure feedback; adjust the vaarg test to match the new register-allocation pattern

llvm-svn: 145065
```
6f0ae783

Fix a devilish miscompile exposed by block placement. The · ee54feb6

Chandler Carruth authored Nov 22, 2011

updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.

This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad successor, and assert if more than one is found.

This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.

llvm-svn: 145062

ee54feb6
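
The successor search described in this message can be sketched as follows. It is a simplified model rather than the real updateTerminator code; the `Block` class and `fallthrough_successor` are hypothetical stand-ins for machine basic blocks.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    # Minimal stand-in for a basic block (assumed shape, for illustration).
    name: str
    is_landing_pad: bool = False
    successors: list = field(default_factory=list)

def fallthrough_successor(block):
    """Return the unique non-landing-pad successor, or None if there is none."""
    candidates = [s for s in block.successors if not s.is_landing_pad]
    # More than one ordinary successor means the fallthrough is ambiguous.
    assert len(candidates) <= 1, "more than one non-landing-pad successor"
    return candidates[0] if candidates else None
```

Unlike taking `successors[0]`, this still finds the right block when the landing pad happens to be listed first.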

Add triple to the test. · c55e1af1
Rafael Espindola authored Nov 22, 2011
```
llvm-svn: 145057
```
c55e1af1
If a register is both an early clobber and part of a tied use, handle the use · 2021f382
Rafael Espindola authored Nov 22, 2011
```
before the clobber so that we copy the value if needed.

Fixes pr11415.

llvm-svn: 145056
```
2021f382

Nov 21, 2011

Fix crasher in GVN due to my recent capture tracking changes. · 063ae589
Nick Lewycky authored Nov 21, 2011
```
llvm-svn: 145047
```
063ae589

Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled. · 6270d072
Craig Topper authored Nov 21, 2011
```
llvm-svn: 145028
```
6270d072

Test case for r145026 · d12d6f4b
Craig Topper authored Nov 21, 2011
```
llvm-svn: 145027
```
d12d6f4b

Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use... · a065238c
Craig Topper authored Nov 21, 2011
```
Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled.

llvm-svn: 145022
```
a065238c

Nov 20, 2011

test/CodeGen/X86/block-placement.ll: Relax expressions for Win32. · 76dfa038
NAKAMURA Takumi authored Nov 20, 2011
```
llvm-svn: 145011
```
76dfa038

The logic for breaking the CFG in the presence of hot successors didn't · 18dfac38

Chandler Carruth authored Nov 20, 2011

properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.

The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.

This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]

llvm-svn: 145010

18dfac38
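
The two-part check from this message (locally hot and globally hot) might look roughly like the sketch below. The threshold and all names here are assumptions made for illustration; the commit does not state the actual cutoff or data structures.

```python
HOT_FRACTION = 0.8  # assumed threshold, not taken from the commit

def is_hot(weight, other_weights):
    """True if weight accounts for at least HOT_FRACTION of the total."""
    total = weight + sum(other_weights)
    return total > 0 and weight / total >= HOT_FRACTION

def should_merge_edge(succ_prob, other_succ_probs,
                      edge_freq, other_unmerged_pred_freqs):
    """Merge against the CFG only if the edge is hot both locally and globally."""
    # Local: compared with the other successors of the source block.
    locally_hot = is_hot(succ_prob, other_succ_probs)
    # Global: compared with other unmerged predecessors of the destination.
    globally_hot = is_hot(edge_freq, other_unmerged_pred_freqs)
    return locally_hot and globally_hot
```

An unconditional branch is always locally hot, but under this model it is rejected when other predecessors reach the destination far more often, which is the case the message describes.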

XFAIL this test until I figure out what indvars is doing here (or find someone who does) · 650c09aa
Benjamin Kramer authored Nov 20, 2011
```
llvm-svn: 145008
```
650c09aa

Add some comments to the latest test case I added here to document what · 20df3953

Chandler Carruth authored Nov 20, 2011

is actually being tested. Also add some FileCheck goodness to much more
carefully ensure that the result is the desired result. Before, this test
would only have failed through an assert failure if the underlying fix
were reverted.

Also, add some weight metadata and a comment explaining exactly what is
going on to a tricky section of the test case. Originally, we were
getting very unlucky and trying to form a block chain that isn't
actually profitable. I'm working on a fix to avoid forming these
unprofitable chains, and that would also have masked any failure from
this test case. The easy solution is to add some metadata that makes it
*really* profitable to form the bad chain here.

llvm-svn: 145006

20df3953

Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift... · e79761df

Craig Topper authored Nov 20, 2011

Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine.

llvm-svn: 145005

e79761df

Nov 19, 2011

Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled. · a3a65836
Craig Topper authored Nov 19, 2011
```
llvm-svn: 145004
```
a3a65836

Move the handling of unanalyzable branches out of the loop-driven chain · f3dc9eff

Chandler Carruth authored Nov 19, 2011

formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.

Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fall back on the
incoming ordering.

Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can cause major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).

llvm-svn: 144994

f3dc9eff
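
The pre-merge step described in this message can be pictured with the following sketch. It is a deliberately simplified model: blocks are names in their incoming order, the set of blocks with unanalyzable fallthrough is supplied by the caller, and `premerge_unanalyzable` is a hypothetical name.

```python
def premerge_unanalyzable(blocks, unanalyzable_fallthrough):
    """Group blocks into initial chains during one walk of the function.

    blocks: block names in their incoming layout order.
    unanalyzable_fallthrough: names of blocks that fall through to the next
    block in a way that cannot be analyzed (assumed input).
    A block in that set is kept glued to the block that follows it.
    """
    chains = []
    current = []
    for name in blocks:
        current.append(name)
        if name not in unanalyzable_fallthrough:
            # The chain can safely end here; later placement may move it.
            chains.append(current)
            current = []
    if current:
        chains.append(current)
    return chains
```

With blocks `a, b, c, d` and an unanalyzable fallthrough out of `b`, the walk yields the chains `[a]`, `[b, c]`, and `[d]`, so `b` and `c` can never be separated by later reordering.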

Test cases for SSSE3/AVX integer horizontal add/sub. · 6d77f4ae
Craig Topper authored Nov 19, 2011
```
llvm-svn: 144990
```
6d77f4ae