Commits · 700c58b1c55fbe8e7962abc0af6a364e9059c6a6 · Roger Ferrer / llvm-epi

Aug 21, 2014

Fix a bug around truncating vector in const prop. · 950844fa
Jiangning Liu authored Aug 21, 2014
```
In the constant folding stage, "TRUNC" can't handle vector data types.

llvm-svn: 216149
```
Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type". · deb4b5fc
Jiangning Liu authored Aug 21, 2014
```
llvm-svn: 216147
```

[PeepholeOptimizer] Take advantage of the isInsertSubreg property in the · 68962300

Quentin Colombet authored Aug 21, 2014

advanced copy optimization.

This is the final patch toward transforming:
udiv    r0, r0, r2
udiv    r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16
bx      lr

into:
udiv    r0, r0, r2
udiv    r1, r1, r3
bx      lr

Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1

and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16

into simple generic GPR copies that the coalescer managed to remove.
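
To see why the lane writes followed by the read-back are redundant, here is a toy Python model (not LLVM code) that treats a 64-bit D register as two 32-bit lanes. The helpers `set_lane`, `vmov_rrd`, and `round_trip` are hypothetical stand-ins for `vmov.32 dN[i], rX` and `vmov rX, rY, dN`.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def set_lane(d, lane, r):
    """Model vmov.32 d[lane], r: overwrite one 32-bit lane of a 64-bit value."""
    shift = 32 * lane
    cleared = d & ~(MASK32 << shift) & MASK64
    return cleared | ((r & MASK32) << shift)

def vmov_rrd(d):
    """Model vmov r_lo, r_hi, d: split a 64-bit value into its two lanes."""
    return d & MASK32, (d >> 32) & MASK32

def round_trip(r0, r1, d_initial=0):
    """Write both lanes, then read them back, as in the sequence above."""
    d = set_lane(d_initial, 0, r0)
    d = set_lane(d, 1, r1)
    return vmov_rrd(d)
```

Whatever the D register held before, the values read back are exactly the two inputs, so the whole sequence behaves like plain register copies.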

<rdar://problem/12702965>

llvm-svn: 216144


[ARM] Mark VSETLNi32 with the InsertSubreg property and implement the related · 84f15bd1

Quentin Colombet authored Aug 21, 2014

target hook.

This patch teaches the compiler that:
dX = VSETLNi32 dY, rZ, imm
is the same as:
dX = INSERT_SUBREG dY, rZ, translateImmToSubIdx(imm)

<rdar://problem/12702965>

llvm-svn: 216143


[LoopVectorize] Up the maximum unroll factor to 4 for AArch64 · a88896b5
James Molloy authored Aug 21, 2014
```
Only for Cortex-A57 and Cyclone for now, where it has shown wins.

llvm-svn: 216141
```

[LoopVectorizer] Limit unroll factor in the presence of nested reductions. · 82c995d4

James Molloy authored Aug 20, 2014

If we have a scalar reduction, unrolling can increase the critical path length when the loop being unrolled is inside another loop. Limit the unroll factor, to 2 by default, so the critical path only grows by one reduction operation.
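
As an illustration only (this is not the vectorizer's code, and the function name is hypothetical), a sum reduction unrolled by 2 in Python looks like the sketch below. Each accumulator forms its own dependency chain, and the single combining add after the loop is the one extra reduction operation on the critical path.

```python
def sum_unrolled_by_2(xs):
    """Sum a list using two independent accumulators (unroll factor 2)."""
    acc0 = acc1 = 0
    n = len(xs) - len(xs) % 2          # largest even prefix
    for i in range(0, n, 2):
        acc0 += xs[i]                  # chain 0
        acc1 += xs[i + 1]              # chain 1
    total = acc0 + acc1                # the one extra reduction operation
    for x in xs[n:]:                   # scalar remainder, at most one element
        total += x
    return total
```

When such a loop sits inside another loop, that combining step runs once per outer iteration, which is why a larger unroll factor can lengthen the critical path.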

llvm-svn: 216140


Add isInsertSubreg property. · 7e3da667

Quentin Colombet authored Aug 20, 2014

This patch adds a new property: isInsertSubreg and the related target hooks:
TargetInstrInfo::getInsertSubregInputs and
TargetInstrInfo::getInsertSubregLikeInputs to specify that a target specific
instruction is a (kind of) INSERT_SUBREG.

The approach is similar to r215394.

<rdar://problem/12702965>

llvm-svn: 216139


Lower thumbv4t & thumbv5 lo->lo copies through a push-pop sequence · 44937d98

Jon Roelofs authored Aug 20, 2014

On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to
be avoided. This patch favors a simple, quick-to-implement lowering at the expense
of performance. As they say: correctness first, then performance.

See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few
ideas on how to make this better.
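
A toy Python model of the idea, assuming the lowering amounts to `push {src}` followed by `pop {dst}` (the register-file dict and the function name are hypothetical, not part of the patch):

```python
def copy_via_stack(regs, dst, src):
    """Copy regs[src] into regs[dst] by going through a stack."""
    stack = []
    stack.append(regs[src])    # push {src}
    regs[dst] = stack.pop()    # pop {dst}
    return regs
```

The copy costs a store and a load instead of a single register move, which is the performance price the message refers to.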

llvm-svn: 216138


Mention the right target hook in the comment on isExtractSubreg property. · a5674906
Quentin Colombet authored Aug 20, 2014
```
llvm-svn: 216137
```

[PeepholeOptimizer] Take advantage of the isExtractSubreg property in the · 67639df1

Quentin Colombet authored Aug 20, 2014

advanced copy optimization.

This patch is a step toward transforming:
udiv	r0, r0, r2
udiv	r1, r1, r3
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

into:
udiv	r0, r0, r2
udiv	r1, r1, r3
bx	lr

Indeed, thanks to this patch, this optimization is able to look through
vmov r0, r1, d16
but it does not understand yet
vmov.32 d16[0], r0
vmov.32 d16[1], r1

Upcoming patches will fix that and update the related test case.

<rdar://problem/12702965>

llvm-svn: 216136


New InstCombine pattern: (icmp ult/ule (A + C1), C3) | (icmp ult/ule (A + C2),... · 1a4e73d7

Yi Jiang authored Aug 20, 2014

New InstCombine pattern: (icmp ult/ule (A + C1), C3) | (icmp ult/ule (A + C2), C3) to (icmp ult/ule ((A & ~(C1 ^ C2)) + max(C1, C2)), C3), under certain conditions.
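
The message does not spell out the conditions, so the Python sketch below does not try to reproduce them. It simply brute-forces the identity over every 8-bit value of A for a given choice of constants. The function name `fold_holds` and the sample constants are illustrative assumptions.

```python
def fold_holds(c1, c2, c3, bits=8, ule=False):
    """Check the fold for every A of the given width, with wrapping adds."""
    mask = (1 << bits) - 1
    cmp = (lambda x, y: x <= y) if ule else (lambda x, y: x < y)
    for a in range(1 << bits):
        original = cmp((a + c1) & mask, c3) or cmp((a + c2) & mask, c3)
        masked_a = a & ~(c1 ^ c2) & mask
        folded = cmp((masked_a + max(c1, c2)) & mask, c3)
        if original != folded:
            return False
    return True
```

With C1 = 240, C2 = 208, C3 = 8 the two accepted ranges are [16, 24) and [48, 56), which differ only in bit 5, and the fold holds. With C1 = 240, C2 = 224, C3 = 8 it does not, which shows why the transformation has to be guarded.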

llvm-svn: 216135


Don't allow MCStreamer::EmitIntValue to output 0-byte integers. · e5864c69

Alexey Samsonov authored Aug 20, 2014

It makes no sense and can hide bugs. In particular, it led
to a left shift by 64 bits, which is undefined behavior,
properly reported by UBSan.

llvm-svn: 216134


[ARM] Mark VMOVRRD with the ExtractSubreg property and implement the related · deb82eab

Quentin Colombet authored Aug 20, 2014

target hook.

This patch teaches the compiler that:
rX, rY = VMOVRRD dZ
is the same as:
rX = EXTRACT_SUBREG dZ, ssub_0
rY = EXTRACT_SUBREG dZ, ssub_1

<rdar://problem/12702965>

llvm-svn: 216132


Aug 20, 2014

Fix undefined behavior (left shift of negative value) in SystemZ backend. · fffd56ec
Alexey Samsonov authored Aug 20, 2014
```
This bug is reported by UBSan.

llvm-svn: 216131
```

Add isExtractSubreg property. · 7e75cbaf

Quentin Colombet authored Aug 20, 2014

This patch adds a new property: isExtractSubreg and the related target hooks:
TargetInstrInfo::getExtractSubregInputs and
TargetInstrInfo::getExtractSubregLikeInputs to specify that a target specific
instruction is a (kind of) EXTRACT_SUBREG.

The approach is similar to r215394.

<rdar://problem/12702965>

llvm-svn: 216130


Fix null reference creation in SelectionDAG constructor. · e229ec5b

Alexey Samsonov authored Aug 20, 2014

Store TargetSelectionDAGInfo as a pointer instead of a reference:
getSelectionDAGInfo() may not be implemented for certain backends
(e.g. it's not currently implemented for R600).

This bug is reported by UBSan.

llvm-svn: 216129


Fix undefined behavior (left shift of negative value) in Hexagon backend. · 2651ae65
Alexey Samsonov authored Aug 20, 2014
```
This bug is reported by UBSan.

llvm-svn: 216125
```
Cleanup: Delete seemingly unused reference to MachineDominatorTree from ScheduleDAGInstrs. · ea0aee62
Alexey Samsonov authored Aug 20, 2014
```
llvm-svn: 216124
```

Don't prevent a vselect of constants from becoming a single load (PR20648). · bba72c7c

Sanjay Patel authored Aug 20, 2014

Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648

This patch checks the operands of a vselect to see if all values are constants.
If yes, bail out of any further attempts to create a blend or shuffle because
SelectionDAGLegalize knows how to turn this kind of vselect into a single load.

This already happens for machines without SSE4.1, so the added checks just send
more targets down that path.
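
A rough Python illustration of why an all-constant vselect needs no blend or shuffle: when the mask and both operands are known at compile time, the selected vector is itself a constant. The function below is a hypothetical model, not the DAG code.

```python
def fold_vselect(mask, true_vals, false_vals):
    """Pick lane-wise from two constant vectors using a constant mask."""
    return [t if m else f for m, t, f in zip(mask, true_vals, false_vals)]
```

For example, `fold_vselect([1, 0, 0, 1], [1, 2, 3, 4], [5, 6, 7, 8])` yields one constant vector, which is the kind of value a single load can materialize.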

Differential Revision: http://reviews.llvm.org/D4934

llvm-svn: 216121


X86: Add missing triples from r216119 · 7bb10f8a
Duncan P. N. Exon Smith authored Aug 20, 2014
```
llvm-svn: 216120
```

X86: Align the stack on word boundaries in LowerFormalArguments() · b1826353

Duncan P. N. Exon Smith authored Aug 20, 2014

The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
correctly.  The controlling sentence is, "The size of each argument gets
rounded up to eightbytes.  Therefore the stack will always be eightbyte
aligned." The equivalent sentence in the i386 ABI page 37 says, "At all
times, the stack pointer should point to a word-aligned area."  For both
architectures, the stack pointer is not being rounded up to the nearest
eightbyte or word between the last normal argument and the first
variadic argument.
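
The rounding rule quoted above can be sketched in Python as follows. This is a minimal model under the assumption that every fixed argument occupies a whole number of slots (8 bytes for AMD64, 4 for i386). The helper names are hypothetical and do not come from X86ISelLowering.

```python
def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)

def first_vararg_offset(arg_sizes, slot=8):
    """Stack offset just past the fixed arguments, each rounded up to a slot."""
    offset = 0
    for size in arg_sizes:
        offset = align_to(offset + size, slot)
    return offset
```

With fixed arguments of 4, 1, and 8 bytes on AMD64, the first variadic argument starts at offset 24 rather than 13.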

Patch by Thomas Jablin!

llvm-svn: 216119


Fix null reference creation in ScheduleDAGInstrs constructor call. · 8968e6d1

Alexey Samsonov authored Aug 20, 2014

Both MachineLoopInfo and MachineDominatorTree may be null in ScheduleDAGMI
constructor call. It is undefined behavior to take references to these values.

This bug is reported by UBSan.

llvm-svn: 216118


Do not insert a tail call when returning multiple values on X86 · d750723d

Keno Fischer authored Aug 20, 2014

Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530.
The problem is that X86ISelLowering erroneously thought the third call
was eligible for tail call elimination.
It would have been if its return value was actually the one returned
by the calling function, but here that is not the case and
additional values are being returned.

Test Plan: Test case from the original bug report is included.

Reviewers: rafael

Reviewed By: rafael

Subscribers: rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D4968

llvm-svn: 216117


Fix undefined behavior (left shift by 64 bits) in ScaledNumber::toString(). · 314c643b
Alexey Samsonov authored Aug 20, 2014
```
This bug is reported by UBSan.

llvm-svn: 216116
```

critical-anti-dependency breaker: don't use reg def info from kill insts (PR20308) · f3cfeef2

Sanjay Patel authored Aug 20, 2014

In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker
caused a miscompile because it broke a WAR hazard using a register that it thinks is available
based on info from a kill inst. Until PR18663 is solved, we shouldn't use any def/use info from
a kill inst because kills are really just nops.

This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices
array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again.

The test case is a reduced version of the code from the bug report.

Differential Revision: http://reviews.llvm.org/D4977

llvm-svn: 216114


[PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of · 03e43f8e

Quentin Colombet authored Aug 20, 2014

the isRegSequence property.

This is a follow-up to r215394 and r215404, which respectively introduce the
isRegSequence property and use it for ARM.

Thanks to the property introduced by the previous commits, this patch is able
to optimize the following sequence:
vmov	d0, r2, r3
vmov	d1, r0, r1
vmov	r0, s0
vmov	r1, s2
udiv	r0, r1, r0
vmov	r1, s1
vmov	r2, s3
udiv	r1, r2, r1
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

into:
udiv	r0, r0, r2
udiv	r1, r1, r3
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

This patch refactors how the copy optimizations are done in the peephole
optimizer. Prior to this patch, we had one copy-related optimization that
replaced a copy or bitcast by a generic, more suitable (in terms of register
file), copy.

With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies:
PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies:
PeepholeOptimizer::optimizeUncoalescableCopy.

The goals of these two optimizations are slightly different: one rewrites the
operand of the instruction (#1), the other kills off the non-generic instruction
and replaces it with a (sequence of) generic instruction(s).

Both optimizations rely on the ValueTracker introduced in r212100.

The ValueTracker has been refactored to use the information from the
TargetInstrInfo for non-generic instructions. As part of the refactoring, we
switched the tracking from the index of the definition to the actual register
(virtual or physical). This change provides better consistency with
register-related APIs and eases the use of the TargetInstrInfo.

Moreover, this patch introduces a new helper class CopyRewriter used to ease the
rewriting of generic copies (i.e., #1).

Finally, this patch adds a dead code elimination pass right after the peephole
optimizer to get rid of dead code that may appear after rewriting.

This is related to <rdar://problem/12702965>.

Review: http://reviews.llvm.org/D4874
llvm-svn: 216088


Tweak CFGPrinter to wrap very long names. · 2223f8ed

Andrew Trick authored Aug 20, 2014

I added wrapping to the CFGPrinter a while back so the -view-cfg
output is actually viewable. I've since encountered very long mangled
names with the same problem, so I'm slightly tweaking this code to
work in that case.

llvm-svn: 216087


Remove unused field. · 1c509715
Rafael Espindola authored Aug 20, 2014
```
llvm-svn: 216086
```

[FastISel][AArch64] Don't fold the sign-/zero-extend from i1 into the compare. · e1bb055e

Juergen Ributzka authored Aug 20, 2014

This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero-
extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both
operands have to be sign-/zero-extended with separate instructions.

Related to <rdar://problem/17913111>.

llvm-svn: 216073


Quick fix for a use after free. · 061beab6
Rafael Espindola authored Aug 20, 2014
```
llvm-svn: 216071
```
Add note to LangRef about how function arguments can be unnamed and · 2661dfc7
Dan Liew authored Aug 20, 2014
```
how this affects the numbering of unnamed temporaries.

llvm-svn: 216070
```
Silencing a -Wcast-qual warning. NFC. · 47497258
Aaron Ballman authored Aug 20, 2014
```
llvm-svn: 216068
```

Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly... · bf6ee221

Aaron Ballman authored Aug 20, 2014

Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)). NFC.

llvm-svn: 216067


Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type · f841b3b7

Jiangning Liu authored Aug 20, 2014

legalization stage. With those two optimizations, fewer sign-/zero-extension
instructions are inserted, which exposes more opportunities to the
Machine CSE pass in the back end.

llvm-svn: 216066


[x32] Fix FrameIndex check in SelectLEA64_32Addr · 01a4e0a1

Pavel Chupin authored Aug 20, 2014

Summary:
Fixes http://llvm.org/bugs/show_bug.cgi?id=20016 reproducible on new
lea-5.ll case.
Also use RSP/RBP for x32 lea to save the 1 byte used for the 0x67 prefix in
the ESP/EBP case.

Test Plan: lea tests modified to include x32/nacl and new test added

Reviewers: nadav, dschuff, t.p.northover

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D4929

llvm-svn: 216065


ARM: Fix codegen for rbit intrinsic · c655f0c8

Yi Kong authored Aug 20, 2014

LLVM generates an illegal `rbit r0, #352` instruction for the rbit intrinsic.
According to the ARM ARM, rbit only takes a register as its argument, not an immediate.
The correct instruction should be rbit <Rd>, <Rm>.
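
For reference, rbit reverses the bit order of a 32-bit word. A plain Python model of that operation (the function name is illustrative):

```python
def rbit32(x):
    """Reverse the bit order of a 32-bit word."""
    result = 0
    for _ in range(32):
        result = (result << 1) | (x & 1)   # move the lowest bit of x into result
        x >>= 1
    return result
```

Applying it twice returns the original value, and bit 0 maps to bit 31.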

The bug was originally introduced in r211057.

Differential Revision: http://reviews.llvm.org/D4980

llvm-svn: 216064


Update projects lists. · 5d536073
Bill Wendling authored Aug 20, 2014
```
llvm-svn: 216048
```
Add libcxxabi to the projects. · e971d6fd
Bill Wendling authored Aug 20, 2014
```
llvm-svn: 216047
```

InstCombine: Annotate sub with nuw when we prove it's safe · 42158f3e

David Majnemer authored Aug 20, 2014

We can prove that a 'sub' can be a 'sub nuw' if the left-hand side is
negative and the right-hand side is non-negative.
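
The reasoning can be checked exhaustively for a small bit width. In the Python sketch below (an illustration, not InstCombine code), a negative left-hand side has its sign bit set and so is at least 2^(bits-1) as an unsigned number, while a non-negative right-hand side is below that, so the unsigned subtraction never borrows.

```python
def sub_nuw_is_safe(bits=8):
    """Return True if lhs - rhs never wraps unsigned for negative lhs, non-negative rhs."""
    sign = 1 << (bits - 1)
    for lhs in range(sign, 1 << bits):     # sign bit set: negative as signed
        for rhs in range(0, sign):         # sign bit clear: non-negative as signed
            if lhs < rhs:                  # an unsigned borrow would occur
                return False
    return True
```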

llvm-svn: 216045


Fix an off by 1 bug that prevented SmallPtrSet from using all of its 'small'... · 298f6380

Craig Topper authored Aug 20, 2014

Fix an off-by-1 bug that prevented SmallPtrSet from using all of its 'small' capacity. Then fix the early return in the move constructor that prevented 'small' moves from clearing NumElements in the moved-from object. The directed test missed this because it was always testing large moves due to the off-by-1 bug.

llvm-svn: 216044
