- Jul 13, 2013
-
Akira Hatanaka authored
llvm-svn: 186222
-
Eric Christopher authored
llvm-svn: 186212
-
- Jul 12, 2013
-
Benjamin Kramer authored
llvm-svn: 186196
-
Arnold Schwaighofer authored
radar://14351991 llvm-svn: 186189
-
Arnold Schwaighofer authored
Fixes a 35% degradation compared to unvectorized code in MiBench/automotive-susan and an equally serious regression on a private image processing benchmark. radar://14351991 llvm-svn: 186188
-
Arnold Schwaighofer authored
Address calculation for gather/scatter in vectorized code can incur a significant cost, making vectorization unprofitable. Add infrastructure to model this cost. Tests and cost models for targets will be in follow-up commits. radar://14351991 llvm-svn: 186187
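A minimal sketch of the kind of hook such infrastructure implies; the name, signature, and heuristic below are illustrative assumptions, not the actual TargetTransformInfo interface added by this commit:

```cpp
#include <cassert>

// Hypothetical cost hook: a consecutive vector access computes one
// address, while a scalarized gather/scatter computes one (often
// non-trivial) address per lane, which can erase the vectorization win.
unsigned getAddressComputationCost(unsigned NumLanes, bool IsGatherScatter) {
  assert(NumLanes >= 1 && "expected at least one lane");
  return IsGatherScatter ? NumLanes : 1;
}
```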
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186182
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186181
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186180
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186179
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186178
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186177
-
Benjamin Kramer authored
In particular: movsbw %al, %ax --> cbtw; movswl %ax, %eax --> cwtl; movslq %eax, %rax --> cltq. According to Intel's manual these have the same performance characteristics but come with a smaller encoding. llvm-svn: 186174
-
Stephen Lin authored
Patch by Andrea Di Biagio llvm-svn: 186165
-
Vladimir Medic authored
llvm-svn: 186151
-
Richard Sandiford authored
Normal (sext (setcc ...)) sequences are optimised into (select_cc ..., -1, 0) by DAGCombiner::visitSIGN_EXTEND. However, this is deliberately not done for vectors, and after vector type legalization we have (sext_inreg (setcc ...)) instead. I wondered about trying to extend DAGCombiner to handle this case too, but it seemed to be a loss on some other targets I tried, even those for which SETCC isn't "legal" and SELECT_CC is. llvm-svn: 186149
-
Richard Sandiford authored
GPR and FPR constraints like "{r2}" and "{f2}" weren't handled correctly because the name-to-regno mapping depends on the value type and (because of that) the internal names in RegStrings are not the same as the AsmName. CC constraints like "{cc}" didn't work either because there was no associated register class. llvm-svn: 186148
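For context, a hedged example of how such constraints arise from source code. This uses the GNU explicit-register-variable extension with a SystemZ instruction; it is an illustration rather than a test from the commit, and the register-name spelling accepted by asm("...") can vary by toolchain:

```cpp
#include <cstdint>

// At the LLVM IR level this becomes inline asm with constraints like
// "{r2}" and "{r3}", the exact form this commit fixes for SystemZ.
int64_t add_in_r2(int64_t a, int64_t b) {
  register int64_t x asm("r2") = a; // pin value to GPR %r2
  register int64_t y asm("r3") = b; // pin value to GPR %r3
  // AGR is SystemZ's 64-bit register add; it sets the condition code,
  // hence the "cc" clobber (the "{cc}" constraint case in the commit).
  asm("agr %0, %1" : "+r"(x) : "r"(y) : "cc");
  return x;
}
```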
-
Richard Sandiford authored
If the source of these instructions is spilled we should load the destination. If the destination is spilled we should store the source. llvm-svn: 186147
-
Charles Davis authored
This patch adds explicit calling convention types for the Win64 and System V/x86-64 ABIs. This allows code to override the default and use the Win64 convention on a target that wants to use SysV (and vice versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU attributes. llvm-svn: 186144
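A small example of what the new conventions enable at the source level, via the GNU attributes the commit mentions (illustrative; actual codegen depends on the target's default ABI):

```cpp
// With explicit calling-convention types, either ABI can be requested
// regardless of the target's default.
__attribute__((ms_abi)) long long win64_sum(long long a, long long b) {
  return a + b; // arguments arrive in RCX/RDX per the Win64 convention
}

__attribute__((sysv_abi)) long long sysv_sum(long long a, long long b) {
  return a + b; // arguments arrive in RDI/RSI per the System V convention
}
```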
-
- Jul 11, 2013
-
Hal Finkel authored
We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for v8i16 (which occurs in the test case) or v16i8. The same was true for V_SETALLONES (so I added the associated patterns for those as well). Another bug found by llvm-stress. llvm-svn: 186108
-
Hal Finkel authored
This fixes a bug (found by csmith) at -O0 where we attempt to create an RLWIMI with an out-of-range operand. Most uses of the isRunOfOnes function are guarded by a condition that the value is not zero. This was not true in two places, and in both places a zero input would result in an out-of-range MB value (= 32). To fix this, isRunOfOnes now returns false on a zero input (and I've removed one now-redundant guard). llvm-svn: 186101
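A hedged sketch of a run-of-ones test with the zero check applied. This is a simplified stand-in for PowerPC's isRunOfOnes, which additionally computes the MB/ME mask bounds:

```cpp
#include <cstdint>

// True iff Val contains exactly one contiguous block of 1 bits,
// possibly wrapping around from bit 31 to bit 0 (sketch, not the
// in-tree implementation).
bool isRunOfOnes32(uint32_t Val) {
  if (Val == 0)
    return false; // the fix: a zero input has no run of ones at all
  uint32_t Lowest = Val & ~(Val - 1);   // isolate the lowest set bit
  if (((Val + Lowest) & Val) == 0)      // adding it clears a contiguous run
    return true;
  uint32_t Inv = ~Val;                  // a wrapped run of ones in Val is
  uint32_t InvLowest = Inv & ~(Inv - 1);// a non-wrapped run of ones in ~Val
  return ((Inv + InvLowest) & Inv) == 0;
}
```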
-
Richard Sandiford authored
Extend r186072 to handle shifts and ANDs. llvm-svn: 186073
-
Richard Sandiford authored
RISBG can handle some ANDs for which no AND IMMEDIATE exists. It also acts as a three-operand AND for some cases where an AND IMMEDIATE could be used instead. It might be worth adding a pass to replace RISBG with AND IMMEDIATE in cases where the register operands end up being the same and where AND IMMEDIATE is smaller. llvm-svn: 186072
-
Richard Sandiford authored
RISBG has three 8-bit operands (I3, I4 and I5). I'd originally restricted all three to 6 bits, since that's the only range we intended to use at the time. However, the top bit of I4 acts as a "zero" flag for RISBG, while the top bit of I3 acts as a "test" flag for RNSBG & co. This patch therefore allows them to have the full 8-bit range. I've left the fifth operand as a 6-bit value for now since the upper 2 bits have no defined meaning. llvm-svn: 186070
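To make the operand roles concrete, here is a hedged C++ model of RISBG written from this description; treat it as an approximation, not a statement of the z/Architecture definition:

```cpp
#include <bit>      // std::rotl (C++20)
#include <cstdint>

// Rotate-then-insert-selected-bits model. Bit 0 is the leftmost (most
// significant) bit, following z/Architecture numbering. The top bit of
// I3 (the "test" flag used by RNSBG & co.) is not modeled here.
uint64_t risbg(uint64_t R1, uint64_t R2, unsigned I3, unsigned I4,
               unsigned I5) {
  bool ZeroUnselected = (I4 & 0x80) != 0; // top bit of I4: the "zero" flag
  unsigned Start = I3 & 63, End = I4 & 63, Rot = I5 & 63;
  uint64_t Rotated = std::rotl(R2, static_cast<int>(Rot));
  // Select bits Start..End of the rotated value (wrapping if Start > End).
  uint64_t FromStart = ~uint64_t{0} >> Start;     // bits Start..63
  uint64_t UpToEnd   = ~uint64_t{0} << (63 - End);// bits 0..End
  uint64_t Mask = Start <= End ? (FromStart & UpToEnd)
                               : (FromStart | UpToEnd);
  uint64_t Kept = ZeroUnselected ? 0 : (R1 & ~Mask);
  return Kept | (Rotated & Mask);
}
```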
-
- Jul 10, 2013
-
Aaron Ballman authored
llvm-svn: 186017
-
Craig Topper authored
llvm-svn: 186013
-
Michel Danzer authored
Enough for the radeonsi driver to use it for calculating derivatives. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186012
-
Michel Danzer authored
lit test coverage to follow in the next commit. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186011
-
Michel Danzer authored
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186010
-
Michel Danzer authored
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186009
-
Michel Danzer authored
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186008
-
Hal Finkel authored
In discussing this change with Bill Schmidt, it was decided that the original comment about negative FIs was incorrect. We'll still exclude them for now, but now with a more-accurate explanation. llvm-svn: 186005
-
Vladimir Medic authored
llvm-svn: 186000
-
Vladimir Medic authored
llvm-svn: 185999
-
Stephen Lin authored
llvm-svn: 185995
-
Stephen Lin authored
Currently ARM is the only backend that supports FMA instructions (for at least some subtargets) but does not implement this virtual, so FMAs are never generated except from explicit fma intrinsic calls. Apparently this is because it supports both fused (one rounding step) and unfused (two rounding steps) multiply + add instructions. This patch clarifies that this is the case without changing behavior, by implementing the virtual function to simply return false, as the default TargetLoweringBase version does. It is possible that some CPUs perform the fused version faster than the unfused version and vice versa, so the function implementation should be revisited if hard data is found. llvm-svn: 185994
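A hedged sketch of the override described; the exact name and signature at this revision may differ slightly (modern trees call the hook isFMAFasterThanFMulAndFAdd), and the surrounding LLVM declarations are assumed:

```cpp
#include "llvm/CodeGen/ValueTypes.h" // EVT (LLVM header, assumed available)

// Sketch: ARM explicitly declines to claim FMA is faster, matching the
// TargetLoweringBase default rather than changing behavior.
bool ARMTargetLowering::isFMAFasterThanFMulAndFAdd(EVT VT) const {
  // VFPv4's VFMA fuses (one rounding step) while VMLA does not; without
  // per-CPU timing data, conservatively say no so ISD::FMA is only
  // formed from explicit fma intrinsic calls.
  return false;
}
```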
-
Jim Grosbach authored
Propagate the fix from r185712 to Thumb2 codegen as well. The original commit message applies here too: a "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them into the bottom half of "x". An arithmetic and a logical shift are only equivalent in this context if the shift amount is 16: we would be shifting ones into the bottom 16 bits instead of zeros if "y" is negative. rdar://14338767 llvm-svn: 185982
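A small self-contained check of the shift-equivalence claim (illustrative, not from the commit; assumes the usual arithmetic behavior of >> on negative signed values, which is implementation-defined in C++):

```cpp
#include <cstdint>
#include <cstdio>
#include <initializer_list>

// For a negative y, the low 16 bits of (y asr #num) and (y lsr #num)
// agree at num == 16 but diverge once sign bits reach the bottom half.
int main() {
  int32_t y = INT32_MIN | 0x1234; // a negative input
  for (unsigned num : {16u, 20u}) {
    uint32_t asr = static_cast<uint32_t>(y >> num); // arithmetic shift
    uint32_t lsr = static_cast<uint32_t>(y) >> num; // logical shift
    std::printf("num=%u asr&0xffff=%04x lsr&0xffff=%04x\n", num,
                static_cast<unsigned>(asr & 0xffff),
                static_cast<unsigned>(lsr & 0xffff));
    // Prints equal halves for num=16 and differing halves for num=20,
    // where ASR has shifted ones into the bottom 16 bits.
  }
  return 0;
}
```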
-
- Jul 09, 2013
-
Bill Schmidt authored
A more complete example of the bug in PR16556 was recently provided, showing that the previous fix was not sufficient. The previous fix is reverted herein. The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as custom lowering for FP_TO_SINT during type legalization, without checking whether the input type is handled by that routine. LowerFP_TO_INT requires the input to be f32 or f64, so we fail when the input is ppcf128. I'm leaving the test case from the initial fix (r185821) in place, and adding the new test as another crash-only check. llvm-svn: 185959
-
Stephen Lin authored
This patch updates the in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in order to resolve the following issues with fmuladd (i.e. optional FMA) intrinsics: 1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations. 2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation. 3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs. The function has also been slightly renamed for consistency and to force a merge/build conflict for any out-of-tree target implementing it. To resolve, see comments and fixed in-tree examples. llvm-svn: 185956
-
Ulrich Weigand authored
[PowerPC] Revert r185476 and fix up TLS variant kinds. In the commit message to r185476 I wrote: >The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD >correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD. >This causes some confusion with the asm parser, since VK_PPC_TLSGD >is output as @tlsgd, which is then read back in as VK_TLSGD. > >To avoid this confusion, this patch removes the PowerPC-specific >modifiers and uses the generic modifiers throughout. (The only >drawback is that the generic modifiers are printed in upper case >while the usual convention on PowerPC is to use lower-case modifiers. >But this is just a cosmetic issue.) This was unfortunately incorrect: there is in fact another, serious drawback to using the default VK_TLSLD/VK_TLSGD variant kinds. Using these causes ELFObjectWriter::RelocNeedsGOT to return true, which in turn causes the ELFObjectWriter to emit an undefined reference to _GLOBAL_OFFSET_TABLE_. This is a problem on powerpc64, because it uses the TOC instead of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_, so the symbol remains undefined. This means shared libraries using TLS built with the integrated assembler are currently broken. While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation probably ought to be properly fixed at some point, for now I'm simply reverting the r185476 commit. This in turn exposes the breakage in the asm parser's handling of @tlsgd/@tlsld that the original check-in was intended to fix. To avoid that regression, I'm also adding a different fix for the problem: while common code now parses @tlsgd as VK_TLSGD, a special hack in the asm parser translates it to the platform-specific VK_PPC_TLSGD that the back-end now expects. While this is not really pretty, it's self-contained and shouldn't hurt anything else for now. Once the underlying problem is fixed, this hack can be reverted again. llvm-svn: 185945
-