Commits · 8e06e2a8c4778420f7ea30593f3b8677835f9286 · Roger Ferrer / llvm-epi-0.8

Apr 10, 2013

R600/SI: adjust writemask to only the used components · 8e06e2a8

Christian Konig authored Apr 10, 2013



Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179165

8e06e2a8

R600/SI: remove image sample writemask · 4ace6632

Christian Konig authored Apr 10, 2013



Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179164

4ace6632

Cleanup PPCInstrInfo::DefinesPredicate · af822018

Hal Finkel authored Apr 10, 2013

Implement suggestions made by Bill Schmidt in post-commit review. Thanks!

llvm-svn: 179162

af822018

PPC: Prep for if conversion of bctr[l] · 500b0045

Hal Finkel authored Apr 10, 2013

This adds in-principle support for if-converting the bctr[l] instructions.
These instructions are used for indirect branching. It seems, however, that the
current if converter will never actually predicate these. To do so, it would
need the ability to hoist a few setup insts. out of the conditionally-executed
block. For example, code like this:
  void foo(int a, int (*bar)()) { if (a != 0) bar(); }
becomes:
        ...
        beq 0, .LBB0_2
        std 2, 40(1)
        mr 12, 4
        ld 3, 0(4)
        ld 11, 16(4)
        ld 2, 8(4)
        mtctr 3
        bctrl
        ld 2, 40(1)
.LBB0_2:
        ...
and it would be safe to do all of this unconditionally with a predicated
beqctrl instruction.

llvm-svn: 179156

500b0045

__sincosf_stret returns sinf / cosf in bits 0:31 and 32:63 of xmm0, not in · ac0469c5
Evan Cheng authored Apr 10, 2013
```
xmm0 / xmm1.

rdar://13599493

llvm-svn: 179141
```
ac0469c5

Mips specific inline asm operand modifier 'D' · b04e357d

Jack Carter authored Apr 09, 2013

Modifier 'D' is to use the second word of a double integer.

We had previously implemented the pure register varient of 
the modifier and this patch implements the memory reference.



#include "stdio.h"

int b[8] = {0,1,2,3,4,5,6,7};
void main()
{
    int i;
    
    // The first word. Notice, no 'D'
    {asm (
    "lw    %0,%1;"
    : "=r" (i)
    : "m" (*(b+4))
    );}
    
    printf("%d\n",i);

    // The second word
    {asm (
    "lw    %0,%D1;"
    : "=r" (i)
    : "m" (*(b+4))
    );}
    
    printf("%d\n",i);
}

llvm-svn: 179135

b04e357d

Allow PPC B and BLR to be if-converted into some predicated forms · 5711eca1

Hal Finkel authored Apr 09, 2013

This enables us to form predicated branches (which are the same conditional
branches we had before) and also a larger set of predicated returns (including
instructions like bdnzlr which is a conditional return and loop-counter
decrement all in one).

At the moment, if conversion does not capture all possible opportunities. A
simple example is provided in early-ret2.ll, where if conversion forms one
predicated return, and then the PPCEarlyReturn pass picks up the other one. So,
at least for now, we'll keep both mechanisms.

llvm-svn: 179134

5711eca1

Apr 09, 2013

Cleanup. No functional change intended. · 18785857
Chad Rosier authored Apr 09, 2013
```
llvm-svn: 179129
```
18785857
Cleanup. No functional change intended. · 10d1d1cc
Chad Rosier authored Apr 09, 2013
```
llvm-svn: 179125
```
10d1d1cc
Revert r179115 as it looks to have killed the ASan tests. · e8d8288d
Chad Rosier authored Apr 09, 2013
```
llvm-svn: 179120
```
e8d8288d

This patch enables llvm to switch between compiling for mips32/mips64 · 1595f36d

Reed Kotler authored Apr 09, 2013

and mips16 on a per function basis.

Because this patch is somewhat involved I have provide an overview of the
key pieces of it.

The patch is written so as to not change the behavior of the non mixed
mode. We have tested this a lot but it is something new to switch subtargets
so we don't want any chance of regression in the mainline compiler until
we have more confidence in this.

Mips32/64 are very different from Mip16 as is the case of ARM vs Thumb1.
For that reason there are derived versions of the register info, frame info, 
instruction info and instruction selection classes.

Now we register three separate passes for instruction selection.
One which is used to switch subtargets (MipsModuleISelDAGToDAG.cpp) and then
one for each of the current subtargets (Mips16ISelDAGToDAG.cpp and
MipsSEISelDAGToDAG.cpp).

When the ModuleISel pass runs, it determines if there is a need to switch
subtargets and if so, the owning pointers in MipsTargetMachine are
appropriately changed.

When 16Isel or SEIsel is run, they will return immediately without doing
any work if the current subtarget mode does not apply to them.

In addition, MipsAsmPrinter needs to be reset on a function basis.

The pass BasicTargetTransformInfo is substituted with a null pass since the
pass is immutable and really needs to be a function pass for it to be
used with changing subtargets. This will be fixed in a follow on patch.

llvm-svn: 179118

1595f36d

[ms-inline asm] Use parsePrimaryExpr in lieu of parseExpression if we need to · a08f30f0

Chad Rosier authored Apr 09, 2013

parse an identifier.  Otherwise, parseExpression may parse multiple tokens,
which makes it impossible to properly compute an immediate displacement.
An example of such a case is the source operand (i.e., [Symbol + ImmDisp]) in
the below example:

 __asm mov eax, [Symbol + ImmDisp]

The existing test cases exercise this patch.
rdar://13611297

llvm-svn: 179115

a08f30f0

Cleanup PPCEarlyReturn · 21aad9a8

Hal Finkel authored Apr 09, 2013

Some general cleanup and only scan the end of a BB for branches (once we're
done with the terminators and debug values, then there should not be any other
branches). These address post-commit review suggestions by Bill Schmidt.

No functionality change intended.

llvm-svn: 179112

21aad9a8

[ms-inline asm] Maintain a StringRef to reference a symbol in a parsed operand, · e81309b3

Chad Rosier authored Apr 09, 2013

rather than deriving the StringRef from the Start and End SMLocs.

Using the Start and End SMLocs works fine for operands such as [Symbol], but
not for operands such as [Symbol + ImmDisp].  All existing test cases that
reference a variable exercise this patch.
rdar://13602265

llvm-svn: 179109

e81309b3

Use virtual base registers on PPC · b5899d57

Hal Finkel authored Apr 09, 2013

On PowerPC, non-vector loads and stores have r+i forms; however, in functions
with large stack frames these were not being used to access slots far from the
stack pointer because such slots were out of range for the signed 16-bit
immediate offset field. This increases register pressure because we need a
separate register for each offset (when the r+r form is used). By enabling
virtual base registers, we can deal with large stack frames without unduly
increasing register pressure.

llvm-svn: 179105

b5899d57

Extract a function. · c910feb4
Jakob Stoklund Olesen authored Apr 09, 2013
```
llvm-svn: 179086
```
c910feb4

Compute correct frame sizes for SPARC v9 64-bit frames. · 2cfe46fd

Jakob Stoklund Olesen authored Apr 09, 2013

The save area is twice as big and there is no struct return slot. The
stack pointer is always 16-byte aligned (after adding the bias).

Also eliminate the stack adjustment instructions around calls when the
function has a reserved stack frame.

llvm-svn: 179083

2cfe46fd

Apr 08, 2013

X86 cost model: Model cost for uitofp and sitofp on SSE2 · f47d2d7f

Arnold Schwaighofer authored Apr 08, 2013

The costs are overfitted so that I can still use the legalization factor.

For example the following kernel has about half the throughput vectorized than
unvectorized when compiled with SSE2. Before this patch we would vectorize it.

unsigned short A[1024];
double B[1024];
void f() {
  int i;
  for (i = 0; i < 1024; ++i) {
    B[i] = (double) A[i];
  }
}

radar://13599001

llvm-svn: 179033

f47d2d7f

[ms-inline asm] Add support for ImmDisp [ Symbol ] memory operands. · fce4fab1
Chad Rosier authored Apr 08, 2013
```
rdar://13521249

llvm-svn: 179030
```
fce4fab1

Generate PPC early conditional returns · b5aa7e54

Hal Finkel authored Apr 08, 2013

PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time when we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.

llvm-svn: 179026

b5aa7e54

R600: Control Flow support for pre EG gen · 5f11dd39
Vincent Lejeune authored Apr 08, 2013
```
llvm-svn: 179020
```
5f11dd39

AArch64: remove barriers from AArch64 atomic operations. · 15410e98

Tim Northover authored Apr 08, 2013

I've managed to convince myself that AArch64's acquire/release
instructions are sufficient to guarantee C++11's required semantics,
even in the sequentially-consistent case.

llvm-svn: 179005

15410e98

ARM: Remove unused variable. · d56a324e
Benjamin Kramer authored Apr 08, 2013
```
llvm-svn: 179001
```
d56a324e

Cleanup and improve PPC fsel generation · 81f8799f

Hal Finkel authored Apr 07, 2013

First, we should not cheat: fsel-based lowering of select_cc is a
finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes
this clear, as does a note in our own README).

This also adds fsel-based lowering of EQ and NE condition codes. As it turned
out, fsel generation was covered by a grand total of zero regression test
cases. I've added some test cases to cover the existing behavior (which is now
finite-math only), as well as the new EQ cases.

llvm-svn: 179000

81f8799f

Apr 07, 2013

Implement LowerCall_64 for the SPARC v9 64-bit ABI. · a30f4832
Jakob Stoklund Olesen authored Apr 07, 2013
```
There is still no support for byval arguments (which I don't think are
needed) and varargs.

llvm-svn: 178993
```
a30f4832
PPC rotate instructions don't have unmodeled side effcts · 7795e47b
Hal Finkel authored Apr 07, 2013
```
llvm-svn: 178982
```
7795e47b
Most PPC M[TF]CR instructions do not have side effects · b47a69ac
Hal Finkel authored Apr 07, 2013
```
llvm-svn: 178978
```
b47a69ac
PPC pre-increment load instructions do not have side effects · d71cc3a7
Hal Finkel authored Apr 07, 2013
```
A few were missed in r178972.

llvm-svn: 178973
```
d71cc3a7
PPC pre-increment load instructions do not have side effects · 6efd45e9
Hal Finkel authored Apr 07, 2013
```
llvm-svn: 178972
```
6efd45e9
PPC MCRF instruction does not have side effects · 933e8f03
Hal Finkel authored Apr 07, 2013
```
llvm-svn: 178971
```
933e8f03
PPC FMR instruction does not have side effects · 94072b98
Hal Finkel authored Apr 07, 2013
```
llvm-svn: 178970
```
94072b98

Implement LowerReturn_64 for SPARC v9. · edaf66b0

Jakob Stoklund Olesen authored Apr 06, 2013

Integer return values are sign or zero extended by the callee, and
structs up to 32 bytes in size can be returned in registers.

The CC_Sparc64 CallingConv definition is shared between
LowerFormalArguments_64 and LowerReturn_64. Function arguments and
return values are passed in the same registers.

The inreg flag is also used for return values. This is required to handle
C functions returning structs containing floats and ints:

  struct ifp {
    int i;
    float f;
  };

  struct ifp f(void);

LLVM IR:

  define inreg { i32, float } @f() {
     ...
     ret { i32, float } %retval
  }

The ABI requires that %retval.i is returned in the high bits of %i0
while %retval.f goes in %f1.

Without the inreg return value attribute, %retval.i would go in %i0 and
%retval.f would go in %f3 which is a more efficient way of returning
%multiple values, but it is not ABI compliant for returning C structs.

llvm-svn: 178966

edaf66b0

Apr 06, 2013

SPARC v9 stack pointer bias. · 03d9f7fd

Jakob Stoklund Olesen authored Apr 06, 2013

64-bit SPARC v9 processes use biased stack and frame pointers, so the
current function's stack frame is located at %sp+BIAS .. %fp+BIAS where
BIAS = 2047.

This makes more local variables directly accessible via [%fp+simm13]
addressing.

llvm-svn: 178965

03d9f7fd

Implement PPCInstrInfo::FoldImmediate · d61d4f80

Hal Finkel authored Apr 06, 2013

There are certain PPC instructions into which we can fold a zero immediate
operand. We can detect such cases by looking at the register class required
by the using operand (so long as it is not otherwise constrained).

llvm-svn: 178961

d61d4f80

PPC ISEL is a select and never has side effects · 8fc33e5d
Hal Finkel authored Apr 06, 2013
```
llvm-svn: 178960
```
8fc33e5d

Complete formal arguments for the SPARC v9 64-bit ABI. · 1c9a95ab

Jakob Stoklund Olesen authored Apr 06, 2013

All arguments are formally assigned to stack positions and then promoted
to floating point and integer registers. Since there are more floating
point registers than integer registers, this can cause situations where
floating point arguments are assigned to registers after integer
arguments that where assigned to the stack.

Use the inreg flag to indicate 32-bit fragments of structs containing
both float and int members.

The three-way shadowing between stack, integer, and floating point
registers requires custom argument lowering. The good news is that
return values are passed in the exact same way, and we can share the
code.

Still missing:

 - Update LowerReturn to handle structs returned in registers.
 - LowerCall.
 - Variadic functions.

llvm-svn: 178958

1c9a95ab

R600/SI: Add support for buffer stores v2 · 754f80ff

Tom Stellard authored Apr 05, 2013



v2:
  - Use the ADDR64 bit

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178931

754f80ff

R600/SI: Use same names for corresponding MUBUF operands and encoding fields · 6db08eb4

Tom Stellard authored Apr 05, 2013



The code emitter knows how to encode operands whose name matches one of
the encoding fields.  If there is no match, the code emitter relies on
the order of the operand and field definitions to determine how operands
should be encoding.  Matching by order makes it easy to accidentally break
the instruction encodings, so we prefer to match by name.

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178930

6db08eb4

R600: Add RV670 processor · 60174bb9

Tom Stellard authored Apr 05, 2013



This is an R600 GPU with double support.

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178929

60174bb9

R600/SI: Add processor types for each SI variant · 2f21c7e5
Tom Stellard authored Apr 05, 2013
```
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178928
```
2f21c7e5