  1. Mar 06, 2013
    • [mips] Custom-legalize BR_JT. · 0f693a8a
      Akira Hatanaka authored
      In N64-static mode, the GOT address is needed to compute the branch address.
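
      For context, a minimal sketch (hypothetical, not from the commit): a dense
      switch like the one below is the usual source of an ISD::BR_JT node, and
      under N64-static its jump-table entries are resolved through the GOT.

      int classify(int x) {
        switch (x) {
        case 0: return 10;
        case 1: return 11;
        case 2: return 12;
        case 3: return 13;
        case 4: return 14;
        case 5: return 15;
        default: return -1;
        }
      }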
      
      llvm-svn: 176580
    • Fix PR15355 · da22b30b
      Michael Liao authored
      - Clear the 'mayStore' flag when loading from the atomic variable before
        the spin loop.
      - Clear the kill flag on the registers forming the address of that atomic
        variable, since they go from having one use to multiple uses.
      - Don't use a physical register as a live-in register in a basic block
        (neither the entry block nor a landing pad); copy it into a virtual
        register instead.
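
      A minimal reproducer sketch (hypothetical, assuming C11 atomics; the
      actual PR15355 test case may differ): an atomic read-modify-write whose
      result is used and has no single x86 instruction lowers to a load of the
      atomic variable followed by a cmpxchg spin loop, the pattern whose flags
      are corrected above.

      #include <stdatomic.h>

      atomic_long flags;

      long clear_bits(long mask) {
        /* x86 has no fetch-and that returns the old value, so this becomes a
           plain load of 'flags' followed by a cmpxchg spin loop. */
        return atomic_fetch_and(&flags, ~mask);
      }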
      
      (patch by Cameron Zwarich)
      
      llvm-svn: 176538
    • [mips] Remove android calling convention. · 1454ed8a
      Akira Hatanaka authored
      This calling convention was added just to handle functions that return a
      vector of floats. The fix committed in r165585 solves the problem.
      
      llvm-svn: 176530
  2. Mar 02, 2013
    • ARM: Creating a vector from a lane of another. · a3c5c769
      Jim Grosbach authored
      The VDUP instruction's source register doesn't allow a non-constant lane
      index, so make sure we don't construct an ARM::VDUPLANE node that asks it
      to do so.
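
      A minimal sketch (hypothetical, assuming ARM NEON intrinsics; not the
      committed test case) of the legal and illegal cases:

      #include <arm_neon.h>

      /* Constant lane index: this can select to VDUPLANE. */
      float32x2_t splat_lane1(float32x2_t v) {
        return vdup_lane_f32(v, 1);
      }

      /* Runtime lane index: must not become VDUPLANE. Extract the element
         and use the register form of VDUP instead. */
      float32x2_t splat_any_lane(float32x2_t v, int lane) {
        float x = (lane == 0) ? vget_lane_f32(v, 0) : vget_lane_f32(v, 1);
        return vdup_n_f32(x);
      }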
      
      rdar://13328063
      http://llvm.org/bugs/show_bug.cgi?id=13963
      
      llvm-svn: 176413
    • Clean up code format a bit. · c6f1914e
      Jim Grosbach authored
      llvm-svn: 176412
    • Tidy up. Trailing whitespace. · 54efea0a
      Jim Grosbach authored
      llvm-svn: 176411
    • ARM NEON: Fix v2f32 float intrinsics · 99cba969
      Arnold Schwaighofer authored
      Mark them as Expand; they are not legal, as our backend does not match them.
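
      A minimal sketch (hypothetical; floor stands in for whichever v2f32
      intrinsics the commit covers, assuming GCC/Clang vector extensions):
      element-wise math over a float pair is how such an intrinsic can arise,
      and marking the operation Expand makes the backend scalarize it instead
      of failing to select it.

      #include <math.h>

      typedef float v2f32 __attribute__((vector_size(8)));

      v2f32 floor2(v2f32 v) {
        /* Element-wise floor on a v2f32; with the operation marked Expand,
           this is lowered to two scalar floorf calls. */
        v2f32 r = { floorf(v[0]), floorf(v[1]) };
        return r;
      }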
      
      llvm-svn: 176410
    • X86 cost model: Adjust cost for custom lowered vector multiplies · 20ef54f4
      Arnold Schwaighofer authored
      This matters, for example, in the following matrix multiply:
      
      int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
        int i, j, k, val;
        for (i=0; i<rows; i++) {
          for (j=0; j<cols; j++) {
            val = 0;
            for (k=0; k<cols; k++) {
              val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
          }
        }
        return(m3);
      }
      
      Taken from the test-suite benchmark Shootout.
      
      We estimated the cost of the multiply to be 2, while we actually generate
      9 instructions for it and end up quite a bit slower than the scalar
      version (48% slower on my machine).
      
      Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split
      the vector into two 128-bit halves and handle the subvector multiplies as
      above, with 9 instructions each.
      Only on AVX2 do we have a cost of 9 for v4i64.
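
      A minimal sketch of the v4i64 multiply being costed (assuming GCC/Clang
      vector extensions):

      typedef long long v4i64 __attribute__((vector_size(32)));

      /* On AVX1 this multiply is split into two 128-bit halves, each lowered
         with the 9-instruction sequence; on AVX2 it stays in one 256-bit
         register but still expands to a multi-instruction pmuludq/shift/add
         sequence. */
      v4i64 mul_v4i64(v4i64 a, v4i64 b) {
        return a * b;
      }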
      
      I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to
      use an add instead of a mul, because with a mul we no longer vectorize. I
      did verify that the mul would indeed be more expensive when vectorized,
      using 3 kernels:
      
      for (i ...)
        r += a[i] * 3;
      for (i ...)
        m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
      and a matrix multiply.
      
      In each case the vectorized version was considerably slower.
      
      radar://13304919
      
      llvm-svn: 176403