  1. Mar 05, 2013
  2. Mar 04, 2013
  3. Mar 02, 2013
    • ARM: Creating a vector from a lane of another. · a3c5c769
      Jim Grosbach authored
      The VDUP instruction source register doesn't allow a non-constant lane
      index, so make sure we don't construct an ARM::VDUPLANE node asking it to
      do so.
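      A portable sketch of the pattern behind the fix (a hypothetical reduction, not the PR's actual test case): splatting a lane whose index is only known at runtime, which VDUPLANE's immediate lane operand cannot encode.

      ```c
      #include <assert.h>

      /* Hypothetical reduction of the bug pattern (not the PR's test case):
       * splat one lane, chosen at runtime, of a 4-wide vector. Because `lane`
       * is not a compile-time constant, the splat must not be lowered to
       * ARM::VDUPLANE, whose lane index is an immediate operand. */
      static void splat_lane(const float src[4], float dst[4], int lane) {
          float v = src[lane];        /* extract with a non-constant index */
          for (int i = 0; i < 4; ++i) /* broadcast that lane to every element */
              dst[i] = v;
      }
      ```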
      
      rdar://13328063
      http://llvm.org/bugs/show_bug.cgi?id=13963
      
      llvm-svn: 176413
    • Clean up code format a bit. · c6f1914e
      Jim Grosbach authored
      llvm-svn: 176412
    • Tidy up. Trailing whitespace. · 54efea0a
      Jim Grosbach authored
      llvm-svn: 176411
    • ARM NEON: Fix v2f32 float intrinsics · 99cba969
      Arnold Schwaighofer authored
      Mark them as Expand; they are not legal, as our backend does not match them.
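      A sketch of what marking an operation as Expand looks like in the target's lowering setup; the specific opcodes listed here are an assumption for illustration, not necessarily the exact set this patch touched.

      ```cpp
      // In ARMISelLowering.cpp (sketch): tell the legalizer that these float
      // intrinsics have no NEON v2f32 pattern, so they are expanded (e.g. to
      // scalar libcalls) instead of being selected. The opcode list here is
      // an assumption for this sketch.
      setOperationAction(ISD::FSIN, MVT::v2f32, Expand);
      setOperationAction(ISD::FCOS, MVT::v2f32, Expand);
      setOperationAction(ISD::FPOW, MVT::v2f32, Expand);
      ```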
      
      llvm-svn: 176410
    • X86 cost model: Adjust cost for custom lowered vector multiplies · 20ef54f4
      Arnold Schwaighofer authored
      This matters, for example, in the following matrix multiply:
      
      int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
        int i, j, k, val;
        for (i=0; i<rows; i++) {
          for (j=0; j<cols; j++) {
            val = 0;
            for (k=0; k<cols; k++) {
              val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
          }
        }
        return m3;
      }
      
      Taken from the test-suite benchmark Shootout.
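      A self-contained, runnable reduction of the kernel above; fixing N = 2 and using 2-D arrays instead of `int **` are my simplifications for the sketch, not the benchmark's.

      ```c
      #include <assert.h>

      /* Fixed-size, runnable reduction of the Shootout mmult kernel above.
       * N = 2 and plain 2-D arrays are assumptions for this sketch. */
      #define N 2
      static void mmult(int m1[N][N], int m2[N][N], int m3[N][N]) {
          for (int i = 0; i < N; i++) {
              for (int j = 0; j < N; j++) {
                  int val = 0;                 /* dot product of row i, column j */
                  for (int k = 0; k < N; k++)
                      val += m1[i][k] * m2[k][j];
                  m3[i][j] = val;
              }
          }
      }
      ```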
      
      We estimated the cost of the multiply to be 2, while we actually generate 9
      instructions for it, and we end up quite a bit slower than the scalar
      version (48% slower on my machine).
      
      Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split
      the vector into two 128-bit halves and handle the subvector multiplies as
      above, with 9 instructions each. Only on AVX2 do we have a cost of 9 for
      v4i64.
      
      I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use
      an add instead of a mul, because with a mul we now no longer vectorize. I
      did verify that the mul would indeed be more expensive when vectorized,
      using three kernels:
      
      for (i ...)
        r += a[i] * 3;
      for (i ...)
        m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
      and a matrix multiply.
      
      In each case the vectorized version was considerably slower.
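      The first two kernels above can be made runnable as follows; SIZE and the test data are assumptions for this sketch, not from the original measurements.

      ```c
      #include <assert.h>

      /* Runnable forms of the first two verification kernels above; SIZE
       * and the data are assumptions, not the original benchmark setup. */
      #define SIZE 8
      static int mul_reduce(const int *a) {   /* r += a[i] * 3 */
          int r = 0;
          for (int i = 0; i < SIZE; i++)
              r += a[i] * 3;
          return r;
      }
      static void mul_inplace(int *m1) {      /* m1[i] = m1[i] * 3, as in avx1.ll */
          for (int i = 0; i < SIZE; i++)
              m1[i] = m1[i] * 3;
      }
      ```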
      
      radar://13304919
      
      llvm-svn: 176403
    • Added FIXME for future Hexagon cleanup. · 63474629
      Andrew Trick authored
      llvm-svn: 176400
  4. Mar 01, 2013