- Mar 06, 2013
-
-
Jim Grosbach authored
When considering folding a bitcast of an alloca into the alloca itself, make sure we don't shrink the amount of memory being allocated, or things rapidly go sideways. rdar://13324424 llvm-svn: 176547
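A rough illustration only of that guard (assumed names; not the actual InstCombine code, and header paths approximate the tree of that era): refuse the fold when the rewritten type would cover fewer bytes than the original allocation.

    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/Type.h"

    // Rewriting "bitcast (alloca OldTy) to NewTy*" into "alloca NewTy" is only
    // safe if NewTy covers at least as many bytes as OldTy did.
    static bool foldWouldShrinkAllocation(const llvm::DataLayout &DL,
                                          llvm::Type *OldTy, llvm::Type *NewTy) {
      return DL.getTypeAllocSize(NewTy) < DL.getTypeAllocSize(OldTy);
    }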
-
Michael Liao authored
- Clear the 'mayStore' flag when loading from the atomic variable before the spin loop
- Clear the kill flag on registers forming the address of that atomic variable, since they go from one use to multiple uses
- Don't use a physical register as a live-in register in a BB (neither entry nor landing pad); copy it into a virtual register instead (patch by Cameron Zwarich)
llvm-svn: 176538
-
Jakub Staszak authored
llvm-svn: 176537
-
Akira Hatanaka authored
This calling convention was added just to handle functions which return a vector of floats. The fix committed in r165585 solves the problem. llvm-svn: 176530
-
- Mar 05, 2013
-
-
Akira Hatanaka authored
returned in registers $2 and $4. llvm-svn: 176527
-
Akira Hatanaka authored
handle fp128 returns. llvm-svn: 176523
-
Akira Hatanaka authored
point registers. llvm-svn: 176521
-
Akira Hatanaka authored
parameters from floating point registers if target is mips64 hard float. llvm-svn: 176520
-
Meador Inge authored
This patch adds many more functions to the target library information. All of the functions being added were discovered while doing the migration of the simplify-libcalls attribute annotation functionality to the functionattrs pass. As a part of that work the attribute annotation logic will query TLI to determine if a function should be annotated or not. Signed-off-by:
Meador Inge <meadori@codesourcery.com> llvm-svn: 176514
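As a hedged illustration (not code from this commit; the wrapper function is hypothetical, while getLibFunc() and has() follow the TargetLibraryInfo API and header layout of that era), a pass such as functionattrs might consult TLI along these lines before annotating a call:

    #include "llvm/IR/Function.h"
    #include "llvm/Target/TargetLibraryInfo.h"

    // Check whether a function is a recognized library call that the current
    // target actually provides, and is therefore a candidate for annotation.
    static bool isAnnotatableLibCall(const llvm::Function &F,
                                     const llvm::TargetLibraryInfo &TLI) {
      llvm::LibFunc::Func LF;
      return TLI.getLibFunc(F.getName(), LF) && TLI.has(LF);
    }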
-
Jyotsna Verma authored
llvm-svn: 176513
-
Jyotsna Verma authored
llvm-svn: 176508
-
Vincent Lejeune authored
llvm-svn: 176507
-
Jyotsna Verma authored
llvm-svn: 176505
-
Benjamin Kramer authored
llvm-svn: 176501
-
Jyotsna Verma authored
llvm-svn: 176500
-
Jyotsna Verma authored
Set the isMoveImm and isAsCheapAsAMove flags for TFRI instructions. llvm-svn: 176499
-
Vincent Lejeune authored
This is a skeleton for a pre-RA MachineInstr scheduler strategy. Currently it only tries to expose more parallelism for ALU instructions (this also makes the distribution of GPR channels more uniform and increases the chances of ALU instructions being packed together in a single VLIW group). It also tries to reduce clause switching by grouping instructions of the same kind (ALU/FETCH/CF) together.
Vincent Lejeune:
- Support for VLIW4 slot assignment
- Recomputation of the ScheduleDAG to get more parallelism opportunities
Tom Stellard:
- Fix assertion failure when trying to determine an instruction's slot based on its destination register's class
- Fix some compiler warnings
Vincent Lejeune: [v2]
- Remove recomputation of the ScheduleDAG (will be provided in a later patch)
- Improve estimation of an ALU clause size so that the heuristic does not emit CF instructions at the wrong position
- Make the scheduling heuristic smarter using SUnit depth
- Take constant read limitations into account
Vincent Lejeune: [v3]
- Fix some uninitialized values in ConstPair
- Add asserts to ensure an ALU slot is always populated
llvm-svn: 176498
-
Vincent Lejeune authored
Maintaining CONST_COPY instructions until pre-emit may prevent some if-conversion cases, and taking them into account for scheduling is difficult for no real benefit. llvm-svn: 176488
-
Vincent Lejeune authored
Reviewed-by: Tom Stellard <thomas.stellard at amd.com> llvm-svn: 176487
-
Vincent Lejeune authored
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
mayLoad complicates scheduling and does not bring any useful info, as the location is not writable at all. llvm-svn: 176486
-
Vincent Lejeune authored
Reviewed-by: Tom Stellard <thomas.stellard at amd.com> llvm-svn: 176485
-
Vincent Lejeune authored
NOTE: This is a candidate for the Mesa stable branch. llvm-svn: 176484
-
Bill Wendling authored
llvm-svn: 176467
-
David Sehr authored
one-byte NOPs. If the processor actually executes those NOPs, as it sometimes does with aligned bundling, this can have a performance impact. From my micro-benchmarks run on one machine, a 15-byte NOP followed by twelve one-byte NOPs is about 20% worse than a 15-byte NOP followed by a single 12-byte NOP. This patch changes NOP emission to emit as many 15-byte NOPs (the maximum) as possible, followed by at most one shorter NOP. llvm-svn: 176464
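A minimal sketch of that emission strategy (emitNopOfLength() is a hypothetical stand-in for the real MC-layer emitter, not the actual X86 backend code):

    #include <cstdint>

    // Assumed helper: emits a single NOP of the requested length.
    void emitNopOfLength(uint64_t Len);

    // Greedy split of a padding region: emit as many 15-byte NOPs (the longest
    // single x86 NOP encoding) as fit, then at most one shorter NOP.
    void emitNopPadding(uint64_t Count) {
      const uint64_t MaxNopLen = 15;
      while (Count >= MaxNopLen) {
        emitNopOfLength(MaxNopLen);
        Count -= MaxNopLen;
      }
      if (Count > 0)
        emitNopOfLength(Count); // one trailing NOP covers the remainder
    }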
-
- Mar 04, 2013
-
-
Lang Hames authored
GlobalValue linkage up to ExternalLinkage in the ExtractGV pass. This prevents linkonce and linkonce_odr symbols from being DCE'd. llvm-svn: 176459
-
Akira Hatanaka authored
"move $4, $5" is printed instead of "or $4, $5, $zero". llvm-svn: 176455
-
Jack Carter authored
'R': an address that can be used in a non-macro load or store. This patch includes a positive test case. llvm-svn: 176452
-
Preston Gurd authored
* Only apply the divide bypass optimization when not optimizing for size.
* Fixed a bug caused by generating the constant for the value 0 with type Int32; use the dividend's type to generate the constant instead.
* For Atom x86-64, apply the divide bypass to use 16-bit divides instead of 64-bit divides when the operand values are small enough.
* Added lit tests for the 64-bit divide bypass.
Patch by Tyler Nowicki! llvm-svn: 176442
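Illustratively (only a sketch of the idea, not the transform itself), the bypassed 64-bit divide behaves roughly like this at runtime:

    #include <cstdint>

    // When both operands fit in 16 bits, use a cheap narrow divide instead of
    // the slow full-width one; otherwise fall back to the ordinary divide.
    uint64_t div64WithBypass(uint64_t A, uint64_t B) {
      if (((A | B) >> 16) == 0)
        return uint64_t(uint16_t(A) / uint16_t(B)); // fast 16-bit path
      return A / B;                                  // slow 64-bit path
    }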
-
Tom Stellard authored
llvm-svn: 176439
-
Jia Liu authored
llvm-svn: 176426
-
- Mar 02, 2013
-
-
Jim Grosbach authored
The VDUP instruction source register doesn't allow a non-constant lane index, so make sure we don't construct an ARM::VDUPLANE node asking it to do so. rdar://13328063 http://llvm.org/bugs/show_bug.cgi?id=13963 llvm-svn: 176413
-
Jim Grosbach authored
llvm-svn: 176412
-
Jim Grosbach authored
llvm-svn: 176411
-
Arnold Schwaighofer authored
Mark them as expand, they are not legal as our backend does not match them. llvm-svn: 176410
-
Nuno Lopes authored
This adds minimalistic support for PHI nodes to llvm.objectsize() evaluation. Fingers crossed that it does not break the clang bootstrap again. llvm-svn: 176408
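A small assumed example (not from the commit) of the kind of code this covers: the pointer reaching __builtin_object_size, which Clang lowers to llvm.objectsize, arrives through a PHI node over two allocas.

    #include <cstddef>

    // With optimization, P becomes a PHI over the two fixed-size allocas, and
    // the llvm.objectsize evaluation can now look through it; mode 2 asks for
    // the minimum remaining object size.
    size_t minObjectSize(bool UseBig) {
      char Small[16];
      char Big[64];
      char *P;
      if (UseBig)
        P = Big;
      else
        P = Small;
      return __builtin_object_size(P, 2);
    }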
-
Nuno Lopes authored
This is similar to getObjectSize(), but doesn't subtract the offset. Tweak the BasicAA code accordingly (per PR14988). llvm-svn: 176407
-
Arnold Schwaighofer authored
This matters, for example, in the following matrix multiply:

    int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
      int i, j, k, val;
      for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
          val = 0;
          for (k = 0; k < cols; k++) {
            val += m1[i][k] * m2[k][j];
          }
          m3[i][j] = val;
        }
      }
      return m3;
    }

Taken from the test-suite benchmark Shootout. We estimate the cost of the multiply to be 2, while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine). Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split the vector into two 128-bit halves and handle the subvector muls like above with 9 instructions. Only on AVX2 will we have a cost of 9 for v4i64. I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul, because with a mul we now no longer vectorize. I did verify that the mul would indeed be more expensive when vectorized, with 3 kernels:

    for (i ...) r += a[i] * 3;
    for (i ...) m1[i] = m1[i] * 3;  // This matches the test case in avx1.ll

and a matrix multiply. In each case the vectorized version was considerably slower. radar://13304919 llvm-svn: 176403
-
Andrew Trick authored
llvm-svn: 176400
-
Nadav Rotem authored
The LoopVectorizer often runs multiple times on the same function due to inlining. When this happens, it often vectorizes the same loops multiple times, increasing code size and adding unneeded branches. With this patch, the vectorizer puts metadata on scalar loops during vectorization, marking them as 'already vectorized' so that it knows to ignore them when it sees them a second time. PR14448. llvm-svn: 176399
-
Peter Collingbourne authored
llvm-svn: 176397
-