Commits · a69d0aaa71dd74e28a2e5990166819e15fb8f31f · Roger Ferrer / llvm-epi-0.8

Mar 05, 2013

Remove unused #includes. · a69d0aaa
Bill Wendling authored Mar 05, 2013
```
llvm-svn: 176467
```
a69d0aaa

The current X86 NOP padding uses one long NOP followed by the remainder in · 4c8979cd

David Sehr authored Mar 05, 2013

one-byte NOPs.  If the processor actually executes those NOPs, as it sometimes
does with aligned bundling, this can have a performance impact.  From my
micro-benchmarks run on my one machine, a 15-byte NOP followed by twelve
one-byte NOPs is about 20% worse than a 15 followed by a 12.  This patch
changes NOP emission to emit as many 15-byte (the maximum) as possible followed
by at most one shorter NOP.
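
The emission strategy described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: `nopLengths` and its `maxNopLen` parameter are hypothetical names, and the sketch only computes the NOP lengths rather than emitting any encodings.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not LLVM's actual API): splits a padding request of
// `count` bytes into NOP lengths, using as many maximum-length NOPs as
// possible followed by at most one shorter NOP.
// Assumes maxNopLen > 0.
std::vector<unsigned> nopLengths(uint64_t count, unsigned maxNopLen = 15) {
  std::vector<unsigned> lengths;
  while (count >= maxNopLen) {
    lengths.push_back(maxNopLen);  // full-length NOP
    count -= maxNopLen;
  }
  if (count != 0)
    lengths.push_back(static_cast<unsigned>(count));  // single shorter NOP
  return lengths;
}
```

For the 27-byte case mentioned in the message, this yields one 15-byte NOP and one 12-byte NOP instead of a 15-byte NOP and twelve one-byte NOPs.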

llvm-svn: 176464

4c8979cd

Mar 04, 2013

Check isDiscardableIfUnused, rather than hasLocalLinkage, when bumping · 30be8a30

Lang Hames authored Mar 04, 2013

GlobalValue linkage up to ExternalLinkage in the ExtractGV pass. This
prevents linkonce and linkonce_odr symbols from being DCE'd.

llvm-svn: 176459

30be8a30

[mips] Print move instructions. · c7828356
Akira Hatanaka authored Mar 04, 2013
```
"move $4, $5" is printed instead of "or $4, $5, $zero".

llvm-svn: 176455
```
c7828356

Mips specific inline assembler constraint 'R' · 0e149b04

Jack Carter authored Mar 04, 2013

'R' An address that can be used in a non-macro load or store.
This patch includes a positive test case.

llvm-svn: 176452

0e149b04

Reapply r176381, writing the CHECKs in a more forgiving manner to account for · 4e1db8d7
Eli Bendersky authored Mar 04, 2013
```
running llvm-objdump on Darwin.

llvm-svn: 176443
```
4e1db8d7

Bypass Slow Divides · 485296d1

Preston Gurd authored Mar 04, 2013

* Only apply divide bypass optimization when not optimizing for size.
* Fixed a bug caused by generating the constant for the 0 value with type
  Int32; the dividend type is now used to generate the constant instead.
* For Atom x86-64, apply the divide bypass to use 16-bit divides instead of
  64-bit divides when operand values are small enough.
* Added lit tests for 64-bit divide bypass.
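
The idea behind the bypass can be sketched in source form as a runtime check on the operand values. This is a minimal illustration under stated assumptions, not the transformation from the patch (which operates on IR): `bypassUDiv` is a hypothetical name, only the unsigned case is shown, and a C++ compiler is free to choose the actual divide instruction for the narrow path.

```cpp
#include <cstdint>

// Sketch only: when both operands fit in 16 bits, perform the division at
// 16-bit width; otherwise fall back to the full 64-bit divide.
// Precondition: divisor != 0.
uint64_t bypassUDiv(uint64_t dividend, uint64_t divisor) {
  if (((dividend | divisor) >> 16) == 0) {
    // Narrow path: both values are small enough for a 16-bit divide.
    uint16_t n = static_cast<uint16_t>(dividend);
    uint16_t d = static_cast<uint16_t>(divisor);
    return static_cast<uint16_t>(n / d);
  }
  // Wide path: at least one operand needs more than 16 bits.
  return dividend / divisor;
}
```

OR-ing the operands together lets a single shift-and-compare test both of them at once.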

Patch by Tyler Nowicki!

llvm-svn: 176442

485296d1

R600: Clean up datalayout strings so they better match hardware capabilities · b2f2f960
Tom Stellard authored Mar 04, 2013
```
llvm-svn: 176439
```
b2f2f960
Mips ISD typo · 434874db
Jia Liu authored Mar 04, 2013
```
llvm-svn: 176426
```
434874db

Mar 02, 2013

ARM: Creating a vector from a lane of another. · a3c5c769

Jim Grosbach authored Mar 02, 2013

The VDUP instruction source register doesn't allow a non-constant lane
index, so make sure we don't construct an ARM::VDUPLANE node asking it to
do so.

rdar://13328063
http://llvm.org/bugs/show_bug.cgi?id=13963

llvm-svn: 176413

a3c5c769

Clean up code format a bit. · c6f1914e
Jim Grosbach authored Mar 02, 2013
```
llvm-svn: 176412
```
c6f1914e
Tidy up. Trailing whitespace. · 54efea0a
Jim Grosbach authored Mar 02, 2013
```
llvm-svn: 176411
```
54efea0a
ARM NEON: Fix v2f32 float intrinsics · 99cba969
Arnold Schwaighofer authored Mar 02, 2013
```
Mark them as expand; they are not legal because our backend does not match them.

llvm-svn: 176410
```
99cba969

recommit r172363 & r171325 (reverted in r172756) · 589443bd

Nuno Lopes authored Mar 02, 2013

This adds minimalistic support for PHI nodes to llvm.objectsize() evaluation.

Fingers crossed that it doesn't break the clang bootstrap again.

llvm-svn: 176408

589443bd

add getUnderlyingObjectSize() · 6e3d4601

Nuno Lopes authored Mar 02, 2013

This is similar to getObjectSize(), but doesn't subtract the offset.
Tweak the BasicAA code accordingly (per PR14988).

llvm-svn: 176407

6e3d4601

X86 cost model: Adjust cost for custom lowered vector multiplies · 20ef54f4

Arnold Schwaighofer authored Mar 02, 2013

This matters, for example, in the following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into two 128-bit halves and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

llvm-svn: 176403

20ef54f4

Added FIXME for future Hexagon cleanup. · 63474629
Andrew Trick authored Mar 02, 2013
```
llvm-svn: 176400
```
63474629

PR14448 - prevent the loop vectorizer from vectorizing the same loop twice. · 739e37a0

Nadav Rotem authored Mar 02, 2013

The LoopVectorizer often runs multiple times on the same function due to inlining.
When this happens, it often vectorizes the same loops multiple times, increasing code size and adding unneeded branches.
With this patch, the vectorizer puts metadata on scalar loops during vectorization, marking them as 'already vectorized' so that it ignores them when it sees them a second time.
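
The marking scheme can be sketched as below. This is a simplified model under stated assumptions: the `Loop` struct, its `alreadyVectorized` flag, and `runVectorizer` are hypothetical stand-ins, whereas the actual pass records the mark as metadata in the IR.

```cpp
#include <vector>

// Hypothetical stand-in for a loop; the flag models the
// 'already vectorized' mark described in the commit message.
struct Loop {
  bool alreadyVectorized = false;
};

// Runs one (simulated) vectorizer pass over the loops and returns how many
// loops were vectorized in this run. Marked loops are skipped.
unsigned runVectorizer(std::vector<Loop> &loops) {
  unsigned vectorized = 0;
  for (Loop &loop : loops) {
    if (loop.alreadyVectorized)
      continue;  // seen in an earlier run; do not vectorize again
    // ... the actual vectorization would happen here ...
    loop.alreadyVectorized = true;
    ++vectorized;
  }
  return vectorized;
}
```

Running the pass a second time over the same loops then does no work, which is the behaviour the patch aims for.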

PR14448.

llvm-svn: 176399

739e37a0

Modify {Call,Invoke}Inst::addAttribute to take an AttrKind. · 1b97a9c8
Peter Collingbourne authored Mar 02, 2013
```
llvm-svn: 176397
```
1b97a9c8
CMake: Always include the CheckCXXCompilerFlag in HandleLLVMOptions.cmake. · 643aa0e0
Jordan Rose authored Mar 02, 2013
```
Previously we relied on it being included by config-ix.cmake.

llvm-svn: 176396
```
643aa0e0

Revert "Rewrite a test to count emitted instructions without using -stats" · ee45c03f

Michael Gottesman authored Mar 02, 2013

This reverts commit aac7922b8fe7ae733d3fe6697e6789fd730315dc. I am reverting the
commit since it broke the phase 1 public buildbot for a few hours.

http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-nobootstrap-RA/builds/2137

llvm-svn: 176394

ee45c03f

Remove duplicate line and move another closer to its actual use · b1caf3c3
Eli Bendersky authored Mar 01, 2013
```
llvm-svn: 176391
```
b1caf3c3

MIsched machine model: tablegen subtarget emitter improvement. · 3821d9d0

Andrew Trick authored Mar 01, 2013

Fix the way resources are counted. I'm taking some time to cleanup the
way MachineScheduler handles in-order machine resources. Eventually
we'll need more PPC/Atom test cases in tree.

llvm-svn: 176390

3821d9d0

Mar 01, 2013

In llvm::MemoryBuffer::getFile() remove an unnecessary stat call check. · db4443f7

Argyrios Kyrtzidis authored Mar 01, 2013

The sys::fs::is_directory() check is unnecessary because, if the filename is
a directory, the function will fail anyway with the same error code returned.
Remove the check to avoid an unnecessary stat call.

Someone needs to review this on Windows and see whether the check is necessary there.

llvm-svn: 176386

db4443f7

Fix my email address in CREDITS.TXT. · 1ed16946
Stefanus Du Toit authored Mar 01, 2013
```
Checking to see if svn notifications also use correct address now.

llvm-svn: 176385
```
1ed16946

[mips] Fix inefficient code generation. · ece459bb

Akira Hatanaka authored Mar 01, 2013

This patch eliminates the need to emit a constant move instruction when this
pattern is matched:

(select (setgt a, Constant), T, F)

The pattern above effectively turns into this:

(conditional-move (setlt a, Constant + 1), F, T)
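
The equivalence of the two patterns follows from `a > C` being the negation of `a < C + 1`, so swapping the two result operands compensates for the inverted condition. A small sketch that checks this on plain integers (the function names are hypothetical, and `C + 1` is computed in 64 bits here so the illustration stays well defined even for the largest 32-bit constant):

```cpp
#include <cstdint>

// (select (setgt a, Constant), T, F)
int32_t selectGT(int32_t a, int32_t c, int32_t t, int32_t f) {
  return a > c ? t : f;
}

// (conditional-move (setlt a, Constant + 1), F, T)
// The condition is inverted, so the result operands are swapped.
int32_t cmovLT(int32_t a, int32_t c, int32_t t, int32_t f) {
  int64_t cPlusOne = static_cast<int64_t>(c) + 1;  // widened to avoid overflow
  return a < cPlusOne ? f : t;
}
```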

llvm-svn: 176384

ece459bb

Removed extraneous #include "LLVMContextImpl.h" from lib/IR/Module.cpp · 3cec0108
Jean-Luc Duprat authored Mar 01, 2013
```
llvm-svn: 176382
```
3cec0108

Rewrite a test to count emitted instructions without using -stats · 0091e2ff

Eli Bendersky authored Mar 01, 2013

Also removed the "should produce..." comments because they do not match
the output actually produced.

llvm-svn: 176381

0091e2ff

Fix indentation. · a4c03415
Akira Hatanaka authored Mar 01, 2013
```
llvm-svn: 176380
```
a4c03415
Set properties for f128 type. · 3d055580
Akira Hatanaka authored Mar 01, 2013
```
llvm-svn: 176378
```
3d055580

Rewrite a test to check actual output rather than intermediate implementation · 10ab5e72

Eli Bendersky authored Mar 01, 2013

detail.

The way this test was written, it was relying on an implementation detail
(fixups) and hence was very brittle (relying, among other things, on the
exact ordering of statistics printed by MC).

The test was rewritten to check a more observable output difference. While it
doesn't cover 100% of the things the original test covered, it's a good
practice to write regression tests this way. If we want to check that
internal details and invariants hold, such tests should be expressed as unit
tests.

llvm-svn: 176377

10ab5e72

No need to force-create clang-tools-extra lit.site.cfg · 510c3415

Edwin Vane authored Mar 01, 2013

The make (all) target takes care of creating lit configs and auto-generating
tests. The problem with the original 'lit.site.cfg' target is that it's not
recursive and doesn't fully create everything necessary for testing
clang-tools-extra.

llvm-svn: 176374

510c3415

Add regression tests (WORKSFORME) · d10584e3

Michael Liao authored Mar 01, 2013

- These tests won't crash on trunk, but it is better to add them so that
  they don't break again in the future.

llvm-svn: 176369

d10584e3

Generate an error message instead of asserting or segfaulting when we can't · b3864609
Chad Rosier authored Mar 01, 2013
```
handle indirect register inputs.
rdar://13322011

llvm-svn: 176367
```
b3864609
LoopVectorize: Don't hang forever if a PHI only has skipped PHI uses. · 12f98fae
Benjamin Kramer authored Mar 01, 2013
```
Fixes PR15384.

llvm-svn: 176366
```
12f98fae

Cache the result of Function::getIntrinsicID() in a DenseMap attached to the LLVMContext. · 516d7039

Michael Ilseman authored Mar 01, 2013

This reduces the time actually spent doing string to ID conversion and shows a 10% improvement in compile time for a particularly bad case that involves ARM Neon intrinsics (these have many overloads).
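
The caching idea can be sketched as a simple memoized lookup. This is an illustration under stated assumptions, not the patch itself: `IntrinsicIDCache` is a hypothetical class, `std::unordered_map` stands in for the DenseMap, the cache is keyed by name for simplicity, and `slowLookup` is a placeholder for the real string-to-ID conversion.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical memoizing wrapper around an expensive name-to-ID conversion.
class IntrinsicIDCache {
public:
  // Returns the ID for `name`, doing the slow conversion only on a miss.
  unsigned get(const std::string &name) {
    auto it = cache_.find(name);
    if (it != cache_.end())
      return it->second;  // cache hit: no string processing needed
    unsigned id = slowLookup(name);
    cache_.emplace(name, id);
    return id;
  }

  // Number of times the slow path ran (useful for checking the cache works).
  std::size_t slowLookups() const { return slowLookups_; }

private:
  // Placeholder for the real conversion; the returned IDs are arbitrary.
  unsigned slowLookup(const std::string &name) {
    ++slowLookups_;
    return static_cast<unsigned>(std::hash<std::string>{}(name) % 1000) + 1;
  }

  std::unordered_map<std::string, unsigned> cache_;
  std::size_t slowLookups_ = 0;
};
```

Repeated queries for the same name pay the conversion cost once, which is where the compile-time saving would come from when the same intrinsics are looked up many times.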

Patch by Jean-Luc Duprat!

llvm-svn: 176365

516d7039

Fix PR10475 · 6af16fc3

Michael Liao authored Mar 01, 2013

- ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
  but TLI.getShiftAmountTy() so far only returns a scalar type. As a
  result, backend logic assuming that breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return the target-specified scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic assuming TLI.getShiftAmountTy() is a simple scalar
  type.

llvm-svn: 176364

6af16fc3

Add support for using non-pic code for arm and thumb1 when emitting the sjlj · 9660343b

Chad Rosier authored Mar 01, 2013

dispatch code.  As far as I can tell the thumb2 code is behaving as expected.
I was able to compile and run the associated test case for both arm and thumb1.
rdar://13066352

llvm-svn: 176363

9660343b

R600/SI: fix sampler tests after fixing wait insertions · 3c547703

Christian Konig authored Mar 01, 2013



Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176359

3c547703

Hexagon: Add constant extender support framework. · 84256437
Jyotsna Verma authored Mar 01, 2013
```
llvm-svn: 176358
```
84256437