Commits · f5cca68c2c625e545ed5e47238fc176328517a1c · Roger Ferrer / llvm-epi-0.8

Dec 31, 2012
- Fix LICM's memory promotion optimization to preserve TBAA tags when · f5cca68c
  Chris Lattner authored Dec 31, 2012
```
promoting a store in a loop.  This was noticed when working on PR14753,
but isn't directly related.

llvm-svn: 171281
```
  f5cca68c
- teach instcombine to preserve TBAA tag when merging two stores, part of · eeefe1bc
  Chris Lattner authored Dec 31, 2012
```
PR14753

llvm-svn: 171279
```
  eeefe1bc
- Transform (A == C1 || A == C2) into (A & ~(C1 ^ C2)) == C1 · ea2b9b9d
  Jakub Staszak authored Dec 31, 2012
```
if C1 and C2 differ in only one bit.
Fixes PR14708.

llvm-svn: 171270
```
  ea2b9b9d
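
The fold in the entry above can be checked by brute force. The sketch below is not LLVM code: the function names are hypothetical, and it assumes that C1 is the constant with the differing bit clear, since the masked value can never equal a C1 that has that bit set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM API): true when c1 and c2 differ in exactly one bit.
bool differ_in_one_bit(uint32_t c1, uint32_t c2) {
    uint32_t diff = c1 ^ c2;
    return diff != 0 && (diff & (diff - 1)) == 0;
}

// Original form: two comparisons joined by a logical or.
bool either_equal(uint32_t a, uint32_t c1, uint32_t c2) {
    return a == c1 || a == c2;
}

// Folded form: clear the differing bit, then compare once.
// Assumption: c1 is the constant in which the differing bit is clear.
bool masked_equal(uint32_t a, uint32_t c1, uint32_t c2) {
    return (a & ~(c1 ^ c2)) == c1;
}

// Compares both forms for every 8-bit value of a.
bool forms_agree(uint32_t c1, uint32_t c2) {
    assert(differ_in_one_bit(c1, c2));  // precondition stated in the commit message
    for (uint32_t a = 0; a < 256; ++a) {
        if (either_equal(a, c1, c2) != masked_equal(a, c1, c2)) {
            return false;
        }
    }
    return true;
}
```

Calling `forms_agree(4, 5)` exercises the case where the two constants differ in bit 0. Swapping the arguments shows why the ordering assumption matters.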
Dec 30, 2012
- LoopVectorizer: Fix a bug in the code that updates the loop exiting block. · 0b37f143
  Nadav Rotem authored Dec 30, 2012
```
LCSSA PHIs may have undef values. The vectorizer updates values that are used by outside users such as PHIs.
The bug happened because undefs are not loop values. This patch handles these PHIs.

PR14725

llvm-svn: 171251
```
  0b37f143
- Tests: rewrite 'opt ... %s' to 'opt ... < %s' so that opt does not emit a ModuleID · 56bf2e18
  Dmitri Gribenko authored Dec 30, 2012
```
This is done to avoid odd test failures, like the one fixed in r171243.

llvm-svn: 171250
```
  56bf2e18
- Tests: rewrite 'opt ... %s' to 'opt ... < %s' so that opt does not emit a ModuleID · b137c9e5
  Dmitri Gribenko authored Dec 30, 2012
```
This is done to avoid odd test failures, like the one fixed in r171243.

llvm-svn: 171246
```
  b137c9e5
- llvm/test/Transforms/GVN/null-aliases-nothing.ll: Fix a RUN line not to emit ModuleID. · 5a495a5c
  NAKAMURA Takumi authored Dec 30, 2012
```
Larry Evans reported it fails if source tree contains "load", like "download".

llvm-svn: 171243
```
  5a495a5c
Dec 28, 2012

Fix a stunning oversight in the inline cost analysis. It was never · 86ed5308

Chandler Carruth authored Dec 28, 2012

propagating one of the values it simplified to a constant across
a myriad of instructions. Notably, ptrtoint instructions when we had
a constant pointer (say, 0) didn't propagate that, blocking a massive
number of down-stream optimizations.

This was uncovered when investigating why we fail to inline and delete
the boilerplate in:

  void f() {
    std::vector<int> v;
    v.push_back(1);
  }

It turns out most of the efforts I've made thus far to improve the
analysis weren't making it far purely because of this. After this is
fixed, the store-to-load forwarding patch enables LLVM to optimize the
above to an empty function. We still can't nuke a second push_back, but
for different reasons.

There is a very real chance this will cause somewhat noticeable changes
in inlining behavior, so please let me know if you see regressions (or
improvements!) because of this patch.

llvm-svn: 171196

86ed5308

Teach the inline cost analysis about calls that can be simplified and · 753e21d0

Chandler Carruth authored Dec 28, 2012

how to propagate constants through insert and extract value
instructions.

With the recent improvements to instsimplify, this allows inline cost
analysis to constant fold through intrinsic functions, including notably
the with.overflow intrinsic math routines which often show up inside of
STL abstractions. This is yet another piece in the puzzle of breaking
down the code for:

  void f() {
    std::vector<int> v;
    v.push_back(1);
  }

But it still isn't enough. There are a pile of bugs in inline cost still
blocking this.

llvm-svn: 171195

753e21d0

Teach instsimplify to use the constant folder where appropriate for · f6182155

Chandler Carruth authored Dec 28, 2012

constant folding calls. Add the initial tests for this which show that
now instsimplify can simplify blindingly obvious code patterns expressed
with both intrinsics and library calls.

llvm-svn: 171194

f6182155

Dec 27, 2012

If all of the write objects are identified then we can vectorize the loop even... · 5350cd31

Nadav Rotem authored Dec 26, 2012

If all of the write objects are identified then we can vectorize the loop even if the read objects are unidentified.

PR14719.

llvm-svn: 171124

5350cd31

Dec 26, 2012

LoopVectorizer: Optimize the vectorization of consecutive memory access when... · 3f7c4f36
Nadav Rotem authored Dec 26, 2012
```
LoopVectorizer: Optimize the vectorization of consecutive memory access when the iteration step is -1

llvm-svn: 171114
```
3f7c4f36

BBVectorize: Use VTTI to compute costs for intrinsics vectorization · 30e95a8e

Hal Finkel authored Dec 26, 2012

For the time being this includes only some dummy test cases. Once the
generic implementation of the intrinsics cost function does something other
than assuming scalarization in all cases, or some target specializes the
interface, some real test cases can be added.

Also, for consistency, I changed the type of IID from unsigned to Intrinsic::ID
in a few other places.

llvm-svn: 171079

30e95a8e

LoopVectorize: Enable vectorization of the fmuladd intrinsic · b44f8901
Hal Finkel authored Dec 25, 2012
```
llvm-svn: 171076
```
b44f8901

Dec 25, 2012
- BBVectorize: Enable vectorization of the fmuladd intrinsic · 2a456112
  Hal Finkel authored Dec 25, 2012
```
llvm-svn: 171075
```
  2a456112
Dec 24, 2012
- Fix typo "Makre" -> "Make". · fb432580
  Nick Lewycky authored Dec 24, 2012
```
llvm-svn: 171043
```
  fb432580
- LoopVectorizer: When checking for vectorizable types, also check · 5f7c12cf
  Nadav Rotem authored Dec 24, 2012
```
the StoreInst operands.

PR14705.

llvm-svn: 171023
```
  5f7c12cf
- LoopVectorizer: Fix an endless loop in the code that looks for reductions. · bd5d1d83
  Nadav Rotem authored Dec 24, 2012
```
The bug was in the code that detects PHIs in if-then-else block sequence.

PR14701.

llvm-svn: 171008
```
  bd5d1d83
Dec 23, 2012
- CostModel: Change the default target-independent implementation for finding · cf9999d9
  Nadav Rotem authored Dec 23, 2012
```
the cost of arithmetic functions. We now assume that the cost of arithmetic
operations that are marked as Legal or Promote is low, but ops that are
marked as custom are higher.

llvm-svn: 171002
```
  cf9999d9
- Loop Vectorizer: Update the cost model of scatter/gather operations and make · 2cade680
  Nadav Rotem authored Dec 23, 2012
```
them more expensive.

llvm-svn: 170995
```
  2cade680
Dec 21, 2012

Fix a bug in the code that checks if we can vectorize loops while using dynamic · e7785686

Nadav Rotem authored Dec 21, 2012

memory bound checks.  Before the fix we were able to vectorize this loop from
the Livermore Loops benchmark:

for ( k=1 ; k<n ; k++ )
  x[k] = x[k-1] + y[k];

llvm-svn: 170811

e7785686
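
The loop quoted above carries a dependence from one iteration to the next, which is why it must not be vectorized. A minimal sketch of the problem follows. The two-wide version is a hand-written stand-in for a vectorized loop, not output of the LoopVectorizer, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: each iteration reads the value written by the previous one.
std::vector<int> scalar_recurrence(std::vector<int> x, const std::vector<int>& y) {
    assert(x.size() == y.size());
    for (std::size_t k = 1; k < x.size(); ++k) {
        x[k] = x[k - 1] + y[k];
    }
    return x;
}

// Naive two-wide version: both inputs are loaded before either store,
// as a vector load of x[k-1..k] would do. The second lane reads a stale x[k].
std::vector<int> naive_two_wide(std::vector<int> x, const std::vector<int>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    std::size_t k = 1;
    for (; k + 1 < n; k += 2) {
        int prev0 = x[k - 1];
        int prev1 = x[k];  // stale: read before x[k] is updated below
        x[k] = prev0 + y[k];
        x[k + 1] = prev1 + y[k + 1];
    }
    for (; k < n; ++k) {  // scalar remainder
        x[k] = x[k - 1] + y[k];
    }
    return x;
}
```

With `x = {1, 0, 0, 0, 0}` and `y = {0, 1, 1, 1, 1}` the two functions return different results, which is the kind of miscompile a dependence check has to rule out.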

Dec 20, 2012

LoopVectorize: Fix a bug in the scalarization of instructions. · 2ababf68

Nadav Rotem authored Dec 20, 2012

Before if-conversion we could check if a value is loop invariant
if it was declared inside the basic block. Now that loops have
multiple blocks this check is incorrect.

This fixes External/SPEC/CINT95/099_go/099_go

llvm-svn: 170756

2ababf68

Add a new attribute, 'noduplicate'. If a function contains a noduplicate call,... · 4f6fb953

James Molloy authored Dec 20, 2012

Add a new attribute, 'noduplicate'. If a function contains a noduplicate call, the call cannot be duplicated - Jump threading, loop unrolling, loop unswitching, and loop rotation are inhibited if they would duplicate the call.

Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage).

llvm-svn: 170704

4f6fb953

Dec 19, 2012

Transform (x&C)>V into (x&C)!=0 where possible · 5917f4c7

Paul Redmond authored Dec 19, 2012

When the lowest set bit of C is greater than V, (x&C) must be greater than V
if it is not zero, so the comparison can be simplified.

Although this was suggested in Target/X86/README.txt, it benefits any
architecture with a directly testable form of AND.

Patch by Kevin Schoedel

llvm-svn: 170576

5917f4c7
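
The condition in the entry above can be illustrated with a brute-force check. This is a sketch, not the InstCombine implementation: it assumes an unsigned comparison, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Isolates the lowest set bit of c (requires c != 0).
uint32_t lowest_set_bit(uint32_t c) {
    assert(c != 0);
    return c & (~c + 1);
}

// The precondition from the commit message: the lowest set bit of C exceeds V.
bool fold_applies(uint32_t c, uint32_t v) {
    return c != 0 && lowest_set_bit(c) > v;
}

// Original form, using an unsigned comparison (an assumption of this sketch).
bool compare_form(uint32_t x, uint32_t c, uint32_t v) {
    return (x & c) > v;
}

// Simplified form: a plain bit test.
bool test_form(uint32_t x, uint32_t c) {
    return (x & c) != 0;
}

// Compares both forms for every 8-bit value of x.
bool and_forms_agree(uint32_t c, uint32_t v) {
    for (uint32_t x = 0; x < 256; ++x) {
        if (compare_form(x, c, v) != test_form(x, c)) {
            return false;
        }
    }
    return true;
}
```

For `C = 0x18` the lowest set bit is 8, so the two forms agree for `V = 7` but not for `V = 8`, where `x = 8` passes the bit test yet fails the comparison.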

Make TargetLowering::getTypeConversion more resilient against odd illegal MVTs. · ae0bb610

Benjamin Kramer authored Dec 19, 2012

- An MVT can become an EVT when being split (e.g. v2i8 -> v1i8, the latter doesn't exist)
- Return the scalar value when an MVT is scalarized (v1i64 -> i64)

Fixes PR14639ff.

llvm-svn: 170546

ae0bb610

rdar://12801297 · 37a1efe1

Shuxin Yang authored Dec 18, 2012

 InstCombine for unsafe floating-point add/sub.

llvm-svn: 170471

37a1efe1

Dec 18, 2012

LoopVectorize: Emit reductions as log2(vectorsize) shuffles + vector ops... · f0e5d2f0

Benjamin Kramer authored Dec 18, 2012

LoopVectorize: Emit reductions as log2(vectorsize) shuffles + vector ops instead of scalar operations.

For example on x86 with SSE4.2 a <8 x i8> add reduction becomes
	movdqa	%xmm0, %xmm1
	movhlps	%xmm1, %xmm1            ## xmm1 = xmm1[1,1]
	paddw	%xmm0, %xmm1
	pshufd	$1, %xmm1, %xmm0        ## xmm0 = xmm1[1,0,0,0]
	paddw	%xmm1, %xmm0
	phaddw	%xmm0, %xmm0
	pextrb	$0, %xmm0, %edx

instead of
	pextrb	$2, %xmm0, %esi
	pextrb	$0, %xmm0, %edx
	addb	%sil, %dl
	pextrb	$4, %xmm0, %esi
	addb	%dl, %sil
	pextrb	$6, %xmm0, %edx
	addb	%sil, %dl
	pextrb	$8, %xmm0, %esi
	addb	%dl, %sil
	pextrb	$10, %xmm0, %edi
	pextrb	$14, %xmm0, %edx
	addb	%sil, %dil
	pextrb	$12, %xmm0, %esi
	addb	%dil, %sil
	addb	%sil, %dl

llvm-svn: 170439

f0e5d2f0
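
The log2(vectorsize) scheme described above can be sketched in scalar C++. Each pass of the outer loop stands in for one shuffle plus one vector add. The function names are hypothetical, and the code models only the arithmetic, not the instructions LLVM emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Straightforward reduction: one add per element.
uint8_t scalar_reduce(const std::vector<uint8_t>& v) {
    uint8_t sum = 0;
    for (uint8_t e : v) {
        sum = static_cast<uint8_t>(sum + e);
    }
    return sum;
}

// Tree reduction: fold the upper half onto the lower half until one lane is left.
// For 8 lanes this takes log2(8) = 3 passes. Requires a power-of-two lane count.
uint8_t tree_reduce(std::vector<uint8_t> v) {
    const std::size_t n = v.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    for (std::size_t half = n / 2; half >= 1; half /= 2) {
        for (std::size_t i = 0; i < half; ++i) {
            v[i] = static_cast<uint8_t>(v[i] + v[i + half]);
        }
    }
    return v[0];
}
```

Because 8-bit addition with wraparound is associative and commutative, both functions return the same value for any input, including inputs whose sum overflows.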

Rename the test so that we can add additional vectors-of-pointers tests · cb233428
Nadav Rotem authored Dec 18, 2012
```
into the same file in the future.

llvm-svn: 170414
```
cb233428
SROA: Replace calls to getScalarSizeInBits with DataLayout's API because · a5024fc3
Nadav Rotem authored Dec 18, 2012
```
getScalarSizeInBits could not handle vectors of pointers.

llvm-svn: 170412
```
a5024fc3

Dec 17, 2012

Fix another SROA crasher, PR14601. · e3f4119b

Chandler Carruth authored Dec 17, 2012

This was a silly oversight, we weren't pruning allocas which were used
by variable-length memory intrinsics from the set that could be widened
and promoted as integers. Fix that.

llvm-svn: 170353

e3f4119b

Teach the rewriting of memcpy calls to support subvector copies. · 21eb4e96

Chandler Carruth authored Dec 17, 2012

This also cleans up a bit of the memcpy call rewriting by sinking some
irrelevant code further down and making the call-emitting code a bit
more concrete.

Previously, memcpy of a subvector would actually miscompile (!!!) the
copy into a single vector element copy. I have no idea how this ever
worked. =/ This is the memcpy half of PR14478 which we probably weren't
noticing previously because it didn't actually assert.

The rewrite relies on the newly refactored insert- and extractVector
functions to do the heavy lifting, and those are the same as used for
loads and stores which makes the test coverage a bit more meaningful
here.

llvm-svn: 170338

21eb4e96

Fix a secondary bug I introduced while fixing the first part of PR14478. · cacda256

Chandler Carruth authored Dec 17, 2012

The first half of fixing this bug was actually in r170328, but was
entirely coincidental. It did however get me to realize the nature of
the bug, and adapt the test case to test more interesting behavior. In
turn, that uncovered the rest of the bug which I've fixed here.

This should fix two new asserts that showed up in the vectorize nightly
tester.

llvm-svn: 170333

cacda256

Fix the first part of PR14478: memset now works. · ccca504f

Chandler Carruth authored Dec 17, 2012

PR14478 highlights a serious problem in SROA that simply wasn't being
exercised due to a lack of vector input code mixed with C-library
function calls. Part of SROA was written carefully to handle subvector
accesses via memset and memcpy, but the rewriter never grew support for
this. Fixing it required refactoring the subvector access code in other
parts of SROA so it could be shared, and then fixing the splat formation
logic and using subvector insertion (this patch).

The PR isn't quite fixed yet, as memcpy is still broken in the same way.
I'm starting on that series of patches now.

Hopefully this will be enough to bring the bullet benchmark back to life
with the bb-vectorizer enabled, but that may require fixing memcpy as
well.

llvm-svn: 170301

ccca504f

Dec 15, 2012
- Add a corollary test for PR14572. We got this code path correct already. · c50394fc
  Chandler Carruth authored Dec 15, 2012
```
llvm-svn: 170271
```
  c50394fc
- Relax an overly aggressive assert to fix PR14572. · 067edd34
  Chandler Carruth authored Dec 15, 2012
```
The alloca width is based on the alloc size, not the type size.

llvm-svn: 170270
```
  067edd34
Dec 14, 2012
- Add back FoldOpIntoPhi optimizations with fix. Included test cases to help... · e2754dc8
  Michael Ilseman authored Dec 14, 2012
```
Add back FoldOpIntoPhi optimizations with fix. Included test cases to help catch these errors and to test the presence of the optimization itself

llvm-svn: 170248
```
  e2754dc8
- Fix a crash in ValueTracking on vectors of pointers. · aa3e2a90
  Nadav Rotem authored Dec 14, 2012
```
llvm-svn: 170240
```
  aa3e2a90
- rdar://12753946 · f8e9a5a0
  Shuxin Yang authored Dec 14, 2012
```
Implement rule : "x * (select cond 1.0, 0.0) -> select cond x, 0.0"

llvm-svn: 170226
```
  f8e9a5a0
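
The rule in the entry above can be written out as two scalar functions. This is a sketch with hypothetical names, not the InstCombine code. It relies only on standard IEEE 754 behavior: the two forms agree for ordinary finite inputs but differ for infinity and NaN, and they can differ in the sign of a zero result.

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>

// Original form: x * (select cond 1.0, 0.0)
double mul_form(double x, bool cond) {
    return x * (cond ? 1.0 : 0.0);
}

// Rewritten form: select cond x, 0.0
double select_form(double x, bool cond) {
    return cond ? x : 0.0;
}

// Self-check for finite inputs; call it from main() to run it.
// operator== treats -0.0 and 0.0 as equal, so sign-of-zero differences pass here.
void check_finite_inputs() {
    for (double x : {0.0, 1.5, -2.25}) {
        assert(mul_form(x, true) == select_form(x, true));
        assert(mul_form(x, false) == select_form(x, false));
    }
}
```

For `x = INFINITY` with a false condition, `mul_form` yields NaN (infinity times zero) while `select_form` yields `0.0`, so the rewrite is only value-preserving when such inputs are excluded.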
Dec 13, 2012

Revert r170020, "Simplify negated bit test", for now. · 38d2b244

NAKAMURA Takumi authored Dec 13, 2012

This assumes (1 << n) is never zero. Consider the case where n is greater than the word size.
Although I know that case is undefined, this transform hides the undefined behavior.

This led to unexpected behavior in clang, with some failures. I will investigate fixing the undefined shl in clang.

llvm-svn: 170128

38d2b244

Take into account minimize size attribute in the inliner. · c0dba203

Quentin Colombet authored Dec 13, 2012

Better control the inlining of functions when the caller function has the MinSize attribute.
Basically, when the caller function has this attribute, we do not "force" the inlining
of callee functions carrying the InlineHint attribute (i.e., functions defined with
the inline keyword).

llvm-svn: 170065

c0dba203