Commits · 9aee065e3cd1e609fc6f08f3640b35a7eb67ce6a · Roger Ferrer / llvm-epi-0.8

Dec 19, 2012
- Enable the loop vectorizer in clang and not in the pass manager, so that we... · 9aee065e
  Nadav Rotem authored Dec 18, 2012
```
Enable the loop vectorizer in clang and not in the pass manager, so that we can disable it in clang.

llvm-svn: 170470
```
  9aee065e
Dec 18, 2012

LoopVectorize: Emit reductions as log2(vectorsize) shuffles + vector ops... · f0e5d2f0

Benjamin Kramer authored Dec 18, 2012

LoopVectorize: Emit reductions as log2(vectorsize) shuffles + vector ops instead of scalar operations.

For example on x86 with SSE4.2 a <8 x i8> add reduction becomes
	movdqa	%xmm0, %xmm1
	movhlps	%xmm1, %xmm1            ## xmm1 = xmm1[1,1]
	paddw	%xmm0, %xmm1
	pshufd	$1, %xmm1, %xmm0        ## xmm0 = xmm1[1,0,0,0]
	paddw	%xmm1, %xmm0
	phaddw	%xmm0, %xmm0
	pextrb	$0, %xmm0, %edx

instead of
	pextrb	$2, %xmm0, %esi
	pextrb	$0, %xmm0, %edx
	addb	%sil, %dl
	pextrb	$4, %xmm0, %esi
	addb	%dl, %sil
	pextrb	$6, %xmm0, %edx
	addb	%sil, %dl
	pextrb	$8, %xmm0, %esi
	addb	%dl, %sil
	pextrb	$10, %xmm0, %edi
	pextrb	$14, %xmm0, %edx
	addb	%sil, %dil
	pextrb	$12, %xmm0, %esi
	addb	%dil, %sil
	addb	%sil, %dl

llvm-svn: 170439

f0e5d2f0
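
The reduction strategy described in r170439 can be sketched in plain C++. This is only an illustration of the log2(vectorsize) idea on a `std::vector`, not the vectorizer's code. The name `reduceAdd` is hypothetical, and the inner loop stands in for a single shuffle plus a single vector add.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Horizontal add over a power-of-two number of lanes.
// Each outer iteration models one "shuffle + vector add" step: the upper
// half of the live lanes is folded onto the lower half. N lanes therefore
// take log2(N) steps instead of N - 1 dependent scalar adds.
std::uint8_t reduceAdd(std::vector<std::uint8_t> lanes) {
  const std::size_t n = lanes.size();
  assert(n != 0 && (n & (n - 1)) == 0 && "lane count must be a power of two");
  for (std::size_t half = n / 2; half != 0; half /= 2) {
    for (std::size_t i = 0; i < half; ++i)
      lanes[i] = static_cast<std::uint8_t>(lanes[i] + lanes[i + half]);
  }
  return lanes[0]; // lane 0 now holds the sum, wrapped modulo 256
}
```

For eight lanes the outer loop runs three times (half = 4, 2, 1), which matches the three add steps in the shorter listing above.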

Enable the loop vectorizer. · c0699854
Nadav Rotem authored Dec 18, 2012
```
llvm-svn: 170416
```
c0699854
SROA: Replace calls to getScalarSizeInBits to DataLayout's API because · a5024fc3
Nadav Rotem authored Dec 18, 2012
```
getScalarSizeInBits could not handle vectors of pointers.

llvm-svn: 170412
```
a5024fc3
Initialize NoRedZone and remove unused default values. · 46b9c8a2
Rafael Espindola authored Dec 18, 2012
```
llvm-svn: 170404
```
46b9c8a2

Dec 17, 2012

Fix another SROA crasher, PR14601. · e3f4119b

Chandler Carruth authored Dec 17, 2012

This was a silly oversight: we weren't pruning allocas which were used
by variable-length memory intrinsics from the set that could be widened
and promoted as integers. Fix that.

llvm-svn: 170353

e3f4119b

[msan] Fix lint warning. · 88b8dced
Evgeniy Stepanov authored Dec 17, 2012
```
llvm-svn: 170347
```
88b8dced

Teach the rewriting of memcpy calls to support subvector copies. · 21eb4e96

Chandler Carruth authored Dec 17, 2012

This also cleans up a bit of the memcpy call rewriting by sinking some
irrelevant code further down and making the call-emitting code a bit
more concrete.

Previously, memcpy of a subvector would actually miscompile (!!!) the
copy into a single vector element copy. I have no idea how this ever
worked. =/ This is the memcpy half of PR14478 which we probably weren't
noticing previously because it didn't actually assert.

The rewrite relies on the newly refactored insert- and extractVector
functions to do the heavy lifting, and those are the same as used for
loads and stores which makes the test coverage a bit more meaningful
here.

llvm-svn: 170338

21eb4e96
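
As a rough picture of what a subvector copy has to do, here is a sketch on `std::vector<int>`. The helpers borrow the names `extractVector` and `insertVector` from the message above, but their signatures and bodies are assumptions made for illustration; the real helpers operate on LLVM IR values.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in: returns lanes [begin, end) of v.
std::vector<int> extractVector(const std::vector<int> &v, std::size_t begin,
                               std::size_t end) {
  assert(begin <= end && end <= v.size());
  return std::vector<int>(v.begin() + static_cast<std::ptrdiff_t>(begin),
                          v.begin() + static_cast<std::ptrdiff_t>(end));
}

// Hypothetical stand-in: overwrites lanes of dst starting at begin with sub.
std::vector<int> insertVector(std::vector<int> dst, const std::vector<int> &sub,
                              std::size_t begin) {
  assert(begin + sub.size() <= dst.size());
  for (std::size_t i = 0; i < sub.size(); ++i)
    dst[begin + i] = sub[i];
  return dst;
}

// A subvector "memcpy": lanes [begin, end) of src replace the same lanes of
// dst. Copying only one element here would be the miscompile described above.
std::vector<int> copySubvector(std::vector<int> dst, const std::vector<int> &src,
                               std::size_t begin, std::size_t end) {
  return insertVector(std::move(dst), extractVector(src, begin, end), begin);
}
```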

Optimize tree walking in markAliveBlocks. · 95a80abe

Evgeniy Stepanov authored Dec 17, 2012

Check whether a BB is known as reachable before adding it to the worklist.
This way BBs with multiple predecessors are added to the list no more than
once.

llvm-svn: 170335

95a80abe
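
The worklist change in r170335 follows a common pattern that can be sketched as below. The graph type, the function name `markAlive`, and the `pushes` counter are hypothetical and exist only to make the effect observable; this is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG: successors[b] lists the blocks that block b branches to.
using Graph = std::vector<std::vector<std::size_t>>;

// Marks every block reachable from entry. A block is flagged as reachable
// at the moment it is pushed, so a block with several predecessors is
// pushed at most once. If pushes is non-null it receives the push count.
std::vector<bool> markAlive(const Graph &successors, std::size_t entry,
                            std::size_t *pushes = nullptr) {
  assert(entry < successors.size());
  std::vector<bool> reachable(successors.size(), false);
  std::vector<std::size_t> worklist;
  reachable[entry] = true;
  worklist.push_back(entry);
  std::size_t count = 1;
  while (!worklist.empty()) {
    const std::size_t block = worklist.back();
    worklist.pop_back();
    for (std::size_t succ : successors[block]) {
      if (reachable[succ])
        continue; // already queued or already processed
      reachable[succ] = true;
      worklist.push_back(succ);
      ++count;
    }
  }
  if (pushes)
    *pushes = count;
  return reachable;
}
```

On a diamond-shaped graph the join block has two predecessors but is still pushed only once, so the push count equals the number of reachable blocks.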

Fix a secondary bug I introduced while fixing the first part of PR14478. · cacda256

Chandler Carruth authored Dec 17, 2012

The first half of fixing this bug was actually in r170328, but was
entirely coincidental. It did however get me to realize the nature of
the bug, and adapt the test case to test more interesting behavior. In
turn, that uncovered the rest of the bug which I've fixed here.

This should fix two new asserts that showed up in the vectorize nightly
tester.

llvm-svn: 170333

cacda256

Hoist a convertValue call to the two paths where it is needed. · 95e1fb8a

Chandler Carruth authored Dec 17, 2012

I noticed this while looking at r170328. We only ever do a vector
rewrite when the alloca *is* the vector type, so it's good to not paper
over bugs here by doing a convertValue that isn't needed.

llvm-svn: 170331

95e1fb8a

Hoist the insertVector helper to be a static helper. · ce4562bd

Chandler Carruth authored Dec 17, 2012

This will allow its use inside of memcpy rewriting as well. This routine
is more complex than extractVector, and some of its uses are not 100%
where I want them to be so there is still some work to do here.

While this can technically change the output in some cases, it shouldn't
be a change that matters -- i.e., it can leave some dead code lying around
that prior versions did not, etc.

Yet another step in the refactorings leading up to the solution to the
last component of PR14478.

llvm-svn: 170328

ce4562bd

Lift the extractVector helper all the way out to a static helper function. · b6bc8749

Chandler Carruth authored Dec 17, 2012

The method helpers all implicitly act upon the alloca, and what we
really want is a fully generic helper. Doing memcpy rewrites is more
special than all other rewrites because we are at times rewriting
instructions which touch pointers *other* than the alloca. As
a consequence all of the helpers needed by memcpy rewriting of
sub-vector copies will need to be generalized fully.

Note that all of these helpers ({insert,extract}{Integer,Vector}) are
woefully uncommented. I'm going to go back through and document them
once I get the factoring correct.

No functionality changed.

llvm-svn: 170325

b6bc8749

Factor the vector load rewriting into a more generic form. · 769445ef

Chandler Carruth authored Dec 17, 2012

This makes it suitable for use in rewriting memcpy in the presence of
subvector memcpy intrinsics.

No functionality changed.

llvm-svn: 170324

769445ef

Fix the first part of PR14478: memset now works. · ccca504f

Chandler Carruth authored Dec 17, 2012

PR14478 highlights a serious problem in SROA that simply wasn't being
exercised due to a lack of vector input code mixed with C-library
function calls. Part of SROA was written carefully to handle subvector
accesses via memset and memcpy, but the rewriter never grew support for
this. Fixing it required refactoring the subvector access code in other
parts of SROA so it could be shared, and then fixing the splat formation
logic and using subvector insertion (this patch).

The PR isn't quite fixed yet, as memcpy is still broken in the same way.
I'm starting on that series of patches now.

Hopefully this will be enough to bring the bullet benchmark back to life
with the bb-vectorizer enabled, but that may require fixing memcpy as
well.

llvm-svn: 170301

ccca504f

Extract the logic for inserting a subvector into a vector alloca. · eae65a56
Chandler Carruth authored Dec 17, 2012
```
No functionality changed. Another step of refactoring toward solving
PR14478.

llvm-svn: 170300
```
eae65a56

Lift the integer splat computation into a helper function. · 514f34f9

Chandler Carruth authored Dec 17, 2012

No functionality changed. Refactoring leading up to the fix for PR14478
which requires some significant changes to the memset and memcpy
rewriting.

llvm-svn: 170299

514f34f9
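
For readers unfamiliar with the term, an integer splat replicates a small value across a wider integer, which is the value a memset of that byte leaves in memory. The sketch below shows one common way to compute it for byte values. The function `splatByte` is hypothetical and is not claimed to match the helper introduced by r170299.

```cpp
#include <cassert>
#include <cstdint>

// Replicates value across an integer that is `bytes` bytes wide (1 to 8).
// Dividing the all-ones value of the wide type by 0xFF yields the pattern
// 0x0101...01, and multiplying by it copies the byte into every position.
std::uint64_t splatByte(std::uint8_t value, unsigned bytes) {
  assert(bytes >= 1 && bytes <= 8);
  const std::uint64_t allOnes =
      bytes == 8 ? ~std::uint64_t{0} : ((std::uint64_t{1} << (8 * bytes)) - 1);
  const std::uint64_t pattern = allOnes / 0xFF;
  return value * pattern;
}
```

The multiplication cannot carry between byte positions because each partial product is at most 0xFF.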

Dec 15, 2012
- Relax an overly aggressive assert to fix PR14572. · 067edd34
  Chandler Carruth authored Dec 15, 2012
```
The alloca width is based on the alloc size, not the type size.

llvm-svn: 170270
```
  067edd34
- Revert r170246, "Enable the loop vectorizer by default." · 8f45b6c7
  NAKAMURA Takumi authored Dec 15, 2012
```
llvm-svn: 170267
```
  8f45b6c7
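
The PR14572 entry above (r170270) turns on the difference between a type's size and its alloc size. A minimal sketch of that distinction follows, assuming the alloc size is the store size rounded up to the ABI alignment. All three helper names and the example sizes are illustrative and are not LLVM's DataLayout API.

```cpp
#include <cassert>
#include <cstdint>

// Rounds value up to the next multiple of align.
std::uint64_t alignTo(std::uint64_t value, std::uint64_t align) {
  assert(align != 0);
  return (value + align - 1) / align * align;
}

// Bytes needed to store a value that is sizeInBits bits wide.
std::uint64_t storeSizeInBytes(std::uint64_t sizeInBits) {
  return (sizeInBits + 7) / 8;
}

// Bytes reserved per object: the store size padded out to the ABI alignment.
// This can exceed the type size, so a check written against the type size
// may reject widths that are actually valid for the alloca.
std::uint64_t allocSizeInBytes(std::uint64_t sizeInBits,
                               std::uint64_t abiAlignBytes) {
  return alignTo(storeSizeInBytes(sizeInBits), abiAlignBytes);
}
```

For example, an 80-bit type with a hypothetical 16-byte alignment stores in 10 bytes but occupies 16 bytes when allocated.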
Dec 14, 2012
- Add back FoldOpIntoPhi optimizations with fix. Included test cases to help... · e2754dc8
  Michael Ilseman authored Dec 14, 2012
```
Add back FoldOpIntoPhi optimizations with fix. Included test cases to help catch these errors and to test the presence of the optimization itself

llvm-svn: 170248
```
  e2754dc8
- Enable the loop vectorizer by default. · acde7748
  Nadav Rotem authored Dec 14, 2012
```
llvm-svn: 170246
```
  acde7748
- rdar://12753946 · f8e9a5a0
  Shuxin Yang authored Dec 14, 2012
```
Implement rule: "x * (select cond 1.0, 0.0) -> select cond x, 0.0"

llvm-svn: 170226
```
  f8e9a5a0
- Fix lint warnings in MemorySanitizer.cpp. · 9b72e991
  Evgeniy Stepanov authored Dec 14, 2012
```
llvm-svn: 170203
```
  9b72e991
- [msan] Origin stores and loads do not need explicit alignment. · 49175b23
  Evgeniy Stepanov authored Dec 14, 2012
```
Origin address is always 4 byte aligned, and the access type is always i32.

llvm-svn: 170199
```
  49175b23
- [msan] Refactor default shadow propagation and origin tracking. · f18e3af1
  Evgeniy Stepanov authored Dec 14, 2012
```
This change moves the code for default shadow propagation (handleShadowOr)
and origin tracking (setOriginForNaryOp) into a new builder-like class. Also
gets rid of handleShadowOrBinary.

llvm-svn: 170192
```
  f18e3af1
- revert r170166 - disable the loop vectorizer. · d3a3c9fd
  Nadav Rotem authored Dec 14, 2012
```
llvm-svn: 170172
```
  d3a3c9fd
- Enable the loop vectorizer. · 3b606d6f
  Nadav Rotem authored Dec 14, 2012
```
llvm-svn: 170166
```
  3b606d6f
- Disable the loop vectorizer. · b4ea4b37
  Nadav Rotem authored Dec 14, 2012
```
llvm-svn: 170162
```
  b4ea4b37
- Enable the Loop Vectorizer by default for O2 and O3. Disable if-conversion by... · e5e28b48
  Nadav Rotem authored Dec 13, 2012
```
Enable the Loop Vectorizer by default for O2 and O3. Disable if-conversion by default. I plan to revert this patch later today.

llvm-svn: 170157
```
  e5e28b48
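
The rule from r170226 in the list above, "x * (select cond 1.0, 0.0) -> select cond x, 0.0", can be checked on ordinary doubles with the sketch below. The function names are hypothetical. The commit message does not state the rule's preconditions, so the special cases shown are plain IEEE 754 observations rather than a description of what the patch checks.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Left-hand side of the rule: multiply x by a selected 1.0 or 0.0.
double mulBySelect(bool cond, double x) { return x * (cond ? 1.0 : 0.0); }

// Right-hand side of the rule: select between x and 0.0 directly.
double selectOperand(bool cond, double x) { return cond ? x : 0.0; }

void demonstrate() {
  // For ordinary finite, non-negative inputs both forms agree.
  assert(mulBySelect(true, 3.5) == selectOperand(true, 3.5));
  assert(mulBySelect(false, 3.5) == selectOperand(false, 3.5));

  // NaN * 0.0 is NaN, while the select form yields 0.0.
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(std::isnan(mulBySelect(false, nan)));
  assert(selectOperand(false, nan) == 0.0);

  // A negative x times 0.0 is -0.0, while the select form yields +0.0.
  assert(std::signbit(mulBySelect(false, -2.0)));
  assert(!std::signbit(selectOperand(false, -2.0)));
}
```

The two forms differ only for NaN, infinite, or negative inputs when the condition is false, which suggests the rewrite is only valid when such cases can be ignored.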
Dec 13, 2012

Revert r170020, "Simplify negated bit test", for now. · 38d2b244

NAKAMURA Takumi authored Dec 13, 2012

This assumes (1 << n) is never zero. Consider the case where n is greater than the word size.
Although I know that case is undefined, this transform hides the undefined behavior.

This caused unexpected behavior in clang, with some failures. I will investigate fixing the undefined shl in clang.

llvm-svn: 170128

38d2b244
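
The concern in this revert is that a bit test built from (1 << n) is only meaningful when n names a real bit. The sketch below shows a range-checked version for a 32-bit word. The helper `tryBitIsClear` is hypothetical, and the exact pattern rewritten by r170020 is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Negated bit test on a 32-bit word. Returns false and leaves isClear
// untouched when n does not name a bit, because shifting a 32-bit value
// by 32 or more is undefined behavior in C++.
bool tryBitIsClear(std::uint32_t x, unsigned n, bool &isClear) {
  if (n >= 32)
    return false;
  const std::uint32_t mask = std::uint32_t{1} << n;
  assert(mask != 0); // holds only because n was range-checked above
  isClear = (x & mask) == 0;
  return true;
}
```

Without the range check, the assumption that the mask is nonzero has nothing to stand on.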

Revert "Restore the PHI optimization I accidentally removed" temporarily since · a1bbeeca
Eric Christopher authored Dec 13, 2012
```
it seems to be breaking self-host for a few people and is PR14592.

This reverts commit r170024.

llvm-svn: 170106
```
a1bbeeca
Missed these calls from the previous rename somehow. · a2c107e6
Rafael Espindola authored Dec 13, 2012
```
llvm-svn: 170094
```
a2c107e6

Rename isPowerOfTwo to isKnownToBeAPowerOfTwo. · 319f74cd

Rafael Espindola authored Dec 13, 2012

In a previous thread it was pointed out that isPowerOfTwo is not a very precise
name since it can return false for powers of two if it is unable to show that
they are powers of two.

llvm-svn: 170093

319f74cd
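
The naming point can be illustrated with a toy analysis. The `Value` struct below is hypothetical and far simpler than anything in LLVM; it exists only to show why a false answer from such a query means "not proven" rather than "not a power of two".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature value: a known constant or an opaque runtime value.
struct Value {
  bool isConstant;
  std::uint64_t constant; // meaningful only when isConstant is true
};

// Conservative query: returns true only when the property can be proven.
bool isKnownToBeAPowerOfTwo(const Value &v) {
  if (!v.isConstant)
    return false; // the value cannot be inspected, so nothing is proven
  return v.constant != 0 && (v.constant & (v.constant - 1)) == 0;
}

void exampleQueries() {
  assert(isKnownToBeAPowerOfTwo(Value{true, 64}));
  assert(!isKnownToBeAPowerOfTwo(Value{true, 6}));
  // An opaque value might well be 64 at run time, yet the answer is false.
  assert(!isKnownToBeAPowerOfTwo(Value{false, 0}));
}
```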

Pattern matching code for intrinsics. · 536cc32b

Michael Ilseman authored Dec 13, 2012

Provides m_Argument that allows matching against a CallSite's specified argument. Provides m_Intrinsic pattern that can be templatized over the intrinsic id and bind/match arguments similarly to other pattern matchers. Implementations provided for 0 to 4 arguments, though it's very simple to extend for more. Also provides example template specialization for bswap (m_BSwap) and example of code cleanup for its use.

llvm-svn: 170091

536cc32b

Take into account minimize size attribute in the inliner. · c0dba203

Quentin Colombet authored Dec 13, 2012

Better controls the inlining of functions when the caller function has the MinSize attribute.
Basically, when the caller function has this attribute, we do not "force" the inlining
of callee functions carrying the InlineHint attribute (i.e., functions defined with
the inline keyword).

llvm-svn: 170065

c0dba203

Teach the cost model about the optimization in r169904: Truncation of... · 36510f71

Nadav Rotem authored Dec 13, 2012

Teach the cost model about the optimization in r169904: Truncation of induction variables costs the same as scalar trunc. 

llvm-svn: 170051

36510f71

Typo. · e28ae30a
Chad Rosier authored Dec 13, 2012
```
llvm-svn: 170050
```
e28ae30a

Dec 12, 2012
- Restore the PHI optimization I accidentally removed · 3c814128
  Michael Ilseman authored Dec 12, 2012
```
llvm-svn: 170024
```
  3c814128
- Remove trailing whitespace · 9fc0f258
  Michael Ilseman authored Dec 12, 2012
```
llvm-svn: 170022
```
  9fc0f258
- Simplify negated bit test · 5226aa94
  David Majnemer authored Dec 12, 2012
```
llvm-svn: 170020
```
  5226aa94