- Apr 09, 2013
-
-
Nadav Rotem authored
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations. The infrastructure has three potential users: 1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]). 2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute. 3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization. This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code: void SAXPY(int *x, int *y, int a, int i) { x[i] = a * x[i] + y[i]; x[i+1] = a * x[i+1] + y[i+1]; x[i+2] = a * x[i+2] + y[i+2]; x[i+3] = a * x[i+3] + y[i+3]; } llvm-svn: 179117
-
Shuxin Yang authored
I brazenly think this change is slightly simpler than r178793 because:
- there is no "state" in the functor
- "OpndPtrs[i]" looks simpler than "&Opnds[OpndIndices[i]]"
While I can reproduce the problem under Valgrind, it is rather difficult to come up with a standalone test case. The reason is that when an iterator is invalidated, the stale invalidated elements are not yet clobbered by nonsense data, so the optimizer can still proceed successfully. Thanks to Benjamin for fixing this bug and generously providing the test case. llvm-svn: 179062
-
- Apr 07, 2013
-
-
Chandler Carruth authored
The fix for PR14972 in r177055 introduced a real think-o in the *store* side, likely because I was much more focused on the load side. While we can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily widen a value to be stored, as that changes the width of the memory access! Lock down the code path in the store rewriting that would do this so it only handles the intended circumstance. All of the existing tests continue to pass, and I've added a test from the PR. llvm-svn: 178974
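A minimal illustration, not taken from the patch, of why widening a store is not safe the way widening a load is: the extra bytes written clobber adjacent memory, whereas the extra bits of a widened load can simply be masked off. The struct layout below is hypothetical.

  #include <cstdint>
  #include <cstring>

  struct Pair { uint16_t a; uint16_t b; };    // hypothetical layout, for illustration only

  // Widening this 2-byte store to 4 bytes changes which memory is touched:
  // the extra bytes overwrite p->b, which the original program never wrote.
  void store_a_widened(Pair *p, uint16_t v) {
    uint32_t wide = v;
    std::memcpy(p, &wide, sizeof wide);       // 4-byte store clobbers the neighboring field
  }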
-
- Apr 06, 2013
-
-
Michael Gottesman authored
llvm-svn: 178932
-
Michael Gottesman authored
This is the counterpart to commit r160637, except it performs the action in the bottomup portion of the data flow analysis. llvm-svn: 178922
-
Michael Gottesman authored
The normal dataflow sequence in the ARC optimizer consists of the following states: Retain -> CanRelease -> Use -> Release. Before this patch, the optimizer stored the uses that determine the lifetime of the retainable object pointer when, going bottom-up, it hit a retain, or when, going top-down, it hit a release. This is correct for the imprecise-lifetime scenario, since what we are trying to do is remove retains/releases while making sure that no ``CanRelease'' (which is usually a call) deallocates the given pointer before we get to the ``Use'' (since that would cause a segfault). If we are considering the precise-lifetime scenario, though, this is not correct. In such a situation we *DO* care about the previous sequence, but additionally we wish to track the uses resulting from the following incomplete sequences: Retain -> CanRelease -> Release (TopDown); Retain <- Use <- Release (BottomUp). *NOTE* This patch looks large, but most of it consists of updating test cases. Additionally, this fix exposed a further bug. I removed the test case that expressed said bug and will recommit it with the fix in a little bit. llvm-svn: 178921
-
- Apr 05, 2013
-
-
Jim Grosbach authored
llvm-svn: 178915
-
Shuxin Yang authored
This optimization is unstable at the moment; it 1) blocks us on a very important application, 2) PR15200, and 3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll (the CHECK lines compare the output against a wrong result). I personally believe this optimization should not have any impact on autovectorized code, as the auto-vectorizer is supposed to emit gathers/scatters the "right" way. Although in theory downstream optimizers might reveal some gather/scatter optimization opportunities, the chance is quite slim. For hand-crafted vectorized code, in terms of redundancy elimination, load-CSE, copy-propagation and DSE can collectively achieve the same result, but in a much simpler way. On the other hand, those optimizers are able to improve the code incrementally; in contrast, SROA is an all-or-nothing approach. However, SROA might win slightly in stack size, as it tries to figure out a stretch of memory that tightly covers the area accessed by the dynamic index. rdar://13174884 PR15200 llvm-svn: 178912
-
Michael Gottesman authored
llvm-svn: 178895
-
Michael Gottesman authored
llvm-svn: 178893
-
Arnold Schwaighofer authored
Pass down the fact that an operand is going to be a vector of constants. This should restore the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86; it had degraded to scalar performance due to my previous shift cost change that made all shifts expensive on x86. radar://13576547 llvm-svn: 178809
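For context, a hedged sketch (not taken from the benchmark) of the kind of loop affected: once vectorized, the shift amount becomes a splat vector of compile-time constants, which the cost model should treat as cheap on x86.

  // Every element is shifted by the same constant, so the vectorized loop
  // shifts by a constant splat vector rather than a variable amount.
  void shift_all(int *a, int n) {
    for (int i = 0; i < n; ++i)
      a[i] >>= 3;
  }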
-
- Apr 04, 2013
-
-
Benjamin Kramer authored
OpndPtrs stored pointers into the Opnd vector that became invalid when the vector grew. Store indices instead. Sadly, the only test case I have is large and only triggers under Valgrind, so I didn't include it. llvm-svn: 178793
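A minimal standalone sketch of the bug pattern being fixed (hypothetical names, not the Reassociate code itself): pointers into a std::vector dangle as soon as a push_back reallocates, while indices remain valid.

  #include <vector>

  struct Opnd { int Value; };

  void buggy(std::vector<Opnd> &Opnds) {
    Opnd *P = &Opnds[0];            // pointer into the vector's storage
    Opnds.push_back({42});          // may reallocate: P now dangles
    // P->Value = 1;                // use-after-free that tools like Valgrind catch
  }

  void fixed(std::vector<Opnd> &Opnds) {
    size_t I = 0;                   // store an index instead of a pointer
    Opnds.push_back({42});          // reallocation is harmless to the index
    Opnds[I].Value = 1;             // still refers to the same logical element
  }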
-
Michael Gottesman authored
Refactored out the helper method FindPredecessorAutoreleaseWithSafePath from ObjCARCOpt::OptimizeReturns. Now ObjCARCOpt::OptimizeReturns is easy to read and reason about. llvm-svn: 178715
-
Michael Gottesman authored
Refactored out the helper function FindPredecessorRetainWithSafePath from ObjCARCOpt::OptimizeReturns. llvm-svn: 178714
-
Michael Gottesman authored
Cleaned up trailing whitespace and added extra slashes in front of a function-level comment so that it follows the convention of having 3 slashes. llvm-svn: 178712
-
Michael Gottesman authored
Refactored out a part of ObjCARCOpt::OptimizeReturns into its own method HasSafePathToPredecessorCall. llvm-svn: 178710
-
Michael Gottesman authored
llvm-svn: 178709
-
Michael Gottesman authored
Clean up ARC annotations by moving the top/bottom BB annotations into conditional macros that are no-ops in Release mode, instead of wrapping the code in #ifdef sections. This follows the example of the DEBUG macro. llvm-svn: 178705
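A hedged sketch of the macro pattern described; the real macro and function names in ObjCARC differ, these are placeholders.

  // Expands to the annotation call in debug builds and to nothing in Release
  // builds, so call sites need no #ifdef blocks around them.
  #ifndef NDEBUG
  #define ANNOTATE_BOTTOMUP_BBSTART(BB, Ptr) AnnotateBottomUpBBStart(BB, Ptr)
  #else
  #define ANNOTATE_BOTTOMUP_BBSTART(BB, Ptr) do { } while (0)
  #endif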
-
- Apr 03, 2013
-
-
Michael Gottesman authored
Remove an optimization that changed an objc_autorelease into an objc_autoreleaseReturnValue. The semantics of ARC imply that a pointer passed into an objc_autorelease must live until some point (potentially down the stack) where an autorelease pool is popped. An objc_autoreleaseReturnValue, on the other hand, only signifies that the object must live at least until the end of the given function. Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in terms of ARC semantics*, so performing the given strength reduction without any knowledge of how it relates to the autorelease pool pop further up the stack violates the semantics of ARC. *Even though objc_autoreleaseReturnValue is more computationally expensive if you know that no RV optimization will occur. llvm-svn: 178612
-
Michael Gottesman authored
llvm-svn: 178605
-
- Apr 02, 2013
-
-
Bill Wendling authored
The iterator could be invalidated when it's recursively deleting a whole bunch of constant expressions in a constant initializer. Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was run on a `.ll' file, it wouldn't crash. This is why the test first pushes the `.ll' file through `llvm-as' before feeding it to `opt'. PR15440 llvm-svn: 178531
-
- Apr 01, 2013
-
-
Shuxin Yang authored
llvm-svn: 178484
-
- Mar 30, 2013
-
-
Shuxin Yang authored
rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2), only useful when c1 == c2
rule 2: (x & c1) ^ (x & c2) => (x & (c1^c2))
rule 3: (x | c1) ^ (x | c2) => (x & c3) ^ c3, where c3 = c1 ^ c2
rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2
It reduces an application's size (in terms of # of instructions) by 8.9%. Reviewed by Pete Cooper. Thanks a lot! rdar://13212115 llvm-svn: 178409
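As a sanity check, the four identities can be verified exhaustively for 8-bit values with a small standalone program; this is just an illustration of the algebra, not code from the patch.

  #include <cassert>
  #include <cstdint>

  int main() {
    for (unsigned x = 0; x < 256; ++x)
      for (unsigned c1 = 0; c1 < 256; ++c1)
        for (unsigned c2 = 0; c2 < 256; ++c2) {
          uint8_t X = x, C1 = c1, C2 = c2;
          uint8_t C3 = C1 ^ C2;                    // rule 3's constant
          uint8_t C4 = (uint8_t)~C1 ^ C2;          // rule 4's constant
          assert((uint8_t)((X | C1) ^ C2) == (uint8_t)((X & (uint8_t)~C1) ^ (C1 ^ C2)));  // rule 1
          assert((uint8_t)((X & C1) ^ (X & C2)) == (uint8_t)(X & (C1 ^ C2)));             // rule 2
          assert((uint8_t)((X | C1) ^ (X | C2)) == (uint8_t)((X & C3) ^ C3));             // rule 3
          assert((uint8_t)((X | C1) ^ (X & C2)) == (uint8_t)((X & C4) ^ C1));             // rule 4
        }
    return 0;
  }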
-
- Mar 29, 2013
-
-
Michael Gottesman authored
clang.arc.used is an interesting call for ARC since ObjCARCContract needs to run to remove said intrinsic in order to avoid a linker error (the called function does not actually exist). llvm-svn: 178369
-
Michael Gottesman authored
llvm-svn: 178329
-
Michael Gottesman authored
Removed dead code from ObjCARCOpts relating to tracking objc_retainBlocks through the ARC Dataflow analysis. By the time we get to the ARC dataflow analysis, any objc_retainBlock calls are not optimizable. llvm-svn: 178306
-
- Mar 28, 2013
-
-
Bill Wendling authored
Go ahead and use the full path for both the .gcno and .gcda files. llvm-svn: 178302
-
Michael Gottesman authored
Since we handle optimizable objc_retainBlocks through strength reduction in OptimizeIndividualCalls, we know that all code after that point will only see non-optimizable objc_retainBlock calls. IsForwarding is only called by functions after that point, so it is OK to just classify objc_retainBlock as non-forwarding. <rdar://problem/13249661>. llvm-svn: 178285
-
Michael Gottesman authored
If an objc_retainBlock has the copy_on_escape metadata attached to it AND the block pointer argument only escapes down the stack, we are allowed to strength reduce the objc_retainBlock to an objc_retain and thus optimize it. Currently there is logic in the ARC data flow analysis to handle this case, which is complicated and involves making distinctions between objc_retainBlock and objc_retain in certain places while considering them the same in others. This patch simplifies said code by:
1. Performing the strength reduction in the initial ARC peephole analysis (ObjCARCOpts::OptimizeIndividualCalls).
2. Changing the ARC dataflow analysis (which runs after the peephole analysis) to consider all objc_retainBlock calls to be non-optimizable (since if a call were optimizable, we would have strength reduced it already).
This patch leaves in the infrastructure in the ARC dataflow analysis to handle this case, which due to 2 will just be dead code. I am doing this on purpose to separate the removal of the old code from the testing of the new code. <rdar://problem/13249661>. llvm-svn: 178284
-
Kostya Serebryany authored
llvm-svn: 178230
-
Akira Hatanaka authored
llvm-svn: 178208
-
- Mar 26, 2013
-
-
Bill Wendling authored
If we compile a single source program, the `.gcda' file will be generated where the program was executed. This isn't desirable, because that location may be unpredictable (the program could call `chdir', for instance). Instead, we will output the `.gcda' file in the same place we output the `.gcno' file, i.e., the directory where the executable was generated. This matches GCC's behavior. <rdar://problem/13061072> & PR11809 llvm-svn: 178084
-
Ulrich Weigand authored
The OptimizeIntToFloatBitCast converts shift-truncate sequences into extractelement operations. The computation of the element index to be used in the resulting operation is currently only correct for little-endian targets. This commit fixes the element index computation to be correct for big-endian targets as well. If the target byte order is unknown, the optimization cannot be performed at all. llvm-svn: 178031
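A hedged illustration in plain C++ of why the lane index depends on byte order; this is not the InstCombine code itself, and the memcpy plays the role of the bitcast.

  #include <cstdint>
  #include <cstring>

  // (trunc (lshr %x, 32)) always produces the numerically high 32 bits of x.
  uint32_t high_half_via_shift(uint64_t x) { return (uint32_t)(x >> 32); }

  // Viewed as a <2 x i32> through a bitcast, that value lives in lane 1 on a
  // little-endian target but lane 0 on a big-endian target, so the index of
  // the resulting extractelement must be flipped. (For this sketch the flag
  // has to match the byte order of the machine running the code.)
  uint32_t high_half_via_lane(uint64_t x, bool big_endian) {
    uint32_t lanes[2];
    std::memcpy(lanes, &x, sizeof lanes);   // reinterpret the same bits, like a bitcast
    return lanes[big_endian ? 0 : 1];
  }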
-
Alexey Samsonov authored
[ASan] Change the ABI of the __asan_before_dynamic_init function: it now takes a pointer to a private string containing the module name. This string serves as a unique module ID in the ASan runtime. LLVM part. llvm-svn: 178013
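A rough sketch of the instrumentation shape this implies; the declarations and names below are assumptions based on the description, not copied from the runtime headers.

  extern "C" void __asan_before_dynamic_init(const char *module_name);  // signature assumed from the description
  extern "C" void __asan_after_dynamic_init();                          // assumed counterpart

  static const char kModuleName[] = "my_module.cpp";  // hypothetical per-module private string

  static void module_ctor_wrapper() {
    __asan_before_dynamic_init(kModuleName);  // identify which module is running its dynamic initializers
    // ... original dynamic initializers for this translation unit ...
    __asan_after_dynamic_init();
  }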
-
Michael Gottesman authored
[ObjCARC Annotations] Added support for displaying the state of pointers at the bottom/top of BBs of the ARC dataflow analysis, for both the bottomup and topdown analyses. This will allow for verification and analysis of the merge function of the dataflow analyses in the ARC optimizer. This feature is implemented by introducing calls to the functions llvm.arc.annotation.{bottomup,topdown}.{bbstart,bbend}, which are only declared, never defined. Each such call takes a pointer to a global with the same name as the pointer whose provenance is being tracked, and a pointer whose name is one of our Sequence states and which points to a string containing that same name. To ensure that the optimizer does not consider these annotations in any way, I made it so that the annotations are classified as IC_None. A test case is included for this commit and the previous ObjCARCAnnotation commit. llvm-svn: 177952
-
Michael Gottesman authored
[ObjCARC Annotations] Implemented ARC annotation metadata to expose the ARC data flow analysis state in the IR via metadata.
Previously the inner workings of the data flow analysis in ObjCARCOpts were hard to get out of the optimizer for analysis of bugs or for testing. All of the current ARC unit tests are based on testing the effect of the data flow analysis (i.e. which statements are removed or moved, etc.). This creates weakness in the current unit testing regimen, since we are not actually testing what effects various instructions have on the modeled pointer state. Additionally, in order to analyze a bug in the optimizer, one would need to track by hand what the optimizer was actually doing, either through DEBUG statements or through a debugger, both yielding large losses in developer productivity.
This patch deals with these two issues by providing ARC annotation metadata that annotates instructions with the state changes they cause in various pointers, as well as metadata to annotate provenance sources. Specifically, we introduce the following metadata types:
1. llvm.arc.annotation.bottomup
2. llvm.arc.annotation.topdown
3. llvm.arc.annotation.provenancesource
llvm.arc.annotation.{bottomup,topdown}: These annotations describe a state change in a pointer when we are visiting instructions bottomup/topdown respectively. The output format for both is the same:
  !1 = metadata !{metadata !"(test,%x)", metadata !"S_Release", metadata !"S_Use"}
The first element is a string tuple with the format (function,variable name). The second two elements of the metadata show the previous state of the pointer (in this case S_Release) and the new state of the pointer (S_Use). We write the metadata in this manner to ensure that it is easy for outside tools to parse. This is important since I am currently working on a tool for taking this information and pretty printing it alongside the IR, which can be used for LIT-style testing via the generation of an index.
llvm.arc.annotation.provenancesource: This metadata is used to annotate instructions which act as provenance sources, i.e. ones that introduce a new (from the optimizer's perspective) non-argument pointer to track. This enables cross-referencing between provenance sources and the state changes that occur to them.
This is still a work in progress. Additionally, I plan on committing later today additions to the annotations that annotate, at the top/bottom of basic blocks, the state of the various pointers being tracked.
*NOTE* The metadata support is conditionally compiled into libObjCARCOpts only when we are producing a debug build of llvm/clang, and even then it is disabled by default. To enable the annotation metadata, pass -enable-objc-arc-annotations to opt.
llvm-svn: 177951
-
- Mar 25, 2013
-
-
Shuxin Yang authored
The problem is that the code mistakenly took for granted that the following constructor is able to create an APFloat from a *SIGNED* integer: APFloat::APFloat(const fltSemantics &ourSemantics, integerPart value) rdar://13486998 llvm-svn: 177906
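A self-contained sketch of the underlying pitfall, using a stand-in type rather than the real APFloat; integerPart is assumed to be an unsigned 64-bit type here.

  #include <cstdint>
  #include <cstdio>

  typedef uint64_t integerPart;      // stand-in; the real typedef lives in APInt.h

  struct FakeFloat {                 // hypothetical, not the real APFloat
    double v;
    explicit FakeFloat(integerPart value) : v(static_cast<double>(value)) {}
  };

  int main() {
    FakeFloat f(-1);                 // -1 silently converts to 0xFFFFFFFFFFFFFFFF
    std::printf("%f\n", f.v);        // prints 18446744073709551616.000000, not -1.000000
  }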
-
Arnaud A. de Grandmaison authored
llvm-svn: 177863
-
Arnaud A. de Grandmaison authored
This simplification happens in two places:
- using the nsw attribute when the shl/mul is used by a sign test
- when the shl/mul is compared for (in)equality to zero
llvm-svn: 177856
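A hedged C++ illustration of the kinds of folds described: signed overflow is undefined in C++, which is what the nsw flag expresses in the IR, so a non-wrapping multiply by a nonzero constant preserves both zeroness and sign. The exact patterns the pass matches may differ.

  // With no signed wrap, x*4 equals the mathematical product 4*x, so:
  bool is_zero(int x)     { return (x * 4) == 0; }   // foldable to: x == 0
  bool is_positive(int x) { return (x * 4) > 0; }    // foldable to: x > 0  (sign test)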
-
Michael Gottesman authored
Changed isNullOrUndef => IsNullOrUndef and isNoopInstruction => IsNoopInstruction so that all helper functions are named similarly in ObjCARC.h. llvm-svn: 177855
-