Commits · 29aeb2051826e235a39545db3c891dc00f347730 · Roger Ferrer / llvm-epi-0.8

Nov 17, 2013

Add a loop rerolling flag to the PassManagerBuilder · 29aeb205

Hal Finkel authored Nov 17, 2013

This adds a boolean member variable to the PassManagerBuilder to control loop
rerolling (just like we have for unrolling and the various vectorization
options). This is necessary for control by the frontend. Loop rerolling remains
disabled by default at all optimization levels.

llvm-svn: 194966

29aeb205

Add the cold attribute to error-reporting call sites · 66cd3f1b

Hal Finkel authored Nov 17, 2013

Generally speaking, control flow paths with error reporting calls are cold.
So far, error reporting calls are calls to perror and calls to fprintf,
fwrite, etc. with stderr as the stream. This can be extended in the future.

The primary motivation is to improve block placement (the cold attribute
affects the static branch prediction heuristics).
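
As a rough illustration of the kind of code this targets, here is a minimal sketch. The function `parse_positive` is hypothetical and not part of the patch; the point is only that the branch calling fprintf with stderr is the one that would now be treated as cold.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper, for illustration only: parses a positive integer.
// The failure branch reports to stderr, so under this change its call site
// would be marked cold and the block placed away from the hot path.
int parse_positive(const char *text) {
  char *end = nullptr;
  long value = std::strtol(text, &end, 10);
  if (end == text || *end != '\0' || value <= 0) {
    // Error-reporting call: fprintf with stderr as the stream.
    std::fprintf(stderr, "invalid input: '%s'\n", text);
    return -1;
  }
  return static_cast<int>(value);
}
```

Calling `parse_positive("42")` takes the hot path; `parse_positive("abc")` takes the error-reporting path.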

llvm-svn: 194943

66cd3f1b

Fix ndebug-build unused variable in loop rerolling · 67107ea1
Hal Finkel authored Nov 17, 2013
```
llvm-svn: 194941
```
67107ea1

Add a loop rerolling pass · bf45efde

Hal Finkel authored Nov 16, 2013

This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The
transformation aims to take loops like this:

for (int i = 0; i < 3200; i += 5) {
  a[i]     += alpha * b[i];
  a[i + 1] += alpha * b[i + 1];
  a[i + 2] += alpha * b[i + 2];
  a[i + 3] += alpha * b[i + 3];
  a[i + 4] += alpha * b[i + 4];
}

and turn them into this:

for (int i = 0; i < 3200; ++i) {
  a[i] += alpha * b[i];
}

and loops like this:

for (int i = 0; i < 500; ++i) {
  x[3*i] = foo(0);
  x[3*i+1] = foo(0);
  x[3*i+2] = foo(0);
}

and turn them into this:

for (int i = 0; i < 1500; ++i) {
  x[i] = foo(0);
}
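
The equivalence of the first pair of loops can be checked with a small sketch. The function names and the test data below are assumptions made for illustration; the loop bodies are the ones shown above.

```cpp
#include <cassert>
#include <vector>

// Manually unrolled form: five updates per trip, as in the first example.
void axpy_unrolled(std::vector<float> &a, const std::vector<float> &b,
                   float alpha) {
  for (int i = 0; i < 3200; i += 5) {
    a[i]     += alpha * b[i];
    a[i + 1] += alpha * b[i + 1];
    a[i + 2] += alpha * b[i + 2];
    a[i + 3] += alpha * b[i + 3];
    a[i + 4] += alpha * b[i + 4];
  }
}

// Rerolled form: one update per trip, five times as many trips.
void axpy_rerolled(std::vector<float> &a, const std::vector<float> &b,
                   float alpha) {
  for (int i = 0; i < 3200; ++i)
    a[i] += alpha * b[i];
}

// Runs both forms on identical inputs and reports whether the results match.
bool reroll_preserves_result() {
  std::vector<float> b(3200), a1(3200), a2(3200);
  for (int i = 0; i < 3200; ++i) {
    b[i] = static_cast<float>(i % 7);
    a1[i] = a2[i] = static_cast<float>(i % 3);
  }
  axpy_unrolled(a1, b, 0.5f);
  axpy_rerolled(a2, b, 0.5f);
  return a1 == a2;
}
```

Both forms touch each element exactly once with the same operation, which is why the rerolled loop is a valid replacement here.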

There are two motivations for this transformation:

  1. Code-size reduction (especially relevant, obviously, when compiling for
code size).

  2. Providing greater choice to the loop vectorizer (and generic unroller) to
choose the unrolling factor (and a better ability to vectorize). The loop
vectorizer can take vector lengths and register pressure into account when
choosing an unrolling factor, for example, and a pre-unrolled loop limits that
choice. This is especially problematic if the manual unrolling was optimized
for a machine different from the current target.

The current implementation is limited to single basic-block loops only. The
rerolling recognition should work regardless of how the loop iterations are
intermixed within the loop body (subject to dependency and side-effect
constraints), but the significant restriction is that the order of the
instructions in each iteration must be identical. This seems sufficient to
capture all current use cases.

This pass is not currently enabled by default at any optimization level.

llvm-svn: 194939

bf45efde

Nov 16, 2013

Apply the InstCombine fptrunc sqrt optimization to llvm.sqrt · 12100bf7

Hal Finkel authored Nov 16, 2013

InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls:

  (fptrunc (sqrt (fpext x))) -> (sqrtf x)

but does not apply the same optimization to llvm.sqrt. This is a problem
because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt in
fast-math mode, and because this optimization is being applied to sqrt and not
applied to llvm.sqrt, sometimes the fast-math code is slower.

This change makes InstCombine apply this optimization to llvm.sqrt as well.

This fixes the specific problem in PR17758, although the same underlying issue
(optimizations applied to libcalls are not applied to intrinsics) exists for
other optimizations in SimplifyLibCalls.
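
The fold relies on the fact that a correctly rounded double-precision square root, truncated to float, equals the correctly rounded single-precision square root of the same input. A minimal sketch of that equivalence, assuming IEEE 754 arithmetic and a correctly rounded sqrt (the function names are illustrative, not from the patch):

```cpp
#include <cassert>
#include <cmath>

// Wide form: extend to double, take the square root, truncate back to float.
float sqrt_via_double(float x) {
  return static_cast<float>(std::sqrt(static_cast<double>(x)));
}

// Narrow form that the fold produces: a single-precision square root.
float sqrt_single(float x) {
  return std::sqrt(x);  // float overload from <cmath>
}

// True when both forms give the same float for this input.
bool forms_agree(float x) {
  return sqrt_via_double(x) == sqrt_single(x);
}
```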

llvm-svn: 194935

12100bf7

InstCombine: fold (A >> C) == (B >> C) --> (A^B) < (1 << C) for constant Cs. · 03f3e248
Benjamin Kramer authored Nov 16, 2013
```
This is common in bitfield code.

llvm-svn: 194925
```
03f3e248
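
The fold in 03f3e248 can be sanity-checked by brute force. The sketch below assumes logical shifts on unsigned values and an unsigned comparison, and checks every 8-bit pair for every shift amount; the function name is made up for illustration.

```cpp
#include <cassert>

// Exhaustively checks, for 8-bit unsigned values and shift amounts 0..7,
// that (A >> C) == (B >> C) is equivalent to (A ^ B) < (1 << C).
bool shift_compare_fold_holds() {
  for (unsigned c = 0; c < 8; ++c)
    for (unsigned a = 0; a < 256; ++a)
      for (unsigned b = 0; b < 256; ++b) {
        bool original = (a >> c) == (b >> c);
        bool folded = (a ^ b) < (1u << c);
        if (original != folded)
          return false;
      }
  return true;
}
```

Intuitively, the two shifted values are equal exactly when A and B differ only in the low C bits, which is what the XOR bound expresses.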

LoopVectorizer: Use abi alignment for accesses with no alignment · dbb7b87d

Arnold Schwaighofer authored Nov 15, 2013

When we vectorize a scalar access with no alignment specified, we have to set
the target's abi alignment of the scalar access on the vectorized access.
Using the same alignment of zero would be wrong because most targets will have a
bigger abi alignment for vector types.

This probably fixes PR17878.

llvm-svn: 194876

dbb7b87d

Nov 15, 2013

ArgumentPromotion: correctly transfer TBAA tags and alignments. · bc37658a

Manman Ren authored Nov 15, 2013

We used to use std::map<IndicesVector, LoadInst*> for OriginalLoads. When we
promoted two arguments, both wrote to OriginalLoads, causing the loads created
for the two arguments to share the same original load. As a result, the same
tbaa tag and alignment were applied to the created loads for both arguments.

The fix is to use std::map<std::pair<Argument*, IndicesVector>, LoadInst*>
for OriginalLoads, so each Argument will write to different parts of the map.
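
The key collision can be reproduced in isolation. In the sketch below, `Argument`, `LoadInst` and `IndicesVector` are simplified stand-ins for the LLVM types (an assumption for illustration, not the real classes); only the two map shapes come from the message above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for the LLVM types named above.
struct Argument { std::string name; };
struct LoadInst { std::string tbaaTag; unsigned alignment; };
using IndicesVector = std::vector<unsigned>;

// Old scheme: keyed by indices only, so two arguments loaded through the
// same index path overwrite each other's entry.
using OldLoads = std::map<IndicesVector, LoadInst *>;
// New scheme: the argument is part of the key, so entries stay separate.
using NewLoads = std::map<std::pair<Argument *, IndicesVector>, LoadInst *>;

// Records one original load per argument in each scheme and returns how
// many entries survive: {old count, new count}.
std::pair<std::size_t, std::size_t> count_entries() {
  Argument argA{"a"}, argB{"b"};
  LoadInst loadA{"int", 4}, loadB{"double", 8};
  IndicesVector path{0};  // both arguments use the same indices

  OldLoads oldLoads;
  oldLoads[path] = &loadA;
  oldLoads[path] = &loadB;  // clobbers the entry for argA

  NewLoads newLoads;
  newLoads[{&argA, path}] = &loadA;
  newLoads[{&argB, path}] = &loadB;

  return {oldLoads.size(), newLoads.size()};
}
```

With the old key, the second write replaces the first, so both arguments would inherit the tag and alignment of `loadB`.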

PR17906

llvm-svn: 194846

bc37658a

[asan] use GlobalValue::PrivateLinkage for coverage guard to save quite a bit of code size · 0604c62d
Kostya Serebryany authored Nov 15, 2013
```
llvm-svn: 194800
```
0604c62d

Reapply "[asan] Poor man's coverage that works with ASan" · da4147c7

Bob Wilson authored Nov 15, 2013

I was able to successfully run a bootstrapped LTO build of clang with
r194701, so this change does not seem to be the cause of our failing
buildbots.

llvm-svn: 194789

da4147c7

Add instcombine visitor for addrspacecast · a9e95abc
Matt Arsenault authored Nov 15, 2013
```
llvm-svn: 194786
```
a9e95abc

Revert "[asan] Poor man's coverage that works with ASan" · ae73587c

Bob Wilson authored Nov 15, 2013

This reverts commit 194701. Apple's bootstrapped LTO builds have been failing,
and this change (along with compiler-rt 194702-194704) is the only thing on
the blamelist.  I will either reapply these changes or help debug the problem,
depending on whether this fixes the buildbots.

llvm-svn: 194780

ae73587c

Nov 14, 2013

[asan] Poor man's coverage that works with ASan · 6da3f740
Kostya Serebryany authored Nov 14, 2013
```
llvm-svn: 194701
```
6da3f740

[msan] Fast path optimization for wrap-indirect-calls feature of MemorySanitizer. · 585813e3

Evgeniy Stepanov authored Nov 14, 2013

Indirect call wrapping helps MSanDR (dynamic instrumentation companion tool
for MSan) to catch all cases where execution leaves a compiler-instrumented
module by allowing the tool to rewrite targets of indirect calls.

This change is an optimization that skips wrapping for calls when the target is
inside the current module. This relies on the linker providing symbols at the
beginning and end of the module code (or code + data, does not really matter).
Gold linker provides such symbols by default. GNU (BFD) linker needs a link
flag: -Wl,--defsym=__executable_start=0.

More info:
https://code.google.com/p/memory-sanitizer/wiki/MSanDR#Native_exec

llvm-svn: 194697

585813e3

Nov 13, 2013

Use StringRef instead of std::string · 86a7492f
Jakub Staszak authored Nov 13, 2013
```
llvm-svn: 194601
```
86a7492f

Fix -Wdelete-non-virtual-dtor warnings by making SampleProfile methods non-virtual · aa19c0a1
Alexey Samsonov authored Nov 13, 2013
```
llvm-svn: 194568
```
aa19c0a1

SampleProfileLoader pass. Initial setup. · 8d6568b5

Diego Novillo authored Nov 13, 2013

This adds a new scalar pass that reads a file with samples generated
by 'perf' during runtime. The samples read from the profile are
incorporated and emitted as IR metadata reflecting that profile.

The profile file is assumed to have been generated by an external
profile source. The profile information is converted into IR metadata,
which is later used by the analysis routines to estimate block
frequencies, edge weights and other related data.

External profile information files have no fixed format; each profiler
is free to define its own. This includes both the on-disk representation
of the profile and the kind of profile information stored in the file.
A common kind of profile is based on sampling (e.g., perf), which
essentially counts how many times each line of the program has been
executed during the run.

The SampleProfileLoader pass is organized as a scalar transformation.
On startup, it reads the file given in -sample-profile-file to
determine what kind of profile it contains.  This file is assumed to
contain profile information for the whole application. The profile
data in the file is read and incorporated into the internal state of
the corresponding profiler.

To facilitate testing, I've organized the profilers to support two file
formats: text and native. The native format is whatever on-disk
representation the profiler wants to support. I think this will mostly
be bitcode files, but it could be anything the profiler wants to
support. To do this, every profiler must implement the
SampleProfile::loadNative() function.

The text format is mostly meant for debugging. Records are separated by
newlines, but each profiler is free to interpret records as it sees fit.
Profilers must implement the SampleProfile::loadText() function.

Finally, the pass will call SampleProfile::emitAnnotations() for each
function in the current translation unit. This function needs to
translate the loaded profile into IR metadata, which the analyzer will
later be able to use.

This patch implements the first steps towards the above design. I've
implemented a sample-based flat profiler. The format of the profile is
fairly simplistic. Each sampled function contains a list of relative
line locations (from the start of the function) together with a count
representing how many samples were collected at that line during
execution. I generate this profile using perf and a separate converter
tool.

Currently, I have only implemented a text format for these profiles. I
am interested in initial feedback to the whole approach before I send
the other parts of the implementation for review.

This patch implements:

- The SampleProfileLoader pass.
- The base ExternalProfile class with the core interface.
- A SampleProfile sub-class using the above interface. The profiler
  generates branch weight metadata on every branch instruction that
  matches the profiles.
- A text loader class to assist the implementation of
  SampleProfile::loadText().
- Basic unit tests for the pass.

Additionally, the patch uses profile information to compute branch
weights based on instruction samples.

This patch converts instruction samples into branch weights. It
does a fairly simplistic conversion:

Given a multi-way branch instruction, it calculates the weight of
each branch based on the maximum sample count gathered from each
target basic block.
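
A minimal sketch of that conversion, under the assumption that a basic block can be modelled as the list of sample counts collected on its lines. The type and function names are hypothetical and do not come from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: a basic block is just the sample counts collected
// on the source lines it covers.
using BlockSamples = std::vector<std::uint32_t>;

// Weight of one block: the maximum sample count seen in it (0 if none).
std::uint32_t block_weight(const BlockSamples &samples) {
  return samples.empty()
             ? 0
             : *std::max_element(samples.begin(), samples.end());
}

// Weights for a multi-way branch: one per target block, in successor order.
std::vector<std::uint32_t>
branch_weights(const std::vector<BlockSamples> &targets) {
  std::vector<std::uint32_t> weights;
  weights.reserve(targets.size());
  for (const BlockSamples &target : targets)
    weights.push_back(block_weight(target));
  return weights;
}
```

For a branch whose targets carry samples {10, 40, 25}, {3} and none, this yields weights 40, 3 and 0.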

Note that this assignment of branch weights is somewhat lossy and can be
misleading. If a basic block has more than one incoming branch, all the
incoming branches will get the same weight. In reality, it may be that
only one of them is the most heavily taken branch.

I will adjust this assignment in subsequent patches.

llvm-svn: 194566

8d6568b5

Update the docs to match the function name. · ea186b95
Nadav Rotem authored Nov 13, 2013
```
llvm-svn: 194537
```
ea186b95

Nov 12, 2013

Fold (iszero(A&K1) | iszero(A&K2)) -> (A&(K1|K2)) != (K1|K2) if we know that... · 0ed2fdb5

Nadav Rotem authored Nov 12, 2013

Fold (iszero(A&K1) | iszero(A&K2)) ->  (A&(K1|K2)) != (K1|K2) if we know that K1 and K2 are 'one-hot' (only one bit is on).
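
The one-hot requirement can be checked by brute force. The sketch below covers 8-bit values only and uses illustrative function names; it also shows a mask that is not one-hot and breaks the equivalence.

```cpp
#include <cassert>

// True when the original and folded expressions agree for these inputs.
bool fold_matches(unsigned a, unsigned k1, unsigned k2) {
  bool original = ((a & k1) == 0) || ((a & k2) == 0);
  bool folded = (a & (k1 | k2)) != (k1 | k2);
  return original == folded;
}

// Exhaustive check over all 8-bit values and all pairs of one-hot masks.
bool one_hot_fold_holds() {
  for (unsigned i = 0; i < 8; ++i)
    for (unsigned j = 0; j < 8; ++j)
      for (unsigned a = 0; a < 256; ++a)
        if (!fold_matches(a, 1u << i, 1u << j))
          return false;
  return true;
}
```

With K1 = K2 = 3 (two bits set) and A = 1, neither masked value is zero, yet A & 3 differs from 3, so the folded form would give the wrong answer.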

llvm-svn: 194525

0ed2fdb5

FoldBranchToCommonDest merges branches into a single branch with or/and of the... · 53d32211

Nadav Rotem authored Nov 12, 2013

FoldBranchToCommonDest merges branches into a single branch with or/and of the condition. It has a heuristic for estimating when some of the dependencies are processed by out-of-order processors. This patch adds another rule to the heuristic: if the "BonusInstruction" that we speculatively execute is used by the condition of the second branch, then it is okay to hoist it. This change exposes more opportunities for other passes to transform the code. It does not matter that much that we if-convert the code because the selectiondag builder splits or/and branches into multiple branches when profitable.

llvm-svn: 194524

53d32211

Correctly merge constants with explicit and implicit alignments. · dd8757ab

Rafael Espindola authored Nov 12, 2013

Constant merge can merge a constant with implicit alignment with one that has
explicit alignment. Before this change it was assuming that the explicit
alignment was higher than the implicit one, causing the result to be under
aligned in some cases.
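
One way to picture the requirement is sketched below. This is not the patch's code; it assumes that an alignment of 0 means "no explicit alignment" and that the target's ABI alignment then applies. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model: 0 means "no explicit alignment", in which case the
// target's ABI alignment for the type applies.
unsigned effective_alignment(unsigned explicitAlign, unsigned abiAlign) {
  return explicitAlign != 0 ? explicitAlign : abiAlign;
}

// Alignment the merged constant needs: it must satisfy both originals,
// whichever of the two happens to be larger.
unsigned merged_alignment(unsigned alignA, unsigned alignB,
                          unsigned abiAlign) {
  return std::max(effective_alignment(alignA, abiAlign),
                  effective_alignment(alignB, abiAlign));
}
```

Merging a constant with explicit alignment 1 and one with implicit alignment on a target whose ABI alignment is 4 must give 4; assuming the explicit value always wins would leave the result under-aligned.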

Fixes pr17815.

Patch by Chris Smowton!

llvm-svn: 194506

dd8757ab

SimplifyCFG: Use existing constant folding logic when forming switch tables. · 7c30260a
Benjamin Kramer authored Nov 12, 2013
```
Both simpler and more powerful than the hand-rolled folding logic.

llvm-svn: 194475
```
7c30260a

Correct a glitch in r194424 which may invalidate iterator. · f1ec34bd
Shuxin Yang authored Nov 12, 2013
```
llvm-svn: 194457
```
f1ec34bd

llvm-cov: Added call to update run/program counts. · 062f24c9
Yuchen Wu authored Nov 12, 2013
```
Also updated test files that were generated from this change.

llvm-svn: 194453
```
062f24c9

Nov 11, 2013

Fix PR17952. · 3168ab33

Shuxin Yang authored Nov 11, 2013

  The symptom is that an assertion is triggered. The assertion was added by
me to detect the situation when a value is propagated from dead blocks.
(We can certainly get rid of the assertion; it is safe to do so, because
 propagating a value from a dead block to a live join node is certainly ok.)

  The root cause of this bug is: edge-splitting is conducted on the fly,
the edge being split could be a dead edge, therefore the block that
splits the critical edge needs to be flagged "dead" as well.

  There are 3 ways to fix this bug:
  1) Get rid of the assertion, as I mentioned earlier.
  2) When a dead edge is split, flag the inserted block "dead".
  3) Proactively split the critical edges connecting dead and live blocks when
     new dead blocks are revealed.

  This fix goes for 3) with an additional 2 LOC.

  Testing case was added by Rafael the other day.

llvm-svn: 194424

3168ab33

Move debug message in vectorizer · 3f67a7de
Renato Golin authored Nov 11, 2013
```
No functional change, just better reporting.

llvm-svn: 194388
```
3f67a7de

[msan] Propagate origin for insertvalue, extractvalue. · 560e0893
Evgeniy Stepanov authored Nov 11, 2013
```
llvm-svn: 194374
```
560e0893

Nov 10, 2013

Revert "Resurrect r191017 " GVN proceeds in the presence of dead code" plus a... · fed6c220

Bill Wendling authored Nov 10, 2013

Revert "Resurrect r191017 " GVN proceeds in the presence of dead code" plus a fix to PR17307 & 17308."

This causes PR17852.

This reverts commit d93e8a06b2ca09ab18f390cd514b7443e2e571f7.

Conflicts:
	test/Transforms/GVN/cond_br2.ll

llvm-svn: 194348

fed6c220

Use type form of getIntPtrType. · c900303e

Matt Arsenault authored Nov 10, 2013

This should be inconsequential and is work
towards removing the default address space
arguments.

llvm-svn: 194347

c900303e

SimplifyCFG has a heuristic for out-of-order processors that decides when it... · 5ba1c6ce

Nadav Rotem authored Nov 10, 2013

SimplifyCFG has a heuristic for out-of-order processors that decides when it is worthwhile to merge branches. It tries to estimate whether the operands of the instruction that we want to hoist are ready. This commit marks function arguments as 'ready' because they require no calculation. This boosts libquantum and a few other workloads from the testsuite.

llvm-svn: 194346

5ba1c6ce

Teach MergeFunctions about address spaces · 5bcefabc
Matt Arsenault authored Nov 10, 2013
```
llvm-svn: 194342
```
5bcefabc

Nov 08, 2013

Remove dead code from LoopUnswitch · 1a642aef

Hal Finkel authored Nov 08, 2013

LoopUnswitch's code simplification routine has logic to convert conditional
branches into unconditional branches, after unswitching makes the condition
constant, and then remove any blocks that this renders dead. Unfortunately, this
code is dead, currently broken, and furthermore, has never been alive (at least
as far back as 2006).

No functionality change intended.

llvm-svn: 194277

1a642aef

Nov 05, 2013

[objc-arc] Convert the one directional retain/release relation assert to a... · 24b2f6fd

Michael Gottesman authored Nov 05, 2013

[objc-arc] Convert the one directional retain/release relation assert to a conditional check + fail.

Due to the previously added overflow checks, we can have a retain/release
relation that is one directional. This occurs specifically when we run into an
additive overflow causing us to drop state in only one direction. If that
occurs, we should bail and not optimize that retain/release instead of
asserting.

Apologies for the size of the testcase. It is necessary to cause the additive
cfg overflow to trigger.

rdar://15377890

llvm-svn: 194083

24b2f6fd

Add a runtime unrolling parameter to the LoopUnroll pass constructor · 081eaef6

Hal Finkel authored Nov 05, 2013

As with the other loop unrolling parameters (the unrolling threshold, partial
unrolling, etc.) runtime unrolling can now also be controlled via the
constructor. This will be necessary for moving non-trivial unrolling late in
the pass manager (after loop vectorization).

No functionality change intended.

llvm-svn: 194027

081eaef6

Nov 04, 2013

Remove dead code · d1382b6c
Shuxin Yang authored Nov 04, 2013
```
llvm-svn: 194017
```
d1382b6c

SLPVectorizer: Use properlyDominates to satisfy the irreflexivity of a strict weak ordering. · 9e7f7c7f
Benjamin Kramer authored Nov 04, 2013
```
STL debug mode checks this.

llvm-svn: 194015
```
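
Why the distinction matters can be shown with a small sketch. It models dominance along a single path by integer depth, which is a deliberate simplification and not LLVM's DominatorTree; the function names are illustrative.

```cpp
#include <cassert>

// Stand-ins for the two dominance queries, using depth along one path:
// a node dominates itself, but does not properly dominate itself.
bool dominates(int a, int b) { return a <= b; }          // reflexive
bool properly_dominates(int a, int b) { return a < b; }  // irreflexive

// A comparator is a valid strict weak ordering only if comp(x, x) is
// false for every x; this checks that property over a range.
template <typename Compare>
bool is_irreflexive(Compare comp, int lo, int hi) {
  for (int x = lo; x <= hi; ++x)
    if (comp(x, x))
      return false;
  return true;
}
```

A comparator built on the reflexive query returns true for equal elements, which is exactly what STL debug mode rejects.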
9e7f7c7f

Scalarize select vector arguments when extracted. · 243140f2
Matt Arsenault authored Nov 04, 2013
```
When the elements are extracted from a select on vectors
or a vector select, do the select on the extracted scalars
from the input if there is only one use.

llvm-svn: 194013
```
243140f2

Nov 03, 2013

SLPVectorizer: Add a missing pair of parens. No functionality change. · 191ba00b
Benjamin Kramer authored Nov 03, 2013
```
llvm-svn: 193958
```
191ba00b

SLPVectorizer: When CSEing generated gathers only scan blocks containing them. · 91e8f3c3

Benjamin Kramer authored Nov 03, 2013

Instead of doing a RPO traversal of the whole function remember the blocks
containing gathers (typically <= 2) and scan them in dominator-first order.

The actual CSE is still quadratic, but I'm not confident that adding a
scoped hash table here is worth it as we're only looking at the generated
instructions and not arbitrary code.

llvm-svn: 193956

91e8f3c3

Revert "Inliner: Handle readonly attribute per argument when adding memcpy" · 120f4a06

David Majnemer authored Nov 03, 2013

This reverts commit r193356, it caused PR17781.

A reduced test case covering this regression has been added to the test suite.

llvm-svn: 193955

120f4a06