Commits · 68dc3c7ab2ccaa4a5a1459ba506ef340ed23a5ed · Lorenzo Albano / LLVM bpEVL

Oct 16, 2014

Preserve non-byval pointer alignment attributes using @llvm.assume when inlining · 68dc3c7a

Hal Finkel authored Oct 15, 2014

For pointer-typed function arguments, enhanced alignment can be asserted using
the 'align' attribute. When inlining, if this enhanced alignment information is
not otherwise available, preserve it using @llvm.assume-based alignment
assumptions.
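As a rough illustration of the fact being preserved, the sketch below models an alignment assumption as the predicate `(address & (alignment - 1)) == 0`. The names `is_aligned`, `first_element`, and `caller_after_inlining` are hypothetical, and the runtime `assert` only stands in for the compile-time @llvm.assume call that the inliner would emit; this is not the inliner's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the condition an alignment assumption asserts,
// i.e. (address & (alignment - 1)) == 0.
bool is_aligned(const void *ptr, std::size_t alignment) {
  // The mask trick only works for non-zero powers of two.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  auto addr = reinterpret_cast<std::uintptr_t>(ptr);
  return (addr & (alignment - 1)) == 0;
}

// Stand-in for a callee whose pointer parameter carries 'align 32'.
float first_element(const float *p) { return p[0]; }

// After inlining, the parameter attribute no longer exists, so the fact
// has to be restated in the caller's body.
float caller_after_inlining(const float *p) {
  // Conceptual equivalent of the preserved assumption (assumed model:
  // a runtime check here, an @llvm.assume call in real IR).
  assert(is_aligned(p, 32));
  return p[0];
}
```

The point of the sketch is only that the alignment fact moves from the callee's signature into the caller's instruction stream, where later passes can still see it.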

llvm-svn: 219876

68dc3c7a

Oct 14, 2014

Optimize away fabs() calls when input is squared (known positive). · 0ca42bb5

Sanjay Patel authored Oct 14, 2014

Eliminate library calls and intrinsic calls to fabs when the input 
is a squared value.

Note that no unsafe-math / fast-math assumptions are needed for
this optimization.
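A small standalone check of why no fast-math assumption is needed: under IEEE arithmetic a squared value is never negative, so `fabs` cannot change it. The function names below are illustrative only and are not taken from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Before: the fabs call that the optimization targets.
double with_fabs(double x) { return std::fabs(x * x); }

// After: what is left once the call is removed.
double without_fabs(double x) { return x * x; }

// Compare both forms, treating two NaN results as matching.
bool same_result(double x) {
  double a = with_fabs(x);
  double b = without_fabs(x);
  if (std::isnan(a) || std::isnan(b))
    return std::isnan(a) && std::isnan(b);
  return a == b && std::signbit(a) == std::signbit(b);
}

// Spot-check inputs that usually need fast-math caveats:
// signed zero, overflow to infinity, infinity, and NaN.
void check_samples() {
  const double samples[] = {0.0, -0.0, 3.0, -3.0, 1e200, -1e200,
                            std::numeric_limits<double>::infinity(),
                            std::numeric_limits<double>::quiet_NaN()};
  for (double x : samples)
    assert(same_result(x));
}
```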

Differential Revision: http://reviews.llvm.org/D5777

llvm-svn: 219717

0ca42bb5

Switch to select optimization for two-case switches · 5bbe3df6

Marcello Maggioni authored Oct 14, 2014

This is the same optimization as r219223, with modifications to support PHIs with multiple incoming edges from the same block,
and a test to check that this condition is handled.

llvm-svn: 219656

5bbe3df6

Oct 12, 2014

Revert r219223, it creates invalid PHI nodes. · 5ca10d0e

Joerg Sonnenberger authored Oct 12, 2014

llvm-svn: 219587

5ca10d0e

Oct 10, 2014

SimplifyCFG: Don't convert phis into selects if we could remove undef behavior · d7d010eb

Arnold Schwaighofer authored Oct 10, 2014

instead

We used to transform this:

  define void @test6(i1 %cond, i8* %ptr) {
  entry:
    br i1 %cond, label %bb1, label %bb2

  bb1:
    br label %bb2

  bb2:
    %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

into this:

  define void @test6(i1 %cond, i8* %ptr) {
    %ptr.2 = select i1 %cond, i8* null, i8* %ptr
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

because the simplifycfg transformation into selects happened to run
before the simplifycfg transformation that removes unreachable control flow.
(We have 'unreachable control flow' due to the store to null, which is undefined
behavior.)

The existing transformation that removes unreachable control flow in simplifycfg
is:

  /// If BB has an incoming value that will always trigger undefined behavior
  /// (eg. null pointer dereference), remove the branch leading here.
  static bool removeUndefIntroducingPredecessor(BasicBlock *BB)

Now we generate:

  define void @test6(i1 %cond, i8* %ptr) {
    store i8 2, i8* %ptr, align 8
    ret void
  }

I did not see any impact on the test-suite + externals.

rdar://18596215

llvm-svn: 219462

d7d010eb

Oct 07, 2014

LoopUnroll: Create sub-loops in LoopInfo · c46cfcbb

Duncan P. N. Exon Smith authored Oct 07, 2014

`LoopUnrollPass` says that it preserves `LoopInfo` -- make it so.  In
particular, tell `LoopInfo` about copies of inner loops when unrolling
the outer loop.

Conservatively, also tell `ScalarEvolution` to forget about the original
versions of these loops, since their inputs may have changed.

Fixes PR20987.

llvm-svn: 219241

c46cfcbb

LoopUnroll: Only check for ScalarEvolution analysis once, NFC · 9b4d37e8
Duncan P. N. Exon Smith authored Oct 07, 2014
```
A follow-up commit will add a use in a tight loop.  We might as well just
find it once anyway.

llvm-svn: 219239
```
9b4d37e8

Two case switch to select optimization · 963bc87d

Marcello Maggioni authored Oct 07, 2014

This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block
to a select or a couple of selects (depending on whether the default block is reachable).

The typical case this optimization targets is this one:

Example:
switch (a) {
  case 10:                %0 = icmp eq i32 %a, 10
    return 10;            %1 = select i1 %0, i32 10, i32 4
  case 20:        ---->   %2 = icmp eq i32 %a, 20
    return 2;             %3 = select i1 %2, i32 2, i32 %1
  default:
    return 4;
}
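The same idea as a runnable sketch, covering both shapes mentioned above: two selects when the default block is reachable, one select when it is not. The function names are hypothetical, and the "default unreachable" case is modeled with a precondition rather than with real unreachable IR.

```cpp
#include <cassert>

// Original form: a switch used only to pick a value.
int via_switch(int a) {
  switch (a) {
  case 10:
    return 10;
  case 20:
    return 2;
  default:
    return 4;
  }
}

// Default reachable: two compares and two selects.
int via_two_selects(int a) {
  int first = (a == 10) ? 10 : 4; // %1 = select i1 %0, i32 10, i32 4
  return (a == 20) ? 2 : first;   // %3 = select i1 %2, i32 2, i32 %1
}

// Default unreachable (a is known to be 10 or 20): one select suffices.
int via_one_select(int a) {
  assert(a == 10 || a == 20); // models the unreachable default
  return (a == 10) ? 10 : 2;
}
```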

It also sets the base for further optimizations that are planned and being reviewed.

llvm-svn: 219223

963bc87d

LoopUnroll: Change code order of changes to new basic blocks · e5d7d979

Duncan P. N. Exon Smith authored Oct 06, 2014

Add new basic blocks to `LoopInfo` earlier.  No functionality change
intended (simplifies upcoming bugfix patch).

llvm-svn: 219150

e5d7d979

Sink comment, NFC · 0bbf5418
Duncan P. N. Exon Smith authored Oct 06, 2014
```
llvm-svn: 219149
```
0bbf5418

Oct 01, 2014

DIBuilder: Encapsulate DIExpression's element type · 611afb22

Duncan P. N. Exon Smith authored Oct 01, 2014

`DIExpression`'s elements are 64-bit integers that are stored as
`ConstantInt`.  The accessors already encapsulate the storage.  This
commit updates the `DIBuilder` API to also encapsulate that.

llvm-svn: 218797

611afb22

Move the complex address expression out of DIVariable and into an extra · 87b7eb9d

Adrian Prantl authored Oct 01, 2014

argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787

87b7eb9d

Revert r218778 while investigating buildbot breakage. · b458dc2e
Adrian Prantl authored Oct 01, 2014
```
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
```
b458dc2e

Move the complex address expression out of DIVariable and into an extra · 25a7174e

Adrian Prantl authored Oct 01, 2014

argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778

25a7174e

C API: Add LLVMCloneModule() · 0a4e9a3b
Tom Stellard authored Oct 01, 2014
```
llvm-svn: 218775
```
0a4e9a3b

[SimplifyCFG] threshold for folding branches with common destination · fc029670

Jingyue Wu authored Sep 30, 2014

Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.

The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:

  __global__ void foo(int a, int b, int c, int d, int e, int n,
                      const int *input, int *output) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
    *output = sum;
  }

The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:

  sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];

Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.
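A host-side sketch of the two forms, to check that folding does not change the result. It assumes plain C++ in place of the CUDA kernel; `sum_branchy`, `sum_folded`, and `check_equivalence` are illustrative names, and no performance claim is implied by this code.

```cpp
#include <cassert>
#include <vector>

// Short-circuit form: two conditional branches sharing a destination.
int sum_branchy(int a, int b, int c, int d, int e,
                const std::vector<int> &input) {
  int sum = 0;
  for (int i = 0; i < static_cast<int>(input.size()); ++i)
    sum += (((i ^ a) > b) && (((i | c) ^ d) > e)) ? 0 : input[i];
  return sum;
}

// Folded form: both conditions are always computed and combined with a
// bitwise AND, so the loop body needs only one branch.
int sum_folded(int a, int b, int c, int d, int e,
               const std::vector<int> &input) {
  int sum = 0;
  for (int i = 0; i < static_cast<int>(input.size()); ++i) {
    bool first = (i ^ a) > b;
    bool second = ((i | c) ^ d) > e; // the '|' and '^' are the bonus work
    sum += (first & second) ? 0 : input[i];
  }
  return sum;
}

// Compare both forms over a small grid of parameters.
void check_equivalence() {
  const std::vector<int> input = {1, 2, 3, 4, 5, 6, 7, 8};
  for (int a = -2; a <= 2; ++a)
    for (int e = -2; e <= 2; ++e)
      assert(sum_branchy(a, 1, 3, 5, e, input) ==
             sum_folded(a, 1, 3, 5, e, input));
}
```

Folding is safe here because the second condition has no side effects; it only costs the extra arithmetic that the threshold is meant to bound.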

We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!

Test Plan: Added one test case to check the threshold is in effect

Reviewers: nadav, eliben, meheff, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D5529

llvm-svn: 218711

fc029670

Sep 29, 2014

Use a loop to simplify the runtime unrolling prologue. · fc02e3c3

Kevin Qin authored Sep 29, 2014

Runtime unrolling creates a prologue to execute the extra
iterations that remain when the trip count is not divisible by
the unroll factor. It generates an if-then-else sequence to jump
into a loop body that is unrolled (factor - 1) times, like

    extraiters = tripcount % loopfactor
    if (extraiters == 0) jump Loop:
    if (extraiters == loopfactor) jump L1
    if (extraiters == loopfactor-1) jump L2
    ...
    L1:  LoopBody;
    L2:  LoopBody;
    ...
    if tripcount < loopfactor jump End
    Loop:
    ...
    End:

This means that if the unroll factor is 4, the loop body is copied
7 times: 3 copies in the loop prologue and 4 in the loop.
This commit instead uses a loop to execute the extra iterations
in the prologue, like

        extraiters = tripcount % loopfactor
        if (extraiters == 0) jump Loop:
        else jump Prol
 Prol:  LoopBody;
        extraiters -= 1                 // Omitted if unroll factor is 2.
        if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
        if (tripcount < loopfactor) jump End
 Loop:
 ...
 End:

Then, when the unroll factor is 4, the loop body is copied only
5 times: 1 in the prologue loop and 4 in the original loop.
If the unroll factor is 2, no new loop is created, just
as in the original solution.
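A hand-written sketch of the resulting shape for an unroll factor of 4, using a simple array sum as the loop body. The function names are hypothetical and the code is a source-level analogy, not the output of the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: the loop before unrolling.
long sum_simple(const std::vector<int> &v) {
  long sum = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += v[i];
  return sum;
}

// Unrolled by 4 with a prologue loop: 1 body copy in the prologue,
// 4 copies in the main loop, 5 in total.
long sum_unrolled_by_4(const std::vector<int> &v) {
  const std::size_t factor = 4;
  const std::size_t tripcount = v.size();
  std::size_t extraiters = tripcount % factor;
  std::size_t i = 0;
  long sum = 0;

  // Prologue loop: one copy of the body runs the leftover iterations.
  while (extraiters != 0) {
    sum += v[i++];
    --extraiters;
  }

  // Main loop: the remaining trip count is a multiple of the factor.
  for (; i < tripcount; i += factor) {
    sum += v[i];
    sum += v[i + 1];
    sum += v[i + 2];
    sum += v[i + 3];
  }
  assert(i == tripcount);
  return sum;
}
```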

llvm-svn: 218604

fc02e3c3

Sep 24, 2014

GlobalOpt: Preserve comdats of unoptimized initializers · 78927e88

Reid Kleckner authored Sep 23, 2014

Rather than slurping in and splatting out the whole ctor list, preserve
the existing array entries without trying to understand them.  Only
remove the entries that we know we can optimize away.  This way we don't
need to wire through priority and comdats or anything else we might add.

Fixes a linker issue where the .init_array or .ctors entry would point
to discarded initialization code if the comdat group from the TU with
the faulty global_ctors entry was dropped.

llvm-svn: 218337

78927e88

Sep 17, 2014

Fixing a build error. · cf93cbb7
Chris Bieneman authored Sep 17, 2014
```
llvm-svn: 217983
```
cf93cbb7

Refactoring SimplifyLibCalls to remove static initializers and generally cleaning up the code. · ad070d05

Chris Bieneman authored Sep 17, 2014

Summary: This eliminates ~200 lines of code mostly file scoped struct definitions that were unnecessary.

Reviewers: chandlerc, resistor

Reviewed By: resistor

Subscribers: morisset, resistor, llvm-commits

Differential Revision: http://reviews.llvm.org/D5364

llvm-svn: 217982

ad070d05

Sep 15, 2014

Remove dead code in SimplifyCFG · b67140b8

Jingyue Wu authored Sep 15, 2014

Summary: UsedByBranch is always true according to how BonusInst is defined.

Test Plan:
Passes check-all, and also verified 

if (BonusInst && !UsedByBranch) {
  ...
}

is never entered during check-all.

Reviewers: resistor, nadav, jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, eliben, meheff

Differential Revision: http://reviews.llvm.org/D5324

llvm-svn: 217824

b67140b8

Sep 13, 2014

Simplify code. No functionality change. · 0bd147da

Benjamin Kramer authored Sep 13, 2014

llvm-svn: 217726

0bd147da

Sep 07, 2014

Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.) · 60db0589

Hal Finkel authored Sep 07, 2014

This change, which allows @llvm.assume to be used from within computeKnownBits
(and other associated functions in ValueTracking), adds some (optional)
parameters to computeKnownBits and friends. These functions now (optionally)
take a "context" instruction pointer, an AssumptionTracker pointer, and also a
DomTree pointer, and most of the changes are just to pass this new information
when it is easily available from InstSimplify, InstCombine, etc.

As explained below, the significant conceptual change is that known properties
of a value might depend on the control-flow location of the use (because we
care that the @llvm.assume dominates the use because assumptions have
control-flow dependencies). This means that, when we ask if bits are known in a
value, we might get different answers for different uses.

The significant changes are all in ValueTracking. Two main changes: First, as
with the rest of the code, new parameters need to be passed around. To make
this easier, I grouped them into a structure, and I made internal static
versions of the relevant functions that take this structure as a parameter. The
new code does as you might expect, it looks for @llvm.assume calls that make
use of the value we're trying to learn something about (often indirectly),
attempts to pattern match that expression, and uses the result if successful.
By making use of the AssumptionTracker, the process of finding @llvm.assume
calls is not expensive.

Part of the structure being passed around inside ValueTracking is a set of
already-considered @llvm.assume calls. This is to prevent a query using, for
example, the assume(a == b), to recurse on itself. The context and DT params
are used to find applicable assumptions. An assumption needs to dominate the
context instruction, or come after it deterministically. In this latter case we
only handle the specific case where both the assumption and the context
instruction are in the same block, and we need to exclude assumptions from
being used to simplify their own ephemeral values (those which contribute only
to the assumption) because otherwise the assumption would prove its feeding
comparison trivial and would be removed.

This commit adds the plumbing and the logic for a simple masked-bit propagation
(just enough to write a regression test). Future commits add more patterns
(and, correspondingly, more regression tests).
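One plausible reading of "masked-bit propagation", sketched outside LLVM: an assumption of the form `assume((v & mask) == value)` fixes every bit of `v` selected by `mask`. The `KnownBits` struct and `apply_masked_assume` below are simplified stand-ins invented for this example, not the types or functions used in ValueTracking.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a known-bits pair (not LLVM's data type).
struct KnownBits {
  std::uint32_t zero = 0; // bits known to be 0
  std::uint32_t one = 0;  // bits known to be 1
};

// Record the facts implied by assume((v & mask) == value).
KnownBits apply_masked_assume(KnownBits known, std::uint32_t mask,
                              std::uint32_t value) {
  // The assumption can only hold if value has no bits outside mask.
  assert((value & ~mask) == 0);
  known.zero |= mask & ~value; // masked bits that must be 0
  known.one |= mask & value;   // masked bits that must be 1
  return known;
}
```

For example, `assume((v & 0xF) == 0x4)` makes the low four bits of `v` known: bit 2 is one and bits 0, 1, and 3 are zero. In the real analysis this only applies at uses that the assumption dominates, which the sketch does not model.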

llvm-svn: 217342

60db0589

Add an Assumption-Tracking Pass · 74c2f355

Hal Finkel authored Sep 07, 2014

This adds an immutable pass, AssumptionTracker, which keeps a cache of
@llvm.assume call instructions within a module. It uses callback value handles
to keep stale functions and intrinsics out of the map, and it relies on any
code that creates new @llvm.assume calls to notify it of the new instructions.
The benefit is that code needing to find @llvm.assume intrinsics can do so
directly, without scanning the function, thus allowing the cost of @llvm.assume
handling to be negligible when none are present.

The current design is intended to be lightweight. We don't keep track of
anything until we need a list of assumptions in some function. The first time
this happens, we scan the function. After that, we add/remove @llvm.assume
calls from the cache in response to registration calls and ValueHandle
callbacks.
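The lazy scheme described above can be modeled with a toy cache. Everything here is hypothetical (`ToyFunction`, `AssumptionCacheSketch`, string opcodes), and removal via value-handle callbacks is left out; the sketch only shows the scan-once-then-register behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a function: just a list of opcode names.
struct ToyFunction {
  std::vector<std::string> instructions;
};

class AssumptionCacheSketch {
public:
  // Return the indices of "assume" instructions, scanning only on first use.
  const std::vector<std::size_t> &assumptions(const ToyFunction &f) {
    auto it = cache_.find(&f);
    if (it == cache_.end()) {
      ++scans_;
      std::vector<std::size_t> found;
      for (std::size_t i = 0; i < f.instructions.size(); ++i)
        if (f.instructions[i] == "assume")
          found.push_back(i);
      it = cache_.emplace(&f, std::move(found)).first;
    }
    return it->second;
  }

  // Code that creates a new assume must report it; the cache never rescans.
  void register_assumption(const ToyFunction &f, std::size_t index) {
    assert(index < f.instructions.size() &&
           f.instructions[index] == "assume");
    auto it = cache_.find(&f);
    if (it != cache_.end())
      it->second.push_back(index);
    // If f was never queried, the first query's scan will find the new call.
  }

  int scan_count() const { return scans_; }

private:
  std::map<const ToyFunction *, std::vector<std::size_t>> cache_;
  int scans_ = 0;
};
```

Nothing is stored for a function until it is first queried, which is what keeps the cost negligible when no assumptions are present.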

There are no new direct test cases for this pass, but because it calls its
validation function upon module finalization, we'll pick up detectable
inconsistencies from the other tests that touch @llvm.assume calls.

This pass will be used by follow-up commits that make use of @llvm.assume.

llvm-svn: 217334

74c2f355

Sep 04, 2014

Enable noalias metadata by default and swap the order of the SLP and Loop vectorizers by default. · 6b95d8ed

James Molloy authored Sep 04, 2014

After some time maturing, hopefully the flags themselves will be removed.

llvm-svn: 217144

6b95d8ed

Sep 01, 2014

Feed AA to the inliner and use AA->getModRefBehavior in AddAliasScopeMetadata · 0c083024

Hal Finkel authored Sep 01, 2014

This feeds AA through the IFI structure into the inliner so that
AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which
functions only access their arguments (instead of just hard-coding some
knowledge of memory intrinsics). Most of the information is only available from
BasicAA; this is important for preserving alias scoping information for
target-specific intrinsics when doing the noalias parameter attribute to
metadata conversion.

llvm-svn: 216866

0c083024

Fix AddAliasScopeMetadata again - alias.scope must be a complete description · cbb85f24

Hal Finkel authored Sep 01, 2014

I thought that I had fixed this problem in r216818, but I did not do a very
good job. The underlying issue is that when we add alias.scope metadata we are
asserting that this metadata completely describes the aliasing relationships
within the current aliasing scope domain, and so in the context of translating
noalias argument attributes, the pointers must all be based on noalias
arguments (as underlying objects) and have no other kind of underlying object.
In r216818 excluding appropriate accesses from getting alias.scope metadata is
done by looking for underlying objects that are not identified function-local
objects -- but that's wrong because allocas, etc. are also function-local
objects and we need to explicitly check that all underlying objects are the
noalias arguments for which we're adding metadata aliasing scopes.
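The difference between the two checks can be sketched with a toy classification of underlying objects. The enum and both predicates are assumptions made for illustration; the real logic also involves capture checks that are not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy classification of a pointer's underlying objects (hypothetical).
enum class ObjectKind { NoAliasArgument, OtherArgument, Alloca, Global };

bool is_identified_function_local(ObjectKind k) {
  return k == ObjectKind::NoAliasArgument || k == ObjectKind::Alloca;
}

// Model of the earlier check: accept any identified function-local object.
// Too permissive, because allocas pass.
bool old_check(const std::vector<ObjectKind> &objs) {
  return !objs.empty() &&
         std::all_of(objs.begin(), objs.end(), is_identified_function_local);
}

// Model of the fixed check: every underlying object must be a noalias
// argument for which scopes are being added.
bool new_check(const std::vector<ObjectKind> &objs) {
  return !objs.empty() &&
         std::all_of(objs.begin(), objs.end(), [](ObjectKind k) {
           return k == ObjectKind::NoAliasArgument;
         });
}

// An access based on an alloca is where the two checks disagree.
void check_examples() {
  const std::vector<ObjectKind> alloca_based = {ObjectKind::Alloca};
  assert(old_check(alloca_based));
  assert(!new_check(alloca_based));
}
```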

This fixes the underlying-object check for adding alias.scope metadata, and
does some refactoring of the related capture-checking eligibility logic (and
adds more comments; hopefully making everything a bit clearer).

Fixes self-hosting on x86_64 with -mllvm -enable-noalias-to-md-conversion (the
feature is still disabled by default).

llvm-svn: 216863

cbb85f24

Aug 30, 2014

Fix AddAliasScopeMetadata to not add scopes when deriving from unknown pointers · a3708df4

Hal Finkel authored Aug 30, 2014

The previous implementation of AddAliasScopeMetadata, which adds noalias
metadata to preserve noalias parameter attribute information when inlining had
a flaw: it would add alias.scope metadata to accesses which might have been
derived from pointers other than noalias function parameters. This was
incorrect because even some access known not to alias with all noalias function
parameters could easily alias with an access derived from some other pointer.
Instead, when deriving from some unknown pointer, we cannot add alias.scope
metadata at all. This fixes a miscompile of the test-suite's tramp3d-v4.
Furthermore, we cannot add alias.scope to functions unless we know they
access only argument-derived pointers (currently, we know this only for
memory intrinsics).

Also, we fix a theoretical problem with using the NoCapture attribute to skip
the capture check. This is incorrect (as explained in the comment added), but
would not matter in any code generated by Clang because we get only inferred
nocapture attributes in Clang-generated IR.

This functionality is not yet enabled by default.

llvm-svn: 216818

a3708df4

Aug 29, 2014

Fix a typo in AddAliasScopeMetadata · 2d3d6da4

Hal Finkel authored Aug 29, 2014

llvm-svn: 216741

2d3d6da4

Aug 27, 2014

Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just... · e1d12948

Craig Topper authored Aug 27, 2014

Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created.

llvm-svn: 216525

e1d12948

Aug 25, 2014

Remove dangling initializers in GlobalDCE · e2a1fa35

Bruno Cardoso Lopes authored Aug 25, 2014

GlobalDCE deletes global vars and updates their initializers to nullptr
while leaving underlying constants to be cleaned up later by its uses.
The cleanup may never happen; fix this by forcing it every time it's
safe to destroy constants.

Final patch by Rafael Espindola
http://reviews.llvm.org/D4931

<rdar://problem/17523868>

llvm-svn: 216390

e2a1fa35

Use range based for loops to avoid needing to re-mention SmallPtrSet size. · 4627679c
Craig Topper authored Aug 24, 2014
```
llvm-svn: 216351
```
4627679c

Aug 22, 2014

Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator... · 2f3f76fd

David Blaikie authored Aug 21, 2014

Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator changes to ensure discriminator changes don't introduce new DWARF DW_TAG_lexical_blocks.

This went somewhat unnoticed in the original implementation of
discriminators, but the use of DILexicalBlock to track discriminator
changes could cause instructions to end up in new, small
DW_TAG_lexical_blocks.

Instead, use DILexicalBlockFile which we already use to track file
changes without introducing new scopes, so it works well to track
discriminator changes in the same way.

llvm-svn: 216239

2f3f76fd

Aug 21, 2014

Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. · 71b7b68b

Craig Topper authored Aug 21, 2014

llvm-svn: 216158

71b7b68b

Aug 18, 2014

Revert "Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid... · 6230691c

Craig Topper authored Aug 18, 2014

Revert "Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size."

Getting a weird buildbot failure that I need to investigate.

llvm-svn: 215870

6230691c

Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. · 5229cfd1

Craig Topper authored Aug 17, 2014

llvm-svn: 215868

5229cfd1

Aug 15, 2014

Introduce a helper to combine instruction metadata. · ea46c32f

Rafael Espindola authored Aug 15, 2014

Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use
it.

Patch by Björn Steinbrink!

llvm-svn: 215723

ea46c32f

Aug 14, 2014

Copy noalias metadata from call sites to inlined instructions · 61c38612

Hal Finkel authored Aug 14, 2014

When a call site with noalias metadata is inlined, that metadata can be
propagated directly to the inlined instructions (only those that might access
memory because it is not useful on the others). Prior to inlining, the noalias
metadata could express that a call would not alias with some other memory
access, which implies that no instruction within that called function would
alias. By propagating the metadata to the inlined instructions, we preserve
that knowledge.

This should complete the enhancements requested in PR20500.

llvm-svn: 215676

61c38612

Add noalias metadata for general calls (not just memory intrinsics) during inlining · d2dee16c

Hal Finkel authored Aug 14, 2014

When preserving noalias function parameter attributes by adding noalias
metadata in the inliner, we should do this for general function calls (not just
memory intrinsics). The logic is very similar to what already existed (except
that we want to add this metadata even for functions taking no relevant
parameters). This metadata can be used by ModRef queries in the caller after
inlining.

This addresses the first part of PR20500. Adding noalias metadata during
inlining is still turned off by default.

llvm-svn: 215657

d2dee16c

Aug 13, 2014

utils: Fix segfault in flattencfg · 0cd3ec6c

Jan Vesely authored Aug 13, 2014



v2: continue iterating through the rest of the bb
    use for loop

v3: initialize FlattenCFG pass in ScalarOps
    add test

v4: split off initializing flattencfg to a separate patch
    add comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215574

0cd3ec6c