Commits · 5b4c837c588f5aeff9f10af5a79b5377fd39ed09 · Lorenzo Albano / LLVM bpEVL

Oct 13, 2015

TransformUtils: Remove implicit ilist iterator conversions, NFC · 5b4c837c

Duncan P. N. Exon Smith authored Oct 13, 2015

Continuing the work from last week to remove implicit ilist iterator
conversions.  First related commit was probably r249767, with some more
motivation in r249925.  This edition gets LLVMTransformUtils compiling
without the implicit conversions.

No functional change intended.

llvm-svn: 250142

5b4c837c

Oct 12, 2015

Update the branch weight metadata in JumpThreading pass. · 3320bcd8

Cong Hou authored Oct 12, 2015

In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
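
The scaling factor above can be sketched in isolation. This is only a minimal illustration of the arithmetic, assuming integer edge frequencies; the function name and parameters are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: scales the weight on BB->SuccBB by
// 1 - Freq(PredBB->BB) / Freq(BB->SuccBB), using integer arithmetic.
// Assumes the product below does not overflow 64 bits.
std::uint64_t scaleWeightAfterThreading(std::uint64_t weight,
                                        std::uint64_t freqPredToBB,
                                        std::uint64_t freqBBToSucc) {
  assert(freqBBToSucc > 0 && freqPredToBB <= freqBBToSucc);
  // weight * (1 - a / b) rewritten as weight * (b - a) / b to stay in integers.
  return weight * (freqBBToSucc - freqPredToBB) / freqBBToSucc;
}
```

For example, if a quarter of the frequency on BB->SuccBB came from the threaded edge, a weight of 100 becomes 75.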

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250089

3320bcd8

GlobalOpt does not treat externally_initialized globals correctly · 939724cd

Oliver Stannard authored Oct 12, 2015

GlobalOpt currently merges stores into the initialisers of internal,
externally_initialized globals, but should not do so as the value of the global
may change between the initialiser and any code in the module being run.

llvm-svn: 250035

939724cd

[LoopVectorize] Shrink integer operations into the smallest type possible · 55d633bd

James Molloy authored Oct 12, 2015

C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
type (e.g. i32) whenever arithmetic is performed on them.

For targets with native i8 or i16 operations, usually InstCombine can shrink
the arithmetic type down again. However, InstCombine refuses to create illegal
types, so for targets without i8 or i16 registers, the lengthening and
shrinking remains.

Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
their scalar equivalents do not, so during vectorization it is important to
remove these lengthens and truncates when deciding the profitability of
vectorization.

The algorithm this uses starts at truncs and icmps, trawling their use-def
chains until they terminate or instructions outside the loop are found (or
unsafe instructions like inttoptr casts are found). If the use-def chains
starting from different root instructions (truncs/icmps) meet, they are
unioned. The demanded bits of each node in the graph are ORed together to form
an overall mask of the demanded bits in the entire graph. The minimum bitwidth
that graph can be truncated to is the bitwidth minus the number of leading
zeroes in the overall mask.
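
The final step of that computation can be sketched as follows. This is a toy model, assuming the demanded bits of each node are already available as plain 64-bit masks; `minBitWidth` is a hypothetical name and the code does not use LLVM's DemandedBits API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: ORs the demanded-bits masks of every node in one
// unioned graph and returns bitWidth minus the leading zeroes of the result.
unsigned minBitWidth(const std::vector<std::uint64_t> &demandedMasks,
                     unsigned bitWidth) {
  assert(bitWidth >= 1 && bitWidth <= 64);
  std::uint64_t overall = 0;
  for (std::uint64_t mask : demandedMasks)
    overall |= mask;
  // Count zero bits from the top of the bitWidth-wide value downwards.
  unsigned leadingZeros = 0;
  for (unsigned bit = bitWidth; bit-- > 0;) {
    if (overall & (std::uint64_t{1} << bit))
      break;
    ++leadingZeros;
  }
  return bitWidth - leadingZeros;
}
```

With two i32 nodes whose demanded bits are 0x0F and 0xF0, the overall mask is 0xFF, which has 24 leading zeroes, so the graph could be truncated to 8 bits.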

The intention is that this algorithm should "first do no harm", so it will
never insert extra cast instructions. This is why the use-def graphs are
unioned, so that subgraphs with different minimum bitwidths do not need casts
inserted between them.

This algorithm works hard to reduce compile time impact. DemandedBits are only
queried if there are extends of illegal types and if a truncate to an illegal
type is seen. In the general case, this results in a simple linear scan of the
instructions in the loop.

No non-noise compile time impact was seen on a clang bootstrap build.

llvm-svn: 250032

55d633bd

Oct 11, 2015

[InstCombine][X86][XOP] Combine XOP integer vector comparisons to native IR · 1d1c56e2

Simon Pilgrim authored Oct 11, 2015

We now have lowering support for XOP PCOM/PCOMU instructions.

llvm-svn: 249977

1d1c56e2

Oct 10, 2015

[IndVars] Use `auto`; NFC · cc16ccc1

Sanjoy Das authored Oct 10, 2015

llvm-svn: 249944

cc16ccc1

Oct 09, 2015

Generalize convergent check to handle invokes as well as calls. · 97ca0f3f

Owen Anderson authored Oct 09, 2015

llvm-svn: 249892

97ca0f3f

Teach LoopUnswitch not to perform non-trivial unswitching on loops containing... · 2c9978b1

Owen Anderson authored Oct 09, 2015

Teach LoopUnswitch not to perform non-trivial unswitching on loops containing convergent operations.

Doing so could cause the post-unswitching convergent ops to be
control-dependent on the unswitch condition where they were not before.
This check could be refined to allow unswitching where the convergent
operation was already control-dependent on the unswitch condition.

llvm-svn: 249874

2c9978b1

Refine the definition of convergent to only disallow the addition of new control dependencies. · d95b08a0

Owen Anderson authored Oct 09, 2015

This covers the common case of operations that cannot be sunk.
Operations that cannot be hoisted should already be handled properly via
the safe-to-speculate rules and mechanisms.

llvm-svn: 249865

d95b08a0

Make HeaderLineno a local variable. · 41dc5a6e

Dehao Chen authored Oct 09, 2015

http://reviews.llvm.org/D13576

As we are using a hierarchical profile, there is no need to keep HeaderLineno as a member variable, because each level of the inline stack has its own header lineno. Code should use the header lineno of its own inline stack level instead of that of the actual symbol.

llvm-svn: 249848

41dc5a6e

[MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls. · 99493df2

Andrea Di Biagio authored Oct 09, 2015

Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
As a consequence, adjacent nontemporal stores are always merged into a
memset call.

Example:

;;;
define void @foo(<4 x float>* nocapture %p) {
entry:
  store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
  %p1 = getelementptr inbounds <4 x float>, <4 x float>* %p, i64 1
  store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
  ret void
}

!0 = !{i32 1}
;;;

In this example, the two nontemporal stores are combined to a memset of zero
which does not preserve the nontemporal hint. Later on the backend (tested on a
x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
aligned vector stores.

opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -

Before:
  xorps  %xmm0, %xmm0
  movaps  %xmm0, 16(%rdi)
  movaps  %xmm0, (%rdi)

With this patch, we no longer merge nontemporal stores into calls to memset.
In this example, llc correctly expands the two stores into two movntps:
  xorps  %xmm0, %xmm0
  movntps %xmm0, 16(%rdi)
  movntps  %xmm0, (%rdi)

In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
calls. However, a change like that would only have the effect of forcing the
backend to expand !nontemporal memsets back to sequences of store instructions.
A memset library call would not have exactly the same semantics as a builtin
!nontemporal memset call. So, SelectionDAG would have to conservatively expand
it back to a sequence of !nontemporal stores (effectively undoing the merging).

Differential Revision: http://reviews.llvm.org/D13519

llvm-svn: 249820

99493df2

[EarlyCSE] Address post commit review for r249523. · 859b2ac0

Arnaud A. de Grandmaison authored Oct 09, 2015

llvm-svn: 249814

859b2ac0

[RS4GC] Refactoring to make a later change easier, NFCI · 3c520a12

Sanjoy Das authored Oct 08, 2015

Summary:
These non-semantic changes will help make a later change adding
support for deopt operand bundles more streamlined.

Reviewers: reames, swaroop.sridhar

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13491

llvm-svn: 249779

3c520a12

[PlaceSafepoints] Extract out `callsGCLeafFunction`, NFC · c21a05a3

Sanjoy Das authored Oct 08, 2015

Summary:
This will be used in a later change to RewriteStatepointsForGC.

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13490

llvm-svn: 249777

c21a05a3

[RS4GC] Don't copy ADTs unnecessarily, NFCI · 1ede5367

Sanjoy Das authored Oct 08, 2015

Summary: Use `const auto &` instead of `auto` in `makeStatepointExplicit`.
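
The difference between the two spellings can be shown with a small standalone sketch. The `Tracked` type and the counting helpers are hypothetical and only illustrate the general C++ behaviour; they are not code from `makeStatepointExplicit`.

```cpp
#include <cassert>
#include <vector>

// Global counter bumped by every copy construction of Tracked.
static int g_copies = 0;

// Hypothetical element type that records how often it is copied.
struct Tracked {
  Tracked() = default;
  Tracked(const Tracked &) { ++g_copies; }
};

// `auto` deduces a value type, so each iteration copies the element.
int countCopiesByValue(const std::vector<Tracked> &items) {
  const int before = g_copies;
  for (auto item : items)
    (void)item;
  return g_copies - before;
}

// `const auto &` binds a reference, so no element is copied.
int countCopiesByConstRef(const std::vector<Tracked> &items) {
  const int before = g_copies;
  for (const auto &item : items)
    (void)item;
  return g_copies - before;
}
```

Iterating over three elements makes three copies with `auto` and none with `const auto &`.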

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13454

llvm-svn: 249776

1ede5367

Oct 08, 2015

New MSan mapping layout (llvm part). · d12212bc

Evgeniy Stepanov authored Oct 08, 2015

This is an implementation of
https://github.com/google/sanitizers/issues/579

It has a number of advantages over the current mapping:
* Works for non-PIE executables.
* Does not require ASLR; as a consequence, debugging MSan programs in
  gdb no longer requires "set disable-randomization off".
* Supports Linux kernels >=4.1.2.
* The code is marginally faster and smaller.

This is an ABI break. We never really promised ABI stability, but
this patch includes a courtesy escape hatch: a compile-time macro
that reverts back to the old mapping layout.

llvm-svn: 249753

d12212bc

Add Triple::isAndroid(). · 5fe279e7

Evgeniy Stepanov authored Oct 08, 2015

This is a simple refactoring that replaces Triple.getEnvironment()
checks for Android with Triple.isAndroid().

llvm-svn: 249750

5fe279e7

[InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886) · f61a08fb

Sanjay Patel authored Oct 08, 2015

This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886

Without this IR transform, the backend (x86 at least) was producing inefficient code.

This patch is making 2 assumptions:

    1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
    2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.
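
The equivalence behind the transform can be checked outside the compiler. The sketch below assumes a 32-bit IEEE-754 `float` whose high bit is the sign bit (the same assumption listed in item 2); `fabsViaSignMask` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Clears the high bit of a float's bit pattern, which yields the absolute
// value when that bit is the sign bit (assumed here, as in the patch).
float fabsViaSignMask(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits); // reinterpret the float as an integer
  bits &= 0x7FFFFFFFu;                 // mask off the sign bit
  std::memcpy(&x, &bits, sizeof x);    // reinterpret back as a float
  return x;
}
```

This is the integer-and pattern that the patch rewrites into a call to the fabs() intrinsic.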

Differential Revision: http://reviews.llvm.org/D13076

llvm-svn: 249702

f61a08fb

Oct 07, 2015

[RS4GC] Use AssertingVH for RematerializedValueMapTy, NFCI · 40bdd041

Sanjoy Das authored Oct 07, 2015

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13489

llvm-svn: 249620

40bdd041

[IndVars] Preserve LCSSA in `eliminateIdentitySCEV` · 0015e5a0

Sanjoy Das authored Oct 07, 2015

Summary:
After r249211, SCEV can see through some LCSSA phis.  Add a
`replacementPreservesLCSSAForm` check before replacing uses of these phi
nodes with a simplified use of the induction variable to avoid breaking
LCSSA.

Fixes PR25047.

Depends on D13460.

Reviewers: atrick, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13461

llvm-svn: 249575

0015e5a0

[EarlyCSE] Fix handling of target memory intrinsics for CSE'ing loads. · a6178a17

Arnaud A. de Grandmaison authored Oct 07, 2015

Summary:
Some target intrinsics can access multiple elements, using the pointer as a
base address (e.g. AArch64 ld4). When trying to CSE such instructions, we must
check that the available value comes from a compatible instruction, because the
pointer alone is not enough to determine whether the value is correct.

Reviewers: ssijaric

Subscribers: mcrosier, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D13475

llvm-svn: 249523

a6178a17

[RS4GC] Remove an unnecessary assert & related variables · 60bf3db1

Sanjoy Das authored Oct 07, 2015

I don't think this assert adds much value, and removing it and related
variables avoids an "unused variable" warning in release builds.

llvm-svn: 249511

60bf3db1

[RS4GC] Cosmetic cleanup, NFC · b40bd1a9

Sanjoy Das authored Oct 07, 2015

Summary:
A series of cosmetic cleanup changes to RewriteStatepointsForGC:

  - Rename variables to LLVM style
  - Remove some redundant asserts
  - Remove an unused `Pass *` parameter
  - Remove unnecessary variables
  - Use C++11 idioms where applicable
  - Pass CallSite by value, not reference

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13370

llvm-svn: 249508

b40bd1a9

InstCombine: Fold comparisons between unguessable allocas and other pointers · f1f36517

Hans Wennborg authored Oct 07, 2015

This will allow us to optimize code such as:

  int f(int *p) {
    int x;
    return p == &x;
  }

as well as:

  int *allocate(void);
  int f() {
    int x;
    int *p = allocate();
    return p == &x;
  }

The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, comparison should return true even though p is an invalid
pointer.

This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.

Differential Revision: http://reviews.llvm.org/D13358

llvm-svn: 249490

f1f36517

Fix Clang-tidy modernize-use-nullptr warnings in source directories and... · 083ca9bb

Hans Wennborg authored Oct 06, 2015

Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups.

Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D13321

llvm-svn: 249482

083ca9bb

Oct 06, 2015

[IndVars] Don't break dominance in `eliminateIdentitySCEV` · 5c8bead4

Sanjoy Das authored Oct 06, 2015

Summary:
After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and
Y are related in the dominator tree, even if X is an operand to Y (I've
included a toy example in comments, and a real example as a test case).

This commit changes `SimplifyIndVar` to require a `DominatorTree`.  I
don't think this is a problem because `ScalarEvolution` requires it
anyway.

Fixes PR25051.

Depends on D13459.

Reviewers: atrick, hfinkel

Subscribers: joker.eph, llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13460

llvm-svn: 249471

5c8bead4

[IndVars] Extract out eliminateIdentitySCEV, NFC · 088bb0ea

Sanjoy Das authored Oct 06, 2015

Summary:
Reflow a comment while at it.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13459

llvm-svn: 249470

088bb0ea

[WinEH] Recognize CoreCLR personality function · 2afea543

Joseph Tremoulet authored Oct 06, 2015

Summary:
 - Add CoreCLR to if/else ladders and switches as appropriate.
 - Rename isMSVCEHPersonality to isFuncletEHPersonality to better
   reflect what it captures.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13449

llvm-svn: 249455

2afea543

[EarlyCSE] Constify ParseMemoryInst methods (NFC). · 6fd488b1

Arnaud A. de Grandmaison authored Oct 06, 2015

llvm-svn: 249400

6fd488b1

[InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector... · 40f59e44

Andrea Di Biagio authored Oct 06, 2015

[InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922)

If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.

Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.

This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.

Fixes PR24922.

Differential Revision: http://reviews.llvm.org/D13219

llvm-svn: 249390

40f59e44

Oct 05, 2015

[msan] Correct a typo in poison stack pattern command line description. · 670abcfd

Evgeniy Stepanov authored Oct 05, 2015

Patch by Jon Eyolfson.

llvm-svn: 249331

670abcfd

MergeFunctions: Clear GlobalNumbers ValueMap · 0591c5d7

Arnold Schwaighofer authored Oct 05, 2015

Otherwise, the map will observe changes as long as MergeFunctions is alive. This
is bad because follow-up passes could replace-all-uses-with on the key of an
entry in the map. The value handle callback of ValueMap however asserts that the
key type matches.

rdar://22971893

llvm-svn: 249327

0591c5d7

Oct 03, 2015

invariant.group handling in GVN · dc9b2cfc

Piotr Padlewski authored Oct 02, 2015

The most important part required to make clang
devirtualization work ( ͡°͜ʖ ͡°).
The code is able to find non-local dependencies, but unfortunately,
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.

http://reviews.llvm.org/D12992

llvm-svn: 249196

dc9b2cfc

Oct 02, 2015

[SimplifyLibCalls] Fix instruction misplacement in string/memory libcall optimization · b491a2d6

Bruno Cardoso Lopes authored Oct 01, 2015

When trying to optimize fortified library functions use the right
location to insert new instructions in order to preserve correct
def-use order.

This fixes an issue where a misplaced instruction definition would
happen to be *after* one of its uses after a RAUW, forming invalid IR.
This behavior was introduced by r227250.

Differential Revision: http://reviews.llvm.org/D13301

rdar://problem/22802369

llvm-svn: 249092

b491a2d6

Oct 01, 2015

[InstCombine] Remove trivially empty lifetime start/end ranges. · 849f3bf8

Arnaud A. de Grandmaison authored Oct 01, 2015

Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:

    void foo(char *, char *);
    void bar(int Size, bool flag) {
      for (int i = 0; i < Size; ++i) {
        char text[1];
        char buff[1];
        if (flag)
          foo(text, buff); // BBFoo
      }
    }

the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.

This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).
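
The rule described above can be sketched on a toy instruction list. Everything here (`Kind`, `Inst`, `removeEmptyLifetimeRanges`) is a hypothetical model written for illustration; it is not InstCombine's implementation and does not use LLVM types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction kinds; illustrative only.
enum class Kind { LifetimeStart, LifetimeEnd, DebugInfo, Other };

struct Inst {
  Kind kind;
  int slot; // stack slot a lifetime marker refers to; ignored otherwise
};

// Drops every start/end pair for the same slot when only debug info or
// other lifetime markers sit between the start and its matching end.
std::vector<Inst> removeEmptyLifetimeRanges(const std::vector<Inst> &block) {
  std::vector<bool> dead(block.size(), false);
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i].kind != Kind::LifetimeStart)
      continue;
    for (std::size_t j = i + 1; j < block.size(); ++j) {
      if (block[j].kind == Kind::Other)
        break; // a real instruction is inside the range, so it is not empty
      if (block[j].kind == Kind::LifetimeEnd && !dead[j] &&
          block[j].slot == block[i].slot) {
        dead[i] = true;
        dead[j] = true;
        break;
      }
    }
  }
  std::vector<Inst> kept;
  for (std::size_t i = 0; i < block.size(); ++i)
    if (!dead[i])
      kept.push_back(block[i]);
  return kept;
}
```

In this model, a range that contains a call such as `foo(text, buff)` is kept, while a range left empty after the call's block was removed is dropped.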

This fixes PR24598: excessive compile time after r234581.

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13305

llvm-svn: 249018

849f3bf8

[NaryReassociate] SeenExprs records WeakVH · df1a1b11

Jingyue Wu authored Oct 01, 2015

Summary:
The instructions SeenExprs records may be deleted during rewriting.
FindClosestMatchingDominator should ignore these deleted instructions.

Fixes PR24301.

Reviewers: grosser

Subscribers: grosser, llvm-commits

Differential Revision: http://reviews.llvm.org/D13315

llvm-svn: 248983

df1a1b11

Update sample profile propagation algorithm. · 7c41dd64

Dehao Chen authored Oct 01, 2015

http://reviews.llvm.org/D13218

llvm-svn: 248968

7c41dd64

Sep 30, 2015

[SLP] Don't vectorize loads of non-packed types (like i1, i2). · fc783e91

Michael Zolotukhin authored Sep 30, 2015

Summary:
Given an array of i2 elements, 4 consecutive scalar loads will be lowered to
i8-sized loads and thus will access 4 consecutive bytes in memory. If we
vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in
memory. Hence, we should prohibit vectorization in such cases.
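
The byte counts in that argument can be reproduced with a few lines of arithmetic. This is a simplified sketch that assumes each load touches its bit width rounded up to whole bytes; the helper names are hypothetical and the actual check in the patch may be expressed differently.

```cpp
#include <cassert>

// Bytes touched by one load of a value that is `bits` wide, rounded up.
unsigned storeSizeInBytes(unsigned bits) { return (bits + 7) / 8; }

// `count` consecutive scalar loads each touch their own rounded-up size.
unsigned bytesTouchedByScalarLoads(unsigned elemBits, unsigned count) {
  return count * storeSizeInBytes(elemBits);
}

// One vector load touches the rounded-up size of the whole vector.
unsigned bytesTouchedByVectorLoad(unsigned elemBits, unsigned count) {
  return storeSizeInBytes(elemBits * count);
}

// Vectorizing is only sound in this model if both forms touch the same bytes.
bool loadsCoverSameBytes(unsigned elemBits, unsigned count) {
  return bytesTouchedByScalarLoads(elemBits, count) ==
         bytesTouchedByVectorLoad(elemBits, count);
}
```

For four i2 elements the scalar loads touch 4 bytes while the <4 x i2> load touches 1 byte, so the two forms disagree; for i8 elements they match.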

PS: Initial patch was proposed by Arnold.

Reviewers: aschwaighofer, nadav, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13277

llvm-svn: 248943

fc783e91

Fix debug info with SafeStack. · f608111d

Evgeniy Stepanov authored Sep 30, 2015

llvm-svn: 248933

f608111d

DeadCodeElimination: rewrite to be faster · b0c6d917

Fiona Glaser authored Sep 30, 2015

Same strategy as simplifyInstructionsInBlock. ~1/3 less time
on my test suite. This pass doesn't have many in-tree users,
but getting rid of an O(N^2) worst case and making it cleaner
should at least make it a viable alternative to ADCE, since
it's now consistently somewhat faster.

llvm-svn: 248927

b0c6d917