  1. Dec 28, 2017
    • Remove superfluous copies in sample profiling. · 24cb28bb
      Benjamin Kramer authored
      No functionality change intended.
      
      llvm-svn: 321530
    • Revert r321377, it causes regression to https://reviews.llvm.org/P8055. · 29697c13
      Guozhi Wei authored
      llvm-svn: 321528
    • Avoid int to string conversion in Twine or raw_ostream contexts. · 3a13ed60
      Benjamin Kramer authored
      Some output changes from uppercase hex to lowercase hex, no other functionality change intended.
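      For illustration, the kind of change involved (a sketch, not the actual diff):
      
        // Before: builds a temporary std::string just to stream a number.
        OS << std::to_string(Width);
      
        // After: raw_ostream formats integers directly.
        OS << Width;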
      
      llvm-svn: 321526
    • [RewriteStatepoints] Fix incorrect assertion · a13e163a
      Max Kazantsev authored
      `RewriteStatepointsForGC` iterates over function blocks and their predecessors
      in order of declaration. One outcome of this is that call sites are placed in
      arbitrary order, which has nothing to do with traversal order.
      
      On the other hand, function `recomputeLiveInValues` asserts that bases are
      added to `Info.PointerToBase` before their derived pointers are updated. But
      if call sites are processed in an order different from RPOT, this is not
      necessarily true. We cannot guarantee that the base was placed there before
      every pointer derived from it. All we can guarantee is that this base was
      marked as a known base by this point.
      
      This patch replaces the assertion that the base was added to the map with an
      assertion that the base was marked as a known base.
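      A sketch of the shape of the change (identifier names are illustrative, not the exact code):
      
        // Before (sketch): assumed the base had already been recorded for this
        // call site, which RPOT-independent processing cannot guarantee.
        assert(Info.PointerToBase.count(Base) && "base not in the map");
      
        // After (sketch): only require that the value was marked as a known base.
        assert(isKnownBase(Base) && "must be a known base by this point");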
      
      Differential Revision: https://reviews.llvm.org/D41593
      
      llvm-svn: 321517
    • [InstCombine] Check for isa<Instruction> before using cast<> · 472689a1
      Simon Pilgrim authored
      Protects against casts from ConstantExpr etc.
      
      Reduced from the oss-fuzz #4788 test case.
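      The guarded pattern, roughly (a sketch rather than the exact diff):
      
        // cast<Instruction>(V) asserts if V is, e.g., a ConstantExpr.
        // Checking first avoids the crash:
        if (auto *I = dyn_cast<Instruction>(V)) {
          // ... safe to use I as an Instruction here ...
        }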
      
      llvm-svn: 321515
    • Revert "[memcpyopt] Teach memcpyopt to optimize across basic blocks" · 6d31001c
      Reid Kleckner authored
      This reverts r321138. It seems there are still underlying issues with
      memdep. PR35519 seems to still be present if debug info is enabled. We
      end up losing a memcpy. Somehow, during store-to-memset merging, we
      insert the memset after the memcpy, or fail to update the memdep analysis
      to account for the newly inserted memset of a pair.
      
      Reduced test case:
      
        #include <assert.h>
        #include <stdio.h>
        #include <string>
        #include <utility>
        #include <vector>
      
        void do_push_back(
            std::vector<std::pair<std::string, std::vector<std::string>>>* crls) {
          crls->push_back(std::make_pair(std::string(), std::vector<std::string>()));
        }
      
        int __attribute__((optnone)) main() {
          // Put some data in the vector and then remove it so we take the push_back
          // fast path.
          std::vector<std::pair<std::string, std::vector<std::string>>> crl_set;
          crl_set.push_back({"asdf", {}});
          crl_set.pop_back();
          printf("first word in vector storage: %p\n", *(void**)crl_set.data());
      
          // Do the push_back which may fail to initialize the data.
          do_push_back(&crl_set);
          auto* first = &crl_set.back().first;
          printf("first word in vector storage (should be zero): %p\n",
                 *(void**)crl_set.data());
          assert(first->empty());
          puts("ok");
        }
      
      Compile with libc++, enable optimizations, and enable debug info:
      $ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib
      
      This program will assert with this change.
      
      llvm-svn: 321510
  2. Dec 23, 2017
    • [CallSiteSplitting] Remove isOrHeader restriction. · 7e932890
      Florian Hahn authored
      By following the single predecessors of the predecessors of the call
      site, we do not need to restrict the control flow.
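      Roughly, the predecessor walk looks like this (a sketch; isCandidate is a hypothetical placeholder for the pass's real checks):
      
        // Walk up through blocks that each have exactly one predecessor,
        // instead of requiring a particular CFG shape at the call site.
        BasicBlock *Pred = CallSiteBB->getSinglePredecessor();
        while (Pred && !isCandidate(Pred))
          Pred = Pred->getSinglePredecessor();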
      
      Reviewed By: junbuml, davide
      
      Differential Revision: https://reviews.llvm.org/D40729
      
      llvm-svn: 321413
    • [SCCP] Manually fold branches on undef. · 55b66343
      Davide Italiano authored
      This code was originally removed and replaced with an assertion
      because it was believed unnecessary. It turns out there was simply
      no test coverage for this case, and the constant folder doesn't
      yet know about patterns like `br undef %label1, %label2`.
      Presumably at some point the constant folder might learn about
      these patterns, but that's a broader change.
      A testcase will be added to make sure this doesn't regress again
      in the future.
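      The manual fold amounts to something like this (a sketch; the real code lives in SCCP):
      
        if (auto *BI = dyn_cast<BranchInst>(BB->getTerminator()))
          if (BI->isConditional() && isa<UndefValue>(BI->getCondition())) {
            // Branching on undef may go either way; folding to the first
            // successor is a valid choice.
            BranchInst::Create(BI->getSuccessor(0), BI);
            BI->eraseFromParent();
          }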
      
      Fixes PR35723.
      
      llvm-svn: 321402
  3. Dec 22, 2017
    • [SimplifyCFG] Don't do if-conversion if there is a long dependence chain · 33250340
      Guozhi Wei authored
      If, after if-conversion, most of the instructions in the new block form a long and slow dependence chain, the result may be slower than cmp/branch, even if the branch has a high miss rate. This is because if-conversion transforms control dependence into data dependence: control dependence can be speculated, so with a branch the second part can execute in parallel with the first part on a modern OOO processor, whereas a data dependence chain cannot.
      
      This patch checks for such a long dependence chain and gives up on if-conversion if it finds one.
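      A sketch of how such a chain can be measured (the threshold name is hypothetical):
      
        // Longest def-use chain within the block, computed in one pass.
        DenseMap<Instruction *, unsigned> Depth;
        unsigned LongestChain = 0;
        for (Instruction &I : *BB) {
          unsigned D = 0;
          for (Value *Op : I.operands())
            if (auto *OpI = dyn_cast<Instruction>(Op))
              if (OpI->getParent() == BB)
                D = std::max(D, Depth[OpI]);
          Depth[&I] = D + 1;
          LongestChain = std::max(LongestChain, D + 1);
        }
        if (LongestChain > LongDependenceThreshold) // hypothetical threshold
          return false; // give up on if-conversion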
      
      Differential Revision: https://reviews.llvm.org/D39352
      
      llvm-svn: 321377
    • Add hasProfileData() to check if a function has profile data. NFC. · a17f2205
      Easwaran Raman authored
      Summary:
      This replaces calls to getEntryCount().hasValue() with hasProfileData
      that does the same thing. This refactoring is useful to do before adding
      synthetic function entry counts but also a useful cleanup IMO even
      otherwise. I have used hasProfileData instead of hasRealProfileData as
      David had earlier suggested since I think profile implies "real" and I
      use the phrase "synthetic entry count" and not "synthetic profile count"
      but I am fine calling it hasRealProfileData if you prefer.
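      The mechanical change at each call site:
      
        // Before:
        bool HasProfile = F.getEntryCount().hasValue();
        // After:
        bool HasProfile = F.hasProfileData();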
      
      Reviewers: davidxl, silvas
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41461
      
      llvm-svn: 321331
  4. Dec 21, 2017
    • [SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking. · ad371e0c
      Michael Zolotukhin authored
      If a block has N predecessors, then the current algorithm will try to
      sink common code to this block N times (whenever we visit a
      predecessor). Every attempt to sink the common code includes going
      through all predecessors, so the complexity of the algorithm becomes
      O(N^2).
      With this patch we try to sink common code only when we visit the block
      itself. With this, the complexity goes down to O(N).
      As a side effect, the moment at which the code is sunk is slightly different from
      before (the order of simplifications has been changed), which is why I had
      to adjust two tests (note that neither of the tests is supposed to test
      SimplifyCFG; a sketch of the restructuring follows the list):
      * test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic
      the changes that previous implementation of SimplifyCFG would do.
      * test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common
      code sinking by a command line flag.
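      A sketch of the restructuring described above (function names are illustrative):
      
        bool simplifyBlock(BasicBlock *BB) {
          // New: attempt sinking only here, when visiting BB itself...
          bool Changed = sinkCommonCodeFromPredecessors(BB);
          // ...rather than once per visit of each of BB's N predecessors,
          // where every attempt rescanned all N predecessors -> O(N^2).
          return Changed;
        }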
      
      llvm-svn: 321236
  5. Dec 20, 2017
    • [ICP] Expose unconditional call promotion interface · cb35c5d5
      Matthew Simpson authored
      This patch modifies the indirect call promotion utilities by exposing and using
      an unconditional call promotion interface. The unconditional promotion
      interface (i.e., call promotion without creating an if-then-else) can be used
      if it's known that an indirect call has only one possible callee. The existing
      conditional promotion interface uses this unconditional interface to promote an
      indirect call after it has been versioned and placed within the "then" block.
      
      A consequence of unconditional promotion is that the fix-up operations for phi
      nodes in the normal destination of invoke instructions are changed. This is
      necessary because the existing implementation assumed that an invoke had been
      versioned, creating a "merge" block where a return value bitcast could be
      placed. In the new implementation, the edge between a promoted invoke's parent
      block and its normal destination is split if needed to add a bitcast for the
      return value. If the invoke is also versioned, the phi node merging the return
      value of the promoted and original invoke instructions is placed in the "merge"
      block.
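      In outline, the two interfaces compose like this (a sketch; see CallPromotionUtils for the actual signatures):
      
        // Unconditional: the indirect call is known to target DirectCallee,
        // so rewrite it in place (no if-then-else is created).
        promoteCall(CS, DirectCallee);
      
        // Conditional: version the call site, then promote the clone that
        // was placed in the "then" block using the unconditional interface.
        Instruction *ThenCall = versionCallSite(CS, DirectCallee, BranchWeights);
        promoteCall(CallSite(ThenCall), DirectCallee);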
      
      Differential Revision: https://reviews.llvm.org/D40751
      
      llvm-svn: 321210
    • [hwasan] Implement -fsanitize-recover=hwaddress. · 3fd1b1a7
      Evgeniy Stepanov authored
      Summary: Very similar to AddressSanitizer, with the exception of the error type encoding.
      
      Reviewers: kcc, alekseyshl
      
      Subscribers: cfe-commits, kubamracek, llvm-commits, hiraditya
      
      Differential Revision: https://reviews.llvm.org/D41417
      
      llvm-svn: 321203
    • [InstCombine] Add debug location to new caller. · 012c8f97
      Florian Hahn authored
      Reviewers: rnk, aprantl, majnemer
      
      Reviewed By: aprantl
      
      Differential Revision: https://reviews.llvm.org/D414
      
      llvm-svn: 321191
    • Revert r320548:[SLP] Vectorize jumbled memory loads · 3a934d6a
      Mohammad Shahid authored
      llvm-svn: 321181
    • [LV] Remove unnecessary DoExtraAnalysis guard (silent bug) · 467abe3e
      Florian Hahn authored
      canVectorize only checks whether the loop has a normalized pre-header when DoExtraAnalysis is true.
      This doesn't make sense to me because reporting analysis information shouldn't alter legality
      checks. This is probably the result of a last-minute minor change before committing (?).
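      The buggy shape, roughly (a sketch):
      
        // Before: the legality check itself was guarded by the reporting flag.
        if (DoExtraAnalysis)
          if (!TheLoop->getLoopPreheader())
            return false; // never reached when DoExtraAnalysis is false
      
        // After: the check always runs; the flag only decides whether to keep
        // analyzing to collect more remarks instead of returning early.
        if (!TheLoop->getLoopPreheader()) {
          if (!DoExtraAnalysis)
            return false;
          Result = false;
        }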
      
      Patch by Diego Caballero.
      
      Reviewed By: fhahn
      
      Differential Revision: https://reviews.llvm.org/D40973
      
      llvm-svn: 321172
    • [memcpyopt] Teach memcpyopt to optimize across basic blocks · aa392281
      Dan Gohman authored
      This teaches memcpyopt to make a non-local memdep query when a local query
      indicates that the dependency is non-local. This notably allows it to
      eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
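      A sketch of the added fallback, using MemoryDependenceAnalysis names (details elided):
      
        MemDepResult Dep = MD->getDependency(M);
        if (Dep.isNonLocal()) {
          // New: instead of giving up, ask for the non-local dependencies and
          // check whether they all resolve to one producing store/memcpy.
          SmallVector<NonLocalDepResult, 4> Deps;
          MD->getNonLocalPointerDependency(M, Deps);
          // ... proceed only if Deps identifies a single suitable def ...
        }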
      
      This is r319482 and r319483, along with fixes for PR35519: fix the 
      optimization that merges stores into memsets to preserve cached memdep
      info, and fix memdep's non-local caching strategy to not assume that larger
      queries are always more conservative than smaller ones.
      
      Fixes PR28958 and PR35519.
      
      Differential Revision: https://reviews.llvm.org/D40802
      
      llvm-svn: 321138
  6. Dec 16, 2017
    • [Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed. · 68d7f9da
      Sean Fertile authored
      If the loop operand type is i8 then there will be no residual loop for the
      unknown-size expansion. Don't create the residual-size and bytes-copied values
      when they are not needed.
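      The reasoning: with an i8 loop operand the loop copies one byte per iteration, so size % 1 == 0 and a residual loop can never execute. A sketch of the guard (names are illustrative):
      
        if (LoopOpSize != 1) {
          // Only materialize these when a remainder is actually possible.
          Value *Residual    = B.CreateURem(CopyLen, ConstantInt::get(LenTy, LoopOpSize));
          Value *BytesCopied = B.CreateSub(CopyLen, Residual);
          // ... emit the residual byte-copy loop ...
        }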
      
      llvm-svn: 320929
    • [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel · 5a0cdac1
      Sanjay Patel authored
      We want to do this for 2 reasons:
      1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
      2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.
      
      More detail about what happens in the backend:
      1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs 
         into the shift variant. That is the opposite of this IR canonicalization.
      2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs 
         into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
      3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
         into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node 
         when that's legal/custom.
      4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a 
         variety of ways.
         a. For #2, the vector path is missing the case for setlt with a '1' constant.
         b. For #3, we are missing a match for commuted versions of the shift variants.
      5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel 
         produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the 
         shift sequence when not.
      6. In the following examples with this patch applied, we may get conditional moves rather than the shift 
         produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific 
         decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.
      
      define i32 @abs_shifty(i32 %x) {
        %signbit = ashr i32 %x, 31 
        %add = add i32 %signbit, %x  
        %abs = xor i32 %signbit, %add 
        ret i32 %abs
      }
      
      define i32 @abs_cmpsubsel(i32 %x) {
        %cmp = icmp slt i32 %x, zeroinitializer
        %sub = sub i32 zeroinitializer, %x
        %abs = select i1 %cmp, i32 %sub, i32 %x
        ret i32 %abs
      }
      
      define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
        %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> 
        %add = add <4 x i32> %signbit, %x  
        %abs = xor <4 x i32> %signbit, %add 
        ret <4 x i32> %abs
      }
      
      define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
        %cmp = icmp slt <4 x i32> %x, zeroinitializer
        %sub = sub <4 x i32> zeroinitializer, %x
        %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
        ret <4 x i32> %abs
      }
      
      > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx 
      > abs_shifty:
      > 	movl	%edi, %eax
      > 	negl	%eax
      > 	cmovll	%edi, %eax
      > 	retq
      > 
      > abs_cmpsubsel:
      > 	movl	%edi, %eax
      > 	negl	%eax
      > 	cmovll	%edi, %eax
      > 	retq
      > 
      > abs_shifty_vec:
      > 	vpabsd	%xmm0, %xmm0
      > 	retq
      > 
      > abs_cmpsubsel_vec:
      > 	vpabsd	%xmm0, %xmm0
      > 	retq
      > 
      > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
      > abs_shifty:
      > 	cmp	w0, #0                  // =0
      > 	cneg	w0, w0, mi
      > 	ret
      > 
      > abs_cmpsubsel: 
      > 	cmp	w0, #0                  // =0
      > 	cneg	w0, w0, mi
      > 	ret
      >                                        
      > abs_shifty_vec: 
      > 	abs	v0.4s, v0.4s
      > 	ret
      > 
      > abs_cmpsubsel_vec: 
      > 	abs	v0.4s, v0.4s
      > 	ret
      > 
      > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le 
      > abs_shifty:  
      > 	srawi 4, 3, 31
      > 	add 3, 3, 4
      > 	xor 3, 3, 4
      > 	blr
      > 
      > abs_cmpsubsel:
      > 	srawi 4, 3, 31
      > 	add 3, 3, 4
      > 	xor 3, 3, 4
      > 	blr
      > 
      > abs_shifty_vec:   
      > 	vspltisw 3, -16
      > 	vspltisw 4, 15
      > 	vsubuwm 3, 4, 3
      > 	vsraw 3, 2, 3
      > 	vadduwm 2, 2, 3
      > 	xxlxor 34, 34, 35
      > 	blr
      > 
      > abs_cmpsubsel_vec: 
      > 	vspltisw 3, -16
      > 	vspltisw 4, 15
      > 	vsubuwm 3, 4, 3
      > 	vsraw 3, 2, 3
      > 	vadduwm 2, 2, 3
      > 	xxlxor 34, 34, 35
      > 	blr
      >
      
      Differential Revision: https://reviews.llvm.org/D40984
      
      llvm-svn: 320921
    • [LV] Extend InstWidening with CM_Widen_Reverse · 5444f409
      Hal Finkel authored
      Changes to the original scalar loop during LV code gen cause the return value
      of Legal->isConsecutivePtr() to be inconsistent with the return value during
      legal/cost phases (further analysis and information of the bug is in D39346).
      This patch is an alternative fix to PR34965 following the CM_Widen approach
      proposed by Ayal and Gil in D39346. It extends InstWidening enum with
      CM_Widen_Reverse to properly record the widening decision for consecutive
      reverse memory accesses and, consequently, get rid of the
      Legal->isConsecutivePtr() call in LV code gen. I think this is a simpler/cleaner
      solution to PR34965 than the one in D39346.
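      The extended enum looks roughly like this (a sketch of the shape, not verbatim source):
      
        enum InstWidening {
          CM_Unknown,
          CM_Widen,         // consecutive, forward access
          CM_Widen_Reverse, // consecutive, reverse access (new in this patch)
          CM_Interleave,
          CM_GatherScatter,
          CM_Scalarize
        };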
      
      Fixes PR34965.
      
      Patch by Diego Caballero, thanks!
      
      Differential Revision: https://reviews.llvm.org/D40742
      
      llvm-svn: 320913
    • [SimplifyLibCalls] Inline calls to cabs when it's safe to do so · 2ff24731
      Hal Finkel authored
      When unsafe algebra is allowed, calls to cabs(r) can be replaced by:
      
        sqrt(creal(r)*creal(r) + cimag(r)*cimag(r))
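      In source terms the expansion corresponds to (a sketch; the pass itself operates on IR):
      
        #include <cmath>
        #include <complex>
      
        // Legal under fast-math: skips the overflow/underflow-safe scaling
        // a library cabs() implementation may perform.
        double cabs_expanded(std::complex<double> z) {
          return std::sqrt(z.real() * z.real() + z.imag() * z.imag());
        }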
      
      Patch by Paul Walker, thanks!
      
      Differential Revision: https://reviews.llvm.org/D40069
      
      llvm-svn: 320901