Commits · 582a5237f95a3852cead5208f28a84b4cab0efb2 · Roger Ferrer / llvm-epi

Feb 15, 2017

[AMDGPU] Revert failed scheduling · 582a5237

Stanislav Mekhanoshin authored Feb 15, 2017

This patch reverts a region's scheduling to the original untouched state
if we have decreased occupancy.

In addition, it switches to using the TargetRegisterInfo occupancy callback
for pressure limits instead of gradually raising limits that have just
been exceeded. We now stay with the best schedule, so we no longer
need to tolerate worsened scheduling.
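
The revert-on-regression policy described above can be sketched roughly as follows. This is a minimal standalone model, not the AMDGPU scheduler code: `Region`, `scheduleOrRevert`, and the two callbacks are hypothetical names introduced for illustration only.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a scheduling region: an ordered list of
// instruction ids. The real scheduler operates on machine instructions.
using Region = std::vector<int>;

// Runs `reschedule` on the region and keeps the result only if the occupancy
// reported by `occupancyOf` did not drop; otherwise the original order is
// restored. Returns true if the new schedule was kept.
bool scheduleOrRevert(Region &region,
                      const std::function<void(Region &)> &reschedule,
                      const std::function<unsigned(const Region &)> &occupancyOf) {
  const Region original = region;  // snapshot of the untouched order
  const unsigned before = occupancyOf(region);
  reschedule(region);
  if (occupancyOf(region) < before) {
    region = original;  // revert the failed scheduling
    return false;
  }
  return true;
}
```

Because the original order is always available to fall back on, the pressure limits in this model never need to be relaxed to accommodate a worse schedule.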

Differential Revision: https://reviews.llvm.org/D29971

llvm-svn: 295206

582a5237

Revert "[JumpThreading] Thread through guards" · 94c8d497

Anna Thomas authored Feb 15, 2017

This reverts commit r294617.

We fail on an assert while trying to get a condition from an
unconditional branch.

llvm-svn: 295200

94c8d497

[InlineFunction] use getFunction(); NFC · 288f075f
Sanjay Patel authored Feb 15, 2017
```
llvm-svn: 295185
```
288f075f
[InlineFunction] use getCaller(); NFCI · 32d753ca
Sanjay Patel authored Feb 15, 2017
```
llvm-svn: 295181
```
32d753ca
[InlineFunction] use range-for loop; NFCI · ada717e2
Sanjay Patel authored Feb 15, 2017
```
llvm-svn: 295179
```
ada717e2

[X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise ZERO inputs · 0f0e5bd3

Simon Pilgrim authored Feb 15, 2017

Add support for specifying an UNPCK input as ZERO. This particularly improves ZEXT cases with non-zero offsets.

llvm-svn: 295169

0f0e5bd3

[LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el · ec657929

Sagar Thakur authored Feb 15, 2017

Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit.

Reviewed by sdardis, dberris
Differential: D27697

llvm-svn: 295164

ec657929

Revert r295110 and r295144. · eef9b033

Daniel Jasper authored Feb 15, 2017

This fails under ASAN:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio

llvm-svn: 295162

eef9b033

[X86][AVX] Remove REX_W from AVX instructions. · b8a4f255

Ayman Musa authored Feb 15, 2017

REX_W has no meaning in VEX-encoded AVX instructions.

Differential Revision: https://reviews.llvm.org/D29894

llvm-svn: 295157

b8a4f255

[X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types · fbc7805e

Craig Topper authored Feb 15, 2017

Summary:
We don't seem to have great rules on what a valid VBROADCAST node looks like. And as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs.

As you can see from the things improved here we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast.

I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that, this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512 bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable.

This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect but was enough to fix a few test failures that were being caused.

Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases where we don't remove a useless insert into element 1 before broadcasting element 0.

Reviewers: delena, RKSimon, zvi

Reviewed By: zvi

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D28747

llvm-svn: 295155

fbc7805e

[AVX-512] Add PACKSS/PACKUS instructions to load folding tables. · ec5df5f4
Craig Topper authored Feb 15, 2017
```
llvm-svn: 295154
```
ec5df5f4

[SelectionDAGBuilder] Simplify creation of shufflevector DAG nodes where... · 96ec7a23

Craig Topper authored Feb 15, 2017

[SelectionDAGBuilder] Simplify creation of shufflevector DAG nodes where inputs are larger than the mask

Summary:
The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges, determines whether they can be used in an extract, and creates a properly aligned start index for the extract.

This range finding is unnecessary: we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each index, we can't use an extract.
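
The single-loop approach can be sketched as below. This is a simplified standalone model rather than the SelectionDAGBuilder code; `findExtractStarts` and its parameters are hypothetical, and it assumes the usual shufflevector convention that mask indices at or above the source element count refer to the second input and negative indices are undef.

```cpp
#include <array>
#include <vector>

// For a shuffle mask that is shorter than its two inputs, computes the
// aligned start index of a mask-sized extract from each input (-1 if the
// input is unused). Returns false if the mask cannot be served by extracts.
bool findExtractStarts(const std::vector<int> &mask, int srcNumElts,
                       std::array<int, 2> &start) {
  const int maskNumElts = static_cast<int>(mask.size());
  start = {-1, -1};
  for (int idx : mask) {
    if (idx < 0)
      continue;  // undef element, places no constraint
    int input = 0;
    if (idx >= srcNumElts) {  // element comes from the second input
      input = 1;
      idx -= srcNumElts;
    }
    const int aligned = idx - idx % maskNumElts;  // align down to mask size
    if (aligned + maskNumElts > srcNumElts)
      return false;  // extract would run past the end of the source
    if (start[input] >= 0 && start[input] != aligned)
      return false;  // elements come from different aligned windows
    start[input] = aligned;
  }
  return true;
}
```

For a 4-element mask `{4, 5, 6, 7}` over 8-element inputs, the first input gets start index 4 and the second stays unused; a mask such as `{3, 4, 5, 6}` straddles two aligned windows and is rejected.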

Reviewers: zvi, RKSimon

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29926

llvm-svn: 295152

96ec7a23

SimplifyCFG: Register cloned assume intrinsics with assumption cache when creating critical edge. · 0609acc1
Peter Collingbourne authored Feb 15, 2017
```
Differential Revision: https://reviews.llvm.org/D29976

llvm-svn: 295145
```
0609acc1

WholeProgramDevirt: Separate the code that applies optzns from the code that... · e2367415

Peter Collingbourne authored Feb 15, 2017

WholeProgramDevirt: Separate the code that applies optzns from the code that decides whether to apply them. NFCI.

The idea is that the apply* functions will also be called when importing
devirt optimizations.

Differential Revision: https://reviews.llvm.org/D29745

llvm-svn: 295144

e2367415

Revert r295138: Instead of a series of string operations, use snprintf(). · 4b58f577
Rui Ueyama authored Feb 15, 2017
```
This broke buildbots.

llvm-svn: 295142
```
4b58f577
Instead of a series of string operations, use snprintf(). · aae04a9a
Rui Ueyama authored Feb 15, 2017
```
llvm-svn: 295138
```
aae04a9a
Return early. NFC. · a39d148a
Rui Ueyama authored Feb 15, 2017
```
llvm-svn: 295137
```
a39d148a
Use LLVM-style naming scheme. · 789c4220
Rui Ueyama authored Feb 15, 2017
```
llvm-svn: 295136
```
789c4220

[AMDGPU] Fix MaxWorkGroupsPerCU for large workgroups · 19f98c6a

Stanislav Mekhanoshin authored Feb 15, 2017

This patch corrects the maximum workgroups per CU if we have big
workgroups (more than 128). This calculation contributes to the
occupancy calculation with respect to LDS size.

Differential Revision: https://reviews.llvm.org/D29974

llvm-svn: 295134

19f98c6a

Use LLVM-style naming scheme. · 09786c4c
Rui Ueyama authored Feb 15, 2017
```
llvm-svn: 295132
```
09786c4c
Remove useless local variable. · 143b52c5
Rui Ueyama authored Feb 15, 2017
```
llvm-svn: 295131
```
143b52c5
Split WinCOFFObjectWriter::defineSection. NFC. · 24e27b47
Rui Ueyama authored Feb 15, 2017
```
llvm-svn: 295128
```
24e27b47
Simplify WinCOFFObjectWriter by removing a template member function. · dfc8aa8e
Rui Ueyama authored Feb 14, 2017
```
llvm-svn: 295126
```
dfc8aa8e
Do not lookup a DenseMap twice using the same key. · 0fcdb48c
Rui Ueyama authored Feb 14, 2017
```
llvm-svn: 295124
```
0fcdb48c
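
The general idea of avoiding a double lookup can be illustrated as follows. This sketch uses `std::unordered_map` as a stand-in for DenseMap, and `internName` is a hypothetical helper, not code from the commit.

```cpp
#include <string>
#include <unordered_map>

// Returns the id for `name`, assigning the next free id on first sight.
// A check-then-insert sequence would hash the key twice; emplace performs
// the lookup and the conditional insertion in one operation.
int internName(std::unordered_map<std::string, int> &ids,
               const std::string &name) {
  auto result = ids.emplace(name, static_cast<int>(ids.size()));
  return result.first->second;  // existing id, or the one just inserted
}
```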
Use endian::write32le instead of endian::write. · 86e3ef92
Rui Ueyama authored Feb 14, 2017
```
llvm-svn: 295120
```
86e3ef92
Use zero-initialization instead of memset. · cbb4e7c1
Rui Ueyama authored Feb 14, 2017
```
llvm-svn: 295119
```
cbb4e7c1
[libFuzzer] increase the size of FixedWord from 27 to 64, see PR31950 · 32c5004c
Kostya Serebryany authored Feb 14, 2017
```
llvm-svn: 295117
```
32c5004c

Feb 14, 2017

Fix a bug in caller's BFI update code after inlining. · 5a12f236

Easwaran Raman authored Feb 14, 2017

Multiple blocks in the callee can be mapped to a single cloned block
since we prune the callee as we clone it. The existing code
iterates over the value map and clones the block frequency (and
eventually scales the frequencies of the cloned blocks). Value map's
iteration is not deterministic and so the cloned block might get the
frequency of any of the original blocks. The fix is to assign the max of
the original frequencies to the cloned block. The first block in the
sequence must have this max frequency and, in the call context,
subsequent blocks must have its frequency.
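
A rough model of the fix, assuming a simplified representation in which blocks are named by strings: `clonedFrequencies` and its parameters are hypothetical and do not mirror the inliner's actual data structures.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// `origToClone` lists (original block, cloned block) pairs in whatever order
// the value map happens to yield them; several originals may share a clone.
// Each clone receives the maximum frequency of the originals mapped to it,
// so the result does not depend on iteration order.
std::map<std::string, std::uint64_t> clonedFrequencies(
    const std::vector<std::pair<std::string, std::string>> &origToClone,
    const std::map<std::string, std::uint64_t> &origFreq) {
  std::map<std::string, std::uint64_t> cloneFreq;
  for (const auto &entry : origToClone) {
    auto it = origFreq.find(entry.first);
    if (it == origFreq.end())
      continue;  // no frequency recorded for this original block
    std::uint64_t &slot = cloneFreq[entry.second];  // starts at 0
    slot = std::max(slot, it->second);
  }
  return cloneFreq;
}
```

Taking the maximum is what removes the order dependence: a plain assignment in the loop would leave the clone with whichever original was visited last.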

Differential Revision: https://reviews.llvm.org/D29696

llvm-svn: 295115

5a12f236

Use "%zd" format specifier for printing number of testcases executed. · ae579a79

Kostya Serebryany authored Feb 14, 2017

Summary:
This helps to avoid signed integer overflow after running a fast fuzz target for several hours, e.g.:

<...>
Done -1097903291 runs in 54001 second(s)
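
The negative count above would be consistent with a run counter of 3197064005 being printed through a 32-bit signed specifier (3197064005 - 2^32 = -1097903291). A minimal sketch of printing the counter at full width follows; `formatRuns` is a hypothetical helper, and it uses "%zu" rather than the "%zd" named in the commit title because the counter in this sketch is an unsigned `size_t`.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats the summary line with a size_t-width specifier so that large run
// counts are not squeezed through a 32-bit int.
std::string formatRuns(std::size_t runs, long seconds) {
  char buf[96];
  std::snprintf(buf, sizeof(buf), "Done %zu runs in %ld second(s)", runs,
                seconds);
  return std::string(buf);
}
```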



Reviewers: kcc

Reviewed By: kcc

Differential Revision: https://reviews.llvm.org/D29941

llvm-svn: 295112

ae579a79

[LV] Rename Induction to PrimaryInduction. NFC. · 569162fe
Michael Kuperstein authored Feb 14, 2017
```
llvm-svn: 295111
```
569162fe

WholeProgramDevirt: Change internal vcall data structures to match summary. · 534c0175

Peter Collingbourne authored Feb 14, 2017

Group calls into constant and non-constant arguments up front, and use uint64_t
instead of ConstantInt to represent constant arguments. The goal is to allow
the information from the summary to fit naturally into this data structure in
a future change (specifically, it will be added to CallSiteInfo).

This has two side effects:
- We disallow VCP for constant integer arguments of width >64 bits.
- We remove the restriction that the bitwidth of a vcall's argument and return
  types must match those of the vfunc definitions.
I don't expect either of these to matter in practice. The first case is
uncommon, and the second one will lead to UB (so we can do anything we like).
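
One way to picture the grouping is sketched below. Apart from the `CallSiteInfo` name mentioned in the message, every type and function here (`VTableSlotInfo`, `Arg`, `addCallSite`) is a hypothetical simplification, and treating constants wider than 64 bits as non-constant is an assumption about how the ">64 bits" restriction might be modelled.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical: call sites are identified by plain integer ids.
struct CallSiteInfo {
  std::vector<int> callSites;
};

// Calls are split up front: one bucket for calls with any non-constant
// argument, and one bucket per distinct list of constant arguments, keyed
// by the argument values as uint64_t.
struct VTableSlotInfo {
  CallSiteInfo nonConstArgs;
  std::map<std::vector<std::uint64_t>, CallSiteInfo> constArgs;
};

struct Arg {
  bool isConstant;
  unsigned bitWidth;
  std::uint64_t value;  // meaningful only if isConstant && bitWidth <= 64
};

void addCallSite(VTableSlotInfo &slot, int callSiteId,
                 const std::vector<Arg> &args) {
  std::vector<std::uint64_t> key;
  for (const Arg &a : args) {
    if (!a.isConstant || a.bitWidth > 64) {
      slot.nonConstArgs.callSites.push_back(callSiteId);
      return;
    }
    key.push_back(a.value);
  }
  slot.constArgs[key].callSites.push_back(callSiteId);
}
```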

Differential Revision: https://reviews.llvm.org/D29744

llvm-svn: 295110

534c0175

[mips] Correct mips16 return instructions definitions · 454f2e78

Simon Dardis authored Feb 14, 2017

Correct the definition of MIPS16 instructions that act as return instructions
so that isReturn = 1 as expected.

llvm-svn: 295109

454f2e78

[BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions... · 2e945ebb

Taewook Oh authored Feb 14, 2017

[BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions created in SplitBlockPredecessors

Summary:
When setting the debugloc for instructions created in SplitBlockPredecessors, the current implementation copies the debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call to @llvm.dbg.value, the debugloc of that instruction may point to a location outside of the block itself. For the example code

```
  1 typedef struct _node_t {
  2   struct _node_t *next;
  3 } node_t;
  4
  5 extern node_t *root;
  6
  7 int foo() {
  8   node_t *node, *tmp;
  9   int ret = 0;
 10
 11   node = tmp = root->next;
 12   while (node != root) {
 13     while (node) {
 14       tmp = node;
 15       node = node->next;
 16       ret++;
 17     }
 18   }
 19
 20   return ret;
 21 }
```

, below is the basic block corresponding to line 12 after the Reassociate expressions pass:

```
while.cond:                                       ; preds = %while.cond2, %entry
  %node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ]
  %ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ]
  tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21
  tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31
  %cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33
  br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35
```

As you can see, the first-non-phi instruction is a call to @llvm.dbg.value, and its debugloc is

```
!21 = !DILocation(line: 9, column: 7, scope: !6)
```

, which is the definition of the 'ret' variable and lies outside the scope of the basic block itself. However, the current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses the problem by picking up the debugloc from the first-non-phi-non-dbg instruction.
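
The difference between the two selection rules can be modelled with a small standalone sketch. The `Inst` type and both helper functions are hypothetical simplifications, not LLVM's `BasicBlock` API, and instructions carry only a kind and a source line.

```cpp
#include <vector>

enum class Kind { Phi, DbgValue, Other };

// Hypothetical, simplified instruction: its kind and the line of its debugloc.
struct Inst {
  Kind kind;
  int debugLine;
};

// Old rule: first instruction that is not a PHI (may be a debug intrinsic).
const Inst *firstNonPhi(const std::vector<Inst> &block) {
  for (const Inst &inst : block)
    if (inst.kind != Kind::Phi)
      return &inst;
  return nullptr;
}

// New rule: first instruction that is neither a PHI nor a debug intrinsic.
const Inst *firstNonPhiOrDbg(const std::vector<Inst> &block) {
  for (const Inst &inst : block)
    if (inst.kind != Kind::Phi && inst.kind != Kind::DbgValue)
      return &inst;
  return nullptr;
}
```

For a block shaped like `while.cond` (two PHIs, two debug intrinsics, then the compare), the old rule yields the debug intrinsic whose location is line 9, while the new rule yields the compare, which belongs to the block's own source line.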

Reviewers: dblaikie, samsonov, eugenis

Reviewed By: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29867

llvm-svn: 295106

2e945ebb

[BranchFolding] Tail common all identical unreachable blocks · a622fc9b

Reid Kleckner authored Feb 14, 2017

Summary:
Blocks ending in unreachable are typically cold because they end the
program or throw an exception, so merging them with other identical
blocks is usually profitable because it reduces the size of cold code.
MachineBlockPlacement generally does not arrange to fall through to such
blocks, so commoning these blocks will not introduce additional
unconditional branches.
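
As a loose illustration of commoning identical cold blocks, the sketch below maps each block to the one that would be kept in its place. `Block` and `commonUnreachableBlocks` are hypothetical, and comparing whole instruction lists as strings is a simplification of what a tail-merging pass actually does.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified block: instruction text plus a flag saying
// whether the block ends in unreachable.
struct Block {
  std::vector<std::string> insts;
  bool endsInUnreachable;
};

// For each block, returns the index of the block to keep in its place:
// identical unreachable-terminated blocks all map to the first one seen,
// and every other block maps to itself.
std::vector<std::size_t> commonUnreachableBlocks(
    const std::vector<Block> &blocks) {
  std::map<std::vector<std::string>, std::size_t> firstSeen;
  std::vector<std::size_t> keep(blocks.size());
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    keep[i] = i;
    if (!blocks[i].endsInUnreachable)
      continue;
    // emplace leaves the existing entry untouched if the key is present
    auto result = firstSeen.emplace(blocks[i].insts, i);
    keep[i] = result.first->second;
  }
  return keep;
}
```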

Reviewers: hans, iteratee, haicheng

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29153

llvm-svn: 295105

a622fc9b

GlobalISel: deal with new G_PTR_MASK instruction on AArch64. · 398c5f57
Tim Northover authored Feb 14, 2017
```
It's just an AND-immediate instruction for us, surprisingly simple to select.

llvm-svn: 295104
```
398c5f57

GlobalISel: introduce G_PTR_MASK to simplify alloca handling. · c2f89563

Tim Northover authored Feb 14, 2017

This instruction clears the low bits of a pointer without requiring (possibly
dodgy if pointers aren't ints) conversions to and from an integer. Since (as
far as I'm aware) all masks are statically known, the instruction takes an
immediate operand rather than a register to specify the mask.
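
The effect of such a mask can be shown on plain integers. Note that this sketch performs exactly the integer arithmetic the instruction is meant to avoid at the IR level; it only models the result. `clearLowBits` is a hypothetical helper, and interpreting the immediate as a count of low bits to clear is an assumption.

```cpp
#include <cstdint>

// Clears the `numBits` lowest bits of an address. With numBits = 4 this
// rounds the address down to a 16-byte boundary, as needed when aligning
// a dynamically sized stack allocation.
constexpr std::uintptr_t clearLowBits(std::uintptr_t addr, unsigned numBits) {
  return numBits >= sizeof(addr) * 8
             ? 0
             : addr & ~((std::uintptr_t{1} << numBits) - 1);
}
```

Since the bit count is a compile-time constant in this model, the mask folds into a single immediate, which matches the AND-immediate selection mentioned for AArch64.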

llvm-svn: 295103

c2f89563

Re-apply "[profiling] Remove dead profile name vars after emitting name data" · 55891fc7

Vedant Kumar authored Feb 14, 2017

This reverts 295092 (re-applies 295084), with a fix for dangling
references from the array of coverage names passed down from frontends.

I missed this in my initial testing because I only checked test/Profile,
and not test/CoverageMapping as well.

Original commit message:

The profile name variables passed to counter increment intrinsics are dead
after we emit the finalized name data in __llvm_prf_nm. However, we neglect to
erase these name variables. This causes huge size increases in the
__TEXT,__const section as well as slowdowns when linker dead stripping is
disabled. Some affected projects are so massive that they fail to link on
Darwin, because only the small code model is supported.

Fix the issue by throwing away the name constants as soon as we're done with
them.

Differential Revision: https://reviews.llvm.org/D29921

llvm-svn: 295099

55891fc7

Reformat slightly. · 14303d18
Eric Christopher authored Feb 14, 2017
```
llvm-svn: 295096
```
14303d18

Reapply r294532, reverted in r294787. · 399dcfaa

Wolfgang Pieb authored Feb 14, 2017

Store instructions can have more than one memory operand as a result
of optimizations that fold different stores into one.
When we identify spill instructions to generate DBG_VALUE instructions
to record the spilling of a variable, we disregard stores with 
multiple memory operands for now. We may miss some relevant spills but
the handling is a bit more complex, so we'll do it in a different patch.

This fixes PR31935.

llvm-svn: 295093

399dcfaa

Revert "[profiling] Remove dead profile name vars after emitting name data" · 27ebdf4b

Vedant Kumar authored Feb 14, 2017

This reverts commit r295084. There is a test failure on:

http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/2620/

llvm-svn: 295092

27ebdf4b