Commits · 275a4f76c4b9f15cc0f73c38b8f9ee12e0e477d5 · Roger Ferrer / llvm-epi

Nov 02, 2017

Irreducible loop metadata for more accurate block frequency under PGO. · dce9def3

Hiroshi Yamauchi authored Nov 02, 2017

Summary:
Currently the block frequency analysis is an approximation for irreducible
loops.

The new irreducible loop metadata is used to annotate the irreducible loop
headers with their header weights based on the PGO profile (currently this is
approximated to be evenly weighted) and to help improve the accuracy of the
block frequency analysis for irreducible loops.

This patch adds basic support for this.
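
As a rough illustration of why header weights matter: an irreducible loop has more than one header, so the analysis has to decide how to split the loop's mass among them. The sketch below is not LLVM code; `split_header_mass` and its arguments are hypothetical names, and the proportional split is an assumption about how the weights would be used. Without weights it falls back to the even split mentioned above.

```python
from fractions import Fraction


def split_header_mass(total_mass, headers, weights=None):
    """Distribute a loop's mass across its headers (hypothetical helper).

    With profile-derived header weights the split is proportional to the
    weights; without usable weights every header gets an equal share.
    """
    total_weight = sum(weights[h] for h in headers) if weights else 0
    if total_weight > 0:
        return {h: Fraction(total_mass * weights[h], total_weight)
                for h in headers}
    return {h: Fraction(total_mass, len(headers)) for h in headers}


# Even split when no profile weights are available.
even = split_header_mass(100, ["h1", "h2"])
# Proportional split when the profile says h1 is entered far more often.
weighted = split_header_mass(100, ["h1", "h2"], {"h1": 90, "h2": 10})
```

With a skewed profile the two results differ substantially, which is the inaccuracy the metadata is meant to reduce.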

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: mehdi_amini, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D39028

llvm-svn: 317278

dce9def3

[Hexagon] Prefer L2_loadrub_io over L4_loadrub_rr · 058014fc
Krzysztof Parzyszek authored Nov 02, 2017
```
If the offset is an immediate, avoid putting it in a register
to get Rs+Rt<<#0.

llvm-svn: 317275
```
058014fc

[LoopPredication] Enable predication when latchCheckIV is wider than rangeCheck · 1d02b13e

Anna Thomas authored Nov 02, 2017

Summary:
This patch allows us to predicate range checks that have a type narrower than
the latch check type. We leverage SCEV analysis to identify a truncate for the
latchLimit and latchStart.
There are also safety checks in place which require the start and limit to be
known at compile time. We require this to make sure that the SCEV truncate expr
for the IV corresponding to the latch does not cause us to lose information
about the IV range.
Added tests show the loop predication over range checks that are of various
types and are narrower than the latch type.
This enhancement has been in our downstream tree for a while.
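
The safety condition can be pictured with a small sketch. It is not the pass's implementation (which works on SCEV expressions); the helper names, the use of signed ranges, and `None` standing for "not known at compile time" are all assumptions made for illustration.

```python
def fits_in_signed(value, bits):
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def truncate_signed(value, bits):
    """Keep the low `bits` bits and reinterpret them as a signed integer."""
    low = value & ((1 << bits) - 1)
    return low - (1 << bits) if low >= (1 << (bits - 1)) else low


def can_truncate_latch_iv(start, limit, narrow_bits):
    """Hypothetical check: start/limit are None when not compile-time constants."""
    if start is None or limit is None:
        return False
    return fits_in_signed(start, narrow_bits) and fits_in_signed(limit, narrow_bits)


# A 64-bit latch IV running from 0 to 1000 survives truncation to 32 bits...
safe = can_truncate_latch_iv(0, 1000, 32)
# ...but a limit of 2**40 does not: truncation would wrap the IV.
unsafe = can_truncate_latch_iv(0, 1 << 40, 32)
wrapped = truncate_signed(1 << 32, 32)
```

When both bounds fit in the narrow type, every value the IV takes between them truncates to itself, so the narrowed range check sees the same range.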

Reviewers: apilipenko, sanjoy, mkazantsev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39500

llvm-svn: 317269

1d02b13e

[test] Move llvm-lib tests into tools/llvm-lib. NFC. · 20910f46

Martin Storsjö authored Nov 02, 2017

Similarly to SVN r317189 for llvm-dlltool, these are probably
easier to find in a tools subdirectory with a name identical to
the tool, than in a toplevel directory with a different name.

This matches the move of LibDriver itself in SVN r302995.

Differential Revision: https://reviews.llvm.org/D39531

llvm-svn: 317262

20910f46

[dsymutil][doc] Improve wording in manpage and rename file. · fb7bf1d7

Jonas Devlieghere authored Nov 02, 2017

 - Improve wording
 - Rename llvm-dsymutil to dsymutil
 - Name -arch=<arch> argument

Differential revision: https://reviews.llvm.org/D39561

llvm-svn: 317226

fb7bf1d7

Strip off invariant.start because memory locations aren't invariant · 729dafc1

Anna Thomas authored Nov 02, 2017

The original change was reverted in rL317217 because of the failure in
the RS4GC testcase. I couldn't reproduce the failure on my local machine
(macbook) but could reproduce it on a linux box.

The failure was around removing the uses of invariant.start. The fix
here is to just RAUW undef (which was the first implementation in D39388).
This is perfectly valid IR as discussed in the review.

llvm-svn: 317225

729dafc1

Revert "[RS4GC] Strip off invariant.start because memory locations aren't invariant" · ebe429d9
Anna Thomas authored Nov 02, 2017
```
This reverts commit r317215, investigating the test failure.

llvm-svn: 317217
```
ebe429d9

[RS4GC] Strip off invariant.start because memory locations aren't invariant · 486a7aaa

Anna Thomas authored Nov 02, 2017

Summary:
Invariant.start on memory locations has the property that the memory
location is unchanging. However, this is not true in the face of
rewriting statepoints for GC.
Teach RS4GC about removing invariant.start so that optimizations after
RS4GC do not incorrectly sink a load from the memory location past a
statepoint.

Added test showcasing the issue.

Reviewers: reames, apilipenko, dneilson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39388

llvm-svn: 317215

486a7aaa

Revert "[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass." · 82bade61

Clement Courbet authored Nov 02, 2017

undefined reference to `llvm::TargetPassConfig::ID' on
clang-ppc64le-linux-multistage

This reverts commit eea333c33fa73ad225ef28607795984829f65688.

llvm-svn: 317213

82bade61

[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass. · 1dc37b9c

Clement Courbet authored Nov 02, 2017

Summary:
This is mostly a noop (most of the test diffs are renamed blocks).
There are a few temporary register renames (eax<->ecx) and a few blocks are
shuffled around.

See the discussion in PR33325 for more details.

Reviewers: spatel

Subscribers: mgorny

Differential Revision: https://reviews.llvm.org/D39456

llvm-svn: 317211

1dc37b9c

[X86] Fix bug in legalize vector types - Split large loads · a37d1130

Ayman Musa authored Nov 02, 2017

When splitting a large load to smaller legally-typed loads, the last load should be padded to reach the size of the previous one so a CONCAT_VECTORS node could reunite them again.
The code currently pads the last load to reach the size of the first load (instead of the previous).

Differential Revision: https://reviews.llvm.org/D38495

Change-Id: Ib60b55ed26ce901fabf68108daf52683fbd5013f
llvm-svn: 317206

a37d1130

[mips] Use register scavenging with MSA. · 725acb2d

Simon Dardis authored Nov 02, 2017

MSA stores and loads to the stack are more likely to require an
emergency GPR spill slot due to the smaller offsets available
with those instructions.

Handle this by overestimating the size of the stack by determining
the largest offset presuming that all callee save registers are
spilled and accounting for incoming arguments when determining
whether an emergency spill slot is required.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39056

llvm-svn: 317204

725acb2d

Adding test for extract sub vector load and store avx512 · 0c20b690
Michael Zuckerman authored Nov 02, 2017
```
Change-Id: Iefcb0ec6b6aa1b530ce5358081f02e6e522a8e50
llvm-svn: 317202
```
0c20b690

Allow inaccessiblememonly and inaccessiblemem_or_argmemonly to be overwritten... · 6fefc0d6

Yichao Yu authored Nov 02, 2017

Allow inaccessiblememonly and inaccessiblemem_or_argmemonly to be overwritten on call site with operand bundle

Summary:
Similar to argmemonly, readonly and readnone.

Fix PR35128

Reviewers: andrew.w.kaylor, chandlerc, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D39434

llvm-svn: 317201

6fefc0d6

[AsmPrinterDwarf] Add support for .cfi_restore directive · 66d2c269

Francis Visoiu Mistrih authored Nov 02, 2017

As of today we only use .cfi_offset to specify the offset of a CSR, but
we never use .cfi_restore when the CSR is restored.

If we want to perform a more advanced type of shrink-wrapping, we need
to use .cfi_restore in order to switch the CFI state between blocks.

This patch only aims at adding support for the directive.

Differential Revision: https://reviews.llvm.org/D36114

llvm-svn: 317199

66d2c269

[SimplifyCFG] Discard speculated dbg intrinsics · e73b85d1

Bjorn Pettersson authored Nov 02, 2017

Summary:
SpeculativelyExecuteBB can flatten the CFG by doing
speculative execution followed by a select instruction.
When the speculatively executed BB contained dbg intrinsics
the result could be a little bit weird, since those dbg
intrinsics were inserted before the select in the flattened
CFG. So when single stepping in the debugger, printing the
value of the variable referenced in the dbg intrinsic, it
could happen that it looked like the variable had values
that never actually were assigned to the variable.

This patch simply discards all dbg intrinsics that were found
in the speculatively executed BB.

Reviewers: aprantl, chandlerc, craig.topper

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39494

llvm-svn: 317198

e73b85d1

[ARM] and, or, xor and add with shl combine · 242052c6

Sam Parker authored Nov 02, 2017

The generic dag combiner will fold:

(shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
(shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)

This can create constants which are too large to use as an immediate.
Many ALU operations are also capable of performing the shl, so we can
unfold the transformation to prevent a mov imm instruction from being
generated.

Other patterns, such as b + ((a << 1) | 510), can also be simplified
in the same manner.
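
The identities involved can be checked with plain 32-bit arithmetic. The sketch below is only a worked example, not the combine itself; the function names are made up, and `is_a32_modified_immediate` encodes the A32 rule (an 8-bit value rotated right by an even amount) as an assumption about which constants count as "too large". Thumb encodings use different rules.

```python
MASK32 = 0xFFFFFFFF


def shl(x, n):
    """32-bit logical shift left."""
    return (x << n) & MASK32


def folded_add(x, c1, c2):
    """(add (shl x, c2), c1 << c2), the form the generic combiner produces."""
    return (shl(x, c2) + shl(c1, c2)) & MASK32


def unfolded_add(x, c1, c2):
    """(shl (add x, c1), c2), the form that keeps the small constant c1."""
    return shl((x + c1) & MASK32, c2)


def is_a32_modified_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    for rot in range(0, 32, 2):
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated < 256:
            return True
    return False


a, b = 3, 10
# The pattern from the message: 510 == 255 << 1, so the 'or' can be unfolded.
assert b + ((a << 1) | 510) == b + ((a | 255) << 1)
```

Here 255 satisfies the immediate rule while 510 does not, so keeping the constant unshifted avoids materializing it in a register.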

Differential Revision: https://reviews.llvm.org/D38084

llvm-svn: 317197

242052c6

The patch updates sched numbers for YMM AVX instrs such as VMOVx, VORx, VXOR,... · 3c8bf5ec

Andrew V. Tischenko authored Nov 02, 2017

The patch updates sched numbers for YMM AVX instrs such as VMOVx, VORx, VXOR, VPERMILx, VBROADCASTx, etc.
PR32857 should be closed.
Differential Revision: https://reviews.llvm.org/D39227

llvm-svn: 317196

3c8bf5ec

[test] Move llvm-dlltool tests into tools/llvm-dlltool. NFC. · 53d7ba76

Martin Storsjö authored Nov 02, 2017

A toplevel test directory DllTool isn't consistent with other
similar tools.

Differential Revision: https://reviews.llvm.org/D39513

llvm-svn: 317189

53d7ba76

[X86] Fix fast-isel-int-float-conversion test · d0f59f0a

Steven Wu authored Nov 02, 2017

Test is failing due to the revert in r317136. Fix the test to make all
the bots happy.

llvm-svn: 317153

d0f59f0a

[yaml2obj][ELF] Add support for setting alignment in program headers · 03aeeb09

Jake Ehrlich authored Nov 01, 2017

Sometimes program headers have larger alignments than any of the
sections they contain. Currently yaml2obj can't produce such files. A
bug recently appeared in llvm-objcopy that failed in such a case. I'd
like to be able to add tests to llvm-objcopy for such cases.

This change adds an optional alignment parameter to program headers that
will be used instead of calculating the alignment.

Differential Revision: https://reviews.llvm.org/D39130

llvm-svn: 317139

03aeeb09

loop-unroll: teach remapInstruction to update dbg.value intrinsics. · bfa77c4c
Adrian Prantl authored Nov 01, 2017
```
Fixes PR35112.

https://bugs.llvm.org/show_bug.cgi?id=35112

llvm-svn: 317138
```
bfa77c4c

Revert "Correct dwarf unwind information in function epilogue for X86" · bb5c84fb
Petar Jovanovic authored Nov 01, 2017
```
This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf
buildbot failure (build #15606).

llvm-svn: 317136
```
bb5c84fb

Nov 01, 2017

[LLVM-C] Expose functions to create debug locations via DIBuilder. · 789164d4

whitequark authored Nov 01, 2017

These include:
  * Several functions for creating an LLVMDIBuilder,
  * LLVMDIBuilderCreateCompileUnit,
  * LLVMDIBuilderCreateFile,
  * LLVMDIBuilderCreateDebugLocation.

Patch by Harlan Haskins.

Differential Revision: https://reviews.llvm.org/D32368

llvm-svn: 317135

789164d4

[X86][SSE] Add PACKUS support to LowerTruncate · e152c2c4

Simon Pilgrim authored Nov 01, 2017

Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.

We have to account for pre-SSE41 targets not supporting PACKUSDW.
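
To see why the leading-zero condition is needed, the sketch below models one lane of a word-to-byte unsigned-saturating pack and compares it with plain truncation. It is a scalar model for illustration only, not the lowering code, and the helper names are hypothetical.

```python
def packus_lane(word):
    """Saturate one signed 16-bit lane (given as a bit pattern) to 0..255."""
    signed = word - 0x10000 if word & 0x8000 else word
    return max(0, min(255, signed))


def truncate_lane(word):
    """Plain truncation: keep the low 8 bits."""
    return word & 0xFF


def packus_matches_truncate(words):
    """True if packing with unsigned saturation equals truncating every lane."""
    return all(packus_lane(w) == truncate_lane(w) for w in words)


# Upper 8 bits known to be zero: saturation never triggers.
ok = packus_matches_truncate([0, 1, 200, 255])
# 0x0100 saturates to 255 but truncates to 0, so the results diverge.
bad = packus_matches_truncate([0x0100])
```

Only when the bits above the truncated width are zero does the saturating pack produce the same result as a truncate.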

llvm-svn: 317128

e152c2c4

[X86] Add custom code to EVEX to VEX pass to turn unmasked 128-bit VPALIGND/Q... · 4e56ba27

Craig Topper authored Nov 01, 2017

[X86] Add custom code to EVEX to VEX pass to turn unmasked 128-bit VPALIGND/Q into VPALIGNR if the extended registers aren't being used.

This will enable us to prefer VALIGND/Q during shuffle lowering in order to get the extended register encoding space when BWI isn't available. But if we end up not using the extended registers we can switch VPALIGNR for the shorter VEX encoding.

Differential Revision: https://reviews.llvm.org/D39401

llvm-svn: 317122

4e56ba27

loop-rotate: avoid duplicating dbg.value intrinsics in the entry block. · 98c6549e
Adrian Prantl authored Nov 01, 2017
```
This fixes the second half of PR35113.

This reapplies r317106 without modifications.

llvm-svn: 317121
```
98c6549e

loop-rotate: eliminate duplicate debug intrinsics after splicing. · d60f34c2

Adrian Prantl authored Nov 01, 2017

Fixes part of PR35113.

This reapplies r317105 with an additional check for isa<Instruction>
as found by the bots.

llvm-svn: 317120

d60f34c2

Include GUIDs from the same module when computing GUIDs that need to be imported. · c6c051f2

Dehao Chen authored Nov 01, 2017

Summary: In the compile phase of SamplePGO+ThinLTO, ICP is not invoked. Instead, indirect call targets will be included as function metadata for ThinIndex to build the call graph. This should not only include functions defined in other modules, but also functions defined in the same module, otherwise ThinIndex may find the callee dead and eliminate it, while ICP in the backend will revive the symbol, which leads to an undefined symbol.

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: sanjoy, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D39480

llvm-svn: 317118

c6c051f2

[globalisel][tablegen] Add support for multi-insn emission · 9cbe7c7f

Daniel Sanders authored Nov 01, 2017

The importer will now accept nested instructions in the result pattern such as
(ADDWrr $a, (SUBWrr $b, $c)). This is only valid when the nested instruction
def's a single vreg and the parent instruction consumes a single vreg where a
nested instruction is specified. The importer will automatically create a vreg
to connect the two using the type information from the pattern. This vreg will
be constrained to the register classes given in the instruction definitions*.

* REG_SEQUENCE is explicitly rejected because of this. The definition doesn't
  constrain to a register class and it therefore needs special handling.
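
The effect of accepting a nested result pattern can be sketched as a tree flattening: the inner instruction is emitted first into a fresh temporary, which the outer instruction then consumes. This is a toy model, not the TableGen importer; the tuple encoding, the `%tmpN` naming, and both function names are assumptions, and register-class constraints are left out.

```python
from itertools import count


def emit(pattern, dst, out, fresh):
    """Emit `pattern` into `dst`, emitting nested instructions first."""
    opcode, *operands = pattern
    flat_operands = []
    for operand in operands:
        if isinstance(operand, tuple):
            # A nested instruction defines one temporary vreg for its parent.
            tmp = f"%tmp{next(fresh)}"
            emit(operand, tmp, out, fresh)
            flat_operands.append(tmp)
        else:
            flat_operands.append(operand)
    out.append((dst, opcode, flat_operands))


def flatten_pattern(pattern, dst="%dst"):
    """Return the instruction list for a possibly nested result pattern."""
    out = []
    emit(pattern, dst, out, count())
    return out


insns = flatten_pattern(("ADDWrr", "$a", ("SUBWrr", "$b", "$c")))
```

For the pattern from the message this yields a SUBWrr defining `%tmp0` followed by an ADDWrr that reads it.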

llvm-svn: 317117

9cbe7c7f

[X86] Prevent fast isel from folding loads into the instructions listed in hasPartialRegUpdate. · ca1aa83c

Craig Topper authored Nov 01, 2017

This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses.

We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch.

Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want.

Differential Revision: https://reviews.llvm.org/D39402

llvm-svn: 317112

ca1aa83c

Adds code to PPC ISEL lowering to recognize half-word inserts from... · 67152614

Graham Yiu authored Nov 01, 2017

Adds code to PPC ISEL lowering to recognize half-word inserts from vector_shuffles, and use P9 shift and vector insert instructions instead of vperm.

Differential Revision: https://reviews.llvm.org/D34160

llvm-svn: 317111

67152614

Revert r317105 to investigate bot breakage. · c8516346
Adrian Prantl authored Nov 01, 2017
```
llvm-svn: 317110
```
c8516346

Revert r317106 to facilitate reverting r317105. · 40a0ea5f
Adrian Prantl authored Nov 01, 2017
```
llvm-svn: 317109
```
40a0ea5f

LTO: Apply global DCE to ThinLTO modules at LTO opt level 0. · 9fb6e1a0

Peter Collingbourne authored Nov 01, 2017

This is necessary because DCE is applied to full LTO modules. Without
this change, a reference from a dead ThinLTO global to a dead full
LTO global will result in an undefined reference at link time.

This problem is only observable when --gc-sections is disabled, or
when targeting COFF, as the COFF port of lld requires all symbols to
have a definition even if all references are dead (this is consistent
with link.exe).

This change also adds an EliminateAvailableExternally pass at -O0. This
is necessary to handle the situation on Windows where a non-prevailing
copy of a linkonce_odr function has an SEH filter function; any
such filters must be DCE'd because they will contain a call to the
llvm.localrecover intrinsic, passing as an argument the address of the
function that the filter belongs to, and llvm.localrecover requires
this function to be defined locally.

Fixes PR35142.

Differential Revision: https://reviews.llvm.org/D39484

llvm-svn: 317108

9fb6e1a0

[X86] Regenerate test to attempt to fix build bot failure. · 56db9d6b
Craig Topper authored Nov 01, 2017
```
llvm-svn: 317107
```
56db9d6b

loop-rotate: avoid duplicating dbg.value intrinsics in the entry block. · 9259f216
Adrian Prantl authored Nov 01, 2017
```
This fixes the second half of PR35113.

llvm-svn: 317106
```
9259f216

loop-rotate: eliminate duplicate debug intrinsics after splicing. · b627acd0
Adrian Prantl authored Nov 01, 2017
```
Fixes part of PR35113.

llvm-svn: 317105
```
b627acd0

[dsymutil][NFC] Rename thread related command line options · 369a7ecc

Jonas Devlieghere authored Nov 01, 2017

This makes the command line options consistent with llvm-cov and
llvm-profdata, which both use `-num-threads` and `-j`.

This also addresses the conflict reported after landing D39355.

Differential revision: https://reviews.llvm.org/D39496

llvm-svn: 317104

369a7ecc

[X86] Add 64-bit int to float/double conversion with AVX to X86FastISel::X86SelectSIToFP · 5ae677e1

Craig Topper authored Nov 01, 2017

Summary:
[X86] Teach fast isel to handle i64 sitofp with AVX.

For some reason we only handled i32 sitofp with AVX. But with SSE only we support i64 so we should do the same with AVX.

Also add i686 command lines for the 32-bit tests. 64-bit tests are in a separate file to avoid a fast-isel abort failure in 32-bit mode.

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39450

llvm-svn: 317102

5ae677e1