Commits · 54dbcfe5f017265ae829f054f28745b2680a0fa2 · Roger Ferrer / llvm-epi

Apr 29, 2019

Fix additional cases of more than two dashes for options in tests. · 54dbcfe5
Don Hinton authored Apr 29, 2019
```
llvm-svn: 359484
```
54dbcfe5

[DAG] Refactor DAGCombiner::ReassociateOps · 82099457

Bjorn Pettersson authored Apr 29, 2019

Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.

Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that this should be NFC.
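
A minimal sketch of the shape of this refactoring, using a toy node type rather than SelectionDAG nodes. `Node`, `reassociateCommutative`, and `reassociate` here are hypothetical stand-ins; only the "try the original operand order, then the commuted order" structure is taken from the summary above.

```cpp
enum class Kind { Var, Const, AddVarConst };

// Toy node: a variable, a constant, or (var + constant).
struct Node {
  Kind kind;
  int var;   // variable id (Var, AddVarConst)
  int value; // constant value (Const) or constant addend (AddVarConst)
};

// Helper: fold (add (add x, c1), c2) -> (add x, c1 + c2).
// It only matches when the inner add is the first operand.
bool reassociateCommutative(const Node &n0, const Node &n1, Node &out) {
  if (n0.kind == Kind::AddVarConst && n1.kind == Kind::Const) {
    out = Node{Kind::AddVarConst, n0.var, n0.value + n1.value};
    return true;
  }
  return false;
}

// Driver: try the original operand order, then the commuted order.
bool reassociate(const Node &n0, const Node &n1, Node &out) {
  if (reassociateCommutative(n0, n1, out))
    return true;
  return reassociateCommutative(n1, n0, out);
}
```

Because the helper is asymmetric, the driver is what makes the fold fire regardless of which side the inner add appears on.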

Reviewers: spatel, craig.topper, tstellar

Reviewed By: spatel

Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61199

llvm-svn: 359476

82099457

[ARM] Add bitcast/extract_subvec. of fp16 vectors · d95abb17

Diogo N. Sampaio authored Apr 29, 2019

Summary:
This patch adds some basic operations for fp16
vectors, such as bitcast from fp16 to i16,
required to perform extract_subvector (also added
here) and extract_element.

Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard

Reviewed By: ostannard

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60618

llvm-svn: 359433

d95abb17

[ARM] Add v4f16 and v8f16 types to the CallingConv · 2078eb74

Diogo N. Sampaio authored Apr 29, 2019

Summary:
The Procedure Call Standard for the Arm Architecture
states that float16x4_t and float16x8_t behave just
as uint16x4_t and uint16x8_t for argument passing.
This patch adds the fp16 vectors to the
ARMCallingConv.td file.

Reviewers: miyuki, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60720

llvm-svn: 359431

2078eb74

Apr 26, 2019

[AsmPrinter] refactor to support %c w/ GlobalAddress' · 7ab164c4

Nick Desaulniers authored Apr 26, 2019

Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overridden method in each base class. A virtual
method was added to the base class for handling the generic case.

Refactors a few subclasses to support the target independent %a, %c, and
%n.

The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.

It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.

Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449

Reviewers: echristo, void

Reviewed By: void

Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60887

llvm-svn: 359337

7ab164c4

Apr 23, 2019

[ARM][FIX] Add missing f16.lane.vldN/vstN lowering · 2619f399

Diogo N. Sampaio authored Apr 23, 2019

Summary:
Add missing D and Q lane VLDSTLane lowering
for fp16 elements.

Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60874

llvm-svn: 358962

2619f399

Apr 17, 2019

[AsmPrinter] defer %c to base class for ARM, PPC, and Hexagon. NFC · a2077bab

Nick Desaulniers authored Apr 17, 2019

Summary:
None of these derived classes do anything that the base class cannot.
If we remove these case statements, then the base class can handle them
just fine.

Reviewers: peter.smith, echristo

Reviewed By: echristo

Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60803

llvm-svn: 358603

a2077bab

[ARM] tighten test checks; NFC · 1964962b
Sanjay Patel authored Apr 17, 2019
```
llvm-svn: 358594
```
1964962b

[ARM] make test checks more thorough; NFC · 1f2c81af

Sanjay Patel authored Apr 17, 2019

This will change with the proposal in D60214.
Unfortunately, the triple is not supported for auto-generation
via script, and the multiple RUN lines have diffs on this test,
but I can't tell exactly what is required by this test.
PR7162 was an assert/crash, so hopefully, this is good enough.

llvm-svn: 358587

1f2c81af

Apr 16, 2019

Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink... · 21eb771d

Hans Wennborg authored Apr 16, 2019

Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)

The original commit caused false positives from AddressSanitizer's
use-after-scope checks, which have now been fixed in r358478.

> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 358483

21eb771d

Apr 15, 2019

[GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only. · 946b1246

Amara Emerson authored Apr 15, 2019

Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.

This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.
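
The reuse described above can be pictured with a toy builder that remembers which virtual register already holds a given constant. This is only a sketch under stated assumptions: `ToyBuilder` and its members are hypothetical, and it ignores everything the real CSE has to track (types, basic blocks, debug info).

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for a machine IR builder with constant-only CSE.
class ToyBuilder {
public:
  // Returns the vreg holding `value`, reusing an existing definition if any.
  unsigned buildConstant(int64_t value) {
    auto it = constants_.find(value);
    if (it != constants_.end())
      return it->second;      // reuse: no new instruction and no COPY
    unsigned vreg = nextVReg_++;
    constants_[value] = vreg;
    ++numInstructions_;       // models emitting one constant definition
    return vreg;
  }

  unsigned numInstructions() const { return numInstructions_; }

private:
  std::map<int64_t, unsigned> constants_;
  unsigned nextVReg_ = 0;
  unsigned numInstructions_ = 0;
};
```

Requesting the same constant twice yields the same register and emits a single definition, which is where the code-size saving would come from in this model.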

I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.

Compile time:
Program                                        base   cse    diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
Geomean difference                                          -0.2%

Code size:
Program                                        base     cse      diff
test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
Geomean difference                                              -0.9%
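
For reference, a sketch of how a "Geomean difference" row can be derived from the base and cse columns, assuming it is the geometric mean of the per-program cse/base ratios expressed as a percentage. The function name is hypothetical and the tool that produced the tables may round or filter differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of cse/base ratios, returned as a percentage difference.
// Both vectors must have the same non-zero length.
double geomeanDiffPercent(const std::vector<double> &base,
                          const std::vector<double> &cse) {
  double logSum = 0.0;
  for (std::size_t i = 0; i < base.size(); ++i)
    logSum += std::log(cse[i] / base[i]);
  return (std::exp(logSum / static_cast<double>(base.size())) - 1.0) * 100.0;
}
```

Fed the ten compile-time rows above, this gives a small negative value that is consistent with the reported -0.2% after rounding.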

Differential Revision: https://reviews.llvm.org/D60580

llvm-svn: 358369

946b1246

Apr 12, 2019

[DAGCombiner] narrow shuffle of concatenated vectors · 5e4ad39a

Sanjay Patel authored Apr 12, 2019

// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)
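
A toy model of the transform above on plain integer vectors, to show how the wide mask splits into Mask0 and Mask1. This is a sketch, not the DAGCombiner code: `kUndef`, `shuffle`, and `narrowedShuffle` are hypothetical names, X and Y are assumed to have the same length n, and undef lanes are modelled as -1.

```cpp
#include <cstddef>
#include <vector>

const int kUndef = -1; // models an undef lane or an undef mask element

// Reference shuffle: mask indices select from a followed by b.
std::vector<int> shuffle(const std::vector<int> &a, const std::vector<int> &b,
                         const std::vector<int> &mask) {
  const int na = static_cast<int>(a.size());
  std::vector<int> out;
  for (int m : mask) {
    if (m == kUndef) out.push_back(kUndef);
    else if (m < na) out.push_back(a[static_cast<std::size_t>(m)]);
    else out.push_back(b[static_cast<std::size_t>(m - na)]);
  }
  return out;
}

// Narrowed form. `mask` has 2n elements with values in [0, 4n), indexing
// into concat(x, undef) followed by concat(y, undef).
std::vector<int> narrowedShuffle(const std::vector<int> &x,
                                 const std::vector<int> &y,
                                 const std::vector<int> &mask) {
  const int n = static_cast<int>(x.size());
  std::vector<int> narrowMask;
  for (int m : mask) {
    if (m >= 0 && m < n) narrowMask.push_back(m);                  // lane of x
    else if (m >= 2 * n && m < 3 * n) narrowMask.push_back(m - n); // lane of y
    else narrowMask.push_back(kUndef);                             // undef half
  }
  std::vector<int> mask0(narrowMask.begin(), narrowMask.begin() + n);
  std::vector<int> mask1(narrowMask.begin() + n, narrowMask.end());
  std::vector<int> result = shuffle(x, y, mask0);
  std::vector<int> hi = shuffle(x, y, mask1);
  result.insert(result.end(), hi.begin(), hi.end());
  return result;
}
```

Each half of the result is produced by a shuffle at the narrow width, so in this model the wide operation never has to be formed.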

The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.

The x86 changes look neutral or better. There's one test with an
extra instruction, but that could be reversed for a subtarget with
the right attributes. But by default, we want to avoid the 256-bit
op when possible (in my motivating benchmark, a handful of ymm ops
sprinkled into a sequence of xmm ops are triggering frequency
throttling on Haswell resulting in significantly worse perf).

Differential Revision: https://reviews.llvm.org/D60545

llvm-svn: 358291

5e4ad39a

Apr 10, 2019

Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg · 0861c87b

David Green authored Apr 10, 2019

Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.

I will try to follow this up with some better tests.

llvm-svn: 358113

0861c87b

[ARM] [FIX] Add missing f16 vector operations lowering · 651463e4

Diogo N. Sampaio authored Apr 10, 2019

Summary:
Add the missing <8xhalf> shufflevector pattern for the concat_vector DAG node.
Also allow <8xhalf> and <4xhalf> vldup1 operations.

These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics.

Reviewers: olista01, pbarrio, LukeGeeson, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60319

llvm-svn: 358081

651463e4

[ARM GlobalISel] Select G_FCONSTANT for VFP3 · b6e83b98

Diana Picus authored Apr 10, 2019

Make it possible to TableGen code for FCONSTS and FCONSTD.

We need to make two changes to the TableGen descriptions of vfp_f32imm
and vfp_f64imm respectively:
* add GISelPredicateCode to check that the immediate fits in 8 bits;
* extract the SDNodeXForms into separate definitions and create a
GISDNodeXFormEquiv and a custom renderer function for each of them.

There's a lot of boilerplate to get the actual value of the immediate,
but it basically just boils down to calling ARM_AM::getFP32Imm or
ARM_AM::getFP64Imm.
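
As a rough illustration of what "fits in 8 bits" means for a single-precision value, the sketch below checks for a sign bit, a 3-bit exponent, and a 4-bit fraction. The function name is hypothetical and the exact bit layout is an assumption based on the VFP modified-immediate format; the authoritative logic is ARM_AM::getFP32Imm.

```cpp
#include <cstdint>
#include <cstring>

// Returns an 8-bit immediate encoding for `f`, or -1 if `f` is not
// representable with 1 sign bit, a 3-bit exponent and a 4-bit fraction.
int encodeVfpImm32(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  uint32_t sign = bits >> 31;
  int exp = static_cast<int>((bits >> 23) & 0xff) - 127; // unbiased exponent
  uint32_t mantissa = bits & 0x7fffff;
  if (mantissa & 0x7ffff)
    return -1; // only the top 4 fraction bits may be set
  if (exp < -3 || exp > 4)
    return -1; // exponent must fit in 3 bits
  uint32_t expField = (static_cast<uint32_t>(exp + 3) & 0x7) ^ 4;
  return static_cast<int>((sign << 7) | (expField << 4) | (mantissa >> 19));
}
```

Under these assumptions, values such as 1.0 or -0.125 are encodable, while 0.0, 0.1, and 32.0 are not and would need another strategy such as a constant pool load.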

llvm-svn: 358063

b6e83b98

[ARM GlobalISel] Select G_FCONSTANT into pools · 3533ad68

Diana Picus authored Apr 10, 2019

Put all floating point constants into constant pools and load their
values from there.

llvm-svn: 358062

3533ad68

[ARM GlobalISel] Map G_FCONSTANT · 165846b0
Diana Picus authored Apr 10, 2019
```
llvm-svn: 358061
```
165846b0

Apr 09, 2019

[GlobalISel][AArch64] Allow CallLowering to handle types which are normally · 2b523f81

Amara Emerson authored Apr 09, 2019

required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able to
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened back in the appropriate way.

Differential Revision: https://reviews.llvm.org/D60425

llvm-svn: 358032

2b523f81

Apr 05, 2019

[SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCC · 17586cda

Simon Pilgrim authored Apr 05, 2019

Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).

This involves a lot of tweaking to reduced tests, as bugpoint loves to reduce fcmp arguments to undef.

Differential Revision: https://reviews.llvm.org/D60006

llvm-svn: 357765

17586cda

[SelectionDAG] Compute known bits of CopyFromReg · 0376ac1d

Piotr Sobczak authored Apr 05, 2019

Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.

This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.

Also add a missing truncation on X86, found by testing of this patch.

Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa

Reviewers: bogner, craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59535

llvm-svn: 357745

0376ac1d

Apr 04, 2019

[ARM GlobalISel] Support DBG_VALUE · 153c3887
Diana Picus authored Apr 04, 2019
```
Make sure we can map and select DBG_VALUE.

llvm-svn: 357681
```
153c3887

Revert r357452 - 'SimplifyCFG SinkCommonCodeFromPredecessors: Also sink... · 8b8a0217

David L. Jones authored Apr 04, 2019

Revert r357452 - 'SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)'

This revision causes tests to fail under ASAN. Since the cause of the failures
is not clear (could be ASAN, could be a Clang bug, could be a bug in this
revision), the safest course of action seems to be to revert while investigating.

llvm-svn: 357667

8b8a0217

Apr 03, 2019

[DAGCombiner] loosen restrictions for moving shuffles after vector binop · 00dae6b2

Sanjay Patel authored Apr 03, 2019

There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).

As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.

llvm-svn: 357580

00dae6b2

Apr 02, 2019

SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259) · b669fea4

Hans Wennborg authored Apr 02, 2019

The code was previously checking that candidates for sinking had exactly
one use or were a store instruction (which can't have uses). This meant
we could sink call instructions only if they had a use.

That limitation seemed a bit arbitrary, so this patch changes it to
"instruction has zero or one use" which seems more natural and removes
the need to special-case stores.
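
The change in the use-count rule can be sketched with a toy instruction record. `ToyInst`, `canSinkOld`, and `canSinkNew` are hypothetical names, and this models only the use-count part of the check, not the other conditions the real pass applies.

```cpp
// Toy instruction record; field names are illustrative only.
struct ToyInst {
  bool isStore;
  unsigned numUses;
};

// Old rule: exactly one use, or a store (which never has uses).
bool canSinkOld(const ToyInst &inst) {
  return inst.isStore || inst.numUses == 1;
}

// New rule: zero or one use, so stores need no special case.
bool canSinkNew(const ToyInst &inst) {
  return inst.numUses <= 1;
}
```

A call whose result is unused has zero uses, so it is rejected by the old predicate and accepted by the new one, while stores are accepted by both.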

Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 357452

b669fea4

[ARM] Optimize expressions like "return x != 0;" for Thumb1. · 3813fe0b

Eli Friedman authored Apr 02, 2019

There's an existing optimization for x != C, but somehow it was missing
a special case for 0.

While I'm here, also cleaned up the code/comments a bit: the second
value produced by the MERGE_VALUES was actually dead, since a CMOV only
produces one result.

Differential Revision: https://reviews.llvm.org/D59616

llvm-svn: 357437

3813fe0b

[ARM] Don't try to create "push {r12, lr}" in Thumb1 at -Oz. · 73af6ef2

Eli Friedman authored Apr 01, 2019

It's a little tricky to make this issue show up because
prologue/epilogue emission normally likes to push at least two
registers... but it doesn't when lr is force-spilled due to function
length.  Not sure if that really makes sense, but I decided not to touch
it for now.

Differential Revision: https://reviews.llvm.org/D59385

llvm-svn: 357436

73af6ef2

Mar 29, 2019

[ARM] Regenerate execute-only float comparison tests · a3fb3d55
Simon Pilgrim authored Mar 29, 2019
```
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357293
```
a3fb3d55

[DAGCombine] Prune unused nodes. · fe59e140

Nirav Dave authored Mar 29, 2019

Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.

Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight

Reviewed By: jyknight

Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58070

llvm-svn: 357283

fe59e140

[ARM] Regenerate vector comparison tests · b4b98a52
Simon Pilgrim authored Mar 29, 2019
```
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357281
```
b4b98a52

Mar 28, 2019

[ARM GlobalISel] Run regbankselect test for Thumb. NFCI · 13ef0c53

Diana Picus authored Mar 28, 2019

This should just work, since ARM mode and Thumb2 mode are at the same
level of support now and should map the same to GPR and FPR.

llvm-svn: 357159

13ef0c53

[ARM GlobalISel] Fix G_STORE with s1 · 52495c47

Diana Picus authored Mar 28, 2019

G_STORE for 1-bit values uses a STRBi12, which stores the whole byte.
Zero out the undefined bits before writing.

llvm-svn: 357154

52495c47

[ARM GlobalISel] Fix selection of G_SELECT · 4d512df3

Diana Picus authored Mar 28, 2019

G_SELECT uses a 1-bit scalar for the condition, and is currently
implemented with a plain CMPri against 0. This means that values such as
0x1110 are interpreted as true, when instead the higher bits should be
treated as undefined and therefore ignored. Replace the CMPri with a
TSTri against 0x1, which performs an implicit AND, yielding the expected
result.
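
The difference between the two condition checks can be modelled on a plain 32-bit register value. The function names are hypothetical; the sketch only mirrors the comparison semantics described above, not the instruction selection code.

```cpp
#include <cstdint>

// Old selection: compare the whole register against 0 (CMPri-style).
bool condViaCmp(uint32_t reg) { return reg != 0; }

// Fixed selection: test only bit 0 (TSTri against 0x1), ignoring the
// undefined upper bits of the 1-bit condition.
bool condViaTst(uint32_t reg) { return (reg & 0x1u) != 0; }
```

For the value 0x1110 mentioned above, the compare-based check reports true while the bit test reports false, because bit 0 is clear.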

llvm-svn: 357153

4d512df3

Mar 27, 2019

Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes." · c6dfaa0e
Nirav Dave authored Mar 27, 2019
```
This patch appears to trigger very large compile time increases in
halide builds.

llvm-svn: 357116
```
c6dfaa0e

[ARM] Don't confuse the scheduler for very large VLDMDIA etc. · c388bfa2

Eli Friedman authored Mar 27, 2019

ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the
memory operands of load and store-multiple operations.  This doesn't
really fix the problem properly, but it's enough to prevent crashing,
at least.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 .

Differential Revision: https://reviews.llvm.org/D59834

llvm-svn: 357109

c388bfa2

Mar 26, 2019

[DAG] Avoid smart constructor-based dangling nodes. · a28c5145

Nirav Dave authored Mar 26, 2019

Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations or not fully pruning unused result values. This can
result in nodes that are never added to the worklist and therefore can
not be pruned.

Add a node inserter as the current node deleter to make sure such
nodes have the chance of being pruned.

Many minor changes, mostly positive.

llvm-svn: 356996

a28c5145

Mar 25, 2019

[ARM GlobalISel] 64-bit memops should be aligned · 254b11a0

Diana Picus authored Mar 25, 2019

We currently use only VLDR/VSTR for all 64-bit loads/stores, so the
memory operands must be word-aligned. Mark aligned operations as legal
and narrow non-aligned ones to 32 bits.

While we're here, also mark non-power-of-2 loads/stores as unsupported.

llvm-svn: 356872

254b11a0

Mar 22, 2019

[ARM] Don't form "ands" when it isn't scheduled correctly. · b906bba5

Eli Friedman authored Mar 22, 2019

In r322972/r323136, the iteration here was changed to catch cases at the
beginning of a basic block... but we accidentally deleted an important
safety check.  Restore that check to the way it was.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41116

Differential Revision: https://reviews.llvm.org/D59680

llvm-svn: 356809

b906bba5

[AArch64, ARM] Add support for Exynos M5 · 4a7739b6
Evandro Menezes authored Mar 22, 2019
```
Add Exynos M5 support and test cases.

llvm-svn: 356793
```
4a7739b6

Mar 20, 2019

[ARM] Eliminate redundant "mov rN, sp" instructions in Thumb1. · 638be660

Eli Friedman authored Mar 20, 2019

This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them
to something like "str r0, [sp]".

For regular stack variables, this optimization was already implemented:
we lower loads and stores using frame indexes, which are expanded later.
However, when constructing a call frame for a call with more than four
arguments, the existing optimization doesn't apply.  We need to use
stores which are actually relative to the current value of sp, and don't
have an associated frame index.

This patch adds a special case to handle that construct.  At the DAG
level, this is an ISD::STORE where the address is a CopyFromReg from SP
(plus a small constant offset).

This applies only to Thumb1: in Thumb2 or ARM mode, a regular store
instruction can access SP directly, so the COPY gets eliminated by
existing code.

The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related
cleanup: we shouldn't pretend that it can select anything other than
frame indexes.

Differential Revision: https://reviews.llvm.org/D59568

llvm-svn: 356601

638be660

Mar 19, 2019

RegAllocFast: Remove early selection loop, the spill calculation will report... · c2e35a6f

Matt Arsenault authored Mar 19, 2019

RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs

The 2nd loop calculates spill costs but reports free registers as cost
0 anyway, so there is little benefit from having a separate early
loop.

Surprisingly this is not NFC, as many registers are marked regDisabled
so the first loop often picks up later registers unnecessarily instead
of the first one available in the allocation order...
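
A simplified sketch of a single selection loop in which free registers report cost 0. `RegState`, `spillCostOf`, and `pickRegister` are hypothetical, and the model deliberately leaves out regDisabled and register aliasing, which the real allocator has to handle.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-register state in allocation order.
struct RegState {
  bool free;
  unsigned spillCost; // cost of evicting the current value when not free
};

// Free registers cost nothing to take.
unsigned spillCostOf(const RegState &r) { return r.free ? 0 : r.spillCost; }

// Single pass over the allocation order: return the first cost-0 register,
// otherwise the cheapest one to spill. Returns -1 if the order is empty.
int pickRegister(const std::vector<RegState> &order) {
  int best = -1;
  unsigned bestCost = ~0u;
  for (std::size_t i = 0; i < order.size(); ++i) {
    unsigned c = spillCostOf(order[i]);
    if (c == 0)
      return static_cast<int>(i);
    if (c < bestCost) {
      bestCost = c;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

Since a free register already wins the cost comparison with cost 0, a separate earlier loop that only looks for free registers adds little in this model.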

Patch by Matthias Braun

llvm-svn: 356499

c2e35a6f