Commits · 705e26a24310d046bd6c3d29b72f30af045e4a1f · Lorenzo Albano / LLVM bpEVL

Feb 01, 2018

Test commit: Fix a comment. · 705e26a2
Yvan Roux authored Feb 01, 2018
```
llvm-svn: 323947
```

[XRay][compiler-rt+llvm] Update XRay register stashing semantics · cdca0730

Dean Michael Berris authored Feb 01, 2018

Summary:
This change expands the number of registers stashed by the entry and
`__xray_CustomEvent` trampolines.

We've found that `__xray_CustomEvent` trampoline calls can show up in
situations where the scratch registers are in use. Since we don't
typically want to affect the code-gen around the disabled
`__xray_customevent(...)` intrinsic calls, we need to save and restore the
state of even the scratch registers when handling these custom events.

Reviewers: pcc, pelikan, dblaikie, eizan, kpw, echristo, chandlerc

Reviewed By: echristo

Subscribers: chandlerc, echristo, hiraditya, davide, dblaikie, llvm-commits

Differential Revision: https://reviews.llvm.org/D40894

llvm-svn: 323940


Jan 31, 2018

Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations" · 7746899f
Evgeniy Stepanov authored Jan 31, 2018
```
Miscompiles code. Testcase pending.

This reverts commit r323869.

llvm-svn: 323929
```
AMDGPU: Fix missing SCC def from s_xor_b64_term · af88f0eb
Matt Arsenault authored Jan 31, 2018
```
llvm-svn: 323927
```

[X86] Make the type checks in detectAVX512USatPattern more robust · e44faf53

Craig Topper authored Jan 31, 2018

This code currently uses isSimple and getSizeInBits in an attempt to prune types. But isSimple will return true for any type that any target supports natively. I don't think that's a good way to prune types. I also don't think the dest element type checks are very robust since we didn't do an isSimple check on the dest type.

This patch adds a check for the input type being legal to the one caller that didn't already check that. Then we explicitly check that the destination element type is i8, i16, or i32.
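A minimal C++ sketch of the two checks described above. `VecTy`, `isUSatDestEltOK` and `mayMatchAVX512USat` are hypothetical names, not the X86ISelLowering code from D42706; type legality is modeled as a plain flag rather than a target query.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for an LLVM vector type: only the facts
// the two checks need.
struct VecTy {
  unsigned EltBits;  // element width in bits
  bool IsLegal;      // in LLVM this would come from a target legality query
};

// Explicit destination check: the element type must be i8, i16 or i32.
bool isUSatDestEltOK(const VecTy &Dst) {
  return Dst.EltBits == 8 || Dst.EltBits == 16 || Dst.EltBits == 32;
}

// Hypothetical matcher guard: a legal input type plus an accepted
// destination element type, instead of relying on isSimple().
bool mayMatchAVX512USat(const VecTy &In, const VecTy &Dst) {
  if (!In.IsLegal)
    return false;
  return isUSatDestEltOK(Dst);
}
```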

Differential Revision: https://reviews.llvm.org/D42706

llvm-svn: 323924


[Hexagon] Rename HexagonISelLowering::getNode to getInstr, NFC · 15efa98f
Krzysztof Parzyszek authored Jan 31, 2018
```
llvm-svn: 323916
```

[x86] Make the retpoline thunk insertion a machine function pass. · 0dcee4fe

Chandler Carruth authored Jan 31, 2018

Summary:
This removes the need for a machine module pass using some deeply
questionable hacks. This should address PR36123 which is a case where in
full LTO the memory usage of a machine module pass actually ended up
being significant.

We should revert this on trunk as soon as we understand and fix the
memory usage issue, but we should include this in any backports of
retpolines themselves.

Reviewers: echristo, MatzeB

Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42726

llvm-svn: 323915


[Hexagon] Implement HVX codegen for vector shifts · 1108ee24
Krzysztof Parzyszek authored Jan 31, 2018
```
llvm-svn: 323914
```
[Hexagon] Handle ANY_EXTEND_VECTOR_INREG in lowering · 9eb085e6
Krzysztof Parzyszek authored Jan 31, 2018
```
llvm-svn: 323912
```
[Hexagon] Handle SETCC on vector pairs in lowering · b843f751
Krzysztof Parzyszek authored Jan 31, 2018
```
llvm-svn: 323911
```

AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9 · d4bb329d

Marek Olsak authored Jan 31, 2018

Summary:
This enables load merging into x2, x4, which is driven by inline offsets.

6500 shaders are affected:
Code Size in affected shaders: -15.14 %

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42078

llvm-svn: 323909


AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} · 13e47412

Marek Olsak authored Jan 31, 2018

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

llvm-svn: 323908


[WebAssembly] MC: Remove unused code for handling of wasm globals · 6e7f1826

Sam Clegg authored Jan 31, 2018

For now, we are not using wasm globals, except for modeling of
the stack pointer.

Also, factor out a common struct, WasmGlobalType, which matches the
name for that tuple in the Wasm spec, and rename methods
to "isBindingGlobal" and "isTypeGlobal" to avoid ambiguity.

Patch by Nicholas Wilson!

Differential Revision: https://reviews.llvm.org/D42750

llvm-svn: 323901


[X86] Generate testl instruction through truncates. · f9a9e9a2

Amaury Sechet authored Jan 31, 2018

Summary:
This was introduced in D42646 but ended up being reverted because the original implementation was buggy.

Depends on D42646

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42741

llvm-svn: 323899


[Hexagon] Handle BUILD_VECTOR from undef values in buildHvxVectorReg · 82a83391
Krzysztof Parzyszek authored Jan 31, 2018
```
llvm-svn: 323889
```

[X86] Avoid using high register trick for test instruction · f89f188d

Amaury Sechet authored Jan 31, 2018

Summary:
It seems its main effect is to create additional copies when values are in registers that do not support this trick, which increases register pressure and makes the code bigger.

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42646

llvm-svn: 323888


[Hexagon] Only process bitcasts of vsplats when selecting const vectors · 8cc636c5

Krzysztof Parzyszek authored Jan 31, 2018

Selection of constant HVX vectors involves some "manual processing",
which mishandled an unrelated BITCAST operation, causing a selection
error.

llvm-svn: 323887


Fix formatting for r323876. NFC · 12ed95e3
Diana Picus authored Jan 31, 2018
```
llvm-svn: 323878
```

[ARM GlobalISel] Modernize LegalizerInfo. NFCI · 1d4421f6

Diana Picus authored Jan 31, 2018

Start using the new LegalizerInfo API introduced in r323681.

Keep the old API for opcodes that need Lowering in some circumstances
(G_FNEG and G_UREM/G_SREM).

llvm-svn: 323876


[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations · 2e442a78

Pablo Barrio authored Jan 31, 2018

Summary:
Expressions of the form x < 0 ? 0 : x and x < -1 ? -1 : x can be lowered using bit operations instead of branches or conditional moves.

In Thumb mode this results in a two-instruction sequence, a shift followed by a BIC or ORR, while in ARM/Thumb2 mode, which has a flexible second operand, the shift can be folded into a single BIC/ORR instruction. In most cases this results in smaller code and possibly fewer branches, and in no case is the code larger than before.
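The bit-operation identities can be cross-checked in plain C++. This is a sketch of the arithmetic only, not the DAG lowering from D42574 (which was reverted in r323929 for miscompiling code); it assumes 32-bit two's-complement integers and an arithmetic right shift of negative values.

```cpp
#include <cassert>
#include <cstdint>

// Assumes x >> 31 is an arithmetic shift for negative x (guaranteed since
// C++20, and what mainstream compilers do in practice).

// x < 0 ? 0 : x   ==   x & ~(x >> 31)    (ASR #31, then BIC)
int32_t satLowerTo0(int32_t x) {
  int32_t sign = x >> 31;  // all ones when x < 0, zero otherwise
  return x & ~sign;
}

// x < -1 ? -1 : x   ==   x | (x >> 31)   (ASR #31, then ORR)
int32_t satLowerToMinus1(int32_t x) {
  int32_t sign = x >> 31;  // all ones when x < 0, zero otherwise
  return x | sign;
}

// Branchy reference versions, useful for cross-checking the bit tricks.
int32_t refLowerTo0(int32_t x) { return x < 0 ? 0 : x; }
int32_t refLowerToMinus1(int32_t x) { return x < -1 ? -1 : x; }
```

The x = -1 case is why the second pattern uses ORR: -1 | -1 is still -1, so the identity holds at the boundary without a separate comparison.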

Patch by Marten Svanfeldt.

Reviewers: fhahn, pbarrio

Reviewed By: pbarrio

Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42574

llvm-svn: 323869


[SystemZ] Check the bitwidth before calling isInt/isUInt. · cc5fe736

Jonas Paulsson authored Jan 31, 2018

Since these methods will assert if the integer does not fit into 64 bits,
it is necessary to do this check before calling them in
supportedAddressingMode().
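A sketch of the guard pattern in C++. `WideConst`, `fitsSigned` and `isLegalDisp20` are hypothetical: the 64-bit accessor stands in for the call that asserts on wide values, and the 20-bit signed field is only an example range, not a copy of supportedAddressingMode().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an arbitrary-width integer constant.
struct WideConst {
  unsigned BitWidth;
  int64_t Low64;  // meaningful only when BitWidth <= 64
  int64_t getSExtValue() const {
    assert(BitWidth <= 64 && "value does not fit in 64 bits");
    return Low64;
  }
};

// Same shape as llvm::isInt<N>: does V fit in an N-bit signed field?
template <unsigned N> bool fitsSigned(int64_t V) {
  static_assert(N > 0 && N < 64, "sketch only handles 1..63 bits");
  return V >= -(int64_t(1) << (N - 1)) && V < (int64_t(1) << (N - 1));
}

// Check the width first, so getSExtValue() is never reached for wide values.
bool isLegalDisp20(const WideConst &C) {
  return C.BitWidth <= 64 && fitsSigned<20>(C.getSExtValue());
}
```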

Review: Ulrich Weigand.
llvm-svn: 323866


[ARM] Armv8.2-A FP16 code generation (part 2/3) · 98d5359e

Sjoerd Meijer authored Jan 31, 2018

Half-precision arguments and return values are passed as if they were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.

Differential Revision: https://reviews.llvm.org/D42580

llvm-svn: 323861


[PowerPC] Return true in enableMultipleCopyHints(). · e6a8329e

Jonas Paulsson authored Jan 31, 2018

Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Nemanja Ivanovic
llvm-svn: 323858


[ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR ↔ GPR. · aea42087

Roger Ferrer Ibanez authored Jan 31, 2018

In Thumb 1, with the new ADDCARRY / SUBCARRY, the scheduler may need to copy
between CPSR and GPR, but not all Thumb1 targets implement such copies.

Before attempting a copy, the scheduler can try to clone the instructions,
but it currently does not do that for nodes with input glue. This patch
introduces a target hook that lets the target decide whether a glued machine
node is still eligible for copying. In this case those nodes are ARM::tADCS
and ARM::tSBCS.

As a follow-up to this change, we should implement the copies for the
Thumb1 targets that support them and restrict the hook to the targets that
cannot do such copies, as these clones are not ideal.

This change fixes PR35836.

Differential Revision: https://reviews.llvm.org/D42051

llvm-svn: 323857


[RDF] Clear the renamable flag when copy propagating reserved registers · 11985643
Krzysztof Parzyszek authored Jan 30, 2018
```
llvm-svn: 323831
```

Jan 30, 2018

[Hexagon] Handle truncates in polynomial multiply idiom recognition · 5d9844f4

Krzysztof Parzyszek authored Jan 30, 2018

This is in anticipation of https://reviews.llvm.org/D42424, which would
otherwise break one of the pmpy testcases.

llvm-svn: 323824


[X86] Remove redundant check for hasAVX512 before calling hasBWI. NFC · d759f476
Craig Topper authored Jan 30, 2018
```
hasBWI implies hasAVX512.

llvm-svn: 323823
```
[AArch64] Properly handle dllimport of variables when using fast-isel · 708498a1
Martin Storsjö authored Jan 30, 2018
```
Differential Revision: https://reviews.llvm.org/D42567

llvm-svn: 323810
```

[Hexagon] Handle non-aligned offsets in globals in extender optimization · 39a9842f

Krzysztof Parzyszek authored Jan 30, 2018

Instructions like memd(r0+##global+1) are legal as long as the entire
address is properly aligned. Assuming that "global" is aligned at an
8-byte boundary, the expression "global+1" appears to be misaligned.
Handle such cases in HexagonConstExtenders, and make sure that any non-
extended offsets generated are still aligned accordingly.
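To make the alignment argument concrete: alignment is a property of the whole sum, so an odd addend is fine when the base register compensates. A minimal C++ sketch with made-up addresses (a hypothetical 8-byte-aligned `global` at 0x1000 and r0 = 0x2007); `isAligned` and `effectiveAddress` are illustrative helpers, not HexagonConstExtenders code.

```cpp
#include <cassert>
#include <cstdint>

// True if Addr is a multiple of Align (Align must be a power of two).
bool isAligned(uint32_t Addr, uint32_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two");
  return (Addr & (Align - 1)) == 0;
}

// Effective address of memd(r0+##global+Offset): base + global + offset,
// with ordinary 32-bit wraparound.
uint32_t effectiveAddress(uint32_t Base, uint32_t Global, int32_t Offset) {
  return Base + Global + static_cast<uint32_t>(Offset);
}
```

With these values, global+1 = 0x1001 is misaligned on its own, but 0x2007 + 0x1000 + 1 = 0x3008 is a multiple of 8, so the 8-byte access is valid.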

llvm-svn: 323799


Revert: [Hexagon] Make sure that offset on globals matches alignment requirements · 96a28411

Krzysztof Parzyszek authored Jan 30, 2018

This reverts r323562, since it wasn't actually necessary. Constant-
extended offsets do not need to be aligned, as long as the effective
address is aligned.

Keep the testcase, with a modification which checks that such offsets
are not unnecessarily avoided.

llvm-svn: 323798


[X86][XOP] Update isVectorShiftByScalarCheap with cases covered by XOP · 073f089c

Simon Pilgrim authored Jan 30, 2018

Similar to D42437, XOP supports variable shifts for v16i8/v8i16/v4i32/v2i64 types.

Differential Revision: https://reviews.llvm.org/D42526

llvm-svn: 323797


[AMDGPU] isRenamable fixes to support copy forwarding · 1d531013

Geoff Berry authored Jan 30, 2018

Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will
be marked as not renamable, to avoid copy forwarding violating the
constraint that only one operand may use the constant bus.
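A simplified C++ model of the constraint, for illustration. If an instruction reads an SGPR and a VGPR that was defined by a COPY from another SGPR, forwarding the copy would make it read two SGPRs, i.e. two constant-bus operands. `Opnd` and `fitsConstantBus` are hypothetical, and the counting rule is deliberately simplified.

```cpp
#include <cassert>
#include <vector>

// Hypothetical operand classification; real code inspects register classes
// and operand types.
enum class Opnd { VGPR, SGPR, Literal, InlineImm };

// Simplified rule: at most one operand may read the constant bus. Assumes
// each SGPR or literal operand counts once and ignores refinements such as
// repeated reads of the same SGPR.
bool fitsConstantBus(const std::vector<Opnd> &Ops) {
  int BusReads = 0;
  for (Opnd O : Ops)
    if (O == Opnd::SGPR || O == Opnd::Literal)
      ++BusReads;
  return BusReads <= 1;
}
```

For example, `{Opnd::SGPR, Opnd::VGPR}` fits, but after forwarding the copy it becomes `{Opnd::SGPR, Opnd::SGPR}`, which does not; marking such operands non-renamable keeps the forwarding from happening.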

These changes fix a few mis-compiles when copy forwarding is enabled in
MachineCopyPropagation by D41835 (and were reviewed as part of that change).

llvm-svn: 323794


[AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output." · 94ae3b2f

Mark Searles authored Jan 30, 2018

Patch caused a buildbot failure; see http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/steps/build_Lld/logs/stdio :
        /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable]
          static int32_t InstCnt = 0;

This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc.

llvm-svn: 323791


[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output. · d6d5a257

Mark Searles authored Jan 30, 2018

-amdgpu-waitcnt-forcezero={1|0}  Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n>  Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n>   Force emit a s_waitcnt vmcnt(0) before the first <n> instrs

This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch).

Differential Revision: https://reviews.llvm.org/D40091

llvm-svn: 323788


AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace. · 0905870f
Changpeng Fang authored Jan 30, 2018
```
Reviewer:
  Dmitry (dp).

Differential Revision:
  https://reviews.llvm.org/D42596

llvm-svn: 323785
```

[AArch64] Add new target feature to fuse address generation with load or store · f1d01645

Evandro Menezes authored Jan 30, 2018

This feature enables the fusion of the address generation and a
corresponding load or store together.

Differential revision: https://reviews.llvm.org/D42393

llvm-svn: 323782


[mips] Fix incorrect sign extension for fpowi libcall · daaeaba6

Simon Dardis authored Jan 30, 2018

PR36061 showed that during the expansion of ISD::FPOWI there
was an incorrect zero extension of the integer argument, which for
MIPS64 would then give incorrect results. Address this with the
existing mechanism for correcting sign extensions.
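Why the kind of extension matters can be shown in a few lines of C++: a negative exponent such as -1, widened to 64 bits by zero extension, becomes 4294967295, while sign extension preserves -1. The helper names are hypothetical; this illustrates the arithmetic, not the MIPS lowering code from D42537.

```cpp
#include <cassert>
#include <cstdint>

// Widen a 32-bit int argument to 64 bits by filling the upper half with zeros.
int64_t zeroExtend32(int32_t V) {
  return static_cast<int64_t>(static_cast<uint32_t>(V));
}

// Widen a 32-bit int argument to 64 bits by replicating the sign bit.
int64_t signExtend32(int32_t V) {
  return static_cast<int64_t>(V);
}
```

For non-negative exponents both helpers agree, which is why the bug only shows up with negative integer arguments to powi.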

This resolves PR36061.

Thanks to James Cowgill for reporting the issue!

Reviewers: atanasyan, hfinkel

Differential Revision: https://reviews.llvm.org/D42537

llvm-svn: 323781


Re-commit: [PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute. · 1f59ae31

Zaara Syeda authored Jan 30, 2018

This recommits r322721 reverted due to sanitizer memory leak build bot failures.

Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 323778


[AArch64] Add new target feature to handle cheap as move for Exynos · 07c78eee

Evandro Menezes authored Jan 30, 2018

This feature enables special handling of cheap as move in the existing
custom handling specifically for Exynos processors.

Differential revision: https://reviews.llvm.org/D42387

llvm-svn: 323774


[AArch64] Add pipeline model for Exynos M3 · 9f9daa1f

Evandro Menezes authored Jan 30, 2018

Add the scheduling and cost model for Exynos M3.

Differential revision: https://reviews.llvm.org/D42387

llvm-svn: 323773
