- Sep 25, 2018
-
-
Simon Pilgrim authored
As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead. The uops are slightly different from the register variant, so this requires a +1uop tweak. llvm-svn: 342969
-
David Green authored
In this patch, I'm adding an extra check on the Latch's terminator in llvm::UnrollRuntimeLoopRemainder, similar to how it is already done in llvm::UnrollLoop. The compiler would crash if this function were called with a malformed loop. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D51486 llvm-svn: 342958
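A minimal sketch of the kind of latch-terminator guard being described (this is not the patch itself; it assumes the check resembles the existing one in llvm::UnrollLoop):

```cpp
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

// Bail out instead of crashing when the loop latch is missing or does not end
// in a conditional branch (a malformed loop for the remainder transform).
static bool latchIsSupported(const Loop *L) {
  BasicBlock *Latch = L->getLoopLatch();
  if (!Latch)
    return false;
  auto *BI = dyn_cast<BranchInst>(Latch->getTerminator());
  return BI && BI->isConditional();
}
```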
-
Sameer Sahasrabuddhe authored
[AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. llvm-svn: 342956
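A hypothetical source-level example (not taken from the patch or its tests) of how such a switch arises, where the default is provably unreachable because one of the cases is always taken:

```cpp
// 'v' is known by the caller to be < 3, so the default case can never run and
// lowers to an 'unreachable' block, leaving the switch without an immediate
// post-dominator for the legacy divergence analysis to use.
int pick(unsigned v) {
  switch (v) {
  case 0: return 10;
  case 1: return 20;
  case 2: return 30;
  default: __builtin_unreachable();
  }
}
```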
-
Clement Courbet authored
We also need to make sure that we're on the right subtarget. llvm-svn: 342955
-
Clement Courbet authored
Summary: Right now we only have unit tests. This will allow testing the whole tool. Even though we can't really check actual values, this will avoid regressions such as PR39055. Reviewers: gchatelet, alexshap Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52407 llvm-svn: 342953
-
Hsiangkai Wang authored
In some scenarios, LLVM will remove llvm.dbg.label intrinsics from the IR. For example, when the labels are in unreachable blocks, they are removed and will not appear in the generated LLVM IR. In that case, the debug labels end up with zero as their address, which is not a legal address for a debugger to set breakpoints at or to query sources from. So the patch suppresses the address info (DW_AT_low_pc) of removed labels. Differential Revision: https://reviews.llvm.org/D51908 llvm-svn: 342943
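A hypothetical C++ example of how this situation can arise (not taken from the patch's tests): the label lands in a block the optimizer deletes, so its llvm.dbg.label never survives into the final IR, and before this patch the corresponding DW_TAG_label kept an address of zero:

```cpp
// The condition is always true, so everything after the early return is
// unreachable and gets removed, taking the 'skipped' label with it.
int clamp_to_self(int x) {
  if (x >= 0 || x < 0)
    return x;
skipped:
  return -1;
}
```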
-
Thomas Lively authored
Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52387 llvm-svn: 342937
-
Stanislav Mekhanoshin authored
The check for the assignment of zero is practically useless, since the assignment moves around with different scheduling. llvm-svn: 342935
-
Craig Topper authored
The included test case previously asserted because the type legalizer tried to soften the FILD ISD node. Fixes PR38819. llvm-svn: 342934
-
Thomas Lively authored
Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52388 llvm-svn: 342928
-
Evgeniy Stepanov authored
Summary: Display a list of recent stack frames (not a stack trace!) when tag-mismatch is detected on a stack address. The implementation uses alignment tricks to get both the address of the history buffer, and the base address of the shadow with a single 8-byte load. See the comment in hwasan_thread_list.h for more details. Developed in collaboration with Kostya Serebryany. Reviewers: kcc Subscribers: srhines, kubamracek, mgorny, hiraditya, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52249 llvm-svn: 342923
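A self-contained sketch of the alignment trick in general terms (the constants, types, and names are hypothetical, not the hwasan implementation): if the history ring buffer has a power-of-two size and is aligned to that size, any pointer into it also encodes the buffer base, so a single 8-byte load of a cursor yields both values:

```cpp
#include <cstdint>

// Hypothetical constants: a 4 KiB ring buffer aligned to its own size.
constexpr uintptr_t kSizeLog = 12;
constexpr uintptr_t kSizeMask = (uintptr_t{1} << kSizeLog) - 1;

struct RingBufferRef {
  uintptr_t Base;  // start of the buffer, recovered from the cursor's high bits
  uintptr_t Pos;   // current write position
};

// One 8-byte load of 'Cursor' is enough to recover both fields.
RingBufferRef decode(uintptr_t Cursor) {
  return {Cursor & ~kSizeMask, Cursor};
}
```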
-
Evgeniy Stepanov authored
This reverts commit r342921: test failures on clang-cmake-arm* bots. llvm-svn: 342922
-
- Sep 24, 2018
-
-
Evgeniy Stepanov authored
Summary: Display a list of recent stack frames (not a stack trace!) when tag-mismatch is detected on a stack address. The implementation uses alignment tricks to get both the address of the history buffer, and the base address of the shadow with a single 8-byte load. See the comment in hwasan_thread_list.h for more details. Developed in collaboration with Kostya Serebryany. Reviewers: kcc Subscribers: srhines, kubamracek, mgorny, hiraditya, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52249 llvm-svn: 342921
-
Christy Lee authored
Reviewers: javed.absar, trentxintong, courbet Reviewed By: trentxintong Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52433 llvm-svn: 342919
-
Simon Pilgrim authored
The uops are slightly different from the register variant, so this requires a +1uop tweak. llvm-svn: 342916
-
Stefan Pintilie authored
Added __builtin_vsx_scalar_extract_expq and __builtin_vsx_scalar_insert_exp_qp. The builtins should behave the same way as in GCC. Differential Revision: https://reviews.llvm.org/D48185 llvm-svn: 342910
-
Simon Pilgrim authored
llvm-svn: 342908
-
Christy Lee authored
llvm-svn: 342907
-
Sanjay Patel authored
llvm-svn: 342905
-
Zhaoshi Zheng authored
A simple MOVS rd, imm8 can materialize [-128, 127] in signed i8 type or [0, 255] in unsigned i8 type on Thumb1. Differential Revision: https://reviews.llvm.org/D52257 llvm-svn: 342898
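A hypothetical helper (names invented, not from the patch) that just restates the ranges above, i.e. which constants a single MOVS rd, #imm8 can cover when the value is only consumed as an i8:

```cpp
#include <cstdint>

// True if V fits either the signed i8 range [-128, 127] or the unsigned i8
// range [0, 255], the cases a single Thumb1 "MOVS rd, #imm8" can materialize.
bool fitsInMovsImm8(int64_t V) {
  return (V >= -128 && V <= 127) || (V >= 0 && V <= 255);
}
```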
-
Fedor Sergeev authored
Implementing -print-before-all/-print-after-all/-filter-print-func support through PassInstrumentation callbacks.
- PrintIR routines implement printing callbacks.
- StandardInstrumentations class provides a central place to manage all the "standard" in-tree pass instrumentations. Currently it registers PrintIR callbacks.
Reviewers: chandlerc, paquette, philip.pfaffe Differential Revision: https://reviews.llvm.org/D50923 llvm-svn: 342896
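To illustrate only the callback mechanism (the type and member names below are invented for this sketch and are not the actual PassInstrumentation API), the idea is that instrumentations register before/after callbacks and the pass manager invokes them around each pass run, which is where the IR printing hooks in:

```cpp
#include <functional>
#include <string>
#include <vector>

// Invented names; a self-contained sketch of the callback idea, not LLVM code.
struct InstrumentationCallbacks {
  std::vector<std::function<void(const std::string &)>> BeforePass;
  std::vector<std::function<void(const std::string &)>> AfterPass;

  void runBefore(const std::string &PassName) {
    for (auto &CB : BeforePass) CB(PassName);  // e.g. print IR before the pass
  }
  void runAfter(const std::string &PassName) {
    for (auto &CB : AfterPass) CB(PassName);   // e.g. print IR after the pass
  }
};
```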
-
Simon Pilgrim authored
Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases. This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions. llvm-svn: 342892
-
Luke Cheeseman authored
- The assembler accepts VSTM/VLDM with register lists (specifically double-register lists) containing more than 16 registers
- The Arm Architecture Reference Manual says this instruction must not contain more than 16 registers when the registers are doubleword registers
- This addresses one of the concerns in https://bugs.llvm.org/show_bug.cgi?id=38389
Differential Revision: https://reviews.llvm.org/D52082 llvm-svn: 342891
-
Sanjay Patel authored
This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613

If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select.

As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities.

This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86, which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases, so someone who is involved with AArch may want to set i8/i16 to 'custom' for UADDO so that this patch fires on those tests.

Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me: if the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar to the justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place.

Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886
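As a concrete illustration (a minimal sketch; the function name is invented and is not from the patch or its tests), this is the kind of unsigned saturated-add source pattern CGP matches and converts to an overflow-checking add:

```cpp
#include <cstdint>
#include <limits>

// Unsigned saturated add: the compare on the sum can reuse the flags already
// produced by the add, so no separate cmp/cmn is needed once the sequence is
// selected as an overflow-checking (uaddo-style) add.
uint32_t sat_add(uint32_t x, uint32_t y) {
  uint32_t sum = x + y;
  return sum < x ? std::numeric_limits<uint32_t>::max() : sum;
}
```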
-
Petar Jovanovic authored
r337288 tried to fix the result of icmp i1 when its input is not sanitized by falling back to DAGISel. While it now produces the correct result for bit 0, the other bits can still hold arbitrary values, which is not supported by MipsFastISel branch lowering. This patch fixes the issue by falling back to DAGISel in this case. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D52045 llvm-svn: 342884
-
Zaara Syeda authored
gcc uses operand modifier 'x' in inline asm for VSX registers. Without this modifier, instructions which use VSX numbering for their operands are printed as VMX registers. This patch adds support for the operand modifier 'x'. Differential Revision: https://reviews.llvm.org/D52244 llvm-svn: 342882
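An illustration of the modifier in use (this assumes GCC/Clang extended inline asm on a VSX-enabled PowerPC target; the function is invented, not from the patch): writing %x0 instead of %0 makes the operand print with VSX register numbering:

```cpp
// xvadddp is a VSX instruction; with plain %0/%1/%2 the operands would be
// printed using VMX (Altivec) numbering and could name the wrong registers.
__vector double vsx_add(__vector double a, __vector double b) {
  __vector double r;
  __asm__("xvadddp %x0, %x1, %x2" : "=wa"(r) : "wa"(a), "wa"(b));
  return r;
}
```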
-
Jonas Devlieghere authored
LSan can be enabled by itself or as part of the address sanitizer. Rather than checking the enabled sanitizers for both, just set the LSan env options whenever a sanitizer is enabled. llvm-svn: 342881
-
Roman Lebedev authored
It would be best to introduce ISD::BitFieldExtract, because clearly more than one backend faces the same problem. But for now let's solve this in the x86-specific DAG combine. https://bugs.llvm.org/show_bug.cgi?id=38938 llvm-svn: 342880
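For reference, a sketch of the generic bit-field-extract pattern involved (parameter names are arbitrary):

```cpp
#include <cstdint>

// Extract 'Len' bits of 'X' starting at bit 'Start' (assumes 0 < Len < 64).
// This shift-and-mask shape is what the combine aims to select as a single
// bit-field-extract instruction (e.g. BEXTR on x86 with BMI) instead of
// separate shift and mask instructions.
uint64_t extractBits(uint64_t X, unsigned Start, unsigned Len) {
  return (X >> Start) & ((uint64_t{1} << Len) - 1);
}
```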
-
Matt Arsenault authored
If the alignment is at least 4, this should report true. Something still seems off with how < 4-byte types are handled here though. Fixing this seems to change how some combines get to where they get, but somehow isn't changing the net result. llvm-svn: 342879
-
Matt Arsenault authored
llvm-svn: 342878
-
Sjoerd Meijer authored
A sequence of VMUL and VADD instructions always gives the same or better performance than a fused VMLA instruction on the Cortex-M4 and Cortex-M33. Executing the VMUL and VADD back-to-back requires the same number of cycles, but having separate instructions allows scheduling to avoid the hazard between these two instructions. Differential Revision: https://reviews.llvm.org/D52289 llvm-svn: 342874
-
Luke Cheeseman authored
- The load/store optimizer is currently merging multiple loads/stores into VLDM/VSTM with more than 16 doubleword registers
- This is an UNPREDICTABLE instruction and shouldn't be done
- It looks like the limit on how many registers can be included in a merge got dropped at some point, so I am reintroducing it in this patch
- This fixes https://bugs.llvm.org/show_bug.cgi?id=38389
Differential Revision: https://reviews.llvm.org/D52085 llvm-svn: 342872
-
Petar Jovanovic authored
The DeadArgElim pass marks unused function arguments as ‘undef’ without updating existing dbg.values referring to them. As a consequence, the debug info metadata in the final executable was wrong. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D51968 llvm-svn: 342871
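A hypothetical example of the situation (not from the patch or its tests): 'Unused' is dead, so DeadArgElim rewrites it away, and the dbg.value describing it needs to be kept consistent rather than left pointing at stale metadata:

```cpp
// 'Callee' has internal linkage, so DeadArgElim may rewrite its signature and
// call sites; 'Unused' is never read and is treated as undef after the pass.
static int Callee(int Used, int Unused) { return Used * 2; }

int Caller(int X) { return Callee(X, X + 1); }
```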
-
Sam Parker authored
Originally committed in rL342210 but reverted in rL342260 because it was causing issues in vectorized code: I had forgotten to ensure that we're operating on scalar values. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls, which we can then implement with smultb, smulbt or smultt and a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 342870
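A hypothetical scalar pattern in the spirit of this transform (function and parameter names invented): adjacent 16-bit loads feeding multiplies, which can share one 32-bit load consumed by top/bottom halfword multiplies such as SMULBT/SMULTB/SMULTT:

```cpp
#include <cstdint>

// a[0] and a[1] are sequential halfword loads; after the transform they can be
// fetched with a single 32-bit load whose bottom and top halves feed the
// halfword multiplies directly.
int32_t mulHalves(const int16_t *a, int16_t lo, int16_t hi) {
  return (int32_t)a[0] * lo + (int32_t)a[1] * hi;
}
```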
-
Craig Topper authored
llvm-svn: 342860
-
Matt Arsenault authored
llvm-svn: 342858
-
Matt Arsenault authored
Not sure what the correct behavior is for this. Skip them and report how many there were. llvm-svn: 342857
-
- Sep 23, 2018
-
-
Simon Pilgrim authored
Confirmed with Craig Topper - fix a typo that omitted a Port4 uop for ROR*mCL instructions on some Intel models. Yet another step in the scheduler model cleanup marathon... llvm-svn: 342846
-
Sanjay Patel authored
This is an alternative to https://reviews.llvm.org/D37896. We can't decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some existing code that overlaps with this transform. This extends D52195 and may resolve PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 (still an open question about transforming legal vector multiplies, but we could open another bug report for those) llvm-svn: 342844
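As a small worked example of the decomposition in question (whether it actually fires depends on the new target profitability hook; the function name is illustrative):

```cpp
#include <cstdint>

// x * 9 rewritten as a shift plus an add, the kind of multiply-by-constant
// decomposition this change enables when the target reports it as profitable.
uint32_t mulBy9(uint32_t x) {
  return (x << 3) + x;  // equivalent to x * 9
}
```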
-
Simon Pilgrim authored
The SandyBridge model was missing schedule values for the RCL/RCR instructions - instead it used the (incredibly optimistic) WriteShift (now WriteRotate) defaults. I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well. This is necessary to allow WriteRotate to be updated to remove other rotate overrides. It'd probably be a good idea to investigate a WriteRotateCarry class at some point, but it's not high priority given the unusualness of these instructions. llvm-svn: 342842
-