Commits · 80c6ec11d9a32a6c34adf478a91ab7d794cdba10 · Roger Ferrer / llvm-epi

Aug 09, 2018

[GlobalOpt] Don't apply fastcc if it would break inalloca invariants · 80c6ec11

Reid Kleckner authored Aug 09, 2018

The inalloca parameter has to be the only parameter passed in memory.
Changing the convention to fastcc can break that.

At some point we should teach global opt how to optimize ABI attributes
like inalloca and maybe byval. These attributes are mainly used to match
C ABIs. They are harder for LLVM to optimize and they don't always
generate the best code.

Fixes PR38487

llvm-svn: 339360

80c6ec11

[SelectionDAG] try harder to convert funnel shift to rotate · 15d1501a

Sanjay Patel authored Aug 09, 2018

Similar to rL337966 - if the DAGCombiner's rotate matching was 
working as expected, I don't think we'd see any test diffs here.

AArch64 only goes right, and PPC only goes left.
x86 has both, so no diffs there.
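A minimal sketch of the identity this relies on, assuming 32-bit values: a funnel shift whose two inputs are the same value is a rotate, and a left rotate can be rewritten as a right rotate by the complementary amount. The helper names `fshl32`, `rotl32` and `rotr32` are hypothetical and are not LLVM APIs.

```cpp
#include <cstdint>

// Funnel shift left: treat Hi:Lo as one 64-bit value, shift it left by
// N (mod 32) and keep the upper 32 bits.
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t N) {
  N &= 31;
  if (N == 0)
    return Hi; // avoid an undefined shift by 32 below
  return (Hi << N) | (Lo >> (32 - N));
}

// Rotate left by N (mod 32).
uint32_t rotl32(uint32_t X, uint32_t N) {
  N &= 31;
  if (N == 0)
    return X;
  return (X << N) | (X >> (32 - N));
}

// Rotate right expressed through rotate left, which is how a target with
// only one rotate direction can still cover the other.
uint32_t rotr32(uint32_t X, uint32_t N) {
  return rotl32(X, (32 - (N & 31)) & 31);
}
```

With both funnel-shift inputs equal, `fshl32(X, X, N)` and `rotl32(X, N)` compute the same value.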

Differential Revision: https://reviews.llvm.org/D50091

llvm-svn: 339359

15d1501a

extend folding fsub/fadd to fneg for FMF · ca382546

Michael Berg authored Aug 09, 2018

Summary: This change provides a common optimization path for both Unsafe and FMF-driven optimization for this fsub fold. It adds reassociation, as it is the flag that most closely represents the translation.

Reviewers: spatel, wristow, arsenm

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D50195

llvm-svn: 339357

ca382546

[ARM] Adjust the feature set for Exynos · 8c436627

Evandro Menezes authored Aug 09, 2018

Enable `FeatureZCZeroing`, `FeatureHasSlowFPVMLx`, `FeatureExpandMLx`,
`FeatureProfUnpredicate`, `FeatureSlowVDUP32`, `FeatureSlowVGETLNi32`,
`FeatureSplatVFPToNeon`, `FeatureHasRetAddrStack`, `FeatureSlowFPBrcc` for
all Exynos processors.

llvm-svn: 339356

8c436627

[ARM] Replace processor check with feature · 9a92fe0c

Evandro Menezes authored Aug 09, 2018

Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a
processor check.  Otherwise, NFC.

llvm-svn: 339354

9a92fe0c

[MC][PredicateExpander] Extend the grammar to support simple switch and return statements. · f3bde048

Andrea Di Biagio authored Aug 09, 2018

This patch introduces tablegen class MCStatement.

Currently, an MCStatement can be either a return statement, or a switch
statement.

```
MCStatement:
   MCReturnStatement
   MCOpcodeSwitchStatement
```

A MCReturnStatement expands to a return statement, and the boolean expression
associated with the return statement is described by a MCInstPredicate.

An MCOpcodeSwitchStatement is a switch statement where the condition is a check
on the machine opcode. It allows the definition of multiple checks, as well as a
default case. More details on the grammar implemented by these two new
constructs can be found in the diff for TargetInstrPredicates.td.

This patch makes it easier to read the body of auto-generated TargetInstrInfo
predicates.

In future, I plan to reuse/extend the MCStatement grammar to describe more
complex target hooks. For now, this is just a first step (mostly a minor
cosmetic change to polish the new predicates framework).

Differential Revision: https://reviews.llvm.org/D50457

llvm-svn: 339352

f3bde048

[MC] Remove PhysRegSize from MCRegisterClass · c8b782ce

Bjorn Pettersson authored Aug 09, 2018

Summary:
The interface to get size and spill size of a register
was moved from MCRegisterInfo to TargetRegisterInfo over
a year ago. AFAIK the old interface has been around
to give out-of-tree targets a chance to adapt to the
new interface.

One problem with the old MCRegisterClass::PhysRegSize was that
it represented the size of a register as "size in bits" / 8.
So a register had to be a multiple of eight bits wide for the
size to be correct (and the byte size for the target needed to
be eight bits).
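To illustrate the limitation, here is a small sketch of the "size in bits" / 8 representation. The function names are hypothetical and only mimic the arithmetic described above, not the removed member itself.

```cpp
// Hypothetical stand-in for the old representation: the size is stored as
// whole bytes, so integer division discards any remainder.
constexpr unsigned sizeInBytesTruncated(unsigned SizeInBits) {
  return SizeInBits / 8;
}

// True when converting to bytes and back recovers the original bit width,
// which only happens for widths that are a multiple of eight.
constexpr bool roundTripsExactly(unsigned SizeInBits) {
  return sizeInBytesTruncated(SizeInBits) * 8 == SizeInBits;
}
```

A 32-bit register survives the round trip, while a 1-bit or 20-bit register does not.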

Reviewers: kparzysz, qcolombet

Reviewed By: kparzysz

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47199

llvm-svn: 339350

c8b782ce

[InstCombine] reduce code duplication; NFC · ebec4204

Sanjay Patel authored Aug 09, 2018

llvm-svn: 339349

ebec4204

[TargetLowering] Add BuildSDIVPattern helper to BuildExactSDIV (NFCI). · a9f95429

Simon Pilgrim authored Aug 09, 2018

As requested in D50392, pull the magic constant calculations out into a helper function.

llvm-svn: 339346

a9f95429

[ARM] FP16: codegen support for VTRN · 806f70d2

Sjoerd Meijer authored Aug 09, 2018

Differential Revision: https://reviews.llvm.org/D50454

llvm-svn: 339340

806f70d2

[X86][SSE] Remove PMULDQ/PMULUDQ by zero · 511c3fc5

Simon Pilgrim authored Aug 09, 2018

Exposed by D50328

Differential Revision: https://reviews.llvm.org/D50328

llvm-svn: 339337

511c3fc5

[X86][SSE] Combine (some) target shuffles with multiple uses · 01ae462f

Simon Pilgrim authored Aug 09, 2018

As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses.

This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool.

However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move.

This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit.

Differential Revision: https://reviews.llvm.org/D50328

llvm-svn: 339335

01ae462f

[X86] Improved sched models for X86 XCHG*rr and XADD*rr instructions. · 24f63bcb

Andrew V. Tischenko authored Aug 09, 2018

Differential Revision: https://reviews.llvm.org/D49861

llvm-svn: 339321

24f63bcb

[NVPTX] Select atomic loads and stores · 20526bf4

Jonas Hahnfeld authored Aug 09, 2018

According to PTX ISA .volatile has the same memory synchronization
semantics as .relaxed.sys, so it can be used to implement monotonic
atomic loads and stores. This is important for OpenMP's atomic
construct where
 - 'read's and 'write's are lowered to atomic loads and stores, and
 - updates of float or double types are lowered into a cmpxchg loop.
(Note that PTX could do better because it has atom.add.f{32,64} but
LLVM's atomicrmw instruction only allows integer types.)

Higher levels of atomicity (like acquire and release) need additional
synchronization properties which were added with PTX ISA 6.0 / sm_70.
So using these instructions still results in an error.

Differential Revision: https://reviews.llvm.org/D50391

llvm-svn: 339316

20526bf4

[RISCV] Add "lla" pseudo-instruction to assembler · 577a97e2

Roger Ferrer Ibanez authored Aug 09, 2018

This pseudo-instruction is similar to la but uses PC-relative addressing
unconditionally. That is, la differs from lla only when using -fPIC. This
pseudo-instruction is often omitted from specs, but it is definitely
mentioned in binutils opcodes/riscv-opc.c. The semantics are defined both on
page 37 of the "RISC-V Reader" book and in the function macro found in
gas/config/tc-riscv.c.

This is a very first step towards adding PIC support for Linux in the RISC-V
backend.

The lla pseudo-instruction expands to a sequence of auipc + addi with a couple
of pc-rel relocations where the second points to the first one. This is
described in
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#pc-relative-symbol-addresses
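As a rough illustration of how a PC-relative offset is divided between the two instructions, the sketch below splits an offset into the upper 20-bit part used by auipc and the signed 12-bit part used by addi. It assumes the usual RISC-V convention that the low part is sign-extended, so the high part is rounded to compensate. `PcRelParts` and `splitPcRel` are hypothetical names, not part of the patch.

```cpp
#include <cstdint>

// Hypothetical result type: Hi20 feeds auipc, Lo12 feeds addi.
struct PcRelParts {
  int64_t Hi20;
  int64_t Lo12;
};

PcRelParts splitPcRel(int64_t Offset) {
  // Sign-extend the low 12 bits into the range [-2048, 2047].
  int64_t Lo12 = ((Offset & 0xFFF) ^ 0x800) - 0x800;
  // The remainder is an exact multiple of 4096, so this division is exact.
  int64_t Hi20 = (Offset - Lo12) / 4096;
  return {Hi20, Lo12};
}
```

For any offset, `Hi20 * 4096 + Lo12` reconstructs the original value, which is what the auipc + addi pair computes relative to the PC.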

For now, this patch only introduces support of that pseudo instruction at the
assembler parser.

Differential Revision: https://reviews.llvm.org/D49661

llvm-svn: 339314

577a97e2

[NFC] ConstantMerge: don't insert when find should be used · 3f270336

JF Bastien authored Aug 09, 2018

Summary: DenseMap's operator[] performs an insertion if the entry isn't found. The second phase of ConstantMerge isn't trying to insert anything: it's just looking to see if the first phase performed an insertion. Use find instead, avoiding insertion of every single global initializer in the map of constants. This has the side-effect of making all entries in CMap non-null (because only global declarations would have null initializers, and that would be a bug).
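The same pitfall can be shown with the standard library. The sketch below uses `std::unordered_map` as a stand-in for DenseMap (an assumption made for self-containment; the two share the insert-on-subscript behaviour described above), and the helper names are hypothetical.

```cpp
#include <string>
#include <unordered_map>

using ConstantMap = std::unordered_map<std::string, int>;

// Lookup through operator[]: a missing key is silently inserted with a
// value-initialized (zero) entry, so the map grows on every miss.
bool containsViaSubscript(ConstantMap &M, const std::string &Key) {
  return M[Key] != 0;
}

// Lookup through find: a missing key leaves the map untouched.
bool containsViaFind(const ConstantMap &M, const std::string &Key) {
  auto It = M.find(Key);
  return It != M.end() && It->second != 0;
}
```

Both helpers give the same answer, but only the `find` version can take the map by const reference, which makes the absence of side effects visible in the signature.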

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D50476

llvm-svn: 339309

3f270336

[LICM] Add an assert to ensure all instruction types needing aliasing are handled [NFC] · 22b20a09

Philip Reames authored Aug 09, 2018

llvm-svn: 339308

22b20a09

[DWARF] Verifier now handles .debug_types sections. · 508b0815

Paul Robinson authored Aug 08, 2018

Differential Revision: https://reviews.llvm.org/D50466

llvm-svn: 339302

508b0815

[DAGCombiner] loosen constraints for fsub+fadd fold · e47dc1a4

Sanjay Patel authored Aug 08, 2018

isNegatibleForFree() should not matter here (as the test diffs show)
because it's always a win to replace an fsub+fadd with fneg. The
problem in D50195 persists because either (1) we are doing these
folds in the wrong order or (2) we're missing another fold for fadd.

llvm-svn: 339299

e47dc1a4

[DAGCombiner] move fadd simplification ahead of other folds · e327266d

Sanjay Patel authored Aug 08, 2018

  
I don't know if it's possible to expose this diff in a test,
but we should always try simplifications (no new nodes created)
before more complicated transforms for efficiency (similar to
what we do in IR).

llvm-svn: 339298

e327266d

[ADT] Normalize empty triple components · 7b274544

Petr Hosek authored Aug 08, 2018

LLVM triple normalization is handling "unknown" and empty components
differently; for example given "x86_64-unknown-linux-gnu" and
"x86_64-linux-gnu" which should be equivalent, triple normalization
returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's
config.sub returns "x86_64-unknown-linux-gnu" for both
"x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the
triple normalization to behave the same way, replacing empty triple
components with "unknown".
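A minimal sketch of the replacement step only, assuming the triple has already been split on dashes. `fillEmptyComponents` is a hypothetical helper; the real normalization also recognizes and reorders components, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <string>

// Replace every empty dash-separated component with "unknown".
std::string fillEmptyComponents(const std::string &Triple) {
  std::string Out;
  std::size_t Start = 0;
  while (true) {
    std::size_t Dash = Triple.find('-', Start);
    std::size_t Len =
        Dash == std::string::npos ? std::string::npos : Dash - Start;
    std::string Part = Triple.substr(Start, Len);
    Out += Part.empty() ? std::string("unknown") : Part;
    if (Dash == std::string::npos)
      break;
    Out += '-';
    Start = Dash + 1;
  }
  return Out;
}
```

Under these assumptions, "x86_64--linux-gnu" becomes "x86_64-unknown-linux-gnu", while a triple without empty components is returned unchanged.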

This addresses PR37129.

Differential Revision: https://reviews.llvm.org/D50219

llvm-svn: 339294

7b274544

Aug 08, 2018

[DWARF] Unclamp line table version on Darwin for v5 and later. · 49ff4d90

Jonas Devlieghere authored Aug 08, 2018

On Darwin we pin the DWARF line tables to version 2. Stop doing so for
DWARF v5 and later.

Differential revision: https://reviews.llvm.org/D49381

llvm-svn: 339288

49ff4d90

[ARM] Avoid spilling lr with Thumb1 tail calls. · 5b45a390

Eli Friedman authored Aug 08, 2018

Normally, if any registers are spilled, we prefer to spill lr on Thumb1
so we can fold the "bx lr" into the "pop".  However, if there are tail
calls involved, restoring lr is expensive, so skip the optimization in
that case.

The spill of r7 in the new test also isn't necessary, but that's
mostly orthogonal to this patch. (It's the same code in
ARMFrameLowering, but it's not related to tail calls.)

Differential Revision: https://reviews.llvm.org/D49459

llvm-svn: 339283

5b45a390

[MS Demangler] Create a new backref context for template instantiations. · d346cba9

Zachary Turner authored Aug 08, 2018

Template manglings use a fresh back-referencing context, so we
need to do the same.  This fixes several existing tests which are
marked as FIXME, so those are now actually run.

llvm-svn: 339275

d346cba9

revert '[CodeGen] emit inline asm clobber list warnings for reserved' · 083fb1a2

Ties Stuij authored Aug 08, 2018

llvm-svn: 339274

083fb1a2

[Hexagon] Diagnose misaligned absolute loads and stores · 1df70591

Krzysztof Parzyszek authored Aug 08, 2018

Differential Revision: https://reviews.llvm.org/D50405

llvm-svn: 339272

1df70591

AMDGPU: Error more gracefully on libcalls · 935f3b70

Matt Arsenault authored Aug 08, 2018

I think this is the only situation where the callsite
will have a null instruction.

llvm-svn: 339271

935f3b70

AMDGPU: Fix shifts for i128 · e719139b

Matt Arsenault authored Aug 08, 2018

llvm-svn: 339270

e719139b

[WASM] Fix overflow when reading custom section · 8511777d

Jonas Devlieghere authored Aug 08, 2018

When reading a custom WASM section, it was possible that its name
extended beyond the size of the section. This resulted in a bogus value
for the section size due to the size overflowing.
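The failure mode can be sketched with plain unsigned arithmetic. The functions below are hypothetical and are not the reader's actual code; they only show how subtracting a too-large name length from the section size wraps around, and how a bounds check avoids it.

```cpp
#include <cstdint>

// Unchecked: if NameBytes exceeds SectionSize, the unsigned subtraction
// wraps and yields a huge bogus size.
uint32_t payloadSizeUnchecked(uint32_t SectionSize, uint32_t NameBytes) {
  return SectionSize - NameBytes;
}

// Checked: reject the section when the name extends past its end.
// Returns false on malformed input and leaves Result untouched.
bool payloadSizeChecked(uint32_t SectionSize, uint32_t NameBytes,
                        uint32_t &Result) {
  if (NameBytes > SectionSize)
    return false;
  Result = SectionSize - NameBytes;
  return true;
}
```

Reporting the malformed section as an error, rather than continuing with the wrapped value, is the general shape of such a fix.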

Fixes heap buffer overflow detected by OSS-fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=8190

Differential revision: https://reviews.llvm.org/D50387

llvm-svn: 339269

8511777d

[DebugInfo] Fine tune emitting flags as part of the producer · caacedb0

Jonas Devlieghere authored Aug 08, 2018

When using APPLE extensions, don't duplicate the compiler invocation's
flags both in AT_producer and AT_APPLE_flags.

Differential revision: https://reviews.llvm.org/D50453

llvm-svn: 339268

caacedb0

[InstCombine] fold fadd+fsub with common operand · fe839695

Sanjay Patel authored Aug 08, 2018

This is a sibling to the simplify from:
https://reviews.llvm.org/rL339174

llvm-svn: 339267

fe839695

[InstCombine] fold fsub+fsub with common operand · 2054dd79

Sanjay Patel authored Aug 08, 2018

This is a sibling to the simplify from:
rL339171

llvm-svn: 339266

2054dd79

[DAG] DAGCombiner::visitSDIVLike - remove unnecessary isConstOrConstSplat call. NFCI. · 4d4220fa

Simon Pilgrim authored Aug 08, 2018

The isConstOrConstSplat result is only used in a ISD::matchUnaryPredicate call which can perform the equivalent iteration just as quickly.

llvm-svn: 339262

4d4220fa

[PowerPC] Improve codegen for vector loads using scalar_to_vector · b2595b98

Zaara Syeda authored Aug 08, 2018

This patch aims to improve the codegen for vector loads involving the
scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used
for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X)
to utilize:

LXSD and LXSDX for i64 and f64
LXSIWAX for i32 (sign extension to i64)
LXSIWZX for i32 and f64

Committing on behalf of Amy Kwan.
Differential Revision: https://reviews.llvm.org/D48950

llvm-svn: 339260

b2595b98

[CodeGen] emit inline asm clobber list warnings for reserved · 52f3631f

Ties Stuij authored Aug 08, 2018

Summary:
Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.

For example:


```
extern int bar(int[]);

int foo(int i) {
  int a[i]; // VLA
  asm volatile(
      "mov r7, #1"
    :
    :
    : "r7"
  );

  return 1 + bar(a);
}
```

Compiled for thumb, this gives:
```
$ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
...
foo:
        .fnstart
@ %bb.0:                                @ %entry
        .save   {r4, r5, r6, r7, lr}
        push    {r4, r5, r6, r7, lr}
        .setfp  r7, sp, #12
        add     r7, sp, #12
        .pad    #4
        sub     sp, #4
        movs    r1, #7
        add.w   r0, r1, r0, lsl #2
        bic     r0, r0, #7
        sub.w   r0, sp, r0
        mov     sp, r0
        @APP
        mov.w   r7, #1
        @NO_APP
        bl      bar
        adds    r0, #1
        sub.w   r4, r7, #12
        mov     sp, r4
        pop     {r4, r5, r6, r7, pc}
...
```

r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.

This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seems to have been reached on the way forward. Clang behavior has briefly been discussed on the cfe-dev mailing list (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.

The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.

If we find a reserved register, we print a warning:
```
repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
      "mov r7, #1"
      ^
```

Reviewers: eli.friedman, olista01, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: efriedma, eraman, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49727

llvm-svn: 339257

52f3631f

[RISCV] Add mnemonic alias: move, sbreak and scall. · 07224dfb

Alex Bradbury authored Aug 08, 2018

Further improve compatibility with the GNU assembler.

Differential Revision: https://reviews.llvm.org/D50217
Patch by Kito Cheng.

llvm-svn: 339255

07224dfb

[TargetLowering] BuildUDIV - Add support for divide by one (PR38477) · 164e8b0b

Simon Pilgrim authored Aug 08, 2018

Provide a pass-through of the numerator for divide by one cases - this is the same approach we take in DAGCombiner::visitSDIVLike.

I investigated whether we could achieve this by magic MULHU/SRL values but nothing appeared to work as we don't have a way for MULHU(x,c) -> x
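A small sketch of why no multiplier works, assuming 32-bit operands: the high half of an unsigned multiply is floor(x * c / 2^32), and since c is below 2^32 the result is always strictly less than any nonzero x. `mulhu32` is a hypothetical helper modelling MULHU.

```cpp
#include <cstdint>

// High 32 bits of the 64-bit unsigned product X * C.
uint32_t mulhu32(uint32_t X, uint32_t C) {
  return static_cast<uint32_t>((static_cast<uint64_t>(X) * C) >> 32);
}
```

Even with the largest possible constant, 0xFFFFFFFF, the result for a nonzero x is x - 1 rather than x, so the divide-by-one case needs a pass-through instead.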

llvm-svn: 339254

164e8b0b

[RISCV] Add InstAlias definitions for add[w], and, xor, or, sll[w], srl[w],... · 7d8d87c1

Alex Bradbury authored Aug 08, 2018

[RISCV] Add InstAlias definitions for add[w], and, xor, or, sll[w], srl[w], sra[w], slt and sltu with immediate

Match the GNU assembler in supporting immediate operands for these 
instructions even when the reg-reg mnemonic is used.

Differential Revision: https://reviews.llvm.org/D50046
Patch by Kito Cheng.

llvm-svn: 339252

7d8d87c1

[InstCombine] fold fneg into constant operand of fmul/fdiv · a194b2d2

Sanjay Patel authored Aug 08, 2018

This accounts for the missing IR fold noted in D50195. We don't need any fast-math to enable the negation transform.
FP negation can always be folded into an fmul/fdiv constant to eliminate the fneg.

I've limited this to one-use to ensure that we are eliminating an instruction rather than replacing fneg by a
potentially expensive fdiv or fmul.
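The equivalence can be checked on ordinary doubles. The sketch below uses the constant 3.0 purely as an example, with hypothetical function names. It relies on IEEE-754 multiplication and division being symmetric in sign, so negating the constant gives the same result as negating the product or quotient.

```cpp
// Before the fold: a multiply or divide followed by a separate negation.
double negMulBefore(double X) { return -(X * 3.0); }
double negDivBefore(double X) { return -(X / 3.0); }

// After the fold: the negation is absorbed into the constant operand.
double negMulAfter(double X) { return X * -3.0; }
double negDivAfter(double X) { return X / -3.0; }
```

Each "after" form performs one operation instead of two, which is the instruction saving the one-use restriction is meant to guarantee.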

Differential Revision: https://reviews.llvm.org/D50417

llvm-svn: 339248

a194b2d2

[TargetLowering] Remove APInt divisor argument from BuildExactSDIV (NFCI). · e4a4cf5a

Simon Pilgrim authored Aug 08, 2018

As requested in D50392, this is a minor refactor to BuildExactSDIV to stop taking the uniform constant APInt divisor and instead extract it locally.

I also clean up the operands and valuetypes to better match BuildUDiv (and BuildSDIV in the near future).

llvm-svn: 339246

e4a4cf5a