Commits · ae5ab2f40a35b87eb173b8b7e8738c589c3d3555 · Lorenzo Albano / LLVM bpEVL

May 21, 2020

[LegalizeDAG] Modify ExpandLegalINT_TO_FP to swap data for little/big endian... · ae5ab2f4

Craig Topper authored May 20, 2020

[LegalizeDAG] Modify ExpandLegalINT_TO_FP to swap data for little/big endian instead of the pointers.

Will make it easier to pass the pointer info and alignment
correctly to the loads/stores.

While there also make the i32 stores independent and use a token
factor to join before the load.
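
As a rough illustration of swapping the data rather than the pointers, here is a minimal C++ sketch of the bias trick for converting a signed 32-bit integer to a double through memory. It is not the LegalizeDAG code: `hostIsLittleEndian` and `int32ToDouble` are hypothetical names, and the sketch assumes an IEEE-754 64-bit `double` whose byte order matches the integer byte order. The two slot addresses stay fixed and only the words written into them depend on endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Detect the host byte order at run time.
bool hostIsLittleEndian() {
  const std::uint32_t one = 1;
  unsigned char first = 0;
  std::memcpy(&first, &one, 1);
  return first == 1;
}

// Hypothetical helper (not LLVM code): build the bit pattern of
// 2^52 + (x + 2^31) from two 32-bit words, then subtract the bias.
double int32ToDouble(std::int32_t x) {
  static_assert(sizeof(double) == 2 * sizeof(std::uint32_t),
                "sketch assumes a 64-bit double");
  const std::uint32_t hi = 0x43300000u;  // sign/exponent bits of 2^52
  const std::uint32_t lo =
      static_cast<std::uint32_t>(x) ^ 0x80000000u;  // x + 2^31 as unsigned

  // The slot addresses are fixed; the data is swapped per endianness.
  std::uint32_t slots[2];
  const bool le = hostIsLittleEndian();
  slots[0] = le ? lo : hi;
  slots[1] = le ? hi : lo;

  double biased = 0.0;
  std::memcpy(&biased, slots, sizeof biased);
  return biased - 4503601774854144.0;  // 2^52 + 2^31
}

// Usage example: a few values round-trip exactly.
void checkInt32ToDouble() {
  assert(int32ToDouble(0) == 0.0);
  assert(int32ToDouble(-5) == -5.0);
  assert(int32ToDouble(INT32_MAX) == 2147483647.0);
}
```

Because the addresses of both stores are known up front, the two 32-bit stores in this sketch do not depend on each other, which mirrors the independent i32 stores mentioned above.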

ae5ab2f4

Add CanonicalizeFreezeInLoops pass · d9a4a244

Juneyoung Lee authored May 08, 2020

Summary:
If an induction variable is frozen and used, SCEV yields an imprecise result
because it doesn't say anything about frozen variables.

For this reason, performance degraded after
https://reviews.llvm.org/D76483 was merged: SCEV
yielded imprecise results, which prevented LSR from optimizing a loop.

The suggested solution here is to add a pass which canonicalizes frozen variables
inside a loop. To be specific, it pushes freezes out of the loop by freezing
the initial value and step values instead, and dropping nsw/nuw flags from instructions used by freeze.
This solution was also mentioned at https://reviews.llvm.org/D70623 .

Reviewers: spatel, efriedma, lebedev.ri, fhahn, jdoerfert

Reviewed By: fhahn

Subscribers: nikic, mgorny, hiraditya, javed.absar, llvm-commits, sanwou01, nlopes

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77523

d9a4a244

[AArch64] Fix unwind info generated by outliner. · b4f9b347

Eli Friedman authored May 19, 2020

The offsets were wrong. The result is now the same as what the compiler
would generate for a function that spills lr normally.

Differential Revision: https://reviews.llvm.org/D80238

b4f9b347

Make Value::getPointerAlignment() return an Align, not a MaybeAlign. · f26bdb53

Eli Friedman authored May 16, 2020

If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.
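
The reasoning can be sketched in a few lines of C++. The `Align` and `MaybeAlign` types below are hypothetical stand-ins written for this illustration, not the LLVM classes, and `orByteAligned` is an invented helper name. The point is only that "no information" collapses to an alignment of 1 without losing correctness.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in: an alignment that is always a non-zero power of two.
struct Align {
  std::uint64_t Value = 1;
};

// Hypothetical stand-in: an alignment that may be unknown.
using MaybeAlign = std::optional<Align>;

// Collapse "unknown" to 1-byte alignment, which holds for every pointer.
Align orByteAligned(MaybeAlign m) {
  const Align a = m.value_or(Align{});
  assert(a.Value != 0 && (a.Value & (a.Value - 1)) == 0 &&
         "alignment must be a power of two");
  return a;
}
```

With a definite return type, callers in this sketch no longer need a separate branch for the unknown case.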

Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer.  This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated.  I
updated clang's handling of C++ references, and added a release note for
this.

Differential Revision: https://reviews.llvm.org/D80072

f26bdb53

[AArch64] Provide Darwin variants of most calling conventions · 161122ea

Francis Visoiu Mistrih authored Jan 31, 2020

With the new SVE stack layout, we now need to provide a Darwin variant
for all the calling conventions based on the main AAPCS CSR save order.

This also changes APCS_SwiftError to have a Darwin and a non-Darwin
version, assuming it could be used on other platforms these days, and
restricts the AArch64_CXX_TLS calling convention to Darwin.

Differential Revision: https://reviews.llvm.org/D73805

161122ea

[AMDGPU] Always expand ext/insertelement with divergent idx · 4eecf171

Stanislav Mekhanoshin authored May 15, 2020

Even though a series of cmp/cndmask can produce quite a lot of
code, that is still better than a loop. In the case of doubles we
would even produce two loops.

Differential Revision: https://reviews.llvm.org/D80032

4eecf171

[LegalizeVectorTypes] Create correct memoperands in SplitVecRes_INSERT_SUBVECTOR. · 17bd86bc

Craig Topper authored May 20, 2020

Previously this code just used a default constructed
MachinePointerInfo. But we know the accesses are to a fixed stack
object or at least somewhere on the stack.

While there fix the alignment passed to the full vector load/stores.

I don't think this function is currently exercised in tree so I
don't know how to test it. I just noticed it when I removed
non-constant index support in this function.

Differential Revision: https://reviews.llvm.org/D80058

17bd86bc

May 20, 2020

Give microsoftDemangle() an outparam for how many input bytes were consumed. · bc1c3655

Nico Weber authored May 18, 2020

Demangling Itanium symbols either consumes the whole input or fails,
but Microsoft symbols can be successfully demangled with just some
of the input.

Add an outparam that enables clients to know how much of the input was
consumed, and use this flag to give llvm-undname an opt-in warning
on partially consumed symbols.
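
The out-parameter pattern can be shown with a toy parser. This is a minimal sketch, not the demangler: `parseLeadingNumber` and `fullyConsumed` are hypothetical names, and the "grammar" is just a leading run of decimal digits. The parse can succeed on a prefix, and the optional `nRead` out-parameter lets the caller detect leftover input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical toy parser: reads a leading run of decimal digits.
// If nRead is non-null, it receives the number of input bytes consumed.
std::optional<unsigned> parseLeadingNumber(std::string_view input,
                                           std::size_t *nRead = nullptr) {
  std::size_t i = 0;
  unsigned value = 0;
  while (i < input.size() &&
         std::isdigit(static_cast<unsigned char>(input[i]))) {
    value = value * 10 + static_cast<unsigned>(input[i] - '0');
    ++i;
  }
  if (nRead)
    *nRead = i;
  if (i == 0)
    return std::nullopt;  // nothing parsed at all
  return value;
}

// Caller-side check of the kind an opt-in warning could be built on.
bool fullyConsumed(std::string_view input) {
  std::size_t n = 0;
  return parseLeadingNumber(input, &n).has_value() && n == input.size();
}
```

Making the out-parameter optional keeps existing callers unchanged while letting interested callers warn on partially consumed input.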

Differential Revision: https://reviews.llvm.org/D80173

bc1c3655

[InstCombine] `insertelement` is negatible if both sources are negatible · 55430f53

Roman Lebedev authored May 20, 2020

----------------------------------------
define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %src
  %t1 = sub i4 0, %a
  %t2 = insertelement <2 x i4> %t0, i4 %t1, i32 %x
  %t3 = sub <2 x i4> %b, %t2
  ret <2 x i4> %t3
}
=>
define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
%0:
  %t2.neg = insertelement <2 x i4> %src, i4 %a, i32 %x
  %t3 = add <2 x i4> %t2.neg, %b
  ret <2 x i4> %t3
}
Transformation seems to be correct!
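
The same identity can be brute-forced outside the IR. The following C++ sketch models `<2 x i4>` as two bytes holding 4-bit values with wrap-around arithmetic and compares the two forms for every in-range index. All names (`V2`, `wrap4`, `before`, `after`, `foldHoldsForAllInputs`) are hypothetical, and out-of-range indices are deliberately not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V2 = std::array<std::uint8_t, 2>;  // two i4 lanes, low 4 bits used

// Reduce an integer to 4 bits (arithmetic modulo 16).
std::uint8_t wrap4(int v) {
  return static_cast<std::uint8_t>(static_cast<unsigned>(v) & 0xFu);
}

V2 neg(V2 v) { return {wrap4(-v[0]), wrap4(-v[1])}; }
V2 add(V2 a, V2 b) { return {wrap4(a[0] + b[0]), wrap4(a[1] + b[1])}; }
V2 sub(V2 a, V2 b) { return {wrap4(a[0] - b[0]), wrap4(a[1] - b[1])}; }

V2 insert(V2 v, std::uint8_t e, unsigned idx) {
  assert(idx < 2 && "only in-range indices are modelled");
  v[idx] = e;
  return v;
}

// Original form: b - insertelement(0 - src, 0 - a, x)
V2 before(V2 src, std::uint8_t a, unsigned x, V2 b) {
  return sub(b, insert(neg(src), wrap4(-a), x));
}

// Folded form: insertelement(src, a, x) + b
V2 after(V2 src, std::uint8_t a, unsigned x, V2 b) {
  return add(insert(src, a, x), b);
}

// Exhaustively compare both forms over all 4-bit inputs.
bool foldHoldsForAllInputs() {
  for (unsigned s0 = 0; s0 < 16; ++s0)
    for (unsigned s1 = 0; s1 < 16; ++s1)
      for (unsigned a = 0; a < 16; ++a)
        for (unsigned b0 = 0; b0 < 16; ++b0)
          for (unsigned b1 = 0; b1 < 16; ++b1)
            for (unsigned x = 0; x < 2; ++x) {
              const V2 src = {static_cast<std::uint8_t>(s0),
                              static_cast<std::uint8_t>(s1)};
              const V2 b = {static_cast<std::uint8_t>(b0),
                            static_cast<std::uint8_t>(b1)};
              const std::uint8_t av = static_cast<std::uint8_t>(a);
              if (before(src, av, x, b) != after(src, av, x, b))
                return false;
            }
  return true;
}
```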

55430f53

[InstCombine] Negator: `extractelement` is negatible if src is negatible · ebed96fd

Roman Lebedev authored May 20, 2020

----------------------------------------
define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %x
  call void @use_v2i4(<2 x i4> %t0)
  %t1 = extractelement <2 x i4> %t0, i32 %y
  %t2 = sub i4 %z, %t1
  ret i4 %t2
}
=>
define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %x
  call void @use_v2i4(<2 x i4> %t0)
  %t1.neg = extractelement <2 x i4> %x, i32 %y
  %t2 = add i4 %t1.neg, %z
  ret i4 %t2
}
Transformation seems to be correct!

ebed96fd

[llvm] [CodeGen] [X86] Fix issues with v4i1 instruction selection · 645bba8d

aartbik authored May 19, 2020

Summary:
Fixes issue
https://bugs.llvm.org/show_bug.cgi?id=45995

Reviewers: mehdi_amini, nicolasvasilache, reidtatge, craig.topper, ftynse, bkramer

Reviewed By: craig.topper

Subscribers: RKSimon, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80231

645bba8d

Reland [X86] Codegen for preallocated · 8a887556

Arthur Eubanks authored Mar 16, 2020

See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer to the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689

8a887556

Revert "[X86] Codegen for preallocated" · b8cbff51

Arthur Eubanks authored May 20, 2020

This reverts commit 810567dc.

Some tests are unexpectedly passing.

b8cbff51

[ProfileSummary] Refactor getFromMD to prepare for another optional field. NFC. · f9a6163f

Hiroshi Yamauchi authored May 19, 2020

Summary:
Rename 'i' to 'I'.
Factor out the optional field handling to getOptionalVal().
Split out of D79951.

Reviewers: davidxl

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80230

f9a6163f

[X86] Codegen for preallocated · 810567dc

Arthur Eubanks authored Mar 16, 2020

See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer to the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689

810567dc

AMDGPU/GlobalISel: Fix splitting 64-bit extensions · e8f6b0e5

Matt Arsenault authored May 18, 2020

This was replicating the low bits into the high bits for G_ZEXT,
rather than using 0.

e8f6b0e5
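
To make the expected behaviour concrete, here is a small C++ sketch of splitting a 32-to-64-bit extension into two 32-bit halves. It is an illustration only, not the AMDGPU GlobalISel code; `Halves`, `splitZExt32To64`, `splitSExt32To64`, and `join` are hypothetical names. For a zero extension the high half must be 0, while a sign extension fills it with copies of the sign bit.

```cpp
#include <cassert>
#include <cstdint>

// The two 32-bit pieces of a 64-bit value.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Zero extension: the high half is always 0 (not a copy of the low half).
Halves splitZExt32To64(std::uint32_t x) { return {x, 0u}; }

// Sign extension: the high half replicates the sign bit of the low half.
Halves splitSExt32To64(std::uint32_t x) {
  return {x, (x >> 31) ? 0xFFFFFFFFu : 0u};
}

// Reassemble the 64-bit value from its halves.
std::uint64_t join(Halves h) {
  return (std::uint64_t{h.hi} << 32) | h.lo;
}

// Usage example: the zero-extended halves agree with a plain widening cast.
void checkSplitZExt() {
  const std::uint32_t x = 0x80000001u;
  assert(join(splitZExt32To64(x)) == static_cast<std::uint64_t>(x));
}
```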

[Target][ARM] Make Low Overhead Loops coexist with VPT blocks. · 835251f7

Pierre-vh authored Apr 08, 2020

Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOverheadLoops pass so it can handle those cases.

It also adds support for VCMPs before the VCTP.

Differential Revision: https://reviews.llvm.org/D78206

835251f7

[NFCI][CostModel] Refactor getIntrinsicInstrCost · 8cc911fa

Sam Parker authored May 20, 2020

Combine the two API calls into one by introducing a structure to hold
the relevant data. This has the added benefit of moving the boilerplate
code for arguments and flags into the constructors. This is
intended to be a non-functional change, but the complicated web of
logic involved here makes it very hard to guarantee.

Differential Revision: https://reviews.llvm.org/D79941

8cc911fa

[yaml2obj] - Implement the "Offset" property for the Fill Chunk. · baf32259

Georgii Rymar authored May 18, 2020

Similar to a regular section chunk, a Fill should have this property.
This patch implements it.

Differential revision: https://reviews.llvm.org/D80190

baf32259

[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). · bcbd26bf

Florian Hahn authored May 20, 2020

SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

This patch was originally committed as b8a3c34e, but broke the
modules build, as LoopAccessAnalysis was using the Expander.

The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537

bcbd26bf

[PowerPC] Enable machine verification for 3 passes · 3f376eca

Kang Zhang authored May 20, 2020

Summary:
For PowerPC, there are 3 passes that have machine verification disabled:
```
PPCTargetMachine.cpp:    addPass(&LiveVariablesID, false);
PPCTargetMachine.cpp:    addPass(createPPCEarlyReturnPass(), false);
PPCTargetMachine.cpp:  addPass(createPPCBranchSelectionPass(), false);
```
This patch enables machine verification for the above three passes.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D79840

3f376eca

CommandFlags.h - remove unnecessary includes. NFC. · d9b9ce6c

Simon Pilgrim authored May 19, 2020

Replace with forward declarations and move necessary includes down to source files.

Exposes an implicit dependency on TargetMachine.h in llvm-opt-fuzzer.cpp

d9b9ce6c

[IR] Simplify BasicBlock::removePredecessor. NFCI. · e5fc9a36

Jay Foad authored May 18, 2020

This is the second attempt at landing this patch, after fixing the
KeepOneInputPHIs behaviour to also keep zero input PHIs.

Differential Revision: https://reviews.llvm.org/D80141

e5fc9a36

Revert "[IR] Simplify BasicBlock::removePredecessor. NFCI." · b42b30c3

Jay Foad authored May 20, 2020

This reverts commit 59f49f7e.

It was causing buildbot failures.

b42b30c3

[AMDGPU] Process V_MOV_B32_indirect in SET_GPR_IDX optimization · 677929e3

Stanislav Mekhanoshin authored May 19, 2020

Differential Revision: https://reviews.llvm.org/D80256

677929e3

[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression · 2b59e9f1

QingShan Zhang authored May 20, 2020

getNegatibleCost and getNegatedExpression are used together to evaluate the cost of negating an expression and then to negate it.
However, negating the expression changes the DAG, so the cost may change along the way,
and we then hit an assertion if we negated the wrong expression because the cost can no longer be trusted.

This patch removes getNegatibleCost so that it cannot get out of sync with getNegatedExpression,
and checks the cost while negating the expression instead. It also reduces the duplicated code between
getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638.

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D77319

2b59e9f1

AMDGPU: Annotate functions that have stack objects · 21d2884a

Matt Arsenault authored May 19, 2020

Relying on any MachineFunction state in the MachineFunctionInfo
constructor is hazardous, because the construction time is unclear and
determined by the first use. The function may be only partially
constructed, which is part of why we have many of these hacky string
attributes to track what we need for ABI lowering.

For SelectionDAG, all stack objects are created up-front before
calling convention lowering so stack objects are visible at
construction time. For GlobalISel, none of the IR function has been
visited yet and the allocas haven't been added to the MachineFrameInfo
yet. This should fix failing to set flat_scratch_init in GlobalISel
when needed.

This pass really needs to be turned into some kind of analysis, but I
haven't found a nice way to use one here.

21d2884a

GlobalISel: Copy correct flags to select · 08ae9453

Matt Arsenault authored May 19, 2020

This was looking for a compare condition, and copying the compare
flags. I don't think this was ever correct outside of certain min/max
patterns which aren't checked, but this probably predates select
instructions having fast math flags.

08ae9453

AMDGPU: Fix DAG divergence for implicit function arguments · 074b8026

Matt Arsenault authored May 13, 2020

This should be directly implied from the register class, and there's
no need to special case live ins here. This was getting the wrong
answer for the queue ptr argument in callable functions, since it's
not an explicit IR argument and is always uniform.

Fixes not using scalar loads for the aperture in addrspacecast
lowering, and any other places that use implicit SGPR arguments.

074b8026

AMDGPU: Use member initializers in MFI · 61813b80

Matt Arsenault authored May 19, 2020

61813b80

[Hexagon] pX.new cannot be used with p3:0 as producer · cfba1a96

Brian Cain authored May 15, 2020

Writes to p3:0 do not produce new values, so we should bar any .new
consumer from using it as a producer.

cfba1a96

May 19, 2020

GlobalISel: Remove unused include · e6658079

Matt Arsenault authored May 19, 2020

e6658079

CodeGen: Use Register · 4dad4914

Matt Arsenault authored May 19, 2020

4dad4914

[AArch64] Disable MachineOutliner on Windows. · 5d2c3a0b

Eli Friedman authored May 19, 2020

The handling of unwind info is broken, so disable it for now.

5d2c3a0b

Give helpers internal linkage. NFC. · 350dadaa

Benjamin Kramer authored May 19, 2020

350dadaa

[PowerPC][NFC] Cleanup load/store spilling code · 2e6e2758

Lei Huang authored May 11, 2020

Summary: Cleanup and commonize code used for spilling to the stack.

Reviewers: stefanp, nemanjai, #powerpc, kamaub

Reviewed By: nemanjai, #powerpc, kamaub

Subscribers: kamaub, hiraditya, wuzish, shchenz, llvm-commits, kbarton

Tags: #llvm, #powerpc

Differential Revision: https://reviews.llvm.org/D79736

2e6e2758

[WebAssembly] Fix bug in custom shuffle combine · 8a43d41a

Thomas Lively authored May 19, 2020

Summary:
The code previously assumed the source of the bitcast in the combined
pattern was a vector type, but this is not always true. This patch
adds a check to avoid an assertion failure in that case.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80164

8a43d41a

[WebAssembly] Implement i64x2.mul and remove i8x16.mul · 3181273b

Thomas Lively authored May 19, 2020

Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80174

3181273b

[IR] Simplify BasicBlock::removePredecessor. NFCI. · 59f49f7e

Jay Foad authored May 18, 2020

Differential Revision: https://reviews.llvm.org/D80141

59f49f7e

[LVI] Don't require DominatorTree in LVI (NFC) · 5fae613a

Nikita Popov authored Mar 25, 2020

After D76797 the dominator tree is no longer used in LVI, so we
can remove it as a pass dependency, and also get rid of the
dominator tree enabling/disabling logic in JumpThreading.

Apart from cleaning up the code, this also clarifies LVI
cache consistency, in that the LVI cache can no longer
depend on whether the DT was or wasn't enabled due to
pending DT updates at any given time.

Differential Revision: https://reviews.llvm.org/D76985

5fae613a