- Oct 25, 2021
-
-
Danila Malyutin authored
Differential Revision: https://reviews.llvm.org/D107582
-
Jeremy Morse authored
During register allocation, some instructions can have stack spills fused into them. It means that when vregs are allocated on the stack we can convert:

    SETCCr %0
    DBG_VALUE %0

to:

    SETCCm %stack.0
    DBG_VALUE %stack.0

Unfortunately instruction referencing finds this harder: a store to the stack doesn't have a specific operand number, therefore we don't substitute the old operand for a new operand, and the location is dropped. This patch implements a solution: just recognise the memory operand attached to an instruction with a Special Number (TM), and record a substitution between the old value and the new one. This patch adds substitution code to InlineSpiller to record such fused spills, and tracking in InstrRefBasedLDV to recognise such values, and produce the value numbers for them. Everything to do with the movement of stack-defined values is already handled in InstrRefBasedLDV.

Differential Revision: https://reviews.llvm.org/D111317
-
Jeremy Morse authored
This patch swaps two lines -- the CurSucc reference can be invalidated by the call to DFS.push_back, therefore that should happen last. The usual hat-tip to asan for catching this. This patch also swaps an earlier call to ToAdd.insert and DFS.push_back, where a stable iterator (from successors()) is being used. This isn't strictly necessary, but is good for consistency and avoiding readers asking themselves why the two code portions have a different order.
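The hazard is the standard reference-invalidation one; a minimal sketch of why the order matters, using std::vector and hypothetical names rather than the actual LDV code:

    #include <vector>

    void example(std::vector<int> &DFS, int NextBlock) {
      int &CurSucc = DFS.back();  // reference into DFS's current buffer
      DFS.push_back(NextBlock);   // may reallocate: CurSucc now dangles
      // Using CurSucc here would be a use-after-free (the kind ASan reports);
      // the fix is to read everything needed from CurSucc before push_back.
    }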
-
Sanjay Patel authored
    (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

As suggested in D112085, we can substitute 'xor' with 'add' in this pattern, and it is logically equivalent: https://alive2.llvm.org/ce/z/eJtWWC

We canonicalize to 'xor' in IR, but SDAG does not do that (and it probably should not - https://llvm.org/PR52267 ), so it is possible to see either pattern in codegen. Note that 'sub' is another potential pattern, but that is canonicalized to 'add' in DAGCombiner, so we don't need to worry about that variation.

Differential Revision: https://reviews.llvm.org/D112377
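Not part of the patch, but the xor form of the identity shown above is easy to sanity-check exhaustively for i8; a small standalone program, assuming an arithmetic right shift for signed values (true on all mainstream compilers, guaranteed since C++20):

    #include <cassert>
    #include <cstdint>

    static uint8_t usubsat8(uint8_t X, uint8_t Y) {
      return X > Y ? uint8_t(X - Y) : uint8_t(0);  // unsigned saturating subtract
    }

    int main() {
      for (int I = 0; I < 256; ++I) {
        uint8_t X = uint8_t(I);
        uint8_t LHS = uint8_t((X ^ 0x80) & uint8_t(int8_t(X) >> 7));
        assert(LHS == usubsat8(X, 0x80));  // (X ^ 128) & (X s>> 7) == usubsat(X, 128)
      }
    }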
-
Tim Northover authored
Unfortunately ToT has changed enough from the revision where this actually caused problems that the test no longer triggers an assertion failure.
-
Kazu Hirata authored
-
- Oct 24, 2021
-
-
Kazu Hirata authored
We can erase an item in a set or map without checking its membership first.
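The simplification relies on erase() being a no-op for absent keys, which holds for the standard associative containers as well as LLVM's DenseMap/DenseSet; a small illustration (not the patched code itself):

    #include <set>

    void removeItem(std::set<int> &S, int X) {
      // Before: if (S.count(X)) S.erase(X);
      S.erase(X);  // erase-by-key is safe when X is absent; it returns 0 in that case
    }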
-
- Oct 23, 2021
-
-
Kazu Hirata authored
-
Kazu Hirata authored
-
- Oct 22, 2021
-
-
Jay Foad authored
This removes a condition and the corresponding FIXME comment, because the Hexagon assertion it refers to has apparently been fixed, probably by D76134. NFCI. This just gives targets the opportunity to adjust latencies that were set to 0 by the generic code because they involve "implicit pseudo" operands. Differential Revision: https://reviews.llvm.org/D112306
-
Jeremy Morse authored
Sometimes we generate code that writes to a subregister, then spills / restores a super-register to the stack, for example:

    $eax = MOV32ri 0
    MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax
    $rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg

This patch takes a different approach: it adds another index to MLocTracker that identifies a size/offset within a stack slot. A location on the stack is then a pair of {FrameIndex, SlotNum}. Spilling and restoring now involves pairing up the src/dest register numbers, and the dest/src stack position to be transferred to/from. Location coverage improves as a result; compile-time performance decreases, alas.

One limitation is that if a PHI occurs inside a stack slot:

    DBG_PHI %stack.0, 1

We don't know how large the resulting value is, and so might have difficulty picking which value to use. DBG_PHI might need to be augmented in the future with such a size.

Unit tests added ensure that spills and restores correctly transfer to positions in the Location => Value map, and that different register classes written to the stack will correctly clobber all other positions in the stack slot.

Differential Revision: https://reviews.llvm.org/D112133
-
Craig Topper authored
We might be promoting a large non-power of 2 type and the new type may need to be split. Once we split it we may have a ctlz/cttz/ctpop instruction for the split type. I'm also concerned that we may create large shifts with shift amounts that are too small.
-
Simon Pilgrim authored
EXTRACT_SUBVECTOR indices are always constant, so we don't need to check for ConstantSDNode; we can just use getConstantOperandVal, which will assert that the operand is a constant.
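A sketch of the resulting shape, assuming the usual SelectionDAG headers (illustrative, not the exact diff):

    // EXTRACT_SUBVECTOR's index operand is always a constant, so the
    // dyn_cast<ConstantSDNode> + getZExtValue() dance is unnecessary;
    // getConstantOperandVal asserts internally if the operand isn't constant.
    static uint64_t getExtractSubvectorIndex(SDNode *N) {
      assert(N->getOpcode() == ISD::EXTRACT_SUBVECTOR && "unexpected opcode");
      return N->getConstantOperandVal(1);
    }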
-
Jeremy Morse authored
This patch adds some unit tests for the machine-location transfer-function building parts of InstrRefBasedLDV: i.e., test that if we feed some MIR into the transfer-function building code, does it create the correct transfer function.

There are a number of minor defects that get corrected in the process:
 * The unit test was selecting the x86 (i.e. 32 bit) backend rather than x86_64's 64 bit backend,
 * COPY instructions weren't actually having their subregister values correctly represented in the transfer function. Subregisters were being defined by the COPY, rather than taking the value in the source register.
 * SP aliases were at risk of being clobbered, if an SP subregister was clobbered.

Differential Revision: https://reviews.llvm.org/D112006
-
Craig Topper authored
Instead of returning a bool to indicate success and a separate SDValue, return the SDValue and have the callers check if it is null. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112331
-
Craig Topper authored
[LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later.

Expanding these requires multiple constants. If we promote during type legalization when they'll end up getting expanded in LegalizeDAG, we'll use larger constants. These constants may be harder to materialize. For example, 64-bit constants on 64-bit RISCV are very expensive.

This is similar to what has already been done to BSWAP and BITREVERSE.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112268
-
Zarko Todorovski authored
-
Craig Topper authored
There is no need to return a bool and have an SDValue output parameter. Just return the SDValue and let the caller check if it is null. I have another patch to add more callers of these so I thought I'd clean up the interface first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112267
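The shape of the interface change, with a hypothetical function name (SDValue converts to false when null, so the separate success flag adds nothing):

    // Before: bool tryFoldShift(SDNode *N, SDValue &Result);
    SDValue tryFoldShift(SDNode *N);      // after: a null SDValue signals failure

    SDValue visit(SDNode *N) {
      if (SDValue V = tryFoldShift(N))    // single call, checked directly
        return V;
      return SDValue();
    }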
-
Craig Topper authored
By expanding early it allows the shifts to be custom lowered in LegalizeVectorOps. Then a DAG combine is able to run on them before LegalizeDAG handles the BUILD_VECTORS for the masks used. v16Xi8 shift lowering on X86 requires a mask to be applied to a v8i16 shift. The BITREVERSE expansion applied an AND mask before SHL ops and after SRL ops. This was done to share the same mask constant for both shifts. It looks like this patch allows DAG combine to remove the AND mask added after v16i8 SHL by X86 lowering. This maintains the mask sharing that BITREVERSE was trying to achieve. Prior to this patch it looks like we kept the mask after the SHL instead which required an extra constant pool or a PANDN to invert it. This is dependent on D112248 because RISCV will end up scalarizing the BSWAP portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in LegalizeVectorOps first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112254
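For reference, the generic BITREVERSE expansion has the same mask-then-shift shape as this scalar model, where each mask constant is applied before the left shift and after the right shift so both directions share it (illustrative only, not the DAG code):

    #include <cstdint>

    static uint8_t reverseBits8(uint8_t B) {
      B = uint8_t(((B & 0x55) << 1) | ((B >> 1) & 0x55));  // swap adjacent bits
      B = uint8_t(((B & 0x33) << 2) | ((B >> 2) & 0x33));  // swap bit pairs
      B = uint8_t(((B & 0x0F) << 4) | ((B >> 4) & 0x0F));  // swap nibbles
      return B;
    }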
-
- Oct 21, 2021
-
-
Craig Topper authored
It's better to do the ands, shifts, ors in the vector domain than to scalarize it and do those operations on each element. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112248
-
Yonghong Song authored
Clang patch ([1]) added support for btf_decl_tag attributes with typedef types. This patch added llvm support including dwarf generation. For example, for typedef

    typedef unsigned * __u __attribute__((btf_decl_tag("tag1")));
    __u u;

the following shows llvm-dwarfdump result:

    0x00000033:   DW_TAG_typedef
                    DW_AT_type      (0x00000048 "unsigned int *")
                    DW_AT_name      ("__u")
                    DW_AT_decl_file ("/home/yhs/work/tests/llvm/btf_tag/t.c")
                    DW_AT_decl_line (1)
    0x0000003e:     DW_TAG_LLVM_annotation
                      DW_AT_name         ("btf_decl_tag")
                      DW_AT_const_value  ("tag1")
    0x00000047:     NULL

[1] https://reviews.llvm.org/D110127

Differential Revision: https://reviews.llvm.org/D110129
-
Sanjay Patel authored
    (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

I haven't found a generalization of this identity: https://alive2.llvm.org/ce/z/_sriEQ

Note: I was actually looking at the first form of the pattern in that link, but that's part of a long chain of potential missed transforms in codegen and IR... that I hope ends here!

The predicates for when this is profitable are a bit tricky. This version of the patch excludes multi-use but includes custom lowering (as opposed to legal only). On x86 for example, we have custom lowering for some vector types, and that uses umax and sub. So to enable that fold, we need to add use checks to avoid regressions. Even with legal-only lowering, we could see code with extra reg move instructions for extra uses, so that constraint would have to be eased very carefully to avoid penalties.

Differential Revision: https://reviews.llvm.org/D112085
-
Kerry McLaughlin authored
When splitting a masked load, `GetDependentSplitDestVTs` is used to get the MemVTs of the high and low parts. If the masked load is extended, this may return VTs with different element types which are used to create the high & low masked load instructions. This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with the same element type. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D111996
-
Arthur Eubanks authored
With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores. Fixes PR51827. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D111596
-
- Oct 20, 2021
-
-
Jon Roelofs authored
https://godbolt.org/z/h8ejrG4hb rdar://83597585 Differential Revision: https://reviews.llvm.org/D111856
-
Stanislav Mekhanoshin authored
MachineLoop::isLoopInvariant() returns false for all VALU because of the exec use. Check TII::isIgnorableUse() to allow hoisting. That unfortunately results in higher register consumption since MachineLICM does not adequately estimate pressure. Therefore I think it shall only be enabled after D107677 even though it does not depend on it. Differential Revision: https://reviews.llvm.org/D107859
-
Itay Bookstein authored
As discussed in:
 * https://reviews.llvm.org/D94166
 * https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in https://reviews.llvm.org/D109792, which disambiguated getBaseObject (now getAliaseeObject) between GlobalIFunc and everything else.

In addition, as long as GlobalIFunc is not a GlobalObject and getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee is a GlobalIFunc cannot currently be modeled properly. Creating aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition, calling getAliaseeObject on a GlobalIFunc will currently return nullptr, which is undesirable because it should return the object itself for non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from GlobalObject, and removes GlobalIndirectSymbol (while inlining the relevant parts into GlobalAlias and GlobalIFunc). This allows for calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc itself, making getAliaseeObject() more consistent and enabling alias-to-ifunc to be properly modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol: some were 'monomorphized' for GlobalAlias and GlobalIFunc, and some remained shared (with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872
-
Fraser Cormack authored
This patch fixes a crash when despeculating ctlz/cttz intrinsics with scalable-vector types. It is not safe to speculatively get the size of the vector type in bits in case the vector type is not a fixed-length type. As it happens this isn't required as vector types are skipped anyway. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112141
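A hedged sketch of the fix's shape (not the actual CodeGenPrepare code, and the threshold is made up): vector types bail out before any fixed-size query, so a scalable vector never reaches the code that asks for a plain integer bit width:

    static bool isCheapToDespeculate(llvm::Type *Ty) {
      if (Ty->isVectorTy())
        return false;  // vectors (including scalable ones) are skipped up front
      // Only scalar types reach this point, so a fixed-size query is safe.
      return Ty->getScalarSizeInBits() <= 64;  // hypothetical threshold
    }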
-
Craig Topper authored
[RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors.

Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP isn't legal or custom for a vector type we would scalarize the CTLZ/CTTZ. This is different than CTPOP itself which would use a vector expansion.

This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP expansion instead of scalarizing. To do this I had to add additional checks to make sure the operations used by CTPOP expansions are all supported. Some of the operations were already needed for the CTLZ/CTTZ expansion.

This is a huge improvement to RISCV which doesn't have a scalar ctlz or cttz in the base ISA.

For WebAssembly, I've added Custom lowering to keep the scalarizing behavior. I've also extended the scalarizing to CTPOP.

Differential Revision: https://reviews.llvm.org/D111919
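A scalar model of the CTPOP-based fallback this patch keeps for vectors: smear the leading set bit into every lower position, then popcount the inverted result (illustrative C++, not the DAG expansion code):

    #include <cstdint>

    static uint32_t popcount32(uint32_t X) {
      X = X - ((X >> 1) & 0x55555555u);
      X = (X & 0x33333333u) + ((X >> 2) & 0x33333333u);
      X = (X + (X >> 4)) & 0x0F0F0F0Fu;
      return (X * 0x01010101u) >> 24;
    }

    static uint32_t ctlz32(uint32_t X) {
      X |= X >> 1; X |= X >> 2; X |= X >> 4; X |= X >> 8; X |= X >> 16;
      return popcount32(~X);  // leading zeros == set bits of the inverted smear
    }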
-
Jeremy Morse authored
Here's another performance patch for InstrRefBasedLDV: rather than processing all variable values in a scope at a time, instead, process one variable at a time. The benefits are twofold:
 * It's easier to reason about one variable at a time in your mind,
 * It improves performance, apparently from increased locality.

The downside is that the value-propagation code gets indented one level further, plus there's some churn in the unit tests.

Differential Revision: https://reviews.llvm.org/D111799
-
Sander de Smalen authored
When inserting a scalable subvector into a scalable vector through the stack, the index to store to needs to be scaled by vscale. Before this patch, that didn't yet happen, so it would generate the wrong offset, thus storing a subvector to the incorrect address and overwriting the wrong lanes.

For some insert:

    nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)

The offset was not scaled by vscale:

    orr     x8, x8, #0x4
    st1h    { z0.h }, p0, [sp]
    st1h    { z1.d }, p1, [x8]
    ld1h    { z0.h }, p0/z, [sp]

And is changed to:

    mov     x8, sp
    st1h    { z0.h }, p0, [sp]
    st1h    { z1.d }, p1, [x8, #1, mul vl]
    ld1h    { z0.h }, p0/z, [sp]

Differential Revision: https://reviews.llvm.org/D111633
-
- Oct 19, 2021
-
-
Simon Pilgrim authored
Inspired by D111968, provide an isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos. Differential Revision: https://reviews.llvm.org/D111998
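The intent of the helper, as a minimal sketch (the real wrapper lives next to LLVM's isPowerOf2 utilities and is also mirrored for APInt; this is not its exact definition):

    #include <cstdint>

    static bool isPowerOf2(uint64_t V) { return V && (V & (V - 1)) == 0; }

    // A value is a negated power of two iff its two's-complement negation is
    // a power of two, e.g. 0xFFFFFFFFFFFFFFF8 == -8.
    static bool isNegatedPowerOf2(uint64_t V) { return isPowerOf2(0 - V); }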
-
Jeremy Morse authored
This is purely a performance patch: InstrRefBasedLDV used to use three DenseMaps to store variable values, two for long term storage and one as a working set. This patch eliminates the working set, and updates the long term storage in place, thus avoiding two DenseMap comparisons and two DenseMap assignments, which can be expensive. Differential Revision: https://reviews.llvm.org/D111716
-
Jeremy Morse authored
This field gets assigned when the relevant object starts being used; but it remains uninitialized beforehand. This risks introducing hard-to-detect bugs if something changes, so zero-initialize the field.
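The idiom, with hypothetical names since the commit doesn't quote the field:

    class TrackerState {
      // An in-class initializer guarantees the member is never read while
      // indeterminate, even if a code path forgets to assign it first.
      unsigned CurrentBlock = 0;  // was: `unsigned CurrentBlock;`
    };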
-
- Oct 18, 2021
-
-
Alexandros Lamprineas authored
When compiling for the RWPI relocation model the debug information is wrong:
 * the debug location is described as { DW_OP_addr Var } instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus }
 * the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32

Differential Revision: https://reviews.llvm.org/D111404
-
Nikita Popov authored
The applyUpdates() API requires that the CFG is already updated, so make sure to insert the new terminator first.
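A hedged sketch of the required ordering, with hypothetical block names; the point is simply that the CFG edit precedes the applyUpdates() call that describes it:

    void retargetBlock(llvm::BasicBlock *BB, llvm::BasicBlock *OldSucc,
                       llvm::BasicBlock *NewSucc, llvm::DomTreeUpdater &DTU) {
      BB->getTerminator()->eraseFromParent();
      llvm::BranchInst::Create(NewSucc, BB);  // CFG now matches the updates below
      DTU.applyUpdates({{llvm::DominatorTree::Delete, BB, OldSucc},
                        {llvm::DominatorTree::Insert, BB, NewSucc}});
    }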
-
Jon Roelofs authored
https://godbolt.org/z/h8ejrG4hb rdar://83597585 Differential Revision: https://reviews.llvm.org/D111839
-
Kazu Hirata authored
-
Sanjay Patel authored
This is NFC-intended for the callers. Posting in case there are other potential users that I missed. I would also use this from VectorCombine in a patch for: https://llvm.org/PR52178 ( D111901 ) Differential Revision: https://reviews.llvm.org/D111891
-
Jeremy Morse authored
gcc11 warns that this counter causes a signed/unsigned comparison when it's later compared with a SmallVector::difference_type. gcc appears to be correct; clang does not warn one way or the other.
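The general shape of the warning, as a hypothetical snippet rather than the LDV source (an unsigned counter measured against a signed iterator difference):

    #include <cstddef>

    void example(const int *Begin, const int *End) {
      std::size_t Seen = 0;  // unsigned counter
      // End - Begin has the signed type std::ptrdiff_t; comparing it directly
      // against Seen mixes signedness and draws -Wsign-compare from gcc 11.
      if (Seen == static_cast<std::size_t>(End - Begin)) {
        // ...
      }
    }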
-