- Oct 22, 2021
Craig Topper authored

Instead of returning a bool to indicate success and a separate SDValue, return
the SDValue and have the callers check if it is null.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112331

Craig Topper authored

[LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if
they'll be expanded later.

Expanding these requires multiple constants. If we promote during type
legalization when they'll end up getting expanded in LegalizeDAG, we'll use
larger constants. These constants may be harder to materialize. For example,
64-bit constants on 64-bit RISCV are very expensive.

This is similar to what has already been done to BSWAP and BITREVERSE.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112268
Zarko Todorovski authored
Craig Topper authored
There is no need to return a bool and have an SDValue output parameter. Just return the SDValue and let the caller check if it is null. I have another patch to add more callers of these so I thought I'd clean up the interface first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112267

Craig Topper authored

By expanding early it allows the shifts to be custom lowered in
LegalizeVectorOps. Then a DAG combine is able to run on them before
LegalizeDAG handles the BUILD_VECTORs for the masks used.

v16i8 shift lowering on X86 requires a mask to be applied to a v8i16 shift.
The BITREVERSE expansion applied an AND mask before SHL ops and after SRL ops.
This was done to share the same mask constant for both shifts. It looks like
this patch allows DAG combine to remove the AND mask added after the v16i8 SHL
by X86 lowering. This maintains the mask sharing that BITREVERSE was trying to
achieve. Prior to this patch it looks like we kept the mask after the SHL
instead, which required an extra constant pool or a PANDN to invert it.

This is dependent on D112248 because RISCV will end up scalarizing the BSWAP
portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in
LegalizeVectorOps first.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112254
- Oct 21, 2021
Craig Topper authored

It's better to do the ANDs, shifts, and ORs in the vector domain than to
scalarize it and do those operations on each element.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112248

Yonghong Song authored

Clang patch [1] added support for btf_decl_tag attributes with typedef types.
This patch adds the LLVM support, including DWARF generation. For example, for
the typedef

  typedef unsigned * __u __attribute__((btf_decl_tag("tag1")));
  __u u;

the following shows the llvm-dwarfdump result:

  0x00000033: DW_TAG_typedef
                DW_AT_type        (0x00000048 "unsigned int *")
                DW_AT_name        ("__u")
                DW_AT_decl_file   ("/home/yhs/work/tests/llvm/btf_tag/t.c")
                DW_AT_decl_line   (1)

  0x0000003e:   DW_TAG_LLVM_annotation
                  DW_AT_name        ("btf_decl_tag")
                  DW_AT_const_value ("tag1")

  0x00000047:   NULL

[1] https://reviews.llvm.org/D110127

Differential Revision: https://reviews.llvm.org/D110129

Sanjay Patel authored

(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

I haven't found a generalization of this identity:
https://alive2.llvm.org/ce/z/_sriEQ

Note: I was actually looking at the first form of the pattern in that link,
but that's part of a long chain of potential missed transforms in codegen and
IR... that I hope ends here!

The predicates for when this is profitable are a bit tricky. This version of
the patch excludes multi-use but includes custom lowering (as opposed to
legal-only). On x86, for example, we have custom lowering for some vector
types, and that uses umax and sub. So to enable that fold, we need to add use
checks to avoid regressions. Even with legal-only lowering, we could see code
with extra reg move instructions for extra uses, so that constraint would have
to be eased very carefully to avoid penalties.

Differential Revision: https://reviews.llvm.org/D112085

Kerry McLaughlin authored

When splitting a masked load, `GetDependentSplitDestVTs` is used to get the
MemVTs of the high and low parts. If the masked load is extended, this may
return VTs with different element types which are used to create the high &
low masked load instructions.

This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with the
same element type.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111996

Arthur Eubanks authored

With unoptimized code, we may see lots of stores and spend too much time in
mergeTruncStores.

Fixes PR51827.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D111596
- Oct 20, 2021
Jon Roelofs authored

https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111856

Stanislav Mekhanoshin authored

MachineLoop::isLoopInvariant() returns false for all VALU instructions because
of the exec use. Check TII::isIgnorableUse() to allow hoisting.

That unfortunately results in higher register consumption since MachineLICM
does not adequately estimate pressure. Therefore I think it shall only be
enabled after D107677, even though it does not depend on it.

Differential Revision: https://reviews.llvm.org/D107859

Itay Bookstein authored

As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject (now
getAliaseeObject) between GlobalIFunc and everything else.

In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee is a
GlobalIFunc cannot currently be modeled properly. Creating aliases for
GlobalIFuncs does happen in the wild (e.g. glibc). In addition, calling
getAliaseeObject on a GlobalIFunc will currently return nullptr, which is
undesirable because it should return the object itself for non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the relevant
parts into GlobalAlias and GlobalIFunc). This allows for calling
getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc itself, making
getAliaseeObject() more consistent and enabling alias-to-ifunc to be properly
modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol: some
were 'monomorphized' for GlobalAlias and GlobalIFunc, and some remained shared
(with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872

Fraser Cormack authored

This patch fixes a crash when despeculating ctlz/cttz intrinsics with
scalable-vector types. It is not safe to speculatively get the size of the
vector type in bits in case the vector type is not a fixed-length type. As it
happens this isn't required, as vector types are skipped anyway.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112141

Craig Topper authored

[RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on
CTPOP expansion for vectors.

Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP isn't legal or
custom for a vector type we would scalarize the CTLZ/CTTZ. This is different
than CTPOP itself, which would use a vector expansion.

This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP expansion
instead of scalarizing. To do this I had to add additional checks to make sure
the operations used by the CTPOP expansion are all supported. Some of the
operations were already needed for the CTLZ/CTTZ expansion.

This is a huge improvement for RISCV, which doesn't have a scalar ctlz or cttz
in the base ISA. For WebAssembly, I've added Custom lowering to keep the
scalarizing behavior. I've also extended the scalarizing to CTPOP.

Differential Revision: https://reviews.llvm.org/D111919

Jeremy Morse authored

Here's another performance patch for InstrRefBasedLDV: rather than processing
all variable values in a scope at a time, instead, process one variable at a
time. The benefits are twofold:
* It's easier to reason about one variable at a time in your mind,
* It improves performance, apparently from increased locality.

The downside is that the value-propagation code gets indented one level
further, plus there's some churn in the unit tests.

Differential Revision: https://reviews.llvm.org/D111799

Sander de Smalen authored

When inserting a scalable subvector into a scalable vector through the stack,
the index to store to needs to be scaled by vscale. Before this patch, that
didn't yet happen, so it would generate the wrong offset, thus storing a
subvector to the incorrect address and overwriting the wrong lanes.

For some insert:

  nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)

The offset was not scaled by vscale:

  orr x8, x8, #0x4
  st1h { z0.h }, p0, [sp]
  st1h { z1.d }, p1, [x8]
  ld1h { z0.h }, p0/z, [sp]

And is changed to:

  mov x8, sp
  st1h { z0.h }, p0, [sp]
  st1h { z1.d }, p1, [x8, #1, mul vl]
  ld1h { z0.h }, p0/z, [sp]

Differential Revision: https://reviews.llvm.org/D111633
- Oct 19, 2021
Simon Pilgrim authored

Inspired by D111968, provide an isNegatedPowerOf2() wrapper instead of
obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are
likely avenues for typos.

Differential Revision: https://reviews.llvm.org/D111998

Jeremy Morse authored

This is purely a performance patch: InstrRefBasedLDV used to use three
DenseMaps to store variable values, two for long term storage and one as a
working set. This patch eliminates the working set, and updates the long term
storage in place, thus avoiding two DenseMap comparisons and two DenseMap
assignments, which can be expensive.

Differential Revision: https://reviews.llvm.org/D111716
Jeremy Morse authored
This field gets assigned when the relevant object starts being used; but it remains uninitialized beforehand. This risks introducing hard-to-detect bugs if something changes, so zero-initialize the field.
- Oct 18, 2021
Alexandros Lamprineas authored

When compiling for the RWPI relocation model the debug information is wrong:
* the debug location is described as { DW_OP_addr Var } instead of
  { DW_OP_constu Var DW_OP_bregX 0 DW_OP_plus }
* the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32

Differential Revision: https://reviews.llvm.org/D111404
Nikita Popov authored
The applyUpdates() API requires that the CFG is already updated, so make sure to insert the new terminator first.

Jon Roelofs authored

https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111839
Kazu Hirata authored

Sanjay Patel authored

This is NFC-intended for the callers. Posting in case there are other
potential users that I missed.

I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 (D111901)

Differential Revision: https://reviews.llvm.org/D111891

Jeremy Morse authored

gcc 11 warns that this counter causes a signed/unsigned comparison when it's
later compared with a SmallVector::difference_type. gcc appears to be correct;
clang does not warn one way or the other.
Jay Foad authored
Differential Revision: https://reviews.llvm.org/D111872

Jay Foad authored

TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets you
skip machine verification after a particular pass. Unfortunately this is used
in generic code in TargetPassConfig itself to skip verification after a
generic pass, only because some previous target-specific pass damaged the MIR
on that specific target. This is bad because problems in one target cause lack
of verification for all targets.

This patch replaces that mechanism with a new MachineFunction property called
"FailsVerification" which can be set by (usually target-specific) passes that
are known to introduce problems. Later passes can reset it again if they are
known to clean up the previous problems.

Differential Revision: https://reviews.llvm.org/D111397

Fraser Cormack authored

The process of widening simple vector loads attempts to use a load of a wider
vector type if the original load is sufficiently aligned to avoid memory
faults. However this optimization is only legal when performed on fixed-length
vector types. For scalable vector types this is invalid (unless vscale happens
to be 1).

This patch does increase the likelihood of compiler crashes (from
`FindMemType` failing to find a suitable type) but this now better matches how
widening non-simple loads, insufficiently-aligned loads, and scalable-vector
stores are handled.

Patches will be introduced later by which loads and stores can be widened on
targets with support for masked or predicated operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111885

Bing1 Yu authored

We did an experiment and observed a dramatic decrease in the compilation time
spent on clearing kill flags.

Before:
  Number of BasicBlocks: 33357
  Number of Instructions: 162067
  Number of Cleared Kill Flags: 32869
  Time of handling kill flags (ms): 1.607509e+05

After:
  Number of BasicBlocks: 33357
  Number of Instructions: 162067
  Number of Cleared Kill Flags: 32869
  Time of handling kill flags (ms): 3.987371e+03

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D111688
- Oct 15, 2021
Mingming Liu authored

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D111867

Ellis Hoag authored

After D80369, the retainedTypes in CUs should not have any subprograms, so we
should not handle that case when emitting debug info.

Differential Revision: https://reviews.llvm.org/D111593

Dávid Bolvanský authored

Solves https://bugs.llvm.org/show_bug.cgi?id=52056

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111507
Kazu Hirata authored
- Oct 14, 2021
-
Jeremy Morse authored

This patch is very similar to D110173 / a3936a6c, but for variable values
rather than machine values. This is for the second instr-ref problem:
calculating the correct variable value on entry to each block. The previous
lattice-based implementation was broken; we now use LLVM's existing PHI
placement utilities to work out where values need to merge, then eliminate
unnecessary ones through value propagation.

Most of the deletions here happen in vlocJoin: it was trying to pick a
location for PHIs to happen in, badly, leading to an infinite loop in the MIR
test added, where it would repeatedly switch between register locations. The
new approach is simpler: either PHIs can be eliminated, or they can't, and the
location of the value is a different problem.

Various bits and pieces move to the header so that they can be tested in the
unit tests. The DbgValue class grows a "VPHI" kind to represent variable value
PHIs that haven't been eliminated yet.

Differential Revision: https://reviews.llvm.org/D110630
Simon Pilgrim authored
Avoids unused assignment scan-build warning.
Jeremy Morse authored
Some functions get opted out of instruction referencing if they're being compiled with no optimisations, however the LiveDebugValues pass picks one implementation and then sticks with it through the rest of compilation. This leads to a segfault if we encounter a function that doesn't use instr-ref (because it's optnone, for example), but we've already decided to use InstrRefBasedLDV which expects to be passed a DomTree. Solution: keep both implementations around in the pass, and pick whichever one is appropriate to the current function.
- Oct 13, 2021
Jeremy Morse authored

In D110173 we start using the existing LLVM IDF calculator to place PHIs as we
reconstruct an SSA form of the machine-code program. Sadly that's slower than
the old (but broken) way; this patch attempts to recover some of that
performance.

The key observation: every time we def a register, we also have to def its
register units. If we def'd $rax, in the current implementation we
independently calculate PHI locations for {al, ah, ax, eax, hax, rax}, and
they will all have the same PHI positions. Instead of doing that, we can
calculate the PHI positions for {al, ah} and place PHIs for any aliasing
registers in the same positions. Any def of a super-register has to def the
unit, and vice versa, so this is sound. It cuts down the SSA placement we need
to do significantly.

This doesn't work for stack slots, or registers we only ever read, so place
PHIs normally for those. LiveDebugValues chooses to ignore writes to SP at
calls, and now has to ignore writes to SP register units too.

Differential Revision: https://reviews.llvm.org/D111627
Jeremy Morse authored
Old versions of gcc want template specialisations to happen within the namespace where the template lives; this is still present in gcc 5.1, which we officially support, so it has to be worked around.

Jeremy Morse authored

InstrRefBasedLDV used to try and determine which values are in which registers
using a lattice approach; however this is hard to understand, and broken in
various ways. This patch replaces that approach with a standard SSA approach
using existing LLVM utilities. PHIs are placed at dominance frontiers; value
propagation then eliminates unnecessary PHIs.

This patch also adds a bunch of unit tests that should cover many of the
weirder forms of control flow.

Differential Revision: https://reviews.llvm.org/D110173