Commits · 4136897bd4541f5c5e9677e297dabcfe9e367010 · Lorenzo Albano / LLVM bpEVL

Oct 25, 2021

[DebugInfo][InstrRef][NFC] Switch to using DenseMaps and similar · 4136897b

Jeremy Morse authored Oct 25, 2021

There are a few STL containers hanging around that can become DenseMaps,
SmallVectors and similar. This recovers a modest amount of compile time
performance.

While I'm here, adjust the bit layout of ValueIDNum: this was always
supposed to act like a value type, however it seems that clang doesn't
compile the comparison functions to act that way. Add a uint64_t to a
union that explicitly aliases the bitfields, so that we can compare the
whole value as a single integer.

Differential Revision: https://reviews.llvm.org/D112333

4136897b

[WebAssembly] support Memory64 in target_features section · 5694dbcc
Wouter van Oortmerssen authored Oct 21, 2021
```
Differential Revision: https://reviews.llvm.org/D112266
```
5694dbcc

[DebugInfo][InstrRef] Recover stack-slot tracking performance · 97ddf49e

Jeremy Morse authored Oct 25, 2021

This patch is like D111627 -- instead of calculating IDF for every location
on the stack, only do it for the smallest units of interference, and copy
the PHIs for those units to any aliases.

The test added runs placeMLocPHIs directly, and tests that:
 * A def of the lower 8 bits of a stack slot causes all aliasing regs to
   have PHIs placed,
 * It doesn't cause the equivalent location to x86's $ah, which isn't
   aliased, to have a PHI placed.

Differential Revision: https://reviews.llvm.org/D112324

97ddf49e

[indvars] Fix pr52276 (missing one use check) · f82cf618

Philip Reames authored Oct 25, 2021

The recently added logic to canonicalize exit conditions to unsigned relies on facts which hold about the use (i.e. exit test).  Applying this blindly to the icmp is not legal, as there may be another use which never reaches the exit.  Restrict ourselves to case where we have a single use.

f82cf618

[RISCV] Reduce the number of RISCV vector builtins by an order of magnitude. · e2b7aabb

Craig Topper authored Oct 25, 2021

All but 2 of the vector builtins are only used by clang_builtin_alias.
When using clang_builtin_alias, the type string of the builtin is never
checked. Only the types in the function definition used for the alias
are checked.

This patch takes advantage of this to share a single builtin for
many different types. We already used type overloads on the IR intrinsic
so the codegen for the builtins that are being merge were already
the same. This extends the type overloading to the builtins.

I had to make a few tweaks to make this work.
-Floating point vector-vector vmerge now uses the vmerge intrinsic
 instead of the vfmerge intrinsic. New isel patterns and tests are
 added to support this.
-The SemaChecking for the immediate of vset_v/vget_v has been removed.
 Determining the valid range is harder now. I've added masking to
 ManualCodegen to ensure valid IR for invalid input.

This reduces the number of builtins from ~25000 to ~1100.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D112102

e2b7aabb

[RISCV] Add vcsr CSR name for V extension. · 210b586a

Craig Topper authored Oct 25, 2021

Reviewed By: frasercrmck, kito-cheng

Differential Revision: https://reviews.llvm.org/D112342

210b586a

[CodeGen] Fix dependence breaking for tied operands · 7b102fcc
Danila Malyutin authored Aug 05, 2021
```
Differential Revision: https://reviews.llvm.org/D107582
```
7b102fcc

[DebugInfo][InstrRef] Track values fused into stack spills · ee3eee71

Jeremy Morse authored Oct 25, 2021

During register allocation, some instructions can have stack spills fused
into them. It means that when vregs are allocated on the stack we can
convert:

    SETCCr %0
    DBG_VALUE %0

to

    SETCCm %stack.0
    DBG_VALUE %stack.0

Unfortunately instruction referencing finds this harder: a store to the
stack doesn't have a specific operand number, therefore we don't substitute
the old operand for a new operand, and the location is dropped. This patch
implements a solution: just recognise the memory operand attached to an
instruction with a Special Number (TM), and record a substitution between
the old value and the new one.

This patch adds substitution code to InlineSpiller to record such fused
spills, and tracking in InstrRefBasedLDV to recognise such values, and
produce the value numbers for them. Everything to do with the movement of
stack-defined values is already handled in InstrRefBasedLDV.

Differential Revision: https://reviews.llvm.org/D111317

ee3eee71

[AArch64] Handle ST1iN instructions in isAArch64FrameOffsetLegal · 2d9ee590

Danila Malyutin authored Oct 22, 2021

Before the code would crash with "unhandled opcode in
isAArch64FrameOffsetLegal" when there was a spill from extractelement.
Fixes pr52249

Differential Revision: https://reviews.llvm.org/D112311

2d9ee590

[BasicAA] Use ranges for more than one index · 0d20ebf6

Nikita Popov authored Oct 14, 2021

D109746 made BasicAA use range information to determine the
minimum/maximum GEP offset. However, it was limited to the case of
a single variable index. This patch extends support to multiple
indices by adding all the ranges together.

Differential Revision: https://reviews.llvm.org/D112378

0d20ebf6

[SLP]Change the order of the reduction/binops args pair vectorization attempts. · eb9b75dd

Alexey Bataev authored Oct 21, 2021

Need to change the order of the reduction/binops args pair vectorization
attempts. Need to try to find the reduction at first and postpone
vectorization of binops args. This may help to find more reduction
patterns and vectorize them.
Part of D111574.

Differential Revision: https://reviews.llvm.org/D112224

eb9b75dd

[DebugInfo][NFC] Avoid a use-after-free · 2eb96e17

Jeremy Morse authored Oct 25, 2021

This patch swaps two lines -- the CurSucc reference can be invalidated
by the call to DFS.push_back, therefore that should happen last. The
usual hat-tip to asan for catching this.

This patch also swaps an ealier call to ToAdd.insert and DFS.push_back,
where a stable iterator (from successors()) is being used. This isn't
strictly necessary, but is good for consistency and avoiding readers
asking themselves why the two code portions have a different order.

2eb96e17

[DAGCombiner] make matching bit-hack form of usubsat more flexible · 6e46b66e

Sanjay Patel authored Oct 25, 2021

(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

As suggested in D112085, we can substitute 'xor' with 'add'
in this pattern, and it is logically equivalent:
https://alive2.llvm.org/ce/z/eJtWWC

We canonicalize to 'xor' in IR, but SDAG does not do that
(and it probably should not - https://llvm.org/PR52267 ), so
it is possible to see either pattern in codegen. Note that
'sub' is a another potential pattern, but that is
canonicalized to 'add' in DAGCombiner, so we don't need to
worry about that variation.

Differential Revision: https://reviews.llvm.org/D112377

6e46b66e

CodeGenPrep: remove all copies of GEP from list if there are duplicates. · f9089acc

Tim Northover authored Oct 25, 2021

Unfortunately ToT has changed enough from the revision where this actually
caused problems that the test no longer triggers an assertion failure.

f9089acc

[SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt · 1f49b71f

Kerry McLaughlin authored Oct 25, 2021

This patch enables the use of reciprocal estimates for SVE
when both the -Ofast and -mrecip flags are used.

Reviewed By: david-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D111657

1f49b71f

[SimplifyCFG] Sanity assert in iterativelySimplifyCFG · a9b0776a

Max Kazantsev authored Oct 25, 2021

We observe a hang within iterativelySimplifyCFG due to infinite
loop execution. Currently, there is no limit to this loop, so
in case of bug it just works forever. This patch adds an assert
that will break it after 1000 iterations if it didn't converge.

a9b0776a

[InstSimplify] Refactor invariant.group load folding · 75384ecd

Nikita Popov authored Oct 23, 2021

Currently strip.invariant/launder.invariant are handled by
constructing constant expressions with the intrinsics skipped.
This takes an alternative approach of accumulating the offset
using stripAndAccumulateConstantOffsets(), with a flag to look
through invariant.group intrinsics.

Differential Revision: https://reviews.llvm.org/D112382

75384ecd

[VPlan] Do not create dummy entry block (NFC). · a6c4969f

Florian Hahn authored Oct 25, 2021

At the moment a dummy entry block is created at the beginning of VPlan
construction. This dummy block is later removed again.

This means it is not easy to identify the VPlan header block in a
general fashion, because during recipe creation it is the single
successor of the entry block, while later it is the entry block.

To make getting the header easier, just skip creating the dummy block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D111299

a6c4969f

[AArch64] Remove redundant ORRWrs which is generated by zero-extend · a5024362

Jingu Kang authored Sep 30, 2021

%3:gpr32 = ORRWrs $wzr, %2, 0
%4:gpr64 = SUBREG_TO_REG 0, %3, %subreg.sub_32

If AArch64's 32-bit form of instruction defines the source operand of ORRWrs,
we can remove the ORRWrs because the upper 32 bits of the source operand are
set to zero.

Differential Revision: https://reviews.llvm.org/D110841

a5024362

[SCEVExpander] Minor cleanup in value reuse (NFC) · 477551fd

Nikita Popov authored Oct 25, 2021

Use dyn_cast_or_null and convert one of the checks into an
assertion. SCEV is a per-function analysis.

477551fd

[SCEV] Fix a warning on an unused lambda capture · 3729a5ab

Kazu Hirata authored Oct 25, 2021

This patch fixes:

  llvm/lib/Analysis/ScalarEvolution.cpp:12770:37: error: lambda
  capture 'this' is not used [-Werror,-Wunused-lambda-capture]

3729a5ab

[SCEV][NFC] Win some compile time from mass forgetMemoizedResults · f8623b07

Max Kazantsev authored Oct 25, 2021

Mass forgetMemoizedResults can be done more efficiently than bunch
of individual invocations of helper because we can traverse maps being
updated just once, rather than doing this for each invidivual SCEV.

Should be NFC and supposedly improves compile time.

Differential Revision: https://reviews.llvm.org/D112294
Reviewed By: reames

f8623b07

[SCEV][NFC] Apply mass forgetMemoizedResults queries where possible · dbab339e

Max Kazantsev authored Oct 25, 2021

When forgetting multiple SCEVs, rather than doing this one by one, we can
instead use mass updates. We plan to make them more efficient than they
are now, potentially improving compile time.

Differential Revision: https://reviews.llvm.org/D111602
Reviewed By: reames

dbab339e

[SCEV][NFC] Introduce API for mass forgetMemoizedResults query · a6096b7f

Max Kazantsev authored Oct 25, 2021

This patch changes signature of forgetMemoizedResults to be able to work with
multiple SCEVs. Usage will come in follow-ups. We also plan to optimize it in the
future to work faster than individual invalidation updates. Should not change
behavior in any sense.

Split-off from D111602.

Differential Revision: https://reviews.llvm.org/D112293
Reviewed By: reames

a6096b7f

[NFC][SCEV] Do not track users of SCEVConstants · 1c18ebb2

Max Kazantsev authored Oct 25, 2021

Follow-up from D112295, suggested by Nikita: we can avoid tracking
users of SCEVConstants because dropping their cached info is unlikely
to give any new prospects for fact inference, and it should not introduce
any correctness problems.

1c18ebb2

[SCEV][NFC] API for tracking of SCEV users · fea4a48c

Max Kazantsev authored Oct 25, 2021

This patch introduces API that keeps track of SCEVs users of
another SCEVs, required to handle invalidations of users along
with operands that comes in follow-up patches.

Differential Revision: https://reviews.llvm.org/D112295
Reviewed By: reames

fea4a48c

[PowerPC] common chains to reuse offsets to reduce register pressure. · 80e6aff6

Chen Zheng authored Oct 25, 2021

Add a new preparation pattern in PPCLoopInstFormPrep pass to reduce register
pressure.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D108750

80e6aff6

[Target, Transforms] Use predecessors instead of pred_begin and pred_end (NFC) · 98007313
Kazu Hirata authored Oct 24, 2021

98007313
Use llvm::any_of and llvm::none_of (NFC) · 4bd46501
Kazu Hirata authored Oct 24, 2021

4bd46501

X86InstrInfo: Look across basic blocks in optimizeCompareInstr · 4b75d674

Matthias Braun authored Sep 24, 2021

This extends `optimizeCompareInstr` to continue the backwards search
when it reached the beginning of a basic block. If there is a single
predecessor block then we can just continue the search in that block and
mark the EFLAGS register as live-in.

Differential Revision: https://reviews.llvm.org/D110862

4b75d674

X86InstrInfo: Refactor and cleanup optimizeCompareInstr · 683994c8

Matthias Braun authored Sep 24, 2021

This changes the first part of `optimizeCompareInstr` being split into a
loop with a forward scan for cases that re-use zero flags from a
producer in case of compare with zero and a backward scan for finding an
instruction equivalent to a compare.

The code now uses a single backward scan searching for the next
instructions that reads or writes EFLAGS.

Also:
- Add comments giving examples for the 3 cases handled.
- Check `MI` which contains the result of the zero-compare cases,
  instead of re-checking `IsCmpZero`.
- Tweak coding style in some loops.
- Add new MIR based tests that test the optimization in isolation.

This also removes a check for flag readers in situations like this:
```
= SUB32rr %0, %1, implicit-def $eflags
...  we no longer stop when there are $eflag users here
CMP32rr %0, %1   ; will be removed
...
```

Differential Revision: https://reviews.llvm.org/D110857

683994c8

Oct 24, 2021

Treat branch on poison as immediate UB (under an off by default flag) · a461fa64

Philip Reames authored Oct 24, 2021

The LangRef clearly states that branching on a undef or poison value is immediate undefined behavior, but historically, we have not been consistent about implementing that interpretation in the optimizer. Historically, we used (in some cases) a more relaxed model which essentially looked for provable UB along both paths which was control dependent on the condition. However, we've never been 100% consistent here. For instance SCEV uses the strong model for increments which form AddRecs (and only addrecs).

At the moment, the last big blocker for finally making this switch is enabling the fix landed in D106041. Loop unswitching (in it's classic form) is incorrect as it creates many "branch on poisons" when unswitching conditions originally unreachable within the loop.

This change adds a flag to value tracking which allows to easily test the optimization potential of treating branch on poison as immediate UB. It's intended to help ease work on getting us finally through this transition and avoid multiple independent rediscovers of the same issues.

Differential Revision: https://reviews.llvm.org/D112026

a461fa64

[instcombine] Fix oss-fuzz 39934 (mul matcher can match non-instruction) · 3c06ecaa

Philip Reames authored Oct 23, 2021

Fixes a crash observed by oss-fuzz in 39934.  Issue at hand is that code expects a pattern match on m_Mul to imply the operand is a mul instruction, however mul constexprs are also valid here.

3c06ecaa

[ARC] Fix -Wunused-variable. NFC · 54405a49
Fangrui Song authored Oct 24, 2021

54405a49
[llvm] Call *(Set|Map)::erase directly (NFC) · 1c35973c
Kazu Hirata authored Oct 24, 2021
```
We can erase an item in a set or map without checking its membership
first.
```
1c35973c

Oct 23, 2021
- [X86] findEltLoadSrc - fix shift amount variable name. NFCI. · b09f2ee5
  Simon Pilgrim authored Oct 23, 2021
```
Fix the copy + paste, renaming shift amt from Idx to Amt
```
  b09f2ee5
- [InstSimplify] Simplify fetching of index size (NFC) · 0c7f85d7
  Nikita Popov authored Oct 23, 2021
```
Directly fetch the size instead of going through the index type
first.
```
  0c7f85d7
- [ConstantFolding] Accept offset in ConstantFoldLoadFromConstPtr (NFCI) · 710596a1
  Nikita Popov authored Oct 23, 2021
```
As this API is now internally offset-based, we can accept a starting
offset and remove the need to create a temporary bitcast+gep
sequence to perform an offset load. The API now mirrors the
ConstantFoldLoadFromConst() API.
```
  710596a1
- Ensure newlines at the end of files (NFC) · d8e4170b
  Kazu Hirata authored Oct 23, 2021
  
  d8e4170b
- [llvm] Use StringRef::contains (NFC) · d14d7068
  Kazu Hirata authored Oct 23, 2021
  
  d14d7068