- Jan 25, 2021
-
-
Nikita Popov authored
When LSR converts a branch on the pre-inc IV into a branch on the post-inc IV, the nowrap flags on the addition may no longer be valid. Previously, a poison result of the addition might have been ignored, in which case the program was well defined. After branching on the post-inc IV, we might be branching on poison, which is undefined behavior. Fix this by discarding nowrap flags that are not present on the SCEV expression. Nowrap flags on the SCEV expression are proven by SCEV to always hold, independently of how the expression will be used.

This is essentially the same fix we applied to IndVars LFTR, which also performs this kind of pre-inc to post-inc conversion.

I believe a similar problem can also exist for getelementptr inbounds, but I was not able to come up with a problematic test case. The inbounds case would have to be addressed differently anyway (as SCEV does not track this property).

Fixes https://bugs.llvm.org/show_bug.cgi?id=46943.

Differential Revision: https://reviews.llvm.org/D95286
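For illustration, a minimal IR sketch of the shape involved (hypothetical, not taken from the patch's tests):

```llvm
define void @sketch(i64 %n) {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  ; Before the conversion, the exit test used the pre-inc IV (%iv), so a
  ; poison %iv.next was never observed. Once the test is rewritten in terms
  ; of %iv.next, any nuw/nsw flag on this add that SCEV cannot prove must be
  ; dropped; otherwise the branch below could become a branch on poison.
  %iv.next = add i64 %iv, 1
  %cmp = icmp ult i64 %iv.next, %n
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}
```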
-
Richard Smith authored
This reverts commit 53176c16, which introduced a layering violation. LLVM's IR library can't include headers from Analysis.
-
Akira Hatanaka authored
or claimRV calls in the IR

Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where LLVM breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result (see the sketch after this message). This is currently done only when the target is arm64 and the optimization level is higher than -O0.

- The ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function.

- The ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925).

- The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted, since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim".

- This patch teaches the dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass can incorrectly determine that nothing in the IR uses the function return (which can happen because the front-end no longer explicitly emits retainRV/claimRV calls in the IR) and change the return type to 'void'.

Future work:

- Use the attribute on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
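As a rough sketch of the IR shapes this describes (hypothetical; the attribute spelling is taken from the description above and the declarations are simplified):

```llvm
declare i8* @getObject()
declare i8* @llvm.objc.retainAutoreleasedReturnValue(i8*)

; Front-end output: the call is annotated instead of being followed by an
; explicit retainRV call in the IR.
define i8* @caller() {
  %obj = call i8* @getObject() "clang.arc.rv"="retain"
  ret i8* %obj
}

; What the ARC contract pass conceptually expands this to late in the
; pipeline (the attribute itself stays so the backend can emit the marker).
define i8* @caller.contract() {
  %obj = call i8* @getObject() "clang.arc.rv"="retain"
  %ret = call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %obj)
  ret i8* %ret
}
```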
-
Florian Hahn authored
Now that VPRecipeBase inherits from VPDef, we can always use the new VPValue for replacement, if the recipe defines one. Given the recipes that are supported at the moment, all new recipes must have either 0 or 1 defined values.
-
Nick Desaulniers authored
Fixes an infinite loop encountered in GVN. GVN will delay PRE if it encounters critical edges, attempt to split them later via calls to SplitCriticalEdge(), then restart. The caller of GVN::splitCriticalEdges() assumed a return value of true meant that critical edges were split, that the IR had changed, and that PRE should be re-attempted, upon which we loop infinitely. This was exposed after D88438, by compiling the Linux kernel for s390, but the test case is reproducible on x86. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1261 Reviewed By: void Differential Revision: https://reviews.llvm.org/D94996
-
Wei Mi authored
turning off SampleFDO silently. Currently the sample loader pass turns off SampleFDO optimization silently when it sees an error in reading the profile. This behavior will defeat the tests which could have caught those bad/incompatible profile problems. This patch changes the behavior to report an error. Differential Revision: https://reviews.llvm.org/D95269
-
Florian Hahn authored
This patch adds plumbing to handle scalarized values directly in VPTransformState. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92282
-
Sanjay Patel authored
We can sink extends after min/max if they match and would not change the sign-interpreted compare. The only combo that doesn't work is zext+smin/smax because the zexts could change a negative number into positive: https://alive2.llvm.org/ce/z/D6sz6J

Sext+umax/umin works:

```llvm
define i32 @src(i8 %x, i8 %y) {
%0:
  %sx = sext i8 %x to i32
  %sy = sext i8 %y to i32
  %m = umax i32 %sx, %sy
  ret i32 %m
}
=>
define i32 @tgt(i8 %x, i8 %y) {
%0:
  %m = umax i8 %x, %y
  %r = sext i8 %m to i32
  ret i32 %r
}
```

Transformation seems to be correct!
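For a concrete feel for the excluded case (a hand-written illustration in the same Alive2-style notation, not from the patch): with %x = -1 and %y = 1, sinking the zexts below an smin changes the result.

```llvm
%zx = zext i8 %x to i32      ; 255
%zy = zext i8 %y to i32      ; 1
%m  = smin i32 %zx, %zy      ; 1

; after (incorrectly) sinking the extends:
%m8 = smin i8 %x, %y         ; smin(-1, 1) = -1
%r  = zext i8 %m8 to i32     ; 255, not 1
```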
-
Sander de Smalen authored
This change also changes getReductionCost to return InstructionCost, and it simplifies two expressions by removing a redundant 'isValid' check.
-
- Jan 24, 2021
-
-
Nikita Popov authored
In the cloning infrastructure, only track an MDNode mapping, without explicitly storing the Metadata mapping, same as is done during inlining. This makes things slightly simpler.
-
Sanjay Patel authored
a6f02212 enabled intersection of FMF on reduction instructions, so it is safe to ease the check here. There is still some room to improve here - it looks like we have nearly duplicate flags propagation logic inside the LoopUtils helper, but it is limited to targets that do not form reduction intrinsics (they form the shuffle expansion).
-
Jeroen Dobbelaere authored
A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope. When that is not the case for at least one of the two, the intrinsic call might as well be removed. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95141
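A minimal sketch of the situation (hypothetical, with simplified metadata):

```llvm
declare void @llvm.experimental.noalias.scope.decl(metadata)

define void @f(i8* %p, i8* %q) {
  ; Only useful because the load/store below refer to scope list !2.
  ; If no memory access used !alias.scope/!noalias with this scope,
  ; the declaration could simply be removed.
  call void @llvm.experimental.noalias.scope.decl(metadata !2)
  %v = load i8, i8* %p, align 1, !alias.scope !2
  store i8 %v, i8* %q, align 1, !noalias !2
  ret void
}

!0 = distinct !{!0}        ; alias domain
!1 = distinct !{!1, !0}    ; scope in that domain
!2 = !{!1}                 ; scope list used above
```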
-
Jeroen Dobbelaere authored
Similar to D92887, LoopRotation also needs to duplicate the noalias scopes when rotating a `@llvm.experimental.noalias.scope.decl` across a block boundary. This is based on the version from the Full Restrict patches (D68511). The problem it fixes also showed up in Transforms/Coroutines/ex5.ll after D93040 (when enabling strict checking with -verify-noalias-scope-decl-dom). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94306
-
Jeroen Dobbelaere authored
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patches (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed.

Notes:
- it also includes changes and tests from D90104

Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D92887
-
- Jan 23, 2021
-
-
Roman Lebedev authored
[NFC][SimplifyCFG] Extract CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses() out of PerformBranchToCommonDestFolding()

To be used in PerformValueComparisonIntoPredecessorFolding().
-
Roman Lebedev authored
-
Roman Lebedev authored
[NFC][SimplifyCFG] Extract PerformValueComparisonIntoPredecessorFolding() out of FoldValueComparisonIntoPredecessors()

Less nested code is much easier to follow and modify.
-
Nikita Popov authored
Add an intrinsic type class to represent the llvm.experimental.noalias.scope.decl intrinsic, to make code working with it a bit nicer by hiding the metadata extraction from view.
-
Kazu Hirata authored
-
Florian Hahn authored
Some utilities used by InstCombine, like SimplifyLibCalls, may add new instructions and replace the uses of a call, but return nullptr because the inserted call produces multiple results. Previously, the replaced library calls would get removed by InstCombine's deleter, but after 29207707 this may not happen if the willreturn attribute is missing. As a work-around, update replaceInstUsesWith to set MadeIRChange if it replaces any uses. This catches the cases where it is used as a replacer by utilities used by InstCombine, and seems useful in general; updating uses will modify the IR. This fixes an expensive-check failure when replacing @__sinpif/@__cospifi with @__sincospif_sret.
-
Sanjay Patel authored
As shown in the test diffs, we could miscompile by propagating flags that did not exist in the original code. The flags required for fmin/fmax reductions will be fixed in a follow-up patch.
-
Florian Hahn authored
With the addition of the `willreturn` attribute, functions that may not return (e.g. due to an infinite loop) are well defined, if they are not marked as `willreturn`. This patch updates `wouldInstructionBeTriviallyDead` to not consider calls that may not return as dead. This patch still provides an escape hatch for intrinsics, which are still assumed as willreturn unconditionally. It will be removed once all intrinsics definitions have been reviewed and updated. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94106
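A minimal IR sketch (hypothetical) of a call this now keeps alive:

```llvm
; Not marked willreturn: the callee may loop forever, so the call is no
; longer considered trivially dead even though its result is unused and it
; does not write memory.
declare i32 @may_not_return() nounwind readnone

define void @f() {
  %unused = call i32 @may_not_return()
  ret void
}
```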
-
Roman Lebedev authored
[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers

If I change it to AssertingVH instead, a number of existing tests fail, which means we don't consistently remove from the set when deleting blocks, which means newly-created blocks may happen to appear in that set if they happen to occupy the same memory chunk as did some block that was in the set originally. There are many places where we delete blocks, and while we could probably consistently delete from LoopHeaders when deleting a block in transforms located in SimplifyCFG.cpp itself, transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may delete blocks, and it doesn't seem good to teach them to deal with it. Since we at most only ever delete from LoopHeaders, let's just delegate to WeakVH to do that automatically.

But to be honest, personally, I'm not sure that the idea behind LoopHeaders is sound.
-
Jeroen Dobbelaere authored
Insert a llvm.experimental.noalias.scope.decl intrinsic that identifies where a noalias argument was inlined. This patch includes some refactorings from D90104. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93040
-
Zequan Wu authored
Like D95088, remove incompatible attribute in more lib calls. Differential Revision: https://reviews.llvm.org/D95278
-
Philip Reames authored
This builds on the restricted (post initial revert) form of D93906, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA. When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes. This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop. (I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.) The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue. Differential Revision: https://reviews.llvm.org/D94378
-
- Jan 22, 2021
-
-
Francis Visoiu Mistrih authored
Similar to binary operators like fadd/fmul/fsub, propagate shape info through unary operators (fneg is the only one?). Differential Revision: https://reviews.llvm.org/D95252
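A small sketch of what this means in IR (hypothetical; 2x2 operands flattened to <4 x float>):

```llvm
declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32)

define <4 x float> @f(<4 x float> %a, <4 x float> %b) {
  %m = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %a, <4 x float> %b, i32 2, i32 2, i32 2)
  ; the 2x2 shape known for %m can now also be propagated through the fneg
  %n = fneg <4 x float> %m
  %r = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %n, <4 x float> %b, i32 2, i32 2, i32 2)
  ret <4 x float> %r
}
```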
-
Roman Lebedev authored
I have previously tried doing that in b33fbbaa / d3820514, but eventually it was pointed out that the approach taken there was just broken wrt how the uses of bonus instructions are updated to account for the fact that they should now use either the bonus instruction or the cloned bonus instruction. In particular, all that manual handling of PHI nodes in successors was just wrong. But, the fix is actually much much simpler than my initial approach: just tell SSAUpdate about both instances of the bonus instruction, and let it deal with all the PHI handling. Alive2 confirms that the reproducers from the original bugs (@pr48450*) are now handled correctly. This effectively reverts commit 59560e85, effectively relanding b33fbbaa.
-
Roman Lebedev authored
This simplifies follow-up patch, and is NFC otherwise.
-
Roman Lebedev authored
NewBonusInst just took the name from BonusInst, so BonusInst has no name, so BonusInst.getName() makes no sense. So we need to ask NewBonusInst for the name.
-
Shimin Cui authored
This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines manage memory that is 16-byte aligned, and they are only available on AIX. Differential Revision: https://reviews.llvm.org/D94710
-
Nikita Popov authored
If the call result is unused, we should let it get DCEd rather than replacing it. Also, don't try to replace an existing sincos with another one (unless it's as part of combining sin and cos). This avoids an infinite combine loop if the calls are not DCEd as expected, which can happen with D94106 and lack of willreturn annotation in hand-crafted IR.
-
Sanjay Patel authored
In the motivating cases from https://llvm.org/PR48816 , we have a trailing trunc. But that is not required to reduce the abs width: https://alive2.llvm.org/ce/z/ECaz-p ...as long as we clear the int-min-is-poison bit (nsw). We have some existing tests that are affected, and I'm not sure what the overall implications are, but in general we favor narrowing operations over preserving nsw/nuw. If that causes problems, we could restrict this transform based on type (shouldChangeType() and/or vector vs. scalar). Differential Revision: https://reviews.llvm.org/D95235
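One possible shape of the narrowing, as a sketch (not copied from the patch's tests):

```llvm
declare i32 @llvm.abs.i32(i32, i1)
declare i8 @llvm.abs.i8(i8, i1)

define i32 @before(i8 %x) {
  %s = sext i8 %x to i32
  %a = call i32 @llvm.abs.i32(i32 %s, i1 true)
  ret i32 %a
}

; Narrowed form: the int-min-is-poison bit is cleared so that abs(-128) is
; well defined in i8, and the result is zero-extended back to i32.
define i32 @after(i8 %x) {
  %a8 = call i8 @llvm.abs.i8(i8 %x, i1 false)
  %a = zext i8 %a8 to i32
  ret i32 %a
}
```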
-
Florian Hahn authored
The existing code did not deal with atomic loads correctly. Such loads are represented as MemoryDefs. Bail out on any MemoryAccess that is not a MemoryUse.
-
Arnold Schwaighofer authored
Because we were not looking for the llvm.coro.id.async intrinsic in the early coro pass (which triggers follow-up passes), we relied on the llvm.coro.end intrinsic being present. This might not be the case in functions that end in unreachable code. Differential Revision: https://reviews.llvm.org/D95144
-
Roman Lebedev authored
Does not build in XCode: http://green.lab.llvm.org/green/job/clang-stage1-RA/17963/consoleFull#-1704658317a1ca8a51-895e-46c6-af87-ce24fa4cd561 This reverts commit aabed371.
-
Roman Lebedev authored
Iff we know we can get rid of the inversions in the new pattern, we can thus get rid of the inversion in the old pattern, thus decreasing instruction count. Note that we could position this transformation as just hoisting of the `not` (still, iff y is freely invertible), but the test changes show a number of regressions, so let's not do that.
-
Roman Lebedev authored
Iff we know we can get rid of the inversions in the new pattern, we can thus get rid of the inversion in the old pattern, thus decreasing instruction count.
-
Roman Lebedev authored
I'd like to use it in an upcoming fold.
-