Commits · 8138487468e22cf8fa1a86816a1e3247b8010760 · Lorenzo Albano / LLVM bpEVL

May 21, 2020

[BranchProbabilityInfo] Set edge probabilities at once and fix calcMetadataWeights() · 81384874

Yevgeny Rouban authored May 21, 2020

Hide the method that allows setting the probability for a particular edge
and introduce a public method that sets probabilities for all
outgoing edges at once.
Setting individual edge probabilities is error-prone. Moreover, it is
difficult to check that the total probability is 1.0 because there is
no easy way to know when the user has finished setting all
the probabilities.

A related bug is fixed in BranchProbabilityInfo::calcMetadataWeights().
Changing unreachable branch probabilities to raw(1) and distributing
the rest (oldProbability - raw(1)) over the reachable branches could
make the total probability inaccurate by more than 1/numOfBranches.
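
The benefit of setting all edges in one call can be sketched outside LLVM. The code below is not the code from this patch: `setAllEdgeProbabilities`, `totalNumerator`, and the 2^31 fixed-point denominator are assumptions made for illustration. Because every weight is available in a single call, the rounding remainder can be handed to one edge and the total checked on the spot.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Fixed-point denominator (an assumption of this sketch).
constexpr uint64_t Denominator = 1ull << 31;

// Hypothetical helper: sums the numerators of all outgoing edges.
uint64_t totalNumerator(const std::vector<uint32_t> &Numerators) {
  return std::accumulate(Numerators.begin(), Numerators.end(), uint64_t{0});
}

// Hypothetical helper: converts raw weights into numerators over Denominator
// for all outgoing edges in one call, so the total can be validated here.
std::vector<uint32_t>
setAllEdgeProbabilities(const std::vector<uint32_t> &Weights) {
  assert(!Weights.empty() && "a branch needs at least one successor");
  uint64_t Total = std::accumulate(Weights.begin(), Weights.end(), uint64_t{0});
  assert(Total > 0 && "at least one weight must be non-zero");

  std::vector<uint32_t> Numerators;
  uint64_t Assigned = 0;
  for (uint32_t W : Weights) {
    uint64_t N = uint64_t{W} * Denominator / Total; // rounds down
    Numerators.push_back(static_cast<uint32_t>(N));
    Assigned += N;
  }
  // Rounding down loses less than one unit per edge; give the remainder to
  // the last edge so the numerators add up to exactly Denominator.
  Numerators.back() += static_cast<uint32_t>(Denominator - Assigned);
  assert(totalNumerator(Numerators) == Denominator);
  return Numerators;
}
```

With three equal weights the first two edges get 715827882 and the last one 715827884, so the total is exactly 2^31. A per-edge setter has no point at which that final check could run.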

Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396

81384874

Add CanonicalizeFreezeInLoops pass · d9a4a244

Juneyoung Lee authored May 08, 2020

Summary:
If an induction variable is frozen and used, SCEV yields an imprecise result
because it doesn't say anything about frozen variables.

For this reason, performance degraded after
https://reviews.llvm.org/D76483 was merged: SCEV yielded imprecise
results, which prevented LSR from optimizing a loop.

The solution suggested here is to add a pass that canonicalizes frozen variables
inside a loop. Specifically, it pushes freezes out of the loop by freezing
the initial value and step values instead, and drops nsw/nuw flags from instructions used by the freeze.
This solution was also mentioned at https://reviews.llvm.org/D70623.

Reviewers: spatel, efriedma, lebedev.ri, fhahn, jdoerfert

Reviewed By: fhahn

Subscribers: nikic, mgorny, hiraditya, javed.absar, llvm-commits, sanwou01, nlopes

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77523

d9a4a244

Make Value::getPointerAlignment() return an Align, not a MaybeAlign. · f26bdb53

Eli Friedman authored May 16, 2020

If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.

Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer.  This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated.  I
updated clang's handling of C++ references, and added a release note for
this.

Differential Revision: https://reviews.llvm.org/D80072

f26bdb53

May 20, 2020

[InstCombine] `insertelement` is negatible if both sources are negatible · 55430f53

Roman Lebedev authored May 20, 2020

----------------------------------------
define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %src
  %t1 = sub i4 0, %a
  %t2 = insertelement <2 x i4> %t0, i4 %t1, i32 %x
  %t3 = sub <2 x i4> %b, %t2
  ret <2 x i4> %t3
}
=>
define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
%0:
  %t2.neg = insertelement <2 x i4> %src, i4 %a, i32 %x
  %t3 = add <2 x i4> %t2.neg, %b
  ret <2 x i4> %t3
}
Transformation seems to be correct!
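
As an independent sanity check of the fold above, the identity can be brute-forced over all 4-bit lane values. This is only a sketch: the `Vec2` model, `wrap4`, and the helper names are assumptions for illustration, and only in-range insert indices are covered (an out-of-range index is not modelled).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec2 = std::array<uint8_t, 2>; // each lane holds an i4 value in 0..15

// Wrap an integer to 4 bits (two's complement arithmetic modulo 16).
inline uint8_t wrap4(int V) { return static_cast<uint8_t>(V & 0xF); }

Vec2 negate(Vec2 V) { return {wrap4(-V[0]), wrap4(-V[1])}; }

Vec2 insertElement(Vec2 V, uint8_t Elt, unsigned Idx) {
  assert(Idx < 2 && "only in-range indices are modelled");
  V[Idx] = Elt;
  return V;
}

// Before the fold: %b - insertelement(0 - %src, 0 - %a, %x)
Vec2 before(Vec2 Src, uint8_t A, unsigned X, Vec2 B) {
  Vec2 T2 = insertElement(negate(Src), wrap4(-A), X);
  return {wrap4(B[0] - T2[0]), wrap4(B[1] - T2[1])};
}

// After the fold: insertelement(%src, %a, %x) + %b
Vec2 after(Vec2 Src, uint8_t A, unsigned X, Vec2 B) {
  Vec2 T2Neg = insertElement(Src, A, X);
  return {wrap4(T2Neg[0] + B[0]), wrap4(T2Neg[1] + B[1])};
}

// Exhaustively compares both forms over every lane value and both indices.
bool foldHoldsForAllInputs() {
  for (int S0 = 0; S0 < 16; ++S0)
    for (int S1 = 0; S1 < 16; ++S1)
      for (int A = 0; A < 16; ++A)
        for (unsigned X = 0; X < 2; ++X)
          for (int B0 = 0; B0 < 16; ++B0)
            for (int B1 = 0; B1 < 16; ++B1) {
              Vec2 Src{wrap4(S0), wrap4(S1)}, B{wrap4(B0), wrap4(B1)};
              if (before(Src, wrap4(A), X, B) != after(Src, wrap4(A), X, B))
                return false;
            }
  return true;
}
```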

55430f53

[InstCombine] Negator: `extractelement` is negatible if src is negatible · ebed96fd

Roman Lebedev authored May 20, 2020

----------------------------------------
define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %x
  call void @use_v2i4(<2 x i4> %t0)
  %t1 = extractelement <2 x i4> %t0, i32 %y
  %t2 = sub i4 %z, %t1
  ret i4 %t2
}
=>
define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %x
  call void @use_v2i4(<2 x i4> %t0)
  %t1.neg = extractelement <2 x i4> %x, i32 %y
  %t2 = add i4 %t1.neg, %z
  ret i4 %t2
}
Transformation seems to be correct!

ebed96fd

Reland [X86] Codegen for preallocated · 8a887556

Arthur Eubanks authored Mar 16, 2020

See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer to the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689

8a887556

Revert "[X86] Codegen for preallocated" · b8cbff51
Arthur Eubanks authored May 20, 2020
```
This reverts commit 810567dc.

Some tests are unexpectedly passing
```
b8cbff51

[X86] Codegen for preallocated · 810567dc

Arthur Eubanks authored Mar 16, 2020

See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer to the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689

810567dc

[NFCI][CostModel] Refactor getIntrinsicInstrCost · 8cc911fa

Sam Parker authored May 20, 2020

Combine the two API calls into one by introducing a structure to hold
the relevant data. This has the added benefit of moving the boilerplate
code for arguments and flags into the constructors. This is
intended to be a non-functional change, but the complicated web of
logic involved here makes it very hard to guarantee.

Differential Revision: https://reviews.llvm.org/D79941

8cc911fa

[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). · bcbd26bf

Florian Hahn authored May 20, 2020

SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

This patch was originally committed as b8a3c34e, but broke the
modules build, as LoopAccessAnalysis was using the Expander.

The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537

bcbd26bf

May 19, 2020

Give helpers internal linkage. NFC. · 350dadaa
Benjamin Kramer authored May 19, 2020

350dadaa

[LVI] Don't require DominatorTree in LVI (NFC) · 5fae613a

Nikita Popov authored Mar 25, 2020

After D76797 the dominator tree is no longer used in LVI, so we
can remove it as a pass dependency, and also get rid of the
dominator tree enabling/disabling logic in JumpThreading.

Apart from cleaning up the code, this also clarifies LVI
cache consistency, in that the LVI cache can no longer
depend on whether the DT was or wasn't enabled due to
pending DT updates at any given time.

Differential Revision: https://reviews.llvm.org/D76985

5fae613a

[LV] Remove duplicated return stmt (NFC). · 7cefd1b4
Florian Hahn authored May 19, 2020

7cefd1b4

[InstCombine] Remove hasNoInfs check for pow(C,y) -> exp2(log2(C)*y) · 9bc989a4

Jay Foad authored May 05, 2020

We already check hasNoNaNs and that x is finite and strictly positive.
That only leaves the following special cases (taken from the Linux man
page for pow):

If x is +1, the result is 1.0 (even if y is a NaN).
If the absolute value of x is less than 1, and y is negative infinity, the result is positive infinity.
If the absolute value of x is greater than 1, and y is negative infinity, the result is +0.
If the absolute value of x is less than 1, and y is positive infinity, the result is +0.
If the absolute value of x is greater than 1, and y is positive infinity, the result is positive infinity.

The first case is handled elsewhere, and this transformation preserves
all the others, so there is no need to limit it to hasNoInfs.
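
The infinite-exponent cases can be checked with a small standalone sketch. It uses plain `<cmath>` rather than anything from InstCombine, and `powViaExp2` and `Inf` are names invented here. The constants 0.5 and 4.0 are chosen so that `log2(C)` is exact.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

constexpr double Inf = std::numeric_limits<double>::infinity();

// Rewritten form of pow(C, y) for a finite, strictly positive constant C.
double powViaExp2(double C, double Y) {
  assert(std::isfinite(C) && C > 0.0 && "C must be finite and strictly positive");
  return std::exp2(std::log2(C) * Y);
}
```

For example, `powViaExp2(0.5, -Inf)` and `std::pow(0.5, -Inf)` are both positive infinity, and `powViaExp2(4.0, -Inf)` and `std::pow(4.0, -Inf)` are both +0. With C equal to 1 the rewritten form computes `0 * Inf`, which is NaN, so the first case does have to be handled separately.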

Differential Revision: https://reviews.llvm.org/D79409

9bc989a4

[VPlan] Fix comment for User in VPWidenSelectRecipe (NFC). · cff9399f
Florian Hahn authored May 19, 2020
```
The comment was referring to the arguments of the call, but the recipe
widens a select.
```
cff9399f

[VPlan] Add & use VPValue operands for VPReplicateRecipe (NFC). · f828d75b

Florian Hahn authored May 19, 2020

This patch adds VPValue version of the instruction operands to
VPReplicateRecipe and uses them during code-generation.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D80114

f828d75b

[VPlan] Remove unique_ptr from VPBranchOnRecipeMask (NFC). · 66ad1074

Florian Hahn authored May 19, 2020

We can remove a dynamic memory allocation, by checking the number of
operands: no operands = all true, 1 operand = mask.

Reviewers: Ayal, gilr, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D80110

66ad1074

[LoopSimplify] don't separate nested loops with convergent calls · 6c848843

Sameer Sahasrabuddhe authored May 19, 2020

Summary:
When a loop has multiple backedges, loop simplification attempts to
separate them out into nested loops. This results in incorrect control
flow in the presence of some functions like a GPU barrier. This change
skips the transformation when such "convergent" function calls are
present in the loop body.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D80078

6c848843

[NFC] Replace MaybeAlign with Align in TargetTransformInfo. · 27b4e693
Eli Friedman authored May 18, 2020

27b4e693

[LV] Fix FoldTail under user VF and UF · 682e7396

Ayal Zaks authored May 17, 2020

LV uses an internally computed MaxVF to decide whether a constant trip-count is
a multiple of any subsequently chosen VF and, if so, concludes that no scalar remainder
iterations (tail) will be left for Fold Tail to handle. If an external VF is
provided via -force-vector-width, it must be considered instead of the internal
MaxVF.
If an external UF is provided via -force-vector-interleave, it too must be
considered in addition to MaxVF or user VF.
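
The divisibility check described above can be sketched as follows. This is an illustration under stated assumptions, not the vectorizer's code: `leavesScalarTail` and its parameter names are hypothetical, and a user value of 0 stands for "not forced on the command line".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if scalar remainder iterations are left
// after running the vector loop with the effective VF and UF.
bool leavesScalarTail(uint64_t TripCount, unsigned MaxVF, unsigned UserVF,
                      unsigned UserUF) {
  assert(TripCount > 0 && MaxVF > 0 && "expects a known, non-zero trip count");
  // A forced VF replaces the internally computed MaxVF.
  uint64_t VF = UserVF ? UserVF : MaxVF;
  // A forced UF multiplies the number of iterations consumed per vector step.
  uint64_t Step = UserUF ? VF * UserUF : VF;
  return TripCount % Step != 0;
}
```

With a trip count of 12 and MaxVF of 4 there is no tail, but forcing VF to 8 leaves 4 iterations. With a trip count of 8 and MaxVF of 4, forcing UF to 4 makes the step 16, so a tail remains even though 8 is a multiple of MaxVF.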

Fixes PR45679.

Differential Revision: https://reviews.llvm.org/D80085

682e7396

May 18, 2020

Fix several places that were calling verifyFunction or verifyModule without... · c9f63297

Craig Topper authored May 18, 2020

Fix several places that were calling verifyFunction or verifyModule without checking the return value.

verifyFunction/verifyModule don't assert or error internally. They
also don't print anything if you don't pass a raw_ostream to them.
So the caller needs to check the result and ideally pass a stream
to get the messages. Otherwise they're just really expensive no-ops.
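
The calling pattern can be shown with a stand-in that has the same contract: it returns true when the input is broken and prints only when given a stream. Everything in this sketch (`FakeFunction`, `verifyFake`, `verifyAndCollect`) is hypothetical and uses `std::ostream` in place of `raw_ostream`. It is not the LLVM verifier API.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a function to be verified.
struct FakeFunction {
  std::string Name;
  bool HasTerminator;
};

// Stand-in verifier: returns true if F is broken. It never asserts or aborts,
// and it prints a diagnostic only when a stream is supplied.
bool verifyFake(const FakeFunction &F, std::ostream *OS = nullptr) {
  assert(!F.Name.empty() && "sketch expects named functions");
  if (F.HasTerminator)
    return false;
  if (OS)
    *OS << "function '" << F.Name << "' has a block without a terminator\n";
  return true;
}

// Recommended shape: check the result and pass a stream to get the message.
// Returns the diagnostic text, or an empty string if F verified cleanly.
std::string verifyAndCollect(const FakeFunction &F) {
  std::ostringstream Err;
  if (verifyFake(F, &Err))
    return Err.str();
  return std::string();
}
```

Calling `verifyFake(F);` as a bare statement discards the result and prints nothing, which is the expensive no-op this commit removes.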

I've filed PR45965 for another instance in SLPVectorizer
that causes a lit test failure.

Differential Revision: https://reviews.llvm.org/D80106

c9f63297

[Sanitizers] Use getParamByValType() (NFC) · 47a0e9f4
Nikita Popov authored May 18, 2020
```
Instead of fetching the pointer element type.
```
47a0e9f4

LoadStoreVectorizer: Match nested adds to prove vectorization is safe · 63081dc6

Volkan Keles authored May 18, 2020

If both OpA and OpB are adds with NSW/NUW and with the same LHS operand,
we can guarantee that the transformation is safe if we can prove that OpA
won't overflow when IdxDiff is added to the RHS of OpA.

Review: https://reviews.llvm.org/D79817

63081dc6

[Loads] Require Align in isSafeToLoadUnconditionally() (NFC) · 736db2f7

Nikita Popov authored May 18, 2020

Now that load/store have required alignment, accept Align here.
This also avoids uses of getPointerElementType(), which is
incompatible with opaque pointers.

736db2f7

[llvm][NFC] Fixed non-compliant style in InlineAdvisor.h · 691980eb
Mircea Trofin authored May 18, 2020
```
Changed OnPass{Entry|Exit} -> onPass{Entry|Exit}

Also fixed a small typo in a comment.
```
691980eb

[Local] Do not ignore zexts in salvageDebugInfo, PR45923 · 623b2542

Vedant Kumar authored May 15, 2020

Summary:
When salvaging a dead zext instruction, append a convert operation to
the DIExpressions of the debug uses of the instruction, to prevent the
salvaged value from being sign-extended.
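
Why the convert step matters can be seen with plain integer casts. This sketch only models the arithmetic, not DIExpressions. The helper names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Widening an 8-bit value when nothing says it is unsigned: sign-extends.
int32_t widenAsSigned(uint8_t Bits) { return static_cast<int8_t>(Bits); }

// Widening after an explicit "treat as unsigned 8-bit" step: zero-extends,
// which is what the dead zext computed.
int32_t widenAsUnsigned(uint8_t Bits) { return Bits; }

// Small usage example: the same bit pattern yields two different values.
void checkExample() {
  assert(widenAsSigned(0xF0) == -16);
  assert(widenAsUnsigned(0xF0) == 240);
}
```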

I confirmed that lldb prints out the correct unsigned result for "f" in
the example from PR45923 with this change applied.

rdar://63246143

Reviewers: aprantl, jmorse, chrisjackson, davide

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80034

623b2542

[InstCombine][NFC] Simplify check in sinking · e47c101e

Max Kazantsev authored May 18, 2020

We just need to check that the only predecessor of user parent is
BB, we don't need to iterate through BB's successors for it.

e47c101e

May 17, 2020

ValueMapper does not preserve inline assembly dialect when remapping the type · 5f65faef

Craig Topper authored May 17, 2020

Bug report: https://bugs.llvm.org/show_bug.cgi?id=45291

Patch by Tomasz Miąsko

Differential Revision: https://reviews.llvm.org/D80066

5f65faef

[Alignment] Remove unnecessary getValueOrABITypeAlignment calls (NFC) · 52e98f62

Nikita Popov authored May 17, 2020

Now that load/store alignment is required, we no longer need most
of them. Also switch the getLoadStoreAlignment() helper to return
Align instead of MaybeAlign.

52e98f62

[InstCombine] visitMaskedMerge(): when unfolding, sanitize undef constants (PR45955) · fde8eb00

Roman Lebedev authored May 17, 2020

We can't leave undef vector element constants as-is,
it is a miscompile, so we need to sanitize them.

We have two vectors (C and ~C):
* We can't replace undef with 0 in both of them
* We can't replace undef with 0 in only one of them
* We could replace undef with -1 in both of them
* We could replace undef with -1 in only one(!) of them
* We could replace undef with -1 in one and 0 in another one of them.

Therefore, it seems best to go with the last option, since otherwise
we'd lose the knowledge that C and ~C have no common bits set,
which seems more important than preserving partial undef knowledge.
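
The last option can be sketched on 8-bit lanes, with `std::nullopt` standing in for an undef lane. This is a model for illustration, not the InstCombine code: `Lane`, `MaskPair`, `sanitizeMasks`, `masksAreDisjoint`, and `mergeLane` are hypothetical names, and which of the two vectors receives -1 is an arbitrary choice here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Lane = std::optional<uint8_t>; // std::nullopt models an undef lane

struct MaskPair {
  std::vector<uint8_t> C;
  std::vector<uint8_t> NotC;
};

// Builds C and ~C from a mask with undef lanes: an undef lane becomes
// all-ones in C and zero in ~C, so the pair stays complementary.
MaskPair sanitizeMasks(const std::vector<Lane> &C) {
  MaskPair Out;
  for (const Lane &L : C) {
    if (L) {
      Out.C.push_back(*L);
      Out.NotC.push_back(static_cast<uint8_t>(~*L));
    } else {
      Out.C.push_back(0xFF);
      Out.NotC.push_back(0x00);
    }
  }
  return Out;
}

// True if no lane has a bit set in both C and ~C.
bool masksAreDisjoint(const MaskPair &M) {
  assert(M.C.size() == M.NotC.size());
  for (std::size_t I = 0; I < M.C.size(); ++I)
    if ((M.C[I] & M.NotC[I]) != 0)
      return false;
  return true;
}

// Unfolded masked merge for one lane: (x & C) | (y & ~C).
uint8_t mergeLane(uint8_t X, uint8_t Y, uint8_t C, uint8_t NotC) {
  return static_cast<uint8_t>((X & C) | (Y & NotC));
}
```

For a sanitized undef lane the merge simply selects `X`. Replacing undef with 0 in both vectors would instead give `mergeLane(X, Y, 0, 0) == 0`, which is neither input.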

Fixes https://bugs.llvm.org/show_bug.cgi?id=45955

fde8eb00

[InstCombine] improve analysis of FP->int->FP to eliminate fpextend · bfd51216

Sanjay Patel authored May 17, 2020

This was originally in D79116.
Converting from a narrow-enough FP source value to integer and
back to FP guarantees that the conversion to FP is exact because
of UB/poison-on-overflow.
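
The exactness claim can be illustrated with `float` as the narrow type and `int32_t` as the integer type. Both types and both helper names are assumptions of this sketch. An integer obtained by truncating a `float` has at most 24 significant bits, so converting it back to `float` loses nothing. An arbitrary `int32_t` carries no such guarantee.

```cpp
#include <cassert>
#include <cstdint>

// float -> int32 -> float, then widened: is it the same value as
// float -> int32 -> double directly?
bool roundTripIsExact(float Src) {
  // Out-of-range inputs are undefined in C++ (poison in IR), so exclude them.
  assert(Src >= -2147483648.0f && Src < 2147483648.0f);
  int32_t I = static_cast<int32_t>(Src);  // FP -> int
  float Narrow = static_cast<float>(I);   // int -> narrow FP
  return static_cast<double>(Narrow) == static_cast<double>(I);
}

// Contrast: an integer that did not come from a float may not survive.
bool intSurvivesFloat(int32_t I) {
  return static_cast<double>(static_cast<float>(I)) == static_cast<double>(I);
}
```

For instance, 16777217 (2^24 + 1) is not representable as a `float`, so `intSurvivesFloat(16777217)` is false, while every in-range `float` input passes `roundTripIsExact`.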

This was suggested in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c19

bfd51216

May 16, 2020

AllocaInst should store Align instead of MaybeAlign. · 4f04db4b

Eli Friedman authored May 15, 2020

Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.

Differential Revision: https://reviews.llvm.org/D80044

4f04db4b

[VectorCombine] forward walk through instructions to improve chaining of transforms · 81e9ede3

Sanjay Patel authored May 16, 2020

This is split off from D79799 - where I was proposing to fully iterate
over a function until there are no more transforms. I suspect we are
still going to want to do something like that eventually.

But we can achieve the same gains much more efficiently on the current
set of regression tests just by reversing the order in which we visit the
instructions.

This may also reduce the motivation for D79078, but we are still not
getting the optimal pattern for a reduction.

81e9ede3

[InstCombine] Clean up alignment handling (NFC) · 604f4497
Nikita Popov authored May 16, 2020
```
Now that load/store alignment is required, we can simplify code
in some places.
```
604f4497

May 15, 2020

Revert "Revert "[llvm][NFC] Cleanup uses of std::function in Inlining-related APIs"" · 08e2386d

Mircea Trofin authored May 14, 2020

This reverts commit 454de99a.

The problem was that one of the ctor arguments of CallAnalyzer was left
to be const std::function<>&. A function_ref was passed for it, and then
the ctor stored the value in a function_ref field. So a std::function<>
would be created as a temporary, and not survive past the ctor
invocation, while the field would.
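
The lifetime issue can be sketched with a minimal non-owning callable reference. `FunctionRef` below is a simplified stand-in written for this example, not LLVM's `function_ref`, and `Analyzer` is a hypothetical class. The fixed shape takes the reference type by value in the constructor, so no temporary `std::function` is created between the caller's callable and the stored field.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <typename Fn> class FunctionRef;

// Minimal non-owning reference to a callable. It stores only a pointer, so
// the referenced callable must outlive every FunctionRef that points at it.
template <typename Ret, typename... Args> class FunctionRef<Ret(Args...)> {
  Ret (*Callback)(const void *, Args...) = nullptr;
  const void *Callable = nullptr;

public:
  FunctionRef() = default;

  template <typename F, typename = std::enable_if_t<!std::is_same_v<
                            std::remove_cv_t<F>, FunctionRef>>>
  FunctionRef(F &Fn)
      : Callback([](const void *Obj, Args... As) -> Ret {
          return (*static_cast<F *>(const_cast<void *>(Obj)))(
              std::forward<Args>(As)...);
        }),
        Callable(&Fn) {}

  Ret operator()(Args... As) const {
    assert(Callback && "calling an empty FunctionRef");
    return Callback(Callable, std::forward<Args>(As)...);
  }
};

// Fixed shape: the constructor parameter has the same non-owning type as the
// field. Had it been `const std::function<int(int)> &`, passing a FunctionRef
// would build a temporary std::function that dies when the constructor
// returns, leaving the field dangling.
class Analyzer {
  FunctionRef<int(int)> GetCost;

public:
  explicit Analyzer(FunctionRef<int(int)> Cost) : GetCost(Cost) {}
  int run(int X) const { return GetCost(X); }
};

// Usage: the lambda is a named local that outlives the Analyzer.
int runWithLongLivedCallback() {
  auto Double = [](int X) { return 2 * X; };
  Analyzer A(Double);
  return A.run(10);
}
```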

Tested locally by following https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

Original Differential Revision: https://reviews.llvm.org/D79917

08e2386d

StoreInst should store Align, not MaybeAlign · 11aa3707

Eli Friedman authored May 14, 2020

This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.

Differential Revision: https://reviews.llvm.org/D79968

11aa3707

[NFC] Deduplicate comment in PromoteMemoryToRegister.cpp · 03c44c75

Scott Linder authored May 15, 2020

This has been duplicated since before
2372a193, but that commit has it
appearing twice in the space of 10 lines of the same function body. It
could also be hoisted up to the point just after where the last
special-case is considered, but I want to keep the intent of the
original authors.

Committed as obvious without a review.

03c44c75

[IR] Convert null-pointer-is-valid into an enum attribute · f89f7da9

Nikita Popov authored Apr 25, 2020

The "null-pointer-is-valid" attribute needs to be checked by many
pointer-related combines. To make the check more efficient, convert
it from a string into an enum attribute.

In the future, this attribute may be replaced with data layout
properties.

Differential Revision: https://reviews.llvm.org/D78862

f89f7da9

[VectorUtils] Expose vector-function-abi-variant mangling as a utility. · 7cc3769a

Anna Thomas authored May 13, 2020

Summary:
This change exposes the vector name mangling with LLVM ISA (used as part
of vector-function-abi-variant) as a utility.
This can then be used by front-ends that add this attribute.
Note that all parameters passed in to the function will be mangled with
the "v" token to identify that they are of vector type. So, it is the
responsibility of the caller to confirm that all parameters in the
vectorized variant are of vector type.
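
A rough standalone sketch of such a mangling helper is shown below. The function name `mangleVectorVariantName` is hypothetical. The exact layout (a `_ZGV` prefix, an `_LLVM_` ISA token, `N` for an unmasked variant, then the vector length) is an assumption based on the vector function ABI naming scheme, not something stated in this commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a vector-function-abi-variant style name.
// Every parameter is mangled with "v", so the caller must make sure all
// parameters of the vector variant really are vectors.
std::string mangleVectorVariantName(const std::string &VectorName,
                                    const std::string &ScalarName,
                                    unsigned NumArgs, unsigned VF) {
  assert(VF > 0 && "vectorization factor must be non-zero");
  std::string Out = "_ZGV";     // assumed ABI prefix
  Out += "_LLVM_";              // assumed token for the LLVM ISA
  Out += "N";                   // assumed token for "no mask"
  Out += std::to_string(VF);    // vector length
  Out.append(NumArgs, 'v');     // one "v" per (vector) parameter
  Out += "_" + ScalarName + "(" + VectorName + ")";
  return Out;
}
```

Under these assumptions, a three-argument scalar `foo` with a 4-wide variant `vec_foo` would be mangled as `_ZGV_LLVM_N4vvv_foo(vec_foo)`.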

Added unit test to show vector name mangling.

Reviewed-By: fpetrogalli, simoll

Differential Revision: https://reviews.llvm.org/D79867

7cc3769a

[TSAN] Add option to allow instrumenting reads of reads-before-writes · 151ed6aa

Dmitry Vyukov authored May 15, 2020

Add -tsan-instrument-read-before-write which allows instrumenting reads
of reads-before-writes.

This is required for KCSAN [1], where under certain configurations plain
writes behave differently (e.g. aligned writes up to word size may be
treated as atomic). In order to avoid missing potential data races due
to plain RMW operations ("x++" etc.), we will require instrumenting
reads of reads-before-writes.

[1] https://github.com/google/ktsan/wiki/KCSAN

Author: melver (Marco Elver)
Reviewed-in: https://reviews.llvm.org/D79983

151ed6aa